import pandas as pd

# read data
df = pd.read_csv("/path/to/file.csv")

# alternative: read from excel (read_csv does not accept sheet_name)
df = pd.read_excel("/path/to/file.xlsx", sheet_name="sheet name")

# sanity check: view the dataframe
df.info()
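Beyond `df.info()`, a few more quick checks can help confirm that the data loaded as expected. The sketch below uses a small in-memory CSV in place of a real file; the column names `age` and `city` and their values are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "age,city\n34,Paris\n,Berlin\n51,Paris\n"
df = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns.
print(df.shape)

# Column dtypes: a numeric column with a missing value is read as float.
print(df.dtypes)

# Count of missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics for the numeric columns.
print(df.describe())
```

Checking missing values and dtypes early tends to explain surprises later on, such as a numeric column that was parsed as text.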
Automated EDA with ydata-profiling (formerly pandas-profiling)
from ydata_profiling import ProfileReport

# in a jupyter notebook cell
ProfileReport(df)

# to use jupyter widgets
report = ProfileReport(df)
report.to_widgets()

# to save to disk
report.to_file("/path/to/report.html")
Creating static plots with seaborn and matplotlib
Histogram
Used to view the distribution of a continuous variable
import seaborn as sns

# simplest way
sns.histplot(data=df, x="variable")

# preferred way
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sns.histplot(data=df, x="variable", ax=ax)
plt.show()
Box plot
Used to summarize the distribution of a continuous variable (median, quartiles, and potential outliers)
import seaborn as sns

# simplest way
sns.boxplot(data=df, x="variable")

# preferred way
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sns.boxplot(data=df, x="variable", ax=ax)
plt.show()
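The numbers behind a box plot can also be computed directly, which helps when reading the figure. By default the whiskers extend to the most extreme points within 1.5 times the interquartile range (IQR) of the quartiles, and points beyond that are drawn individually. The sketch below reproduces that rule with pandas; the values are hypothetical, and 95 is deliberately extreme.

```python
import pandas as pd

# Hypothetical values; 95 is deliberately extreme.
df = pd.DataFrame({"variable": [10, 12, 13, 14, 15, 16, 18, 95]})

# Quartiles that define the box and the median line.
q1 = df["variable"].quantile(0.25)
median = df["variable"].quantile(0.50)
q3 = df["variable"].quantile(0.75)
iqr = q3 - q1

# Fences implied by the default 1.5 * IQR whisker rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Rows that fall outside the fences would be drawn as individual points.
outliers = df[(df["variable"] < lower_fence) | (df["variable"] > upper_fence)]
print(outliers)
```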
Scatterplot
Used to explore the relationship between two continuous variables
import seaborn as sns

# simplest way
sns.scatterplot(data=df, x="independent_variable", y="dependent_variable")

# preferred way
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="independent_variable", y="dependent_variable", ax=ax)
plt.show()
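A scatterplot pairs naturally with a numeric measure of association. As a rough companion, the sketch below computes the Pearson correlation with pandas on synthetic data; the linear relationship, the noise level, and the seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends linearly on x plus random noise (illustrative only).
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 1.0, size=200)
df = pd.DataFrame({"independent_variable": x, "dependent_variable": y})

# Pearson correlation measures the strength of the linear relationship.
pearson_r = df["independent_variable"].corr(df["dependent_variable"])
print(round(pearson_r, 3))
```

A coefficient near 0 does not rule out a non-linear relationship, which is one reason to look at the plot as well as the number.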
Line plot
Used to explore how a continuous variable changes across a discrete, ordered variable (for example, time steps)
import seaborn as sns

# simplest way
sns.lineplot(data=df, x="independent_variable", y="dependent_variable")

# preferred way
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sns.lineplot(data=df, x="independent_variable", y="dependent_variable", ax=ax)
plt.show()
Bar plot
Used to compare a continuous variable across the levels of a discrete or categorical variable (seaborn shows the mean of each group by default)
import seaborn as sns

# simplest way
sns.barplot(data=df, x="independent_variable", y="dependent_variable")

# preferred way
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
sns.barplot(data=df, x="independent_variable", y="dependent_variable", ax=ax)
plt.show()
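Because `sns.barplot` plots an aggregate rather than the raw rows, it can help to compute that aggregate explicitly. The sketch below uses hypothetical categories `a` and `b` and compares the bar heights with the group means from pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: two categories with two observations each.
df = pd.DataFrame({
    "independent_variable": ["a", "a", "b", "b"],
    "dependent_variable": [1.0, 3.0, 4.0, 8.0],
})

# Group means, which are the default bar heights in sns.barplot.
means = df.groupby("independent_variable")["dependent_variable"].mean()
print(means)

fig, ax = plt.subplots()
sns.barplot(data=df, x="independent_variable", y="dependent_variable", ax=ax)

# Read the drawn bar heights back from the Axes for comparison.
bar_heights = sorted(round(patch.get_height(), 6) for patch in ax.patches)
print(bar_heights)
```

If the raw counts per category are of interest instead of the mean, `sns.countplot` is the closer fit.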