How to generate boxplots with the seaborn library

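One of the first steps in analysing a dataset is data exploration. A box plot is a standardized way of displaying the distribution of data: it shows the minimum, first quartile, median, third quartile, and maximum. In seaborn and matplotlib, the whiskers that mark the "minimum" and "maximum" extend by default only to the most extreme points within 1.5 times the interquartile range (IQR) of the quartiles; points beyond them are drawn individually as outliers.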
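To make the whisker convention concrete, here is a small sketch in plain NumPy of the statistics a standard box plot draws. The function name box_plot_stats is hypothetical, and quartile interpolation methods vary between tools, so a given library may report slightly different values on some inputs.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Return the statistics drawn by a Tukey-style box plot (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    # Quartiles using NumPy's default linear interpolation
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points outside them are treated as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "whisker_low": inside.min(),   # most extreme point inside the lower fence
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),  # most extreme point inside the upper fence
        "outliers": x[(x < lower_fence) | (x > upper_fence)],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["q1"], stats["median"], stats["q3"])  # 3.0 5.0 7.0
print(stats["outliers"])                          # [100.]
```

Here the IQR is 4, so the fences sit at -3 and 13: the whiskers stop at 1 and 8, and 100 is plotted as an outlier.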

In this short post, I will focus on visualizing the data and its outliers using box plots. To learn more about data exploration, check out this very thorough post.
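Besides seeing outliers on the plot, you may want to list them. A minimal sketch, assuming the same 1.5 × IQR rule (Tukey's fences) applied column by column with pandas; the helper name tukey_outliers and the toy "Milk"/"Grocery" numbers are made up for illustration, not taken from the dataset.

```python
import pandas as pd


def tukey_outliers(df, k=1.5):
    """Boolean mask: True where a value lies outside its column's Tukey fences."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # q1/q3/iqr are Series indexed by column name, so the comparisons
    # align with the DataFrame's columns
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


# Toy values for illustration only
data = pd.DataFrame({
    "Milk":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0],
    "Grocery": [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0],
})
mask = tukey_outliers(data)
print(mask.sum())              # number of outliers per column
print(data[mask.any(axis=1)])  # rows with at least one outlier
```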

I will be using data from the customer segmentation assignment of the Machine Learning Nanodegree. The dataset consists of 400 businesses (restaurants, retailers, etc.) and their purchasing patterns for Milk products, Grocery products, and other categories.
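The code below relies on a DataFrame named log_data from the assignment, which isn't shown here. If you want to run it without the dataset, a stand-in could look like the sketch below. The values are random, the column names are assumptions modelled on the product categories, and taking the natural log is only inferred from the variable name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the assignment's data: random numbers, NOT the real dataset
rng = np.random.default_rng(seed=0)
categories = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]
data = pd.DataFrame(
    rng.lognormal(mean=8.0, sigma=1.0, size=(400, len(categories))),
    columns=categories,
)

# Log-transform the right-skewed spending values (assumed, based on the name log_data)
log_data = np.log(data)
print(log_data.shape)  # (400, 6)
```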

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# log_data: DataFrame with one column per product category, prepared earlier
# in the assignment (not shown here)

# create a "melted" (long-format) DataFrame with columns "variable" and "value";
# pd.melt replaces the old loop over df.append, which was removed in pandas 2.0
df = pd.melt(log_data)
print(df.shape)

# create the boxplot with data points overlay
plt.figure(figsize=(8, 6))
sns.boxplot(x="value", y="variable", data=df, color="c")
sns.stripplot(x="value", y="variable", data=df, jitter=True,
              size=4, alpha=.4, color=".3", linewidth=0)
sns.despine(trim=True)
plt.show()
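To see what "melting" does, here is a tiny example with made-up values: pd.melt with no id_vars stacks every column into a "variable" column (the former column name) and a "value" column, which is the long format seaborn needs for one box per category. The second part checks that building the same frame one column at a time, as the original loop did, gives an identical result when pd.concat is used instead of the removed DataFrame.append.

```python
import pandas as pd

# Tiny wide-format frame: one column per product category (toy values)
wide = pd.DataFrame({"Milk": [1.0, 2.0], "Grocery": [3.0, 4.0]})

# All columns become rows: "variable" holds the old column name, "value" the number
long = pd.melt(wide)
print(long)
print(long.shape)  # (4, 2)

# Column-by-column construction, equivalent to the original loop
looped = pd.concat(
    [pd.melt(wide, value_vars=[col]) for col in wide.columns],
    ignore_index=True,
)
print(looped.equals(long))  # True
```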

You can also check out this post, where Matplotlib was used to generate the boxplots.