Crash Visualization

4.5.3 Histogram plot


Last updated 4 years ago


When dealing with a set of data, often the first thing you’ll want to do is get a sense of how the variables are distributed.

1. Basic Histogram

The most convenient way to take a quick look at a univariate distribution in seaborn is distplot(). By default, this will draw a histogram and fit a kernel density estimate (KDE).

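import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Create a simple dataset: 3000 draws from a correlated bivariate normal
d = np.random.multivariate_normal([1, 1], [[6, 2], [2, 2]], size=3000)
df = pd.DataFrame(d, columns=['S1', 'S2'])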

2. Univariate Distribution

sns.distplot(df['S1'])
plt.xlabel('x')

Histograms are likely familiar, and matplotlib already provides a hist function (see 3.5 Histogram Chart). A histogram represents the distribution of data by dividing the range of the data into bins and drawing a bar for each bin whose height is the number of observations that fall in it. The number of bins matters: trying more or fewer bins can reveal features of the data that a single choice hides.

f, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left panel: 20 bins, KDE curve off, with a rug of the individual observations
sns.distplot(df['S1'], bins=20, kde=False, rug=True, ax=axes[0], color='dodgerblue')
axes[0].set_title('Bins: 20')

# Right panel: the same data with 200 bins
sns.distplot(df['S1'], bins=200, kde=False, rug=True, ax=axes[1], color='dodgerblue')
axes[1].set_title('Bins: 200')
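
The binning itself can also be inspected without plotting. The following is a rough sketch using numpy.histogram; the seed and the variable `s1`, drawn from a normal distribution with the same mean (1) and variance (6) as S1, are assumptions made to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for df['S1']: mean 1, variance 6 (the first diagonal entry of the covariance matrix)
s1 = rng.normal(loc=1, scale=np.sqrt(6), size=3000)

for bins in (20, 200):
    # counts[i] is the number of observations between edges[i] and edges[i + 1]
    counts, edges = np.histogram(s1, bins=bins)
    # Every observation lands in exactly one bin, so the counts sum to the sample size
    print(bins, counts.sum(), counts.max(), (counts == 0).sum())
```

With more bins, each bar covers a narrower interval and holds fewer observations, so the outline gets noisier and empty bins become more likely in the tails.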

3. Multivariate Distribution

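# Overlay the distributions of S1 and S2 on the same axes
sns.distplot(df['S1'], color='r', label='S1')
sns.distplot(df['S2'], color='dodgerblue', label='S2')

plt.xlabel('x')
# With the KDE enabled, distplot normalizes the histogram, so the axis shows density
plt.ylabel('Density')
plt.legend()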

4. Vertical Distribution

Making a vertical histogram with seaborn is straightforward: just add vertical=True.

# vertical=True puts the data values on the y-axis and the density on the x-axis
sns.distplot(df['S1'], color='r', label='S1', vertical=True)
sns.distplot(df['S2'], color='dodgerblue', label='S2', vertical=True)

plt.ylabel('x')
plt.xlabel('Density')
plt.legend()

5. Two Dimensional Distribution

The kernel density estimation procedure mentioned above can also be used to visualize a bivariate distribution.

Example 1

# In seaborn versions before 0.11, a two-column DataFrame is drawn as a bivariate KDE
sns.kdeplot(df, color='r', shade=True)

Example 2

f, ax = plt.subplots(figsize=(6, 6))
sns.kdeplot(df.S1, df.S2, ax=ax)
sns.rugplot(df.S1, color='r', ax=ax)
sns.rugplot(df.S2, vertical=True, ax=ax)