👀
Crash Visualization
  • Welcome
  • Preface
    • Who the book is written for
    • How the book is organized
  • 1. Introduction of Data Visualization
    • 1.1 What is data visualization?
    • 1.2 Why does visualization matter?
  • 2. Tricks in Visualization
    • 2.1 Choose Appropriate Chart
    • 2.2 Features of Charts
      • 2.2.1 Table
      • 2.2.2 Column Chart
      • 2.2.3 Line Chart
      • 2.2.4 Pie Chart
      • 2.2.5 Scatter Chart
      • 2.2.6 Map Chart
    • 2.3 Misused Graph
    • 2.4 Tips in Visualization
  • 3. Matplotlib
    • 3.1 Basic Concepts
    • 3.2 Line Chart
    • 3.3 Area Chart
    • 3.4 Column Chart
    • 3.5 Histogram Chart
    • 3.6 Scatter Chart
    • 3.7 Lollipop Chart
    • 3.8 Pie Chart
    • 3.9 Venn Chart
    • 3.10 Waffle Chart
    • 3.11 Animation
  • 4. Seaborn
    • 4.1 Trends
    • 4.2 Ranking
      • 4.2.1 Barplot
      • 4.2.2 Boxplot
    • 4.3 Composition
      • 4.3.1 Stacked Chart
    • 4.4 Correlation
      • 4.4.1 Scatter Plot
      • 4.4.2 Linear Relationship
      • 4.4.3 Heatmap
      • 4.4.4 Pairplot
    • 4.5 Distribution
      • 4.5.1 Boxplot
      • 4.5.2 Violin plot
      • 4.5.3 Histogram plot
      • 4.5.4 Density plot
      • 4.5.5 Joint plot
  • 5. Bokeh
    • 5.1 Basic Plotting
    • 5.2 Data Sources
    • 5.3 Annotations
    • 5.4 Categorical Data
    • 5.5 Presentation and Layouts
    • 5.6 Linking and Interactions
    • 5.7 Network Graph
    • 5.8 Widgets
  • 6. Plotly
    • 6.1 Fundamental Concepts
      • 6.1.1 Plotly Express
      • 6.1.2 Plotly Graph Objects
    • 6.2 Advanced Charts
      • 6.2.1 Advanced Scatter Chart
      • 6.2.2 Advanced Bar Chart
      • 6.2.3 Advanced Pie Chart
      • 6.2.4 Advanced Heatmap
      • 6.2.5 Sankey Chart
      • 6.2.6 Tables
    • 6.3 Statistical Charts
      • 6.3.1 Common Statistical Charts
      • 6.3.2 Dendrograms
      • 6.3.3 Radar Chart
      • 6.3.4 Polar Chart
      • 6.3.5 Streamline Chart
    • 6.4 Financial Charts
      • 6.4.1 Funnel Chart
      • 6.4.2 Candlestick Chart
      • 6.4.3 Waterfall Chart
  • Support
    • Donation
Powered by GitBook
On this page
  • Bar Chart
  • 1. Sorted Bar
  • 2. Filled Bar
  • 3. Stacked Bar
  • 4. Grouped Bar
  • 5. Stacked and Grouped Bar
  • 6. Mixed Factors
  • 7. Intervals
  • Scatter Chart

Was this helpful?

  1. 5. Bokeh

5.4 Categorical Data

Previous5.3 AnnotationsNext5.5 Presentation and Layouts

Last updated 4 years ago

Was this helpful?

Bar Chart

1. Sorted Bar

# Here is a list of categorical values (or factors)
names = ['Jane', 'David', 'Julian', 'Jonas', 'Macro', 'Barbara']
scores = [8, 9, 8, 5, 7, 6]
# sorting the bars means sorting the range factors
sorted_names = sorted(names, key=lambda x: scores[names.index(x)])
p = figure(x_range=sorted_names, plot_height=300, title="Sorted Student Scores")

# Categorical values can also be used as coordinates
p.vbar(x=names, top=scores, width=0.75)

# Set some properties to make the plot look better
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)

2. Filled Bar

Often times we may want to have bars that are shaded some color. This can be accomplished in different ways. One way is to supply all the colors upfront. This can be done by putting all the data, including the colors for each bar, in a ColumnDataSource.

from bokeh.palettes import Spectral6
source = ColumnDataSource(data=dict(names = names, scores=scores, color=Spectral6))

p = figure(x_range=names, y_range=(0,10), plot_height=400, title="Student Scores")
p.vbar(x='names', top='scores', width=0.75, color='color', legend_field="names", source=source)

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)

3. Stacked Bar

Another common operation or bar chart is to stack bars on top of one another. Bokeh makes this easy to do with the specialized hbar_stack() and vbar_stack() .

years = ["2017", "2018", "2019"]
colors = ['yellow','dodgerblue','forestgreen']
data = {'names' : names,
        '2017'   : [2, 3, 1, 3, 2, 3],
        '2018'   : [1, 3, 2, 1, 5, 2],
        '2019'   : [3, 2, 1, 1, 1, 4]}
p = figure(x_range=names, plot_height=400, title="Student Score by Year")

p.vbar_stack(years, x='names', width=0.75, color=colors,alpha = 0.5, source=data,
             legend_label=years)

p.y_range.start = 0
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"
show(p)

4. Grouped Bar

When creating bar charts, it is often desirable to visually display the data according to sub-groups.

from bokeh.models import ColumnDataSource, FactorRange

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015), ... ]
x = [ (name, year) for name in names for year in years ]
counts = sum(zip(data['2017'], data['2018'], data['2019']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), plot_height=400, title="Student Score by Year")
           #toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.75,color ='dodgerblue', source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

Color Mapping

We can also apply a color mapping, similar to the earlier example.


p.vbar(x='x', top='counts', width=0.75, source=source, line_color="white",

       # use the palette to colormap based on the the x[1:2] values
       fill_color=factor_cmap('x', palette=Spectral6, factors=years, start=1, end=2))

Factor Mapping

Another way we can set the color of the bars is to use a transform. Here we use a new one factor_cmap that accepts the name of a column to use for color mapping, as well as the palette and factors that define the color mapping.

Additionally, we can configure it to map just the sub-factors if desired. we want to only shade based on the year. So we pass start=1 and end=2 to specify the slice range of each factor to use when color mapping.

from bokeh.transform import factor_cmap

p = figure(x_range=FactorRange(*x), plot_height=400, title="Student Score by Year")

p.vbar(x='x', top='counts', width=0.75, source=source, line_color="white",

       # use the palette to colormap based on the the x[1:2] values
       fill_color=factor_cmap('x', palette=['salmon', 'forestgreen', 'dodgerblue'], factors=years, start=1, end=2))

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

5. Stacked and Grouped Bar

We've already discussed stacked bar and grouped bar, respectively. In fact, it is also possible to combine these two together, if you have a piece of complicated information to show. Here is an example, let's add a new factor called "city" for the dataset. It will show you Berlin's and Munich's performance in the 4 quarters and each month.

cities = ['Berlin', 'Munich']
source = ColumnDataSource(data=dict(
    x=factors,
    Berlin=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
    Munich=[ 6, 8, 7, 5, 6, 4, 4, 7, 7, 6, 7, 8 ],
))
p = figure(x_range=FactorRange(*factors), plot_height=350)

p.vbar_stack(cities, x='x', width=0.75, alpha=0.5, color=["dodgerblue", "salmon"], source=source,
             legend_label=cities)
p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.2
p.xaxis.major_label_orientation = 0
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"

show(p)

6. Mixed Factors

If you have created a range with nested categories as above, it is possible to plot glyphs using only the "outer" categories. Additionally, if you want to address other information, for example, the average performance of each quarter, you can overlay a line directly.

Let's create a quarterly dataset.

factors = [("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"),
           ("Q2", "Apr"), ("Q2", "May"), ("Q2", "Jun"),
           ("Q3", "Jul"), ("Q3", "Aug"), ("Q3", "Sep"),
           ("Q4", "Oct"), ("Q4", "Nov"), ("Q4", "Dec")]

p = figure(x_range=FactorRange(*factors), plot_height=350)

x = [ 10, 12, 14, 9, 10, 8, 12, 13, 7, 18, 13, 15 ]
p.vbar(x=factors, top=x, width=0.75, alpha=0.5)

qs, aves = ["Q1", "Q2", "Q3", "Q4"], [12, 9, 13, 14]
p.line(x=qs, y=aves, color="firebrick", line_width=3,legend_label='pretend to be AVG')
p.circle(x=qs, y=aves, line_color="firebrick", fill_color="white", size=10)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"

show(p)

7. Intervals

from bokeh.sampledata.sprint import sprint

sprint.Year = sprint.Year.astype(str)
group = sprint.groupby('Year')
source = ColumnDataSource(group)
p = figure(y_range=group, x_range=(9.5,12.7), plot_width=400, plot_height=550, 
           title="Time Spreads for Sprint Medalists (by Year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source,color = 'royalblue')

p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.yaxis.axis_label = "Year"
p.outline_line_color = None

show(p)

Scatter Chart

Let's continue using the student score dataset and create an intuitive lollipop chart.

names = ['Jane', 'David', 'Julian', 'Jonas', 'Macro', 'Barbara']
scores = [8, 9, 8, 5, 7, 6]
dot = figure( plot_width=400, plot_height=400, title="Student Score: Lollipop",
            y_range=names, x_range=[0,10])

dot.segment(0, names, scores, names, line_width=2, line_color="blue", )
dot.circle(scores, names, size=25, fill_color="pink", line_color="blue", line_width=3 )
show(dot)