Crash Visualization

4.4.2 Linear Relationship


Last updated 4 years ago


In statistics, linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables.
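As a minimal sketch of what "fitting a line" means, the snippet below uses NumPy's polyfit to recover a slope and intercept by ordinary least squares. The synthetic data, the seed, and the variable names are illustrative assumptions and are not part of the tips dataset used later on this page.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted values should land close to the slope and intercept used to generate the data, which is the same kind of line the seaborn functions below draw.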

A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables, then fitting a linear regression model to the data probably will not provide a useful model.

Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other, but that there is some significant association between the two variables.
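One quick numeric complement to eyeballing a scatterplot is the Pearson correlation coefficient. The sketch below compares a pair of related variables with a pair of unrelated ones; the data is synthetic and the names (related, unrelated) are hypothetical, chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
related = 3.0 * x + rng.normal(size=200)   # constructed to depend on x
unrelated = rng.normal(size=200)           # generated independently of x

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
r_related = np.corrcoef(x, related)[0, 1]
r_unrelated = np.corrcoef(x, unrelated)[0, 1]
print(f"related: r={r_related:.2f}, unrelated: r={r_unrelated:.2f}")
```

A coefficient near zero suggests a straight-line model is unlikely to be useful, while a value near 1 or -1 indicates a strong linear association (not causation).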

There are two main functions that can visualize a linear relationship as determined through regression. regplot() fits a simple linear regression model and plots it. lmplot() combines regplot() and FacetGrid. Compared to regplot(), lmplot() is more computationally intensive and is intended as a convenient interface to fit regression models across conditional subsets of a dataset.
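The difference between the two functions shows up in what they return. The sketch below, which assumes a small synthetic DataFrame and a non-interactive matplotlib backend so it can run without a display or network access, checks the return type of each call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.axes
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.normal(size=60)})
df["y"] = 1.5 * df["x"] + rng.normal(scale=0.5, size=60)

ax = sns.regplot(x="x", y="y", data=df)  # axes-level: returns a matplotlib Axes
g = sns.lmplot(x="x", y="y", data=df)    # figure-level: returns a seaborn FacetGrid

print(type(ax).__name__, type(g).__name__)
```

Because regplot() draws onto a single Axes, it can be placed inside a figure you manage yourself, whereas lmplot() creates and owns its own figure.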

1. Regplot

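import seaborn as sns
# Load the example restaurant tips dataset that ships with seaborn's sample data
tips = sns.load_dataset("tips")
# Scatter plot of tip against total bill with a fitted regression line
sns.regplot(x="total_bill", y="tip", data=tips)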

2. Lmplot

sns.lmplot(x="total_bill", y="tip", data=tips)

It’s also possible to fit a linear regression when one of the variables takes discrete values. One option is to add some random noise (“jitter”) to the discrete values to make the distribution of those values more clear. The other is to collapse over the observations in each discrete bin to plot an estimate of central tendency along with a confidence interval.

import numpy as np
sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean)
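The first option described above, adding jitter, can be sketched with the x_jitter parameter. The example assumes a synthetic DataFrame that imitates a discrete "size" column, and the jitter amount of 0.1 is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a discrete explanatory variable (an assumption).
rng = np.random.default_rng(3)
size = rng.integers(1, 7, size=120)  # integer values 1..6
tip = 0.8 * size + rng.normal(scale=0.7, size=120)
df = pd.DataFrame({"size": size, "tip": tip})

# x_jitter spreads the plotted points horizontally; the regression itself
# is fitted on the original, un-jittered values.
g = sns.lmplot(x="size", y="tip", data=df, x_jitter=0.1)
```

Jitter only changes how the scatter points are drawn, so it is a display aid rather than a change to the model.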

3. Conditioning Linear Regression

How does the relationship between these two variables change as a function of a third variable?

One effective approach is to plot each level of the third variable on the same axes and use color to distinguish them.

sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
           markers=["o", "x"])
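When hue is set, a separate regression is fitted for each level of the hue variable. A rough numeric equivalent, assuming synthetic data with a hypothetical two-level "group" column, is to fit one line per group:

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption): group "a" has slope 1, group "b" has slope 3.
rng = np.random.default_rng(4)
n = 100
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=2 * n),
    "group": ["a"] * n + ["b"] * n,
})
true_slope = df["group"].map({"a": 1.0, "b": 3.0})
df["y"] = true_slope * df["x"] + rng.normal(scale=0.5, size=2 * n)

# Fit an independent least-squares line within each group.
slopes = {
    name: np.polyfit(sub["x"], sub["y"], deg=1)[0]
    for name, sub in df.groupby("group")
}
print(slopes)
```

Clearly different slopes between groups are what appear as diverging colored lines in the hue plot.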

To add another variable, we can draw multiple “facets”, with each level of the variable appearing in the rows or columns of the grid.

sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips)
sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips)
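The layout of the facets follows directly from the number of levels in the row and col variables. The sketch below assumes synthetic data with a two-level column "a" and a three-level column "b" (both hypothetical names) and inspects the resulting grid of axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption): every combination of "a" and "b" occurs 40 times.
rng = np.random.default_rng(5)
n = 240
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "a": np.tile(["a1", "a2"], n // 2),
    "b": np.tile(["b1", "b2", "b3"], n // 3),
})
df["y"] = 0.5 * df["x"] + rng.normal(size=n)

# One facet per (row level, column level) combination.
g = sns.lmplot(x="x", y="y", col="a", row="b", data=df)
print(g.axes.shape)  # (number of row levels, number of column levels)
```

Each facet gets its own regression fit, so grids with many levels multiply the computation accordingly.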

In the example below, the two axes don’t show the same relationship conditioned on two levels of a third variable; rather, pairplot(), a high-level interface to PairGrid(), is used to show multiple relationships between different pairings of the variables in a dataset.

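We can use height and aspect to control the size of each facet: height is the facet height in inches, and aspect is the ratio of width to height, so each facet is height × aspect inches wide.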

sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             height=5, aspect=.8, kind="reg")
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=5, aspect=.8, kind="reg")
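To see how height and aspect translate into figure size, the sketch below builds the same kind of one-row grid on synthetic data (the column names x1, x2, and y are hypothetical) and reads back the figure dimensions. It assumes a seaborn version that exposes the figure attribute on grid objects (0.11.2 or later).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(6)
df = pd.DataFrame({"x1": rng.normal(size=80), "x2": rng.normal(size=80)})
df["y"] = df["x1"] - 0.5 * df["x2"] + rng.normal(scale=0.3, size=80)

g = sns.pairplot(df, x_vars=["x1", "x2"], y_vars=["y"],
                 height=5, aspect=0.8, kind="reg")

# Two facets side by side, each 5 inches tall and 5 * 0.8 = 4 inches wide.
fig_w, fig_h = g.figure.get_size_inches()
print(fig_w, fig_h)
```

Adding hue places a legend beside the grid, which can change the overall figure width, so the simple width calculation applies to the case without hue.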

While regplot() always shows a single relationship, lmplot() combines regplot() with FacetGrid to provide an easy interface to show a linear regression on “faceted” plots that allow you to explore interactions with up to three additional categorical variables.
