Linear Regression in LaTeX using TikZ

  • Experimental data is usually scattered: the measurements never fit the theory perfectly. In these cases, the researcher needs a mathematical model that best fits the scattered data, and the simplest such model is a linear regression. In this tutorial, we will learn how to compute and plot a linear regression for a set of scattered data using the TikZ-based pgfplots and pgfplotstable packages.

Illustrative Example

Suppose we have a data set obtained from an experiment that measures the position (x) versus time (t) of a particle moving with constant velocity. The data is stored in a file named result.dat, saved in the same folder as your main project.

 
t	      x
0	      1
1.2	    1.78
2.3	    4.495
3.4	    5.21
4.1	    4.665
5.6	    5.64
6.5	    7.225
7.2	    7.68
8.1	    6.265
9.3	    8.045
10.7	  8.955 
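
If you would rather keep the measurements inside the .tex file, one option is to load them with \pgfplotstableread from the pgfplotstable package. The following is a minimal sketch; the macro name \resultdata is only an illustrative choice, and it can be used wherever the file name result.dat appears in \addplot table.

```latex
\documentclass{standalone}

\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat = newest}

% Inline copy of result.dat; \resultdata is a hypothetical macro name
\pgfplotstableread{
t     x
0     1
1.2   1.78
2.3   4.495
3.4   5.21
4.1   4.665
5.6   5.64
6.5   7.225
7.2   7.68
8.1   6.265
9.3   8.045
10.7  8.955
}\resultdata

\begin{document}

\begin{tikzpicture}
\begin{axis}
	% Same scatter plot as with a file, but reading from the macro
	\addplot[teal, only marks] table[x = t, y = x] {\resultdata};
\end{axis}
\end{tikzpicture}

\end{document}
```

This keeps the example in a single file, at the cost of mixing data and document source.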

Plot data from a file in LaTeX

To plot this data in LaTeX, we can use the \addplot command with the table option and specify the names of the columns we want to plot, as follows:

 \documentclass{standalone}

% Required package
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat = newest}

\begin{document}

\begin{tikzpicture}

\begin{axis}[
	xmin = 0, xmax = 11,
	ymin = 0, ymax = 11,
	width = \textwidth,
	height = 0.75\textwidth,
	xtick distance = 1,
	ytick distance = 1,
	grid = both,
	minor tick num = 1,
	major grid style = {lightgray},
	minor grid style = {lightgray!25},
]
% plot data line code
\addplot[teal, only marks] table[x = t, y = x] {result.dat};

\end{axis}

\end{tikzpicture}

\end{document} 

Compiling the above code yields:

Plot data in LaTeX using Pgfplots package

Notice that in the preamble we have included the pgfplotstable package. Plotting a table with \addplot table already works with pgfplots alone; we load pgfplotstable because, more importantly, it will let us compute the linear regression for our data. In the previous code, we have also included some extra options in the axis environment that change the axis limits, the tick spacing and the style of the grid. Also notice that in the \addplot command we have included the only marks option to get a scatter plot. For more details, I invite you to read this post about plotting functions and data in LaTeX.
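
Columns do not have to be selected by name. As a small sketch, the same scatter plot can be written with the x index and y index keys, which count columns from zero; this line is meant to replace the \addplot line inside the axis environment above.

```latex
% Inside the axis environment: column 0 is t, column 1 is x
\addplot[teal, only marks] table[x index = 0, y index = 1] {result.dat};
```

Selecting by index can be handy when a data file has no header row or when its column names change.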

Compute and Plot Linear Regression  in LaTeX

Now here comes the interesting part. Usually, to plot a linear regression we use third-party software such as Excel. With the pgfplotstable package, however, we can compute the linear fit inside the LaTeX document. We just need to add the following command to the code:

\addplot[options] table[
  x = column_name,
  y = {create col/linear regression = {y = column_name}}
] {data_file_name.dat};


Here we specify the name of the column for the x axis (in our case t). For the y axis we pass the key create col/linear regression, which computes a linear regression of the column given by y = column_name (in our case x) in data_file_name.dat and produces the fitted values as a new column.

The following code plots the line that best fits the scatter data:

 \documentclass{standalone}

% Required package
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat = newest}

\begin{document}

\begin{tikzpicture}
\begin{axis}[
	xmin = 0, xmax = 11,
	ymin = 0, ymax = 11,
	width = \textwidth,
	height = 0.75\textwidth,
	xtick distance = 1,
	ytick distance = 1,
	grid = both,
	minor tick num = 1,
	major grid style = {lightgray},
	minor grid style = {lightgray!25},
	xlabel = {Time ($t$)},
	ylabel = {Position ($x$)},
	legend cell align = {left},
	legend pos = north west
]

% Plot data
\addplot[
	teal, 
	only marks
] table[x = t, y = x] {result.dat};

% Linear regression
\addplot[
	thick,
	orange
] table[
	x = t,
	y = {create col/linear regression={y=x}}
] {result.dat};

% Add legend
\addlegendentry{Data}
\addlegendentry{
	Linear regression: $ x =
	\pgfmathprintnumber{\pgfplotstableregressiona}
	\cdot t
	\pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$
}

\end{axis}

\end{tikzpicture}

\end{document} 

Compute and plot Linear regression in LaTeX

Comments:

Now we can see the trend line plotted, but that's not the most important feature of the pgfplotstable package. In this situation it's important not only to plot the trend line, but also to find its equation.

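The equation of a regression line, written in the variables of our example (position x and time t), has the form:

$x = a \cdot t + b$

Now we need the constants a (the slope) and b (the intercept). Both are computed together with the regression. The slope a can be printed using the following command: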

\pgfmathprintnumber{\pgfplotstableregressiona}

And to print the intercept b we use:

\pgfmathprintnumber[print sign]{\pgfplotstableregressionb}

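  • In both cases we have used the \pgfmathprintnumber command, which typesets a number in a readable, rounded format, so it can be used as text in the legend. The print sign option always prints the sign, so a positive b appears as "+ b" in the equation.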

The macros \pgfplotstableregressiona and \pgfplotstableregressionb hold the values of the constants a and b, respectively, for the most recently computed regression.
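
Because these macros are overwritten by the next regression, it can be useful to store their values right away. The sketch below is one possible way to do that: \slope and \intercept are hypothetical macro names, and the dashed line simply redraws the fit as a function over the whole axis range.

```latex
\documentclass{standalone}

\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat = newest}

\begin{document}

\begin{tikzpicture}
\begin{axis}[xmin = 0, xmax = 11, ymin = 0, ymax = 11]

% Regression line computed from the data file
\addplot[thick, orange] table[
	x = t,
	y = {create col/linear regression = {y = x}}
] {result.dat};

% Store the current fit parameters
% (\slope and \intercept are hypothetical names)
\xdef\slope{\pgfplotstableregressiona}
\xdef\intercept{\pgfplotstableregressionb}

% Redraw the fit as a function; here the plot variable x plays
% the role of the time t. "forget plot" keeps it out of the legend.
\addplot[dashed, gray, domain = 0:11, forget plot]
	{\slope * x + \intercept};

\end{axis}
\end{tikzpicture}

\end{document}
```

Storing the values is mainly useful when a document contains several regressions, or when the line should extend beyond the first and last data points.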

For our illustrative example, a least-squares fit of the data above gives approximately:

$x \approx 0.69 \cdot t + 1.87$

which is the equation of motion of the particle.
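
As a rough cross-check by hand, and assuming the package performs an ordinary least-squares fit, the constants can be recomputed from the 11 data points in result.dat. The sums below were worked out manually from the table and are rounded in the last step.

```latex
% Ordinary least-squares estimates for the line x = a t + b, with n = 11 points
\[
  a = \frac{n \sum t_i x_i - \sum t_i \sum x_i}
           {n \sum t_i^2 - \left( \sum t_i \right)^2},
  \qquad
  b = \frac{\sum x_i - a \sum t_i}{n}
\]
% Sums computed by hand from result.dat
\[
  \sum t_i = 58.4, \quad
  \sum x_i = 60.96, \quad
  \sum t_i^2 = 427.14, \quad
  \sum t_i x_i = 404.541
\]
% Substituting the sums
\[
  a = \frac{11 \cdot 404.541 - 58.4 \cdot 60.96}{11 \cdot 427.14 - 58.4^2}
    = \frac{889.887}{1287.98} \approx 0.6909,
  \qquad
  b = \frac{60.96 - 0.6909 \cdot 58.4}{11} \approx 1.874
\]
```

With the default two-decimal rounding of \pgfmathprintnumber, these values would appear in the legend as 0.69 and +1.87.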

  • You may wonder whether this method can be applied to quadratic or general polynomial fitting. Unfortunately not: the pgfplotstable package supports only linear regression. If you need any other type of regression, you have to compute it externally.
  • We have reached the end of today's tutorial. If you have any questions or remarks, reach me via e-mail at admin@latexdraw.com. I will be happy to hear from you!

