
# Linear Regression in LaTeX using TikZ

• Experimental data is usually scattered: the measurements rarely fit the theory perfectly. In these cases, the researcher needs a mathematical model that best fits the scattered data. The simplest way to find such a model is a linear regression. In this tutorial, we will learn how to compute and plot a linear regression for a set of scattered data points using the TikZ-based pgfplots and pgfplotstable packages.

## Illustrative Example

Suppose we have a series of data points from an experiment that measures the position (x) versus time (t) of a particle moving with constant velocity. The data can be stored in a file named result.dat, saved in the same folder as your main project file.


t	      x
0	      1
1.2	    1.78
2.3	    4.495
3.4	    5.21
4.1	    4.665
5.6	    5.64
6.5	    7.225
7.2	    7.68
8.1	    6.265
9.3	    8.045
10.7	  8.955 

## Plot data from a file in LaTeX

To plot this data in LaTeX, we can use the \addplot command with the table option and specify the names of the columns we want to plot, as follows:

 \documentclass{standalone}

% Required packages
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}

\begin{tikzpicture}

\begin{axis}[
xmin = 0, xmax = 11,
ymin = 0, ymax = 11,
width = \textwidth,
height = 0.75\textwidth,
xtick distance = 1,
ytick distance = 1,
grid = both,
minor tick num = 1,
major grid style = {lightgray},
minor grid style = {lightgray!25},
]
% Plot the data as a scatter plot (column t on the x axis, column x on the y axis)
\addplot[teal, only marks] table[x = t, y = x] {result.dat};

\end{axis}

\end{tikzpicture}

\end{document} 

Compiling the above code yields:

Plot data in LaTeX using Pgfplots package

Notice that the preamble loads the pgfplotstable package alongside pgfplots. It provides the table-handling features used here and, more importantly, it will compute the linear regression for our data. The axis environment also sets a few extra options that control the style of the grid and the limits of the plot. In the \addplot command, the only marks option produces a scatter plot. For more details, I invite you to read this post about plotting functions and data in LaTeX.
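If you would rather not depend on an external file, the same data can be kept inside the document. The following is a minimal sketch, assuming the table is read with \pgfplotstableread into a macro; the macro name \datatable is an arbitrary choice made for this example and does not appear elsewhere in the tutorial.

```latex
\documentclass{standalone}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

% Read the data into a macro instead of loading result.dat.
% \datatable is a hypothetical name chosen for this sketch.
\pgfplotstableread{
t     x
0     1
1.2   1.78
2.3   4.495
3.4   5.21
4.1   4.665
5.6   5.64
6.5   7.225
7.2   7.68
8.1   6.265
9.3   8.045
10.7  8.955
}\datatable

\begin{document}

\begin{tikzpicture}
\begin{axis}[
xmin = 0, xmax = 11,
ymin = 0, ymax = 11,
grid = both,
]
% Same scatter plot as before, but the source is the macro
\addplot[teal, only marks] table[x = t, y = x] {\datatable};
\end{axis}
\end{tikzpicture}

\end{document}
```

The rest of the tutorial keeps using result.dat; with this variant, {result.dat} would be replaced by {\datatable} wherever the table is plotted.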

## Compute and Plot Linear Regression in LaTeX

Now comes the interesting part. Usually, a linear regression is computed and plotted with third-party software such as Excel. With the pgfplotstable package, however, we can compute the linear fit inside the LaTeX document. We just need to add the following command to the code:

\addplot table[
x = column_name,
y = {create col/linear regression = {y = column_name}}
] {data_file_name.dat};

Here we specify the name of the column for the x axis, which in our case is t. For the y axis we pass the key create col/linear regression, which computes a linear regression of the column named in its y argument (in our case x) from data_file_name.dat.

The following code plots the line that best fits the scattered data:

 \documentclass{standalone}

% Required packages
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}

\begin{tikzpicture}
\begin{axis}[
xmin = 0, xmax = 11,
ymin = 0, ymax = 11,
width = \textwidth,
height = 0.75\textwidth,
xtick distance = 1,
ytick distance = 1,
grid = both,
minor tick num = 1,
major grid style = {lightgray},
minor grid style = {lightgray!25},
xlabel = {Time ($t$)},
ylabel = {Position ($x$)},
legend cell align = {left},
legend pos = north west
]

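% Plot data
\addplot[
teal,
only marks
] table[x = t, y = x] {result.dat};
% Legend entries are matched to plots in order, so the data gets its own entry
\addlegendentry{Data}

% Linear regression
\addplot[
thick,
orange
] table[
x = t,
y = {create col/linear regression={y=x}}
] {result.dat};

% Legend entry showing the fitted equation
\addlegendentry{
Linear regression: $x = \pgfmathprintnumber{\pgfplotstableregressiona} \cdot t \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$
}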

\end{axis}

\end{tikzpicture}

\end{document} 

Compute and plot Linear regression in LaTeX

Now we can see the trend line plotted. But that's not the most important feature of the pgfplotstable package. In these situations, it's important not only to plot the trend line, but also to find its equation.

It's well known that the equation of a linear regression looks like:

$$x = a \cdot t + b$$

Now we need the constants a and b. pgfplotstable computes them as part of the regression, so we only have to print them. The slope a is printed with the following command:

\pgfmathprintnumber{\pgfplotstableregressiona}

And the intercept b is printed with:

\pgfmathprintnumber[print sign]{\pgfplotstableregressionb}

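• In both cases we have used the \pgfmathprintnumber command, which typesets a number in a readable format (rounded to two decimal places by default), so it can be used as text in the legend. The print sign option prints the sign even when the number is positive, which gives the "+" in front of b.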

The \pgfplotstableregressiona and \pgfplotstableregressionb commands hold the values of the constants a and b, respectively. They are defined once the table with the create col/linear regression key has been processed.
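Since the two macros are overwritten by every new regression, it can be handy to copy them right after the \addplot command. The following sketch does this and reuses the coefficients to extend the line beyond the measured range. The macro names \slope and \intercept, the axis limits, and the dashed extrapolation are assumptions made for this example; the number format keys fixed and precision control how many decimals are printed.

```latex
\documentclass{standalone}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}

\begin{tikzpicture}
\begin{axis}[
xmin = 0, xmax = 14,
ymin = 0, ymax = 12,
legend cell align = {left},
legend pos = north west,
]

% Scatter data
\addplot[teal, only marks] table[x = t, y = x] {result.dat};
\addlegendentry{Data}

% Regression line over the measured range
\addplot[thick, orange] table[
x = t,
y = {create col/linear regression = {y = x}}
] {result.dat};

% Copy the coefficients before another regression overwrites them
% (\slope and \intercept are hypothetical names for this sketch)
\xdef\slope{\pgfplotstableregressiona}
\xdef\intercept{\pgfplotstableregressionb}

% Print the equation with three decimals instead of the default two
\addlegendentry{$x = \pgfmathprintnumber[fixed, precision = 3]{\slope} \cdot t
\pgfmathprintnumber[fixed, precision = 3, print sign]{\intercept}$}

% Reuse the coefficients to extend the line past the last data point
\addplot[orange, dashed, domain = 10.7:14, samples = 2] {\slope*x + \intercept};
\addlegendentry{Extrapolation}

\end{axis}
\end{tikzpicture}

\end{document}
```

Two samples are enough for the dashed segment because it is a straight line.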

For our illustrative example, we got:

$$x = 0.69 \cdot t + 1.87$$

This is the equation of motion of the particle: a is its velocity and b is its position at t = 0.
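As a sanity check, the two constants can be reproduced by hand. The derivation below assumes an ordinary least-squares fit; the sums are computed from the eleven rows of result.dat, and the symbols $n$, $\bar{t}$ and $\bar{x}$ (number of points and column means) are notation introduced only for this check. The align* environment requires the amsmath package.

```latex
\begin{align*}
n &= 11, \qquad \sum t_i = 58.4, \qquad \sum x_i = 60.96, \\
\sum t_i^2 &= 427.14, \qquad \sum t_i x_i = 404.541, \\[4pt]
a &= \frac{n \sum t_i x_i - \sum t_i \sum x_i}
          {n \sum t_i^2 - \left( \sum t_i \right)^2}
   = \frac{11 \cdot 404.541 - 58.4 \cdot 60.96}
          {11 \cdot 427.14 - 58.4^2}
   = \frac{889.887}{1287.98} \approx 0.69, \\[4pt]
b &= \bar{x} - a \, \bar{t}
   = \frac{60.96}{11} - 0.6909 \cdot \frac{58.4}{11} \approx 1.87.
\end{align*}
```

Both values agree with the coefficients shown in the legend.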

• You may wonder whether this method can be applied to quadratic or general polynomial fitting. Unfortunately not: the pgfplotstable package supports only linear regression. If you need any other type of regression, you have to compute it externally.
• We have reached the end of today's tutorial. If you have any questions or remarks, leave me a comment below or reach me via e-mail at admin@latexdraw.com. I will be happy to hear from you!