 # Compute and plot linear regression in LaTeX using TikZ

When you work with experimental data, the measurements are usually scattered: the experiment does not fit the theory perfectly. In these cases the researcher needs a mathematical model that best fits the scattered data. The simplest way to find such a model is a linear regression. Let's learn how to do it in $\LaTeX$.

Suppose you have data from an experiment that measures the position ($r$) versus time ($t$) of a particle moving with constant velocity. The data can be stored in a file named $\verb|r_vs_t.dat|$, placed in the same folder as your main project:

t r
0 1
1.2 1.78
2.3 4.495
3.4 5.21
4.1 4.665
5.6 5.64
6.5 7.225
7.2 7.68
8.1 6.265
9.3 8.045
10.7 8.955


## 1. Plot data from external file in $\LaTeX$

To plot this data, we can use the $\verb|\addplot|$ command along with the $\verb|table|$ option and specify the names of the columns we want to plot, as follows:

\documentclass{standalone}

\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}
\begin{tikzpicture}
\begin{axis}[
xmin = 0, xmax = 11,
ymin = 0, ymax = 11,
width = \textwidth,
height = 0.75\textwidth,
xtick distance = 1,
ytick distance = 1,
grid = both,
minor tick num = 1,
major grid style = {lightgray},
minor grid style = {lightgray!25},
legend cell align = {left},
legend pos = north west
]
\addplot[teal, only marks] table[x = t, y = r] {r_vs_t.dat};
\end{axis}
\end{tikzpicture}
\end{document}


Notice that in the preamble we have included the $\verb|pgfplotstable|$ package. This package provides the table-handling machinery behind the $\verb|table|$ input and, more importantly, it will let us compute the linear regression for our data. In the previous code we have also included some extra options in the $\verb|axis|$ environment that change the appearance of the grid, the limits of the plot and the position of the legend. Also notice that in the $\verb|\addplot|$ command we have included the $\verb|only marks|$ option to get a scatter plot.
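As a small variation on the plot above, the file can also be read once into a macro and the columns selected either by name or by position. The following is only a sketch: $\verb|\datatable|$ is a name chosen here for illustration, and the index form assumes that $\verb|t|$ is the first column and $\verb|r|$ the second, as in the data file shown earlier.

```latex
% Read the data file once, before the tikzpicture.
% \datatable is a hypothetical name chosen for this sketch.
\pgfplotstableread{r_vs_t.dat}\datatable

% Inside the axis environment: select the columns by name ...
\addplot[teal, only marks] table[x = t, y = r] {\datatable};

% ... or by zero-based column index (use one form or the other):
% \addplot[teal, only marks] table[x index = 0, y index = 1] {\datatable};
```

Loading the table once can be convenient when the same data is used by several $\verb|\addplot|$ commands.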

## 2. Linear regression in $\LaTeX$

Now comes the interesting part. Usually, to compute and plot a linear regression we use third-party software like $\verb|Excel|$ or $\verb|Calc|$. But with the $\verb|pgfplotstable|$ package we can compute the linear fit inside the $\LaTeX$ document. We just need to add the following command to the code:
\addplot[options] table[
x = column_name,
y = {create col/linear regression = {y = column_name}}
] {data_file_name.dat};

Here we specify the name of the column for the $x$ axis, in our case $\verb|t|$. For the $y$ axis we pass the key $\verb|create col/linear regression|$, which computes a linear regression of the given column (in our case $\verb|r|$) from $\verb|data_file_name.dat|$. The next code shows how to plot the line that best fits the scatter data.
\documentclass{standalone}

\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}
\begin{tikzpicture}
\begin{axis}[
xmin = 0, xmax = 11,
ymin = 0, ymax = 11,
width = \textwidth,
height = 0.75\textwidth,
xtick distance = 1,
ytick distance = 1,
grid = both,
minor tick num = 1,
major grid style = {lightgray},
minor grid style = {lightgray!25},
xlabel = {Time ($t$)},
ylabel = {Position ($r$)},
legend cell align = {left},
legend pos = north west
]
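% Scatter plot of the measured data
\addplot[teal, only marks] table[x = t, y = r] {r_vs_t.dat};
\addlegendentry{Data}
% Trend line (the line style is a free choice)
\addplot[thick, orange] table[
x = t,
y = {create col/linear regression={y=r}}
] {r_vs_t.dat};
% Legend entry with the fitted equation
\addlegendentry{
Linear regression:
$r = \pgfmathprintnumber{\pgfplotstableregressiona} \cdot t \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$
}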
\end{axis}
\end{tikzpicture}
\end{document}

## 3. Print the equation of the linear regression

The trend line is now plotted. In these situations, however, finding the equation of the trend line is just as important as plotting it, and this is where the $\verb|pgfplotstable|$ package is most useful. The equation of a linear regression has the form: $$y = a\cdot x + b$$
In this example, we got: $$r = 0.69\cdot t + 1.87$$ which describes the motion of the particle. If time is measured in seconds and position in meters, the velocity is $0.69\,\mathrm{m/s}$ and the initial position is $1.87\,\mathrm{m}$. This equation is the mathematical model of the experiment.
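
These values can be checked by hand. Assuming the fit is an ordinary, unweighted least-squares regression (no variances are supplied in this example), the slope and intercept for $n$ data points $(t_i, r_i)$ are:

$$a = \frac{n\sum t_i r_i - \sum t_i \sum r_i}{n\sum t_i^2 - \left(\sum t_i\right)^2}, \qquad b = \frac{\sum r_i - a\sum t_i}{n}$$

For the data in $\verb|r_vs_t.dat|$ we have $n = 11$, $\sum t_i = 58.4$, $\sum r_i = 60.96$, $\sum t_i^2 = 427.14$ and $\sum t_i r_i = 404.541$, so:

$$a = \frac{11\cdot 404.541 - 58.4\cdot 60.96}{11\cdot 427.14 - 58.4^2} = \frac{889.887}{1287.98} \approx 0.69, \qquad b = \frac{60.96 - 0.691\cdot 58.4}{11} \approx 1.87$$
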
The constants $a$ and $b$ are computed by the regression itself; we only need to print them. To print $a$ we use the next command: $$\verb|\pgfmathprintnumber{\pgfplotstableregressiona}|$$ And to print $b$ we use: $$\verb|\pgfmathprintnumber[print sign]{\pgfplotstableregressionb}|$$
In both cases we have used the $\verb|\pgfmathprintnumber|$ command, which formats a number for typesetting, so you can use it in the legend as text. The $\verb|print sign|$ option always prints the sign of $b$, so no explicit $+$ is needed in the equation. The $\verb|\pgfplotstableregressiona|$ and $\verb|\pgfplotstableregressionb|$ macros hold the values of the constants $a$ and $b$, respectively.
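
If the coefficients are needed again later, or should be printed with a different number of decimals, one possible approach is sketched below. It saves the values with $\verb|\xdef|$ right after the regression plot and passes number-format keys to $\verb|\pgfmathprintnumber|$. The names $\verb|\slope|$ and $\verb|\intercept|$, the line style and the precision of 3 are choices made for this sketch, not part of the original example.

```latex
\documentclass{standalone}

\usepackage{pgfplots}
\usepackage{pgfplotstable}

\begin{document}
\begin{tikzpicture}
\begin{axis}[
    xlabel = {Time ($t$)},
    ylabel = {Position ($r$)},
    legend cell align = {left},
    legend pos = north west
]
% Scatter plot of the measured data
\addplot[teal, only marks] table[x = t, y = r] {r_vs_t.dat};
\addlegendentry{Data}

% Trend line computed by pgfplotstable
\addplot[thick, orange] table[
    x = t,
    y = {create col/linear regression = {y = r}}
] {r_vs_t.dat};

% Save the coefficients globally; \slope and \intercept are
% hypothetical names chosen for this sketch.
\xdef\slope{\pgfplotstableregressiona}
\xdef\intercept{\pgfplotstableregressionb}

% Print both constants in fixed-point notation with 3 decimals
\addlegendentry{
    $r = \pgfmathprintnumber[fixed, precision = 3]{\slope} \cdot t
    \pgfmathprintnumber[fixed, precision = 3, print sign]{\intercept}$
}
\end{axis}
\end{tikzpicture}
\end{document}
```

Saving the values immediately after the regression plot keeps them available even if another regression is computed later in the same document.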
