

Trend Analysis With Excel

Overview of Excel Regression Tools Link to workbook
Excel provides a number of functions that allow users to perform regression analysis of times series/ trend data. This post shows 6 ways to use Excel for regression analysis, from simple to advanced:
1. Add trend line with display of regression and r^{2} to chart
2. Prepare manual calculations
3. Use Excel Functions (INTERCEPT, SLOPE, RSQ)
4. Use Excel Function  LINEST
5. Use Excel's Analysis Toolpak addin
6. Use Combination of Excel Statistical Functions to get equivalent report as Analysis Toolpak
We will use the same data set for each approach, annual average temperatures, ^{ o}C, in Vaexjoe, Sweden between 1918 and 2001. These annual values have been calculated from daily observations archived by the European Climate Assessment and Dataset (ECAD) site.
The data and exhibits are all available in the Excel_Regression_tools.xls at this link.
Excel Regression Approaches 
Approach 
Description 
Example  (Click to enlarge)

Chart with Trend Line 
The simplest way to get the regression formula for your data is to create a simple XY chart and to add the Trendline formula and r2 values from the Options dialogue.
This simple technique only provides the equation and r^{2} values as text on the chart. If you want to use the equation or conduct significance test, you need to use one of the other techniques.


Manual Calculations 
Since Excel can reproduce just about any calculation that you want, you can add the necessary statistical formulas from a statistical text and produce any statistical parameter you want.
While this technique gives you maximum flexibility, you may want to evaluate Excel's builtin statistical functions before you reinvent the wheel


Excel's SLOPE, INTERCEPT and RSQ Functions 
Excel's builtin SLOPE, INTERCEPT and RSQ functions allow the user to calculate these values directly.
The functions return the same results as adding the trendline to the chart technique. In this case, however, the statistical parameters are in a user specified cell so they can be used for further analysis.


Excel's LINEST Function 
The LINEST function returns ten statistical parameters for a simple linear regression:
o Regression coefficients for a_{1} and a_{0
}o _{}Standard error values for coefficients
o r^{2} and Standard error of the Y estimate
o F observed value
o Degrees of freedom
o Regression sum of squares
o Residual sum of squares
The LINEST function returns an array of values so that it must be entered as an array formula. Rather than use the entire array, the user can return individual LINEST parameters to a cell by using the Index function, as shown in this formula to return the F value:
=INDEX(LINEST(y,x,1,1),r,c)
Where:
r = row of LINEST array (4 for F value)
c = column of LINEST array (1 for F value)


Analysis Toolpak Regression Procedure 
Microsoft's Analysis Toolpak addin includes a number of advanced analysis data analysis tools, including Regression. The Regression procedure provides regression statistics, ANOVA, regression coefficients, their standard errors, t stat's, p values and upper and lower confidence values.
The Regression procedure results are comparable to output from statistical packages.


Combination of Excel Statistical Functions 
In addition to the 10 regression measures provided by LINEST, Excel provides T and F value test functions so that users can reproduce the Analysis Toolpak Regression procedure results in a series of cell formulas that use a combination of LINEST, T and F test functions.
The picture to the right shows a screen print of an worksheet that allows the user to interactively select the regression period, then, prepares the regression analysis using a combination of Excel LINEST, T and F tests functions as well as conditional formatting to evaluate the regression.


Link to Excel Regression Tools workbook
Top

Interactive Regression Link to Interactive Regression Workbook
This post shows how to build an interactive regression worksheet that allows you to adjust X value start and end points, automatically compute regression values using Excel LINEST function, assess goodness of fit, plot the results, display the regression line, r^{2} and significance of F test on the plot. This workbook provides equivalent regression results to the Analysis Toolpak Regression procedure using standard Excel functions and formulas.
We will use the same data set we used for the Excel Regression Tools post, annual average temperatures, ^{ o}C, in Vaexjoe, Sweden between 1918 and 2001. These annual values have been calculated from daily observations archived by the European Climate Assessment and Dataset (ECAD) site.
The data and interactive regression tool are available in the Interactive_Regression.xls at this link.
Regression Input and Output: The interactive regression input and output is shown in the figure below. Scroll bars are used to select start and end years. The output uses Excel LINEST function to calculate regression properties. The chart shows all data points in grey and the selected period in dark blue. The display shows the regression formula, r2 as well as significance of F test.
Dynamic Named Ranges: The workbook includes two active sheets, Data and Interactive_Regress. The data sheet is organized simply as a time series with year in Column A and mean annual temperature in Col. B. Dynamic range names are used to simplify the interactive features:
Dynamic Range Name 
Comment 
Name Refers to Formula 
all_x 
Includes all original data X values 
=OFFSET(Data!$A$2,0,0,COUNTA(Data!$A$2:$A$1000)) 
all_y 
Includes all original Y values 
=OFFSET(all_x,0,1) 
end_row 
Calculated with Match formula 
='Interactive Regr'!$N$4 
end_year 
Last year for user selected period  Obtained from Scroll bar link cell 
='Interactive Regr'!$M$4 
slope 
Calculated with Slope function 
='Interactive Regr'!$B$14 
st_row 
Calculated with Match formula 
='Interactive Regr'!$N$3 
st_year 
First year for user selected time period  Obtained from scroll bar link cell 
='Interactive Regr'!$M$3 
x 
Dynamic range of X values 
=OFFSET(Data!$A$1,st_row1,0,end_row(st_row1),1) 
y 
Dynamic range of Y values 
=OFFSET(x,0,1) 
As the user moves the start and end scroll bars, the X and Y dynamic ranges are automatically adjusted; the regression calculations and chart are also automatically adjusted.
Regression Calculations: The Interactive Regression workbook uses Excel's LINEST, FDIST, TDIST and TINV functions to provides the key regression and ANOVA parameters. The parameters, Excel functions and cell formulas are shown below.
Regression Functions and Formulas 
Parameter 
Excel Function 
Formula 
r^{2}

LINEST 
=INDEX(LINEST(y,x,1,1),3,1) 
SE 
LINEST 
=INDEX(LINEST(y,x,1,1),3,2) 
Slope 
LINEST 
=INDEX(LINEST(y,x,1,1),1,1) 
Slope SE 
LINEST 
=INDEX(LINEST(y,x,1,1),2,1) 
Intercept 
LINEST 
=INDEX(LINEST(y,x,1,1),1,2) 
Intercept SE 
LINEST 
=INDEX(LINEST(y,x,1,1),2,2) 
SSregr 
LINEST 
=INDEX(LINEST(y,x,1,1),5,1) 
SS residuals 
LINEST 
=INDEX(LINEST(y,x,1,1),5,2) 
DF residuals 
LINEST 
=INDEX(LINEST(y,x,1,1),4,2) 
F 
LINEST 
=INDEX(LINEST(y,x,1,1),4,1) 
Signif of F 
FDIST 
=FDIST(E9,1,B10) 
Slope p value 
TDIST 
=TDIST(ABS(D14),B10,2) 
Intercept p value 
TDIST 
=TDIST(ABS(D13),B11,2) 
Slope Lower 95% CL 
TINV 
=B14TINV(B1,B10)*C14 
Slope Upper 95% CL 
TINV 
=B14+TINV(B1,B10)*C14 
Intercept Lower 95% CL 
TINV 
=B13(TINV(B1,B10)*C13) 
Intercept Upper 95% CL 
TINV 
=B13+TINV(B1,B10)*C13 
Chart: The chart shows all data as well as the dynamic X and Y ranges. This way, the user can evaluate trends in a portion of the data set and still see the entire data set. The regression line, r2 and significance of F are reported on the chart as well as the start and end years.
Link to Interactive Regression Workbook
Top

Segmented  Piecewise Regression
" Linear regression is done to detect a mathematical relationship between several series of measured things... Linear regression can be done with all the data .. or by parts (segments) of the data base. The segments are introduced to see if there are abrupt changes in the relation under investigation. ..The boundaries between the segments may be called breakpoints." Wikipedia
The Fargo, ND chart shows a segmented regression analysis of the trend in beginning day of snowmelt prepared by Tamino for his Open Mind blog.
Tamino described an iterative process that he used to find the best fit for his segmented regression. I wanted to see if I could create an automated way to find the best breakpoint using Excel. This post shows both an interactive method as well as a Solver based method.
This link provides a tutorial by Sandra Ryan and Laurie Porth of the USDA. In their example, they show regression lines for two segments and how to calculate the breakpoint, C, where the two lines meet.
Point C is the intersection, hinge point or breakpoint between the two regression lines. We want to find C so that we can define the left and right regressions.
Ryan and Porth solution is reproduced below:
Left Line: Y_{1 } = a_{1}+b_{1}X for X <= C
Right Line: Y_{2} = a_{2} + b_{2}X for X > C
For the left and right regressions to be continuous at point C, Y_{1} = Y_{2.
} a_{1} + b_{1}C = a_{2} + b_{2}C
a_{2} = a_{1} + C(b_{1}  b_{2})
We can use either an iterative approach or an minimization approach to find the value of C that generates the lowest least squares value.
I have created a workbook that calculates segmented  piecewise regression using both an interactive approach and a minimization approach, with Solver. This link provides a zip copy of the Seg_Piecewsie_regr.xls file.
Top

Multiple Period Regression
There are situations where you want to look at regressions for multiple periods over your entire data set. This Excel based chart shows an example of a multiple period regression analysis, in this case the Northern Hemisphere temperature anomaly  F for the period 1880  2006. Trend lines have been added to compare trends over the 126 year period.

Trend Line Change Rate & r^{2} 
Period 
Change Rate ^{o}F/Year

r^{2} 
1880  1921 
0.007 
0.11 
1922  1938 
0.036 
0.44 
1939  1972 
 0.017 
0.38 
1972  2006 
0.054 
0.79 


There have been at least 4 distinct N Hemisphere temperature anomaly periods in the 1880  2006 period. The rate of change has varied from a low to 0.017 ^{o}F per year in the 1939  1972 period to a high of 0.054 ^{o}F in the 1973  2006 period.
The Excel workbook is available here.
Top

CuSum and Change Point Analysis
CuSum charts and change point analysis provide comparative information that can be useful in analysis of trend data to find potential trend shifts in several series. Dr. Wayne Taylor ( Variation.Com) has developed an Excel addin that combines the use of CuSum charts and a bootstrapping technique to compute 1,000 or more iterations of the CuSum chart. The analysis output provides information on changes in trend means.
As an example, I conducted a change point analysis of 5 Swiss temperature station data sets. The 5 Swiss station CuSum charts with noted change point years are shown to the right. Notice that all 5 series had change points in 1988. Geneva had change points 4 years, 1920, 1943, 1962 in addition to 1988. Lugano had change points in 1919, 1940, and 1988, Saentis had change points in 1920 and 1988, and Zurich had change points in 1943, 1951 as well as 1988.
While the 5 stations are physically quit distant, and the temperature values also differ, the patterns of the 5 CuSum charts are comparable. For example, the 1920 change points in Saentis and Geneva are close to the 1919 change point for Lugano. Zurich's 1919  1920 CuSum chart shows a change in direction, it is simply to small a change to be classified as a change point.
This table shows change point years for each station.
5 Swiss Station Temperature Series (1901  2004)
Change Point Analysis Summary: 
Change Pt Year 
Basel 
Geneva 
Lugano 
Saentis 
Zurich 
1919 


Y 


1920 

Y 

Y 

1940 


Y 


1943 

y 


Y 
1951 




Y 
1962 

Y 



1988 
Y 
Y 
Y 
Y 
Y 
No CP Yrs 
1 
4 
3 
2 
3 

The Excel workbook, with the CuSum and change point analysis for these 5 Swiss stations, is available at this link.
Top

Time Series Analysis Resources and Links
Link 
Author 
Description 
MM

Michael R Middletown 
Data Analysis Using Microsoft Excel

HB

Harvey J Brightman 
Data Analysis in Plain English with Microsoft Excel

CP

Physics Dept., Clemson Univ.

Linear Regression and Excel  Simple tutorial on how to calculated regression statistics in Excel. 
SSC

Statistical ServiceCentre  Univ of Reading

Using Excel for Statistics  SSC is an excellent source of information on statistical analysis and the use of Excel for statistical analysis. 
TM

Tushar Mehta

Trendline Coefficients  an excellent paper that covers Excel regression from A to Z. Great place to learn or relearn about Excel regression 
BL1

Bernard Liengme

Regression Analysis  Confidence Interval of the Line of Best Fit  post and downloadable workbook show you how to calculate and plot confidence interval for predicted Y value for a given value of the independent variable X 
BL2

Bernard Liengme

Regression Analysis  Confidence Interval of the Slope  post and downloadable workbook show you how to calculate the confidence interval for the slope of your regression. 
Top

