|
| |
Trend Analysis With Excel
|
|
Overview of Excel Regression
Tools
Link to
workbook Excel provides a number of functions
that allow users to perform regression analysis of times
series/ trend data. This post shows 6 ways to
use Excel for regression analysis, from simple to advanced:
1. Add trend line with display of
regression and r2 to chart 2. Prepare manual
calculations 3. Use Excel Functions (INTERCEPT,
SLOPE, RSQ) 4. Use Excel Function - LINEST 5.
Use Excel's Analysis Toolpak add-in 6. Use
Combination of Excel Statistical Functions to
get equivalent report as Analysis Toolpak
We will use the same data set for each approach, annual
average temperatures, oC, in Vaexjoe, Sweden between 1918 and
2001. These annual values have been calculated from daily
observations archived by the European Climate Assessment and Dataset (ECAD)
site.
The data and exhibits are all available in the
Excel_Regression_tools.xls at this
link.
| Excel Regression
Approaches |
| Approach |
Description |
Example -
(Click to enlarge) |
| Chart with Trend
Line |
The simplest way to get the regression formula for
your data is to create a simple XY chart and to add
the Trendline formula and r2 values from the Options
dialogue. This simple technique only provides the
equation and r2 values as text on the
chart. If you want to use the equation or conduct
significance test, you need to use one of the other
techniques. |
 |
| Manual
Calculations |
Since
Excel can reproduce just about any calculation that
you want, you can add the necessary statistical
formulas from a statistical text and produce any
statistical parameter you want. While this
technique gives you maximum flexibility, you may
want to evaluate Excel's built-in statistical
functions before you re-invent the wheel |
 |
| Excel's SLOPE,
INTERCEPT and RSQ Functions |
Excel's
built-in SLOPE, INTERCEPT and RSQ functions allow
the user to calculate these values directly.
The functions return the same results as adding
the trendline to the chart technique. In this case,
however, the statistical parameters are in a user
specified cell so they can be used for further
analysis. |
 |
| Excel's LINEST
Function |
The LINEST
function returns ten statistical parameters for a
simple linear regression:
o Regression coefficients for a1 and a0
o Standard error values for coefficients
o r2 and Standard error of the Y estimate
o F observed value
o Degrees of freedom
o Regression sum of squares
o Residual sum of squaresThe LINEST function returns an array of values
so that it must be entered as an array formula.
Rather than use the entire array, the user can
return individual LINEST parameters to a cell by
using the Index function, as shown in this formula
to return the F value:
=INDEX(LINEST(y,x,1,1),r,c)
Where:
r = row of LINEST array (4 for F value)
c = column of LINEST array (1 for F value)
|
 |
| Analysis Toolpak
Regression Procedure |
Microsoft's Analysis Toolpak add-in includes a
number of advanced analysis data analysis tools,
including Regression. The Regression procedure
provides regression statistics, ANOVA, regression
coefficients, their standard errors, t stat's, p
values and upper and lower confidence values. The
Regression procedure results are comparable to
output from statistical packages. |
 |
| Combination of
Excel Statistical Functions |
In
addition to the 10 regression measures provided by
LINEST, Excel provides T and F value test functions
so that users can reproduce the Analysis Toolpak
Regression procedure results in a series of cell
formulas that use a combination of LINEST, T and F
test functions.
The picture to the right shows a screen print of
an worksheet that allows the user to interactively
select the regression period, then, prepares the
regression analysis using a combination of Excel
LINEST, T and F tests functions as well as
conditional formatting to evaluate the regression. |
 |
Link to Excel
Regression Tools workbook
Top |
|
Interactive Regression
Link to
Interactive Regression Workbook This post shows how to
build an interactive regression worksheet that allows you to
adjust X value start and end points, automatically
compute regression values using Excel LINEST function,
assess goodness of fit, plot the results, display the
regression line, r2 and significance of F test on
the plot.
This
workbook provides equivalent regression results to the
Analysis Toolpak Regression procedure using standard Excel
functions and formulas.
We will use the same data set we used for the Excel
Regression Tools post, annual
average temperatures, oC, in Vaexjoe, Sweden between 1918 and
2001. These annual values have been calculated from daily
observations archived by the European Climate Assessment and Dataset (ECAD)
site.
The data and interactive regression tool are available in the
Interactive_Regression.xls at this
link.
Regression Input and Output: The interactive
regression input and output is shown in the figure below.
Scroll bars are used to select start and end years. The
output uses Excel LINEST function to calculate regression
properties. The chart shows all data points in grey and the
selected period in dark blue. The display shows the
regression formula, r2 as well as significance of F test.
Dynamic Named Ranges: The workbook includes two
active sheets, Data and Interactive_Regress. The data sheet is organized
simply as a time series with year in Column A and mean
annual temperature in Col. B. Dynamic range names are used
to simplify the interactive features:
| Dynamic Range
Name |
Comment |
Name Refers to Formula |
| all_x |
Includes all original data X
values |
=OFFSET(Data!$A$2,0,0,COUNTA(Data!$A$2:$A$1000)) |
| all_y |
Includes all original Y values |
=OFFSET(all_x,0,1) |
| end_row |
Calculated with Match formula |
='Interactive Regr'!$N$4 |
| end_year |
Last year for user selected
period - Obtained from Scroll bar link cell |
='Interactive Regr'!$M$4 |
| slope |
Calculated with Slope function |
='Interactive Regr'!$B$14 |
| st_row |
Calculated with Match formula |
='Interactive Regr'!$N$3 |
| st_year |
First year for user selected
time period - Obtained from scroll bar link cell |
='Interactive Regr'!$M$3 |
| x |
Dynamic range of X values |
=OFFSET(Data!$A$1,st_row-1,0,end_row-(st_row-1),1) |
| y |
Dynamic range of Y values |
=OFFSET(x,0,1) |
As the user moves the start and end scroll bars, the X
and Y dynamic ranges are automatically adjusted; the
regression calculations and chart are also automatically
adjusted.
Regression Calculations: The Interactive
Regression workbook uses Excel's LINEST, FDIST, TDIST and
TINV functions to provides the key regression and
ANOVA parameters. The parameters, Excel functions and cell
formulas are shown below.
|
Regression Functions and Formulas |
|
Parameter |
Excel Function |
Formula |
|
r2 |
LINEST |
=INDEX(LINEST(y,x,1,1),3,1) |
|
SE |
LINEST |
=INDEX(LINEST(y,x,1,1),3,2) |
|
Slope |
LINEST |
=INDEX(LINEST(y,x,1,1),1,1) |
|
Slope SE |
LINEST |
=INDEX(LINEST(y,x,1,1),2,1) |
|
Intercept |
LINEST |
=INDEX(LINEST(y,x,1,1),1,2) |
|
Intercept SE |
LINEST |
=INDEX(LINEST(y,x,1,1),2,2) |
|
SSregr |
LINEST |
=INDEX(LINEST(y,x,1,1),5,1) |
|
SS residuals |
LINEST |
=INDEX(LINEST(y,x,1,1),5,2) |
|
DF residuals |
LINEST |
=INDEX(LINEST(y,x,1,1),4,2) |
|
F |
LINEST |
=INDEX(LINEST(y,x,1,1),4,1) |
|
Signif of F |
FDIST |
=FDIST(E9,1,B10) |
|
Slope p value |
TDIST |
=TDIST(ABS(D14),B10,2) |
|
Intercept p value |
TDIST |
=TDIST(ABS(D13),B11,2) |
|
Slope Lower 95% CL |
TINV |
=B14-TINV(B1,B10)*C14 |
|
Slope Upper 95% CL |
TINV |
=B14+TINV(B1,B10)*C14 |
|
Intercept Lower 95% CL |
TINV |
=B13-(TINV(B1,B10)*C13) |
|
Intercept Upper 95% CL |
TINV |
=B13+TINV(B1,B10)*C13 |
Chart: The chart shows all data as well as the
dynamic X and Y ranges. This way, the user can evaluate
trends in a portion of the data set and still see the entire
data set. The regression line, r2 and significance of F are
reported on the chart as well as the start and end years.
Link to
Interactive Regression Workbook
Top |
Segmented - Piecewise Regression
" Linear regression is done to detect a mathematical
relationship between several series of measured things...
Linear regression can be done with all the data .. or by
parts (segments) of the data base. The segments are
introduced to see if there are abrupt changes in the
relation under investigation. ..The boundaries between the
segments may be called breakpoints."
Wikipedia
The Fargo, ND chart shows a segmented regression analysis
of the trend in beginning day of snowmelt prepared by Tamino
for his
Open Mind blog.
Tamino described an iterative process that he used to
find the best fit for his segmented regression. I wanted to
see if I could create an automated way to find the best
breakpoint using Excel. This post shows both an interactive
method as well as a Solver based method.
This
link
provides a tutorial by Sandra Ryan and Laurie Porth of the
USDA. In their example, they show regression lines for two
segments and how to calculate the breakpoint, C, where the
two lines meet.

Point C is the intersection, hinge point or breakpoint
between the two regression lines. We want to find C so that
we can define the left and right regressions.
Ryan and Porth solution is reproduced below:
Left Line:
Y1 = a1+b1X
for X <= C
Right Line:
Y2 = a2 + b2X
for X > C
For the left and right regressions to be continuous at
point C, Y1 = Y2.
a1 + b1C = a2 +
b2C
a2 = a1 + C(b1 - b2)
We can use either an iterative approach or an
minimization approach to find the value of C that generates
the lowest least squares value.
I have created a workbook that calculates segmented -
piecewise regression using both an interactive approach and
a minimization approach, with Solver. This
link provides a
zip copy of the Seg_Piecewsie_regr.xls file.
Top |
Multiple Period Regression
There are situations where you want to look at regressions
for multiple periods over your entire data set.
This Excel based chart shows an example of a multiple period
regression analysis, in this case the Northern Hemisphere
temperature anomaly - F for the period 1880 - 2006. Trend
lines have been added to compare trends over the 126 year
period.
 |
|
Trend Line Change Rate & r2 |
|
Period |
Change Rate oF/Year |
r2 |
| 1880
- 1921 |
0.007 |
0.11 |
| 1922
- 1938 |
0.036 |
0.44 |
| 1939
- 1972 |
-
0.017 |
0.38 |
| 1972 -
2006 |
0.054 |
0.79 |
|
|
There have been at least 4 distinct N Hemisphere temperature
anomaly periods in the 1880 - 2006 period. The rate of change has
varied from a low to -0.017 oF per year in the 1939 -
1972 period to a high of 0.054 oF in the 1973 - 2006
period.
The Excel workbook is available
here.
Top |
|
CuSum and
Change Point Analysis CuSum charts and change point analysis provide comparative
information that can be useful in analysis of trend data to find potential
trend shifts in several series. Dr. Wayne Taylor (
Variation.Com) has developed an Excel add-in that combines the use of
CuSum charts and a bootstrapping technique to compute 1,000 or
more iterations of the CuSum chart. The analysis output
provides information on changes in trend means.
As an example, I conducted a change point analysis of 5 Swiss
temperature station data sets. The 5 Swiss station CuSum charts with
noted change point years are shown to the right. Notice
that all 5 series had change points in 1988. Geneva had change
points 4 years, 1920, 1943, 1962 in addition to 1988. Lugano
had change points in 1919, 1940, and 1988, Saentis had
change points in 1920 and 1988, and Zurich had change points in
1943, 1951 as well as 1988.
While the 5 stations are physically quit distant, and the
temperature values also differ, the patterns of the 5 CuSum charts
are comparable. For example, the 1920 change points in Saentis and
Geneva are close to the 1919 change point for Lugano. Zurich's 1919
- 1920 CuSum chart shows a change in direction, it is simply to
small a change to be classified as a change point.
This table shows change point years for each station.
5 Swiss Station Temperature Series (1901 - 2004)
Change Point Analysis Summary: |
|
Change Pt Year |
Basel |
Geneva |
Lugano |
Saentis |
Zurich |
|
1919 |
|
|
Y |
|
|
|
1920 |
|
Y |
|
Y |
|
|
1940 |
|
|
Y |
|
|
|
1943 |
|
y |
|
|
Y |
|
1951 |
|
|
|
|
Y |
|
1962 |
|
Y |
|
|
|
|
1988 |
Y |
Y |
Y |
Y |
Y |
|
No CP Yrs |
1 |
4 |
3 |
2 |
3 |
|
The Excel workbook, with the CuSum and change point analysis for these 5 Swiss stations,
is available
at this link.
Top |
Time Series Analysis Resources and Links
| Link |
Author |
Description |
|
MM |
Michael R Middletown |
Data Analysis Using Microsoft Excel
 |
|
HB |
Harvey J Brightman |
Data Analysis in Plain English with Microsoft
Excel |
|
CP |
Physics Dept., Clemson Univ. |
Linear Regression and Excel - Simple
tutorial on how to calculated regression statistics
in Excel. |
|
SSC |
Statistical ServiceCentre - Univ
of Reading |
Using Excel for Statistics - SSC is an
excellent source of information on statistical
analysis and the use of Excel for statistical
analysis. |
|
TM |
Tushar Mehta |
Trendline Coefficients - an excellent
paper that covers Excel regression from A to Z.
Great place to learn or relearn about Excel
regression |
|
BL1 |
Bernard Liengme |
Regression Analysis - Confidence Interval of
the Line of Best Fit - post and downloadable
workbook show you how to calculate and plot
confidence interval for predicted Y value for a
given value of the independent variable X |
|
BL2 |
Bernard Liengme |
Regression Analysis - Confidence Interval
of the Slope - post and downloadable workbook show
you how to calculate the confidence interval for the
slope of your regression. |
Top |
|