ProcessTrends.Com
Home
Chart Gallery
Video Tutorials
Trend Analysis
Using R
Data Visualization
 Global Warming
Map Excel Data with Google Earth
Chart Doctor
Advanced Chart Techniques
Links
What's New
Downloads
  About      
 Search Site RSS
   04/4/2008 4:30 PM

 

Trend Analysis With Excel
 
Trend/ Control Charts Excel Regression Tools Interactive Regression Segmented - Piecewise Regression Temperature Trends Multiple Period Regression CuSum and Change Point Analysis Time Series Analysis Resources and Links
Overview of Excel Regression Tools                        Link to workbook

Excel provides a number of functions that allow users to perform regression analysis of times series/ trend data.  This post shows  6 ways to use Excel for regression analysis, from simple to advanced:

1. Add trend  line with display of regression and r2 to chart
2. Prepare manual calculations
3. Use Excel Functions (INTERCEPT, SLOPE, RSQ)
4. Use Excel Function - LINEST
5. Use Excel's Analysis Toolpak add-in
6. Use Combination of Excel Statistical Functions to get equivalent report as Analysis Toolpak

We will use the same data set for each approach, annual average temperatures,  oC,  in Vaexjoe, Sweden between 1918 and 2001. These annual values have been calculated from daily observations archived by the European Climate Assessment and Dataset (ECAD) site.

The  data and exhibits are all available in the Excel_Regression_tools.xls at this link.

Excel Regression Approaches
Approach Description Example - (Click to enlarge)
Chart with Trend Line The simplest way to get the regression formula for your data is to create a simple XY chart and to add the Trendline formula and r2 values from the Options dialogue.

This simple technique only provides the equation and r2 values as text on the chart. If you want to use the equation or conduct significance test, you need to use one of the other techniques.

Manual Calculations Since Excel can reproduce just about any calculation that you want, you can add the necessary statistical formulas from a statistical text and produce any statistical parameter you want.

While this technique gives you maximum flexibility, you may want to evaluate Excel's built-in statistical functions before you re-invent the wheel

Excel's SLOPE, INTERCEPT and RSQ Functions Excel's built-in SLOPE, INTERCEPT and RSQ functions allow the user to calculate these values directly.

The functions return the same results as adding the trendline to the chart technique. In this case, however, the statistical parameters are in a user specified cell so they can be used for further analysis.

Excel's LINEST Function The LINEST function returns ten statistical parameters for a simple linear regression:
 o Regression coefficients for a1 and a0
 
o Standard error values for coefficients
 o r2 and Standard error of the Y estimate
 o F observed value
 o Degrees of freedom
 o Regression sum of squares
 o Residual sum of squares

The LINEST function returns an array of values so that it must be entered as an array formula. Rather than use the entire array, the user can return individual LINEST parameters to a cell by using the Index function, as shown in this formula to return the F value:

     =INDEX(LINEST(y,x,1,1),r,c)
Where:
       r = row of LINEST array (4 for F value)
       c = column of LINEST array (1 for F value)

 

Analysis Toolpak Regression Procedure Microsoft's Analysis Toolpak add-in includes a number of advanced analysis data analysis tools, including Regression. The Regression procedure provides regression statistics, ANOVA, regression coefficients, their standard errors, t stat's, p values and upper and lower confidence values.

The Regression procedure results are comparable to output from statistical packages.

Combination of Excel Statistical Functions In addition to the 10 regression measures provided by LINEST, Excel provides T and F value test functions so that users can reproduce the Analysis Toolpak Regression procedure results in a series of cell formulas that use a combination of LINEST, T and F test functions.

The picture to the right shows a screen print of an worksheet that allows the user to interactively select the regression period, then, prepares the regression analysis using a combination of Excel LINEST, T and F tests functions as well as conditional formatting to evaluate the regression. 

Link to Excel Regression Tools workbook

Top

Interactive Regression                               Link to Interactive Regression Workbook

This post shows how to build an interactive regression worksheet that allows you to adjust X value start and end points, automatically compute regression values using Excel LINEST function, assess goodness of fit, plot the results, display the regression line, r2 and significance of F test on the plot. This workbook provides equivalent regression results to the Analysis Toolpak Regression procedure using standard Excel functions and formulas.

We will use the same data set we used for the Excel Regression Tools post, annual average temperatures,  oC,  in Vaexjoe, Sweden between 1918 and 2001. These annual values have been calculated from daily observations archived by the European Climate Assessment and Dataset (ECAD) site.

The  data and interactive regression tool are available in the Interactive_Regression.xls at this link.

Regression Input and Output: The interactive regression input and output is shown in the figure below. Scroll bars are used to select start and end years. The output uses Excel LINEST function to calculate regression properties. The chart shows all data points in grey and the selected period in dark blue. The display shows the regression formula, r2 as well as significance of F test.

Dynamic Named Ranges: The workbook includes two active sheets, Data and Interactive_Regress. The data sheet is organized simply as a time series with year in Column A and mean annual temperature in Col. B. Dynamic range names are used to simplify the interactive features:

Dynamic Range Name Comment Name Refers to Formula
all_x Includes all original data X values =OFFSET(Data!$A$2,0,0,COUNTA(Data!$A$2:$A$1000))
all_y Includes all original Y values =OFFSET(all_x,0,1)
end_row Calculated with Match formula ='Interactive Regr'!$N$4
end_year Last year for user selected period - Obtained from Scroll bar link cell ='Interactive Regr'!$M$4
slope Calculated with Slope function ='Interactive Regr'!$B$14
st_row Calculated with Match formula ='Interactive Regr'!$N$3
st_year First year for user selected time period - Obtained from scroll bar link cell ='Interactive Regr'!$M$3
x Dynamic range of X values =OFFSET(Data!$A$1,st_row-1,0,end_row-(st_row-1),1)
y Dynamic range of Y values =OFFSET(x,0,1)

As the user moves the start and end scroll bars, the X and Y dynamic ranges are automatically adjusted; the regression calculations and chart are also automatically adjusted.

Regression Calculations: The Interactive Regression workbook uses Excel's LINEST, FDIST, TDIST and TINV  functions to provides the key regression and ANOVA parameters. The parameters, Excel functions and cell formulas are shown below.

Regression Functions and Formulas
Parameter Excel Function Formula
r2 LINEST =INDEX(LINEST(y,x,1,1),3,1)
SE LINEST =INDEX(LINEST(y,x,1,1),3,2)
Slope LINEST =INDEX(LINEST(y,x,1,1),1,1)
Slope SE LINEST =INDEX(LINEST(y,x,1,1),2,1)
Intercept LINEST =INDEX(LINEST(y,x,1,1),1,2)
Intercept SE LINEST =INDEX(LINEST(y,x,1,1),2,2)
SSregr LINEST =INDEX(LINEST(y,x,1,1),5,1)
SS residuals LINEST =INDEX(LINEST(y,x,1,1),5,2)
DF residuals LINEST =INDEX(LINEST(y,x,1,1),4,2)
F LINEST =INDEX(LINEST(y,x,1,1),4,1)
Signif of F FDIST =FDIST(E9,1,B10)
Slope p value TDIST =TDIST(ABS(D14),B10,2)
Intercept p value TDIST =TDIST(ABS(D13),B11,2)
Slope Lower 95% CL TINV =B14-TINV(B1,B10)*C14
Slope Upper 95% CL TINV =B14+TINV(B1,B10)*C14
Intercept Lower 95% CL TINV =B13-(TINV(B1,B10)*C13)
Intercept Upper 95% CL TINV =B13+TINV(B1,B10)*C13

Chart: The chart shows all data as well as the dynamic X and Y ranges. This way, the user can evaluate trends in a portion of the data set and still see the entire data set. The regression line, r2 and significance of F are reported on the chart as well as the start and end years.

Link to Interactive Regression Workbook

Top

Segmented - Piecewise Regression

" Linear regression is done to detect a mathematical relationship between several series of measured things...   Linear regression can be done with all the data .. or by parts (segments) of the data base. The segments are introduced to see if there are abrupt changes in the relation under investigation. ..The boundaries between the segments may be called breakpoints." Wikipedia

The Fargo, ND chart shows a segmented regression analysis of the trend in beginning day of snowmelt prepared by Tamino for his Open Mind blog.

Tamino described an iterative process that he used to find the best fit for his segmented regression. I wanted to see if I could create an automated way to find the best breakpoint using Excel. This post shows both an interactive method as well as a Solver based method.

This link provides a tutorial by Sandra Ryan and Laurie Porth of the USDA. In their example, they show regression lines for two segments and how to calculate the breakpoint, C, where the two lines meet.

Point C is the intersection, hinge point or breakpoint between the two regression lines. We want to find C so that we can define the left and right regressions.

Ryan and Porth solution is reproduced below:

   Left Line:                Y1     = a1+b1X         for X <= C
   Right Line:              Y2   = a2 + b2X        for X >   C

For the left and right regressions to be continuous at point C, Y1 = Y2. 
                                             
 a1 + b1C = a2 + b2C
                                   a2 = a1 + C(b1 - b2)

We can use either an iterative approach or an minimization approach to find the value of C that generates the lowest least squares value.

I have created a workbook that calculates segmented - piecewise regression using both an interactive approach and a minimization approach, with Solver. This link provides a zip copy of the Seg_Piecewsie_regr.xls file.

Top

Multiple Period Regression

There are situations where you want to look at regressions for multiple periods over your entire data set. This Excel based chart shows an example of a multiple period regression analysis, in this case the Northern Hemisphere temperature anomaly - F for the period 1880 - 2006. Trend lines have been added to compare trends over the 126 year period.
 
Trend Line Change Rate & r2
Period

Change  Rate oF/Year

r2
1880 - 1921 0.007 0.11
1922 - 1938 0.036 0.44
1939 - 1972 - 0.017 0.38
1972 - 2006 0.054 0.79

 

There have been at least 4 distinct N Hemisphere temperature anomaly periods in the 1880 - 2006 period. The rate of change has varied from a low to -0.017 oF per year in the 1939 - 1972 period to a high of 0.054 oF in the 1973 - 2006 period.

The Excel workbook is available here.

Top

CuSum and Change Point Analysis

CuSum charts and change point analysis provide comparative information that can be useful in analysis of trend data to find potential trend shifts in several series. Dr. Wayne Taylor ( Variation.Com) has developed an Excel add-in  that combines the use of CuSum charts and a bootstrapping technique to compute 1,000 or more iterations of the CuSum chart.  The analysis output provides information on changes in trend means.

As an example, I conducted a change point analysis of 5 Swiss temperature station data sets. The 5 Swiss station CuSum charts with noted change point years are shown to the right. Notice that all 5 series had change points in 1988. Geneva had change points 4 years, 1920, 1943, 1962 in addition to 1988.  Lugano had change points in 1919,  1940, and 1988,  Saentis had change points in 1920 and 1988, and Zurich had change points in 1943, 1951 as well as 1988.

While the 5 stations are physically quit distant, and the temperature values also differ, the patterns of the 5 CuSum charts are comparable. For example, the 1920 change points in Saentis and Geneva are close to the 1919 change point for Lugano. Zurich's 1919 - 1920 CuSum chart shows a change in direction, it is simply to small a change to be classified as a change point.

 

 

 

 

 


 

This table shows change point years for each station.

5 Swiss Station Temperature Series (1901 - 2004)
Change Point Analysis Summary:
Change Pt Year Basel Geneva Lugano Saentis Zurich
1919     Y    
1920   Y   Y  
1940     Y    
1943   y     Y
1951         Y
1962   Y      
1988 Y Y Y Y Y
No CP Yrs 1 4 3 2 3

The Excel workbook, with the CuSum and change point analysis for these 5 Swiss stations, is available at this link.

Top

Time Series Analysis Resources and Links
Link Author Description

MM

Michael R Middletown Data Analysis Using Microsoft Excel  Data Analysis Using Microsoft Excel: Michael R. Middleton

HB

Harvey J Brightman Data Analysis in Plain English with Microsoft ExcelData Analysis In Plain English with Microsoft Excel

CP

Physics Dept., Clemson Univ.

Linear Regression and Excel - Simple tutorial on how to calculated regression statistics in Excel.

SSC

Statistical ServiceCentre - Univ of Reading

Using Excel for Statistics - SSC is an excellent source of information on statistical analysis and the use of Excel for statistical analysis.

TM

Tushar Mehta

Trendline Coefficients - an excellent paper that covers Excel regression from A to Z. Great place to learn or relearn about Excel regression

BL1

Bernard Liengme

Regression Analysis - Confidence Interval of the Line of Best Fit - post and downloadable workbook show you how to calculate and plot confidence interval for predicted Y value for a given value of the independent variable X

BL2

Bernard Liengme

Regression Analysis - Confidence Interval of the Slope - post and downloadable workbook show you how to calculate the confidence interval for the slope of your regression.

Top