Using R for Advanced Charts
When you need to build charts beyond Excel's capabilities
Should I Use R? (See comparison of Excel & R Trellis Charts: video)
In pursuing my interests in data visualization, I have found several authors who have really had an
impact on my graphical analysis thinking:
Edward Tufte
- I first started my data visualization journey when I read Tufte's
The Visual Display of Quantitative Information. While comfortable
with Tufte's concepts, I found I needed more concrete examples and
tools. This led me to Naomi Robbins' book.
Naomi Robbins'
Creating More Effective Graphs book opened my eyes to a
wide array of practical charting techniques not available in Excel. I was
particularly interested in dot plots and trellis plots. Robbins' book is where I
first learned about the R, S, and S-PLUS statistical and
graphics languages. Her chart examples and discussion
led me to William Cleveland's and Paul Murrell's books.
William Cleveland's
The Elements of Graphing Data
and Visualizing Data
books showed me that I was just scratching the surface of graphical
analysis with default Excel charts. In addition to trellis displays,
Cleveland's books showed me important concepts like
banking to 45° and locally weighted
regression (Lowess).
Paul Murrell's
R Graphics showed me the power of the R graphics
package.
As I read Robbins', Cleveland's and Murrell's books, I saw a number of data analysis
and charting capabilities that I wanted to add to my repertoire, among them dot plots,
box plots, banking to 45°, trellis charts and Lowess smoothing.
While trying to make these advanced Excel-based
charts, I often asked myself the question, "Should I use R
for this chart instead of Excel?" Based on Robbins' book, I
knew that these charts were readily available in R.
Reluctant to take on the R learning curve, I developed several of the
advanced charts in Excel and
VBA. I knew it was taking more time than it would with a
high-quality statistical analysis and graphing package, but I felt that it was better to
use a tool that I knew and avoid the learning curve of a package like R.
I made good progress building dot plot, box plot and
banking to 45° tools in Excel. Trellis charts and
Lowess smoothing proved a real challenge in Excel. Trellis charts allow the user to show multivariate data
much more effectively than Excel's default charts. I looked
into R again and considered using R for my trellis charting needs.
Rather than switch to R at that time, I started developing horizontal
and vertical panel charts in Excel, with the goal of automating the
development of a full trellis chart.
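To make the trellis idea concrete, here is a minimal sketch using R's lattice package. It is only an illustration: it uses the `barley` data set that ships with lattice, not data from this article, and the object name `tp_barley` is my own. The call draws one dot plot panel per site, so the same variety versus yield comparison repeats across a grid of small panels.

```r
library(lattice)

# barley ships with lattice: yields for 10 varieties at 6 sites in 2 years
# (stand-in data, used here for illustration only)
tp_barley <- dotplot(variety ~ yield | site, data = barley,
                     groups = year,       # one symbol per year within each panel
                     layout = c(2, 3),    # 2 columns x 3 rows of panels
                     auto.key = TRUE,     # add a legend for the two years
                     xlab = "Yield (bushels/acre)")
print(tp_barley)
```

The `| site` part of the formula is what turns a single chart into a trellis: lattice splits the data by site and lays out the panels itself.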
I searched high and low for a reasonable Lowess
algorithm that I could incorporate into Excel. No luck.
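For comparison, base R includes a `lowess()` function in its stats package, so no extra algorithm has to be written. The sketch below is a minimal illustration only: it uses R's built-in `cars` data set rather than anything from this article, and the span `f = 2/3` is simply the function's default, not a recommended setting.

```r
# cars is a built-in data set: speed (mph) and stopping distance (ft)
# (stand-in data, used here for illustration only)
fit <- lowess(cars$speed, cars$dist, f = 2/3)  # f is the smoother span

# Scatter plot of the raw points with the Lowess curve drawn on top
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
lines(fit, lwd = 2)
```

`lowess()` returns a list with `x` and `y` components, which is why the result can be passed straight to `lines()`.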
I finally developed a set of VBA procedures to build a
full trellis chart. When I completed the Excel - VBA trellis
chart, I reconsidered R because I now knew what it took for me to develop the
advanced trellis charting that was already available in R.
Why reinvent trellis charts in Excel when they were
readily available in R? To answer this question, I needed to find
out how difficult the R learning curve was. I already knew what it took
to develop these tools in Excel - VBA.
Trellis Chart in Excel and R
This video shows my Excel VBA trellis chart tool as well
as the results of my two-day R learning curve.
Based on my Excel - VBA trellis chart development time
versus R trellis chart learning curve time, I will be using R for
graphics that are not part of Excel's standard tool kit. I'd rather use
a proven tool built by an R programmer than reinvent my own graphic wheel in
Excel.
I have a long way to go in mastering data analysis and
time series analysis. I'd rather spend my time learning and using
analysis techniques than coaxing Excel to do what R can
already do.
R Resources
R Lattice Beats Excel's Stacked Area Trend Chart
See Charts & Graphs blog for my post on why R lattice
plots are much more effective than Excel's stacked area trend charts.
Here's the link to the source
data file.
Here's the R script I used.
### Script to work with BP Oil's Oil Consumption - M Barrels/day Data
# D Kelly O'Day - Oct. 3, 2008; http://processtrends.com & http://chartsgraphs.wordpress.com
library(lattice)
my_ar <- 1 # User changeable aspect ratio (panel height / panel width)
# *** Please edit for correct path to your source data file
my_data <- read.table("c:/R_home/CG/oil_cons_trend.csv", sep = ",",
                      header = TRUE)
# *****
# Build the trellis chart: one line-chart panel per Parameter, Value plotted against Year
tp1 <- xyplot(Value ~ Year | Parameter, data = my_data, type = "l",
              main = "World Oil Consumption Trends - million barrels per day \n 1965 to 2007 ",
              sub = "Data Source: BP Statistical Review of World Energy 2008",
              xlab = "",
              ylab = "Million barrels per day",
              par.settings = list(axis.text = list(cex = 0.8), fontsize = list(text = 10)),
              par.strip.text = list(cex = 0.8))
# Redraw with the chosen aspect ratio; print() makes the chart appear even when the script is source()d
print(update(tp1, aspect = my_ar))
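The script above sets the aspect ratio by hand through `my_ar`. As a side note, lattice can also choose the aspect ratio itself by banking to 45°, the Cleveland concept mentioned earlier, when `aspect = "xy"` is passed. The sketch below is a minimal illustration only: it uses R's built-in `sunspot.year` series instead of the oil data file, and the names `sun` and `tp_banked` are my own.

```r
library(lattice)

# Built-in yearly sunspot series, 1700 to 1988
# (stand-in data, used here for illustration only)
sun <- data.frame(year  = as.numeric(time(sunspot.year)),
                  spots = as.numeric(sunspot.year))

# aspect = "xy" asks lattice to pick the aspect ratio by banking to 45 degrees
tp_banked <- xyplot(spots ~ year, data = sun, type = "l",
                    aspect = "xy",
                    xlab = "", ylab = "Sunspot count")
print(tp_banked)
```

The same `aspect = "xy"` argument can be passed to the `xyplot()` call in the oil consumption script in place of a fixed number.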