Most of the examples will use a data file called mpg. This data file contains 234 observations and 11 variables from a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 – this was used as a proxy for the popularity of the car. This data set is included in the R package ggplot2.
Details (6 of the 11 variables)
- displ. engine displacement, in litres
- cyl. number of cylinders
- trans. type of transmission
- drv. f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
- cty. city miles per gallon
- hwy. highway miles per gallon
In addition, two new binary variables are created for analysis:
- cyl_six. cyl_six = 1 if cyl >= 6, and cyl_six = 0 if cyl < 6
- drv_front. drv_front = 1 if drv = 'f', and drv_front = 0 otherwise (drv ne 'f')
You can get the mpg file as a SAS data file by clicking here. For those who use R, the following code generates the data set.
    library(ggplot2)

    # Copy the mpg file into global environment
    mpg <- mpg

    # Create new variable cyl_six
    mpg$cyl_six <- NA
    mpg[which(mpg$cyl >= 6), ]$cyl_six <- 1
    mpg[which(mpg$cyl < 6), ]$cyl_six <- 0

    # Create new variable drv_front
    mpg$drv_front <- NA
    mpg[which(mpg$drv == 'f'), ]$drv_front <- 1
    mpg[which(mpg$drv != 'f'), ]$drv_front <- 0

    attach(mpg)
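As a quick sanity check on the two indicator definitions, the sketch below rebuilds them with vectorized comparisons and cross-tabulates each against its source variable. It is only one possible way to do this. The name mpg_check is a hypothetical working copy introduced here so that the attached mpg object is left alone, and the indicators are stored as integers rather than the doubles produced by the code above.

```r
# Hypothetical working copy; ggplot2::mpg is read directly from the package
mpg_check <- as.data.frame(ggplot2::mpg)

# Vectorized equivalents of the two indicator definitions:
# a logical comparison converted to 0/1
mpg_check$cyl_six   <- as.integer(mpg_check$cyl >= 6)
mpg_check$drv_front <- as.integer(mpg_check$drv == "f")

# Cross-tabulate each indicator against the variable it was derived from;
# every cyl value should fall in exactly one cyl_six column, and likewise for drv
print(table(cyl = mpg_check$cyl, cyl_six = mpg_check$cyl_six))
print(table(drv = mpg_check$drv, drv_front = mpg_check$drv_front))
```

If the tables show any row with counts in both columns, the indicator was not built as defined.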