seasonal 1.3: A Better Way to Seasonal Adjustment Diagnostics

The R package seasonal makes it easy to use X-13ARIMA-SEATS, the seasonal adjustment software by the United States Census Bureau. Thanks to the x13binary package, installing it from CRAN is now as easy as installing any other R package:

install.packages("seasonal")

The latest version 1.3 comes with a new udg function and a customizable summary method, which give power users of X-13 a convenient way to check the statistics that are of their interest. For a full list of changes, see the package NEWS.

A generalized way to access diagnostics

Version 1.3 offers a generalized way to access diagnostic statistics. In seasonal, it was always possible to use all options of X-13 and access all output series. Now it is easy to access all diagnostics as well.

The main new function is udg, named after the X-13 output file which it is reading. Consider a simple call to seas (the main function of the seasonal package) that uses the X-11 seasonal adjustment method:

m <- seas(AirPassengers, x11 = "")
udg(m)

The udg function returns a list containing 357 named diagnostics. They are properly type-converted, so they can be directly used for further analysis within R.

If we ask for a specific statistic, such as the popular X-11 M statistics, the result will be simplified to a numeric vector (see ?udg for additional options):

udg(m, c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

## f3.m01 f3.m02 f3.m03 f3.m04 
##  0.041  0.042  0.000  0.283

There are also some new wrappers for commonly used statistics, such as AICBIC, logLik or qs, which use the udg function.

A customizable summary

The new functionality paves the way for a customizable summary method for seas objects. For example, if we want to add the M statistics for X-11 adjustments to the summary, we can write:

summary(m, stats = c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

## Call:
## seas(x = AirPassengers, x11 = "")
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## Weekday           -0.0029497  0.0005232  -5.638 1.72e-08 ***
## Easter[1]          0.0177674  0.0071580   2.482   0.0131 *  
## AO1951.May         0.1001558  0.0204387   4.900 9.57e-07 ***
## MA-Nonseasonal-01  0.1156204  0.0858588   1.347   0.1781    
## MA-Seasonal-12     0.4973600  0.0774677   6.420 1.36e-10 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## X11 adj.  ARIMA: (0 1 1)(0 1 1)  Obs.: 144  Transform: log
## AICc: 947.3, BIC: 963.9  QS (no seasonality in final):    0  
## Box-Ljung (no autocorr.): 26.65   Shapiro (normality): 0.9908  
## f3.m01: 0.041  f3.m02: 0.042  f3.m03: 0  f3.m04: 0.283 

Note the new line at the end, which shows the M statistics.

If we want to routinely consider these statistics in our summary, we can set the seas.stats option accordingly:

options(seas.stats = c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

This will change the default behavior, and

summary(m)

will return the same output as above. To restore the default behavior, set the option back to NULL.

options(seas.stats = NULL)

Like peanut butter and jelly: x13binary and seasonal

This post was written by Dirk Eddelbuettel and Christoph Sax and posted by both author’s respective blogs.

The seasonal package by Christoph Sax brings a very featureful and expressive interface for working with seasonal data to the R environment. It uses the standard tool of the trade: X-13ARIMA-SEATS. This powerful program is provided by the statisticians of the US Census Bureau based on their earlier work (named X-11 and X-12-ARIMA) as well as the TRAMO/SEATS program by the Bank of Spain. X-13ARIMA-SEATS is probably the best known tool for de-seasonalization of timeseries, and used by statistical offices around the world.

Sadly, it also has a steep learning curve. One interacts with a basic command-line tool which users have to download, install and properly reference (by environment variables or related means). Each model specification has to be prepared in a special ‘spec’ file that uses its own, cumbersome syntax.

As seasonal provides all the required functionality to use X-13ARIMA-SEATS from R — see the very nice seasonal demo site — it still required the user to manually deal with the X-13ARIMA-SEATS installation.

So we decided to do something about this. A pair of GitHub repositories provide both the underlying binary in a per-operating system form (see x13prebuilt) as well as a ready-to- use R package (see x13binary) which uses the former to provide binaries for R. And the latter is now on CRAN as package x13binary ready to be used on Windows, OS-X or Linux. And the seasonal package (in version 1.2.0 – now on CRAN – or later) automatically makes use of it. Installing seasaonal and x13binary in R is now as easy as:

install.packages("seasonal")

which opens the door for effortless deployment of powerful deasonalization. By default, the principal function of the package employs a number of automated techniques that work well in most circumstances. For example, the following code produces a seasonal adjustment of the latest data of US retail sales (by the Census Bureau) downloaded from Quandl:

library(seasonal) 

url <- "https://www.quandl.com/api/v3/datasets/USCENSUS/BI_MARTS_44000_SM.csv?order=asc"
rs <- ts(read.csv(url)$Value/1e3, start = c(1992, 1), frequency = 12)

m1 <- seas(rs)

plot(m1, main = "Retail Trade: U.S. Total Sales", 
     ylab = "USD (in Billions)")

This tests for log-transformation, performs an automated ARIMA model search, applies outlier detection, tests and adjusts for trading day and Easter effects, and invokes the SEATS method to perform seasonal adjustment. And this is how the adjusted series looks like:

USRetailSales

Of course, you can access all available options of X-13ARIMA-SEATS as well. Here is an example where we adjust the latest data for Chinese exports (as tallied by the US FED), taking into account the different effects of Chinese New Year before, during and after the holiday:

url <- "https://www.quandl.com/api/v3/datasets/FRED/VALEXPCNM052N.csv?order=asc"
xp <- ts(read.csv(url)$VALUE/1e9, start = c(1981, 1), frequency = 12)

m2 <- seas(window(xp, start = 2000), 
  xreg = cbind(
    genhol(cny, start = -7, end = -1, center = "calendar"),
    genhol(cny, start = 0, end = 7, center = "calendar"), 
    genhol(cny, start = 8, end = 21, center = "calendar")),
  regression.aictest = c("td", "user"),
  regression.usertype = "holiday")

plot(m2, main = "Goods, Value of Exports for China", 
     ylab = "USD (in Billions)")

which generates the following chart demonstrating a recent flattening in export activity measured in USD.

ChineseExports

We hope this simple examples illustrates both how powerful a tool X-13ARIMA-SEATS is, but also just how easy it is to use X-13ARIMA-SEATS from R now that we provide the x13binary package automating its installation.

 

www.seasonal.website

Shiny-based Online Tool for X-13 Seasonal Adjustment: New Features

The R package seasonal makes it easy to use X-13ARIMA-SEATS, the seasonal adjustment software by the U.S. Census Bureau. In a previous post, I wrote about www.seasonal.website, a Shiny-based website showcasing the use of seasonal. Even if you are not using R, the website allows you to upload and adjust your own series, without the need for any software installation.

The latest version of www.seasonal.website comes with several new features:

Live Parsing of X-13 spc Files

The main new feature is a live parser of X-13 spc files. Changes in the Options, triggered by the pull-down menus, or changes in the R Call, are reflected in an updated X-13 Call. On the other hand, changes in the X-13 Call will be reflected in updates in the Options and the R Call.

manipulate the X-13 spec file

Interactively manipulate the X-13 spec file or the R call

This brings interesting new possibilities:

  • Non-R-users may use the website to generate spc files, which they can use in any software that includes X-13ARIMA-SEATS.
  • People familiar with X-13 may use the spc syntax to learn about the syntax of the R-package seasonal.
  • People familiar with the R-package seasonal may use it learn about the spc syntax.

New Upload/Download Dialog

The upload/download feature has been reworked. A button on the top-right corner opens a new upload and download dialog.

New upload/download dialog

New upload/download dialog

Both XLSX and CSV formats are supported. You can upload and adjust your own monthly or quarterly time series. All data will be permanently deleted after your session.

Nice Summary

The summary, previously just the printed output of the R-function summary, has been overhauled. Colored flags indicate the significance level of the coefficients, reddish colors indicate warning signs from the tests.

New Summary

New Summary

Seasonal Adjustment, Online

New Online Tool for Seasonal Adjustment

Seasonal adjustment of time series can be a hassle. The softwares used by statistical agencies (X-13, X-12, TRAMO-SEATS) have tons of fantastic options, but the steep learning curve prevents users from taking advantage of the functionality of these packages, or from using them at all.

The R package seasonal simplifies the task by providing an interface to X-13, the newest seasonal adjustment software by the US Census Bureau. It combines and extends the capabilities of the older X-12ARIMA and TRAMO-SEATS software packages. The most simple use of seasonal requires the application of the main function to a time series, which invokes automated procedures that work well in many circumstances:

seas(AirPassengers)

A new shiny based website is showcasing the use of seasonal and allows for online adjustment of time series, without the need to download and install seasonal. The AirPassengers series is set as the default series, but can be replaced by any uploaded series. There are other demo series that show the use of the software to adjust Indian Diwali or Chinese New Year effects.

The site allows to adjust most parameters of X-13, and to view and download a substantial part of its output. Frequently used options can be chosen from a drag and drop menu, while less often used options can be chosen by manipulating the R-Call itself.

Here are some of the most interesting features of the website:

Frequently Used Options

Frequently used options of X-13 can be modified using the drop down selectors. Each change will result in a re-estimation of the seasonal adjustment model. The R-call, the output and the summary are updated accordingly.

Frequently Used Options

Choosing the Output

A substantial part of the output of X-13ARIMA-SEATS can be shown on the website. Click and drag to zoom into the graph. Double click to restore the original view.

A substantial part of the output of X-13ARIMA-SEATS can be shown on the website.

Manipulating the R-Call

The R-Call to seasonal can be modified and run online. In the picture below, the ARIMA model has been adjusted to include an autoregressive parameter of order 2. Press the button to execute the modified call.

Manipulating the R Call

Upload and Download

User defined series can be uploaded, importing from Excel or CSV. Also, all viewable series can be downloaded as Excel or CSV.

Upload and Download

Chinese New Year, Indian Diwali

Chinese New Year or Indian Diwali support is included out of the box and can be selected from the drop down menu. Adjustment for these holidays is as easy as adjusting Easter effects.

Adjusting Chinese New Year or Indian Diwali Effects

Running X-13 Examples Online

The examples from the official manual of X-13 can be run directly on the website. The collection of examples can found here.

Examples of X 13ARIMA SEATS in R

Try it out!

Adjusting Chinese New Year Effects in R is Easy

The Spring Festival is the most important holiday in China and many other Asian countries. Traditionally, the holiday starts on Chinese New Year’s Eve, and lasts to the Lantern Festival on the 15th day of the first month of the lunisolar calendar. The Chinese New Year is celebrated either in January or in February of the Gregorian calendar.

Because of its importance, Chinese New Year seriously distorts monthly time series, which are usually reported according to the Gregorian calendar. Unlike Easter, Chinese New Year does not affect quarterly time series, as it always falls in the first quarter.

The standard software packages for seasonal adjustment, X-12-ARIMA and X-13-ARIMA-SEATS (developed by the U.S. Census Bureau) or Tramo Seats (developed by the Bank of Spain) have a built-in adjustment procedure for Easter holiday, but not for Chinese New Year. However, all packages allow for the inclusion of user defined variables, and the Chinese New Year can be modeled as such.

The R package seasonal

With the R package seasonal, generating and including such a series is easy. We will use it in the following to seasonally adjust and remove Chinese New Year effects from the nominal dollar value of imports to China. seasonal is an interface to X-13ARIMA-SEATS; for more information and installation details, see here.

Chinese imports are included as an example series in seasonal. As the series has a very different seasonal pattern before 2000, we focus on the later period. (Adjusting the whole series in one step is possible, but for good results one should manually model the seasonal break.)

library(seasonal)
data(cntrade)  # contains imports ('imp') and exports ('exp') of China
imp <- window(imp, start = 2000)  # this shortens the series

seasonal includes the genhol() function, a R version of the equally named software utility by the U.S. Census Bureau. Using the dates of the Chinese New Year as an input, it produces a time series with the deviations from the monthly means. Here we are assuming that the holiday starts on New Year’s Eve and lasts for one week.

data(holiday)  # dates of Chinese New Year and Easter, included in seasonal
cny.ts <- genhol(cny, start = -1, end = 6, center = "calendar")

In 2014, only two days in January were affected by the holiday (New Year’s Eve and New Year’s Day). 75% of the holiday fell into February. Thus, January was affected slightly less than average, February slightly more. This is very different from 2012, when the holiday completely fell into January.

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2011 -0.26  0.26  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2012  0.74 -0.74  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2013 -0.26  0.26  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2014 -0.01  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00

Including user defined regressors

The time series cny.ts can be included in the main seasonal adjustment. The automated procedures of X-13ARIMA-SEATS can be applied to the imp series in the following way:

m1 <- seas(imp, xreg = cny.ts, regression.usertype = "holiday", x11 = list())
summary(m1)

## 
## Call:
## seas(x = imp, xreg = cny.ts, regression.usertype = "holiday", 
##     x11 = list())
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## cny.ts            -0.18104    0.01548  -11.70  < 2e-16 ***
## Weekday            0.00514    0.00104    4.94  7.8e-07 ***
## LS2008.Nov        -0.37584    0.04745   -7.92  2.3e-15 ***
## MA-Nonseasonal-01  0.39776    0.07202    5.52  3.3e-08 ***
## MA-Seasonal-12     0.72749    0.06428   11.32  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ARIMA structure: (0 1 1)(0 1 1)   Number of obs.: 169   Transform: log
## AICc: 1.6e+03, BIC: 1.62e+03   QS seas. test (adj. series):   0  
## Box-Ljung (no autocorr.): 33.6 .  Shapiro (normality): 0.978 **

With xreg, arbitrary user defined regressors can be included, regression.usertype = "holiday" ensures that the final series does not include the regression effect. We also have chosen X11 as the decomposition method.

Unsurprisingly, the summary reveals a highly significant Chinese New Year effect. As the automatic model has been estimated on the logarithmic series, the coefficient of -0.18 indicates that New Year in 2012 has lowered imports by approximately 0.74 * 18 = 13%. The automatic procedure has also detected weekday effects and a level shift during the financial crisis.

Multiple regressors

We can do even better by using more than one user defined regressors, one for the pre-New-Year period and one for the post-New-Year period (thanks, Freya Beamish):

pre_cny <- genhol(cny, start = -6, end = -1, frequency = 12, center = "calendar")
post_cny <- genhol(cny, start = 0, end = 6, frequency = 12, center = "calendar")
m2 <- seas(x = imp, xreg = cbind(pre_cny, post_cny), regression.usertype = "holiday", 
           x11 = list())
summary(m2)

## 
## Call:
## seas(x = imp, xreg = cbind(pre_cny, post_cny), regression.usertype = "holiday", 
##     x11 = list())
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## pre_cny            0.070843   0.019199    3.69  0.00022 ***
## post_cny          -0.241043   0.020816  -11.58  < 2e-16 ***
## Weekday            0.005233   0.000943    5.55  2.9e-08 ***
## LS2008.Nov        -0.357887   0.045790   -7.82  5.5e-15 ***
## MA-Nonseasonal-01  0.331626   0.073967    4.48  7.3e-06 ***
## MA-Seasonal-12     0.687479   0.065740   10.46  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ARIMA structure: (0 1 1)(0 1 1)   Number of obs.: 169   Transform: log
## AICc: 1.59e+03, BIC: 1.61e+03   QS seas. test (adj. series):0.75  
## Box-Ljung (no autocorr.): 37.6 *  Shapiro (normality): 0.984 *

plot(m2)
adjusted and unadjusted series

Chinese imports: adjusted and unadjusted series

There are actually two kind of New Year effects: Until New Year’s Eve, import activity is higher than usual. During the holiday, it is lower. By including two regressors, these opposite effects can be modeled. Note that the negative effect is more pronounced than the positive one.

Manual refinements

The model could be further refined. With the static() function, a non-automatic version of the previous call can be extracted. It can be copy-pasted and used for further manipulations.

static(m2)

## seas(x = imp, xreg = cbind(pre_cny, post_cny), regression.usertype = "holiday", 
##     x11 = list(), regression.variables = c("td1coef", "ls2008.Nov"
##     ), arima.model = "(0 1 1)(0 1 1)", regression.aictest = NULL, 
##     outlier = NULL, transform.function = "log")

The inspect() function opens an interactive window that allows for the manipulation of a number of arguments. With each change, the adjustment process and the visualizations are recalculated. (This only works with R Studio.)

inspect(m)

After some playing around, we would probably stay with the two regressor adjustment model from above:

m2 <- seas(x = imp, xreg = cbind(pre_cny, post_cny), regression.usertype = "holiday", 
           x11 = list(), regression.variables = c("td1coef", "ls2008.Nov"), 
           arima.model = "(0 1 1)(0 1 1)", regression.aictest = NULL, 
           outlier = NULL, transform.function = "log")

It’s far form perfect. Normality statistics are bad, and there may be some traces of autocorrelation. On the other hand, the seasonal component is stable and revisions are small.

Comparing the series

Was it worth the pain? The following graph shows the same seasonal adjustment with and without the Chinese New Year adjustment:

m3 <- seas(x = imp, x11 = list(), regression.variables = c("td1coef", "ls2008.Nov"), 
           arima.model = "(0 1 1)(0 1 1)", regression.aictest = NULL, outlier = NULL, 
           transform.function = "log")

ts.plot(diff(log(cbind(final(m2), final(m3)))), col = c("red", "blue"), 
        lwd = c(2, 1))
Comparison of adjusted and unadjusted time series

Not adjusting Chinese New Year seriously distorts the time series

In 2012, we would have concluded that imports have plumped in January, soared in February and plumped again in March (blue line). With the adjustment, we rightly conclude that there was no such craziness (red line).

ts.plot(final(m2), final(m1), col = c("red", "blue"), lwd = c(2, 1))
Stagnating Imports

Chinese imports have stagnated this January

How useful is the two regressor model? Most of the time, the single regressor model performs reasonably well and leads to results similar to the two regressors model. This year, however, the Lunar New Year fell on January 31. As people were importing more in the pre-New-Year period, January imports were actually affected by a positive New Year effect. The right adjustment would be to correct the numbers downward! With the one regressor model, we would wrongly conclude that imports have soared (blue line). In fact, they have actually stagnated (red line).

Quickly Explore the Penn World Tables in R

The Penn World Tables are one of the greatest source of worldwide macroeconomic data, but dealing with its web interface is somewhat cumbersome. Fortunately, the data is also available as a R package on CRAN. Having some tools at hand to quickly draw some nice graphs from the data is useful in many circumstances. Here are three exciting packages that help you to quickly explore the PWT data in R. All of them are on CRAN.

First, get the data (in PWT 8, you need to calculate per capita measures yourself):

library(pwt8)
pwt8.0$cap <- pwt8.0$rgdpe / pwt8.0$pop

1. ggplot2

Use ggplot2 to draw time series for a variable. Exchange the isocodes in the list or the variable name to alter the graph.

library(ggplot2)
dta <- subset(pwt8.0, isocode %in% c("USA", "JPN", "KOR", "CHN"))

qplot(dta$year,
      dta[, 'cap'], 
      geom = "line", 
      group = dta$isocode,
      color = as.factor(dta$isocode)
     ) + 
theme(legend.title = element_blank()) +
scale_y_log10()

and that’s what you will get:

ggplot

2. manipulate

Use ggplot2 and manipulate to be even more flexible. (You need RStudio for that, but you should have it anyway!)

library(manipulate)
dta <- subset(pwt8.0, isocode %in% c("USA", "JPN", "KOR", "CHN"))

manipulate(
  qplot(dta$year,
        dta[, vars], 
        geom = "line", 
        group = dta$isocode,
        color = as.factor(dta$isocode)
       ) + theme(legend.title = element_blank()),
 vars = picker(as.list(colnames(pwt8.0)[-(1:3)]))
)

you will get a nice menu to choose the variable of your interest.

manipulate

3. googleVis

googleVis let’s you use the Google Visualisation API.

library(googleVis)

First, get some pretty colors from colorbrewer2.org:

colorbrewer <- "{maxValue: 40000, colors:['#F7FCF0', '#E0F3DB', '#CCEBC5', '#A8DDB5', '#7BCCC4', '#4EB3D3', '#2B8CBE', '#0868AC', '#084081']}"

And now draw a map with your data (make a screenshot if you want to use it)

plot(gvisGeoChart(subset(pwt8.0, year == 2011), "country", "cap",
                   options=list(colorAxis = colorbrewer)))

That’s the result:

gvis1

Or simply show all data, in a very similar way as the Google Public Data Explorer does.

plot(gvisMotionChart(pwt8.0, idvar="country", timevar="year")))

gvis2