tsbox 0.2: supporting additional time series classes

The tsbox package makes life with time series in R easier. It is built around a set of functions that convert time series of different classes to each other. They are frequency-agnostic, and allow the user to combine time series of multiple non-standard and irregular frequencies. A detailed overview of the package functionality is given in the documentation page (or in a previous blog-post).

Version 0.2 is now on CRAN and provides a large number of bug fixes. Non-standard column names are now handled correctly, and non-standard column orders are treated consistently.

New Classes

Two more time series classes are now supported: tis time series, from the tis package, and irts time series, from the tseries package.

In order to create an object of these classes, it is sufficient to use the appropriate converter.

E.g., for tis time series:

library(tsbox)
ts_tis(fdeaths) 
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1974  901  689  827  677  522  406  441  393  387  582  578  666 
## 1975  830  752  785  664  467  438  421  412  343  440  531  771 
## 1976  767 1141  896  532  447  420  376  330  357  445  546  764 
## 1977  862  660  663  643  502  392  411  348  387  385  411  638 
## 1978  796  853  737  546  530  446  431  362  387  430  425  679 
## 1979  821  785  727  612  478  429  405  379  393  411  487  574 
## class: tis 

Or for irts time series:

head(ts_irts(fdeaths)) 
## 1974-01-01 00:00:00 GMT 901 
## 1974-02-01 00:00:00 GMT 689 

Conversion works from all classes to all classes, and we can easily convert these objects to any other time series class, or to a data frame:

x.tis <- ts_tis(fdeaths) 
head(ts_df(x.tis)) 
##         time value 
## 1 1974-01-01   901 
## 2 1974-02-01   689 
## 3 1974-03-01   827 
## 4 1974-04-01   677 
## 5 1974-05-01   522 
## 6 1974-06-01   406 

Class-agnostic functions

Because coercion works reliably and is well tested, we can use it to make functions class-agnostic. If a class-agnostic function works for one class, it works for all:

ts_pc(ts_tis(fdeaths)) 
ts_pc(ts_irts(fdeaths)) 
ts_pc(ts_df(fdeaths)) 
ts_pc(fdeaths) 

ts_pc calculates percentage change rates towards the previous period. It works like a ‘generic’ function: You can apply it on any time series object, and it will return an object of the same class as its input.
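For intuition, the arithmetic behind a percentage change rate can be sketched in base R for a plain ts object. This is only an illustration of the calculation, not the way tsbox implements it, and the helper name pc_base is hypothetical:

```r
# Hypothetical helper: percentage change to the previous period, base R only.
# Works on "ts" objects; this is not tsbox's actual implementation.
pc_base <- function(x) {
  stopifnot(is.ts(x))
  # stats::lag(x, -1) shifts the series forward by one period, so that each
  # observation is divided by its predecessor once the two are aligned
  100 * (x / stats::lag(x, -1) - 1)
}

pc <- pc_base(fdeaths)
head(pc)
start(pc)   # the first period has no predecessor, so the result starts in Feb 1974
```

Unlike this sketch, ts_pc accepts any of the supported classes, which is the point of the class-agnostic design.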

So, whether we want to smooth, scale, differentiate, chain-link, forecast, regularize or seasonally adjust a series, we can apply the same commands to all time series classes. tsbox offers a comprehensive toolkit for the basics of time series manipulation. Here are some additional examples:

ts_pcy(fdeaths)                # pc., comp. to same period of prev. year 
ts_forecast(fdeaths)           # forecast, by exponential smoothing 
ts_seas(fdeaths)               # seasonal adjustment, by X-13 
ts_frequency(fdeaths, "year")  # convert to annual frequency 
ts_span(fdeaths, "-1 year")    # limit time span to final year 

There are many more. Because they all start with ts_, you can use auto-complete to see what’s around. Most conveniently, there is a time series plot function that works for all classes and frequencies:

ts_plot(
   `Airline Passengers` = AirPassengers,
   `Lynx trappings` = ts_tis(lynx),
   `Deaths from Lung Diseases` = ts_xts(fdeaths),
   title = "Airlines, trappings, and deaths",
   subtitle = "Monthly passengers, annual trappings, monthly deaths" 
) 

time series plot

tsbox 0.1: class-agnostic time series

The R ecosystem knows a vast number of time series classes: ts, xts, zoo, tsibble, tibbletime or timeSeries. The plethora of standards causes confusion. As different packages rely on different classes, it is hard to use them in the same analysis. tsbox provides a set of tools that make it easy to switch between these classes. It also allows the user to treat time series as plain data frames, facilitating the use with tools that assume rectangular data.

comic by xkcd

The tsbox package is built around a set of functions that convert time series of different classes to each other. They are frequency-agnostic, and allow the user to combine multiple non-standard and irregular frequencies. Because coercion works reliably, it is easy to write functions that work identically for all classes. So whether we want to smooth, scale, differentiate, chain-link, forecast, regularize or seasonally adjust a time series, we can use the same tsbox-command for any time series class.

This blog gives a short overview of the changes introduced in 0.1. A detailed overview of the package functionality is given in the documentation page (or in a previous blog-post).

Keeping explicit missing values

Version 0.1, now on CRAN, brings a large number of bug fixes and improvements. A substantial change involves the treatment of NA values in data frames. Previously, all NAs in data frames were treated as implicit, and were only made explicit by a call to ts_regular.

This has changed now. If you convert a ts object to a data frame, all NA values will be preserved. To replicate previous behavior, apply the ts_na_omit function:

library(tsbox)
x.ts <- ts_c(mdeaths, austres)
x.ts
ts_df(x.ts)
ts_na_omit(ts_df(x.ts))

ts_span extends outside of series span

This lays the groundwork for ts_span to be extensible. With extend = TRUE, ts_span extends a regular series with NA values, up to the specified limits, similar to base window. Like all functions in tsbox, this is frequency-agnostic. For example, in the following, the monthly series mdeaths is extended by monthly NA values, while the quarterly series austres is extended by quarterly NA values.

x.df <- ts_df(ts_c(mdeaths, austres))
ts_span(x.df, end = "1999-12-01", extend = TRUE)
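For comparison, this is how the base window function mentioned above pads a single ts object with NA values. It is a small sketch using the built-in mdeaths series and covers only one regular ts object, whereas ts_span handles several series of different classes and frequencies at once:

```r
# Base R analogue for a single "ts" object: extend mdeaths (which ends in
# Dec 1979) by one year of NA values
x <- window(mdeaths, end = c(1980, 12), extend = TRUE)

tail(x, 12)   # the appended months are all NA
end(x)
```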

ts_default standardizes column names in a data frame

In rectangular data structures, i.e., in a data.frame, a data.table, or a tibble, tsbox stores one or multiple time series in the ‘long’ format. By default, tsbox detects a value, a time and zero, one or several id columns. Alternatively, the time column and the value column can be explicitly named time and value. If explicit names are used, the column order will be ignored.

While automatic column name detection is useful in interactive mode, it produces unnecessary overhead in longer workflows. The helper function ts_default detects and renames the time and the value column, so that auto-detection will be turned off in subsequent steps (note that the names of the id columns are not affected):

x.df <- ts_df(ts_c(mdeaths, austres))
names(x.df) <- c("a fancy id name", "date", "count")
ts_plot(x.df)  # tsbox is fine with that
ts_default(x.df)

ts_summary summarizes time series

ts_summary provides a frequency-agnostic summary of a ts-boxable object:

ts_summary(ts_c(mdeaths, austres))
#>        id obs    diff freq      start        end
#> 1 mdeaths  72 1 month   12 1974-01-01 1979-12-01
#> 2 austres  89 3 month    4 1971-04-01 1993-04-01

ts_summary returns a plain data frame that can be used for any purpose. It is also recommended for the extraction of various time series properties, such as start, freq or id:

ts_summary(austres)$id
#> [1] "austres"
ts_summary(austres)$start
#> [1] "1971-04-01"

And a cheatsheet!

Finally, we created a tsbox cheat sheet that summarizes most functionality. Print and enjoy working with time series.

Time Series of the World, Unite!

The R ecosystem knows a ridiculous number of time series classes. So, I decided to create a new universal standard that finally covers everyone’s use case… Ok, just kidding!

tsbox, now freshly on CRAN, provides a set of tools that are agnostic towards existing time series classes. It is built around a set of converters, which convert time series stored as ts, xts, data.frame, data.table, tibble, zoo, tsibble or timeSeries to each other.

To install the stable version from CRAN:

install.packages("tsbox")

To get an idea how easy it is to switch from one class to another, consider this:

library(tsbox)
x.ts <- ts_c(mdeaths, fdeaths)
x.xts <- ts_xts(x.ts)
x.df <- ts_df(x.xts)
x.tbl <- ts_tbl(x.df)
x.dt <- ts_dt(x.tbl)
x.zoo <- ts_zoo(x.dt)
x.tsibble <- ts_tsibble(x.zoo)
x.timeSeries <- ts_timeSeries(x.tsibble)

We jump from good old ts objects to xts, store our time series in various data frames and convert them to some highly specialized time series formats.

tsbox is class-agnostic

Because these converters work nicely, we can use them to make functions class-agnostic. If a class-agnostic function works for one class, it works for all:

ts_scale(x.ts)           
ts_scale(x.xts)
ts_scale(x.df)
ts_scale(x.dt)
ts_scale(x.tbl)

ts_scale normalizes one or multiple series, by subtracting the mean and dividing by the standard deviation. It works like a ‘generic’ function: You can apply it on any time series object, and it will return an object of the same class as its input.
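The normalization itself is simple to write down. As a minimal sketch for a single ts object in base R (the helper name scale_base is hypothetical, and this is not how tsbox implements ts_scale):

```r
# Hypothetical base R equivalent for a single "ts" object:
# subtract the mean, then divide by the standard deviation
scale_base <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

z <- scale_base(mdeaths)
class(z)   # still a "ts" object; the time attributes are kept
mean(z)    # approximately 0
sd(z)      # approximately 1
```

What ts_scale adds on top of this arithmetic is that it handles several series at once and returns the class it was given.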

So, whether we want to smooth, scale, differentiate, chain-link, forecast, regularize or seasonally adjust a series, we can apply the same commands to whatever time series is at hand. tsbox offers a comprehensive toolkit for the basics of time series manipulation. Here are some additional operations:

ts_pc(x.ts)                 # percentage change rates 
ts_forecast(x.xts)          # forecast, by exponential smoothing
ts_seas(x.df)               # seasonal adjustment, by X-13
ts_frequency(x.dt, "year")  # convert to annual frequency
ts_span(x.tbl, "-1 year")   # limit time span to final year

tsbox is frequency-agnostic

There are many more. Because they all start with ts_, you can use auto-complete to see what’s around. Most conveniently, there is a time series plot function that works for all classes and frequencies:

ts_plot(
  `Airline Passengers` = AirPassengers, 
  `Lynx trappings` = ts_df(lynx), 
  `Deaths from Lung Diseases` = ts_xts(fdeaths),
  title = "Airlines, trappings, and deaths",
  subtitle = "Monthly passengers, annual trappings, monthly deaths"
)

There is also a version that uses ggplot2 and has the same syntax.

Time series in data frames

You may have wondered why we treated data frames as a time series class. The spread of dplyr and data.table has given data frames a boost and made them one of the most popular data structures in R. So, storing time series in a data frame is an obvious consequence. And even if you don’t intend to keep time series in data frames, this is still the format in which you import and export your data. tsbox makes it easy to switch from data frames to time series and back.

Make existing functions class-agnostic

tsbox includes tools to make existing functions class-agnostic. To do so, the ts_ function can be used to wrap any function that works with time series. For a function that works on "ts" objects, this is as simple as:

ts_rowsums <- ts_(rowSums)
ts_rowsums(ts_c(mdeaths, fdeaths))

Note that ts_ returns a function, which can be used with or without a name.

In case you are wondering, tsbox uses data.table as a backend, and makes use of its incredibly efficient reshaping facilities, its joins and rolling joins. And thanks to anytime, tsbox will be able to recognize almost any date format without manual intervention.

So, enjoy some relief in R’s time series class struggle.

Website: www.tsbox.help

screenshot of www.dataseries.org

Forecasting GDP with R and dataseries.org

The website dataseries.org aims to be Switzerland’s FRED – a free comprehensive database of Swiss time series. Powered by R and written in Shiny (also using a bit of JavaScript) it allows you to quickly search and explore a large number of data series.

Switzerland’s time series in one place

Similarly to the United States, public data in Switzerland is produced by a large number of different offices, which makes it hard to find any particular series. dataseries.org provides a structured and automatically updated collection of most of these series. We are still working on the data input, but are pretty much complete in the field of Economics.

You can download the data as spreadsheets or graphs, or embed interactive widgets in your website. Alternatively, you can import the data directly into R, using the dataseries package. Install the package from CRAN,

install.packages("dataseries")

and run the ds function with the id argument that you find on the website:

plot(dataseries::ds("GDP.PBRTT.A.R", "ts"), 
     ylab = "mio CHF, at 2010 prices, s. adj.", 
     main = "Gross Domestic Product")

fig-1

This will give you an R plot of Switzerland’s GDP. (The data is cached, so calling the function again will not re-download until you restart the R session.)

Live Import of Series to R

In the following, I will use data from dataseries.org to produce a live forecast of Switzerland’s GDP. Each day the model is run, it uses the latest data. That way it is possible to produce a transparent and up-to-date forecast. For the following exercise, I will only use tools from R base, but it is of course possible to use the same data in a more advanced modeling framework.

In order to produce a reasonable forecast, we want to track early information on the business cycle, which is mostly survey data. We will use a question from the SECO Consumer Confidence Survey on current economic performance, the Credit Suisse / Procure Purchasing Managers’ Index and the ETHZ KOF Barometer.

Transforming the data

Getting these indicators from dataseries.org directly into R is easy. Because these data are measured at different frequencies, we need to convert them to the same quarterly frequency as GDP. There are many packages that offer functions for that (e.g., the tempdisagg package has functions to move to both higher and lower frequencies), but I will stick to basic R here:

# Aggregating months to quarters (post updated on May 6, 2017)
to_quarterly <- function(x){
 aggregate(x, nfrequency = 4, FUN = mean)
}

pmi <- to_quarterly(dataseries::ds("PMI.SA.PM", "ts"))
kof <- to_quarterly(dataseries::ds("KOF.KFBR", "ts"))
csent <- dataseries::ds("CCI.GEPC", "ts")
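To see what to_quarterly does, here is the same aggregate call applied to the built-in monthly series mdeaths, which stands in for the monthly indicators above (an assumption for illustration only). Each quarterly value is the mean of its three months:

```r
# Same call as in to_quarterly(), applied to a built-in monthly series
q <- aggregate(mdeaths, nfrequency = 4, FUN = mean)

frequency(q)   # 4
start(q)       # 1974, first quarter

# the first quarterly value equals the mean of Jan, Feb and Mar 1974
all.equal(as.numeric(q)[1], mean(mdeaths[1:3]))
```

As far as I can tell, aggregate groups consecutive observations beginning with the first one, so this approach assumes that the monthly series starts in the first month of a quarter.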

A plot of these series shows the common trend in these variables and gives you an indication of the business cycle, which may have turned upward in recent months.

plot(cbind(pmi, kof, csent), main = "Business Cycle Indicators")

fig-2

Since these series are stationary, our left hand side variable should be stationary as well. This is accomplished by calculating percentage change rates of GDP:

gdp.level <- dataseries::ds("GDP.PBRTT.A.R", "ts")
gdp <- (gdp.level / lag(gdp.level, -1)) - 1
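The role of lag in this expression is easy to check on a toy series (the numbers below are made up for illustration). lag(x, -1) shifts the series one period forward, so the division pairs each quarter with the previous one, and the result starts one quarter later than the input:

```r
# Toy quarterly level series (made-up values)
level <- ts(c(100, 102, 105.06), start = c(2016, 1), frequency = 4)

# stats::lag() is called explicitly here, because other packages
# (e.g. dplyr) define their own lag() with different behavior
growth <- (level / stats::lag(level, -1)) - 1

growth
start(growth)   # 2016, second quarter
```

Note that the result is a fraction (0.02 for 2 percent); multiply by 100 if you prefer to work in percent.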

ARIMA modelling

R’s workhorse for time series modeling is the arima function, which allows you to construct a univariate or multivariate model of GDP growth. Since the data is seasonally adjusted, a simple autoregressive process (AR1) offers a good benchmark:

# AR1
m0 <- arima(gdp, order = c(1, 0, 0))
fct0 <- predict(m0, n.ahead = 1)$pred
# GDP Growth Q1: +0.3 

If you need advice on which ARIMA model to choose, the information criteria, accessed by the R functions AIC or BIC, can help you. Simply take the model with the lowest information criterion. The auto.arima function from the forecast package also allows you to do the selection automatically.
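As a rough sketch of such a comparison, the following fits AR models of order 0 to 3 and picks the one with the lowest AIC. It uses the built-in lh series as a stand-in (an assumption for illustration); when following along with the data above, gdp would take its place:

```r
# Stand-in series; replace with gdp when following along
y <- lh

# Fit AR(p) models for p = 0, ..., 3 and collect their AIC values
orders <- 0:3
fits <- lapply(orders, function(p) arima(y, order = c(p, 0, 0)))
aic <- sapply(fits, AIC)
names(aic) <- paste0("AR", orders)

aic
names(which.min(aic))   # model with the lowest AIC
```

Swapping AIC for BIC in the sapply call gives the same comparison with a stronger penalty on additional parameters.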

We can include our series individually or jointly and estimate a range of different models. A good model (in terms of the AIC information criterion) is the following, which uses PMI and KOF data (but not consumer sentiment data):

# PMI, KOF
dta <- window(cbind(pmi, kof), start = start(pmi), end = end(pmi))
m1 <- arima(window(gdp, start = start(dta)), 
            xreg = window(dta, end = end(gdp)))
fct1 <- predict(m1, n.ahead = 1, 
                newxreg = window(dta, start = tsp(gdp)[2] + 0.25,
                                 end = tsp(gdp)[2] + 0.25)
                )$pred
# GDP Growth Q1: +0.7

The model’s forecast for the first quarter of 2017 is 0.7 – a value that hasn’t been reached for more than two years.

A factor model

If you have multiple indicators at hand, a common problem is multicollinearity: because the indicators are correlated with each other, including too many of them degrades the quality of the model estimates.

An easy fix is to use a factor model, where the indicators are summarized in a few factors, which can be calculated by principal components (see Stock and Watson 2002):

# PMI, KOF, Consumer Sentiment, first Principal Component
pca <- prcomp(window(cbind(pmi, kof, csent), start = start(pmi), 
                     end = tsp(gdp)[2] + 0.25),
             scale. = TRUE)
dta.pca <- ts(pca$x[, 'PC1'], start = start(pmi), frequency = 4)

m2 <- arima(window(gdp, start = start(dta)), 
            xreg = window(dta.pca, end = end(gdp)))
fct2 <- predict(m2, n.ahead = 1, 
                newxreg = window(dta.pca, start = tsp(gdp)[2] + 0.25)
                )$pred
# GDP Growth Q1: +0.7

Again, we get a forecast value of 0.7. Overall, survey data indicates that the economy is well on track. Let’s do a graphical comparison of our forecasts:

# skeletons to include forecasts 
gdp.fct0 <- window(gdp, extend = TRUE, end = tsp(gdp)[2] + 0.25)
gdp.fct1 <- gdp.fct2 <- gdp.fct0

# plug forecasts into skeletons
window(gdp.fct0, start = end(gdp.fct0)) <- fct0
window(gdp.fct1, start = end(gdp.fct1)) <- fct1
window(gdp.fct2, start = end(gdp.fct2)) <- fct2

ts.plot(window(cbind(gdp, gdp.fct0, gdp.fct1, gdp.fct2), 
               start = 2010), 
        col = 1:4, ylab = "quarterly growth rates, s. adj.", 
        main = "GDP Forecasts")
legend("topright", legend = c("GDP Growth Rate", "AR 1 Forecast", 
                              "PMI, KOF", "Principal Component"), 
       lty = 1, col = 1:4, bty = "n")

fig-3

Publication of first quarter GDP is on June 1, 2017. See you in a month!

seasonal 1.3: A Better Way to Seasonal Adjustment Diagnostics

The R package seasonal makes it easy to use X-13ARIMA-SEATS, the seasonal adjustment software by the United States Census Bureau. Thanks to the x13binary package, installing it from CRAN is now as easy as installing any other R package:

install.packages("seasonal")

The latest version 1.3 comes with a new udg function and a customizable summary method, which give power users of X-13 a convenient way to check the statistics that interest them. For a full list of changes, see the package NEWS.

A generalized way to access diagnostics

Version 1.3 offers a generalized way to access diagnostic statistics. In seasonal, it was always possible to use all options of X-13 and access all output series. Now it is easy to access all diagnostics as well.

The main new function is udg, named after the X-13 output file which it is reading. Consider a simple call to seas (the main function of the seasonal package) that uses the X-11 seasonal adjustment method:

m <- seas(AirPassengers, x11 = "")
udg(m)

The udg function returns a list containing 357 named diagnostics. They are properly type-converted, so they can be directly used for further analysis within R.

If we ask for a specific statistic, such as the popular X-11 M statistics, the result will be simplified to a numeric vector (see ?udg for additional options):

udg(m, c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

## f3.m01 f3.m02 f3.m03 f3.m04 
##  0.041  0.042  0.000  0.283

There are also some new wrappers for commonly used statistics, such as AIC, BIC, logLik or qs, which use the udg function.

A customizable summary

The new functionality paves the way for a customizable summary method for seas objects. For example, if we want to add the M statistics for X-11 adjustments to the summary, we can write:

summary(m, stats = c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

## Call:
## seas(x = AirPassengers, x11 = "")
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## Weekday           -0.0029497  0.0005232  -5.638 1.72e-08 ***
## Easter[1]          0.0177674  0.0071580   2.482   0.0131 *  
## AO1951.May         0.1001558  0.0204387   4.900 9.57e-07 ***
## MA-Nonseasonal-01  0.1156204  0.0858588   1.347   0.1781    
## MA-Seasonal-12     0.4973600  0.0774677   6.420 1.36e-10 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## X11 adj.  ARIMA: (0 1 1)(0 1 1)  Obs.: 144  Transform: log
## AICc: 947.3, BIC: 963.9  QS (no seasonality in final):    0  
## Box-Ljung (no autocorr.): 26.65   Shapiro (normality): 0.9908  
## f3.m01: 0.041  f3.m02: 0.042  f3.m03: 0  f3.m04: 0.283 

Note the new line at the end, which shows the M statistics.

If we want to routinely consider these statistics in our summary, we can set the seas.stats option accordingly:

options(seas.stats = c("f3.m01", "f3.m02", "f3.m03", "f3.m04"))

This will change the default behavior, and

summary(m)

will return the same output as above. To restore the default behavior, set the option back to NULL.

options(seas.stats = NULL)

Like peanut butter and jelly: x13binary and seasonal

This post was written by Dirk Eddelbuettel and Christoph Sax and posted on both authors’ respective blogs.

The seasonal package by Christoph Sax brings a very featureful and expressive interface for working with seasonal data to the R environment. It uses the standard tool of the trade: X-13ARIMA-SEATS. This powerful program is provided by the statisticians of the US Census Bureau based on their earlier work (named X-11 and X-12-ARIMA) as well as the TRAMO/SEATS program by the Bank of Spain. X-13ARIMA-SEATS is probably the best-known tool for the deseasonalization of time series, and is used by statistical offices around the world.

Sadly, it also has a steep learning curve. One interacts with a basic command-line tool which users have to download, install and properly reference (by environment variables or related means). Each model specification has to be prepared in a special ‘spec’ file that uses its own, cumbersome syntax.

While seasonal provides all the required functionality to use X-13ARIMA-SEATS from R (see the very nice seasonal demo site), it still required the user to manually deal with the X-13ARIMA-SEATS installation.

So we decided to do something about this. A pair of GitHub repositories provides both the underlying binary in a per-operating-system form (see x13prebuilt) and a ready-to-use R package (see x13binary) which uses the former to provide binaries for R. The latter is now on CRAN as package x13binary, ready to be used on Windows, OS X or Linux. And the seasonal package (in version 1.2.0, now on CRAN, or later) automatically makes use of it. Installing seasonal and x13binary in R is now as easy as:

install.packages("seasonal")

which opens the door for effortless deployment of powerful deseasonalization. By default, the principal function of the package employs a number of automated techniques that work well in most circumstances. For example, the following code produces a seasonal adjustment of the latest data of US retail sales (by the Census Bureau) downloaded from Quandl:

library(seasonal) 

url <- "https://www.quandl.com/api/v3/datasets/USCENSUS/BI_MARTS_44000_SM.csv?order=asc"
rs <- ts(read.csv(url)$Value/1e3, start = c(1992, 1), frequency = 12)

m1 <- seas(rs)

plot(m1, main = "Retail Trade: U.S. Total Sales", 
     ylab = "USD (in Billions)")

This tests for log-transformation, performs an automated ARIMA model search, applies outlier detection, tests and adjusts for trading day and Easter effects, and invokes the SEATS method to perform seasonal adjustment. And this is what the adjusted series looks like:

USRetailSales

Of course, you can access all available options of X-13ARIMA-SEATS as well. Here is an example where we adjust the latest data for Chinese exports (as tallied by the US FED), taking into account the different effects of Chinese New Year before, during and after the holiday:

url <- "https://www.quandl.com/api/v3/datasets/FRED/VALEXPCNM052N.csv?order=asc"
xp <- ts(read.csv(url)$VALUE/1e9, start = c(1981, 1), frequency = 12)

m2 <- seas(window(xp, start = 2000), 
  xreg = cbind(
    genhol(cny, start = -7, end = -1, center = "calendar"),
    genhol(cny, start = 0, end = 7, center = "calendar"), 
    genhol(cny, start = 8, end = 21, center = "calendar")),
  regression.aictest = c("td", "user"),
  regression.usertype = "holiday")

plot(m2, main = "Goods, Value of Exports for China", 
     ylab = "USD (in Billions)")

which generates the following chart demonstrating a recent flattening in export activity measured in USD.

ChineseExports

We hope these simple examples illustrate both how powerful a tool X-13ARIMA-SEATS is and just how easy it is to use from R now that we provide the x13binary package automating its installation.

 

www.seasonal.website

Shiny-based Online Tool for X-13 Seasonal Adjustment: New Features

The R package seasonal makes it easy to use X-13ARIMA-SEATS, the seasonal adjustment software by the U.S. Census Bureau. In a previous post, I wrote about www.seasonal.website, a Shiny-based website showcasing the use of seasonal. Even if you are not using R, the website allows you to upload and adjust your own series, without the need for any software installation.

The latest version of www.seasonal.website comes with several new features:

Live Parsing of X-13 spc Files

The main new feature is a live parser of X-13 spc files. Changes in the Options, triggered by the pull-down menus, or changes in the R Call, are reflected in an updated X-13 Call. On the other hand, changes in the X-13 Call will be reflected in updates in the Options and the R Call.

Interactively manipulate the X-13 spec file or the R call

This brings interesting new possibilities:

  • Non-R-users may use the website to generate spc files, which they can use in any software that includes X-13ARIMA-SEATS.
  • People familiar with X-13 may use the spc syntax to learn about the syntax of the R-package seasonal.
  • People familiar with the R-package seasonal may use it to learn about the spc syntax.

New Upload/Download Dialog

The upload/download feature has been reworked. A button on the top-right corner opens a new upload and download dialog.

New upload/download dialog

Both XLSX and CSV formats are supported. You can upload and adjust your own monthly or quarterly time series. All data will be permanently deleted after your session.

Nice Summary

The summary, previously just the printed output of the R-function summary, has been overhauled. Colored flags indicate the significance level of the coefficients, and reddish colors indicate warning signs from the tests.

New Summary