---
title: 3. Manipulating multiple signals
description: Download multiple signals at once, and aggregate and manipulate them in various ways.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{3. Manipulating multiple signals}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


Various analyses involve working with multiple signals at once. The covidcast
package provides some helper functions for fetching multiple signals from the
API, and aggregating them into one data frame for various downstream uses.

## Fetching multiple signals

To load confirmed cases and deaths at the state level, in a single function
call, we can use `covidcast_signals()` (note the plural form of "signals"):


``` r
library(covidcast)

start_day <- "2020-06-01"
end_day <- "2020-10-01"

signals <- covidcast_signals(data_source = "jhu-csse",
                             signal = c("confirmed_7dav_incidence_prop",
                                        "deaths_7dav_incidence_prop"),
                             start_day = start_day, end_day = end_day,
                             geo_type = "state", geo_values = "tx")

summary(signals[[1]])
```

```
A `covidcast_signal` dataframe with 123 rows and 15 columns.

data_source : jhu-csse
signal      : confirmed_7dav_incidence_prop
geo_type    : state

first date                          : 2020-06-01
last date                           : 2020-10-01
median number of geo_values per day : 1
```

``` r
summary(signals[[2]])
```

```
A `covidcast_signal` dataframe with 123 rows and 15 columns.

data_source : jhu-csse
signal      : deaths_7dav_incidence_prop
geo_type    : state

first date                          : 2020-06-01
last date                           : 2020-10-01
median number of geo_values per day : 1
```

This returns a list of `covidcast_signal` objects, one per requested signal. The
argument structure of `covidcast_signals()` matches that of `covidcast_signal()`,
except that the first four arguments (`data_source`, `signal`, `start_day`,
`end_day`) may be vectors. See the `covidcast_signals()` documentation for
details.
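To make the pairing of vector arguments concrete, here is a minimal offline
sketch. It assumes that `covidcast_signals()` recycles `data_source`, `signal`,
`start_day`, and `end_day` to a common length and then makes one
`covidcast_signal()` call per position; check the documentation before relying
on this. `fetch_one()` and `fetch_many()` are hypothetical helpers, and
`fetch_one()` returns a toy data frame instead of querying the API.

```r
# Hypothetical stand-in for covidcast_signal(): builds a toy data frame
# describing the request instead of fetching anything from the API.
fetch_one <- function(data_source, signal, start_day, end_day) {
  days <- seq(as.Date(start_day), as.Date(end_day), by = "day")
  data.frame(data_source = data_source, signal = signal,
             time_value = days, value = NA_real_)
}

# Assumed recycling: extend each vector argument to the longest length,
# then pair the arguments position by position, one call per position.
fetch_many <- function(data_source, signal, start_day, end_day) {
  n <- max(length(data_source), length(signal),
           length(start_day), length(end_day))
  data_source <- rep(data_source, length.out = n)
  signal <- rep(signal, length.out = n)
  start_day <- rep(start_day, length.out = n)
  end_day <- rep(end_day, length.out = n)
  lapply(seq_len(n), function(i)
    fetch_one(data_source[i], signal[i], start_day[i], end_day[i]))
}

# One data source, two signals: the source is reused for both requests.
toy <- fetch_many("jhu-csse",
                  c("confirmed_7dav_incidence_prop",
                    "deaths_7dav_incidence_prop"),
                  "2020-06-01", "2020-06-03")
sapply(toy, function(df) unique(df$signal))
```

As in the real call above, the result is a list with one data frame per signal.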

## Aggregating signals, wide format

To aggregate multiple signals together, we can use the `aggregate_signals()`
function, which accepts a list of `covidcast_signal` objects, as returned by
`covidcast_signals()`. With all arguments set to their default values,
`aggregate_signals()` returns a data frame in "wide" format:


``` r
library(dplyr)

aggregate_signals(signals) %>% head()
```

```
  geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                       3.393256
2        tx 2020-06-02                                       3.644320
3        tx 2020-06-03                                       3.723629
4        tx 2020-06-04                                       6.985028
5        tx 2020-06-05                                       7.920192
6        tx 2020-06-06                                       8.034533
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
```

In "wide" format, only the latest issue of data is retained, and the columns
`data_source`, `signal`, `issue`, `lag`, `stderr`, `sample_size` are all dropped
from the returned data frame. Each unique signal, defined by a combination of
data source name, signal name, and time-shift, gets its own column whose name
records all three; for example, `value+0:jhu-csse_deaths_7dav_incidence_prop`
holds the unshifted deaths signal from `jhu-csse`.
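Both behaviors can be mimicked in base R. This is a sketch, not the package's
code: `wide_name()` is a hypothetical helper that follows the column-name
pattern visible in the output above (`value`, the signed shift, a colon, then
the data source and signal joined by an underscore), and the second part keeps
only the most recent `issue` for each location and date on made-up data.

```r
# Hypothetical helper mirroring the wide-format column names shown above.
# "%+d" always prints a sign, so a shift of 0 becomes "+0".
wide_name <- function(dt, data_source, signal) {
  sprintf("value%+d:%s_%s", as.integer(dt), data_source, signal)
}

wide_name(c(-1, 0), "jhu-csse", "deaths_7dav_incidence_prop")

# Toy data (made-up values) with two issues for June 1.
long <- data.frame(
  geo_value = "tx",
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02")),
  issue = as.Date(c("2020-06-02", "2020-06-10", "2020-06-10")),
  value = c(3.1, 3.4, 3.6)
)

# Sort so the latest issue comes last within each (geo_value, time_value),
# then drop every earlier duplicate of that key.
long <- long[order(long$geo_value, long$time_value, long$issue), ]
latest <- long[!duplicated(long[, c("geo_value", "time_value")],
                           fromLast = TRUE), ]
latest
```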

As hinted above, `aggregate_signals()` can also apply time-shifts to the given
signals, through the optional `dt` argument. This can be either a single vector
of shifts, applied to every `covidcast_signal` object, or a list of vectors of
shifts with the same length as the list of `covidcast_signal` objects, so that
each object gets its own set of shifts. Negative shifts produce a *lag* and
positive shifts a *lead*. For example, if `dt = -1`, the value reported on June
2 is the original value from June 1; if `dt = 1`, the value reported on June 1
is the original value from June 2; and if `dt = 0`, the values are left as is.


``` r
aggregate_signals(signals, dt = c(-1, 0)) %>%
  head()
```

```
  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.393256
2                                       3.644320
3                                       3.723629
4                                       6.985028
5                                       7.920192
6                                       8.034533
  value-1:jhu-csse_deaths_7dav_incidence_prop
1                                          NA
2                                   0.0856342
3                                   0.0953654
4                                   0.0909864
5                                   0.0977982
6                                   0.1002310
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
```

``` r
aggregate_signals(signals, dt = list(0, c(-1, 0, 1))) %>%
  head()
```

```
  geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                       3.393256
2        tx 2020-06-02                                       3.644320
3        tx 2020-06-03                                       3.723629
4        tx 2020-06-04                                       6.985028
5        tx 2020-06-05                                       7.920192
6        tx 2020-06-06                                       8.034533
  value-1:jhu-csse_deaths_7dav_incidence_prop
1                                          NA
2                                   0.0856342
3                                   0.0953654
4                                   0.0909864
5                                   0.0977982
6                                   0.1002310
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
  value+1:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0953654
2                                   0.0909864
3                                   0.0977982
4                                   0.1002310
5                                   0.0909864
6                                   0.0885536
```
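The shift rule used in these examples (the value reported on date t is the
original value from date t + dt) can be sketched in base R. `shift_values()` is
a hypothetical helper, not part of covidcast. It looks values up by date and
assumes a single location with one row per date, so a source date outside the
data yields `NA` rather than silently pulling in a neighboring day.

```r
# Hypothetical helper: on each date t, report the original value from t + dt.
# dt < 0 is a lag (June 2 shows June 1's value); dt > 0 is a lead.
shift_values <- function(time_value, value, dt) {
  source_dates <- time_value + dt           # date each value comes from
  value[match(source_dates, time_value)]    # NA when that date is absent
}

days <- seq(as.Date("2020-06-01"), by = "day", length.out = 5)
x <- c(3.39, 3.64, 3.72, 6.99, 7.92)

shift_values(days, x, -1)  # NA 3.39 3.64 3.72 6.99
shift_values(days, x, 1)   # 3.64 3.72 6.99 7.92 NA
```

Matching on dates rather than positions keeps the sketch correct when days are
missing from the input.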

Finally, `aggregate_signals()` also accepts a single data frame instead of a
list of data frames, which is convenient when applying several shifts to one
`covidcast_signal` object:


``` r
aggregate_signals(signals[[1]], dt = c(-1, 0, 1)) %>%
  head()
```

```
  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.393256
2                                       3.644320
3                                       3.723629
4                                       6.985028
5                                       7.920192
6                                       8.034533
  value+1:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.644320
2                                       3.723629
3                                       6.985028
4                                       7.920192
5                                       8.034533
6                                       7.957171
```

## Aggregating signals, long format

Setting `format = "long"` makes `aggregate_signals()` return "long" format
instead, with one observation (one signal value for a given location, date, and
shift) per row:


``` r
aggregate_signals(signals, format = "long") %>%
  head()
```

```
  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1004             0              5
3    state       day 2023-03-03 1003             0              5
4    state       day 2023-03-03 1002             0              5
5    state       day 2023-03-03 1001             0              5
6    state       day 2023-03-03 1000             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA  0 3.393256
2                   5     NA          NA  0 3.644320
3                   5     NA          NA  0 3.723629
4                   5     NA          NA  0 6.985028
5                   5     NA          NA  0 7.920192
6                   5     NA          NA  0 8.034533
```

``` r
aggregate_signals(signals, dt = c(-1, 0), format = "long") %>%
  head()
```

```
  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1005             0              5
3    state       day 2023-03-03 1004             0              5
4    state       day 2023-03-03 1004             0              5
5    state       day 2023-03-03 1003             0              5
6    state       day 2023-03-03 1003             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA  0 3.393256
3                   5     NA          NA -1 3.393256
4                   5     NA          NA  0 3.644320
5                   5     NA          NA -1 3.644320
6                   5     NA          NA  0 3.723629
```

``` r
aggregate_signals(signals, dt = list(-1, 0), format = "long") %>%
  head()
```

```
  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1004             0              5
3    state       day 2023-03-03 1003             0              5
4    state       day 2023-03-03 1002             0              5
5    state       day 2023-03-03 1001             0              5
6    state       day 2023-03-03 1000             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA -1 3.393256
3                   5     NA          NA -1 3.644320
4                   5     NA          NA -1 3.723629
5                   5     NA          NA -1 6.985028
6                   5     NA          NA -1 7.920192
```

As we can see, time-shifts work just as they do in "wide" format. The
difference is that in "long" format all columns are retained, and an additional
`dt` column records the time-shift applied to each row.
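The stacking behind the long format can be sketched in base R. `to_long()` is a
hypothetical helper, not the package's code, and it assumes a single signal for
a single location: each shift produces a full copy of the input rows with
shifted values, tagged with its `dt`, and the copies are bound together and
sorted by date so that the shifts for each day sit next to each other, as in the
output above.

```r
# Hypothetical sketch: one copy of the signal per shift, tagged with dt.
to_long <- function(df, dt) {
  copies <- lapply(dt, function(d) {
    out <- df
    # value on date t is the original value on date t + d (NA if absent)
    out$value <- df$value[match(df$time_value + d, df$time_value)]
    out$dt <- d
    out
  })
  res <- do.call(rbind, copies)
  # interleave the shifts within each date
  res[order(res$geo_value, res$time_value, res$dt), ]
}

df <- data.frame(geo_value = "tx",
                 time_value = seq(as.Date("2020-06-01"), by = "day",
                                  length.out = 3),
                 value = c(3.39, 3.64, 3.72))
long <- to_long(df, dt = c(-1, 0))
long
```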

Just as in "wide" format, `aggregate_signals()` can also operate on a single
data frame to conveniently apply several shifts in "long" format:


``` r
aggregate_signals(signals[[1]], dt = c(-1, 0), format = "long") %>%
  head()
```

```
  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1005             0              5
3    state       day 2023-03-03 1004             0              5
4    state       day 2023-03-03 1004             0              5
5    state       day 2023-03-03 1003             0              5
6    state       day 2023-03-03 1003             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA  0 3.393256
3                   5     NA          NA -1 3.393256
4                   5     NA          NA  0 3.644320
5                   5     NA          NA -1 3.644320
6                   5     NA          NA  0 3.723629
```

## Pivoting longer or wider

The package also provides `covidcast_longer()` and `covidcast_wider()` for
pivoting an aggregated signal data frame between the two formats. These are
essentially wrappers around `pivot_longer()` and `pivot_wider()` from the `tidyr`
package that set up the column structure and column names appropriately. For
example, to pivot longer:


``` r
aggregate_signals(signals, dt = list(-1, 0)) %>%
  covidcast_longer() %>%
  head()
```

```
  data_source                        signal geo_value time_value dt     value
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 -1        NA
2    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-01  0 0.0856342
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 -1 3.3932560
4    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-02  0 0.0953654
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 -1 3.6443200
6    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-03  0 0.0909864
```

And to pivot wider:


``` r
aggregate_signals(signals, dt = list(-1, 0), format = "long") %>%
  covidcast_wider() %>%
  head()
```

```
  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
```
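To give a rough idea of what these wrappers handle, the sketch below
round-trips toy data through `tidyr` directly. It is an approximation, not the
package's implementation: the column-name pattern is read off the outputs
above, only a subset of the long-format columns is used, and splitting names
back apart assumes that data source names contain no underscore (true of
`jhu-csse`).

```r
library(tidyr)

# Toy long data with a subset of the columns produced by format = "long".
long <- data.frame(
  data_source = "jhu-csse",
  signal = rep(c("confirmed_7dav_incidence_prop",
                 "deaths_7dav_incidence_prop"), each = 2),
  geo_value = "tx",
  time_value = rep(as.Date(c("2020-06-01", "2020-06-02")), times = 2),
  dt = rep(0, 4),
  value = c(3.393256, 3.644320, 0.0856342, 0.0953654)
)

# Wider: build each column name from dt, source, and signal, then spread.
long$name <- sprintf("value%+d:%s_%s", as.integer(long$dt),
                     long$data_source, long$signal)
wide <- pivot_wider(long, id_cols = c(geo_value, time_value),
                    names_from = name, values_from = value)

# Longer: split the column names back into dt, data_source, and signal.
back <- pivot_longer(wide, cols = starts_with("value"),
                     names_to = c("dt", "data_source", "signal"),
                     names_pattern = "^value([+-][0-9]+):([^_]+)_(.*)$",
                     values_to = "value")
back$dt <- as.integer(back$dt)  # "+0" -> 0
```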

## A sanity check

Lastly, here's a small sanity check: lagging cases by 7 days with
`aggregate_signals()` and then correlating them with deaths using
`covidcast_cor()` yields the same result as letting `covidcast_cor()` do the
time-shifting itself through its `dt_x` argument:


``` r
df_cor1 <- covidcast_cor(x = aggregate_signals(signals[[1]], dt = -7,
                                              format = "long"),
                        y = signals[[2]])

df_cor2 <- covidcast_cor(x = signals[[1]], y = signals[[2]], dt_x = -7)
identical(df_cor1, df_cor2)
```

```
[1] TRUE
```
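Why the two routes agree can be seen on toy data in base R. This sketch does
not reproduce `covidcast_cor()`; as a simplifying assumption it uses a Pearson
correlation over time for a single location with incomplete pairs dropped. In
that setting, lagging cases by 7 days and correlating on matching dates uses
exactly the same pairs as correlating deaths on day t with cases on day t - 7
directly.

```r
# Toy daily series for a single location (made-up numbers, not real data).
days <- seq(as.Date("2020-06-01"), by = "day", length.out = 20)
cases <- (1:20)^1.5
deaths <- sqrt(1:20) + sin(1:20)

# Route 1: shift first (dt = -7, so day t shows cases from day t - 7),
# then correlate on matching dates, dropping the NA pairs this creates.
lagged <- cases[match(days - 7, days)]
r1 <- cor(lagged, deaths, use = "complete.obs")

# Route 2: pair deaths on days 8..20 with cases on days 1..13 directly.
r2 <- cor(cases[1:13], deaths[8:20])

all.equal(r1, r2)  # TRUE
```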