# Drought assessment has been outpaced by climate change: empirical arguments for a paradigm shift

### Study area and time period of interest

Our study area was defined as the contiguous US, which features a diverse range of climatic conditions and has produced a wealth of long-term meteorological datasets. For this study we focus our analysis and results on the summer time period (June 1–August 31); however, data was considered back to March 4th for certain timescales and dates (e.g. 90 days prior to June 1st, discussed below). We chose this time period for two primary reasons: (1) drought conditions are common and often strongly impactful for agricultural systems during the summer months; and (2) to maximize the number of long-term datasets available for analysis. Drought metric computations are highly sensitive to missing data, especially when computation of drought metrics is based on the summation of data over a given timescale (for example 30 days) as is the case for Standardized Precipitation Index (SPI). Accordingly, we applied very restrictive constraints on whether data from a specific station or year was included in our analysis (described more below).

### Global Historical Climatology Network

We used the Global Historical Climatology Network (GHCN)^{37} as a primary dataset for our analysis. We processed the GHCN data in a stepwise process to ensure quality and completeness of data, while acknowledging that computations based on different timescales (here, 30, 60, and 90 days) require different amounts of complete data. To begin, we filtered the GHCN data for stations that started reporting in 1950 or earlier and were currently reporting as of 2020. This accomplishes two initial constraints, 1. The data record has the potential to have 70 years or more of complete data (considered the minimum number of years to compute the “period-of-record” climatology) and 2. The data from a station is considered “operational” in 2020 and therefore may be used in present-day drought monitoring. Next, we filtered the data for data completeness for different timescales. For example, a 30-day timescale requires complete data for 30 days prior to any day of interest. Here, our period of interest is from June 1st to August 31, therefore to compute a drought metric for June 1st, one will need a complete data record back to May 3rd (30 days including June 1st) for each year included in the analysis. Thus, for a 30-day timescale, completeness of record was evaluated from May 1st to August 31st. Alternatively, for a 90 day timescale, completeness is required to March 4th, thus, completeness of record was evaluated from March 1st to August 31st. Therefore, differences in timescale cause differences in the number of stations ultimately evaluated for additional analysis. All analysis was conducted in the R programming environment^{38}. We accessed GHCN data using the RNOAA package^{39}.

### Statistical analysis

#### Moving window operations

Moving window operations are at the core of the analysis presented in this study. Each computation is conducted without the use of data from the future with respect to the time period of interest. For example, any operation conducted in 2015 would only consider data in 2015 and prior. This is important in order to emulate the analysis that would have been conducted in 2015 (conventional drought monitoring does not use future projections in analysis). Although, it is important to acknowledge that it is common practice to retrospectively compute drought metrics (or anomalies) based on a reference period, that includes data that is in the future from a period of interest.

#### Gamma probability distribution estimation

Fitting probabilistic distributions to empirical data is one of the most important steps in computing drought metrics. Estimated distributions contextualize conditions with respect to expected outcomes in order to normalize data into anomalies. In this study, we fit a gamma probability distribution to an aggregated (summed) precipitation time series. To do so, raw daily precipitation time series were first aggregated over three timescales, 30, 60 and 90 days, to compute the annual summed precipitation time series for a day of interest (for example, June 1st). Once an aggregated precipitation time series was computed for a given site and day, we computed the unbiased sample probability-weighted moments of the aggregated precipitation data, which were then converted to L-moments (linear combination of traditional moments). Estimates of the gamma distribution parameters (rate and shape) were then computed given the L-moments of the data. This procedure was conducted using the lmomco package^{40}.

#### Standardized precipitation index computation

The Standardized Precipitation (SPI) was designed by McKee and others^{15} to standardize precipitation time series across a record in order to normalize precipitation anomalies in both time and space while still accounting for non-normal distributions. To calculate the SPI, we computed the cumulative distribution function (CDF) of the aggregated precipitation time series using the associated gamma distribution fit (described above). The CDF values were then evaluated within an inverse Gaussian function with a mean of zero and a standard deviation of one to obtain the final SPI value. This “normalization” of the data centers CDF values of 0.5 (average timescale summed precipitation) about an SPI value of zero. Wet[dry] conditions have CDF values > [<] 0.5 and SPI values > [<] 0.

#### Drought metric bias

To compute the average difference (bias) between climatologies of differing lengths, we computed the median difference between daily summer time (June 1st—August 31st) SPI values for a long and short period of record. Our two time periods of record were: 1. The “period-of-record’ climatology which is composed of at least 70 years of complete data and 2. The most current 30-year record (“contemporary” record) which was composed of at least 25 years of complete data within the 30-year moving window (i.e. for 2020, 1991–2020, or for 2019, 1990–2019, and so on). The final bias was then computed as the median daily period-of-record SPI value minus the contemporary SPI value. Negative [positive] values represent locations where the period-of-record SPI value is on average lesser [greater] then the contemporary SPI value (dry [wet] bias). We computed the average bias for all observations in the study (Fig. S2). To evaluate if bias varied as a function of wetness/dryness state, we also computed the bias for breaks in the period-of-record SPI timeseries. Breaks were defined as period-of-record SPI > 2, > 2 SPI > 1, 1 > SPI > −1, −1 > SPI > −2 and −2 > SPI (from wet to dry, respectively, Fig. 5, S3, S4, S5, S6).

To aid in visually evaluating spatial patterns in drought metric bias we computed krigged maps of bias for all observations together and for each of the wetness/dryness categories separately. To generate these krigged maps we fit a variogram to each of the datasets using the automap^{41} and gstat^{42} packages in the R programming environment. Once a variogram was fit to the data we predicted the krigged surface of bias using the gstat package across a 1/3° x 1/3° raster in the WGS84 (EPSG 4326) coordinate projection system.

#### Monte Carlo analysis

In order to evaluate the absolute SPI error associated with parameterizing the gamma probability distribution with differing climatology lengths we ran three separate Monte Carlo simulations. The three simulations were meant to capture a range of scenarios under which SPI may be computed. The three simulations were: (1) simulation of a single stationary distribution with known gamma distribution parameters, (2) simulation of many stationary distributions with known gamma distribution parameters (parameter pairs sampled from the observed distribution of gamma parameters, Fig. S1), and (3) Simulation of a non-stationary distribution with known gamma distribution parameters that vary through time (parameter pair time series derived from GHCN site data, for example, Fig. 2). For all simulations, CDF and SPI error was assessed based on the most contemporary observation (“today’s” value). This methodology most closely mimics real-time drought monitoring processes.

#### Stationary distribution Monte Carlo analysis (single parameter pair)

To evaluate how the absolute SPI error varied as a function of the number of observations (years) in each climatology, we conducted an iterative experiment (Monte Carlo simulation). We defined a rate and shape parameter pair from which we generated random samples. The random samples come from a known distribution, thus we computed the true CDF and SPI values associated with that random sample. Probabilistic CDF and SPI values were computed based on fitting a gamma distribution to the randomly generated data of differing lengths, from 1 to 100 samples in 1 sample increments. The associated gamma distribution parameters were also stored to evaluate the convergence of estimated parameters towards the known gamma distribution parameters for differing climatology lengths. The absolute SPI and CDF error were computed by subtracting the probabilistic value from the true value for the most current observation (synonymous with the 2020 value used in operational drought monitoring). We repeated this process 1000 times, generating new data for each simulation. Finally, we summarized the results of the 1000 simulations by computing the median and interquartile range (IQR) of the gamma distribution parameters as well as the absolute CDF and SPI error for each climatology length (Fig. 3a).

#### Stationary distribution Monte Carlo analysis (multiple parameter pairs)

Following the method described above, but focusing on the absolute SPI error, we replicated the Monte Carlo simulation using 100 randomly sampled parameter pairs based on the full parameter space of observed parameters captured in this study (Fig. S1, white scatter points; June 1–August 31; 1991–2020, 30-, 60- and 90-day timescales, parameter space *n* = 4,907,001). This Monte Carlo simulation was meant to evaluate if the error estimation described above is dependent on the gamma distribution parameter pair evaluated. For example, it is possible that distributions with greater variability may require more samples to adequately fit the probability distribution when compared to a distribution with lesser variability. Thus, we replicated the analysis above with 1000 simulations per parameter pair. In the same manner as above, we computed the median and IQR absolute SPI error for each individual parameter pair and for all simulations and parameters together (Fig. 3b).

#### Non-stationary distribution Monte Carlo analysis

To evaluate the effect of a non-stationary distribution on the absolute SPI error, we adapted the method described above using parameter pairs that are dynamic in time. Therefore, for each of the climatology lengths (1−100 samples), random data was generated from a new distribution. The distributional parameter pairs used in this simulation were derived from the 30-year moving window analysis of observed data at 11 GHCN sites (Fig. 4). These 11 sites represent locations with 100 years or more of complete precipitation records (as defined above) for a given timescale. Therefore, the annual changes in parameter values in the simulation are representative of true, observed distributional shifts (for example, Fig. 2; Fig S7; Fig S8). In order to simulate annual shifts in the gamma distribution, we infilled any missing rate and shape parameters using a spline function (shown in Fig. 2). However this was only done if there were missing values (<2% of data), thus we used the real 30-year moving window parameter pairs whenever possible.

First, we used the gamma distribution pairs from GHCN site USC00381770, located at Clemson University, South Carolina (Fig. 2). For each simulation, we generated a random sample from the time-specific gamma distribution for each of the climatology length values (1–100). Probabilistic SPI values were computed by fitting a gamma distribution to the randomly generated samples from each of the generative distributions. Therefore, at a climatology length of 30 observations (years), the probabilistic value was computed by fitting a gamma distribution to the randomly generated 30 observations from the 30 generative distributions (replicating an analysis from 1991 to 2020). Similarly, for a climatology length of 90 observations, the probabilistic gamma distribution was fit to the data from the 90 previous generative distributions (effectively 1931 to 2020). The absolute SPI error for the most recent observation (effectively 2020) was computed by subtracting the probabilistic value from the true value computed using the most current gamma distribution (2020). This analysis was conducted for climatology lengths from 1 to 100 and the simulation was conducted 1000 times. These results were summarized by computing the median and IQR of the absolute SPI error (Fig. 4a). We replicated this analysis for 10 additional GHCN sites and for the 30-, 60- and 90-day timescales (Fig. 4b).