13 Prediction & ML

Code
library(rethinkpriorities)
library(here)  # provides here(); loading it avoids the 'could not find function "here"' error
source(here("code", "methods_setup.R"))

13.1 Consult modeling vignette(s)

See Vignette: Modeling donations in EA Survey data with Tidymodels and workflow

13.2 Prediction models for insights?

We focus on predicting the individual’s donation in a year, focusing on the same set of outcomes used in the previous section. For this model to be useful for an actual prediction problem going forward, it would need to rely on ‘ex-ante’ characteristics that were already observable at the time of a career/EtG/pledge decision.1 These might include immutable demographics, career plans, and pledges previously taken, as well as year and trend effects.

Although we have these models in mind, this is not what we are doing here. We are not posing a specific ‘prediction problem’ per se. Instead, we are using machine learning tools built for prediction problems to generate ‘data-driven insights’ about factors related to EA donation behavior. Here, we do not directly specify all of the included components of the model (features, interaction terms, etc.). Instead, we provide a large set of possible ‘inputs’ and use ML techniques to train models that should predict well outside of the data they are trained on. These models should do a good job of accomplishing the task: ‘if you gave me a set of features of an EA, I should have a fairly accurate guess at what they will donate.’
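
As a minimal sketch of the ‘predict well outside the training data’ idea, using the tidymodels packages referenced in the vignette; the data frame `ea_survey` and outcome `donation_usd` here are hypothetical placeholders, not our exact setup:

Code
library(tidymodels)

set.seed(2023)
# Hold out a random 'testing' set; models are tuned and fit on the training split only
split <- initial_split(ea_survey, prop = 0.8)
train_df <- training(split)
test_df  <- testing(split)

# Fit on the training data, then judge predictions on the set-aside testing data
fit <- linear_reg() %>%
  fit(donation_usd ~ ., data = train_df)
predict(fit, new_data = test_df)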

The insights from these models should also be treated with caution. Again, they may not reflect causal relationships. Furthermore, the parameters derived from model-fitting ML procedures are not in general unbiased or consistent, and it is difficult to derive proper confidence intervals for them.

Still, the benefit of this exercise may be considered ‘the derivation of robust and predictive relationships in the data that are mainly driven by the data itself, rather than our preconceived ideas.’ These models may also be useful building blocks towards future predictive work.

13.3 Penalized regression models

In brief, the elastic net models involve linear models (log-linear in our case), i.e., ‘regressions’, that carefully ‘penalize’ the magnitude of coefficients, in effect shrinking these towards zero. The penalties are specifically ‘tuned’ and ‘validated’ to maximize the predictive power of the model. As these are essentially regression approaches, we can report the sign and magnitude of the coefficients used in the ‘optimally tuned’ predictive model. (However, we should be careful about interpreting these parameters, and statistical inference is challenging. See, e.g., Mullainathan and Spiess (2017) for a detailed discussion.)

The glmnet approach combines ‘ridge (L2 norm)’ and ‘lasso (L1 norm)’ penalties, tuning the mix between the two as well as the strength of the overall penalization.

Both the sum of absolute-value coefficients (L1 norm) and the sum of squared coefficients (L2 norm) may be penalized. Thus, coefficients will be ‘shrunk towards zero’ relative to a comparable OLS (standard linear model) estimate. How much do we ‘charge’ for the absolute or squared sums? This is ‘tuned’, i.e., we see which combinations perform best in the cross-validation exercise.
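
As a rough sketch, in glmnet’s standard parameterization (Gaussian case), with overall penalty \(\lambda\) and mixing parameter \(\alpha\), the fitted coefficients minimize

\[
\frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2 \;+\; \lambda\left[\alpha \sum_j \lvert\beta_j\rvert + \frac{1-\alpha}{2}\sum_j \beta_j^2\right],
\]

where \(\alpha = 1\) gives the pure lasso (L1 only), \(\alpha = 0\) gives pure ridge (L2 only), and both \(\alpha\) and \(\lambda\) are chosen to perform well in cross-validation.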

In some cases (because of the L1 norm), coefficients may be dropped entirely, i.e., ‘shrunk all the way to zero’.2
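
A minimal sketch (not our exact specification) of tuning such a model with tidymodels and the glmnet engine might look as follows; `train_df` and `donation_usd` are hypothetical placeholders from the sketch above:

Code
library(tidymodels)

# Elastic net: penalty (lambda) and mixture (alpha, the ridge/lasso mix) are both tuned
enet_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

folds <- vfold_cv(train_df, v = 5)

enet_wf <- workflow() %>%
  add_model(enet_spec) %>%
  add_formula(donation_usd ~ .)

# Try a grid of penalty/mixture values; keep the combination that predicts best out-of-fold
enet_tuned <- tune_grid(enet_wf, resamples = folds, grid = 20)
select_best(enet_tuned, metric = "rmse")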

13.4 Metrics of fit

We may want to consider ‘how successful’ our predictive models are at making practically useful predictions. In other words, ‘how far off’, on average, are the predictions and classifications from the actual outcomes? This procedure considers the fit on a randomly-drawn, set-aside ‘testing’ set: data that has not been used in ‘training’ (or ‘fitting’) the model.

In the vignette here we consider and discuss the root-mean-square-error (RMSE) and mean-absolute-error (MAE) metrics in the context of predicting donation outcomes.

In order to assess the usefulness of each predictive regression model we consider both root-mean-square-error (RMSE) and mean-absolute-error (MAE). RMSE (aka RMSD) can be interpreted as the average ‘Euclidean distance’ between the actual values and the model’s prediction. For each observation (in the set-aside ‘testing sample’), to construct RMSE we:

  1. Measure the differences between the actual and predicted outcome (e.g., donation)
  2. Square these differences
  3. Take the average of these squared differences across all observations
  4. Take the square root of this
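
A minimal sketch of these steps in R, with hypothetical vectors `actual` and `predicted` for the testing sample:

Code
# RMSE: square the errors, average them, take the square root
rmse_by_hand <- sqrt(mean((actual - predicted)^2))

# Equivalent, using the yardstick package (part of tidymodels)
yardstick::rmse_vec(truth = actual, estimate = predicted)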

To construct mean-absolute-error (MAE) we simply:

  1. Measure the absolute-value differences between the actual and predicted outcome (e.g., donation) for each observation
  2. Take the average of these across all observations
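
And the corresponding sketch for MAE, with the same hypothetical `actual` and `predicted` vectors:

Code
# MAE: average of the absolute errors
mae_by_hand <- mean(abs(actual - predicted))

# Equivalent, using yardstick
yardstick::mae_vec(truth = actual, estimate = predicted)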

MAE has a much more straightforward interpretation: it simply asks ‘how far off are we, on average?’

While the RMSE is used in the model fitting for various reasons, it is arguably less interpretable and less relevant than MAE in judging the model’s fit in cases like this one. RMSE assesses the model fit based on squared deviations, and is thus very sensitive to ‘large mistakes’. This may be relevant where ‘large errors are much, much worse than small ones’ – here, this is not so clearly the case. In the presence of data with large outlying observations, prediction will tend to look poor by this measure.

Note that when considering models where the outcome is transformed (e.g., log(donations)) we construct the RMSE and MAE by exponentiating to generate predictions for the level outcomes, and then measure the deviations on the level scale.3
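
A minimal sketch of this back-transformation, with a hypothetical vector of log-scale predictions `predicted_log_donation` and actual donations `actual_donation`:

Code
# Predictions were made on the log scale; exponentiate to get level-scale predictions
predicted_donation <- exp(predicted_log_donation)

# Measure fit on the level (dollar) scale
yardstick::rmse_vec(truth = actual_donation, estimate = predicted_donation)
yardstick::mae_vec(truth = actual_donation, estimate = predicted_donation)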

13.5 Resources and use-cases


  1. An alternate project might try to predict total future EA donations in subsequent years and decades. This could embody both a prediction problem for individuals and uncertainties at the aggregate level. This is even further from what we are doing here, but seems worthwhile for future work, combining the EA survey with other measures and assessments.↩︎

  2. Note that the Lasso is often chosen because it sometimes drops coefficients, making the model more ‘parsimonious’, thus perhaps easier to describe and use. But the Lasso is not specifically designed as a ‘variable selection procedure’. Running Lasso and then naively using the ‘variables not shrunk to zero’ in further standard linear models or other procedures may not be optimal. But see the ‘lazy lasso’ approach.↩︎

  3. When considering predicted outcomes on the logarithmic scale, both RMSE and MAE indicate roughly ‘how many exponential orders of magnitude our predictions for the non-logged outcomes are off’. E.g., an error of 1.5 for ‘log donation’ suggests we are off by about \(\exp(1.5) \approx 4.48\), i.e., by a factor of nearly 5, in terms of donations. The conversion to the level scale described above avoids such complications.↩︎