19 Other suggested sections
19.1 ‘Conjoint analysis’
See end of this Slack thread
19.2 (Open and robust science: RP attitudes, discussions, resources)
Integrate from:
Code to do robustness checks as a ‘specification chart’, and sensitivity analysis
Source: https://pbs.twimg.com/media/ESyDHGjUYAYHfva?format=jpg&name=medium
19.3 (Meta-analysis)
Incorporate and consolidate from Reinstein’s meta notes and more
Related: ‘aggregating expert judgment’
Testing out the aggreCAT package on the ACX prediction contest
Install devtools (needed for the GitHub install below)
install.packages("devtools")
Installing devtools [2.4.5] ...
OK [linked cache]
Install aggreCAT
#install.packages("sessioninfo", dependencies = TRUE)
devtools::install_github("metamelb-repliCATS/aggreCAT")
#Note: this failed initially because dependent packages needed updating. `update.packages()` may solve it. Nope, it didn't; that update threw all sorts of errors.
library(aggreCAT)
Input the ACX prediction data
acx <- read_csv(here("sample_data", "acx_2023blindmode_predictions.csv"))
#Improve: source from a hosted file or API
Identify my own predictions
acx %<>% mutate(
  d_reinstein = ifelse(
    `@1.WillVladimirPutinbePresidentofRussia` == 82 & `@2.WillUkrainecontrolthecityofSevastopol` == 54,
    TRUE, FALSE)
)
What can aggreCAT do for us?
First, let's test this package with its built-in data.
Below we demonstrate the simplest and most commonly implemented aggregation method, ArMean, which takes the arithmetic mean of participant Best Estimates. We first use a small subset of 5 participants for a single claim (claim 28).
Code
data(data_ratings)
set.seed(1234)

participant_subset <- data_ratings %>% # just 5 users with names
  distinct(user_name) %>%
  sample_n(5) %>%
  mutate(participant_name = paste("participant", rep(1:n()))) # numbers rather than id codenames

single_claim <- data_ratings %>%
  filter(paper_id == "28") %>% # all data on a single claim
  right_join(participant_subset, by = "user_name") # join to the above 5 users (keeping only these 5)

# note this stat only uses the rows with element == `three_point_best`
single_claim %>%
  filter(element == "three_point_best") %>%
  AverageWAgg(expert_judgements = ., type = "ArMean")
method | paper_id | cs | n_experts |
---|---|---|---|
ArMean | 28 | 70.8 | 5 |
Can we do the same across multiple claims with this package?
Code
all_claims <- data_ratings %>%
  right_join(participant_subset, by = "user_name") # join to the above 5 users (keeping only these 5)

all_claims %>%
  filter(element == "three_point_best") %>%
  AverageWAgg(expert_judgements = ., type = "ArMean")
method | paper_id | cs | n_experts |
---|---|---|---|
ArMean | 100 | 75 | 5 |
ArMean | 102 | 34 | 5 |
ArMean | 103 | 59.4 | 5 |
ArMean | 104 | 53.4 | 5 |
ArMean | 106 | 45 | 5 |
ArMean | 108 | 74 | 5 |
ArMean | 109 | 68 | 5 |
ArMean | 116 | 61 | 5 |
ArMean | 118 | 60.8 | 5 |
ArMean | 133 | 58.6 | 5 |
ArMean | 137 | 61 | 5 |
ArMean | 138 | 68.6 | 5 |
ArMean | 145 | 67 | 5 |
ArMean | 168 | 35 | 5 |
ArMean | 176 | 46 | 5 |
ArMean | 186 | 57.6 | 5 |
ArMean | 20 | 70 | 5 |
ArMean | 203 | 41 | 5 |
ArMean | 21 | 60 | 5 |
ArMean | 215 | 45 | 5 |
ArMean | 24 | 22.2 | 5 |
ArMean | 26 | 63 | 5 |
ArMean | 28 | 70.8 | 5 |
ArMean | 38 | 33.4 | 5 |
ArMean | 79 | 31 | 5 |
This seems to work, at least with the data structured as they have it. Now what if we want multiple aggregation methods?
Code
method | n | percent |
---|---|---|
ArMean | 25 | 0.333 |
IntWAgg | 25 | 0.333 |
ShiftWAgg | 25 | 0.333 |
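The code behind that fold isn’t reproduced here. As a minimal sketch, a table like this could be produced by mapping a few of aggreCAT’s wrapper functions (with their default `type` settings) over the built-in `data_ratings` and counting the resulting `method` labels:

library(dplyr)
library(purrr)
library(aggreCAT)

# run three aggregators over the built-in ratings data and tabulate
# how many claim-level aggregates each method produces
list(AverageWAgg, IntervalWAgg, ShiftingWAgg) %>%
  map_dfr(~ .x(expert_judgements = data_ratings)) %>%
  count(method) %>%
  mutate(percent = n / sum(n))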
Now let’s see what can be done with our ACX dataset. As ACX doesn’t ask for CIs (and doesn’t have multiple rounds of expert revisions), aggreCAT may be of limited use here.^[I’d have to estimate these bounds arbitrarily (e.g., make them larger for people with less experience).] Only a subset of their measures may be informative. First I’ll check which ones might work, using their own data for all the aggregations (with baseline settings for each), but keeping only the ‘best prediction’ element for each.
Code
all_aggsOK <- list(AverageWAgg, IntervalWAgg, ShiftingWAgg, ExtremisationWAgg)
all_aggs <- list(LinearWAgg, AverageWAgg, BayesianWAgg, IntervalWAgg, ShiftingWAgg,
                 ReasoningWAgg, DistributionWAgg, ExtremisationWAgg)
three_aggs <- list(LinearWAgg, AverageWAgg, BayesianWAgg)

data_ratings %>%
  filter(element == "three_point_best") %>%
  purrr::map_dfr(.x = all_aggsOK,
                 .f = ~ .x(data_ratings)) %>% # NB: the lambda uses the full `data_ratings`, so the filter above never reaches the aggregators
  tabsums(cs, method)
method | nonmissing | mean | sd |
---|---|---|---|
ArMean | 25 | 51.7 | 17.3 |
BetaArMean | 25 | 1 | 0 |
IntWAgg | 25 | 48.6 | 23.4 |
ShiftWAgg | 25 | 50.2 | 17.1 |
All of the aggregators seemed computable even without elements other than ‘three_point_best’. What’s going on here? E.g., IntervalWAgg says ‘the weights are dependent on the lower and upper bounds of three-point elicitation’… but I left those out. ShiftingWAgg is meant to depend on discussion, but here I have no indicators of that. Yet these all seem to come up with different means. (A likely explanation: the lambda `~ .x(data_ratings)` above hands the full, unfiltered `data_ratings` to each aggregator, so the `three_point_best` filter never actually reaches them.) BetaArMean, which is meant to do some extremisation, works rather strangely, shifting everything to 1; perhaps its beta transform expects probabilities on [0, 1] while these best estimates are on a 0–100 scale.
Focusing in on some key interesting methods… I’ve been told geometric means are good in these sorts of situations.
Code
data_ratings %>%
  filter(element == "three_point_best") %>%
  AverageWAgg(expert_judgements = ., type = "GeoMean")
method | paper_id | cs | n_experts |
---|---|---|---|
GeoMean | 100 | 67.3 | 25 |
GeoMean | 102 | 26.4 | 25 |
GeoMean | 103 | 60.2 | 25 |
GeoMean | 104 | 40.4 | 25 |
GeoMean | 106 | 28.8 | 25 |
GeoMean | 108 | 71.2 | 25 |
GeoMean | 109 | 71.3 | 25 |
GeoMean | 116 | 60 | 25 |
GeoMean | 118 | 52 | 25 |
GeoMean | 133 | 56.2 | 25 |
GeoMean | 137 | 60.5 | 25 |
GeoMean | 138 | 74.8 | 25 |
GeoMean | 145 | 61.3 | 25 |
GeoMean | 168 | 23.7 | 25 |
GeoMean | 176 | 21.7 | 25 |
GeoMean | 186 | 48.3 | 25 |
GeoMean | 20 | 67.9 | 25 |
GeoMean | 203 | 42.7 | 25 |
GeoMean | 21 | 52.1 | 25 |
GeoMean | 215 | 35.8 | 25 |
GeoMean | 24 | 15.7 | 25 |
GeoMean | 26 | 62.5 | 25 |
GeoMean | 28 | 61.2 | 25 |
GeoMean | 38 | 23.7 | 25 |
GeoMean | 79 | 23.4 | 25 |
LinearWAgg seems interesting, either supplying some sort of weights by Participant (e.g., downweighting those who haven’t done forecasting), or with the supplied GranWAgg option, “granularity of best estimates”.
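Here is a minimal sketch of the Participant-weights route on the built-in data. It assumes, as in my usage further below, that `weights` takes a two-column tibble of `user_name` and `weight`; the `toy_weights` here are made up purely for illustration.

# hypothetical weights: double the weight of three arbitrary raters, weight 1 otherwise
toy_weights <- data_ratings %>%
  distinct(user_name) %>%
  mutate(weight = ifelse(row_number() <= 3, 2, 1))

data_ratings %>%
  filter(element == "three_point_best") %>%
  LinearWAgg(expert_judgements = .,
             type = "Participant",
             weights = toy_weights)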
Let’s move on to trying these on the ACX data. First I adjust the data to make it more compatible with the aggreCAT package.
Upweighting and downweighting people – ad-hoc.
Now, I have little time left, so I’ll do something lame and ad-hoc: everyone starts out with weight 1, then
- Anyone with no forecasting experience is downweighted fivefold (×0.2)
- Superforecasters are upweighted ×20
- SAT math score uprated if above average (unless already a superforecaster)
- PhD holders upweighted ×1.5
- Downweight ×0.1 if you put a 99+ or below-5% chance of Putin being in power
- Downweight by quantile of the share of ‘exactly 50%’ predictions (up to about 83% downweighting, given the 1/1.2 factor below)
Also, replace all ‘50%’ predictions with the question’s arithmetic average (done later).
Code
# get column names starting with "@" (the prediction questions)
pred_cols <- colnames(acx)[startsWith(colnames(acx), "@")]

acx_for_ag_wts <- acx
# subset data based on column names and calculate row sums: share of a respondent's predictions that are exactly 50 (there are 50 questions)
acx_for_ag_wts$share50 <- rowSums(acx[pred_cols] == 50, na.rm = TRUE) / 50

# build the ad-hoc weights described above
acx_for_ag_wts %<>%
  mutate(
    user_name = row_number()
  ) %>%
  mutate(
    SATscoremath = case_when(
      (SATscoremath > 800 | SATscoremath < 100) ~ NA_real_, # drop implausible SAT math scores
      TRUE ~ SATscoremath),
    SAT_math_rank = percent_rank(SATscoremath),
    SAT_math_rank = case_when(
      is.na(SAT_math_rank) ~ 0.5, # impute the middle rank where the score is missing
      TRUE ~ SAT_math_rank),
    share50_qtl = percent_rank(share50)
  ) %>%
  mutate(across(c(ForecastingExperience, Superforecaster, Degree), ~ replace(., is.na(.), "No"))) %>%
  mutate(
    weight =
      (ifelse(ForecastingExperience == "No", 0.2, 1)) *
      (ifelse(Superforecaster == "Yes", 20, 1)) *
      (1 + ((SAT_math_rank > 0.5) * (Superforecaster != "Yes") * (SAT_math_rank - 0.5)) * 4) *
      (1 + (Degree == "Ph D.") * 0.5) *
      (ifelse(`@1.WillVladimirPutinbePresidentofRussia` >= 99, 0.1, 1)) *
      (ifelse(`@1.WillVladimirPutinbePresidentofRussia` < 5, 0.1, 1)) *
      (1 - (share50_qtl) / 1.2)
  )
# the code above is kind of crappy; better to convert it to logs to allow simple addition?

acx_for_ag_wts_only <- acx_for_ag_wts %>%
  select(user_name, weight)
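A minimal sketch of the ‘convert it to logs’ idea from the comment above (a hypothetical refactor showing only a few of the factors; the full version would add one log term per adjustment):

acx_for_ag_wts %>%
  mutate(
    # each adjustment becomes an additive log-multiplier
    log_wt = log(ifelse(ForecastingExperience == "No", 0.2, 1)) +
      log(ifelse(Superforecaster == "Yes", 20, 1)) +
      log(1 + (Degree == "Ph D.") * 0.5),
    weight_alt = exp(log_wt)  # exponentiating recovers the multiplicative weight
  )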
Code
acx_for_ag_wts_long <- acx_for_ag_wts %>%
  select(where(~ n_distinct(.) > 1)) %>% # delete columns that are identical everywhere
  pivot_longer( # each 'question' needs its own row for each 'user'
    contains("@"),
    names_to = "paper_id",
    values_to = "value"
  )

acx_for_ag_wts_long %<>%
  select(user_name, paper_id, value, everything()) %>% # make all predictions (for an individual) into a row
  mutate(element = "three_point_best",
         round = "round_2") %>% # aggreCAT needs these columns
  group_by(paper_id) %>%
  mutate(
    value = case_when(
      is.na(value) ~ mean(value, na.rm = TRUE), # Sadly the package can't deal with NAs, so I cheaply set them to the arithmetic average
      value == 50 ~ mean(value, na.rm = TRUE),  # replace 'exactly 50' predictions with the question average (predictions are on a 0-100 scale)
      TRUE ~ value
    )
  ) %>%
  ungroup()
Let’s see if it works
Code
method | paper_id | cs | n_experts |
---|---|---|---|
GeoMean | @1.WillVladimirPutinbePresidentofRussia | 99 | 154 |
Code
method | paper_id | cs | n_experts |
---|---|---|---|
GeoMean | @1.WillVladimirPutinbePresidentofRussia | Inf | 155 |
Strangely, this only seems to work here with fewer than 155 rows!
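A likely explanation, assuming the geometric mean is being computed as `prod(x)^(1/n)`: the running product of ~155 predictions near 99 exceeds the largest representable double (about 1.8e308) and overflows to `Inf`, whereas the same quantity computed in log space is fine. A minimal demonstration:

prod(rep(99, 155))^(1 / 155)   # Inf: the product itself overflows
exp(mean(log(rep(99, 155))))   # 99: identical quantity computed in log space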
Now to do some funny calculations and blend them with my own predictions
acx_for_ag_wts_results: the weighted arithmetic means
Code
acx_for_ag_wts_results <- acx_for_ag_wts_long %>%
  LinearWAgg(expert_judgements = .,
             type = "Participant",
             weights = acx_for_ag_wts_only) %>%
  mutate(agg = "linear_wt",
         value = cs)
Now (weighted?) geometric means
Code
# weighted geometric mean, first written 'directly'...
weighted.geomean <- function(x, w, ...) {
  return(prod(x^w, ...)^(1 / sum(w, na.rm = TRUE)))
}
# ...then overwritten with a log-space version, which avoids overflowing the product
weighted.geomean <- function(x, w, ...) exp(weighted.mean(log(x), w, ...))

# total weight of the upweighted respondents (not used below; the per-question sum is recomputed later)
sumwt <- sum(acx_for_ag_wts_long$weight[acx_for_ag_wts_long$weight > 1], na.rm = TRUE)

# plain (unweighted) geometric mean; also not used below
reg.geomean <- function(x, ...) {
  return(prod(x, ...)^(1 / length(x)))
}

acx_for_ag_geom_wt <- acx_for_ag_wts_long %>%
  filter(!is.na(weight) & !is.na(value)) %>%
  group_by(paper_id) %>%
  mutate(
    sumwt = sum(weight, na.rm = TRUE)
  ) %>%
  summarize(
    weighted_geomean = weighted.geomean(value, weight)
  ) %>%
  mutate(agg = "geom_wt",
         value = weighted_geomean)
OK, it looks like I forgot about the prediction market data on Metaculus HERE on the same questions.
I’ll make a tibble of the weighted linear mean, weighted geometric mean, Metaculus data, and my own predictions.
Code
# p_load(DataEditR)
# metaculus_data <- acx_for_ag_wts_results %>%
#   select(paper_id) %>%
#   mutate(value = 50)
# data_edit(metaculus_data)

metaculus_data <- read_csv(here("sample_data", "metaculus.csv")) %>%
  mutate(agg = "metaculus")

# and me
reinstein_pred <- acx %>%
  filter(d_reinstein) %>%
  select(starts_with("@")) %>%
  pivot_longer(cols = everything()) %>%
  rename(paper_id = name) %>%
  mutate(agg = 'reinstein_pred')

aggregates <- bind_rows(reinstein_pred, metaculus_data, acx_for_ag_geom_wt, acx_for_ag_wts_results) %>%
  select(-method, -cs, -n_experts, -weighted_geomean) %>%
  pivot_wider(names_from = agg, id_cols = paper_id, values_from = value) %>%
  mutate(guided_weight = 0.2 * reinstein_pred + 0.4 * metaculus + 0.2 * linear_wt + 0.2 * geom_wt,
         final_impulsive_choice = guided_weight)

write_csv(aggregates, here("sample_data", "aggregates.csv"))
# data_edit(aggregates)
Finally, I fixed some things at the end, cleaned it up, and did some ad-hoc extremizing around a few correlated events.
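For reference, one common recipe for that kind of extremizing (illustrative only, not necessarily the adjustment I actually made) is to raise the odds to a power k > 1:

# illustrative extremizer: push a probability (on the 0-100 scale used here) away from 50 by powering its odds
extremize <- function(p, k = 1.5) {
  odds <- (p / 100) / (1 - p / 100)
  100 * odds^k / (1 + odds^k)
}
extremize(70)  # about 78 with k = 1.5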