Surveys/experiment design, analysis; methodology, protocols, templates

Author

David Reinstein (most content at the moment), Willem Sleegers, Jamey Elsey

1 Introduction/overview

Code
# Allow duplicate knitr chunk labels when chapters are combined
options(knitr.duplicate.label = "allow")

A set of curated resources tied to use-cases, and a space to organize our discussion

NOTE: this is now publicly hosted but not indexed. Please be careful not to share any confidential data or sensitive information.

1.1 The purpose of this resource

A GitHub-hosted ‘Quarto’ book for structured, in-depth discussion

‘Why do I have a sense this is useful?’

Case against this: when a concept comes up that one doesn’t understand, we all have our own resources we can go to, and our tutorials could feel like ‘reinventing the wheel’.

So, why?

  • our situations are sometimes particular

  • explaining things in our own words is helpful

  • it keeps track of what we have done

  • it helps us build a protocol and have a ‘go-to procedure’

  • it helps us understand which things we value and how they relate to our work

  • there isn’t always good applicable material on each topic

  • it is helpful for onboarding, and for sharing with experts outside of RP to get feedback

And it builds common knowledge and a common language (see next fold).

In working together we need to know what each of us knows, understands, and believes about these tools, and how each of us interprets them.

For example, take Factor Analysis: I know little about it and have never actually run it, but I thought it was seen as useful as a descriptive dimension-reduction exercise. Willem, who knows more about it, seems to disagree.

With Principal Component Analysis, I think/thought I have a good understanding of what it does, and of its math and geometry. But I do have some nagging questions, like … “what is the role and meaning of the different ‘rotations’?”

“I know the components are set to be orthogonal … but how does that relate to the fact that they ‘mainly, but not always’ have non-overlapping variables”? It may be that others do know the answer to this and could easily explain it to me. Or maybe no one knows and there is a gap in our understanding, and perhaps we might miss when ‘things go wrong’. Or perhaps one of us interprets it in one way (like “it is bad when there are overlapping variables in a factor”) and another has a different idea … and we are working at cross-purposes.
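As a concrete check on the orthogonality question above, here is a minimal sketch in R (using the built-in `USArrests` data purely for illustration, not one of our actual surveys): the component scores from `prcomp` are uncorrelated by construction, while a varimax rotation of the loadings trades that raw structure for more interpretable, ‘less overlapping’ loadings.

```r
# Sketch: PCA orthogonality and rotation (illustrative only)
dat <- scale(USArrests)   # standardize variables
pca <- prcomp(dat)

# Scores are orthogonal by construction: off-diagonal correlations ~ 0
score_cors <- cor(pca$x)
max(abs(score_cors[upper.tri(score_cors)]))  # effectively zero

# Loadings can still 'overlap' (a variable may load on several components).
# A varimax rotation rotates the first two components so each variable
# loads mainly on one of them, which aids interpretation.
rot <- varimax(pca$rotation[, 1:2])
rot$loadings
```

This is one way to see why ‘overlapping variables’ and orthogonality are separate issues: orthogonality is a property of the scores, while overlap is a property of the loadings, and rotation only changes the latter.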

Having the “conversation in an organized place” should help us establish common knowledge and fix gaps like these … hopefully without interminable back-and-forths, and in a way that we do not ‘forget and return to the same discussion in six months’.


  • Don’t share any confidential data or sensitive information

  • Don’t feel compelled to flesh out all sections with original content. Don’t add content just because ‘it’s in a typical syllabus’.

  • Focus on things that we use, have used, want to use, or have been requested to address.

    • Secondarily, on things relevant to Effective Altruism-related research in general
  • Do curate, link, and embed resources from elsewhere

  • Incorporate examples from our work (where these are not sensitive or where they can be made anonymous)

    • Top of each chapter/section: A ‘where do we use this’ section or box.
  • Do put in content that is more in-depth and technical, or involving R-code and tools

    • Still, do try to ‘offset’ details (in folding blocks, appendix sections, margin notes), where it would otherwise clutter the book
  • Start with ‘plain language’ explanations of technical content; for ourselves, and potentially to share with partners in future

I also hope to use this to develop ‘templates and standard practices’ for our work.


What to include and when to dive into the rabbit hole?

In ‘building a knowledge base’, some things are important to include, while others should be excluded. If we compulsively let ourselves be nerd-sniped into going down every rabbit hole, it will be wasteful. But some rabbit holes will be worth going down, at least partially.

What’s a good rule-of-thumb for knowing if it’s worth a dive? Maybe the ‘second time rule’?

Mark the issue the first time it comes up, perhaps leave a placeholder and a link to the original notes or discussion.

Then, the second time an issue comes up, it may be safe to assume it’s an ‘important issue to dive into and document’.

Below, ‘a partial dive into the rabbit hole’, for one possible framework.

What do I mean by a ‘partial dive into the rabbit hole?’

  • Explain how it applied to our problem (generically if there is a confidentiality issue)

  • Curated links (e.g., Pete W’s gists),

  • characterize in our own words (but concisely)

  • give a code example in a ‘vignette’,

  • check our understanding through communication and discussion with others, flag key issues

1.2 Sections and organization

(Proposed/discussed below – keep updated with actual structure)

Major ‘parts’:

  1. Presentation, code, organization:

Our guidelines, style, and approaches to doing and presenting coding and data work (discussions in progress). Discuss things like reproducibility, use of Tidy syntax (or not), Quarto, storing and labeling data, separating build and analysis, etc. Guides to presenting tables and visuals (moving that to ‘how to visualize’) and to reporting results in text.

  2. Designing surveys and trials

Quantitative, ‘qualitative’, and practical implementation issues in running surveys and various types of experiments. Some overlap with EAMT gitbook methods section.

  3. Statistics and modeling

A range of frameworks: Bayesian, frequentist, descriptive

and cases: prediction, causal inference, statistical inference and updating, dimension reduction and recovering ‘factors’…1

Aiming at ‘preferred approaches’ (e.g., ‘which tests’) with justifications and code vignettes.

  • “Multi-variable ‘regression’ models” and specification choices
  • Interpreting model results
  • Predictive modeling and machine learning
  • Practical Bayesian approaches and interpretations
  • Psychometrics, especially factor analysis
  4. Other topics and worked examples
  • Stuff that didn’t fit in previous sections
  • Practical tools like Monte Carlo ‘Fermi estimation’
  • Worked examples linked in previous sections

Monte Carlo ‘Fermi estimation’ approaches:

  • The basic ideas
  • Causal and Guesstimate
  • Code-based tools
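The basic idea can be sketched in a few lines of R (every input below is an invented placeholder, not a real estimate from our work): represent each uncertain input as a distribution, multiply the draws, and read off quantiles of the result.

```r
# Monte Carlo 'Fermi estimate' sketch -- all numbers are illustrative
set.seed(2023)
n <- 1e5
reach <- rlnorm(n, meanlog = log(1e4), sdlog = 0.5)  # people reached
rate  <- rbeta(n, 2, 18)                             # share who respond
value <- rlnorm(n, meanlog = log(50), sdlog = 0.8)   # value per response

total <- reach * rate * value
quantile(total, c(0.05, 0.5, 0.95))  # 90% uncertainty interval
```

Tools like Guesstimate implement this same multiply-the-draws logic behind a spreadsheet-style interface; the code version is easier to version-control and embed in our Quarto documents.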

1.3 Favorite resources, references, tools

“Research tools” Reinstein airtable, data relevant view

Pete W’s gists

1.4 Style and ‘colophon’

For more information on how this ‘bookdown’ was created, see our public template here, and also consult the resources at bookdown.org.

1.5 R setup and packages

We are using renv to keep packages aligned. Please install renv, restore from the lockfile, and snapshot after adding packages.
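The standard renv workflow looks roughly like this (a sketch of the usual commands; adapt to our repo’s actual setup). The calls are commented out because they modify your library state:

```r
# One-time per machine, or after pulling lockfile changes:
# install.packages("renv")  # if not already installed
# renv::restore()           # install the exact versions in renv.lock

# After adding or updating packages in your own work:
# renv::snapshot()          # record the new state into renv.lock
```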

renv::dependencies() should tell us which packages are used/needed

Code
library(dplyr)

dependencies <- renv::dependencies()
Finding R package dependencies ... Done!
Code
dependencies[2] %>% unique() %>% as.list()
$Package
  [1] "quarto"              "bayesplot"           "ggplot2"            
  [4] "pacman"              "rmarkdown"           "dplyr"              
  [7] "magrittr"            "stringr"             "tidyr"              
 [10] "knitr"               "rethinking"          "brms"               
 [13] "patchwork"           "tidybayes"           "purrr"              
 [16] "here"                "Lahman"              "stats4"             
 [19] "VGAM"                "rethinkpriorities"   "tibble"             
 [22] "tidyverse"           "labelled"            "tidystats"          
 [25] "furrr"               "rnoodling"           "DT"                 
 [28] "ie2misc"             "ggrepel"             "urbnmapr"           
 [31] "GGally"              "lme4"                "janitor"            
 [34] "remotes"             "base"                "forcats"            
 [37] "gtsummary"           "pryr"                "surveytools2"       
 [40] "vtable"              "readr"               "rex"                
 [43] "assertthat"          "broom"               "glue"               
 [46] "grid"                "huxtable"            "kableExtra"         
 [49] "rlang"               "snakecase"           "stats"              
 [52] "vip"                 "workflowsets"        "arm"                
 [55] "bettertrace"         "binom"               "bookdown"           
 [58] "bslib"               "corrr"               "DescTools"          
 [61] "digest"              "downlit"             "downloader"         
 [64] "gdata"               "gganimate"           "ggpointdensity"     
 [67] "ggpubr"              "ggridges"            "ggthemes"           
 [70] "infer"               "lmtest"              "lubridate"          
 [73] "plotly"              "readstata13"         "sandwich"           
 [76] "santoku"             "scales"              "sjlabelled"         
 [79] "treemapify"          "googlesheets4"       "renv"               
 [82] "conflicted"          "dials"               "doMC"               
 [85] "doParallel"          "dyneval"             "glmnet"             
 [88] "logr"                "parallel"            "parsnip"            
 [91] "ranger"              "recipes"             "rpart"              
 [94] "rsample"             "tidymodels"          "tune"               
 [97] "yardstick"           "ggimage"             "ggstatsplot"        
[100] "pairwiseComparisons" "devtools"            "dynverse"           
[103] "MASS"               

1.6 Git repo, shared files, procedure for editing


  1. As always, focus on ‘what repeatedly comes up in our work’, linking the practical cases↩︎