Surveys/experiment design, analysis; methodology, protocols, templates
```r
options(knitr.duplicate.label = "allow")
```
A set of curated resources tied to use-cases, and a space to organize our discussion
NOTE: this is now publicly hosted but not indexed. Please be careful not to share any confidential data or sensitive information.
A GitHub-hosted Quarto site for structured, in-depth discussion
‘Why do I have a sense this is useful?’
Case against this: when a concept comes up that one doesn’t understand, we all have our own resources we can go to, and our tutorials could feel like ‘reinventing the wheel’.
So, why?
- our situations are sometimes particular
- explaining it in our own words is helpful
- it keeps track of what we have done
- it helps us build a protocol and have a ‘go-to procedure’
- it helps us understand which things we value and how they relate to our work
- there isn’t always good applicable material on each topic
- it is helpful for onboarding, and for showing experts outside of RP to get feedback
- and it builds common knowledge and language (see next fold)
In working together we need to know what each of us knows, understands, and believes about these tools, and how each of us interprets them.
For example, with Factor Analysis, I know little about it and have never actually run it. But I thought it was seen as useful as a dimension reduction descriptive exercise. But William, who knows more about it, seems to disagree.
With Principal Component Analysis, I think/thought I have a good understanding of what it does, and the math and geometry of it. But I do have some nagging questions like … “what is the role and meaning of the different ‘rotations’”?
“I know the components are set to be orthogonal … but how does that relate to the fact that they ‘mainly, but not always’ have non-overlapping variables”? It may be that others do know the answer to this and could easily explain it to me. Or maybe no one knows and there is a gap in our understanding, and perhaps we might miss when ‘things go wrong’. Or perhaps one of us interprets it in one way (like “it is bad when there are overlapping variables in a factor”) and another has a different idea … and we are working at cross-purposes.
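Questions like these can at least be probed empirically. A minimal sketch using only base R (`prcomp` and `varimax` from `stats`) on the built-in `mtcars` data — the choice of data and of three retained components is purely illustrative:

```r
# PCA on a small numeric dataset, standardized
pca <- prcomp(mtcars, scale. = TRUE)

# The loadings (rotation matrix) are orthonormal: V'V is the identity
round(crossprod(pca$rotation), 2)

# The scores (the components themselves) are mutually uncorrelated
round(cor(pca$x), 2)

# Varimax is an *orthogonal* rotation of the retained loadings:
# components stay uncorrelated, but variance is redistributed so each
# variable loads mainly on one component -- one reason rotated factors
# 'mainly, but not always' have non-overlapping variables
rotated <- varimax(pca$rotation[, 1:3])
round(unclass(rotated$loadings), 2)
```

After the rotation the components are no longer ordered by variance explained, which is part of what the different ‘rotations’ trade off.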
Having the “conversation in an organized place” should help us establish common knowledge and fix gaps like these … hopefully without interminable back-and-forths, and in a way that we do not ‘forget and return to the same discussion in six months’.
- Don’t share any confidential data or sensitive information.
- Don’t feel compelled to flesh out all sections with original content; don’t add content just because ‘it’s in a typical syllabus’.
- Focus on things that we use, have used, want to use, or have been requested to address.
- Do curate links and embed resources from elsewhere.
- Incorporate examples from our work (where these are not sensitive, or where they can be made anonymous).
- Do put in content that is more in-depth and technical, or involving R code and tools.
- Start with ‘plain language’ explanations of technical content, for ourselves and potentially to share with partners in future.
I also hope to use this to develop ‘templates and standard practices’ for our work.
What to include and when to dive into the rabbit hole?
In ‘building a knowledge base’, some things are important to include, but others should be excluded. If we compulsively let ourselves be nerd-sniped down every rabbit hole, it will be wasteful. But some rabbit holes will be worth going down, at least partially.
What’s a good rule-of-thumb for knowing if it’s worth a dive? Maybe the ‘second time rule’?
- Mark the issue the first time it comes up; perhaps leave a placeholder and a link to the original notes or discussion.
- The second time the issue comes up, it may be safe to assume it’s an ‘important issue to dive into and document’.
Below, ‘a partial dive into the rabbit hole’, for one possible framework.
What do I mean by a ‘partial dive into the rabbit hole?’
- Explain how it applied to our problem (generically if there is a confidentiality issue)
- Curate links (PeteW-gists-eqsue)
- Characterize it in our own words (but concisely)
- Give a code example in a ‘vignette’
- Check our understanding through communication and discussion with others; flag key issues
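As an illustration, such a vignette chunk might look like the following — the simulated data and the choice of a t-test are placeholders for whatever technique the dive is about, not a recommended default:

```r
# Hypothetical 'partial dive' vignette chunk: self-contained, seeded,
# and small enough to re-run anywhere.
set.seed(42)

# 1. Simulate a toy version of the problem (generic, nothing confidential)
treated <- rnorm(100, mean = 0.3)
control <- rnorm(100, mean = 0.0)

# 2. Apply the technique under discussion
result <- t.test(treated, control)

# 3. Characterize the output in our own words, alongside the numbers
result$estimate
result$p.value
```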
(Proposed/discussed below – keep updated with actual structure)
Major ‘parts’:
Our guidelines, style, and approaches to doing and presenting coding and data work (discussions in progress). Discuss things like reproducibility, use of Tidy syntax (or not), Quarto, storing and labeling data, separating build and analysis, etc. Guides to presenting tables and visuals (moving that to ‘how to visualize’), and reporting results in text.
Quantitative, ‘qualitative’, and practical implementation issues in running surveys and various types of experiments. Some overlap with EAMT gitbook methods section.
A range of frameworks: Bayesian, frequentist, descriptive
and cases: prediction, causal inference, statistical inference and updating, dimension reduction and recovering ‘factors’…1
Aiming at ‘preferred approaches’ (e.g., ‘which tests’) with justifications and code vignettes, and Monte Carlo ‘Fermi estimation’ approaches.
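The idea behind a Monte Carlo ‘Fermi estimate’ is to put a distribution on each rough order-of-magnitude input and propagate the uncertainty by simulation, rather than multiplying point guesses. A sketch in base R — every number here is a made-up illustration, not a real figure:

```r
# Monte Carlo Fermi estimation: simulate each uncertain input, then
# combine draws to get a distribution over the quantity of interest.
set.seed(1)
n <- 100000

population        <- rlnorm(n, meanlog = log(1e6), sdlog = 0.5)  # ~1M, uncertain
share_reached     <- rbeta(n, 2, 8)                              # roughly 10-30%
effect_per_person <- rnorm(n, mean = 0.05, sd = 0.02)            # small effect

total_effect <- population * share_reached * effect_per_person

# Report the implied distribution, not just a single point estimate
quantile(total_effect, c(0.05, 0.5, 0.95))
```

Reporting the 5th/50th/95th percentiles makes the spread of the estimate explicit, which is the main payoff over a back-of-the-envelope product of point values.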
For more information on how this ‘bookdown’ was created, see our public template here, and also consult the resources at bookdown.org
We are using renv to keep packages aligned. Please install it and snapshot. `renv::dependencies()` should tell us what packages are used/needed.
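A sketch of the intended workflow, assuming a standard renv setup with an `renv.lock` in the repo root (see renv’s own documentation for details):

```r
# One-time setup, then day-to-day use
install.packages("renv")  # once per machine, if not already installed

renv::restore()   # install the package versions recorded in renv.lock
# ... after adding or updating packages in the project ...
renv::snapshot()  # record the new state back into renv.lock
```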
```r
library(dplyr)
dependencies <- renv::dependencies()
```
```
Finding R package dependencies ... Done!

$Package
  [1] "quarto"               "bayesplot"            "ggplot2"
  [4] "pacman"               "rmarkdown"            "dplyr"
  [7] "magrittr"             "stringr"              "tidyr"
 [10] "knitr"                "rethinking"           "brms"
 [13] "patchwork"            "tidybayes"            "purrr"
 [16] "here"                 "Lahman"               "stats4"
 [19] "VGAM"                 "rethinkpriorities"    "tibble"
 [22] "tidyverse"            "labelled"             "tidystats"
 [25] "furrr"                "rnoodling"            "DT"
 [28] "ie2misc"              "ggrepel"              "urbnmapr"
 [31] "GGally"               "lme4"                 "janitor"
 [34] "remotes"              "base"                 "forcats"
 [37] "gtsummary"            "pryr"                 "surveytools2"
 [40] "vtable"               "readr"                "rex"
 [43] "assertthat"           "broom"                "glue"
 [46] "grid"                 "huxtable"             "kableExtra"
 [49] "rlang"                "snakecase"            "stats"
 [52] "vip"                  "workflowsets"         "arm"
 [55] "bettertrace"          "binom"                "bookdown"
 [58] "bslib"                "corrr"                "DescTools"
 [61] "digest"               "downlit"              "downloader"
 [64] "gdata"                "gganimate"            "ggpointdensity"
 [67] "ggpubr"               "ggridges"             "ggthemes"
 [70] "infer"                "lmtest"               "lubridate"
 [73] "plotly"               "readstata13"          "sandwich"
 [76] "santoku"              "scales"               "sjlabelled"
 [79] "treemapify"           "googlesheets4"        "renv"
 [82] "conflicted"           "dials"                "doMC"
 [85] "doParallel"           "dyneval"              "glmnet"
 [88] "logr"                 "parallel"             "parsnip"
 [91] "ranger"               "recipes"              "rpart"
 [94] "rsample"              "tidymodels"           "tune"
 [97] "yardstick"            "ggimage"              "ggstatsplot"
[100] "pairwiseComparisons"  "devtools"             "dynverse"
[103] "MASS"
```
As always, focus on ‘what repeatedly comes up in our work’, linking the practical cases.