17 Monte Carlo ‘Fermi estimation’
‘BOTEC’: Back-of-the-envelope calculations are central to RP’s work.
‘Fermi estimation’ is essentially a more formal approach to this, carefully defining and explaining each element of the ‘model’ equation.
When we explicitly define (and justify) a probability distribution over each variable in the model, and compute (often through simulation) the overall uncertainty of the outputs (predictions, estimates, etc), we call this “Monte Carlo Fermi Estimation”.1
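As a minimal hedged sketch of the idea (the quantities and distributions here are invented for illustration, not from any actual RP model), a tiny Monte Carlo Fermi estimate in R might look like:

```r
# Hypothetical Fermi model: total_cost = people_reached * cost_per_person.
# Each input gets an explicit probability distribution; we simulate to
# propagate the uncertainty through to the output.
set.seed(42)
n_sims <- 10000
people_reached  <- rlnorm(n_sims, meanlog = log(1000), sdlog = 0.5) # uncertain reach
cost_per_person <- rlnorm(n_sims, meanlog = log(2),    sdlog = 0.2) # uncertain unit cost
total_cost <- people_reached * cost_per_person
quantile(total_cost, c(0.05, 0.5, 0.95)) # a 90% interval for the output
```

Rather than a single point estimate, the output is a distribution, which we can summarize with quantiles as above.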
17.1 ‘How to Measure Anything’, by Douglas Hubbard
See
- book website here
- RP book group notes and links (private access)
- Luke M’s summary at LessWrong
Chapter 3: The Urn of Mystery Simulation
The (folded) narrative above is rather confusing, and the spreadsheet is rather bulky. We can explain and simulate it much more simply in code. The point: even a single random draw from an urn matches that urn’s majority color about 75% of the time, as the simulation below shows.
- Randomly draw a ‘share green’ for each of 1000 (or ‘\(K=1000\)’) ‘urns’, and record each urn’s majority color, i.e., is it more than half green?
We set the value ‘K=1000’, indicating how many urns we use in our simulation; we can adjust this later. Defining such parameters at the top makes it clearer how they drive the results.
Code
K <- 1000
urns <- runif(K, 0, 1)
The code above yields the object urns, a vector of 1000 probabilities: it assigns to urns the output of the function runif, i.e., ‘random uniform’.3 We could view the whole thing in several ways, such as by typing view(urns) or View(urns) for a peek. We can also print any part of it to the screen. The point is that this object sits in the background (in our ‘environment’); we don’t need to see it in front of us at all times, as with a spreadsheet.
Printing out a peek at this object:
Code
str(urns)
num [1:1000] 0.08596 0.70347 0.40853 0.00597 0.84137 ...
The object is a ‘numeric vector’ with 1000 elements; the first five are listed. Each of these represents the ‘share green’ drawn for one urn.
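As a quick check on this construction, roughly half of these randomly drawn urns should be majority green:

```r
K <- 1000
urns <- runif(K, 0, 1) # the same construction as above
mean(urns > 0.5)       # share of urns that are majority green; should be near 0.5
```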
- For each of (\(K= 1000\)) urns, randomly draw a single marble, and record whether this draw is the same as the urn’s majority color.4
We put this together into a ‘tibble’ data frame below5
Code
library(tidyverse)
K <- 1000
urns <- tibble(
  share_green = runif(K, 0, 1), # recreating the vector of uniform draws as the first column of a tibble
  majority = if_else(share_green > .5, "green", "red"), # classifying whether each urn is majority green or red
  draw = if_else(runif(K) > share_green, "red", "green") # if another random uniform draw exceeds the 'share green' in that urn, it's a draw of a red ball; note this operates on the entire vector runif(K) at once
)
- Display this ‘matrix’ of 1000 outcomes (or a peek at it)
Code
urns
# A tibble: 1,000 × 3
share_green majority draw
<dbl> <chr> <chr>
1 0.687 green red
2 0.517 green green
3 0.456 red red
4 0.350 red green
5 0.280 red red
6 0.448 red green
7 0.651 green red
8 0.947 green green
9 0.553 green red
10 0.497 red red
# … with 990 more rows
The snippet above shows the result of these ‘random draws’, generated as described.
- Count ‘which share of these agree with their urn’s majority color’
Code
#share of cases where the 'draw' is the same color as the urn's majority
urns %>% summarise(share_agree = mean(draw == majority))
# A tibble: 1 × 1
  share_agree
        <dbl>
1       0.727
I.e., 72.7% of the draws are the same color as the majority of their urn.6
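As a sanity check on that number: with the ‘share green’ drawn uniformly on \([0,1]\), the true probability that a single draw matches the majority is \(E[\max(p, 1-p)] = 0.75\), which we can confirm numerically:

```r
# The chance a draw matches the majority, given share green p, is max(p, 1 - p).
# Averaging this over p ~ Uniform(0, 1) gives 3/4.
integrate(function(p) pmax(p, 1 - p), lower = 0, upper = 1)$value
# → 0.75
```

So the simulated 72.7% is a noisy estimate of this 75%.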
The above code is not as elegant as it should be, and we should clean it up. Still I think this is better than using the Excel spreadsheet. Why? It gives you more control, a better record of what you have done, the ability to do more powerful analysis, and you can do more with the results (such as embedding them into a dynamic document like this one).
For one example, suppose you wanted to test this with 1 million urns. That would be a huge pain to do in Excel… I dare you to try it. In R, we simply change the code to specify \(K = 1{,}000{,}000\) urns and do the above again.7
Code
[1] 0.749833
On my computer, this ran almost immediately.
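Following the footnote’s suggestion, rather than recopying the code we could wrap the whole simulation in a function, with the number of urns as its argument (a sketch; the function name is ours):

```r
library(tidyverse)

# Wrap the simulation so the number of urns K is a single argument
sim_share_agree <- function(K) {
  urns <- tibble(
    share_green = runif(K, 0, 1),
    majority = if_else(share_green > .5, "green", "red"),
    draw = if_else(runif(K) > share_green, "red", "green")
  )
  mean(urns$draw == urns$majority)
}

sim_share_agree(1000000) # should come out close to 0.75
```

Now changing the number of urns is a one-character edit to the function call, with no code duplicated.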
17.1.1 Tools for Monte Carlo Fermi estimation, other than Excel
Some links to simple vignettes in other tools
17.2 Monte Carlo Fermi approaches to GiveWell-style Cost-Effectiveness Analysis
Embedded below, David Reinstein and Sam Nolan explain this approach, advocating its use in GiveWell models and beyond, and laying out some building blocks.
They also share some tools and work-in-progress HERE, also embedded below.
Overview
- The basic ideas
- Causal and Guesstimate
- Code-based tools
Code
knitr::include_url("https://effective-giving-marketing.gitbook.io/innovations-in-givewell-esque-ceas/")
DR: We may want to look for ways to explicitly incorporate and integrate these approaches into our data analysis work in R, etc.
At least David Reinstein thinks this is what it’s called.↩︎
This uses arguments K (the number of draws), 0 (the lower bound), and 1 (the upper bound).↩︎
See the tutorial HERE for some tips on ‘generating random data’ in R.↩︎
Well, we don’t actually draw a marble, we simply make another uniform draw, and if it’s below the ‘share green’ for that urn, we call it green.↩︎
I could reuse the above vector, but I’ll just re-create it instead↩︎
Check out the raw code – note how the number here is automatically generated with ‘inline code’. I didn’t literally type in that number.↩︎
Even better, rather than recopying the code, we would make it a function.↩︎