12 Random/mixed effects, multilevel models (MLM)
12.1 Discussions and resources
Slack thread on ‘what are random effects’?
A beginner’s guide to LMER … note that lmer() (from the lme4
package) uses a maximum likelihood (not Bayesian) approach
McElreath, “Models With Memory” (Chapter 13 in the 2nd edition of *Statistical Rethinking*), free sample here and recoding here
12.1.1 Sample code from a use case (scratch work)
Context:

- Several surveys in different contexts (identified by ‘wave’)
- Surveys have “conditions” (video content); this is the object of interest for decision-making …
- … as well as different readers (`reader`) (and some are text-only)
- Outcome of key interest is a mean of several survey measures (`interestXXk_mn`)
```r
library(lme4)   # for glmer()
library(dplyr)  # for the %>% pipe

# Binary outcome, with crossed random intercepts for condition and reader
high_interest_effectivenessmodel <- all_surveys %>%
  glmer(interestXXk_gt8 ~ effectiveness_ea_minded_mn + agreeablenesscale +
          (1 | condition) + (1 | reader),
        family = "binomial", data = .)
```
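After fitting, lme4’s accessors can be used to inspect the estimated variance components and the (shrunken) group-level intercepts – a quick usage sketch for the model above:

```r
summary(high_interest_effectivenessmodel)  # fixed effects and variance components
VarCorr(high_interest_effectivenessmodel)  # random-intercept SDs for condition and reader
ranef(high_interest_effectivenessmodel)    # per-group intercept deviations (shrunken toward 0)
```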
```r
# Linear mixed model on the mean-interest outcome, interacting condition
# with audience and with reader, plus a random intercept for reader
int_video_lmer_interact <- lmer(
  interestXXk_mn ~ condition * audience + condition * reader + (1 | reader),
  data = newdatavideo
)
# but we also may want to consider
```
12.1.2 Discussing ‘partial pooling’
From Tristan Mahr’s vignette – it explains a lot (but it doesn’t go into the formal maths)
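A minimal sketch of the comparison that vignette is about – complete pooling vs. no pooling vs. partial pooling – using the `sleepstudy` data bundled with lme4 (our own toy version, not Mahr’s exact code):

```r
library(lme4)  # provides lmer() and the sleepstudy example data

# Complete pooling: one grand-mean intercept for all subjects
pooled   <- lm(Reaction ~ 1, data = sleepstudy)

# No pooling: a separate intercept per subject, each estimated independently
unpooled <- lm(Reaction ~ 0 + Subject, data = sleepstudy)

# Partial pooling: per-subject intercepts modeled as draws from a common
# distribution, so they are shrunk toward the grand mean
partial  <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

# The partially pooled estimates sit between the no-pooling estimates
# and the grand mean
cbind(no_pooling      = coef(unpooled),
      partial_pooling = coef(partial)$Subject[, "(Intercept)"])
```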
The above interpretation leaves some unresolved questions:
1. How distinct is this from the ‘regularization with cross-validation’ that we see in machine-learning approaches? E.g., I could fit a ridge model where only the coefficient on reader is allowed to be regularized; this also leads to the same sort of ‘shrinkage’ … so what’s the difference? (A sketch contrasting the two routes follows this list.)
2. Thinking by analogy to a Bayesian approach, what does it mean to assume the intercept is a “random deviation drawn from a distribution”? Isn’t that what we always assume for each parameter in a Bayesian model … so then, what would it mean for a Bayesian model to have a fixed (vs. random) coefficient? (See the brms sketch at the end of this section.)
3. Why wouldn’t we want all our parameters to be random effects? Why include any fixed effects at all, considering general ideas of overfitting and effects as draws from larger distributions?
4. What is the impact of the choice of giving one feature a ‘random intercept only’ on the estimates of the other coefficients? (Related to 3, I think.)
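On question 1, a rough sketch of the two routes, using a hypothetical data frame `dat` with outcome `y`, a focal predictor `x`, and a many-level factor `reader` (glmnet’s `penalty.factor` argument lets us penalize only the reader dummies):

```r
library(glmnet)  # ridge/lasso with per-coefficient penalty factors
library(lme4)

# Route 1: ridge regression penalizing only the reader dummies,
# with the penalty strength chosen by cross-validation
X  <- model.matrix(~ x + reader, data = dat)[, -1]  # drop intercept column
pf <- ifelse(grepl("^reader", colnames(X)), 1, 0)   # 0 = leave x unpenalized
ridge_fit <- cv.glmnet(X, dat$y, alpha = 0, penalty.factor = pf)

# Route 2: a mixed model, where the degree of shrinkage on the reader
# intercepts is driven by the estimated random-intercept variance
mixed_fit <- lmer(y ~ x + (1 | reader), data = dat)
```

One difference in mechanics, at least: the ridge route tunes the amount of shrinkage to out-of-sample predictive performance, while the mixed model gets it from the likelihood via the estimated variance components.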
My thinking, getting back to an earlier discussion: by modeling the effect of reader as a random effect, and thus shrinking it relative to the standard linear model’s estimate, the problem of ‘omitted variable bias’ in the other coefficients (e.g., on condition) could remain. This could be a problem if reader is not orthogonal to condition (i.e., if they are correlated with one another).
This may also come down to the question of whether we care mainly about interpreting and assessing a particular coefficient (as in most modern econometrics) or mainly about the predictive model overall.
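On question 2, one way to see the fixed-vs.-random distinction in Bayesian terms: a ‘fixed’ coefficient has a prior whose scale the analyst fixes in advance, while a ‘random’ effect has a prior whose scale (the group-level standard deviation) is itself estimated from the data. A hedged sketch in brms (hypothetical data frame `dat` again, with `y`, `condition`, and `reader`):

```r
library(brms)  # Bayesian regression via Stan; formula syntax mirrors lme4

# "Fixed" coefficient: the prior scale (here 1) is set by the analyst
fixed_style <- brm(y ~ condition, data = dat,
                   prior = prior(normal(0, 1), class = "b"))

# "Random" intercepts: the scale of the reader intercepts is itself a
# parameter learned from the data; this adaptive prior is what produces
# partial pooling
random_style <- brm(y ~ condition + (1 | reader), data = dat)
```

In that sense every parameter is ‘random’, but only the random effects get an adaptive prior whose scale is learned from the data, which is what produces partial pooling across groups (and speaks to question 3 as well).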