Understanding the generative process behind your observations — and why it determines which statistical models are appropriate.
When a participant responds correctly or incorrectly on a trial, when they report a pain rating of 4, when their reaction time is 312ms — each of these observations was produced by some real-world process. That process is not arbitrary. It has a characteristic shape: some outcomes are more probable than others, and the probabilities follow a pattern reflecting the nature of the generating mechanism.
This pattern is what statisticians formalise as a probability distribution. Statistical models are not just tools for analysing data — they are claims about how the data were generated.
Every analysis implicitly answers the question: what kind of process produced these observations?
ANOVA says: my residuals are Gaussian. Logistic regression says: my outcomes are Bernoulli. Poisson regression says: my counts follow a Poisson process. The model is not just a computational procedure — it is a substantive claim about the generative mechanism.
A common starting point is: "Which test should I use?"
A better starting point is: "What kind of data-generating process could have produced these observations?"
Before thinking about specific models, ask: is my dependent variable discrete or continuous?
This distinction immediately narrows the set of plausible generative models.
If your model assumes the wrong distribution, the consequences range from negligible to severe depending on how far reality departs from the assumption. Standard errors may be wrong, p-values may be inflated or deflated, and effect size estimates may be biased. More subtly, the model may be making arithmetically impossible claims — estimating probabilities above 1, predicting negative counts — which is a sign that the assumed generative process simply does not match the data-generating reality.
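The "arithmetically impossible claims" point can be made concrete with a small simulation: fit a straight line (as a Gaussian model implicitly does) to binary accuracy data whose true generative process is Bernoulli. This is an illustrative sketch with simulated data, not a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate binary accuracy data whose true generative process is Bernoulli,
# with success probability following a logistic curve in a predictor x.
x = np.linspace(-3, 3, 200)
p_true = 1 / (1 + np.exp(-2 * x))
y = rng.binomial(1, p_true)

# Fit an ordinary least-squares line, as a Gaussian model implicitly would.
slope, intercept = np.polyfit(x, y, 1)
pred = intercept + slope * x

# The straight line makes arithmetically impossible claims at the extremes:
# predicted "probabilities" above 1 and below 0.
print(pred.min(), pred.max())
```

The line fits the middle of the range tolerably, but its predictions escape the [0, 1] interval at the extremes: the assumed generative process cannot represent the real one.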
Each distribution is associated with a particular kind of generating process. The explorer below lets you see how the shape responds to parameter changes — building the intuition to recognise which family your DV is likely to belong to before you run any formal tests.
There is one property that distinguishes the Gaussian from almost all other common distributions, and it is the property that matters most for ANOVA. In a Gaussian distribution, the mean and the variance are entirely independent parameters. Knowing the mean tells you nothing about the variance. You can have a distribution centred on 50 with very low spread, or the same centre with very high spread — they are simply different parameterisations with no necessary relationship between them.
In most other distributions the mean constrains the variance to some degree. The table below summarises where each distribution stands on this crucial question.
| Distribution | Mean | Variance | Independent? |
|---|---|---|---|
| Gaussian | μ | σ² | Yes ✓ — completely free parameters |
| Bernoulli | p | p(1 − p) | No ✗ — variance fully determined by mean |
| Binomial | np | np(1 − p) | No ✗ — variance determined by mean and n |
| Poisson | λ | λ | No ✗ — variance always equals mean exactly |
| Log-normal | e^(μ + σ²/2) | (e^(σ²) − 1) e^(2μ + σ²) | No ✗ — both depend on both parameters |
| Ex-Gaussian | μ + τ | σ² + τ² | Partial ≈ — τ affects both, but μ and σ provide extra freedom |
| Beta | α/(α+β) | αβ/[(α+β)²(α+β+1)] | No ✗ — entangled through α and β |
| Neg. Binomial | μ | μ + μ²/r | No ✗ — variance exceeds mean, scales with mean |
This table is not a curiosity — it is the key to understanding ANOVA's assumptions. ANOVA decomposes total variability into between-group variance (signal) and within-group variance (noise). For the F-ratio to work properly, the within-group variance needs to be a free parameter — independent of where the group means happen to fall. If variance is locked to the mean, that independence is broken.
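The mean–variance contrast in the table is easy to verify by simulation. A minimal sketch comparing Gaussian and Poisson samples (arbitrary illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Gaussian: shifting the mean leaves the variance untouched.
g_low  = rng.normal(loc=2.0,  scale=1.0, size=n)
g_high = rng.normal(loc=20.0, scale=1.0, size=n)

# Poisson: the variance is locked to the mean (both equal lambda).
p_low  = rng.poisson(lam=2.0,  size=n)
p_high = rng.poisson(lam=20.0, size=n)

print(g_low.var(), g_high.var())   # both ~1, regardless of the mean
print(p_low.var(), p_high.var())   # ~2 and ~20, tracking the mean
```

Moving the Gaussian mean from 2 to 20 leaves the variance at 1; moving the Poisson mean drags the variance with it, exactly as the table states.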
Having developed intuitions about which distribution family your DV is likely to follow — from first principles, substantive knowledge, or visual inspection — you can go further and formally fit competing distributions to test that intuition. Rather than assuming a distribution, you let the data speak to which generative model provides the most plausible account of what was observed.
The logic is similar to model comparison in regression: fit each candidate distribution to your data using maximum likelihood estimation (which finds the parameter values that make the observed data most probable under that distribution), then compare the fits using an information criterion. The distribution with the lower AIC or BIC offers the better trade-off between fit quality and model complexity: extra parameters always improve raw fit, so the information criteria penalise them accordingly.
Maximum likelihood estimation asks: given this distribution family, what parameter values make my observed data most probable? AIC/BIC then asks: across several families, which one achieves the best fit without overfitting?
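This fit-then-compare loop can be sketched in Python with scipy (a counterpart to the R workflow; the reaction-time-like data here are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical reaction-time-like data generated by a log-normal process.
rt = rng.lognormal(mean=6.0, sigma=0.4, size=500)

def aic(loglik, k):
    """AIC = 2k - 2 * log-likelihood; lower is better."""
    return 2 * k - 2 * loglik

# Fit each candidate family by maximum likelihood, then score it.
mu, sd = stats.norm.fit(rt)                       # 2 free parameters
aic_norm = aic(stats.norm.logpdf(rt, mu, sd).sum(), k=2)

s, loc, scale = stats.lognorm.fit(rt, floc=0)     # 2 free parameters (loc fixed)
aic_lognorm = aic(stats.lognorm.logpdf(rt, s, loc, scale).sum(), k=2)

print(f"AIC normal:     {aic_norm:.1f}")
print(f"AIC log-normal: {aic_lognorm:.1f}")       # clearly lower here
```

Because the data really were generated by a log-normal process, the log-normal fit achieves a markedly lower AIC than the Gaussian, despite both having the same number of free parameters.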
Plot a histogram of your DV overlaid with density curves from candidate distributions. Look at the shape — symmetric or skewed? Bounded or unbounded? Long-tailed or compact? This narrows the candidates before any formal fitting.
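Alongside the histogram, a few numeric summaries answer the same shape questions. A minimal sketch on simulated data (the log-normal sample is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rt = rng.lognormal(mean=6.0, sigma=0.4, size=500)  # hypothetical RT-like data

# Quick numeric shape checks that mirror what a histogram would show.
print("min:", rt.min())                  # strictly positive -> bounded below
print("skewness:", stats.skew(rt))       # > 0 -> right-skewed
print("excess kurtosis:", stats.kurtosis(rt))

# Positive, right-skewed, unbounded above: plausible candidates include
# log-normal and ex-Gaussian; Gaussian and beta are poor matches.
```

These checks do not replace visual inspection, but they make the narrowing step explicit and reproducible.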
Use maximum likelihood to estimate the parameters of each plausible distribution. The fitdistrplus package in R makes this straightforward, with automatic starting values and convergence checks across a wide range of families.
Compare AIC and BIC across fitted distributions. Inspect Q-Q plots and P-P plots for each — these show whether the fitted distribution's quantiles match the empirical quantiles. A well-fitting distribution produces points close to the diagonal.
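The Q-Q logic can be sketched without plotting: compare empirical quantiles against each fitted distribution's theoretical quantiles and see which pairing tracks a straight line. A Python sketch on simulated data (in R, fitdistrplus produces these diagnostics directly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
rt = rng.lognormal(mean=6.0, sigma=0.4, size=500)  # hypothetical data

# Empirical quantiles at a grid of probabilities.
probs = np.linspace(0.01, 0.99, 99)
emp_q = np.quantile(rt, probs)

# Theoretical quantiles from each fitted candidate.
mu, sd = stats.norm.fit(rt)
s, loc, scale = stats.lognorm.fit(rt, floc=0)

r_norm    = np.corrcoef(emp_q, stats.norm.ppf(probs, mu, sd))[0, 1]
r_lognorm = np.corrcoef(emp_q, stats.lognorm.ppf(probs, s, loc, scale))[0, 1]

print(r_norm, r_lognorm)   # the log-normal quantiles track more closely
```

The correlation between quantile sets is a crude stand-in for eyeballing the diagonal of a Q-Q plot; the better-fitting family sits closer to 1.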
Statistical fit alone is not enough. The winning distribution should also make sense given what you know about the generating process. A distribution that fits well but is implausible mechanistically deserves scrutiny — it may be overfitting, or it may reveal something genuinely interesting about your data.
For positively skewed continuous variables such as reaction times, different generative stories imply different distributions: a log-normal arises when many small effects combine multiplicatively, whereas an ex-Gaussian arises when a roughly Gaussian stage is summed with an exponential stage.
The goal is not to pick a convenient model, but to test which of these generative accounts best matches the data.
For ex-Gaussian specifically, the retimes package in R provides direct fitting and the decomposition into μ, σ, and τ components — which is particularly useful since τ carries theoretical interpretability as an index of attentional or executive function variability.
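A Python counterpart to that decomposition uses scipy's exponnorm, which parameterises the ex-Gaussian by K = τ/σ. The sketch below simulates hypothetical reaction times with known μ, σ, and τ, then recovers them by maximum likelihood (the parameter values and starting guesses are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Simulate hypothetical reaction times from an ex-Gaussian process:
# a Gaussian stage (mu, sigma) plus an independent exponential stage (tau).
mu, sigma, tau = 300.0, 30.0, 100.0
rt = rng.normal(mu, sigma, size=5000) + rng.exponential(tau, size=5000)

# scipy's exponnorm uses the shape parameter K = tau / sigma.
# Rough starting values help the generic MLE optimiser converge.
K_hat, loc_hat, scale_hat = stats.exponnorm.fit(rt, 2.0, loc=250.0, scale=50.0)
tau_hat = K_hat * scale_hat

print(f"mu ~ {loc_hat:.0f}, sigma ~ {scale_hat:.0f}, tau ~ {tau_hat:.0f}")
```

With a reasonably large sample, the fitted μ, σ, and τ land close to the generating values, giving the τ estimate its theoretical interpretability.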
For regression models where you want to allow the distribution to vary as a function of predictors — not just estimate its overall shape — the GAMLSS framework (Generalised Additive Models for Location, Scale and Shape) extends GLMs to cover a very wide range of distributions including beta, ex-Gaussian, negative binomial, and many others, while letting all parameters of the distribution vary with covariates.
If you have read Part 2 of this resource, you have seen the OLS-family argument in detail: ANOVA, RM-ANOVA, and the standard LMM all share one underlying likelihood — residuals are independent, identically distributed, and Gaussian (iid) — and that one likelihood is what defines the family. This section pins the same idea down in distributional terms, so the alternatives surveyed in Section 6 land cleanly.
ANOVA assumes that residual variation follows a Gaussian (normal) distribution: each score is the group's true mean plus a random error drawn from a normal distribution with mean zero and variance σ². This is not the same as saying the errors are "random": randomness does not imply any particular distribution. ANOVA makes a stronger claim, that the unexplained variation behaves like the sum of many small, independent influences, an assumption under which Gaussian noise is a reasonable approximation. It also assumes that this error variance is the same across all groups: the homogeneity-of-variance assumption.
Randomness does not imply normality — Gaussian error is a specific assumption about how randomness is structured.
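ANOVA's generative claim can be written out directly as a simulation: each score is a group mean plus Gaussian noise with one shared σ. The group names and means below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)

# ANOVA's generative claim: score = group mean + Gaussian noise, with the
# SAME error variance in every group (homogeneity of variance).
group_means = {"control": 50.0, "drug_a": 55.0, "drug_b": 62.0}  # hypothetical
sigma = 5.0

resid_sds = {}
for name, mu in group_means.items():
    scores = mu + rng.normal(0.0, sigma, size=2000)
    resid_sds[name] = (scores - scores.mean()).std()
    print(f"{name}: mean ~ {scores.mean():.1f}, "
          f"residual SD ~ {resid_sds[name]:.2f}")

# The residual SD hovers near 5 in every group, wherever the mean sits:
# sigma is a free parameter, untouched by the group means.
```

This is the world ANOVA assumes: you can move the group means anywhere without the noise level changing.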
The critical feature of this model is that σ² is an entirely free parameter — it says nothing about where the group means μj happen to lie. The noise is the same whether the groups are close together or far apart. This is exactly the Gaussian mean–variance independence highlighted in Section 3's table. Distributions where that independence breaks down — Bernoulli, binomial, Poisson, negative binomial, log-normal, beta — produce data that ANOVA's likelihood cannot represent honestly. Part 2 unpacks the binary case in full detail (variance = p(1−p), variance heterogeneity is automatic the moment groups differ in mean); here we extend the same logic to the other common generating processes you will meet in behavioural data.
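For the binary case, the breakdown is automatic and easy to demonstrate. The sketch below uses two hypothetical conditions that differ only in mean accuracy:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Two conditions differing only in accuracy (the mean). For Bernoulli data
# the variance p(1 - p) shifts as soon as the means differ.
easy = rng.binomial(1, 0.9, size=n)   # accuracy .9 -> variance .09
hard = rng.binomial(1, 0.6, size=n)   # accuracy .6 -> variance .24

print(easy.var(), hard.var())
# Heterogeneity of variance is built in: no extra assumption was violated;
# the generating process itself forbids equal variances here.
```

No amount of careful experimentation removes this: the moment the groups differ in mean accuracy, they must differ in variance, which is exactly the independence ANOVA's likelihood requires.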
With count data the generating process is typically Poisson or negative binomial — both of which have variance that scales with the mean. Applying ANOVA to raw counts implicitly assumes the noise is constant across groups with different mean counts, which is structurally implausible. When counts are large and roughly symmetric, the Gaussian approximation becomes tolerable, but it is always an approximation — and one that quietly weakens whenever group counts span a wide range or whenever dispersion exceeds what a Poisson process would produce.
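The equidispersion versus overdispersion contrast can be checked numerically. The sketch below draws Poisson and negative binomial counts with the same mean (parameters chosen for illustration, using numpy's r, p parameterisation where the mean is r(1 − p)/p):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

# Poisson counts: variance equals the mean (equidispersion).
pois = rng.poisson(lam=5.0, size=n)

# Negative binomial counts with the same mean but extra dispersion:
# r = 2, p = 2/7 gives mean r(1-p)/p = 5 and variance mu + mu^2/r = 17.5.
r, p = 2.0, 2.0 / 7.0
nb = rng.negative_binomial(r, p, size=n)

print(pois.mean(), pois.var())   # ~5 and ~5
print(nb.mean(), nb.var())       # ~5 but variance well above 5
```

If your observed counts show variance well above the mean, as the negative binomial sample does, a Poisson model is already ruled out, and a Gaussian model with constant variance is doubly so.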
The choice of statistical model should be guided by the likely generating process of your DV. The guide below covers the most common cases encountered in behavioural and cognitive research.
In practice, most behavioural data fall into a small number of recurring types, each implying a different generative model: binary accuracy suggests a Bernoulli process, trial counts suggest a Poisson or negative binomial process, reaction times suggest an ex-Gaussian or log-normal process, and bounded proportions suggest a beta process.
The goal is not to try every possible distribution, but to compare a small set of plausible generative models based on the nature of the data.
Statistical models are not neutral computational procedures. They are claims about the world — specifically, about the process that generated your observations. Learning to ask "what distribution does my DV follow, and why?" before reaching for a test is one of the most transferable skills in quantitative research. Formal distribution fitting gives you the tools to move from plausible intuition to empirical evidence about which generative model is most defensible for your data.