Inside the Model | Beer Concept Test

Featured respondent

—

Take our featured one. Here is what they told the interviewer:

Their answers

These six answers are everything the model knows about this respondent.

How the model thinks about this respondent

The model runs those inputs through an equation. Each non-reference input has a coefficient: a number, learned when the model was fit on all — respondents, that says how much that input pushes the log-odds (and with it the predicted probability) up or down. An answer at a categorical input's reference level contributes zero, because the constant already absorbs it. Continuous predictors contribute their coefficient times the measured value.
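The row-by-row sum can be sketched in Python. This is a minimal sketch, not the page's own code. The input names (`age_band`, `buys_craft`, `liking`), their levels, and every coefficient are hypothetical placeholders, and the toy has three inputs rather than the model's six. It shows the three kinds of row: the constant, categorical levels (reference levels add 0), and continuous inputs (coefficient times value).

```python
# Hypothetical model: input names, levels, and numbers are illustrative only,
# not the fitted coefficients behind this page.
CONSTANT = -1.2

# Categorical inputs: one coefficient per non-reference level. The reference
# level is simply not listed, so it contributes 0 (the constant absorbs it).
CATEGORICAL = {
    "age_band": {"35-54": 0.4, "55+": -0.3},  # reference level: "18-34"
    "buys_craft": {"yes": 0.9},               # reference level: "no"
}

# Continuous inputs: coefficient times the measured value.
CONTINUOUS = {"liking": 0.25}  # hypothetical 0-10 liking score


def calc_rows(answers: dict) -> list[tuple[str, float]]:
    """One (label, contribution) row per term, like the calc table."""
    rows = [("constant", CONSTANT)]
    for name, levels in CATEGORICAL.items():
        level = answers[name]
        rows.append((f"{name} = {level}", levels.get(level, 0.0)))
    for name, coef in CONTINUOUS.items():
        rows.append((f"{name} x {answers[name]}", coef * answers[name]))
    return rows


def log_odds(answers: dict) -> float:
    """The total at the bottom of the table (log-odds scale)."""
    return sum(value for _, value in calc_rows(answers))


featured = {"age_band": "35-54", "buys_craft": "yes", "liking": 7}
for label, value in calc_rows(featured):
    print(f"{label:<20}{value:+.2f}")
print(f"{'log-odds total':<20}{log_odds(featured):+.2f}")
```

Keeping the rows as a list rather than only returning the total mirrors how the page shows each input's contribution separately before summing.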

The calculation for our featured one, row by row:

The total at the bottom is on the log-odds scale. The logistic function σ(x) = 1 / (1 + e^(-x)) squashes it back into a probability between 0 and 1:

Log-odds total:

—

Predicted probability for our featured one:

—
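The squashing step is a one-liner in principle. This Python sketch (the names `sigmoid` and `logit` are our own) splits on the sign of x so that `math.exp` never overflows for very large or very negative log-odds, and includes the inverse to show that probability and log-odds are two views of the same number.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)): maps any log-odds to a probability between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for x < 0, e^x is below 1, so this cannot overflow
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of sigmoid: probability back to log-odds (requires 0 < p < 1)."""
    return math.log(p / (1.0 - p))


print(sigmoid(0.0))             # 0.5: log-odds of 0 is an even chance
print(round(sigmoid(1.85), 3))  # 0.864 for a hypothetical log-odds total of 1.85
```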

What actually happened?

We know our featured one's actual outcome. The model didn't — when it was fit, it saw only the six inputs above and the headline rate. So we can put prediction and outcome side by side:

Observed outcome

—

One person, one outcome. A single yes-or-no result can neither confirm nor refute a predicted probability; the model's claim is about the headline, not about any individual respondent.

What if our featured one had answered differently?

The dropdowns and sliders below start at our featured one's actual answers. Change any of them to a different value, then click Apply. A fresh calculation table and σ panel below recompute the prediction for this respondent only; the rest of the sample stays untouched. The locked baseline at the top of the page stays where it is, so you can compare the two side by side.

Recomputed calculation under your what-if

New log-odds total:

—

New predicted probability:

—
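The what-if recompute amounts to copying one respondent's answers, overwriting the changed ones, and rerunning the same sum. A minimal Python sketch follows; `predict`, `what_if`, the input names, and the coefficients are all hypothetical. Copying the answers instead of editing them in place is what keeps the locked baseline intact.

```python
import math

# Hypothetical coefficients (illustrative only). Categorical levels not listed
# are reference levels and contribute 0; "liking" is treated as continuous.
CONSTANT = -1.2
LEVEL_COEFS = {("age_band", "35-54"): 0.4, ("age_band", "55+"): -0.3, ("buys_craft", "yes"): 0.9}
SLOPES = {"liking": 0.25}


def predict(answers: dict) -> tuple[float, float]:
    """Return (log-odds, predicted probability) for one respondent."""
    x = CONSTANT
    for name, value in answers.items():
        if name in SLOPES:
            x += SLOPES[name] * value                 # continuous: coefficient x value
        else:
            x += LEVEL_COEFS.get((name, value), 0.0)  # reference level -> 0
    return x, 1.0 / (1.0 + math.exp(-x))


def what_if(answers: dict, **changes) -> tuple[float, float]:
    """Recompute one respondent's prediction with some answers swapped.
    A new dict is built, so the original answers (the baseline) are untouched."""
    return predict({**answers, **changes})


featured = {"age_band": "35-54", "buys_craft": "yes", "liking": 7}
baseline = predict(featured)
scenario = what_if(featured, buys_craft="no")
print(baseline, scenario)
```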

They're one of —

Our featured one's prediction is one number. The model runs the same calculation for every respondent in the sample, producing — predicted probabilities. Plotted together:

Each bar is a 5-percentage-point bin. The vertical line marks the weighted mean of all those probabilities — the headline:

—

Our featured one sits at —. Plenty of respondents sit far above or far below the headline — the spread on either side is what the headline hides.
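The headline and the histogram bars can both be computed from the list of predicted probabilities. In this Python sketch the helper names and the toy values are hypothetical, and summing survey weights within each 5-point bin is an assumption; the page's bars may show unweighted counts instead.

```python
from collections import defaultdict


def headline(probs, weights):
    """Weighted mean of predicted probabilities (the vertical line)."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def bin_totals(probs, weights, width_pp=5):
    """Total weight per bin, keyed by each bin's lower edge in percentage points."""
    n_bins = 100 // width_pp
    totals = defaultdict(float)
    for p, w in zip(probs, weights):
        b = min(int(p * 100 // width_pp), n_bins - 1)  # p == 1.0 joins the top bin
        totals[b * width_pp] += w
    return dict(sorted(totals.items()))


probs = [0.12, 0.13, 0.47, 1.0]    # toy predicted probabilities
weights = [1.0, 1.0, 2.0, 1.0]     # toy survey weights
print(headline(probs, weights))
print(bin_totals(probs, weights))
```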

How well does the model sort people like our featured one?

Earlier, we compared our featured one's prediction against what actually happened for that one person. We can make the same comparison across the whole sample: we know the actual outcome for every respondent, and the model didn't see any of those outcomes when it made its predictions. So we can ask, after the fact: did the model give higher predicted probabilities to actual approvers and lower ones to actual non-approvers?

The same cloud, split by what actually happened:

Actual approvers got an average predicted probability of —; actual non-approvers averaged —. The gap between those means is Tjur's R², in probability units (shown here as percentage points):

—

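A gap of 0 pp means no separation: approvers and non-approvers get the same predicted probability on average, so on this measure the prediction tells you nothing about which group a respondent belongs to. A gap of 100 pp means perfect separation: every approver gets a predicted probability of 100% and every non-approver gets 0%. Real-world models land in between.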
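Tjur's R² is just a difference of two group means. A minimal Python sketch, assuming outcomes are coded 1 for approvers and 0 for non-approvers; the optional survey weights are an assumption, since the text doesn't say whether the page weights the group means. The function name `tjur_r2_pp` is hypothetical.

```python
def tjur_r2_pp(probs, outcomes, weights=None):
    """Tjur's R^2 (coefficient of discrimination) in percentage points:
    mean predicted probability among actual approvers (outcome 1)
    minus the mean among actual non-approvers (outcome 0)."""
    if weights is None:
        weights = [1.0] * len(probs)

    def group_mean(group):
        num = sum(p * w for p, y, w in zip(probs, outcomes, weights) if y == group)
        den = sum(w for y, w in zip(outcomes, weights) if y == group)
        return num / den

    return 100 * (group_mean(1) - group_mean(0))


# Toy example: approvers average 0.7, non-approvers 0.2, so the gap is about 50 pp.
print(tjur_r2_pp([0.8, 0.6, 0.3, 0.1], [1, 1, 0, 0]))
```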

This number measures how well the published model separates the two groups on the data it was fit on, and it stays fixed through the scenarios below. (Recomputing it under a scenario would mislead: an all-or-nothing scenario pins one predictor to the same value for every respondent, which removes that predictor's variation, mechanically shrinks the gap, and falsely suggests the model "got worse" under intervention, even though nothing about the model has changed.)
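A toy example in Python shows the mechanical shrinkage described in the parenthesis. The one-predictor model, its coefficients, and the six respondents are invented for illustration. Pinning the predictor gives every respondent the same predicted probability, so the gap collapses to zero while the model's coefficients stay exactly the same.

```python
import math


def prob(x):
    """Hypothetical one-predictor model: log-odds = -1 + 2 * x, with x in {0, 1}."""
    return 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * x)))


def gap_pp(probs, outcomes):
    """Tjur-style gap: mean prob among outcome 1 minus mean among outcome 0, in pp."""
    ones = [p for p, y in zip(probs, outcomes) if y == 1]
    zeros = [p for p, y in zip(probs, outcomes) if y == 0]
    return 100 * (sum(ones) / len(ones) - sum(zeros) / len(zeros))


x = [1, 1, 1, 0, 0, 0]  # toy predictor values
y = [1, 1, 0, 1, 0, 0]  # toy actual outcomes
baseline_gap = gap_pp([prob(v) for v in x], y)
pinned_gap = gap_pp([prob(1) for _ in x], y)  # scenario: everyone gets x = 1
print(round(baseline_gap, 1), round(pinned_gap, 1))
```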

And what if everyone had answered differently?

Earlier you watched our featured one's prediction shift when they answered differently. Only that one person moved. The cloud and the headline stayed put because the rest of the sample didn't change.

The simulator below applies the same kind of swap to everyone. Pin one or more respondent-level inputs to a single value, and every respondent gets that value on that input. The headline tiles and the cloud below redraw to show the shift.
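Applying a pin to everyone is the single-respondent what-if repeated across the sample, followed by a fresh weighted mean. A minimal Python sketch follows; `prob`, `headline`, `run_scenario`, the input names, and the coefficients are hypothetical. The page also reports a 95% CI whose method the text doesn't describe, so this sketch leaves it out.

```python
import math

# Hypothetical model (illustrative numbers): constant plus one coefficient per
# non-reference level; unlisted levels are reference levels and contribute 0.
CONSTANT = -1.2
LEVEL_COEFS = {("buys_craft", "yes"): 0.9, ("age_band", "55+"): -0.3}


def prob(answers: dict) -> float:
    """Predicted probability for one respondent's (input, level) answers."""
    x = CONSTANT + sum(LEVEL_COEFS.get(item, 0.0) for item in answers.items())
    return 1.0 / (1.0 + math.exp(-x))


def headline(sample, weights):
    """Weighted mean predicted probability across the sample."""
    return sum(w * prob(a) for a, w in zip(sample, weights)) / sum(weights)


def run_scenario(sample, weights, **pins):
    """Pin each named input to one value for every respondent, then compare headlines.
    Returns (baseline, scenario, change in percentage points)."""
    base = headline(sample, weights)
    pinned = [{**answers, **pins} for answers in sample]  # originals untouched
    new = headline(pinned, weights)
    return base, new, 100 * (new - base)


sample = [
    {"buys_craft": "yes", "age_band": "18-34"},
    {"buys_craft": "no", "age_band": "55+"},
]
print(run_scenario(sample, [1.0, 1.0], buys_craft="yes"))
```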

No scenario applied — showing baseline.

Baseline headline

—

Under scenario

—

95% CI: —

Change

—

percentage points