Diagnostic questions and a future of online education

April 11, 2020
education bayesian r

Diagnostic questions

I am a big fan of Greg Wilson’s Teaching Tech Together and I have to credit him for sparking my interest in so-called “diagnostic questions”, often used in formative assessments. A diagnostic question is a carefully designed, quick-to-administer multiple-choice question (MCQ) with one unambiguous correct answer and a number of plausible distractors (incorrect answers) with diagnostic power.

A distractor is a wrong or less-than-best answer; “plausible” means that it looks like it could be right, while “diagnostic power” means that each of the distractors helps the teacher figure out what to explain next to that particular learner.

In other words, we don’t just want random wrong answers, nor do we want to “trick” people with answers that look “almost right”. Rather, we want plausible answers that surface misconceptions and signal to the teacher that certain (identifiable) parts of the material were misunderstood and may require repetition or another round of explanation. Here’s what Greg recommends as a strategy for finding “good” wrong answers:

In order to come up with plausible distractors, think about the questions your learners asked or problems they had the last time you taught this subject. If you haven’t taught it before, think about your own misconceptions, ask colleagues about their experiences, or look at the history of your field: if everyone misunderstood your subject in some way fifty years ago, the odds are that a lot of your learners will still misunderstand it that way today. You can also ask open-ended questions in class to collect misconceptions about material to be covered in a later class, or check question and answer sites like Quora or Stack Overflow to see what people learning the subject elsewhere are confused by.

Outside of data science education, the term “diagnostic question” has been popularized by Craig Barton, a passionate math teacher from the UK, author of several brilliant books, including the highly acclaimed How I Wish I’d Taught Maths: Lessons learned from research, conversations with experts, and 12 years of mistakes, and Head of Education at Eedi, a company with a mission of improving teaching and learning worldwide. Craig’s diagnosticquestions.com collects and makes freely available thousands of carefully designed formative assessment questions for maths, sciences and computing (including Excel and Python). I very much recommend that anyone with an interest in teaching STEM register at the website and check out the questions contributed by educators from around the world.

Craig shares 5 Golden Rules for what makes a good diagnostic question:

  1. They should be clear and unambiguous
  2. They should test a single skill/concept
  3. Students should be able to answer them in less than 10 seconds
  4. You should learn something from each incorrect response without the student needing to explain
  5. It is not possible to answer the question correctly whilst still holding a key misconception

Example: Bayes Theorem

Many of us learned (although arguably a little bit too late) Laplace’s conditional probability rule, known as Bayes Theorem for discrete events.

Bayes Theorem: P(A | B) = P(B | A) × P(A) / P(B)

Although the formula looks “mathy” and even intimidating, the intuition is pretty straightforward and can be applied in a wide variety of situations, from proving the existence of God to filtering spam emails in your mailbox. There have been multiple attempts to explain “Bayes rule” in an intuitive way: some are long and tedious, others are short and sweet, like this Bayes Theorem with Lego. Bayes rule is a helpful tool for reasoning with conditional probabilities about events that otherwise look confusing, such as the Monty Hall problem.

More straightforward examples of applying Bayes rule involve reasoning about two related but subtly different distinctions, such as people who are genuinely sick and people who test positive on a test. Although there’s quite a strong temptation to conflate these distinctions, Bayes rule draws our attention to the relevant details and helps us keep the reasoning straight.

Dangerous fires are rare (1%), but smoke is fairly common (10%) due to barbecues. 90% of dangerous fires make smoke. What is the probability that the fire is dangerous when there is smoke?

The conventional way of solving this sort of task starts with recognizing that what is given is not what is being asked: we are given smoke given fire, but asked for fire given smoke. Once this is identified, the next challenge is typically finding the value of the denominator in the Bayes formula (i.e. the prevalence of smoke), because it is rarely given as directly as in the example above. Although the rate of fires is usually given explicitly, the rate of smoke is typically given indirectly (as the rate of smoke in dangerous and benign fires separately).
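The arithmetic for the fire example can be sketched in a few lines of R. This is only an illustration: the helper name bayes_posterior() and the variable names are made up for this sketch, and the smoke rate without a fire is not stated in the problem, so it is backed out from the 10% overall smoke rate to show how the denominator would be assembled when it is given indirectly.

```r
# Bayes rule for two discrete events: P(A | B) = P(B | A) * P(A) / P(B)
# (bayes_posterior is a hypothetical helper, not part of any package)
bayes_posterior <- function(prior, likelihood, evidence) {
  likelihood * prior / evidence
}

p_fire             <- 0.01  # dangerous fires are rare
p_smoke            <- 0.10  # smoke is fairly common
p_smoke_given_fire <- 0.90  # 90% of dangerous fires make smoke

# What is asked: fire given smoke (not smoke given fire)
p_fire_given_smoke <- bayes_posterior(p_fire, p_smoke_given_fire, p_smoke)
print(p_fire_given_smoke)

# Assumption for illustration: the smoke rate without a fire is derived from
# the numbers above, since the problem does not state it directly.
p_smoke_given_no_fire <- (p_smoke - p_smoke_given_fire * p_fire) / (1 - p_fire)

# Law of total probability rebuilds the denominator from the two
# conditional smoke rates.
p_smoke_total <- p_smoke_given_fire * p_fire +
  p_smoke_given_no_fire * (1 - p_fire)
print(p_smoke_total)
```

With the stated rates the probability of a dangerous fire given smoke comes out at 9%, far below the 90% that is tempting to read off the problem text.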


Bayes rule is especially important when the “base rates” of the two distinctions differ. There’s even a documented cognitive bias related to ignoring base rates. The problem can be partially alleviated by using Natural Frequencies in the presentation, visualization and solving of problems involving two orthogonal distinctions (Gigerenzer & Hoffrage, 1999).

In the example above, it is important to realize that the problem can be written in the following format.

            Fire   No fire
Smoke          9        90
No smoke       1       900

It takes some practice to generate these “natural counts” from the problem definition on the fly. In some cases it is possible to “imagine” the population of data points under consideration and “assign” them to relevant categories.
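Generating natural counts can also be done mechanically. The sketch below is a minimal illustration in base R: natural_counts() is a hypothetical helper, the population of 1,000 is an arbitrary choice, and the smoke rate without a fire (0.091 / 0.99) is an assumption derived from the stated 10% overall smoke rate. With these inputs the exact split is 9, 91, 1 and 899, close to the rounded counts in the table above.

```r
# Turn rates into a 2x2 table of "natural counts" for an imagined population.
# natural_counts() is a hypothetical helper written for this sketch.
natural_counts <- function(n, p_a, p_b_given_a, p_b_given_not_a,
                           a_names, b_names) {
  n_a     <- n * p_a   # cases where A holds (e.g. fire)
  n_not_a <- n - n_a   # cases where A does not hold
  counts <- matrix(
    c(n_a * p_b_given_a,       n_not_a * p_b_given_not_a,
      n_a * (1 - p_b_given_a), n_not_a * (1 - p_b_given_not_a)),
    nrow = 2, byrow = TRUE,
    dimnames = list(b_names, a_names)
  )
  round(counts)
}

fire_counts <- natural_counts(
  n = 1000,
  p_a = 0.01,                     # rate of dangerous fires
  p_b_given_a = 0.90,             # smoke given fire
  p_b_given_not_a = 0.091 / 0.99, # assumed: implied by 10% overall smoke
  a_names = c("Fire", "No fire"),
  b_names = c("Smoke", "No smoke")
)
print(fire_counts)

# Fire given smoke is read off the "Smoke" row
p_fire_given_smoke <- fire_counts["Smoke", "Fire"] / sum(fire_counts["Smoke", ])
print(p_fire_given_smoke)
```

Reading the answer off a row of counts avoids the formula altogether, which is the point of the natural-frequency presentation.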

Numeracy in surveys

I recently made visualizations for the results of a questionnaire by folks at University of Cambridge, which included a section testing numeracy of respondents. Data can be accessed here.

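library(tidyverse)

# Raw survey export; the first rows appear to hold question text, not responses
dt <- read_csv("https://osf.io/xubqt/download")

# Numeracy items of interest
qs <- c("Num1", "Num2a", "Num2b", "Num3")

# Question wording: take the header rows, reshape to long, keep row 1 (English)
q_df <- dt %>%
  slice(1:2) %>%
  tibble::rowid_to_column() %>%
  pivot_longer(-rowid, names_to = "var",
               values_to = "txt_en") %>%
  filter(var %in% qs, rowid == 1)

# Responses: drop the header rows, reshape to long, tabulate answers per item
data_df <- dt %>%
  slice(-1:-3) %>%
  tibble::rowid_to_column() %>%
  pivot_longer(cols = GenSocTrust:Politics,
               names_to = "var", values_to = "code") %>%
  filter(var %in% qs) %>%
  # NOTE: the original mutate() call is truncated in the source;
  # a plain integer conversion is assumed here
  mutate(code = suppressWarnings(as.integer(code))) %>%
  count(var, code) %>%
  group_by(var) %>%
  mutate(pct = round(n / sum(n), 3)) %>%
  ungroup()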

Men’s choir

Here’s the first question from the survey testing people’s numeracy:

Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the choir 100 are men. Out of the 500 inhabitants that are not in the choir 300 are men. What is the probability that a randomly drawn man is a member of the choir? Please indicate the probability in percent. ____ %

This question tests the skills underlying the application of Bayes Theorem. It requires the respondent to map the numbers in the problem to the elements of the formula and perform a simple operation. Although it is possible to solve the problem with Bayes rule, there’s a more intuitive approach which does not invoke math. It might be useful to imagine a town divided into two groups of 500 people, out of which 100 and 300 men are carved out, respectively. The final “natural count” table will look like this:

             Men   Not men   TOTAL
Choir        100       400     500
Not choir    300       200     500

Or in abbreviated notation (with ^ meaning not):

             Men    Not men   TOTAL
Choir        CM     C^M       C
Not choir    ^CM    ^C^M      ^C

In this table it should be easy to see that the ratio we’re after can be calculated from the first column: the ratio of choir men to total men, which is CM / (CM + ^CM) = 100 / (100 + 300) = 25%.


What “plausible distractors” can you think of for this question? Let’s look at the top answers from the actual results of the survey.

Answer   Frequency
10 0.182
25 0.181
40 0.113
20 0.111
30 0.063
50 0.055
5 0.033
60 0.023
15 0.020
33 0.016

Here are some of the plausible but wrong answers to this problem:

Remarkably, many wrong answers involve an incorrect denominator (erroneously dividing by the total population or by the total number of choir members), reflecting various wrong mental models. Let’s have a look at another assignment.
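The choir table and the two wrong denominators mentioned above can be checked with a short base R sketch. The variable names are invented for this illustration, and the survey frequencies are simply copied from the table of top answers; the mapping from each wrong denominator to a specific answer is arithmetic, not a claim about what every respondent actually did.

```r
# Natural counts for the choir question
choir <- matrix(
  c(100, 400,
    300, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Choir", "Not choir"), c("Men", "Not men"))
)

choir_men <- choir["Choir", "Men"]

# Correct denominator: all men (first column of the table)
correct <- 100 * choir_men / sum(choir[, "Men"])

# Wrong denominators described in the text
by_total_population <- 100 * choir_men / sum(choir)
by_choir_members    <- 100 * choir_men / sum(choir["Choir", ])

candidates <- c(
  correct             = correct,
  by_total_population = by_total_population,
  by_choir_members    = by_choir_members
)
print(candidates)

# Observed frequencies of these answers, copied from the survey table above
survey_freq <- c("10" = 0.182, "25" = 0.181, "40" = 0.113, "20" = 0.111)
observed <- survey_freq[as.character(candidates)]
names(observed) <- names(candidates)
print(observed)
```

In this sketch dividing by the total population yields 10, which is about as frequent in the survey as the correct answer of 25.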

Bad mushrooms

In a forest 20% of mushrooms are red, 50% brown and 30% white. A red mushroom is poisonous with a probability of 20%. A mushroom that is not red is poisonous with a probability of 5%. What is the probability that a poisonous mushroom in the forest is red? ____ %

This is a “different on surface, same in depth” sort of question. A small distracting detail has been thrown in regarding the variety of mushroom colors. The trick is to disregard this information and collapse the “non-red” mushrooms together. Another complication is that the task is formulated in terms of percentages, not absolute counts. Gigerenzer & Hoffrage (1999) predict that this should make it more difficult for people to properly perform Bayesian updating. Let’s convert this problem to “natural counts” and adopt a similar abbreviation:

           Poison   Not poison   TOTAL
Red            40          160     200
Not red        40          760     800

Or in abbreviated notation (with ^ meaning not):

           Poison   Not poison   TOTAL
Red        RP       R^P          R
Not red    ^RP      ^R^P         ^R

The correct answer is of course 40 / (40 + 40) = 50%: poisonous red mushrooms divided by all poisonous mushrooms.


Let’s have a look at the top-10 answers to this question:

Answer   Frequency
20 0.022
4 0.014
50 0.012
10 0.007
5 0.006
80 0.005
25 0.005
15 0.003
95 0.003
40 0.003

As we can see, the response rate is much worse across all answer options. The questionnaire serves this question after three other numeracy questions, and the response rate deteriorates with every question, indicating significant fatigue. In addition to the “plausible distractor” patterns identified earlier, we have a few more interesting answers here:

Generally, it was probably not a good idea to include this problem for measuring numeracy. The response rate was less than 10% and the correct answer was picked by a little over 1% of respondents.
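The conversion of the mushroom problem from percentages to natural counts, including the step of collapsing brown and white mushrooms into “not red”, can be sketched in base R as follows. The forest of 1,000 mushrooms and all variable names are assumptions made for this illustration only.

```r
# Mushroom problem as stated: colour shares and poison rates per colour
mushrooms <- data.frame(
  colour   = c("red", "brown", "white"),
  share    = c(0.20, 0.50, 0.30),
  p_poison = c(0.20, 0.05, 0.05)  # every non-red colour has the same 5% rate
)

n <- 1000  # imagined forest size (arbitrary, chosen to give whole counts)
mushrooms$total  <- n * mushrooms$share
mushrooms$poison <- mushrooms$total * mushrooms$p_poison

# Collapse brown and white into a single "Not red" row
is_red <- mushrooms$colour == "red"
counts <- rbind(
  "Red"     = c(Poison = sum(mushrooms$poison[is_red]),
                TOTAL  = sum(mushrooms$total[is_red])),
  "Not red" = c(Poison = sum(mushrooms$poison[!is_red]),
                TOTAL  = sum(mushrooms$total[!is_red]))
)
counts <- cbind(counts, "Not poison" = counts[, "TOTAL"] - counts[, "Poison"])
print(counts[, c("Poison", "Not poison", "TOTAL")])

# Probability that a poisonous mushroom is red: first column only
p_red_given_poison <- counts["Red", "Poison"] / sum(counts[, "Poison"])
print(p_red_given_poison)
```

Keeping the three colours in the input and collapsing them explicitly makes it visible that the brown/white split has no effect on the answer.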

Five-sided die

Imagine we are throwing a five-sided die 50 times. On average, out of these 50 throws how many times would this five-sided die show an odd number (1, 3 or 5)? ____ out of 50 throws.

Here the top-10 results looked as follows:

Answer   Frequency
30 0.251
25 0.148
5 0.060
3 0.046
50 0.043
10 0.043
20 0.033
35 0.024
15 0.019
40 0.016

Here, along with the correct answer (30), we find a few plausible distractors:
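As a rough sketch, the expected answer and a handful of candidate distractors can be computed in base R. The interpretations attached to the wrong answers below are guesses made for this illustration (they are not taken from the survey or its authors); they only show that several frequent answers coincide with simple misreadings of the question.

```r
faces  <- 1:5  # a five-sided die
throws <- 50

# Correct reasoning: 3 of the 5 faces are odd
p_odd        <- mean(faces %% 2 == 1)
expected_odd <- throws * p_odd
print(expected_odd)

# Hypothetical misreadings (assumptions, labelled as such)
candidates <- c(
  half_of_throws     = throws / 2,              # odd and even treated as equally likely
  throws_per_face    = throws / length(faces),  # throws divided by number of faces
  count_of_odd_faces = sum(faces %% 2 == 1),    # number of odd faces, not throws
  number_of_faces    = length(faces)            # number of sides of the die
)
print(candidates)

# Frequencies of these answers, copied from the table above
survey_freq <- c("30" = 0.251, "25" = 0.148, "5" = 0.060,
                 "3" = 0.046, "10" = 0.043)
observed <- survey_freq[as.character(candidates)]
names(observed) <- names(candidates)
print(observed)
```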

SSDD problems


Gigerenzer, G., & Hoffrage, U. (1999). Overcoming difficulties in Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers and McGraw (1999). Psychological Review, 106(2), 425–430. https://doi.org/10.1037/0033-295X.106.2.425
