Research Validation
Reasoning Test
You will be shown 12 causal reasoning questions. Each includes the causal graph structure and probability tables needed to compute the answer. Read the tables carefully: the same data can produce different answers depending on whether you're asked an observational, interventional, or counterfactual question.
Human experts score ~98% on similar tasks. The best LLMs score ~29%.
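To illustrate how one set of tables can yield different answers by question type, here is a minimal sketch. The graph (Z -> X, Z -> Y, X -> Y), the variable names, and every probability below are made up for illustration and are not taken from the test questions. It compares the observational quantity P(Y=1 | X=1) with the interventional quantity P(Y=1 | do(X=1)), computed by adjusting for the confounder Z.

```python
# Hypothetical confounded graph: Z -> X, Z -> Y, X -> Y (all variables binary).
# All numbers are illustrative assumptions, not data from the test.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {                           # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.6, (1, 1): 0.8,
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): weight each Z stratum by P(Z=z | X=x) (Bayes' rule)."""
    joint = {z: p_z[z] * p_x_given_z(x, z) for z in p_z}   # P(Z=z, X=x)
    p_x = sum(joint.values())                              # P(X=x)
    return sum(p_y1_given_xz[(x, z)] * joint[z] / p_x for z in p_z)


def interventional(x):
    """P(Y=1 | do(X=x)): weight each Z stratum by the marginal P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


print(f"P(Y=1 | X=1)     = {observational(1):.2f}")   # 0.70
print(f"P(Y=1 | do(X=1)) = {interventional(1):.2f}")  # 0.55
```

Both answers come from the same tables. The only difference is whether Z is weighted by P(Z | X), because observing X carries information about Z, or by P(Z), because setting X does not. Counterfactual questions additionally require the structural equations and are not covered by this sketch.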