uncertainty AI News List | Blockchain.News

List of AI News about uncertainty

Time	Details
2026-03-23 14:46	University of Tartu Study: Two-Sample Hybrid Confidence Beats Self-Consistency for LLM Uncertainty (84.2 AUROC), 2026 Analysis. According to God of Prompt on Twitter, citing a University of Tartu evaluation, verbalized confidence combined with minimal self-consistency (K=2) outperforms the industry-standard self-consistency approach for large reasoning models across 17 tasks in mathematics, STEM, and humanities, delivering 84.2 AUROC in math versus 79.4–81.4 for eight-sample baselines (source: God of Prompt, University of Tartu). As reported in the tweet, single-sample verbalized confidence reaches 71.3 AUROC in math, already beating K=2 self-consistency at 70.5 while using half the compute (source: God of Prompt). According to the summary, returns diminish sharply beyond two samples, with additional samples adding only about 4.2 AUROC points in math and about 2 in STEM and humanities for the hybrid. This implies major cost savings for high-stakes deployments such as medical, legal, and financial reasoning, where calibrated uncertainty is critical (source: God of Prompt, University of Tartu).
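
To make the reported quantities concrete, here is a minimal Python sketch of a K=2 hybrid confidence score and of the AUROC metric used to compare methods. The summary does not say how the study combines verbalized confidence with self-consistency, so the weighted average below, the `weight` parameter, and all function names are illustrative assumptions rather than the study's method. AUROC is returned on a 0 to 1 scale; multiply by 100 to match the figures quoted above.

```python
from collections import Counter


def self_consistency(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def hybrid_confidence(samples, weight=0.5):
    """Combine agreement and verbalized confidence into one score.

    samples: list of (answer, verbalized_confidence) pairs, confidence in [0, 1].
    ASSUMPTION: a weighted average of the agreement rate and the mean stated
    confidence of the samples backing the chosen answer. The source does not
    specify the combination rule.
    """
    answers = [answer for answer, _ in samples]
    top, agreement = self_consistency(answers)
    stated = [conf for answer, conf in samples if answer == top]
    verbal = sum(stated) / len(stated)
    return top, weight * agreement + (1 - weight) * verbal


def auroc(scores, labels):
    """Probability that a correct answer outscores an incorrect one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two samples (K=2) per question; answers and confidences are made-up data.
    agree = [("42", 0.9), ("42", 0.8)]
    disagree = [("42", 0.9), ("41", 0.6)]
    print(hybrid_confidence(agree))
    print(hybrid_confidence(disagree))
    # Confidence scores for four questions and whether each answer was correct.
    print(auroc([0.9, 0.4, 0.3, 0.6], [1, 1, 0, 0]))
```

With only two samples, the agreement rate can take just two values (1.0 or 0.5), so in this sketch the verbalized confidence is what separates questions within each group. That is one plausible reading of why the two signals might complement each other, not a claim made by the source.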
