ZEN INVESTING
OpenAI Abandons SWE-bench Verified After Finding 59% of Failed Tests Were Flawed
OpenAI reveals major contamination issues in the SWE-bench Verified benchmark, showing that frontier AI models memorized solutions and that flawed tests rejected correct code.
Harvey AI Launches Global Legal Benchmark for UK, Australia, Spain
Harvey's BigLaw Bench Global doubles the benchmark's size, testing AI legal capabilities across jurisdictions as model scores hit 90% on core tasks.
