{"@type":"StructuredNewsArticle","access":{"license":"neupai_standard","structured_data":"free","full_text_access":null,"full_text_available":false,"attribution_required":true},"content":{"claims":[{"id":"c1","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"A joint research team from Meta FAIR, Stanford University, and Harvard University published a ProgramBench benchmark paper in May 2026","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":true},{"id":"c2","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"The research team selected 200 core programs used in actual field environments, including FFmpeg, SQLite, and PHP interpreter, as tasks","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":false},{"id":"c3","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"They deployed 9 of the highest-performing language models currently available","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":false},{"id":"c4","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"The research team verified the AI models' code with a total of 248,853 operational tests","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":false},{"id":"c5","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"Not a single model was able to completely solve any of the 200 tasks","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":false},{"id":"c6","type":"fact","as_of":"2026-05","figures":null,"insight":null,"as_of_raw":"May 2026","statement":"Even the best-performing model only managed to pass 95% of tests on just 6 out of 200 tasks","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":false},{"id":"c7","type":"fact","as_of":"2024","figures":null,"insight":null,"as_of_raw":"2024","statement":"Richard Sutton, University of Alberta professor, is a 2024 ACM A.M. Turing Award recipient","comparison":null,"expiry_hint":null,"source_type":"company_disclosure","as_of_explicit":true},{"id":"c8","type":"fact","as_of":"2025-03","figures":null,"insight":null,"as_of_raw":"March 2025","statement":"Professor Sutton and Andrew Barto, Professor Emeritus at University of Massachusetts, were selected as recipients in March 2025","comparison":null,"expiry_hint":null,"source_type":"company_disclosure","as_of_explicit":true},{"id":"c9","type":"fact","as_of":"2019","figures":null,"insight":null,"as_of_raw":"2019","statement":"Professor Sutton wrote 'The Bitter Lesson' essay in 2019","comparison":null,"expiry_hint":null,"source_type":"research_paper","as_of_explicit":true}],"topics":["ai","programming","research","technology"],"summary":"In a ProgramBench test conducted by a joint research team from Meta, Stanford, and Harvard, none of the world's top 9 AI models achieved a 100% success rate on 200 programming tasks. Richard Sutton, founder of reinforcement learning, argued that LLMs are a dead end and that a new AI paradigm that interacts with the world is needed.","entities":[{"name":"Meta FAIR","type":"organization","metadata":{"parent":null,"ticker":null},"canonical_id":"org:us:meta-fair","role_in_article":"primary_subject"},{"name":"Stanford University","type":"organization","metadata":{"parent":null,"ticker":null},"canonical_id":"org:us:stanford-university","role_in_article":"primary_subject"},{"name":"Harvard University","type":"organization","metadata":{"parent":null,"ticker":null},"canonical_id":"org:us:harvard-university","role_in_article":"primary_subject"},{"name":"Claude Opus","type":"product","metadata":{"parent":null,"ticker":null},"canonical_id":"product:us:claude-opus","role_in_article":"mentioned"},{"name":"GPT","type":"product","metadata":{"parent":null,"ticker":null},"canonical_id":"product:us:gpt","role_in_article":"mentioned"},{"name":"Gemini Pro","type":"product","metadata":{"parent":null,"ticker":null},"canonical_id":"product:us:gemini-pro","role_in_article":"mentioned"},{"name":"Richard Sutton","type":"person","metadata":{"parent":null,"ticker":null},"canonical_id":"person:ca:richard-sutton","role_in_article":"quoted"},{"name":"University of Alberta","type":"organization","metadata":{"parent":null,"ticker":null},"canonical_id":"org:ca:university-of-alberta","role_in_article":"mentioned"},{"name":"Google DeepMind","type":"organization","metadata":{"parent":null,"ticker":null},"canonical_id":"org:us:google-deepmind","role_in_article":"mentioned"},{"name":"OpenAI","type":"company","metadata":{"parent":null,"ticker":null},"canonical_id":"corp:us:openai","role_in_article":"mentioned"},{"name":"Ilya Sutskever","type":"person","metadata":{"parent":null,"ticker":null},"canonical_id":"person:us:ilya-sutskever","role_in_article":"quoted"}],"headline":"World's Top 9 AI Models Take Test... Not One Completely Conquered 200 Tasks","geography":["US","KR"],"ai_emotional_context":{"arousal":0,"valence":0,"primary_emotions":[],"emotional_triggers":[],"secondary_emotions":[]}},"@context":"https://neupai.io/schema/v0.2","identity":{"ai_url":null,"author":"정재엽 기자","language":"en","publisher":{"name":"테크42","type":"online","domain":"www.tech42.co.kr"},"article_id":"tech42_20260511_ai-programming-benchmark-zero-success","updated_at":null,"originality":"self_produced","article_type":"analysis","published_at":"2026-05-11T23:09:14.000Z","canonical_url":"https://www.tech42.co.kr/%ec%84%b8%ea%b3%84-%ec%b5%9c%ea%b3%a0-ai-9%ec%a2%85-%ec%8b%9c%ed%97%98-%eb%b4%a4%eb%8d%94%eb%8b%88200%ea%b0%9c-%ea%b3%bc%ec%a0%9c-%ec%99%84%ec%a0%84-%ec%a0%95%eb%b3%b5-%eb%8b%a8-%ed%95%98/?utm_source=rss&utm_medium=rss&utm_campaign=%25ec%2584%25b8%25ea%25b3%2584-%25ec%25b5%259c%25ea%25b3%25a0-ai-9%25ec%25a2%2585-%25ec%258b%259c%25ed%2597%2598-%25eb%25b4%25a4%25eb%258d%2594%25eb%258b%2588200%25ea%25b0%259c-%25ea%25b3%25bc%25ec%25a0%259c-%25ec%2599%2584%25ec%25a0%2584-%25ec%25a0%2595%25eb%25b3%25b5-%25eb%258b%25a8-%25ed%2595%2598"},"temporal":{"freshness":"recent","next_update_expected":null},"provenance":{"source_chain":["primary_reporting"],"related_articles":[],"original_source_url":null}}