We help companies hire & upskill developers. A customer recently asked: What % of HackerRank problems can LLMs solve? That got us thinking: how should hiring evolve when AI can translate natural language to code?
Our belief: AI will handle much of code generation, so developers will be assessed more on how well they work through the software development lifecycle (SDLC) with AI assistants.
To explore this, we’re benchmarking LLMs on real-world software dev scenarios, starting with 65 unseen problems across 10 domains. Beyond correctness, we evaluated consistency, an often-overlooked aspect of AI reliability. We’re open-sourcing the dataset on Hugging Face and expanding it to cover more domains, ambiguous specs, and harder challenges.
Would love the HN community’s take on this!
Comments URL: https://news.ycombinator.com/item?id=43015631
Points: 4
Comments: 0