We help companies hire & upskill developers. A customer recently asked: What % of HackerRank problems can LLMs solve? That got us thinking—how should hiring evolve when AI can translate natural language to code?
Our belief: AI will handle much of code generation, so developers will be assessed more on how well they apply software development lifecycle (SDLC) skills while working with AI assistants.
To explore this, we’re benchmarking LLMs on real-world software development scenarios, starting with 65 unseen problems across 10 domains. Beyond correctness, we evaluated consistency, an often-overlooked aspect of AI reliability. We’re open-sourcing the dataset on Hugging Face and expanding it to cover more domains, ambiguous specs, and harder challenges.
Would love the HN community’s take on this!
Comments URL: https://news.ycombinator.com/item?id=43015631
Points: 4
# Comments: 0