Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts in this group
A look at some of the current thinking around chunking data for retrieval-augmented generation (RAG) systems. https://stackoverflow.blog/2024/12/27/breaking-up-is-hard-to-do-chunking-in-rag-applicatio
During the holidays, we’re releasing some highlights from a year full of conversations with developers and technologists. Enjoy! We’ll see you in 2025. https://stackoverflow.blog/2024/12/24/balancing-
There’s no silver bullet for this type of ghost. https://stackoverflow.blog/2024/12/26/the-ghost-jobs-haunting-your-career-search/
Single individuals make less of a difference to the success or failure of a technology project than you might think (and that’s a good thing). https://stackoverflow.blog/2024/12/25/the-real-10x-devel
A developer’s journal is a place to define the problem you’re solving and record what you tried and what worked. https://stackoverflow.blog/2024/12/24/you-should-keep-a-developer-s-journal/
During the holidays, we’re releasing some highlights from a year full of conversations with developers and technologists. Enjoy! We’ll see you in 2025. https://stackoverflow.blog/2024/12/24/how-develo
Computer science deals with concurrency, but what about simultaneity? https://stackoverflow.blog/2024/12/23/can-a-programming-language-implement-time-travel/