Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts from this group

AI is changing how we think about coding. While tools evolve, critical thinking, problem-solving, and creativity remain the essential skills for top developers. https://stackoverflow.blog/2025/04/04/u

At HumanX 2025, Ryan sat down with HumanX CEO Stefan Weitz and Crunchbase CEO Jager McConnell to talk about where the money is in the AI space, where most enterprise AI strategies fall short, how comp

Data has always been key to LLM success, but it's becoming key to inference-time performance as well. https://stackoverflow.blog/2025/04/03/from-training-to-inference-the-new-role-of-web-data-in-llms

Efficiently solving a complex scheduling problem using simulated annealing. https://stackoverflow.blog/2025/04/02/not-all-ai-is-generative-efficient-scheduling-with-mathematics/

Two interviews for the price of one, direct from HumanX 2025! Ryan sits down with Raj Patel, AI transformation lead at Holistic AI, and then chats with Audioshake cofounder and CEO Jessica Powell. htt

Deepak Singh, VP of Developer Agents and Experiences at AWS, helps Ryan break down the hype around agentic AI in software development. They cover the definition and real-world functionality of AI agen

In this episode of Leaders of Code, host Ben Popper, Stack Overflow CEO Prashanth Chandrasekar, and GitLab Field CTO Lee Faus explore how GenAI is reshaping software development practices. https://sta