Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts in this group

An update on recent launches and the upcoming roadmap https://stackoverflow.blog/2025/04/23/community-products-roadmap-update-april-2025/

Ryan chats with Dataiku CEO and cofounder Florian Douetteau about the complexities of the genAI data stack and how his company is orchestrating it. https://stackoverflow.blog/2025/04/22/visually-orch

On today’s episode, Ben and Ryan chat with Laly Bar-Ilan, Chief Scientist at Bit. https://stackoverflow.blog/2025/04/18/generating-components-not-tokens/

Is “agentic AI” just a buzzword, or is it the sea change it seems? https://stackoverflow.blog/2025/04/17/wait-what-is-agentic-ai/

Kyle chats with Jesse Tomchak, a software engineer at ClickUp, about all the spicy backend takes they could find. https://stackoverflow.blog/2025/04/09/wbit-6-be-curious-ask-questions-and-don-t-argue-w

AI is not a linear process. To scale effectively, engineering leaders must account for varied edge cases, presenting a new set of challenges. https://stackoverflow.blog/2025/04/16/engineering-teams-ne

Kyle interviews Michael Stum, a former Stacker who started out answering questions on the community site and has since returned to it. https://stackoverflow.blog/2025/04/16/wbit-7-exploring-webassembly-with-the-fir