Can We Trust AI Benchmarks? A Review of Current Issues in AI Evaluation

Article URL: https://arxiv.org/abs/2502.06559

Comments URL: https://news.ycombinator.com/item?id=43057968

Points: 12

# Comments: 1

Created 1mo | Feb 15, 2025, 2:20:06 PM

Other posts in this group

Sell yourself, sell your work

Article URL: https://www.solipsys.co.uk/new/SellYourselfSellYourWork.html?yc25hn

Comments URL:

Mar 26, 2025, 3:30:12 AM | Hacker news

Better Shell History Search

Article URL: https://tratt.net/laurie/blog/2025/better_shell_history_search.html

Comments URL:

Mar 26, 2025, 3:30:12 AM | Hacker news

You might want to stop running atop

Article URL: https://rachelbythebay.com/w/2025/03/25/atop/

Comments URL:

Mar 26, 2025, 3:30:09 AM | Hacker news

OpenSMTPD and Maildrop working in concert

Article URL: http://blog.whenhen.com/posts/opensmtpd-and-maildrop-working-in-concert.html

Comments URL:

Mar 26, 2025, 3:30:07 AM | Hacker news

CIA Director Reveals Signal Comes Installed on Agency Computers

Article URL: https://theintercept.com/2025/03/25/signal-chat-encryption-hegseth-cia/

Comments URL:

Mar 26, 2025, 3:30:06 AM | Hacker news

The surreal joy of having an overprovisioned homelab

Article URL: https://xeiaso.net/talks/2025/surreal-joy-homelab/

Comments URL:

Mar 26, 2025, 3:30:04 AM | Hacker news

Optimizing ML Training with Metagradient Descent

Article URL: https://arxiv.org/abs/2503.13751

Comments URL: https://news.ycombinator.c

Mar 25, 2025, 10:40:19 PM | Hacker news
