Can We Trust AI Benchmarks? A Review of Current Issues in AI Evaluation