Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios
Jan 1, 2026 · YY Choong, K Greene, A Qian, M Marasli, Z Yang, S Chen, L Dabbish, ...
Type: Journal article
Publication: arXiv preprint arXiv:2605.07986
Last updated on Jan 1, 2026