Chapter 2 References — AI Capability Curve

Curated resources for deeper exploration of the topics in this chapter.

Books

  • Kurzweil, Ray. (2005). The Singularity Is Near: When Humans Transcend Biology. Viking. Popularizes the idea of exponential technological growth, using Moore's Law as an analogy; foundational for understanding why AI capability curves appear surprising.

  • Brynjolfsson, Erik, and Andrew McAfee. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton. Uses the "second half of the chessboard" metaphor to explain why exponential AI progress catches institutions off guard, which bears directly on how educators should interpret benchmark data.

Articles and Reports

  • METR. (2025). "Measuring AI Ability to Complete Long Tasks." arXiv. https://arxiv.org/abs/2503.14499 The primary source for the METR task-horizon doubling finding referenced throughout this chapter; describes how agent autonomy is measured over time.

  • Hendrycks, Dan, et al. (2020). "Measuring Massive Multitask Language Understanding." arXiv. https://arxiv.org/abs/2009.03300 Introduces the MMLU benchmark that has become a standard measure of AI capability across academic subjects, helping readers understand how AI progress is tracked.

  • Epoch AI. (2024). "Machine Learning Model Performance." https://epochai.org/trends Tracks trends in AI compute, training data, and benchmark scores over time; essential context for understanding the capability curve this chapter describes.

  • McKinsey Global Institute. (2023). "The Economic Potential of Generative AI." McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai Quantifies AI cost trends and productivity implications, supporting the chapter's discussion of AI costs falling rapidly alongside rising capabilities.

  • Sevilla, Jaime, et al. (2022). "Compute Trends Across Three Eras of Machine Learning." arXiv. https://arxiv.org/abs/2202.05924 Documents the doubling timescales for AI training compute, giving educators data to evaluate the Moore's Law analogy.

Online Resources

  • Our World in Data. (2024). AI Progress. https://ourworldindata.org/artificial-intelligence Provides free, openly licensed charts showing AI benchmark progress over time, ideal for school board presentations.

  • Stanford HAI. (2024). AI Index Report 2024. https://aiindex.stanford.edu/report/ A comprehensive annual report tracking AI capabilities, costs, and adoption across sectors, including education.

  • EleutherAI. (2024). Language Model Evaluation Harness. https://github.com/EleutherAI/lm-evaluation-harness The open-source framework behind many public benchmark comparisons; helps readers understand how benchmark saturation is measured.

Videos

  • Two Minute Papers. (2023). "GPT-4 Is Impressive, But What Comes Next?" YouTube. https://www.youtube.com/c/K%C3%A1rolyZsolnai-Feh%C3%A9r Accessible summaries of recent AI capability jumps that illustrate the pace of change described in this chapter.