Article by Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant in the New York Times, April 6, 2024

The tech giants’ greed for data: How AI models are being trained at the expense of ethics.

The New York Times article examines the ways in which major tech companies, notably OpenAI, Google and Meta, have crossed ethical and legal boundaries to collect the vast amounts of data needed to train their artificial intelligence systems. As AI technology has advanced, the need for large, diverse data sets has become critical, prompting companies to adopt controversial practices. These include ignoring company guidelines, bending copyright laws and exploiting loopholes in user agreements.

OpenAI developed a tool for transcribing YouTube videos to collect conversational text for training its AI models, despite potential conflicts with YouTube’s rules. Google similarly used transcripts of YouTube videos to train its AI, without clear authorization. Meta discussed acquiring large publishers such as Simon & Schuster in order to avoid lengthy licensing negotiations for data access.

These practices highlight a growing tension between the relentless pursuit of data to fuel AI advancements and the ethical implications of such data acquisition methods.