Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
New research from the Data Provenance Initiative has found a dramatic drop in content made available to the collections used to build artificial intelligence. By Kevin Roose, reporting from San ...