Privacy Concerns Raised: Report Identifies Children’s Photos Used to Train AI

  • By Farrukh Mushtaq

    Farrukh Mushtaq, a digital marketer at PureSquare, possesses a keen interest in cybersecurity and enjoys writing about it. With several years of experience in the digital marketing industry, he brings expertise and passion to his work.
  • 5 July 2024
  • 6 min read

A recent investigation has sparked new concerns about children's online privacy. 

The research finds that photos of children shared online, including some protected by privacy settings, are being collected and used to train artificial intelligence (AI) systems. 

This raises serious ethical concerns about the use of personal data, especially since young children cannot give consent.

LAION-5B Faces Inquiry For Using Scraped Web Images

According to Human Rights Watch, a human rights advocacy group, even images posted to sites with strong privacy settings are being scraped from the internet and included in a large dataset used to train popular AI systems.

Hye Jung Han, a Human Rights Watch researcher, studied a small portion of the LAION-5B dataset, which is a publicly available collection of 5.85 billion multilingual image-text pairs. 

The data comes from web archives compiled by Common Crawl, a San Francisco-based nonprofit that openly distributes copies of data scraped from the internet for research and analysis.

LAION-5B is widely used by AI developers to train their models.

Dataset Includes Personal Photos, Many Not Publicly Available

Some of the photo links in the LAION-5B dataset pointed to personal blogs, school websites, and photographers hired by families to take personal photos. According to the research, some photographs had been uploaded as much as a decade before LAION-5B was created.

Many of the images did not appear to be reachable through an online search or through the public versions of the websites they came from. The research concludes that the dataset bypasses the privacy safeguards that the people who posted them had put in place.

How Many Identifiable Images Were Found in AI Training Data?

Han said she found 190 identifiable photographs of Australian children, including some from the country's Indigenous communities, while reviewing less than 0.0001% of the LAION-5B dataset (fewer than about 5,850 of its 5.85 billion entries). Last month, she found 170 photographs of Brazilian children in the same dataset.

LAION-5B does not contain the photographs themselves, only links to where they are hosted and the accompanying captions. However, Han noted that some of the URLs in the collection included children's names and other personal details, making it easier to trace their identities.

Human Rights Watch described one example in a press release about Han's research:

One such photo features two boys, ages 3 and 4, grinning from ear to ear as they hold paint brushes in front of a colorful mural. The accompanying caption reveals both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia.

Ensuring Children's Privacy in the Age of AI is the Biggest Concern!

The fact that AI development may be compromising children's privacy, even with precautions in place, necessitates immediate action. 

Finding a balance between technological innovation and protecting vulnerable people requires coordination among tech companies, rights organizations, and legislators. Only then can we ensure that data is used responsibly and that children's online presence is protected.

And if you are concerned about cybercriminals collecting your publicly available information from multiple platforms, consider using an all-in-one privacy solution. Get PurePrivacy to protect your personal information and stop data collection by invisible online trackers.