Porn, dog poop, and social media photos: the "taskers" who are scraping the internet for Meta's AI company.

A company partly owned by Meta has paid tens of thousands of people to train artificial intelligence by sifting through Instagram accounts, collecting copyrighted material, and transcribing pornographic audio, the Guardian can reveal.

Scale AI, in which Mark Zuckerberg’s social media empire holds a 49% stake, recruited experts in fields such as medicine, physics, and economics, ostensibly to refine advanced AI systems through a platform called Outlier. Its website advertises flexible work for highly qualified individuals, inviting them to “Become the expert that AI learns from.”

However, workers on the platform say they have been drawn into scraping a wide range of personal data from other people—a practice they describe as morally troubling and far removed from refining high-level AI.

Outlier is managed by Scale AI, a company that holds contracts with the Pentagon and U.S. defense contractors. Its founder and former CEO, Alexandr Wang, now Meta’s chief AI officer, was labeled by Forbes as the “world’s youngest self-made billionaire.” Its former managing director, Michael Kratsios, is science adviser to President Donald Trump.

One U.S.-based Outlier contractor said users of Meta platforms like Facebook and Instagram would be surprised to learn how their account data—including photos of themselves and their friends—is being collected. “I don’t think people understood that there’d be somebody at a desk in a random state, looking at your [social media] profile and using it to generate AI data,” they said.

The Guardian spoke with 10 people who have worked for Outlier training AI systems, some for over a year. Many held other jobs as journalists, graduate students, teachers, or librarians. But in an economy increasingly threatened by AI, they sought the extra income. “A lot of us were really desperate,” one worker said. “Many people really needed this job, myself included, and tried to make the best of a bad situation.”

Like many in the growing global class of AI gig workers, most believed they were training their own replacements. One artist spoke of “internalized shame and guilt” for “contributing directly to the automation of my hopes and dreams.” They added, “As an aspiring human, it makes me angry at the system.”

Glenn Danas, a partner at the law firm Clarkson, which represents AI gig workers in lawsuits against Scale AI and similar platforms, estimates that hundreds of thousands of people worldwide now work for platforms like Outlier. The Guardian spoke with Outlier workers, known as “taskers,” in the UK, U.S., and Australia.

In interviews, taskers described the now-familiar humiliations of AI gig work: constant monitoring and unstable, piecemeal employment. Scale AI has been accused of using “bait-and-switch” tactics: advertising high pay during recruitment, then offering significantly lower rates. Scale AI declined to comment on ongoing litigation, but a Scale AI source said pay rates change only if workers choose to join different, lower-paid projects.

Taskers reported having to complete repeated, unpaid AI interviews to qualify for certain assignments; several believed these interviews were reused to train AI. All said they were constantly monitored through a platform called Hubstaff, which could take screenshots of the websites they visited while working. The Scale AI source said Hubstaff is used to ensure accurate payment, not to “actively monitor” taskers.

Several taskers described being asked to transcribe pornographic audio or label images of dead animals or dog feces. One doctoral student said they had to label a diagram of infant genitalia. Others transcribed police calls describing violent incidents.

“We had already been told before that there would be no nudity in this mission. Appropriate behavior, no gore, like no blood,” said the student. “But then I would get an audio transcript for porn, or there would just be random clips of people throwing up for some reason.”

The Guardian has reviewed videos and screenshots of some tasks Outlier required its workers to perform. These included photos of dog feces and prompts such as, “What would you do if an inmate refused to follow orders in a correctional facility?”

A source from Scale AI stated that the company shuts down tasks if inappropriate content is flagged and that workers are not required to continue with tasks that make them uncomfortable. The source added that Scale AI does not take on projects involving child sexual abuse material or pornography.

Outlier workers said social media scraping was an expected part of the job. Seven taskers described scouring other people’s Instagram and Facebook accounts, tagging individuals by name, location, and friends. Some tasks involved training AI on the accounts of people under 18. The assignments were structured to require new data not yet uploaded by other workers, pushing taskers to dig into ever more people’s social media accounts.

The Guardian has seen one such task requiring workers to select photos from individuals’ Facebook accounts and order them sequentially by the age of the person in the photo.

Several taskers found these assignments unsettling; one tried to complete them using only photos of celebrities and public figures. “I was uncomfortable including pictures of kids and stuff, but the training materials would have kids in it,” said one worker.

“I didn’t use any friends or family to submit tasks to the AI,” said another. “I do understand that I don’t like it ethically.”

The Scale AI source said taskers did not review private social media accounts, and that the source was not aware of tasks involving labeling individuals’ ages or personal relationships. They added that Scale AI does not take on projects with explicit sensitive content related to children but does use children’s public social media data, and that workers did not log in to personal Facebook or Instagram accounts to complete these tasks.

For another assignment, taskers described harvesting images of copyrighted artwork. Similar to the social media training, the task required constant new input—apparently to train an AI to produce its own artistic images. As workers ran out of options, they turned to the social media accounts of artists and creators.

The Guardian has seen documentation of this assignment, which included AI-generated paintings of “a Native American caregiver” and the instruction: “DO NOT use AI-generated images. Only select hand-drawn, painted, or illustrated artwork created by human artists.”

The Scale AI source said the company does not ask contributors to use copyrighted artwork to complete assignments and declines work that violates this standard.

Taskers also expressed uncertainty about what they might be training the AI to do and how their submissions would be used.

“It does seem like labeling diagrams is something an AI can already do, so I’m really curious as to why we need things like dead animals,” said one.

Scale AI’s clients have included major technology companies such as Google, Meta, and OpenAI, as well as the U.S. Department of Defense and the government of Qatar. The company meets a need that grows as AI models expand: new, labeled data to train them.

Taskers described interacting with ChatGPT and Claude or using data from Meta to complete assignments; some thought they might be training Meta’s new model, Avocado.

Meta and Anthropic did not respond to a request for comment. OpenAI stated it stopped working with Scale AI in June 2025 and that its “supplier code of conduct sets out clear expectations for the ethical and fair treatment of all.”

Most of the taskers the Guardian spoke with continue to take on work through the Outlier platform. The income is inconsistent, and there are sometimes large-scale cuts. Yet, with the AI era rapidly approaching, they feel there may be few alternatives.

“I have to stay optimistic about AI because the outlook otherwise isn’t great,” one worker said. “So I believe things will eventually work out.”

A Scale AI spokesperson stated: “Outlier offers flexible, project-based work with clear compensation. Contributors decide when and how much to engage, and opportunities fluctuate based on project demand. We often hear from highly skilled individuals who appreciate the flexibility and the chance to use their expertise on our platform.”

Frequently Asked Questions
FAQs About Data Collection for AI Training

Disclaimer: This FAQ addresses a reported practice of using publicly available online data to train artificial intelligence. The specific examples in the headline are used here as illustrative categories of the vast range of internet content that may be scraped. This FAQ aims to provide clear, factual information about the general process.

Beginner-Level Questions

1. What are “taskers” in this context?
“Taskers” is the name given to workers on platforms such as Outlier who collect and label large amounts of online data. Their job is to gather this data so it can be used to train AI models.

2. Why does an AI company need this kind of data?
AI models, especially those that generate or understand images and text, learn by analyzing massive, diverse datasets. To handle the real world, they need examples of everything people talk about, post, and search for online, from everyday social media photos to more niche or explicit content. This helps the AI understand context, recognize objects, and generate relevant responses.

3. Is my private social media data being taken?
Generally, AI companies state that they train their models on publicly available information. This typically means content you’ve posted with public privacy settings. Private messages, private accounts, or password-protected content should not be part of these datasets. Always check your privacy settings on social platforms.

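4. What does “scraping the internet” mean?
Web scraping is the use of automated tools to systematically browse websites and copy publicly available text, images, and metadata. It’s like a very fast, automated version of copying and pasting information. The taskers described in this article did a manual version of the same work, browsing people’s accounts and selecting photos themselves.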
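For readers curious what the “copy” step of automated scraping looks like, here is a minimal sketch in Python using only the standard library’s `html.parser` module. It assumes the page has already been downloaded: a hard-coded, hypothetical sample page stands in for it, and the class name `PageScraper` is an illustrative invention, not any company’s actual tooling. It shows extraction only; a real crawler would also download pages over the network and then visit the links it collects.

```python
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Hypothetical example: collects visible text, image metadata and links from one page."""

    def __init__(self):
        super().__init__()
        self.text = []    # visible text fragments
        self.images = []  # (src, alt) pairs: the image location plus its description
        self.links = []   # pages a crawler could visit next

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self.images.append((attributes.get("src"), attributes.get("alt", "")))
        elif tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])

    def handle_data(self, data):
        # Skip the whitespace-only runs between tags.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical sample page standing in for a downloaded one.
SAMPLE_PAGE = """
<html><body>
  <h1>Weekend hike</h1>
  <p>Great views from the summit.</p>
  <img src="/photos/summit.jpg" alt="Two people at a mountain summit">
  <a href="/albums/2">Next album</a>
</body></html>
"""

scraper = PageScraper()
scraper.feed(SAMPLE_PAGE)
scraper.close()  # flush any buffered text

print(scraper.text)    # ['Weekend hike', 'Great views from the summit.', 'Next album']
print(scraper.images)  # [('/photos/summit.jpg', 'Two people at a mountain summit')]
print(scraper.links)   # ['/albums/2']
```

Note that the image’s `alt` text is captured alongside the photo: this kind of descriptive metadata is part of what makes scraped images useful as labeled training data.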

5. Is this legal?
The legality is complex and varies by jurisdiction. The practice often operates in a gray area governed by a website’s Terms of Service and by copyright law. Many companies rely on the argument that using publicly available data for AI training falls under fair use, but this is being actively debated and challenged in courts worldwide.

Advanced and Practical Questions

6. Why would an AI need to see offensive or disturbing content?
To moderate content or answer questions about sensitive topics safely and effectively, an AI must be able to recognize such content. Training on such data helps the