
Study: TikTok tops list of most 'scraped' websites for AI training

Zach Russell headshot
TikTok has over 1.5 billion active users worldwide.

Video-first platforms are now the most “scraped” websites for consumer data.

According to a new analysis from public data access platform Decodo, TikTok now tops the list of the most-scraped websites for companies looking for videos, images and audio to train their AI solutions. The video platform was not in the top 10 last year, and scraping of it has since increased 321%. Pointing to TikTok's more than 1.5 billion active users and its unique algorithm-driven discovery system, Decodo says this change reflects the AI industry's appetite for “short-form video content and cultural trend analysis” to train next-generation multimodal models.

Similarly, YouTube, which was also not ranked in the top 10 last year, came in at number four after a 240% year-over-year increase in scraping. With over 500 hours of content uploaded every minute and 2.7 billion monthly users, Decodo noted, the platform has become a go-to source for businesses building smarter AI systems.

“The combined scraping activity from YouTube, TikTok, and other video platforms now represents over a third of all scraping requests,” said Decodo. “This surge is driven by the demand for multimodal training data, where video, audio, and text are collected together. These platforms also provide real-time signals on consumer behavior, trends, and product sentiment, making them invaluable for both AI development and market insights.”

Decodo 2025 most scraped websites

Google (#2), Amazon (#3) and Walmart (#5) rounded out the top five. ScienceDirect, Crunchbase, Coupang and Airbnb also entered the top 10 this year, while TripAdvisor, Craigslist, Bing, Shopify, Lazada and Zillow dropped out.

By category, video and social media platforms represented a plurality of the most-scraped websites at 38%. Search engines followed at 24%, with e-commerce platforms right behind at 22%. Professional and academic sources (8%), travel and hospitality sites (5%), and miscellaneous websites and specialized platforms (3%) made up the remainder.

“We're seeing a clear move toward websites that have lots of different types of content instead of just basic info,” said Vaidotas Juknys, head of commerce at Decodo. “The biggest reason for this shift is that everyone needs tons of varied, good-quality data to train AI chatbots, language models, and other smart tools. Companies operating in various industries are also realizing that the best insights come from mixing different kinds of content together – videos, text, images and how people interact with certain platforms.”
