Scraping all TikTok videos from a profile

Amber.Leuschke · January 11, 2020, 8:03pm

I’m trying to create a tool that downloads all videos from a given TikTok user’s page, e.g. https://www.tiktok.com/@levelsofpiano. Examining the HTML tree on the profile page revealed <a> tags that hold links to each video page.

I first tried to capture the page with wget https://www.tiktok.com/@levelsofpiano > Output.html. However, the resulting HTML didn’t contain any mention of @levelsofpiano.

I then decided to use TestCafe (a UI testing tool like Selenium) to load the page, wait 20 seconds, and capture the HTML output. However, the videos didn’t load:

import { Selector, ClientFunction } from 'testcafe';
import fs from 'fs';

let username = "levelsofpiano";
fixture `Get Dat Tiktok`.page("https://www.tiktok.com/@" + username);

function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)); }

/* got this definition from https://testcafe-discuss.devexpress.com/t/can-i-save-a-web-page-as-an-html-file/461 */
const getPageHTML = ClientFunction(() => document.documentElement.outerHTML);

test('Capture page with loaded elements', async t => {
    await sleep(20000); // wait 20 seconds for the page to load
    // fs.promises.writeFile returns a promise, so the await really waits for the write
    await fs.promises.writeFile('./' + username + '.html', await getPageHTML());
});

What else can I try to scrape all these videos? I’m likely going to need a way to scroll through the page to load all videos as well.

Buddy.White75 · January 12, 2020, 6:20am

You can try driving a headless browser with Puppeteer to scrape the videos and scroll through the page. Here’s an example script:

const puppeteer = require('puppeteer');
const fs = require('fs');

async function scrapeVideos(username) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`https://www.tiktok.com/@${username}`);

  // Scroll to the bottom of the page to load all videos
  await page.evaluate(async () => {
    await new Promise(resolve => {
      let totalHeight = 0;
      let distance = 100;
      let timer = setInterval(() => {
        let scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 100);
    });
  });

  // Get the URLs of all video pages. link.href is already absolute.
  // Assumes video page URLs contain "/video/"; the Set drops duplicates.
  const videoLinks = await page.evaluate(() => {
    const links = Array.from(document.querySelectorAll('a[href^="/@"]'));
    const hrefs = links.map(link => link.href).filter(href => href.includes('/video/'));
    return [...new Set(hrefs)];
  });

  // Download each video
  for (let i = 0; i < videoLinks.length; i++) {
    const videoPage = await browser.newPage();
    await videoPage.goto(videoLinks[i]);
    const videoUrl = await videoPage.evaluate(() => {
      const videoElement = document.querySelector('video');
      return videoElement ? videoElement.src : null;
    });
    if (videoUrl) {
      const fileName = `${username}_${i}.mp4`;
      // Puppeteer responses expose no stream to pipe; read the bytes with buffer().
      // This may fail for some media responses, so treat it as a starting point.
      const response = await videoPage.goto(videoUrl);
      fs.writeFileSync(fileName, await response.buffer());
      console.log(`Downloaded ${fileName}`);
    }
    await videoPage.close();
  }

  await browser.close();
}

scrapeVideos('levelsofpiano');

This script uses Puppeteer to open a headless browser, navigate to the TikTok user’s page, scroll to the bottom of the page to load all videos, and extract the URL of each video page. It then opens each video page, reads the src of the video element, fetches that file, and writes it to disk with Node.js’s fs module. The video files are saved with filenames in the format <username>_<index>.mp4.
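
If the link-extraction step picks up too much (profile links, other users, the same video twice), it could be tightened with a small pure helper. This is only a rough sketch: it assumes video pages live at https://www.tiktok.com/@<username>/video/<numeric id>, and the names collectVideoLinks and fileNameFor are made up for illustration, not part of Puppeteer or the script above.

```javascript
// Hypothetical helpers. Assumption: video page paths look like
// /@<username>/video/<numeric id>.

// Escape regex metacharacters so usernames containing "." match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Keep only this user's video pages, drop query strings, remove duplicates.
function collectVideoLinks(hrefs, username) {
  const pattern = new RegExp(`^/@${escapeRegExp(username)}/video/(\\d+)$`);
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    let url;
    try {
      // Relative hrefs are resolved against the site origin.
      url = new URL(href, 'https://www.tiktok.com');
    } catch {
      continue; // skip anything that cannot be parsed as a URL
    }
    const path = url.pathname.replace(/\/$/, ''); // ignore a trailing slash
    const match = pattern.exec(path);
    if (!match || seen.has(match[1])) continue;
    seen.add(match[1]);
    result.push({ id: match[1], url: `${url.origin}${path}` });
  }
  return result;
}

// Name files by video id rather than position, so reruns produce the same names.
function fileNameFor(username, videoId) {
  return `${username}_${videoId}.mp4`;
}
```

The array returned by the page.evaluate call could be passed through collectVideoLinks before the download loop, and fileNameFor could replace the index-based name if stable filenames matter.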