How to Transcribe YouTube Videos and Generate Subtitles Using Node.js - Blockchain.News

How to Transcribe YouTube Videos and Generate Subtitles Using Node.js

Learn to transcribe YouTube videos and generate SRT subtitles with Node.js and AssemblyAI in this comprehensive guide.


  • Jun 25, 2024 16:28

A recent tutorial by AssemblyAI shows developers how to transcribe YouTube videos and generate SRT subtitles using Node.js. Beyond transcription and subtitle generation, the guide demonstrates how to use the LeMUR framework to prompt a Large Language Model (LLM) about a video's content.

Step 1: Set Up Your Development Environment

To get started, developers need Node.js 18 or higher. After creating the project directory and initializing a new Node.js project, the package.json file should be configured to use ES Module syntax by setting "type" to "module".

Next, install the necessary NPM modules:

  • assemblyai: The AssemblyAI JavaScript SDK, used to interact with the AssemblyAI API.
  • youtube-dl-exec: A wrapper for the yt-dlp CLI tool to retrieve and download YouTube video information.
  • tsx: Allows execution of TypeScript code without additional setup.

Additionally, Python 3.7 or above is required for youtube-dl-exec. An AssemblyAI API key is also necessary, which can be configured as an environment variable on the machine.
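
As a minimal sketch of the environment-variable approach, the helper below reads a required setting and fails early with a clear message when it is missing. The helper name requireEnv and the variable name ASSEMBLYAI_API_KEY are assumptions made for illustration; the article does not name either.

```typescript
// Hypothetical helper: returns a required environment variable or throws.
// The environment is passed in explicitly so the function is easy to test.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage (the variable name is an assumption):
// const apiKey = requireEnv("ASSEMBLYAI_API_KEY", process.env);
```

Failing at startup keeps a missing key from surfacing later as a less obvious authentication error.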

Step 2: Retrieve the Audio of a YouTube Video

To transcribe a video, a public URL to the audio track is needed. YouTube stores audio and video separately, and the youtube-dl-exec module can retrieve this information. The following script retrieves the audio URL from a YouTube video:

import { youtubeDl } from "youtube-dl-exec";

const youtubeVideoUrl = "https://www.youtube.com/watch?v=wtolixa9XTg";

console.log("Retrieving audio URL from YouTube video");
// Request the video's metadata as JSON instead of downloading the media.
const videoInfo = await youtubeDl(youtubeVideoUrl, {
  dumpSingleJson: true,
  preferFreeFormats: true,
  addHeader: ["referer:youtube.com", "user-agent:googlebot"],
});

// Search from the end of the format list for an audio-only m4a track.
// Copy the array first, because reverse() mutates the array it is called on.
const audioUrl = [...videoInfo.formats].reverse().find(
  (format) => format.resolution === "audio only" && format.ext === "m4a",
)?.url;

// Stop early if the video exposes no matching audio-only format.
if (!audioUrl) {
  throw new Error("No audio only format found");
}
console.log("Audio URL retrieved successfully");
console.log("Audio URL:", audioUrl);

With the audio URL, the audio can be transcribed using AssemblyAI.
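
The article does not show the transcription call itself. The following is a minimal sketch, assuming the SDK's transcripts.transcribe method accepts an object with an audio URL and returns a transcript whose status is "error" on failure. The TranscriptLike and TranscriptionClient interfaces and the transcribeOrThrow helper are hypothetical names introduced here so the logic can be exercised with a stand-in client.

```typescript
// Minimal shape of the fields used below. The transcript returned by the
// "assemblyai" package is assumed to include them; this type is illustrative.
interface TranscriptLike {
  id: string;
  status: string;
  text?: string | null;
  error?: string;
}

// Minimal shape of the client. The real AssemblyAI client is assumed to
// satisfy it.
interface TranscriptionClient {
  transcripts: {
    transcribe(params: { audio: string }): Promise<TranscriptLike>;
  };
}

// Hypothetical helper: submits the audio URL and throws if transcription fails.
async function transcribeOrThrow(
  client: TranscriptionClient,
  audioUrl: string,
): Promise<TranscriptLike> {
  const transcript = await client.transcripts.transcribe({ audio: audioUrl });
  if (transcript.status === "error") {
    throw new Error("Transcription failed: " + transcript.error);
  }
  return transcript;
}

// Example usage with the real SDK (assumed, not run here):
// import { AssemblyAI } from "assemblyai";
// const aaiClient = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! });
// const transcript = await transcribeOrThrow(aaiClient, audioUrl);
```

The aaiClient and transcript names in the usage comment match the variables that the later steps rely on.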

Step 3: Save the Transcript and Subtitles

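Once the transcription is complete, the transcript text can be saved to a file. The following code assumes that aaiClient (the AssemblyAI client) and transcript (the completed transcript) already exist from the transcription step; it saves the transcript and generates SRT subtitles:

import { writeFile } from "fs/promises";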

console.log("Saving transcript to file");
await writeFile("./transcript.txt", transcript.text!);
console.log("Transcript saved to file transcript.txt");

console.log("Retrieving transcript as SRT subtitles");
const subtitles = await aaiClient.transcripts.subtitles(transcript.id, "srt");
await writeFile("./subtitles.srt", subtitles);
console.log("Subtitles saved to file subtitles.srt");

To generate WebVTT subtitles, simply replace "srt" with "vtt" and save the file with the .vtt extension.
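
To illustrate how the two formats differ, the sketch below formats a millisecond offset the way each format writes cue timestamps: SRT separates milliseconds with a comma, WebVTT with a period. The helper name formatTimestamp is hypothetical. AssemblyAI generates the subtitle files itself, so this is useful only for understanding or post-processing the output.

```typescript
type SubtitleFormat = "srt" | "vtt";

// Hypothetical helper: renders a millisecond offset as HH:MM:SS,mmm (SRT)
// or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(totalMs: number, format: SubtitleFormat): string {
  const ms = Math.floor(totalMs % 1000);
  const totalSeconds = Math.floor(totalMs / 1000);
  const seconds = totalSeconds % 60;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const hours = Math.floor(totalSeconds / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const separator = format === "srt" ? "," : ".";
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(ms, 3)}`;
}

console.log(formatTimestamp(3_723_004, "srt")); // 01:02:03,004
console.log(formatTimestamp(3_723_004, "vtt")); // 01:02:03.004
```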

Step 4: Run the Script

To execute the script, use the following command:

npx tsx index.ts

The transcript text and subtitles are saved to disk. How long the script takes depends on the length of the YouTube video.

Bonus: Prompt a YouTube Video Using LeMUR

AssemblyAI’s LeMUR framework allows developers to build generative AI features. By writing prompts for the LLM, developers can generate responses based on the transcript. For instance, a prompt to summarize the video using bullet points can be implemented as follows:

console.log("Prompting LeMUR to summarize the video");

const prompt = "Summarize this video using bullet points";
const lemurResponse = await aaiClient.lemur.task({
  transcript_ids: [transcript.id],
  prompt,
  final_model: "default"
});
console.log(prompt + ": " + lemurResponse.response);

For further customization, various supported models are listed in the LeMUR documentation.

Next Steps

In this tutorial, developers learned to retrieve audio from a YouTube video, transcribe the audio, generate subtitles, and summarize the video using LeMUR. For more capabilities, check out AssemblyAI’s Audio Intelligence models and LeMUR.
