Convert srt to text with regex in JavaScript

This tutorial shows how to convert SRT to text using JavaScript. The SRT format, which stands for SubRip Subtitle, is commonly used to store subtitles or captions for videos.

You can convert SRT to text in JavaScript in several ways, either with regular expressions or with npm modules. We’ll cover the following approaches:

  • Using the replace() method
  • Using the srt-to-vtt module
  • Using the srt-to-txt module
  • Using the match() method

What Is an SRT File?

A popular file format for storing subtitles is called SubRip Text (SRT), and it’s frequently used to provide closed captions for videos. You might need to transform the SRT data into plain text if you’re using SRT files in a JavaScript project.

Understanding the SRT File Format

Let’s quickly understand the structure of SRT files. Each subtitle in an SRT file consists of three parts:

  1. Subtitle index number.
  2. Timecodes indicating when the subtitle should appear and disappear on the screen.
  3. The text content of the subtitle.

A typical SRT file looks like this:

1
00:00:10,000 --> 00:00:15,000
Hello, this is the first subtitle.

2
00:00:20,000 --> 00:00:25,000
And this is the second subtitle.

As the file above shows, each subtitle entry starts with an index number, followed by the timecodes (start time and end time) in the format hh:mm:ss,mmm (hours, minutes, seconds, and milliseconds), and finally the text content of the subtitle. A blank line separates one entry from the next.
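To make the three-part structure concrete, here is a minimal sketch that splits SRT content into entries. The helper name parseSrt is hypothetical, and the sketch assumes well-formed input in which every entry has an index line, a timecode line, and at least one text line.

```javascript
// Parse SRT content into an array of { index, start, end, text } objects.
// parseSrt is a hypothetical helper name used only for this illustration.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, '\n') // normalize Windows line endings
    .trim()
    .split(/\n{2,}/) // entries are separated by blank lines
    .map((block) => {
      const [indexLine, timeLine, ...textLines] = block.split('\n');
      const [start, end] = timeLine.split(' --> ');
      return {
        index: Number(indexLine),
        start,
        end,
        text: textLines.join('\n'), // subtitle text may span several lines
      };
    });
}

// The sample file from this section
const sampleSrt = [
  '1',
  '00:00:10,000 --> 00:00:15,000',
  'Hello, this is the first subtitle.',
  '',
  '2',
  '00:00:20,000 --> 00:00:25,000',
  'And this is the second subtitle.',
].join('\n');

const entries = parseSrt(sampleSrt);
console.log(entries[0].text); // "Hello, this is the first subtitle."
console.log(entries[1].start); // "00:00:20,000"
```

Keeping the timecodes alongside the text is optional for plain-text extraction, but it shows where each piece of an entry lives.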

Convert srt to Text Using JavaScript

You may need to extract the text content from SRT files for further processing or analysis. Let’s demonstrate how to convert srt to text using regular expressions in JavaScript.

Convert SRT to Text using replace method

We need to extract only the text content from each subtitle entry. We can achieve this using JavaScript and Regular Expressions.

Step 1: Read the SRT File

We need to read the contents of the SRT file. In a browser environment, you can use the File API or fetch API to read the file. In a Node.js environment, you can use the built-in fs module to read the file.

// In a browser environment
const fileInput = document.getElementById('fileInput');

fileInput.addEventListener('change', (event) => {
  const file = event.target.files[0];
  const reader = new FileReader();

  reader.onload = (e) => {
    const srtContent = e.target.result;
    // Call the function to convert the SRT to text here
  };

  reader.readAsText(file);
});
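For the Node.js case mentioned above, reading the file could look like the following sketch. The helper name readSrtFile is hypothetical. Stripping a leading byte order mark is an extra precaution added here, since some editors write one and it would otherwise sit in front of the first index number.

```javascript
const fs = require('fs');

// readSrtFile is a hypothetical helper name; the caller supplies the path.
function readSrtFile(filePath) {
  // Read the whole file as a UTF-8 string
  const content = fs.readFileSync(filePath, 'utf8');
  // Remove a leading byte order mark (BOM) if one is present
  return content.replace(/^\uFEFF/, '');
}

// Usage (the path is a placeholder):
// const srtContent = readSrtFile('path/to/input.srt');
```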

Step 2: Extract Text Using Regular Expressions

We will use the replace() method with a regular expression to strip everything except the text content from the SRT file. We’ll define a function that takes the SRT content as input and returns the extracted text.

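function convertSrtToText(srt) {
  // Remove each index line and the timecode line that follows it.
  // \r?\n also matches the Windows (CRLF) line endings found in many SRT files.
  return srt.replace(/^\d+\r?\n[\d:,]+ --> [\d:,]+\r?\n/gm, '');
}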

The convertSrtToText() function above uses a regular expression to remove the index numbers and timecodes from the SRT content. It returns what remains: the subtitle text, with the blank lines between entries still in place.

How to Call the Function

We need to pass the SRT file content as a parameter, as in the following example:

var srt = "file content ";
var text = convertSrtToText(srt);
console.log(text);
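As a worked example, the sketch below applies the same replace() approach to the sample file from earlier. The name srtToPlainText is hypothetical. This variant also normalizes line endings and collapses the blank lines between entries, which is an addition to the function shown above.

```javascript
// srtToPlainText is a hypothetical variant of the replace() approach.
function srtToPlainText(srt) {
  return srt
    .replace(/\r\n/g, '\n') // normalize Windows line endings
    .replace(/^\d+\n[\d:,]+ --> [\d:,]+\n/gm, '') // drop index and timecode lines
    .replace(/\n{2,}/g, '\n') // collapse blank lines between entries
    .trim();
}

const sampleInput = [
  '1',
  '00:00:10,000 --> 00:00:15,000',
  'Hello, this is the first subtitle.',
  '',
  '2',
  '00:00:20,000 --> 00:00:25,000',
  'And this is the second subtitle.',
  '',
].join('\n');

const plainText = srtToPlainText(sampleInput);
console.log(plainText);
// Hello, this is the first subtitle.
// And this is the second subtitle.
```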

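Convert Using srt-to-vtt module

The srt-to-vtt module is an npm package that converts SRT files to WebVTT (VTT), the subtitle format used by HTML5 video. Its output still contains timecodes, so it is a format conversion rather than a plain-text extraction. Install it with the following command before using it: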

npm install srt-to-vtt

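Let’s use the module to convert an SRT file to VTT:

const fs = require('fs');
const srtToVtt = require('srt-to-vtt');

// Per the package README, srt-to-vtt exports a function that returns a
// transform stream: pipe SRT data in and read VTT data out.
fs.createReadStream('path/to/input.srt')
  .pipe(srtToVtt())
  .pipe(fs.createWriteStream('path/to/output.vtt'))
  .on('finish', () => console.log('Successfully converted SRT to VTT'))
  .on('error', (err) => console.error(err));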
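Because srt-to-vtt produces WebVTT rather than plain text, the timing lines still have to be removed if only the text is wanted. The sketch below shows one way to do that. The name vttToPlainText is hypothetical. It assumes simple VTT content with no NOTE or STYLE blocks, and it also drops any text line that consists only of digits, since it cannot tell those apart from cue identifiers.

```javascript
// vttToPlainText is a hypothetical helper for simple WebVTT content.
function vttToPlainText(vtt) {
  return vtt
    .split(/\r?\n/)
    .filter((line, i) => {
      const trimmed = line.trim();
      if (trimmed === '') return false; // blank separator lines
      if (i === 0 && trimmed.startsWith('WEBVTT')) return false; // file header
      if (/^\d+$/.test(trimmed)) return false; // numeric cue identifiers
      if (trimmed.includes('-->')) return false; // cue timing lines
      return true;
    })
    .join('\n');
}

const sampleVtt = [
  'WEBVTT',
  '',
  '1',
  '00:00:10.000 --> 00:00:15.000',
  'Hello, this is the first subtitle.',
  '',
  '2',
  '00:00:20.000 --> 00:00:25.000',
  'And this is the second subtitle.',
  '',
].join('\n');

const vttText = vttToPlainText(sampleVtt);
console.log(vttText);
// Hello, this is the first subtitle.
// And this is the second subtitle.
```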

Convert Using srt-to-txt module

The srt-to-txt module is another npm package that can be used to convert SRT files to text. Install it with the following command before using it:

npm install srt-to-txt

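Now we’ll use the module to convert SRT to text. This example assumes the module exports a function that takes a file path and returns a promise resolving to the text: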

const srtToTxt = require('srt-to-txt');

srtToTxt('path/to/input.srt').then((text) => {
  console.log(text);
});

Use match() method

You can also read the file with the fs module and use the String match() method to skip the index and timecode lines, leaving only the text:

// Load the built-in fs module
const fs = require('fs');

// Read srt file
const srtFile = fs.readFileSync('/path/to/file.srt', 'utf8');

// Split the srt file into an array of lines
const lines = srtFile.split(/\r?\n/);

// Use a for loop to iterate over the lines in the array
for (let i = 0; i < lines.length; i++) {
  // Skip index lines and timecode lines
  if (lines[i].match(/^\d+$/) || lines[i].match(/^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/)) {
    continue;
  }

  // Print text from the srt file
  console.log(lines[i]);
}
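The same match() checks can be wrapped in a function that returns the text lines instead of printing them, which is easier to reuse. This is a sketch under that assumption. The name extractSubtitleLines is hypothetical, and it also skips blank lines, which the loop above prints.

```javascript
// extractSubtitleLines is a hypothetical helper built on the match() approach.
function extractSubtitleLines(srt) {
  const indexPattern = /^\d+$/;
  const timePattern = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/;
  const result = [];

  for (const rawLine of srt.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, index lines, and timecode lines
    if (line === '' || line.match(indexPattern) || line.match(timePattern)) {
      continue;
    }
    result.push(line);
  }
  return result;
}

const subtitleLines = extractSubtitleLines(
  '1\n00:00:10,000 --> 00:00:15,000\nHello, this is the first subtitle.\n\n' +
  '2\n00:00:20,000 --> 00:00:25,000\nAnd this is the second subtitle.\n'
);
console.log(subtitleLines.join('\n'));
// Hello, this is the first subtitle.
// And this is the second subtitle.
```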

Conclusion

In this article, we learned how to convert SRT files to plain text using JavaScript and regular expressions. Whether you need to analyze subtitle data, perform language processing, or extract information, these techniques let you pull the essential text content out of SRT files.
