How to Transcribe Audio and Video to Text in Python with AssemblyAI

Converting audio and video content into text is useful in many settings. Transcription improves accessibility for people who are deaf or hard of hearing, and it gives content creators, analysts, Python programmers, and researchers text that they can search, analyze, and reuse.

Today’s guide is all about how you can effortlessly transcribe audio and video files into text using the AssemblyAI Python SDK.

Note: Before you begin, make sure that Python is installed on your system.

Prerequisites

Before moving toward the transcription process, make sure that you have all of the essential tools and libraries configured and ready to use.

Install Python Virtual Environment

First, open Command Prompt or the terminal of your IDE or code editor and create a new Python virtual environment named “venv”.

python -m venv venv

Activate Python Virtual Environment

The next step is to activate the newly created Python virtual environment on your operating system.

For Windows:

venv\Scripts\activate

For Linux or macOS:

source ./venv/bin/activate
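
If you want to confirm from Python itself that the environment is active, a minimal sketch like the following may help. It relies on the standard library only; the function name "in_virtual_environment" is our own choice for illustration and is not part of any SDK.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Interpreter prefix:", sys.prefix)
```

Run this with the same interpreter you plan to use for the rest of the guide; if it reports that no environment is active, repeat the activation command for your operating system.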

Install AssemblyAI Python SDK

After activating the virtual environment, install the AssemblyAI Python SDK.

pip install assemblyai

To verify that the AssemblyAI library was installed, check its details.

pip show assemblyai

It can be observed that we have successfully installed AssemblyAI Python SDK version “0.15.1“.
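
The same check can also be done from inside Python, which can be handy in scripts. The sketch below uses the standard "importlib.metadata" module; the helper name "installed_version" is hypothetical, and the version you see will likely differ from the one shown above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("assemblyai")
    if version is None:
        print("assemblyai is not installed in this environment")
    else:
        print("assemblyai version:", version)
```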


Get AssemblyAI API Key

Navigate to the official AssemblyAI website and click the sign-up button.

Specify your email address and hit the “Get your API key” button.

Your API key is then shown on the next screen.


Copy the API key and save it somewhere for later use.
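
One common way to keep the key out of your source code is to store it in an environment variable and read it at runtime. The following is a sketch under that assumption: the variable name "ASSEMBLYAI_API_KEY" and the helper "load_api_key" are our own choices for illustration, so adjust them to your setup.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key


if __name__ == "__main__":
    try:
        api_key = load_api_key()
        # Avoid printing the key itself; only confirm that it was found.
        print("API key loaded, length:", len(api_key))
    except RuntimeError as exc:
        print(exc)
```

Later in the script, the returned value can be used wherever the guide assigns the API key, instead of pasting the key directly into the file.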

Audio/Video Transcription Using the AssemblyAI Python SDK

After fulfilling all of the mentioned prerequisites, you are now ready to check out the procedure of transcribing audio and video using the AssemblyAI Python SDK.

1. Import Required Libraries

First, create a Python file with the “.py” extension and import the “assemblyai” library that offers the functionalities for interacting with the AssemblyAI API.

Moreover, this library permits you to transcribe audio and video content into text effortlessly.

import assemblyai as aai

2. Specify the Audio/Video File URL

Next, specify the URL of the desired audio or video file that you want to transcribe. Here, “URL” simply refers to the location of the content you want to convert to text.

URL = "audio_or_video_file_link"

For instance, in our case, we have added a URL of a podcast episode.

URL = "https://talkpython.fm/episodes/download/356/tips-for-ml-ai-startups.mp3"

3. Set the Output File (Optional)

Now, choose whether you want to save the transcript to a file or display it on the console. We recommend saving it to a file if you want to use it for future reference or analysis.

OUTPUT_FILENAME = "filename.txt"

Specify the filename in the double quotes as we did here.

OUTPUT_FILENAME = "example.txt"
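
If you transcribe many files, you may prefer to derive the output name from the source URL instead of typing it by hand. This is an optional sketch using only the standard library; the helper "output_name_for" and the fallback name "transcript" are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_name_for(url: str, suffix: str = ".txt") -> str:
    """Build an output filename from the last path segment of a URL."""
    # Take the file name without its extension, e.g. "episode" from ".../episode.mp3".
    stem = PurePosixPath(urlparse(url).path).stem
    # Fall back to a generic name when the URL has no usable file name.
    return (stem or "transcript") + suffix


if __name__ == "__main__":
    url = "https://talkpython.fm/episodes/download/356/tips-for-ml-ai-startups.mp3"
    print(output_name_for(url))
```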

4. Configure Transcription Settings

The AssemblyAI Python SDK permits configuring several settings of the transcription process. However, in our case, we are customizing the formatting and punctuation options of the transcript.

Therefore, we have set the value of “punctuate” and “format_text” as “True“.

More specifically, the “TranscriptionConfig” is used for defining these settings.

config = aai.TranscriptionConfig(
    punctuate=True,
    format_text=True
)

5. Initialize the AssemblyAI SDK

Before making API calls, you need to initialize the AssemblyAI SDK by setting your unique API key, which the SDK uses to authenticate each request to the AssemblyAI service.

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

This step ensures authorized and secure access to the API.

6. Call the AssemblyAI API

Now, start the actual transcription by calling the “transcribe()” method of the instantiated “Transcriber” object, passing the defined URL and the configuration settings as arguments.

This method will initiate the transcription of the given video or audio content.

transcript = transcriber.transcribe(URL, config)
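
A transcription request can fail, for example when the URL is unreachable, so it may be worth checking the result before using it. The sketch below assumes the returned object exposes "error" and "text" attributes, as the SDK's transcript object does in the versions we are aware of; the helper "text_or_raise" is hypothetical, and a stand-in object is used so the example runs without an API key.

```python
from types import SimpleNamespace


def text_or_raise(transcript) -> str:
    """Return the transcript text, or raise if the result reports a failure."""
    # Assumption: the result object carries an "error" message on failure.
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"Transcription failed: {error}")
    text = getattr(transcript, "text", None)
    if text is None:
        raise RuntimeError("Transcription finished without any text.")
    return text


if __name__ == "__main__":
    # Stand-in objects that mimic a successful and a failed result.
    ok = SimpleNamespace(text="Hello world.", error=None)
    failed = SimpleNamespace(text=None, error="file could not be downloaded")
    print(text_or_raise(ok))
    try:
        text_or_raise(failed)
    except RuntimeError as exc:
        print(exc)
```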

7. Write Transcription to File or Print to Console

You can either write the transcribed text to a file or print it to your console, depending on your project requirements. If you chose to write it to a file, the given code opens that file and writes the transcribed text to it.

Otherwise, if no output file has been specified, the transcribed text will be shown on the console.

if OUTPUT_FILENAME:
    with open(OUTPUT_FILENAME, "w") as file:
        file.write(transcript.text)
else:
    print(transcript.text)
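
The same logic can be wrapped in a small reusable function. This variation is only a sketch: the name "save_or_print" is our own, and it writes the file with an explicit UTF-8 encoding, which tends to avoid surprises on systems whose default encoding cannot represent every character in a transcript.

```python
from pathlib import Path
from typing import Optional


def save_or_print(text: str, output_filename: Optional[str] = None) -> None:
    """Write text to a file when a name is given, otherwise print it."""
    if output_filename:
        # An explicit encoding keeps the output consistent across platforms.
        Path(output_filename).write_text(text, encoding="utf-8")
    else:
        print(text)


if __name__ == "__main__":
    # No filename given, so the sample text goes to the console.
    save_or_print("Sample transcript text.")
```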

8. Complete Code for Transcription

Here is the complete code for transcribing audio or video files. Make sure to replace “YOUR_API_KEY” with your AssemblyAI API key.

import assemblyai as aai

# Audio or video file to transcribe
URL = "https://talkpython.fm/episodes/download/356/tips-for-ml-ai-startups.mp3"

# Set to an empty string to print the transcript instead of saving it
OUTPUT_FILENAME = "example.txt"

# Enable automatic punctuation and text formatting
config = aai.TranscriptionConfig(
    punctuate=True,
    format_text=True
)

# Authenticate with your AssemblyAI API key
aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

# Submit the file for transcription
transcript = transcriber.transcribe(URL, config)

if OUTPUT_FILENAME:
    with open(OUTPUT_FILENAME, "w") as file:
        file.write(transcript.text)
else:
    print(transcript.text)

Why Transcribe Audio and Video Files into Text?

Transcribing audio and video files into text offers several benefits, such as:

  • Accessibility – Ensuring content inclusivity for the hearing impaired.
  • Searchability – Improving SEO and content discoverability.
  • Analysis – Enabling detailed content analysis and data mining.
  • Repurposing – Facilitating content adaptation into various formats.
  • Translation – Allowing accurate content translation.
  • Legal Compliance – Meeting legal requirements for records.
  • Education – Enhancing learning materials and comprehension.
  • Collaboration – Providing textual records for remote teamwork.
  • Data Enrichment – Contributing to training speech recognition models.

That brings us to the end of today’s guide on transcription.

Conclusion

The ability to transcribe audio and video files into text is a valuable capability for anyone working with multimedia content. With the AssemblyAI Python SDK, you can do this in just a few lines of Python code.

From SEO and accessibility to content analysis and beyond, transcription plays an essential role in unlocking new possibilities in your Python projects. So, use the AssemblyAI Python SDK to put your multimedia content to full use.

Want to explore and learn more about Python? Check out our dedicated Python Tutorial Series!
