
How To Build a Multilingual Text-to-Audio Converter With Python

by Ajay Krishnan Prabhakaran, January 16th, 2025

Too Long; Didn't Read

This article explains how to create a multilingual text-to-audio converter using Python. By utilizing the googletrans library for translation and gTTS for text-to-speech conversion, the tool can translate text into various languages and convert it into audio. The implementation involves translating text, converting it to speech, and playing the audio using pygame. This tool has applications in accessibility, language learning, and multilingual communication, making content more universally accessible.


"To have another language is to possess a second soul."
— Charlemagne


Imagine traveling to a new country and being able to converse seamlessly in the local language. That is what we will try to achieve in this article by building a simple text-to-audio converter app using Python, the googletrans library for translation, and gTTS for text-to-speech conversion. We will go over the complete code, how the different components work, and how to use each library to translate English text into another language and then convert it to audio in that language.

The different components

There are three components:

  • Translation - googletrans, a Python library that uses Google Translate to translate text between languages
  • Text-to-speech - gTTS (Google Text-to-Speech), which converts text to audio in the language of our choice
  • Audio playback - pygame, which is primarily used for developing games, but which we use here to play back the audio generated by gTTS

Prerequisites

We can use the pip command in a terminal to install the required libraries:

pip install gTTS googletrans==4.0.0-rc1 pygame


Note: You might encounter the error below when running the Python code:

AttributeError: 'coroutine' object has no attribute 'text'
sys:1: RuntimeWarning: coroutine 'Translator.translate' was never awaited

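Fix - This error indicates that the installed googletrans version returns a coroutine from translate, which the synchronous code in this article never awaits. Make sure you have the correct version of googletrans installed; version 4.0.0-rc1 is known to work well for synchronous operations.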
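
If pinning the version is not possible, one possible workaround is to accept either behaviour. The sketch below is only an illustration and not part of the original code: resolve_translation is a hypothetical helper, and sync_translate and async_translate are stand-ins for a translator so that the example runs without network access. It also assumes the script is not already running inside an asyncio event loop, because asyncio.run cannot be nested.

```python
import asyncio
import inspect
from types import SimpleNamespace


def resolve_translation(result):
    """Return the finished result, running it first if it is a coroutine."""
    if inspect.iscoroutine(result):
        # An async translate() hands back a coroutine; run it to completion.
        return asyncio.run(result)
    return result


# Stand-ins for a real translator, used only to demonstrate both code paths.
def sync_translate(text, dest):
    return SimpleNamespace(text=f"[{dest}] {text}")


async def async_translate(text, dest):
    return SimpleNamespace(text=f"[{dest}] {text}")


print(resolve_translation(sync_translate("Hello", "es")).text)
print(resolve_translation(async_translate("Hello", "es")).text)
```

In translate_text, the call to translator.translate would be the place to apply such a helper. Whether a given googletrans release needs further setup, such as an async context manager, should be checked against its documentation.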

Implementation

translate_text

The translate_text function uses googletrans for text translation. It takes two parameters: text, the string that needs to be translated, and dest_language, the target language code (e.g., 'es' for Spanish). Inside the function, we create a Translator object and call its translate method, which returns a result object; the translated string is in its text attribute.
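
Because the same language code is later handed to gTTS as well, it can help to confirm up front that both libraries know the code. The sketch below is an assumption-based illustration: supported_by_both is a hypothetical helper, and the two small tables are hand-written stand-ins ("xx" is a made-up code). In a real script the tables could come from googletrans.LANGUAGES and gtts.lang.tts_langs(), whose keys are language codes; the comparison is case-insensitive because the two libraries do not always use the same capitalization.

```python
def supported_by_both(code, translate_langs, tts_langs):
    """Return True if `code` appears in both language tables (case-insensitive)."""
    wanted = code.lower()
    translate_codes = {key.lower() for key in translate_langs}
    tts_codes = {key.lower() for key in tts_langs}
    return wanted in translate_codes and wanted in tts_codes


# Hand-written sample tables; "xx" is a made-up code used only for the demo.
sample_translate_langs = {"es": "spanish", "ja": "japanese", "xx": "example-only"}
sample_tts_langs = {"es": "Spanish", "ja": "Japanese"}

print(supported_by_both("es", sample_translate_langs, sample_tts_langs))  # True
print(supported_by_both("xx", sample_translate_langs, sample_tts_langs))  # False
```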

text_to_audio

The text_to_audio function converts text to audio using gTTS and plays it with pygame. It takes two parameters: text and language. The language value is the same as the dest_language input, since we want the audio to be in the language the text was translated to. The function generates the speech with gTTS and saves it as an MP3 file. We then initialize pygame.mixer to handle audio playback, load the MP3, and play it. A loop waits until the audio has finished playing, after which the audio file is deleted if should_clean_up_file is set to True.
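
The save, play, and clean up pattern described above can also be expressed with a temporary file, which avoids leaving output in the current folder. This is a minimal sketch under stated assumptions, not the article's implementation: temporary_mp3 is a hypothetical helper built only on the standard library, and placeholder bytes stand in for the audio that gTTS would write.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_mp3(prefix="tts_"):
    """Yield a path to a fresh .mp3 file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".mp3")
    os.close(fd)  # only the path is needed; the caller writes to it later
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


with temporary_mp3() as mp3_path:
    # In text_to_audio, saving the gTTS output and playing it would go here.
    with open(mp3_path, "wb") as handle:
        handle.write(b"placeholder bytes, not real audio")
    print(os.path.exists(mp3_path))  # True while inside the block

print(os.path.exists(mp3_path))  # False once the block has exited
```

The finally clause plays the same role as the one in text_to_audio: the file is removed even if saving or playback raises an exception.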


Below is the complete code:

from gtts import gTTS
from googletrans import Translator
import pygame
import os

def translate_text(text, dest_language):
    translator = Translator()
    translation = translator.translate(text, dest=dest_language)
    return translation.text

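def text_to_audio(text, language):
    mp3_file = f'{language}_output.mp3'
    should_clean_up_file = True  # set to False to keep the MP3 after playback
    try:
        # Generate the speech and save it as an MP3 file
        tts_file = gTTS(text=text, lang=language, slow=False)
        tts_file.save(mp3_file)
        # Load the MP3 into the mixer and start playback
        pygame.mixer.init()
        pygame.mixer.music.load(mp3_file)
        pygame.mixer.music.play()
        # Poll until playback has finished
        clock = pygame.time.Clock()
        while pygame.mixer.music.get_busy():
            clock.tick(15)
    finally:
        # Shut the mixer down so it releases the file before deletion
        pygame.mixer.quit()
        if os.path.exists(mp3_file) and should_clean_up_file:
            os.remove(mp3_file)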


def main(english_text, target_language='en'):

    translated_text = translate_text(english_text, target_language)
    print(f"English Text: {english_text}")
    print(f"Translated Text: {translated_text}")

    text_to_audio(translated_text, target_language)


if __name__ == "__main__":
    english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
    target_language = 'es'  # Spanish
    main(english_text, target_language)


Input1 - English to Spanish:

english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'es'  # Spanish
main(english_text, target_language)

Output:

English to Spanish translation


Audio output:

Spanish Audio file

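This creates es_output.mp3 in your current folder, which pygame then plays. Because should_clean_up_file is True in the code above, the file is deleted once playback finishes; set it to False to keep the file.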


Input2 - English to Japanese:

english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'ja'  # Japanese
main(english_text, target_language)

Output:

English to Japanese translation

Audio output:

Japanese Audio file

This creates ja_output.mp3 in your current folder, which pygame then plays; the file is removed after playback while should_clean_up_file is True.
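
The two runs above differ only in target_language, so one way to avoid editing the script for each language is to read both values from the command line. The following is a rough sketch using the standard argparse module; parse_cli and the --lang option are hypothetical names chosen for this illustration and are not part of the original code.

```python
import argparse


def parse_cli(argv=None):
    """Parse the text to translate and the target language code."""
    parser = argparse.ArgumentParser(
        description="Translate English text and read it aloud."
    )
    parser.add_argument("text", help="English text to translate")
    parser.add_argument(
        "--lang", default="es", help="target language code, e.g. 'es' or 'ja'"
    )
    return parser.parse_args(argv)


# Passing a list makes the parser easy to try without a real command line.
args = parse_cli(["Hello, world", "--lang", "ja"])
print(args.text, args.lang)
```

In the script, calling parse_cli() with no arguments reads the real command line, and main(args.text, args.lang) would then run the translation and playback.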

Applications and Use Cases

  • Accessibility - This can be easily integrated into a tourism app or website, helping people explore a foreign country with confidence even when they don’t speak the local language
  • Language Learning - Someone learning a new language can use this tool to self-teach: input the text to translate and get back the translated text along with audio, which also helps with pronunciation
  • Content Consumption - For people who want to multitask, say by listening to an audiobook while driving, this tool is handy because it can read the content aloud at a pace that you prefer
  • Multilingual Communication - In today’s world, where multinational deals are common, being able to articulate your thoughts and business proposals to anyone in any language is a powerful asset that can make or break deals

Conclusion

There are few spaces that can’t benefit from this application. It’s simple to build, but its benefits are vast. By developing this tool, we have not only solved a real-world problem that many people face but have also learned how to use Python to make API calls, initialize objects, invoke methods, organize code into functions, and use try/finally to clean up files after use. Once you have mastered these and want a challenge, you can try building an interactive GUI and hosting it on a web server to make it more user-friendly, and add features such as options to change pronunciation, pace, etc. The possibilities are endless, and I hope you keep pushing the boundaries of how we can use technology and coding to advance humankind.