Welcome to streamtotext’s documentation!

streamtotext is a library for performing streaming speech-to-text transcription.

It makes heavy use of asyncio to provide an event-based API for reading and processing audio from sources, which can then be sent to various transcription backends. streamtotext is highly customizable: users can provide custom audio sources, mix and provide audio filters, and provide custom transcription backends.
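To make the source, filter, and backend idea concrete, here is a minimal sketch of such a pipeline built from plain asyncio async generators. It does not use streamtotext's actual API: the names audio_source, squelch, transcribe, collect_events and run_demo are hypothetical, the "audio" is just lists of integers, and the backend is a stand-in that reports chunk sizes instead of text. It also assumes Python 3.7 or later, since it relies on async generators and asyncio.run.

```python
import asyncio


async def audio_source(chunks):
    """Hypothetical audio source: yields pre-recorded chunks one at a time."""
    for chunk in chunks:
        # Hand control back to the event loop, as a real device read would.
        await asyncio.sleep(0)
        yield chunk


async def squelch(source, threshold):
    """Hypothetical filter: drops chunks whose peak amplitude is below threshold."""
    async for chunk in source:
        peak = max((abs(sample) for sample in chunk), default=0)
        if peak >= threshold:
            yield chunk


async def transcribe(source):
    """Stand-in backend: emits one event per chunk instead of real text."""
    async for chunk in source:
        yield "chunk with {} samples".format(len(chunk))


async def collect_events(chunks, threshold=10):
    """Wire source -> filter -> backend and gather the resulting events."""
    events = []
    pipeline = transcribe(squelch(audio_source(chunks), threshold))
    async for event in pipeline:
        events.append(event)
    return events


def run_demo():
    # Two quiet chunks (peaks 2 and 3) and two loud ones (peaks 60 and 100).
    chunks = [[0, 1, -2], [50, -60, 20, 5], [3, 3], [100]]
    return asyncio.run(collect_events(chunks))


if __name__ == "__main__":
    print(run_demo())
```

Each stage consumes the previous one lazily, so events can be produced while later audio is still arriving. Swapping in a different source, filter, or backend only means replacing one generator.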

Installation

You can install streamtotext using pip:

$ pip install git+https://github.com/ibm-dev/streamtotext

Requirements

streamtotext currently requires Python 3.5 or later. Additional Python dependencies are listed in requirements.txt and are installed automatically by pip.

Why

Many speech transcription applications are best designed in a streaming manner (as opposed to bulk processing of a recording). A live voice transcription application, for example, would almost certainly require this. Even an application that does not require live transcription can benefit from the lower latency of a streaming implementation, where transcription happens in the background while the sound is still being recorded.

Currently, there is a lack of tooling for this task in Python, and writing the code to do so can be difficult and error-prone. streamtotext provides a toolkit that handles the transcription pipeline as an event stream, in a way that allows for highly customizable audio sources (e.g. a squelched microphone) and transcription methods (e.g. a local trigger before a remote service).

Try it out

After installing streamtotext, you can run a demo application that transcribes the audio from a local microphone:

$ streamtotext-transcribe-mic watson

Additional Information