Welcome to streamtotext’s documentation!
streamtotext is a library for performing streaming speech-to-text
transcription.
It relies heavily on asyncio to provide an event-based API for reading
and processing audio from sources, which can then be sent to various
transcription backends. streamtotext is highly customizable: users can
supply custom audio sources, mix and filter audio, and provide custom
transcription backends.
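To make the source, filter, and backend stages concrete, here is a minimal sketch of that pipeline shape built only from standard-library asyncio. It does not use the streamtotext API: every name in it (fake_audio_source, squelch, fake_transcriber, run_pipeline, demo) is hypothetical and invented for illustration, and the "transcriber" only counts samples. It assumes Python 3.7 or later, since it uses asynchronous generators and asyncio.run.

```python
import asyncio


async def fake_audio_source(chunks):
    """Hypothetical source: yields audio chunks (lists of samples) one by one."""
    for chunk in chunks:
        # Hand control back to the event loop, as real audio I/O would.
        await asyncio.sleep(0)
        yield chunk


async def squelch(source, threshold):
    """Hypothetical filter: drops chunks whose peak amplitude is below threshold."""
    async for chunk in source:
        peak = max((abs(sample) for sample in chunk), default=0)
        if peak >= threshold:
            yield chunk


async def fake_transcriber(source):
    """Hypothetical backend: emits one text event per chunk it receives."""
    async for chunk in source:
        yield "heard {} samples".format(len(chunk))


async def run_pipeline(chunks, threshold):
    """Chain source -> filter -> backend and collect the emitted events."""
    pipeline = fake_transcriber(squelch(fake_audio_source(chunks), threshold))
    events = []
    async for event in pipeline:
        events.append(event)
    return events


def demo():
    # Two quiet chunks and two loud ones; only the loud ones pass the squelch.
    chunks = [[0, 1, -1], [50, -80, 20, 10], [2, 0], [90, 90]]
    return asyncio.run(run_pipeline(chunks, threshold=10))
```

Calling demo() runs the whole chain on canned data. Because each stage is an asynchronous generator that consumes the previous one, swapping in a different source, filter, or backend only changes one link in the chain, which is the kind of customization described above.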
Installation
You can install streamtotext using pip:
$ pip install git+https://github.com/ibm-dev/streamtotext
Requirements
streamtotext currently requires Python 3.5 or later.
Additional Python dependencies are listed in requirements.txt and are
installed automatically by pip.
Why
Many speech transcription applications are best designed to work in a streaming manner rather than by bulk processing a finished recording. A live voice transcription application, for example, would almost certainly require this. Even an application that does not need live transcription can benefit from the lower latency of a streaming implementation, where transcription happens in the background while the sound is still being recorded.
Currently, there is a lack of tooling for this task in Python, and writing the code to do so can be difficult and error-prone. streamtotext provides a toolkit that handles the transcription pipeline as an event stream, allowing for highly customizable audio sources (e.g. a squelched microphone) and transcription methods (e.g. a local trigger before a remote service).
Try it out
After installing streamtotext, you can run a demo transcription application
to transcribe the audio from a local microphone.
$ streamtotext-transcribe-mic watson