streamtotext.audio module

Audio source and audio processing components

The two main base classes are AudioSource, which provides audio, and AudioSourceProcessor, which acts as a pipeline processor on another AudioSource.

class streamtotext.audio.AudioBlock

Bases: object

An iterator over AudioChunk.

Blocks are used to delineate continuous runs of audio. For example, when using the SquelchedSource audio source, a consumer often wants to know at which points the squelch was triggered on and off.

end()
ended
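To illustrate what a block boundary means, the following minimal sketch groups a stream of chunks, each tagged with a triggered flag, into one list per uninterrupted triggered run. It is not the module's implementation: it uses plain lists instead of the iterator-based AudioBlock, and the (chunk, triggered) input format and the split_into_blocks name are hypothetical.

```python
from itertools import groupby


def split_into_blocks(flagged_chunks):
    """Group consecutive triggered chunks into blocks (lists of chunks).

    flagged_chunks is an iterable of (chunk, triggered) pairs, a
    hypothetical stand-in for the output of a squelched stream.
    """
    blocks = []
    # groupby yields one run per change of the triggered flag.
    for triggered, run in groupby(flagged_chunks, key=lambda pair: pair[1]):
        if triggered:
            blocks.append([chunk for chunk, _ in run])
    return blocks


blocks = split_into_blocks(
    [("a", False), ("b", True), ("c", True), ("d", False), ("e", True)]
)
```

Each list corresponds to one squelch-open interval, which is the boundary information a consumer of AudioBlock is after.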
class streamtotext.audio.AudioChunk(start_time, audio, width, freq)

Bases: tuple

A sequence of audio samples.

This is the low level structure used for passing audio. Typically these are obtained from iterating over an AudioBlock.

In order to make this object use minimal memory it is implemented as a namedtuple.

Parameters:
  • start_time (int) – Unix timestamp of the first sample.
  • audio (bytes) – Byte string of audio samples.
  • width (int) – Number of bytes per sample.
  • freq (int) – Sampling frequency.
audio

Alias for field number 1

freq

Alias for field number 3

start_time

Alias for field number 0

width

Alias for field number 2
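Because AudioChunk is a namedtuple, each field name is simply an alias for a tuple index, and instances are immutable. The sketch below uses a local stand-in with the same field order (it does not import streamtotext) to show index aliasing and how to derive a modified chunk with the standard `_replace` method.

```python
from collections import namedtuple

# Local stand-in with the documented field order; not the library class.
AudioChunk = namedtuple("AudioChunk", ["start_time", "audio", "width", "freq"])

chunk = AudioChunk(start_time=1700000000, audio=b"\x01\x00\x02\x00", width=2, freq=16000)

# Field names alias tuple positions 0..3.
assert chunk.start_time == chunk[0]
assert chunk.audio == chunk[1]
assert chunk.width == chunk[2]
assert chunk.freq == chunk[3]

# Tuples cannot be mutated in place; _replace builds a new chunk instead.
later = chunk._replace(start_time=chunk.start_time + 1)
```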

class streamtotext.audio.AudioPlayer(source, width, channels, freq)

Bases: object

Play audio from an audio source.

This is not generally useful for transcription, but can be very useful in the development of AudioSource or AudioSourceProcessor classes.

Parameters:
  • source (AudioSource) – Source to play.
  • width (int) – Bytes per sample.
  • channels (int) – Number of channels in output device.
  • freq (int) – Sampling frequency of output device.
play()

Play audio from source.

This method will block until the source runs out of audio.

class streamtotext.audio.AudioSource

Bases: object

Base class for providing audio.

All classes that provide audio in some form subclass this class. Audio is obtained by first entering the listen() context manager and then iterating over the AudioSource to obtain AudioBlock objects.

listen()

Listen to the AudioSource.

Returns: An async context manager that starts and stops the AudioSource.
start()

Start the audio source.

This is where initialization / opening of audio devices should happen.

stop()

Stop the audio source.

This is where deinitialization / closing of audio devices should happen.
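The listen() / start() / stop() contract described above can be sketched with `contextlib.asynccontextmanager`: start on entry, and always stop on exit, even if the body raises. This is a hypothetical stand-in class, not the library's code; it assumes start() and stop() are ordinary synchronous methods and does not show how blocks are iterated.

```python
import asyncio
import contextlib


class SketchSource:
    """Hypothetical stand-in mirroring the documented AudioSource shape."""

    def __init__(self):
        self.running = False
        self.events = []

    def start(self):
        # A real source would open its audio device here.
        self.running = True
        self.events.append("start")

    def stop(self):
        # A real source would close its audio device here.
        self.running = False
        self.events.append("stop")

    @contextlib.asynccontextmanager
    async def listen(self):
        self.start()
        try:
            yield self
        finally:
            # Runs whether the body finished normally or raised.
            self.stop()


async def main(src):
    async with src.listen():
        assert src.running
    return src.events


events = asyncio.run(main(SketchSource()))
```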

class streamtotext.audio.AudioSourceProcessor(source)

Bases: streamtotext.audio.AudioSource

Base class for a pipeline processor that reads from another AudioSource.

Parameters:
  • source (AudioSource) – Input source.
start()

Start the input audio source.

This is intended to be called from the base class, not directly.

stop()

Stop the input audio source.

This is intended to be called from the base class, not directly.
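A minimal sketch of the processor pattern, assuming a plain synchronous iterable as the input source: the processor forwards start() and stop() to the wrapped source and transforms each block it reads. ListSource and PassThroughProcessor are hypothetical names; a real processor such as RateConvert would do actual work in process().

```python
class ListSource:
    """Hypothetical input source that yields pre-built blocks."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.started = False

    def start(self):
        self.started = True

    def stop(self):
        self.started = False

    def __iter__(self):
        return iter(self.blocks)


class PassThroughProcessor:
    """Hypothetical processor: delegates lifecycle, transforms output."""

    def __init__(self, source):
        self._source = source

    def start(self):
        # Starting the processor starts its input.
        self._source.start()

    def stop(self):
        self._source.stop()

    def process(self, block):
        # Identity here; a real processor might resample or filter.
        return block

    def __iter__(self):
        for block in self._source:
            yield self.process(block)
```

Chaining works by passing one processor as another's source, since both expose the same start/stop/iterate surface.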

class streamtotext.audio.EvenChunkIterator(iterator, chunk_size)

Bases: object

Iterate over chunks from an audio source in even-sized increments.

Parameters:
  • iterator (Iterator) – Iterator over audio chunks.
  • chunk_size (int) – Number of samples in resulting chunks
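Re-chunking to a fixed size comes down to buffering incoming bytes and slicing off chunk_size * width bytes at a time. The sketch below works on raw byte strings rather than AudioChunk objects (so start_time handling is omitted), and emitting a short final chunk is an assumption; the real class may drop or hold the tail. even_chunks is a hypothetical name.

```python
def even_chunks(byte_chunks, width, chunk_size):
    """Re-slice an iterable of byte strings into chunk_size-sample pieces.

    width is bytes per sample. A leftover shorter than one full chunk is
    yielded at the end (assumption).
    """
    step = chunk_size * width  # bytes per output chunk
    buf = bytearray()
    for data in byte_chunks:
        buf.extend(data)
        # Emit every full chunk the buffer currently holds.
        while len(buf) >= step:
            yield bytes(buf[:step])
            del buf[:step]
    if buf:
        yield bytes(buf)
```

For example, with width=2 and chunk_size=2, inputs of 4 and 6 bytes come out as two 4-byte chunks followed by a 2-byte tail.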
class streamtotext.audio.Microphone(audio_format=None, channels=1, rate=16000, device_ndx=0)

Bases: streamtotext.audio.AudioSource

Use a local microphone as an audio source.

Parameters:
  • audio_format – Sample format; defaults to paInt16.
  • channels (int) – Number of channels in microphone.
  • rate (int) – Sample frequency
  • device_ndx (int) – PyAudio device index
start()
stop()
exception streamtotext.audio.NoDefaultInputDeviceError

Bases: Exception

exception streamtotext.audio.NoMoreChunksError

Bases: Exception

class streamtotext.audio.QueueAudioBlock(queue=None)

Bases: streamtotext.audio.AudioBlock

add_chunk(chunk)
class streamtotext.audio.RateConvert(source, n_channels, out_rate)

Bases: streamtotext.audio.AudioSourceProcessor

class streamtotext.audio.RememberingIterator(iterator, memory_size)

Bases: object

memory()
class streamtotext.audio.SingleBlockAudioSource

Bases: streamtotext.audio.AudioSource

class streamtotext.audio.SquelchedBlock(source, squelch_level)

Bases: streamtotext.audio.AudioBlock

class streamtotext.audio.SquelchedSource(source, sample_size=1600, squelch_level=None, prefix_samples=4)

Bases: streamtotext.audio.AudioSourceProcessor

Filter out samples below a volume level from an audio source.

This prevents constant transcription attempts on background noise, and also defines a ‘trigger window’ during which transcription attempts are made.

A sliding window of prefix_samples blocks is inspected. When the RMS of the last prefix_samples * sample_size samples exceeds squelch_level, this source begins to emit audio. Once the RMS of the sliding window falls below 80% of the squelch level, this source stops emitting audio.

Parameters:
  • source (AudioSource) – Input source
  • sample_size (int) – Number of audio samples in each inspected block.
  • squelch_level (int) – RMS value that triggers the squelch.
  • prefix_samples (int) – Number of sample_size blocks in the sliding window.
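The open/close rule above is a hysteresis: the gate opens above squelch_level but only closes below 80% of it, so audio hovering near the threshold does not flap on and off. The sketch below assumes little-endian signed 16-bit samples and only reports the gate state per block; it does not buffer or emit the prefix audio the way SquelchedSource does. rms16 and squelch are hypothetical names.

```python
import math
import sys
from array import array
from collections import deque


def rms16(data):
    """Root-mean-square of little-endian signed 16-bit samples."""
    samples = array("h", data)
    if sys.byteorder == "big":
        samples.byteswap()  # input bytes are assumed little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def squelch(blocks, squelch_level, prefix_samples=4, release=0.8):
    """Yield (block, triggered) for each block of sample_size samples.

    Opens when the RMS over the last prefix_samples blocks exceeds
    squelch_level; closes when it falls below release * squelch_level.
    """
    window = deque(maxlen=prefix_samples)
    triggered = False
    for block in blocks:
        window.append(block)
        level = rms16(b"".join(window))  # RMS over every sample in the window
        if not triggered and level > squelch_level:
            triggered = True
        elif triggered and level < release * squelch_level:
            triggered = False
        yield block, triggered
```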
static check_squelch(level, is_triggered, chunks)
detect_squelch_level(detect_time=10, threshold=0.8)
start()
class streamtotext.audio.WaveSource(wave_path, chunk_frames=None)

Bases: streamtotext.audio.SingleBlockAudioSource

Use a wave file as an audio source.

Parameters:
  • wave_path (string) – Path to wave file.
  • chunk_frames (int) – Number of frames in each chunk returned from get_chunk.
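Reading a wave file in fixed-size pieces needs only the standard-library `wave` module: sample width and rate come from the header, and `readframes` returns an empty byte string at end of file. This is a sketch of the idea, not WaveSource itself; read_wave_chunks is a hypothetical name, and it yields (audio, width, freq) tuples instead of AudioChunk objects. `wave.open` also accepts a binary file object, which the usage example relies on.

```python
import io
import wave


def read_wave_chunks(wave_file, chunk_frames=1024):
    """Yield (audio_bytes, width, freq) for up to chunk_frames frames at a time.

    wave_file may be a path or a binary file object.
    """
    with wave.open(wave_file, "rb") as wav:
        width = wav.getsampwidth()  # bytes per sample
        freq = wav.getframerate()
        while True:
            data = wav.readframes(chunk_frames)
            if not data:  # b"" signals end of file
                break
            yield data, width, freq


# Usage: write a tiny in-memory mono 16-bit file of 10 frames, then read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 10)
buf.seek(0)
chunks = list(read_wave_chunks(buf, chunk_frames=4))
```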
start()
stop()
streamtotext.audio.chunk_sample_cnt(chunk)

Number of samples that occurred in an AudioChunk.

Parameters:
  • chunk – The chunk to examine.
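Since AudioChunk stores raw bytes plus the bytes-per-sample width, the sample count follows from dividing the byte length by width, and dividing that by freq gives the duration in seconds. The sketch below assumes mono audio (AudioChunk has no channel field) and uses a local stand-in for AudioChunk; chunk_duration is a hypothetical helper not present in the module.

```python
from collections import namedtuple

# Local stand-in with the documented field order.
AudioChunk = namedtuple("AudioChunk", ["start_time", "audio", "width", "freq"])


def chunk_sample_cnt(chunk):
    """Samples in chunk: byte length divided by bytes per sample."""
    return len(chunk.audio) // chunk.width


def chunk_duration(chunk):
    """Seconds of audio in chunk (hypothetical helper)."""
    return chunk_sample_cnt(chunk) / chunk.freq


# 1600 16-bit samples at 16 kHz.
chunk = AudioChunk(0, b"\x00\x00" * 1600, 2, 16000)
```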
streamtotext.audio.merge_chunks(chunks)
streamtotext.audio.split_chunk(chunk, sample_offset)