Tweaks.
parent 6f4ef1eae8
commit 1b362905f9
README.md (17 lines changed)
@@ -3,16 +3,23 @@ Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimod
 
 ## Speech-to-text
 
-DSM can be used to build streaming speech-to-text models. These models can be
-batched for efficiency, return word level timestamps, and are great for
-interactive applications. We provide two such models, these models are
-characterized by their size as well as the delay it takes for audio to be
-transcribed into text. We provide two such models:
+DSM can be used to build streaming speech-to-text models. We provide two such models
+with a different delay between the audio input and the text output.
 - An English and French model with ~1b parameters using a 0.5 second delay,
 `kyutai/stt-1b-en_fr`.
 - An English only model with ~2.6b parameters using a 2.5 second delay,
 `kyutai/stt-2.6b-en`.
 
+These speech-to-text models have several advantages:
+- Easy batching for maximum efficiency: a H100 can process 400 streams in
+real-time.
+- Streaming inference: the models can process audio in chunks, which allows
+for real-time transcription, and is great for interactive applications.
+- Return word-level timestamps.
+- Some models have a semantic Voice Activity Detection (VAD) component that
+can be used to detect when the user is speaking. This is especially useful
+for building voice agents.
+
 More details can be found on the [project page](https://kyutai.org/next/stt).
 
 You can retrieve the sample files used in the following snippets via: