Compare commits

5 Commits

Author SHA1 Message Date
gabrieldemarmiesse 403db09953 Remove empty space 2025-06-18 10:41:56 +00:00
gabrieldemarmiesse 332b2b9daa Clarify real-time 2025-06-18 10:40:26 +00:00
gabrieldemarmiesse 7c9953187a Merge branch 'main' into give_uv_instructions 2025-06-18 10:36:57 +00:00
gabrieldemarmiesse 6247aee904 Add french sample 2025-06-18 10:33:42 +00:00
gabrieldemarmiesse e202e4bb0a Add uv instructions and ignore the sample audio file 2025-06-18 10:32:31 +00:00
2 changed files with 18 additions and 3 deletions

2
.gitignore vendored
@@ -192,3 +192,5 @@ cython_debug/
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
bria.mp3
sample_fr_hibiki_crepes.mp3

@@ -36,6 +36,12 @@ with version 0.2.5 or later, which can be installed via pip.
python -m moshi.run_inference --hf-repo kyutai/stt-2.6b-en bria.mp3
```
If you have `uv` installed, you can skip the installation step and run directly:
```bash
uvx --with moshi python -m moshi.run_inference --hf-repo kyutai/stt-2.6b-en bria.mp3
```
It will install the moshi package in a temporary environment and run the speech-to-text.
### MLX implementation
<a href="https://huggingface.co/kyutai/stt-2.6b-en-mlx" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue" style="display: inline-block; vertical-align: middle;"/>
@@ -48,6 +54,12 @@ with version 0.2.5 or later, which can be installed via pip.
python -m moshi_mlx.run_inference --hf-repo kyutai/stt-2.6b-en-mlx bria.mp3 --temp 0
```
If you have `uv` installed, you can skip the installation step and run directly:
```bash
uvx --with moshi-mlx python -m moshi_mlx.run_inference --hf-repo kyutai/stt-2.6b-en-mlx bria.mp3 --temp 0
```
It will install the moshi-mlx package in a temporary environment and run the speech-to-text.
### Rust implementation
<a href="https://huggingface.co/kyutai/stt-2.6b-en-candle" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue" style="display: inline-block; vertical-align: middle;"/>
@@ -91,8 +103,9 @@ script.
uv run scripts/asr-streaming-query.py bria.mp3
```
-The script simulates some real-time processing of the audio. Faster processing
-can be triggered by setting the real-time factor, e.g. `--rtf 500` will process
+The script limits the decoding speed to simulate real-time processing of the audio.
+Faster processing can be triggered by setting
+the real-time factor, e.g. `--rtf 500` will process
the data as fast as possible.
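
The diff does not show how `scripts/asr-streaming-query.py` implements the real-time factor, so the following is only a minimal sketch of the general idea, not the script's actual code. The function name `paced_chunks`, its parameters, and the 80 ms chunk duration in the usage note are all hypothetical. It assumes each audio chunk covers a fixed duration and delays each chunk until its position in the audio timeline, compressed by `rtf`, has been reached.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced_chunks(
    chunks: Iterable[T],
    chunk_seconds: float,
    rtf: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield chunks no faster than `rtf` times real time (hypothetical helper)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    start = clock()
    for index, chunk in enumerate(chunks):
        # Chunk `index` starts at index * chunk_seconds in the audio;
        # dividing by rtf compresses that timeline (rtf=2 is twice as fast).
        due = start + index * chunk_seconds / rtf
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        yield chunk
```

With `rtf=1.0` and a hypothetical `chunk_seconds=0.08`, three chunks are released over roughly 0.16 s. A very large value such as `rtf=500` makes the computed delays negligible, which matches the "as fast as possible" behaviour described above.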
## Text-to-Speech