Add uv instructions and ignore the sample audio files (#1)

* Add uv instructions and ignore the sample audio file

* Add french sample

* Clarify real-time

* Remove empty space
Gabriel de Marmiesse 2025-06-18 12:45:33 +02:00 committed by GitHub
parent de8202bddc
commit 6f4ef1eae8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 18 additions and 3 deletions

.gitignore

@@ -191,4 +191,6 @@ cython_debug/
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
bria.mp3
sample_fr_hibiki_crepes.mp3


@@ -36,6 +36,12 @@ with version 0.2.5 or later, which can be installed via pip.
python -m moshi.run_inference --hf-repo kyutai/stt-2.6b-en bria.mp3
```
If you have `uv` installed, you can skip the installation step and run directly:
```bash
uvx --with moshi python -m moshi.run_inference --hf-repo kyutai/stt-2.6b-en bria.mp3
```
This installs the `moshi` package in a temporary environment and runs the speech-to-text model.
### MLX implementation
<a href="https://huggingface.co/kyutai/stt-2.6b-en-mlx" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue" style="display: inline-block; vertical-align: middle;"/>
@@ -48,6 +54,12 @@ with version 0.2.5 or later, which can be installed via pip.
python -m moshi_mlx.run_inference --hf-repo kyutai/stt-2.6b-en-mlx bria.mp3 --temp 0
```
If you have `uv` installed, you can skip the installation step and run directly:
```bash
uvx --with moshi-mlx python -m moshi_mlx.run_inference --hf-repo kyutai/stt-2.6b-en-mlx bria.mp3 --temp 0
```
This installs the `moshi-mlx` package in a temporary environment and runs the speech-to-text model.
### Rust implementation
<a href="https://huggingface.co/kyutai/stt-2.6b-en-candle" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue" style="display: inline-block; vertical-align: middle;"/>
@@ -91,8 +103,9 @@ script.
uv run scripts/asr-streaming-query.py bria.mp3
```
The script limits the decoding speed to simulate real-time processing of the audio.
Faster processing can be triggered by setting the real-time factor; e.g. `--rtf 500` will process
the data as fast as possible.
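The pacing described above can be sketched as a generator that holds each audio frame back until its position in the recording, divided by the real-time factor, has elapsed on the wall clock. This is a minimal illustration, not the code of `scripts/asr-streaming-query.py`: the helper name `paced_frames`, the 24 kHz sample rate, and the 1920-sample frame size are assumptions chosen for the example.

```python
import time
from typing import Iterator, Sequence

# Assumed values for illustration only; they are not read from
# scripts/asr-streaming-query.py.
SAMPLE_RATE = 24_000  # samples per second
FRAME_SIZE = 1_920  # 80 ms of audio at 24 kHz


def paced_frames(
    samples: Sequence[float],
    rtf: float = 1.0,
    frame_size: int = FRAME_SIZE,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield audio frames no faster than `rtf` times real time (hypothetical helper)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    start = time.monotonic()
    for offset in range(0, len(samples), frame_size):
        # Audio time of this frame's first sample, compressed by the real-time factor.
        target = start + (offset / sample_rate) / rtf
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # throttle: never run ahead of rtf x real time
        yield samples[offset : offset + frame_size]


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE  # one second of silence
    for rtf in (1.0, 500.0):
        t0 = time.monotonic()
        n_frames = sum(1 for _ in paced_frames(one_second, rtf=rtf))
        elapsed = time.monotonic() - t0
        print(f"rtf={rtf:g}: {n_frames} frames in {elapsed:.3f} s")
```

With `rtf=1` the frames arrive at the pace of a live microphone. With a factor of 500, a one-minute file would be paced to roughly 0.12 s of wall-clock time (60 / 500), so the throttle presumably stops being the bottleneck, which matches the "as fast as possible" wording above.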
## Text-to-Speech