From 8e254d8b098b0963a21a4ec6e7028aa04d4f7984 Mon Sep 17 00:00:00 2001
From: Vaclav Volhejn
Date: Wed, 2 Jul 2025 17:42:21 +0200
Subject: [PATCH] Document PyTorch and MLX examples

---
 README.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 927582b..75af0cc 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,9 @@
-This repo contains instructions and examples of how to run Kyutai Speech-To-Text models.
+This repo contains instructions and examples of how to run
+[Kyutai Speech-To-Text](#kyutai-speech-to-text)
+and [Kyutai Text-To-Speech](#kyutai-text-to-speech) models.
 These models are powered by delayed streams modeling (DSM),
 a flexible formulation for streaming, multimodal sequence-to-sequence learning.
@@ -192,9 +194,61 @@
 The MLX models can also be used in swift using the
 [moshi-swift codebase](https://github.com/kyutai-labs/moshi-swift),
 the 1b model has been tested to work fine on an iPhone 16 Pro.
 
-## Text-to-Speech
 
-We're in the process of open-sourcing our TTS models. Check back for updates!
+## Kyutai Text-to-Speech
+
+<a href="https://colab.research.google.com/github/kyutai-labs/delayed-streams-modeling/blob/main/tts_pytorch.ipynb">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
+</a>
+
+We provide several implementations of Kyutai TTS, each suited to a different use case. Here is how to choose:
+
+- PyTorch: for research and tinkering. Use this implementation if you want to call the model directly from Python for experimentation.
+- Rust: for production. The Rust server provides robust, streaming access to the model over WebSockets; it is what we use to run Unmute.
+- MLX: for on-device inference on iPhone and Mac. MLX is Apple's ML framework for hardware-accelerated inference on Apple silicon; choose it if you want to run the model on a Mac or an iPhone.
+
+### PyTorch implementation
+
+<a href="https://colab.research.google.com/github/kyutai-labs/delayed-streams-modeling/blob/main/tts_pytorch.ipynb">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
+</a>
+
+Check out our [Colab notebook](https://colab.research.google.com/github/kyutai-labs/delayed-streams-modeling/blob/main/tts_pytorch.ipynb) or use the script:
+
+```bash
+# From stdin, plays audio immediately
+echo "Hey, how are you?" | python scripts/tts_pytorch.py - -
+
+# From text file to audio file
+python scripts/tts_pytorch.py text_to_say.txt audio_output.wav
+```
+
+This requires the [moshi package](https://pypi.org/project/moshi/), which can be installed via pip.
+If you have [uv](https://docs.astral.sh/uv/) installed, you can skip the installation step
+and just prefix the commands above with `uvx --with moshi`.
+
+### Rust server
+
+Example coming soon.
+
+### MLX implementation
+
+[MLX](https://ml-explore.github.io/mlx/build/html/index.html) is Apple's ML framework that allows you to use
+hardware acceleration on Apple silicon.
+
+Use our example script to run Kyutai TTS on MLX.
+The script reads text from stdin or from a file and either writes the audio to a file or plays it as it is generated.
+
+```bash
+# From stdin, plays audio immediately
+echo "Hey, how are you?" | python scripts/tts_mlx.py - -
+
+# From text file to audio file
+python scripts/tts_mlx.py text_to_say.txt audio_output.wav
+```
+
+This requires the [moshi-mlx package](https://pypi.org/project/moshi-mlx/), which can be installed via pip.
+If you have [uv](https://docs.astral.sh/uv/) installed, you can skip the installation step
+and just prefix the commands above with `uvx --with moshi-mlx`.
 
 ## License
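
A note on the PyTorch instructions in the patch above: the `uvx --with moshi` prefix and the script invocation can be combined into a single command. This is a minimal sketch of that combined form, assuming `uv` is installed and `scripts/tts_pytorch.py` matches the repository layout; it is just the prefixed version of the commands already shown in the patch.

```bash
# Run the PyTorch example without installing moshi first;
# uvx provisions a temporary environment containing the moshi package.
echo "Hey, how are you?" | uvx --with moshi python scripts/tts_pytorch.py - -

# Same, but reading the text from a file and writing a WAV file.
uvx --with moshi python scripts/tts_pytorch.py text_to_say.txt audio_output.wav
```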
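
The same pattern applies to the MLX script, substituting `moshi-mlx` for `moshi`. Again a sketch under the same assumptions, mirroring the MLX commands in the patch:

```bash
# MLX backend on Apple silicon, no prior pip install needed.
echo "Hey, how are you?" | uvx --with moshi-mlx python scripts/tts_mlx.py - -

# From a text file to an audio file.
uvx --with moshi-mlx python scripts/tts_mlx.py text_to_say.txt audio_output.wav
```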