Add whisper.cpp backend, enable GPU support on Apple Silicon #132
Changes from 22 commits
011ce2a
c305f6b
39dfa0c
463d130
8541001
4c2bcd9
3bac835
a3fc3cb
31b44fe
eeb1e78
618054a
30780b8
4c00da6
5779f87
475803e
62576c0
26b1c80
0e03f8d
41c6994
153439e
d486350
56cd370
6fe0f1c
f7ae6d9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,3 +43,4 @@ poetry/core/* | |
|
|
||
| .env | ||
| app/output/*/ | ||
| dump.rdb | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| ASR_ENGINE_OPTIONS = frozenset([ | ||
| "task", | ||
| "language", | ||
| "initial_prompt", | ||
| "split_on_word", | ||
| ]) |
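
The diff does not show where `ASR_ENGINE_OPTIONS` is consumed. One plausible use, sketched here as an assumption, is to whitelist request parameters before they reach an engine. The helper name `pick_engine_options` is hypothetical and does not appear in the PR.

```python
# Same set of option names as the constants file in the diff above.
ASR_ENGINE_OPTIONS = frozenset([
    "task",
    "language",
    "initial_prompt",
    "split_on_word",
])


def pick_engine_options(params: dict) -> dict:
    """Hypothetical helper: keep only recognised, non-None engine options."""
    return {
        key: value
        for key, value in params.items()
        if key in ASR_ENGINE_OPTIONS and value is not None
    }


if __name__ == "__main__":
    # "encode" is not an engine option and "language" is unset, so both are dropped.
    print(pick_engine_options({"task": "translate", "encode": True, "language": None}))
```

Using a `frozenset` keeps the membership check cheap and makes the option list immutable at runtime.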
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,93 @@ | ||
| import logging | ||
| import os | ||
| from io import StringIO | ||
| from threading import Lock | ||
| from typing import Union, BinaryIO | ||
|
|
||
| from pywhispercpp.model import Model | ||
|
|
||
| import json | ||
| from .constants import ASR_ENGINE_OPTIONS | ||
|
|
||
| logging.basicConfig(format='[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s', level=logging.INFO, force=True) | ||
| logger = logging.getLogger(__name__) | ||
|
|
||
| model_name = os.getenv("ASR_MODEL", "small") | ||
| model_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper")) | ||
|
|
||
| model_lock = Lock() | ||
|
|
||
| model = None | ||
| def load_model(next_model_name: str): | ||
| with model_lock: | ||
| global model_name, model | ||
|
|
||
| if model and next_model_name == model_name: | ||
| return model | ||
|
|
||
| if not model: | ||
| logger.info(Model.system_info()) | ||
|
|
||
| model = Model(next_model_name, models_dir=model_path) | ||
|
|
||
| model_name = next_model_name | ||
|
|
||
| return model | ||
|
|
||
|
|
||
| def build_options(asr_options): | ||
| options_dict = { | ||
| 'language': asr_options.get('language'), | ||
| 'translate': asr_options.get('task', '') == 'translate', | ||
| 'token_timestamps': asr_options.get('split_on_word', False), | ||
| } | ||
| if asr_options.get('initial_prompt'): | ||
| options_dict['initial_prompt'] = asr_options['initial_prompt'] | ||
| if asr_options.get('split_on_word'): | ||
| options_dict['max_len'] = 1 | ||
| options_dict['split_on_word'] = True | ||
| return options_dict | ||
|
|
||
|
|
||
| def transcribe(audio, asr_options, output): | ||
| options_dict = build_options(asr_options) | ||
| logger.info(f"whisper.cpp options: {options_dict}") | ||
|
|
||
| with model_lock: | ||
| segments = [] | ||
| text = "" | ||
| segment_generator = model.transcribe(audio, **options_dict) | ||
| for segment in segment_generator: | ||
| if not segment.text: | ||
| continue | ||
| segment_dict = { | ||
| "start": float(segment.t0) / 100.0, | ||
| "end": float(segment.t1) / 100.0, | ||
| "text": segment.text, | ||
| } | ||
| segments.append(segment_dict) | ||
| text = text + segment.text + " " | ||
| result = { | ||
| "language": options_dict.get("language"), | ||
| "segments": segments, | ||
| "text": text | ||
| } | ||
|
|
||
| output_file = StringIO() | ||
| write_result(result, output_file, output) | ||
| output_file.seek(0) | ||
|
|
||
| return output_file | ||
|
|
||
|
|
||
| def language_detection(_audio): | ||
| raise NotImplementedError("language detection not implemented for whisper.cpp") | ||
|
|
||
|
|
||
| def write_result( | ||
| result: dict, file: BinaryIO, output: Union[str, None] | ||
| ): | ||
| if output == "json": | ||
| json.dump(result, file) | ||
| else: | ||
| return 'Please select an output method!' | ||
|
Comment on lines +93 to +96
Contributor
in the other engines we offer a bunch of output formats...but do we ever use anything but json in those either? the whole backend interface is json-based. maybe this was useful using the web interface?
Contributor
Author
We don't - this design came from whisper-asr-webservice, and was to support the web interface but also the API, which included export functionality. We don't use this part of the API, since we build our exports in Lua. We could copy over (or make reusable) the export code from the faster-whisper engine, but since we don't use it, it seemed like a waste of effort. But this is a bit of a funky design at the moment. |
||
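
Since the thread above settles on JSON as the only format in use, one possible tightening of `write_result` is sketched below. It is not what the PR does: it assumes a text stream (the diff passes a `StringIO` despite the `BinaryIO` annotation) and raises on unknown formats instead of returning a message string that the caller ignores.

```python
import json
from io import StringIO
from typing import Optional, TextIO


def write_result(result: dict, file: TextIO, output: Optional[str]) -> None:
    """Write the transcription result to a text stream; JSON is the only format handled."""
    if output == "json":
        json.dump(result, file)
    else:
        # Assumption: failing loudly is preferable to returning a string
        # that the caller never inspects.
        raise ValueError(f"unsupported output format: {output!r}")


if __name__ == "__main__":
    buffer = StringIO()
    write_result({"language": "en", "segments": [], "text": ""}, buffer, "json")
    print(buffer.getvalue())
```

If other export formats were ever needed, this is the function where the faster-whisper export code mentioned in the reply could be reused.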
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -131,8 +131,8 @@ You can customize the behavior of the ReaSpeech Docker image by setting | |
| environment variables when running the container. Here are the available | ||
| environment variables and their default values: | ||
|
|
||
| - `ASR_ENGINE`: The ASR engine to use. Options are `faster_whisper` (default) | ||
| and `openai_whisper`. | ||
| - `ASR_ENGINE`: The ASR engine to use. Options are `faster_whisper` (default), | ||
| `openai_whisper`, and `whisper_cpp`. | ||
|
Comment on lines +134 to +135
Contributor
not in the scope of this pr but wondering if we should provide some basic information (and link to project) about these engines. my selection process was "works on my old macbook outside of docker" and that was always hard to say what the right way to describe this is to someone interested in development lol
Contributor
Author
Yeah, I agree this should be documented somewhere. I think picking the right engine is not a problem that most users should have to solve, but in this particular case (for Mac users), it's the difference between GPU acceleration and not. That seems important enough that users should understand how and why to do it.
Contributor
reacts with "nod" emoji |
||
|
|
||
| To set an environment variable when running the Docker container, use the `-e` | ||
| flag followed by the variable name and value. For example, to use the | ||
|
|
||
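
For readers following the `ASR_ENGINE` change in the README hunk above, here is a minimal sketch of how a service might resolve and validate that variable. The three engine names and the `faster_whisper` default come from the README text; the function name `resolve_engine` and the validation step are assumptions for illustration, not code from the PR.

```python
import os
from typing import Mapping, Optional

# Engine names as listed in the updated README.
SUPPORTED_ENGINES = ("faster_whisper", "openai_whisper", "whisper_cpp")


def resolve_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read ASR_ENGINE, falling back to the documented default."""
    env = os.environ if env is None else env
    engine = env.get("ASR_ENGINE", "faster_whisper")
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"ASR_ENGINE must be one of {SUPPORTED_ENGINES}, got {engine!r}"
        )
    return engine


if __name__ == "__main__":
    # On Apple Silicon, whisper_cpp is the engine the PR adds for GPU support.
    print(resolve_engine({"ASR_ENGINE": "whisper_cpp"}))
```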
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,9 @@ You should now be able to start ReaSpeech's services by running: | |
| # Start all services | ||
| poetry run python3.10 app/run.py | ||
|
|
||
| # Start all services except for Redis | ||
| poetry run python3.10 app/run.py --no-start-redis | ||
|
|
||
|
Contributor
should this section call out why one might want to run the
Contributor
Author
Yes, good point. I'll update it. The motivation was that in some cases, Redis is installed using an OS package (Debian package, Homebrew, etc.), and the service is started and managed by the OS infrastructure. I'd really like to make Redis optional - feature #95 |
||
| # For usage instructions | ||
| poetry run python3.10 app/run.py --help | ||
| ``` | ||
|
|
||
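
The `--no-start-redis` flag from the hunk above could be wired up roughly as follows. The real `app/run.py` is not shown in this diff, so the parser layout and the `start_redis` attribute name are assumptions; only the flag name comes from the PR.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Illustrative sketch of a launcher CLI with an opt-out flag for Redis."""
    parser = argparse.ArgumentParser(description="Start ReaSpeech services (sketch)")
    parser.add_argument(
        "--no-start-redis",
        dest="start_redis",
        action="store_false",  # attribute defaults to True; the flag turns it off
        help="skip launching Redis, e.g. when the OS service manager already runs it",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--no-start-redis"])
    print(args.start_redis)
```

An opt-out flag keeps the default behaviour (start everything) unchanged for users who do not have a system-managed Redis.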
curious about what the purpose of a segment without text is 🤔
This was due to the first segment being a "beginning of text" sort of token with no actual content. I think that I have a better way to solve this now.
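
The empty-segment handling discussed here can be sketched in isolation. `FakeSegment` is a hypothetical stand-in for the segment objects the diff iterates over (with `t0`, `t1`, and `text` attributes); the division by 100 mirrors the diff and treats timestamps as 10 ms ticks. Stripping whitespace before the emptiness check is a small variation on the PR's `if not segment.text`, not the author's later fix.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSegment:
    """Hypothetical stand-in for a whisper.cpp segment; t0/t1 are in 10 ms ticks."""
    t0: int
    t1: int
    text: str


def collect_segments(segments: Iterable[FakeSegment]) -> dict:
    """Drop segments with no text and convert tick timestamps to seconds."""
    kept = []
    for segment in segments:
        if not segment.text.strip():
            # e.g. a leading "beginning of text" marker with no actual content
            continue
        kept.append({
            "start": segment.t0 / 100.0,
            "end": segment.t1 / 100.0,
            "text": segment.text,
        })
    text = " ".join(item["text"].strip() for item in kept)
    return {"segments": kept, "text": text}


if __name__ == "__main__":
    demo = [FakeSegment(0, 0, ""), FakeSegment(0, 150, "hello"), FakeSegment(150, 300, " world")]
    print(collect_segments(demo))
```

Joining the stripped texts also avoids the trailing space that the running `text + segment.text + " "` concatenation in the diff leaves behind.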