Conversation

Contributor

@ramen ramen commented Dec 29, 2024

This change adds a new ASR engine that uses whisper.cpp via the pywhispercpp library, enabling GPU-accelerated transcription on Apple Silicon. Fixes #125

To use whisper.cpp, set the ASR_ENGINE=whisper_cpp environment variable when starting the service or Docker container. Note that GPU acceleration is only available outside of Docker, since it requires access to Apple's libraries.

Example:
ASR_ENGINE=whisper_cpp poetry run python3.10 app/run.py --build-reascripts

There are a few differences with the whisper.cpp engine:

  • Word-level transcripts work differently: there is currently no way to get a segment/word hierarchy. Instead, there is a mode that essentially returns one segment per word. That doesn't mesh well with a number of ReaSpeech's features, so those features are disabled for this engine.
  • Resource consumption is lower. I was able to process 1.5 hours of audio using the small model on my M1 Mac mini with 8GB RAM, and with GPU acceleration, it only took about 6 minutes. This would previously have caused an out of memory condition on this hardware.
  • Timestamps seem, anecdotally, less accurate than faster-whisper's.
  • There is no Voice Activity Detection (VAD), though it might be possible to add using a library called "webrtcvad".
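To sketch what VAD via webrtcvad could look like: webrtcvad only accepts 10, 20, or 30 ms frames of 16-bit mono PCM at 8/16/32/48 kHz, so the raw audio has to be sliced into fixed-size frames first. This is a hypothetical sketch, not code from this PR; frame_generator and speech_frames are made-up helper names.

```python
def frame_generator(pcm_bytes, sample_rate=16000, frame_ms=30):
    # webrtcvad only accepts 10/20/30 ms frames of 16-bit mono PCM
    # at 8/16/32/48 kHz; slice the raw buffer accordingly.
    bytes_per_frame = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for i in range(0, len(pcm_bytes) - bytes_per_frame + 1, bytes_per_frame):
        yield pcm_bytes[i:i + bytes_per_frame]

def speech_frames(pcm_bytes, sample_rate=16000, aggressiveness=2):
    # webrtcvad is a third-party package (pip install webrtcvad);
    # imported lazily so frame_generator stays dependency-free.
    import webrtcvad
    vad = webrtcvad.Vad(aggressiveness)  # 0 (permissive) .. 3 (aggressive)
    for frame in frame_generator(pcm_bytes, sample_rate):
        if vad.is_speech(frame, sample_rate):
            yield frame
```

Non-speech frames could then be dropped (or used to pre-split the audio) before handing it to whisper.cpp.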

This change includes an improvement to run.py that prevents Ctrl-C from being sent to subprocesses and enables a graceful shutdown when running ReaSpeech outside of Docker.
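Roughly, the subprocess handling amounts to starting each child in its own process group, so that a terminal Ctrl-C only reaches run.py, which can then wind things down itself. This is an illustrative sketch, not the actual run.py code; launch_service and run_until_interrupted are made-up names.

```python
import subprocess

def launch_service(cmd):
    # Start the child in its own session/process group so a Ctrl-C in
    # the terminal (which signals the whole foreground process group)
    # is delivered only to run.py, not to the children.
    return subprocess.Popen(cmd, start_new_session=True)

def run_until_interrupted(procs):
    # On Ctrl-C, shut the children down deliberately instead of
    # letting the terminal kill them mid-write.
    try:
        for p in procs:
            p.wait()
    except KeyboardInterrupt:
        for p in procs:
            p.terminate()
        for p in procs:
            p.wait()
```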

@ramen ramen requested review from mikeylove and smrl December 29, 2024 17:33
Contributor Author

ramen commented Dec 29, 2024

whisper-cpp-apple-gpu

Contributor

@mikeylove left a comment

this is awesome! after a bit of initial confusion, my test file now transcribes in less than a second (on my M4 Pro Mini). one small diff i noticed between small.en and small is that the former had a trailing segment that said [BLANK AUDIO].

re: that initial confusion, the "loading model" phase was being reported on the reaspeech side as "transcribing." before i realized this, i was puzzled why it seemed to be 1) reporting very small but regular progress updates, 2) taking forever and 3) being nearly instant on subsequent transcriptions.

not sure how to think about the "split on words" option. the output from running with this turned on is (obviously) pretty bulky. i'm having a hard time coming up with a hypothetical situation where words-as-top-level would be useful. i'm also quite aware that this is very possibly only a limit of my own imagination. 😂

my individual comments here are mostly to identify issues/updates to pursue outside of this pr.

Comment on lines 61 to 62
if not segment.text:
continue
Contributor

curious about what the purpose of a segment without text is 🤔

Contributor Author

This was due to the first segment being a "beginning of text" sort of token with no actual content. I think that I have a better way to solve this now.

Comment on lines +90 to +93
if output == "json":
json.dump(result, file)
else:
return 'Please select an output method!'
Contributor

in the other engines we offer a bunch of output formats...but do we ever use anything but json in those either? the whole backend interface is json-based. maybe this was useful when using the web interface?

Contributor Author

We don't - this design came from whisper-asr-webservice, and was meant to support both the web interface and the API, which included export functionality. We don't use this part of the API, since we build our exports in Lua. We could copy over (or make reusable) the export code from the faster-whisper engine, but since we don't use it, that seemed like a waste of effort. It's a bit of a funky design at the moment.

Comment on lines +134 to +135
- `ASR_ENGINE`: The ASR engine to use. Options are `faster_whisper` (default),
`openai_whisper`, and `whisper_cpp`.
Contributor

not in the scope of this pr but wondering if we should provide some basic information (and link to project) about these engines. my selection process was "works on my old macbook outside of docker" and that was always openai_whisper because i could never resolve the library conflicts causing faster_whisper to crash.

hard to say what the right way to describe this is to someone interested in development lol

Contributor Author

Yeah, I agree this should be documented somewhere. I think picking the right engine is not a problem that most users should have to solve, but in this particular case (for Mac users), it's the difference between GPU acceleration and not. That seems important enough that users should understand how and why to do it.

Contributor

reacts with "nod" emoji

Comment on lines 18 to 23
# Start all services
poetry run python3.10 app/run.py
# Start all services except for Redis
poetry run python3.10 app/run.py --no-start-redis
Contributor

should this section call out why one might want to run the --no-start-redis version or is it obvious enough? feels like maybe the user who needs the option would understand the difference but another less developer-minded might not. 🤔🤔🤔🤔

Contributor Author

Yes, good point. I'll update it. The motivation was that in some cases, Redis is installed using an OS package (Debian package, Homebrew, etc.), and the service is started and managed by the OS infrastructure. I'd really like to make Redis optional - feature #95

Contributor Author

ramen commented Jan 2, 2025

this is awesome! after a bit of initial confusion, my test file now transcribes in less than a second (on my M4 Pro Mini). one small diff i noticed between small.en and small is that the former had a trailing segment that said [BLANK AUDIO].

I have noticed more of this type of thing as well - I've also seen things like "[Laughs]" which I don't recall seeing with any other engine. In some cases, these are special tokens, and I have a way to filter those out, which I'll incorporate into this PR.
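For reference, filtering those out could look something like this (a hypothetical sketch, not the PR's actual fix; content_segments is a made-up name, and the real pywhispercpp segments carry timestamps and more than just text):

```python
def content_segments(segments):
    # Drop segments that carry no speech content: empty text (e.g. a
    # leading beginning-of-text marker) and bracketed marker output
    # such as "[BLANK_AUDIO]" or "[Laughs]".
    kept = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue
        if text.startswith("[") and text.endswith("]"):
            continue
        kept.append(seg)
    return kept
```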

re: that initial confusion, the "loading model" phase was being reported on the reaspeech side as "transcribing." before i realized this, i was puzzled why it seemed to be 1) reporting very small but regular progress updates, 2) taking forever and 3) being nearly instant on subsequent transcriptions.

Yes, I think the way that this library (pywhispercpp) reports progress is odd, and basically reports the model loading progress but not the transcription. At least there's some visual feedback about the work it's doing.

not sure how to think about the "split on words" option. the output from running with this turned on is (obviously) pretty bulky. i'm having a hard time coming up with a hypothetical situation where words-as-top-level would be useful. i'm also quite aware that this is very possibly only a limit of my own imagination. 😂

So, good news! I figured out how to get segments with words, and I can remove this option. It turns out that pywhispercpp's high-level Model class hides some of whisper.cpp's functionality, and it's actually possible to merge tokens into words by detecting word boundaries, which are indicated by the first character of the token being a space. Stay tuned...
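The boundary-based merge described above could look roughly like this (tokens simplified to plain strings for illustration; the real pywhispercpp tokens also carry timestamps, which would be merged alongside the text):

```python
def merge_tokens_into_words(tokens):
    # whisper.cpp marks a word boundary by beginning the token's text
    # with a space, so open a new word whenever we see one.
    words = []
    for tok in tokens:
        if tok.startswith(" ") or not words:
            words.append(tok.lstrip())
        else:
            words[-1] += tok
    return words
```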

@ramen ramen merged commit 8febe95 into main Jan 2, 2025
2 checks passed
@ramen ramen deleted the whisper-cpp branch January 2, 2025 19:26


Development

Successfully merging this pull request may close these issues.

[feature]: GPU acceleration for Apple Silicon

3 participants