Merged
24 commits
011ce2a
Add whisper.cpp support
ramen Dec 23, 2024
c305f6b
Support split_on_word as a separate concept from word_timestamps
ramen Dec 28, 2024
39dfa0c
Always pass token_timestamps (seems stateful)
ramen Dec 28, 2024
463d130
Move progress logging to debug level
ramen Dec 28, 2024
8541001
Add whitespace between segments in concatenated text
ramen Dec 28, 2024
4c2bcd9
Log whisper.cpp system info (shows GPU-related flags)
ramen Dec 28, 2024
3bac835
Centralize logging configuration
ramen Dec 28, 2024
a3fc3cb
Revert "Centralize logging configuration"
ramen Dec 28, 2024
31b44fe
Graceful shutdown, ctrl-c handling for run.py
ramen Dec 28, 2024
eeb1e78
Make starting Redis optional
ramen Dec 29, 2024
618054a
Ignore Redis data
ramen Dec 29, 2024
30780b8
Change docker-compose default engine back to faster_whisper
ramen Dec 29, 2024
4c00da6
Document --no-start-redis flag
ramen Dec 29, 2024
5779f87
Hide unsupported features when transcript is not word-level
ramen Dec 29, 2024
475803e
Document whisper_cpp ASR engine option
ramen Dec 29, 2024
62576c0
Hide word granularity when transcript is not word-level
ramen Dec 29, 2024
26b1c80
Stub language detection method
ramen Dec 29, 2024
0e03f8d
Reduce nesting
ramen Dec 29, 2024
41c6994
Ignore segments with empty text
ramen Dec 29, 2024
153439e
Add split on words option for whisper.cpp
ramen Dec 29, 2024
d486350
Use unfiltered data when testing for presence of words
ramen Dec 29, 2024
56cd370
Clean up variable handling, don't try to terminate process that exited
ramen Dec 29, 2024
6fe0f1c
Get segments with words from whisper.cpp
ramen Dec 30, 2024
f7ae6d9
Document reason for --no-start-redis
ramen Jan 2, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -43,3 +43,4 @@ poetry/core/*

.env
app/output/*/
dump.rdb
1 change: 1 addition & 0 deletions Dockerfile
@@ -15,6 +15,7 @@ RUN export DEBIAN_FRONTEND=noninteractive \
lua-check \
fswatch \
make \
build-essential \
cargo \
ffmpeg \
redis \
103 changes: 76 additions & 27 deletions app/run.py
@@ -1,14 +1,19 @@
#!/usr/bin/env python

import argparse
import os
import signal
import subprocess
import sys
import argparse
import time

argmap = {
'--redis-bin': {
'default': 'redis-server',
'help': 'Path to Redis server binary (default: %(default)s)' },
'--no-start-redis': {
'action': 'store_true',
'help': 'Do not start Redis server' },
'--celery-broker-url': {
'default': 'redis://localhost:6379/0',
'help': 'Celery broker URL (default: %(default)s)' },
@@ -60,46 +65,90 @@
if args.enable_swagger_ui:
os.environ['ENABLE_SWAGGER_UI'] = '/docs'

shutdown_requested = False

def signal_handler(signum, frame):
global shutdown_requested
print('\nShutdown requested...', file=sys.stderr)
shutdown_requested = True

# Set up signal handlers before starting processes
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)

processes = {}

# Start Redis
print('Starting database...', file=sys.stderr)
processes['redis'] = subprocess.Popen([args.redis_bin], stdout=subprocess.DEVNULL)
if not args.no_start_redis:
print('Starting database...', file=sys.stderr)
processes['redis'] = \
subprocess.Popen(
[args.redis_bin],
stdout=subprocess.DEVNULL,
start_new_session=True)

# Start Celery
print('Starting worker...', file=sys.stderr)
processes['celery'] = subprocess.Popen(['celery', '-A', 'app.worker.celery', 'worker', '--pool=solo', '--loglevel=info'])
processes['celery'] = \
subprocess.Popen([
'celery',
'-A', 'app.worker.celery',
'worker',
'--pool=solo',
'--loglevel=info'
], start_new_session=True)

# Start Gunicorn
print('Starting application...', file=sys.stderr)
processes['gunicorn'] = subprocess.Popen(['gunicorn', '--bind', '0.0.0.0:9000', '--workers', '1', '--timeout', '0', 'app.webservice:app', '-k', 'uvicorn.workers.UvicornWorker'])
processes['gunicorn'] = \
subprocess.Popen([
'gunicorn',
'--bind', '0.0.0.0:9000',
'--workers', '1',
'--timeout', '0',
'app.webservice:app',
'-k', 'uvicorn.workers.UvicornWorker'
], start_new_session=True)

# Wait for any process to exit
pid, waitstatus = os.wait()
exitcode = os.waitstatus_to_exitcode(waitstatus)
exitcode = 0
process_name = '<unknown>'
for name, p in processes.items():
if p.pid == pid:
process_name = name
break
if exitcode < 0:
print('Process', process_name, 'received signal', -exitcode, file=sys.stderr)
else:
print('Process', process_name, 'exited with status', exitcode, file=sys.stderr)

# Terminate any child processes
print('Terminating child processes...', file=sys.stderr)
for name, p in processes.items():

while not shutdown_requested:
try:
print('Terminating', name, file=sys.stderr)
pid, waitstatus = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if pid == 0: # No process has exited
time.sleep(0.1)
continue

# kinda bass-ackwards, but poll() returns None if process is still running
if not p.poll():
p.terminate()
else:
print(name, "already exited", file=sys.stderr)
exitcode = os.waitstatus_to_exitcode(waitstatus)
for name, p in processes.items():
if p.pid == pid:
process_name = name
break

if exitcode < 0:
print('Process', process_name, 'received signal', -exitcode, file=sys.stderr)
else:
print('Process', process_name, 'exited with status', exitcode, file=sys.stderr)
shutdown_requested = True

# Graceful shutdown sequence
print('Initiating graceful shutdown...', file=sys.stderr)
for name, p in reversed(list(processes.items())):
if name == process_name:
continue
try:
print(f'Terminating {name}...', file=sys.stderr)
p.terminate()
try:
p.wait(timeout=5) # Give each process 5 seconds to shut down
except subprocess.TimeoutExpired:
print(f'Force killing {name}...', file=sys.stderr)
p.kill()
except Exception as e:
print(e, file=sys.stderr)
print(f'Error shutting down {name}: {e}', file=sys.stderr)

# Exit with status of process that exited
status = 1 if exitcode < 0 else exitcode
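The supervision pattern this diff introduces in run.py is worth sketching in isolation: children are launched with `start_new_session=True` so a terminal Ctrl-C is delivered only to run.py itself, and a non-blocking `os.waitpid(-1, os.WNOHANG)` loop notices the first child to exit without blocking out the signal handler. A minimal POSIX-only sketch, with illustrative names not taken from the PR:

```python
import os
import subprocess
import sys
import time

def supervise(processes):
    """Block until one child exits; return (name, exitcode).

    Minimal sketch of run.py's loop: os.waitpid with WNOHANG polls
    instead of blocking, so a SIGINT handler could set a flag and be
    observed between iterations.
    """
    while True:
        try:
            pid, waitstatus = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            return None, 0
        if pid == 0:  # children exist but none has exited yet
            time.sleep(0.1)
            continue
        exitcode = os.waitstatus_to_exitcode(waitstatus)
        name = next((n for n, p in processes.items() if p.pid == pid),
                    '<unknown>')
        return name, exitcode

if __name__ == '__main__':
    # start_new_session=True detaches the child from the terminal's
    # process group, so Ctrl-C reaches only the supervisor.
    procs = {'demo': subprocess.Popen(
        [sys.executable, '-c', 'import sys; sys.exit(3)'],
        start_new_session=True)}
    print(supervise(procs))  # prints ('demo', 3)
```

Note that `os.waitpid` reaps the child directly, which is why the real code maps the returned pid back to a `Popen` object by hand instead of using `Popen.wait()`.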
7 changes: 7 additions & 0 deletions app/webservice.py
@@ -56,12 +56,15 @@
"encode",
"output",
"vad_filter",
"split_on_word",
"word_timestamps",
"model_name",
])

if ASR_ENGINE == "faster_whisper":
from .faster_whisper.constants import ASR_ENGINE_OPTIONS
elif ASR_ENGINE == "whisper_cpp":
from .whisper_cpp.constants import ASR_ENGINE_OPTIONS
else:
from .openai_whisper.constants import ASR_ENGINE_OPTIONS

@@ -207,6 +210,10 @@ async def asr(
description="Enable the voice activity detection (VAD) to filter out parts of the audio without speech",
include_in_schema=(True if ASR_ENGINE == "faster_whisper" else False)
)] = False,
split_on_word: Annotated[bool | None, Query(
description="Return one segment per word",
include_in_schema=(True if ASR_ENGINE == "whisper_cpp" else False)
)] = False,
word_timestamps: bool = Query(default=False, description="Word level timestamps"),
model_name: Union[str, None] = Query(default=None, description="Model name to use for transcription"),
use_async: bool = Query(default=False, description="Use asynchronous processing")
6 changes: 6 additions & 0 deletions app/whisper_cpp/constants.py
@@ -0,0 +1,6 @@
ASR_ENGINE_OPTIONS = frozenset([
"task",
"language",
"initial_prompt",
"split_on_word",
])
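Each engine module publishes its supported options as a `frozenset` like the one above, which lets the caller filter incoming request parameters down to what the active engine accepts. A sketch of that use, under the assumption that filtering happens at the webservice/worker boundary (`filter_asr_options` is a hypothetical helper, not code from the PR):

```python
ASR_ENGINE_OPTIONS = frozenset([
    "task",
    "language",
    "initial_prompt",
    "split_on_word",
])

def filter_asr_options(params):
    # Hypothetical helper: drop request parameters the active engine
    # does not support, along with unset (None) values.
    return {k: v for k, v in params.items()
            if k in ASR_ENGINE_OPTIONS and v is not None}

print(filter_asr_options({"task": "translate",
                          "vad_filter": True,   # faster_whisper-only
                          "language": None}))
# {'task': 'translate'}
```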
93 changes: 93 additions & 0 deletions app/whisper_cpp/core.py
@@ -0,0 +1,93 @@
import logging
import os
from io import StringIO
from threading import Lock
from typing import Union, BinaryIO

from pywhispercpp.model import Model

import json
from .constants import ASR_ENGINE_OPTIONS

logging.basicConfig(format='[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s', level=logging.INFO, force=True)
logger = logging.getLogger(__name__)

model_name = os.getenv("ASR_MODEL", "small")
model_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper"))

model_lock = Lock()

model = None
def load_model(next_model_name: str):
with model_lock:
global model_name, model

if model and next_model_name == model_name:
return model

if not model:
logger.info(Model.system_info())

model = Model(next_model_name, models_dir=model_path)

model_name = next_model_name

return model


def build_options(asr_options):
options_dict = {
'language': asr_options.get('language'),
'translate': asr_options.get('task', '') == 'translate',
'token_timestamps': asr_options.get('split_on_word', False),
}
if asr_options.get('initial_prompt'):
options_dict['initial_prompt'] = asr_options['initial_prompt']
if asr_options.get('split_on_word'):
options_dict['max_len'] = 1
options_dict['split_on_word'] = True
return options_dict


def transcribe(audio, asr_options, output):
options_dict = build_options(asr_options)
logger.info(f"whisper.cpp options: {options_dict}")

with model_lock:
segments = []
text = ""
segment_generator = model.transcribe(audio, **options_dict)
for segment in segment_generator:
if not segment.text:
continue
Contributor:

curious about what the purpose of a segment without text is 🤔

Contributor Author:

This was due to the first segment being a "beginning of text" sort of token with no actual content. I think that I have a better way to solve this now.

segment_dict = {
"start": float(segment.t0) / 100.0,
"end": float(segment.t1) / 100.0,
"text": segment.text,
}
segments.append(segment_dict)
text = text + segment.text + " "
result = {
"language": options_dict.get("language"),
"segments": segments,
"text": text
}

output_file = StringIO()
write_result(result, output_file, output)
output_file.seek(0)

return output_file


def language_detection(_audio):
raise NotImplementedError("language detection not implemented for whisper.cpp")


def write_result(
result: dict, file: BinaryIO, output: Union[str, None]
):
if output == "json":
json.dump(result, file)
else:
return 'Please select an output method!'
Comment on lines +93 to +96
Contributor:

in the other engines we offer a bunch of output formats...but do we ever use anything but json in those either? the whole backend interface is json-based. maybe this was useful using the web interface?

Contributor Author:

We don't - this design came from whisper-asr-webservice, and was to support the web interface but also the API, which included export functionality. We don't use this part of the API, since we build our exports in Lua. We could copy over (or make reusable) the export code from the faster-whisper engine, but since we don't use it, it seemed like a waste of effort. But this is a bit of a funky design at the moment.
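
One detail of core.py worth calling out alongside this thread: pywhispercpp reports segment timestamps `t0`/`t1` in centiseconds, so the transcribe loop divides by 100 to produce seconds, skips empty segments, and accumulates the concatenated text with a trailing space per segment. A standalone sketch of that conversion and result shape (the `Segment` namedtuple is a hypothetical stand-in for pywhispercpp's segment objects):

```python
import json
from collections import namedtuple

# Hypothetical stand-in for pywhispercpp's segment objects.
Segment = namedtuple("Segment", ["t0", "t1", "text"])

def segments_to_result(raw_segments, language=None):
    """Convert segments with centisecond timestamps into the
    JSON-friendly result dict that transcribe() builds."""
    segments, text = [], ""
    for seg in raw_segments:
        if not seg.text:  # skip empty segments, as the PR does
            continue
        segments.append({
            "start": seg.t0 / 100.0,  # centiseconds -> seconds
            "end": seg.t1 / 100.0,
            "text": seg.text,
        })
        text += seg.text + " "
    return {"language": language, "segments": segments, "text": text}

result = segments_to_result([Segment(0, 0, ""), Segment(123, 456, "hello")])
print(json.dumps(result))
```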

4 changes: 3 additions & 1 deletion app/worker.py
@@ -41,6 +41,8 @@ def update(self, progress):
ASR_ENGINE = os.getenv("ASR_ENGINE", "faster_whisper")
if ASR_ENGINE == "faster_whisper":
from .faster_whisper import core as asr_engine
elif ASR_ENGINE == "whisper_cpp":
from .whisper_cpp import core as asr_engine
else:
from .openai_whisper import core as asr_engine

@@ -150,7 +152,7 @@ def get_output_url_path(job_id: str):

def update_progress(context):
def do_update(units, total, current):
logger.info(f"Updating progress with units={units}, total={total}, current={current}")
logger.debug(f"Updating progress with units={units}, total={total}, current={current}")
context.update_state(
state=STATES["transcribing"],
meta={"progress": {"units": units, "total": total, "current": current}}
4 changes: 2 additions & 2 deletions docs/development.md
@@ -131,8 +131,8 @@ You can customize the behavior of the ReaSpeech Docker image by setting
environment variables when running the container. Here are the available
environment variables and their default values:

- `ASR_ENGINE`: The ASR engine to use. Options are `faster_whisper` (default)
and `openai_whisper`.
- `ASR_ENGINE`: The ASR engine to use. Options are `faster_whisper` (default),
`openai_whisper`, and `whisper_cpp`.
Comment on lines +134 to +135
Contributor:

not in the scope of this pr but wondering if we should provide some basic information (and link to project) about these engines. my selection process was "works on my old macbook outside of docker" and that was always openai_whisper because i could never resolve the library conflicts causing faster_whisper to crash.

hard to say what the right way to describe this is to someone interested in development lol

Contributor Author:

Yeah, I agree this should be documented somewhere. I think picking the right engine is not a problem that most users should have to solve, but in this particular case (for Mac users), it's the difference between GPU acceleration and not. That seems important enough that users should understand how and why to do it.

Contributor:

reacts with "nod" emoji


To set an environment variable when running the Docker container, use the `-e`
flag followed by the variable name and value. For example, to use the
3 changes: 3 additions & 0 deletions docs/no-docker.md
@@ -18,6 +18,9 @@ You should now be able to start ReaSpeech's services by running:
# Start all services
poetry run python3.10 app/run.py

# Start all services except for Redis
poetry run python3.10 app/run.py --no-start-redis

Contributor:

should this section call out why one might want to run the --no-start-redis version or is it obvious enough? feels like maybe the user who needs the option would understand the difference but another less developer-minded might not. 🤔🤔🤔🤔

Contributor Author:

Yes, good point. I'll update it. The motivation was that in some cases, Redis is installed using an OS package (Debian package, Homebrew, etc.), and the service is started and managed by the OS infrastructure. I'd really like to make Redis optional - feature #95

# For usage instructions
poetry run python3.10 app/run.py --help
```