I need to get the f0 for some chunks of audio, given the timestamps and sample indices that define the chunks. Since librosa.pyin() is very slow, I would like to compute f0 once for the whole audio and then slice out the f0 values for each chunk I need. For this I need to know how indices into f0 map to indices into the raw audio array. I need something like:
import librosa

def convert(audio_index):
    # do something
    return f0_index

audio, sr = librosa.load("sample.wav")
f0, voiced_flag, voiced_probs = librosa.pyin(
    audio,
    sr=sr,
    fmin=librosa.note_to_hz('C2'),
    fmax=librosa.note_to_hz('C7'),
)
f0_start = convert(audio_start_index)
f0_end = convert(audio_end_index)
f0_chunk = f0[f0_start:f0_end]
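
For reference, here is a minimal sketch of what convert could look like. It assumes pyin is called with its defaults (frame_length=2048, so hop_length = frame_length // 4 = 512, and center=True), in which case f0[t] is centered on audio sample t * hop_length. The helper chunk_bounds and the default hop_length argument are my own illustrative names, not part of librosa; librosa.samples_to_frames(samples, hop_length=512) should perform the same floor division.

```python
def convert(audio_index, hop_length=512):
    """Map an audio sample index to an f0 frame index.

    Assumption: pyin ran with center=True and this hop_length,
    so frame t is centered on sample t * hop_length.
    """
    return audio_index // hop_length


def chunk_bounds(audio_start_index, audio_end_index, hop_length=512):
    """Hypothetical helper: f0 slice bounds for audio[start:end]."""
    f0_start = convert(audio_start_index, hop_length)
    # Round the end up so the slice covers the last sample of the chunk.
    f0_end = -(-audio_end_index // hop_length)
    # Always return at least one frame, even for very short chunks.
    return f0_start, max(f0_end, f0_start + 1)
```

If pyin is called with a different hop_length or frame_length, the same value has to be passed here. Flooring the start and rounding the end up is one possible choice; rounding to the nearest frame would be another.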