Speech to Text via Whisper openAI

Comments

Todd-A0 wrote on 3/4/2023, 9:03 PM

Subtitle Edit is ready to go for these implementations, so whenever we get a fixed or functional new one it can be added.

It's great software. I hope that if they switch to GPU processing for Whisper, it doesn't inherit the problems of the existing GPU Whisper versions.

Maybe run it by RX10 first, separate out the person you need, and use whisper next.

Great idea! 👍

RogerS wrote on 3/5/2023, 1:22 AM

For a non-Python Whisper that runs on CPU or GPU, you can grab it here: https://github.com/Purfview/whisper-standalone-win

It's not working in SubtitleEdit at the moment, but it works from the command prompt (run cmd as admin). It doesn't seem to produce repeated lines.

Save it somewhere, put ffmpeg.exe in the same folder as whisper.exe, and change the command prompt folder to that location ("cd C:\Whisper\" for example). Try this as a template (you can change the language, file location and model type):

whisper.exe --device cuda --language en --model "base" "C:\Videos\video name.mp4"
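
If you'd rather script this than type it each time, here is a minimal Python sketch that builds the same command and runs it from the Whisper folder. The function names and the default C:\Whisper folder are just for illustration; the only flags used are the ones in the template above, and I haven't checked any others.

```python
import subprocess
from pathlib import Path


def build_whisper_command(video, exe="whisper.exe", device="cuda",
                          language="en", model="base"):
    """Return the argument list matching the command-line template above.

    Only the flags from the template (--device, --language, --model) are
    used; the function name and defaults are illustrative assumptions.
    """
    return [str(exe), "--device", device, "--language", language,
            "--model", model, str(video)]


def transcribe(video, whisper_dir=r"C:\Whisper", **options):
    """Run whisper.exe with its own folder as the working directory.

    whisper_dir is a hypothetical location; ffmpeg.exe is expected to sit
    in the same folder as whisper.exe, as described above.
    """
    exe = Path(whisper_dir) / "whisper.exe"
    cmd = build_whisper_command(video, exe=exe, **options)
    # Passing a list avoids quoting problems with spaces in the video path.
    return subprocess.run(cmd, cwd=whisper_dir, check=True)


# Example call (needs whisper.exe and ffmpeg.exe in C:\Whisper):
# transcribe(r"C:\Videos\video name.mp4", model="base")
```

Passing the arguments as a list means a path with spaces, like "video name.mp4", doesn't need extra quotes.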

Custom PC (2022) Intel i5-13600K with UHD 770 iGPU with 31.0.101.4091 driver, MSI z690 Tomahawk motherboard, 64GB Corsair DDR5 5200 ram, NVIDIA 2080 Super (8GB) with latest studio driver, 2TB Hynix P41 SSD, Windows 11 Pro 64 bit

Dell XPS 15 laptop (2017) 32GB ram, NVIDIA 1050 (4GB) with latest studio driver, Intel i7-7700HQ with Intel 630 iGPU (driver 31.0.101.2115), dual internal SSD (256GB; 1TB), Windows 10 64 bit

Vegas 19.648
Vegas 20.236

VEGAS 4K "sample project" benchmark: https://forms.gle/ypyrrbUghEiaf2aC7
VEGAS Pro 20 "Ad" benchmark: https://forms.gle/eErJTR87K2bbJc4Q7

Todd-A0 wrote on 3/14/2023, 1:53 AM

This is the Whisper variant I'm currently using: https://github.com/Dadangdut33/Speech-Translate/releases/tag/1.1.0

It seems pretty good. In this example it created subtitles for a 3-minute video using the large model in 1 minute (RTX 3080). It gets things almost perfect until about the 2-minute mark, where the timing begins to drift. I thought others interested in translation could use this as a barometer of sorts, and even download this video and compare it against the app version they're using. Russian has historically been difficult for Whisper to handle well. If the Whisper version you're using does a better job, let us know.

This uses the GPU, possibly only NVIDIA cards. It has no integration with any NLE.