Speech to Text via OpenAI Whisper

Comments

Todd-A0 wrote on 3/4/2023, 9:03 PM

Subtitle Edit is ready to go for these implementations, so whenever we get a fixed or functional new one, it can be added.

It's great software. If they switch Whisper to GPU processing, I hope it doesn't run into the same problems as the other GPU Whisper versions.

Maybe run it through RX10 first to separate out the person you need, then use Whisper.

Great idea! 👍

RogerS wrote on 3/5/2023, 1:22 AM

For a non-Python Whisper that runs on CPU or GPU, you can grab it here: https://github.com/Purfview/whisper-standalone-win

It's not working in Subtitle Edit at the moment, but it works from the command prompt (run cmd as admin). It doesn't seem to produce repeated lines.

Save it somewhere, copy ffmpeg.exe into the folder with whisper.exe, and change the command prompt to that folder (for example, "cd C:\Whisper\"). Try this as a template (you can change the language, file location and model type):

whisper.exe --device cuda --language en --model "base" "C:\Videos\video name.mp4"
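If you have a folder of clips, the same template can be run once per file from a script. This is a minimal sketch, assuming whisper.exe and ffmpeg.exe sit together in C:\Whisper as described above; `build_command` and `transcribe_folder` are hypothetical helper names, and only the three flags from the template (--device, --language, --model) are used, so check the tool's own help for anything else.

```python
import subprocess
from pathlib import Path

# Assumption: whisper.exe and ffmpeg.exe live together in this folder.
WHISPER_DIR = Path(r"C:\Whisper")


def build_command(video, language="en", model="base", device="cuda"):
    """Return the argument list for one whisper.exe run.

    Mirrors the template above; only --device, --language and --model
    are passed, plus the path of the video to transcribe.
    """
    return [
        str(WHISPER_DIR / "whisper.exe"),
        "--device", device,
        "--language", language,
        "--model", model,
        str(video),
    ]


def transcribe_folder(folder, pattern="*.mp4", **options):
    """Run whisper.exe once per matching file, one after another."""
    for video in sorted(Path(folder).glob(pattern)):
        cmd = build_command(video, **options)
        print("Transcribing:", video.name)
        # cwd=WHISPER_DIR plays the role of the "cd C:\Whisper\" step,
        # so whisper.exe can find ffmpeg.exe sitting next to it.
        subprocess.run(cmd, cwd=WHISPER_DIR, check=True)
```

For example, `transcribe_folder(r"C:\Videos", language="en", model="base")` would process every .mp4 in that folder in turn. Passing an argument list instead of one long string avoids quoting trouble with spaces in file names like "video name.mp4".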

Custom PC (2022) Intel i5-13600K with UHD 770 iGPU with 31.0.101.4091 driver, MSI z690 Tomahawk motherboard, 64GB Corsair DDR5 5200 RAM, NVIDIA 2080 Super (8GB) with latest studio driver, 2TB Hynix P41 SSD, Windows 11 Pro 64 bit

Dell XPS 15 laptop (2017) 32GB RAM, NVIDIA 1050 (4GB) with latest studio driver, Intel i7-7700HQ with Intel 630 iGPU (driver 31.0.101.2115), dual internal SSD (256GB; 1TB), Windows 10 64 bit

Vegas 19.648
Vegas 20.270

VEGAS 4K "sample project" benchmark: https://forms.gle/ypyrrbUghEiaf2aC7
VEGAS Pro 20 "Ad" benchmark: https://forms.gle/eErJTR87K2bbJc4Q7

Todd-A0 wrote on 3/14/2023, 1:53 AM

This is the Whisper variant I'm currently using: https://github.com/Dadangdut33/Speech-Translate/releases/tag/1.1.0

It seems pretty good. In this example, it created subtitles for a 3-minute video using the large model in 1 minute (RTX 3080). It gets things almost perfect until about the 2-minute mark, where the timing begins to drift. I thought others interested in translation could use this as a barometer of sorts, and even download this video and compare it with the app version they're using. Russian has historically been difficult for Whisper to handle well, so if the Whisper version you're using does a better job, let us know.

It uses the GPU (possibly Nvidia only) and has no integration with any NLE.

Todd-A0 wrote on 3/27/2023, 11:31 PM

I tried the new version of StoryToolKit (Nvidia GPU only) by downloading the video from my last message and re-translating it. The top subs are the new translation. https://github.com/octimot/StoryToolkitAI/releases/tag/v0.17.16

It doesn't have the timing problems seen with Speech-Translate, but on the negative side its formatting isn't as good: instead of using multiple shorter sentences, it tends to form paragraphs. Neither option is perfect, but StoryToolKit in standalone mode is possibly the better choice for Vegas users; you just need to break up the subtitle paragraphs manually where required.
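Breaking up those paragraphs could be partly automated. This is a rough sketch, not part of StoryToolKit: the hypothetical `split_cue` helper splits one long cue into shorter ones at word boundaries and shares the cue's time out in proportion to each piece's length. That proportional timing is an assumption and only approximates when the words are actually spoken, and `max_chars=42` is just an adjustable default.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def split_cue(cue, max_chars=42):
    """Split one long cue into shorter cues at word boundaries.

    Each new cue gets a share of the original duration proportional
    to its character count (an approximation, not real word timing).
    """
    pieces, current = [], ""
    for word in cue.text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(current)  # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)

    total = sum(len(p) for p in pieces) or 1
    duration = cue.end - cue.start
    cues, t = [], cue.start
    for piece in pieces:
        span = duration * len(piece) / total
        cues.append(Cue(t, t + span, piece))
        t += span
    if cues:
        cues[-1].end = cue.end  # pin the last cue to the original end time
    return cues
```

The split pieces still need a read-through, since breaking on length alone can land mid-phrase where a human editor would break at punctuation instead.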

If your Whisper translator does a better job, please share.

wwaag wrote on 9/5/2023, 1:43 PM

I just wrote a new Batch WhisperAI Speech to Text tool and created a new thread for it. Here's the link: https://www.vegascreativesoftware.info/us/forum/happyotter-batchwhisperai-speech-to-text--142423/

AKA the HappyOtter at https://tools4vegas.com/. System 1: Intel i7-8700k with HD 630 graphics plus an Nvidia 1050ti graphics card. System 2: Intel i7-3770k with HD 4000 graphics plus an AMD RX550 graphics card. System 3: Laptop. Dell Inspiron Plus 16. Intel i7-11800H, Intel Graphics. Current cameras include Panasonic FZ2500, GoPro Hero11 and Hero8 Black plus a myriad of smartphone, pocket cameras, video cameras and film cameras going back to the original Nikon S.