Speech to Text via Whisper OpenAI

Comments

Former user wrote on 3/4/2023, 9:03 PM

Subtitle Edit is ready to go for these implementations, so whenever we get a fixed or functional new one, it can be added.

It's great software. If they make the change to GPU processing for Whisper, I hope it doesn't get the same problems as the GPU Whisper versions.

Maybe run it through RX 10 first to separate out the voice of the person you need, then use Whisper.

Great idea! 👍

RogerS wrote on 3/5/2023, 1:22 AM

For a non-Python Whisper that runs on CPU or GPU, you can grab it here: https://github.com/Purfview/whisper-standalone-win

It's not working in Subtitle Edit at the moment, but it works from the command prompt (run cmd as admin). It doesn't seem to produce repeated lines.

Save it somewhere, copy ffmpeg.exe into the folder with whisper.exe, and change the command prompt to that folder (for example, "cd C:\Whisper\"). Try this as a template (you can change the language, file location and model type):

whisper.exe --device cuda --language en --model "base" "C:\Videos\video name.mp4"
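If you would rather launch the same command from a C# Vegas script than type it into cmd, a minimal sketch could build the process like this. The flags are copied from the template above; the helper name WhisperLauncher.Create is hypothetical, and this has not been checked against every whisper-standalone-win release. Setting WorkingDirectory stands in for the "cd C:\Whisper\" step.

```csharp
using System.Diagnostics;
using System.IO;

public static class WhisperLauncher
{
    // Hypothetical helper: builds (but does not start) a process that runs
    // whisper.exe on one media file, mirroring the cmd template above.
    public static ProcessStartInfo Create(string whisperFolder, string mediaPath,
                                          string language, string model)
    {
        return new ProcessStartInfo
        {
            FileName = Path.Combine(whisperFolder, "whisper.exe"),
            // Same flags as the template; quotes keep "video name.mp4" one argument.
            Arguments = "--device cuda --language " + language +
                        " --model \"" + model + "\" \"" + mediaPath + "\"",
            WorkingDirectory = whisperFolder, // plays the role of "cd C:\Whisper\"
            UseShellExecute = false
        };
    }
}
```

Usage would be along the lines of Process.Start(WhisperLauncher.Create(@"C:\Whisper", @"C:\Videos\video name.mp4", "en", "base")).WaitForExit(); keeping the argument string in one place makes it easy to swap the language or model later.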

Custom PC (2022) Intel i5-13600K with UHD 770 iGPU with latest driver, MSI z690 Tomahawk motherboard, 64GB Corsair DDR5 5200 ram, NVIDIA 2080 Super (8GB) with latest studio driver, 2TB Hynix P41 SSD and 2TB Samsung 980 Pro cache drive, Windows 11 Pro 64 bit

Dell XPS 15 laptop (2017) 32GB ram, NVIDIA 1050 (4GB) with latest studio driver, Intel i7-7700HQ with Intel 630 iGPU (latest available driver), dual internal SSD (1TB; 1TB), Windows 10 64 bit

VEGAS Pro 19.651
VEGAS Pro 20.411
VEGAS Pro 21.208
VEGAS Pro 22.93

Try the benchmarks:
VEGAS 4K "sample project" benchmark (works with VP 16+): https://forms.gle/ypyrrbUghEiaf2aC7
VEGAS Pro 20 "Ad" benchmark (works with VP 20+): https://forms.gle/eErJTR87K2bbJc4Q7

Former user wrote on 3/14/2023, 1:53 AM

This is the Whisper variant I'm currently using: https://github.com/Dadangdut33/Speech-Translate/releases/tag/1.1.0

It seems pretty good: in this example it created subtitles for a 3-minute video using the large model in 1 minute (RTX 3080). It gets things almost perfect until about the 2-minute mark, where the timing starts to go off. I thought others interested in translation could use this as a barometer of sorts, and even download this video and compare it with the app version they're using. Russian has historically been difficult for Whisper to handle well, so if the Whisper version you're using does a better job, let us know.

This uses the GPU (possibly Nvidia only). It has no integration with any NLE.

Former user wrote on 3/27/2023, 11:31 PM

I tried the new version of StoryToolkit (Nvidia GPU only, https://github.com/octimot/StoryToolkitAI/releases/tag/v0.17.16) by downloading the video from my last message and re-translating it. The top subtitles are the new translation.

It doesn't have the timing problems seen with Speech-Translate, but on the negative side its formatting is not as good: instead of using multiple shorter sentences, it seems to like forming paragraphs. Neither option is perfect, but StoryToolkit in standalone mode is possibly the better choice for Vegas users; you just need to break up the subtitle paragraphs manually where required.

If your Whisper translator does a better job, please share.

wwaag wrote on 9/5/2023, 1:43 PM

I just wrote a new Batch WhisperAI Speech to Text tool and created a new thread for it. Here's the link: https://www.vegascreativesoftware.info/us/forum/happyotter-batchwhisperai-speech-to-text--142423/

AKA the HappyOtter at https://tools4vegas.com/. System 1: Intel i7-8700k with HD 630 graphics plus an Nvidia RTX4070 graphics card. System 2: Intel i7-3770k with HD 4000 graphics plus an AMD RX550 graphics card. System 3: Laptop. Dell Inspiron Plus 16. Intel i7-11800H, Intel Graphics. Current cameras include Panasonic FZ2500, GoPro Hero11 and Hero8 Black plus a myriad of smartphone, pocket cameras, video cameras and film cameras going back to the original Nikon S.

Vegas_Pro_Brasil wrote on 12/9/2023, 11:47 PM

@bitman

I was doing some tests with your script and noticed a few points that could be improved.

1. Transcription only works if the files are on the C:\ drive. If the files are on another drive (D:\, for example), transcription does not work.

2. The script does not work when file names contain spaces. For example, the file name "Video 01.mp4" must be renamed "Video01.mp4", "Video_01.mp4" or "Video-01.mp4" to be processed correctly.

3. After the transcription process finishes, an error occurs when the script tries to add the SRT file to the timeline as text events.

This error occurs because the script looks for a different file name than the one Whisper generates. For example, Whisper generates an SRT output file called "Video_01.srt", but the script looks for a file called "Video_01.mp4.srt" (the original .mp4 extension is kept in the file name).

See my screen recording to understand it better.

Could you please correct the script when you have time, or let me know what needs to be changed to fix this?

Thanks!

bitman wrote on 12/10/2023, 9:09 AM

@Vegas_Pro_Brasil I have adapted the script to support spaces in the audio file names. It is in version 3, which you can download from the first post of this thread. It is just a one-line change, so you could also adapt the v2 script yourself (around line 148):

sw.WriteLine("whisper " + myFile + modelOption); //temp remove for speed testing rest of APP

Wrap the file name in double quotes by adding + "\"" before and after it. That way a path containing spaces reaches whisper as one argument instead of being split at the first space:

sw.WriteLine("whisper " + "\"" + myFile + "\"" + modelOption); //temp remove for speed testing rest of APP

 

APPS: VIDEO: VP 365 (22 build 93, 21 - build 315), VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 16 HDpro XXL, Boris Continuum 2024, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 17, Spectral Layers Pro 10, Audacity, FOTO: Zoner, DXO, Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 23H2
  • CPU: i9-13900K (upgraded my former CPU i9-12900K), Air Cooler: Noctua NH-D15s
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

 

 

Vegas_Pro_Brasil wrote on 12/10/2023, 2:51 PM

Thanks @bitman. Item 2 is solved. Could you look at items 1 and 3 when you have time?

1. Transcription only works if the files are on the C:\ drive. If the files are on another drive (D:\, for example), transcription does not work.

3. After the transcription process finishes, an error occurs when the script tries to add the SRT file to the timeline as text events.

This error occurs because the script looks for a different file name than the one Whisper generates. For example, Whisper generates an SRT output file called "Video_01.srt", but the script looks for a file called "Video_01.mp4.srt" (the original .mp4 extension is kept in the file name).

See my screen recording to understand better.

bitman wrote on 12/11/2023, 7:30 AM

@Vegas_Pro_Brasil 

Version 4 should fix your issues! See the first post of this thread.

Latest Update 11/12/2023:

I made a small update (but also a big improvement and bug fix for some users, @Joelson) to support speech to text when the audio media is not on the same drive as the Vegas project.

By the way, speech to text with the media and the Vegas project together on one drive other than C: already worked in the previous versions (I tested this, hence some confusion), but apparently not when the Vegas project itself was on a different drive than the media...
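The thread doesn't say what v4 changed internally. One common pitfall with generated batch files, offered here only as an assumption about this kind of drive-dependent bug, is that cmd's cd command does not switch the current drive unless it is given the /d switch, so "cd D:\Media" issued from C: leaves the batch file running on C:. That matters if the script relies on the openai-whisper CLI, whose output directory defaults to the current directory. A minimal sketch of writing the batch lines with that in mind (BatchWriter.WriteWhisperBatch is a hypothetical name, not part of bitman's script):

```csharp
using System.IO;

public static class BatchWriter
{
    // Hypothetical helper: writes two batch lines that switch to the media
    // folder (even on another drive) and run whisper on the quoted file path.
    public static void WriteWhisperBatch(TextWriter sw, string mediaFolder,
                                         string mediaPath, string modelOption)
    {
        // Without "/d", "cd D:\Media" issued from C: does not change the drive.
        sw.WriteLine("cd /d \"" + mediaFolder + "\"");
        // Quotes keep a path such as "D:\Media\Video 01.mp4" as one argument.
        sw.WriteLine("whisper \"" + mediaPath + "\"" + modelOption);
    }
}
```

In a script this would be called with the StreamWriter that already writes the batch file, e.g. BatchWriter.WriteWhisperBatch(sw, @"D:\Media", @"D:\Media\Video 01.mp4", " --model base").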

Last changed by bitman on 12/11/2023, 7:31 AM, changed a total of 1 times.


Vegas_Pro_Brasil wrote on 12/11/2023, 12:54 PM

@bitman Thank you for this fix as well.

Now only item 3 remains.

3. After the transcription process finishes, an error occurs when the script tries to add the SRT file to the timeline as text events.

This error occurs because the script looks for a different file name than the one Whisper generates. For example, Whisper generates an SRT output file called "Video_01.srt", but the script looks for a file called "Video_01.mp4.srt" (the original .mp4 extension is kept in the file name).

See my screen recording to understand better.

It is strange that, to insert the SRT file as text events on the timeline, the script looks for a file name other than the one Whisper generated. This is why the error occurs:

Whisper generates > File name + SRT Extension
The script searches > File name + File extension + SRT Extension

To fix this, the script needs to look for the correct file generated by Whisper. In this case: File name + SRT Extension.

I've spent two days trying to find a solution for this, without success. 😂😂😂

bitman wrote on 12/11/2023, 2:29 PM

@Vegas_Pro_Brasil Strange, but file name + file type extension + SRT extension is the naming the script is designed for. On my PC, Whisper generates exactly that name, the script uses it, and it just works...

Last changed by bitman on 12/11/2023, 2:29 PM, changed a total of 1 times.


Vegas_Pro_Brasil wrote on 12/11/2023, 2:57 PM

@bitman

It really is strange. See my screen recording: the file generated by Whisper does not include the original file's extension. It's just the original file name + the .srt extension.

The script's message shows that it is looking for file name + file extension + .srt extension.

I never changed the Whisper settings. Maybe it's a language problem.

Do you know how I can change the script so it works for me in this situation? I've tried ChatGPT, GitHub, Stack Overflow and almost everywhere else on the internet, and I haven't figured out how to modify the script to make it work. 😂😂😂

bitman wrote on 12/11/2023, 3:19 PM

I will look for a specific solution for you tomorrow if possible; it is getting late in Belgium!


Vegas_Pro_Brasil wrote on 12/11/2023, 7:39 PM

Thanks @bitman

I'll be waiting anxiously, hoping there's a solution. Your script is very good and very well written. It shows each step in detail, and I'm learning a lot by reading its code.

bitman wrote on 12/12/2023, 7:47 AM

@Vegas_Pro_Brasil I have version 5 of the script ready. It should work on both our machines:

  • for those that whisper saves filename + media type extension + .srt
  • for those that whisper saves filename + .srt (without media type extension .wav .mp4 etc...)

I'm not sure why Whisper behaves differently; maybe you have another version, or a different installation of everything needed to make Whisper work.

Anyway, the solution was to copy the .srt file named without the media extension to an .srt file named with the media extension, so the rest of the script keeps working (this is done only when that file does not already exist, by stripping .srt and reconstructing the full path + media type + .srt).
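The v5 fallback described above can be sketched roughly as follows. This is not bitman's actual code, and SrtLocator.EnsureSrtWithMediaExtension is a hypothetical name. One plausible (unverified) reason for the two naming styles is that different Whisper releases name their output differently; the sketch simply handles both. It prefers "Video_01.mp4.srt" and, only when that file is missing but "Video_01.srt" exists, copies the latter so the rest of the script can keep using the media-extension name.

```csharp
using System.IO;

public static class SrtLocator
{
    // Returns the SRT path the rest of the script expects (media name + ".srt"),
    // creating it from the "name without extension + .srt" output if needed.
    // Returns null when neither file exists.
    public static string EnsureSrtWithMediaExtension(string mediaPath)
    {
        string expected = mediaPath + ".srt";                         // Video_01.mp4.srt
        if (File.Exists(expected))
            return expected;

        string alternative = Path.ChangeExtension(mediaPath, ".srt"); // Video_01.srt
        if (File.Exists(alternative))
        {
            File.Copy(alternative, expected); // keep Whisper's file, add a copy
            return expected;
        }
        return null;
    }
}
```

Copying rather than renaming leaves Whisper's own output untouched, which matches the "only if the file did not exist" behavior bitman describes.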

Last changed by bitman on 12/12/2023, 7:57 AM, changed a total of 1 times.


Vegas_Pro_Brasil wrote on 12/12/2023, 8:42 AM

@bitman Uhuuuuu!

It works perfectly now!!! You're one of those people who get it right the first time. Congratulations! It looks great!

I have a small question: how do I disable the option where long subtitles are automatically split onto a new line after 9 words?

I know how to configure Whisper to use its native options --max_line_width and --max_line_count, so I won't need this option.

bitman wrote on 12/12/2023, 9:08 AM

@Vegas_Pro_Brasil You owe me a beer!

Around line 649 in the v5 script, if you open it with the free Notepad++, you will see the following:

            if (spaces == 9)  //seems optimal for ENGLISH

You can replace 9 with a higher number; this allows more spaces in a line of text (a crude method used to measure sentence length) before a newline is issued.
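To see why raising the number gives longer lines, here is a rough, independently written sketch of the "count spaces, break at N" idea. It is not the v5 code, and WordWrap.BreakAfterSpaces is a hypothetical name; it assumes words are separated by single spaces.

```csharp
using System.Text;

public static class WordWrap
{
    // Replaces every maxSpaces-th space with a newline, so each line holds
    // maxSpaces words (assuming single spaces). Existing newlines reset the count.
    public static string BreakAfterSpaces(string text, int maxSpaces)
    {
        var sb = new StringBuilder(text.Length);
        int spaces = 0;
        foreach (char c in text)
        {
            if (c == '\n')
            {
                spaces = 0;
                sb.Append(c);
            }
            else if (c == ' ')
            {
                spaces++;
                if (spaces == maxSpaces) // same idea as "if (spaces == 9)"
                {
                    sb.Append('\n');
                    spaces = 0;
                }
                else
                {
                    sb.Append(c);
                }
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, BreakAfterSpaces("a b c d e", 2) yields "a b", "c d" and "e" on separate lines; with a large maxSpaces the text passes through unchanged, which is effectively what you want when Whisper's own line-length options already do the wrapping.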

Last changed by bitman on 12/12/2023, 9:09 AM, changed a total of 1 times.


Vegas_Pro_Brasil wrote on 12/12/2023, 8:44 PM

@bitman See the screenshot below.

Track 01 contains text events created with the native Vegas "Import Subtitles from File" option (Vegas uses the original lengths from the SRT file).

Track 02 contains text events created with the script (apparently, in some situations the script changes the length of the text events created from the SRT file).

How do I make the script create text events with the original lengths from the SRT file, without modifications, so that each one has exactly the same duration as in the SRT file? Is it possible?

jetdv wrote on 12/12/2023, 9:26 PM

@Vegas_Pro_Brasil, one thing you probably need to do is make sure your timeline timecode format matches the SRT file (i.e. Time). See if that makes a difference. If it does, the script can switch to that format and then change back to the current format at the end, as we did with the other scripts you were working with.

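public RulerFormat OrgRulerFormat;                    // class-level field

// at the start, before the SRT entries are turned into text events:
OrgRulerFormat = myVegas.Project.Ruler.Format;       // remember the current format
myVegas.Project.Ruler.Format = RulerFormat.Time;     // SRT times are clock times
myVegas.UpdateUI();

// at the end of the script:
myVegas.Project.Ruler.Format = OrgRulerFormat;       // restore the original format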

 

Vegas_Pro_Brasil wrote on 12/13/2023, 5:41 AM

@jetdv

The SRT file is correct. The change only occurs when the script imports it as text events; if the same SRT file is imported as text events directly by Vegas, everything is normal. I think this happens because, when @bitman wrote the script, the durations generated by Whisper were very inconsistent and he tried to correct this as well as possible, but with today's updates this practically no longer happens.

I just need to know how to configure the script to import the SRT file as text events with the original times. You know I'm kind of dumb with scripts and I get lost without the right guidance.

bitman wrote on 12/13/2023, 8:21 AM

@Vegas_Pro_Brasil I have added a new script, "Whisper STT RAW v1" (see the first post). It is basically the same as the v5 script, but it omits the word-wrap optimization after 9 words and so keeps the original SRT layout.


Vegas_Pro_Brasil wrote on 12/13/2023, 9:37 AM

Hi @bitman

I tested the Whisper STT RAW script.

I did a little test with Whisper's new word-level feature, and this was the result. For some reason the script still imports the SRT file into the timeline with times that differ from the original. See my screen recording below.

I'm sending the video I used in the test and also the generated files for you to check.

https://drive.google.com/file/d/1ocMZRCJ-kK85aVS2K_Z-Es6TxfiGWLaB/view?usp=sharing

Vegas_Pro_Brasil wrote on 12/13/2023, 7:44 PM

@bitman

I have great news.

The problem was caused by the time format of the timeline. When I changed the timeline format to "Time" the problem was resolved.

Is it possible to modify the script so that it is not necessary to change the timeline format to Time?

By the way, the Whisper STT RAW script is not really necessary, because this works well in Whisper Speech To Text v5.
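The thread settles this by switching the ruler to Time (see the next reply). If you would rather not touch the ruler at all, one option, assuming the script currently converts SRT timestamps through a text form whose interpretation depends on the ruler format (untested here), is to parse each timestamp into milliseconds yourself and build the Vegas timecode from that number, for example with Timecode.FromMilliseconds. The parsing half needs no Vegas API; SrtTime is a hypothetical helper:

```csharp
using System;
using System.Globalization;

public static class SrtTime
{
    // Parses an SRT timing line such as "00:01:02,500 --> 00:01:04,250"
    // into start/end offsets, independent of any timeline ruler format.
    public static Tuple<TimeSpan, TimeSpan> ParseRange(string line)
    {
        string[] parts = line.Split(new[] { "-->" }, StringSplitOptions.None);
        if (parts.Length != 2)
            throw new FormatException("Not an SRT timing line: " + line);
        return Tuple.Create(ParseStamp(parts[0]), ParseStamp(parts[1]));
    }

    // SRT stamps are hh:mm:ss,fff (comma before the milliseconds).
    public static TimeSpan ParseStamp(string stamp)
    {
        return TimeSpan.ParseExact(stamp.Trim(), @"hh\:mm\:ss\,fff",
                                   CultureInfo.InvariantCulture);
    }
}
```

Tuple is used instead of C# 7 value tuples to stay friendly to older .NET Framework script hosts; the resulting TotalMilliseconds values are what would be handed to the timecode constructor.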

jetdv wrote on 12/13/2023, 9:17 PM

Here are the changes needed to switch it to "Time" and then back to whatever it was:

https://www.vegascreativesoftware.info/us/forum/speech-to-text-via-whisper-openai--137928/?page=3#ca900398