Speech to Text via Whisper openAI

Comments

bitman wrote on 12/14/2023, 4:39 AM

@Vegas_Pro_Brasil @jetdv Good find, I never realized that one of your issues was related to a specific user setting of the timeline timecode. Mine was set to yet another variant, "Time & Frames", which however behaved pretty much like "Time" for the scripts. To avoid issues and to improve the script, I have added @jetdv's code to set the timeline timecode to "Time" when adding subtitles and to restore the user preference after the insert.

See the start of this post for script version v6 (and the RAW variant v2). I have removed the links to the older scripts to clean up a bit.

APPS: VIDEO: VP 365 (22 build 93, 21 - build 315), VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 16 HDpro XXL, Boris Continuum 2024, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 17, Spectral Layers Pro 10, Audacity, FOTO: Zoner, DXO, Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 23H2
  • CPU: i9-13900K (upgraded my former CPU i9-12900K), Air Cooler: Noctua NH-D15s
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

 

 

Vegas_Pro_Brasil wrote on 12/14/2023, 8:19 AM

@bitman

I think now I owe you a case of beer and another case to @jetdv. Everything working as expected now. Thank you very much to you and Edward.

 

zzzzzz9125 wrote on 12/14/2023, 10:28 AM

Hey, I've found two problems.

1. In Vegas Pro 16 and before, the script can't generate Titles & Text events properly, because the plugin's GUID changed as of version 17. Just change {Svfx:com.vegascreativesoftware:titlesandtext} in your script to {Svfx:com.sonycreativesoftware:titlesandtext}, so that it can be used in 16 and before, without affecting the functionality in newer versions.

2. When I click the Balanced button, it can generate the .srt file (with other files) normally, but when I click Draft(fast), nothing is generated in the folder. I don't know what's going on.

Last changed by zzzzzz9125 on 12/14/2023, 10:28 AM, changed a total of 1 times.

Using VEGAS Pro 22 build 93 & VEGAS Pro 21 build 208.

Information about my PC:
Brand Name: HP VICTUS Laptop
System: Windows 11.0 (64-bit) 10.00.22631
CPU: 12th Gen Intel(R) Core(TM) i7-12700H
GPU: NVIDIA GeForce RTX 3050 Laptop GPU
GPU Driver: NVIDIA Studio Driver 560.70

bitman wrote on 12/14/2023, 11:56 AM

@zzzzzz9125 Thanks, useful information for those who are on Vegas Pro 16 (or even older versions). Feel free to change the script for yourselves, but I am not going to do it, for the simple reason that I can no longer test it on version 16. I upgrade every year and keep at most 1 or 2 prior versions after the update; the oldest one usually gets the axe when I upgrade!

jetdv wrote on 12/15/2023, 9:14 AM

@bitman, for my scripts that need to work in "14 and newer", I did this:

            if (myVegas.Version.Contains("14") || myVegas.Version.Contains("15") || myVegas.Version.Contains("16"))
            {
                genUID = "{Svfx:com.sonycreativesoftware:titlesandtext}"; //Sony Titles & Text
            }
            else
            {
                genUID = "{Svfx:com.vegascreativesoftware:titlesandtext}"; //Magix Titles & Text
            }

 

bitman wrote on 12/18/2023, 2:51 AM

@jetdv Thanks, I added the code in v7

Update 18/12/2023: "Whisper Speech To Text v7"

  • Added backward compatibility for scripting in Vegas 14, 15 and 16 (the old "Sony" Vegas plugin naming). Note: only tested in Vegas 21, not in 14, 15 or 16.

@zzzzzz9125 here is version v7 for you to try; I could not test it on these older non-Magix Vegas versions!

Vegas_Pro_Brasil wrote on 12/18/2023, 6:49 AM

@bitman @jetdv @zzzzzz9125

I tested it on Vegas Pro 14 that I have installed on my system and everything works correctly.

I have been patiently studying the code of this script and realized that it is very well designed.

Bitman took care to make it compatible with versions of Vegas where it is not possible to import srt files natively as text events, and this is a very good thing, as it shows respect for users who have older versions.

To my surprise, I managed to adapt this script to work with a Whisper variant called Whisper-Faster which is about 5 times faster than the standard Whisper and everything is working correctly.

I created a Whisper-Faster Windows installer and adapted the script created by Bitman, as well as other scripts and presets, to work in Vegas. Below is a small demonstration of how it works. There are still some small improvements left to make, but it is almost finished.

 

bvideo wrote on 12/19/2023, 3:18 PM

Fantastic!

pierre-k wrote on 12/20/2023, 4:36 AM

I know that if I want to export subtitles from Vegas to srt or sub, I have to use Vegasaur.

Has anything changed for the better over the years? How do you export these subtitles?

Vegas_Pro_Brasil wrote on 1/15/2024, 3:32 AM

Hi @bitman

I was testing the script and, for some reason I don't know, the transcription process just stops on its own after a few minutes. For example, a 15-minute video always stops at 23%-24%. Have you ever faced this problem? What can I do to resolve it?

bitman wrote on 1/15/2024, 9:22 AM

@Vegas_Pro_Brasil I do not really use speech to text (I do use a lot of text to speech). The Whisper thing was just a fun project I wanted to try out, so I do not recall trying any audio longer than a few minutes. So it may have always been an issue, I am not sure. Anyway, I think it is always safer to cut stuff (video or audio) into smaller chunks, which are more manageable and easier on your system to process.

On the other hand, there may be some internal "safety" timing in the Vegas scripting engine that I do not know of, which may time out if the script engine is idling too long whilst Whisper is busy processing. If that is the case, the script may need extra code to keep itself busy!

Maybe @jetdv can shed some light on this (whether scripts can time out when idling).

Last changed by bitman on 1/15/2024, 9:23 AM, changed a total of 1 times.

jetdv wrote on 1/15/2024, 10:41 AM

The script itself should have no issue. However, newer versions of Vegas include something that checks for a lack of response so that Vegas can shut down properly if it gets stuck, and that might be coming into play depending on the version used.

Kilo wrote on 5/17/2024, 9:48 PM

Hello! I'm hoping you can do me a huge favor. I followed your tutorial on how to do auto subtitles in Vegas Pro without the 365 service, and after many hours of trial and error (my fault) I got it to work. Now I am seeing another problem which I never expected, unfortunately! I make vertical videos, and the subtitles I create are only 1-3 words long, not entire sentences like Whisper AI outputs. I've seen some people online suggesting changing the source code, but I'm not sure I can do that since I downloaded it from the console like your tutorial showed. Do you have any suggestions on how I should fix this problem? Thanks a lot for any help you can give me. I know your time's valuable, so I truly appreciate it.

bitman wrote on 5/18/2024, 2:43 AM

@Kilo if you take the latest script (Whisper Speech To Text v7) and open it with Notepad++, then go to line 669, there is a statement that checks for spaces between words to determine the number of words before a line break. It is currently at value 9, meaning 9 words, which seems optimal for English in normal horizontal aspect. For vertical videos you can perhaps lower this value to 2 or 3... Feel free to experiment!
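
The word-count line break described above can be illustrated with a minimal Python sketch. This is not the script's actual code at line 669 (the script is C#); the function name `break_every_n_words` and its parameters are hypothetical. It only shows the idea of inserting a line break after every N words, which changes the layout inside one subtitle but does not create additional subtitle events.

```python
def break_every_n_words(text: str, max_words: int = 9) -> str:
    """Insert a line break after every `max_words` words (hypothetical helper)."""
    words = text.split()
    # Group the words into consecutive runs of at most max_words each.
    lines = [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
    # All lines still belong to the same subtitle; only the layout changes.
    return "\n".join(lines)


if __name__ == "__main__":
    print(break_every_n_words("this is a short example sentence", max_words=2))
```

With a small value such as 2 or 3, the same subtitle text simply becomes taller, with the words stacked on more lines.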

Kilo wrote on 5/18/2024, 3:59 PM

Thank you so much bitman, I know you don't have to help, but just know this is a tremendous time saver for me and my business. Unfortunately I cannot get it to work. When I change line 669 of the code, it seems to make the words go below each other

kind of

like this

but I was hoping that I would have 2-3 words per text media generator in Vegas Pro. I'm not sure if this is possible, but if you have any suggestions I would be forever grateful! Thanks again!

Kilo wrote on 5/20/2024, 3:00 PM

I've spent days trying to figure this out and have made almost no progress. I need to find a way to limit the number of words to 2-3 per Vegas Pro text section! If anyone can help in any way or even just suggest a possible solution, I would do ANYTHING! I'LL FOR REAL PAY SOMEONE! Pls help <3

bitman wrote on 5/21/2024, 1:07 PM

@Kilo To limit the number of words without newline breaks, the Whisper-generated .SRT content needs to be split (including timestamps) before import. This can be achieved by extra new code in the script, or separately by an external application.

This could be a nice project, but I do not have the time to do it now. It is more like a winter project!

If you take the latest script (Whisper Speech To Text v7) and open it with Notepad++, then go to line 295, you can change the font and lower the font size so you can cram more words onto your vertical videos... Not really a solution, but I suppose all small bits help!
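
As an illustration of the "external application" route mentioned above, here is a minimal Python sketch that splits each subtitle of an .srt file into pieces of at most a few words and divides the original time span between them. It is only a sketch under assumptions: the names `split_srt`, `parse_time` and `format_time` are hypothetical, the input is assumed to be a plain .srt (index line, `start --> end` line, text lines, blank line), and the time is spread evenly per word, which is only an approximation of when each word is actually spoken.

```python
import re
from datetime import timedelta

# Matches an SRT timestamp such as 00:01:02,345
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_time(stamp: str) -> timedelta:
    """Convert 'HH:MM:SS,mmm' into a timedelta."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def format_time(t: timedelta) -> str:
    """Convert a timedelta back into 'HH:MM:SS,mmm'."""
    total_ms = round(t.total_seconds() * 1000)
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def split_srt(srt_text: str, max_words: int = 3) -> str:
    """Split every subtitle into chunks of at most `max_words` words."""
    pieces = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a subtitle block
        start_text, end_text = lines[1].split("-->")
        start, end = parse_time(start_text), parse_time(end_text)
        words = " ".join(lines[2:]).split()
        if not words:
            continue
        # Assumption: every word takes an equal share of the subtitle's duration.
        per_word = (end - start) / len(words)
        done = 0
        for i in range(0, len(words), max_words):
            chunk = words[i:i + max_words]
            chunk_start = start + per_word * done
            done += len(chunk)
            chunk_end = start + per_word * done
            pieces.append((chunk_start, chunk_end, " ".join(chunk)))
    # Renumber the subtitles from 1 and rebuild the file content.
    blocks = [
        f"{n}\n{format_time(a)} --> {format_time(b)}\n{text}"
        for n, (a, b, text) in enumerate(pieces, 1)
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:04,000\none two three four\n"
    print(split_srt(sample, max_words=2))
```

The resulting text could be written to a new .srt file and imported in place of the original, so that each short chunk becomes its own subtitle.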

Kilo wrote on 5/22/2024, 1:42 AM

After a while of messing around, I found that if you add

sw.WriteLine("whisper " + "\"" + myFile + "\"" + " --word_timestamps True --max_line_width 16 --max_line_count 1 " + modelOption);

on line 162, instead of what is currently on line 162 in v7, it will create .srt files that have sections of no more than 16 characters (or whatever you change that number to).

Thank you to bitman again. If you ever do continue this project, I hope this helps you slightly. I may continue to improve this script to add a few other easy customizations in the future, and I will post the results here if I have any positive ones.
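
For anyone calling Whisper outside of the Vegas script, the same command line can be sketched in Python. This is only an illustration: the helper name `build_whisper_args` and its parameters are hypothetical, and the optional `model` argument merely stands in for whatever `modelOption` contains in the script. The flags themselves are the ones used in the line above; in the Whisper command-line tool, `--max_line_width` and `--max_line_count` only take effect together with `--word_timestamps True`.

```python
def build_whisper_args(media_path, max_line_width=16, max_line_count=1, model=None):
    """Build the argument list for the whisper command line (hypothetical helper)."""
    args = [
        "whisper", media_path,
        "--word_timestamps", "True",          # needed for the two options below
        "--max_line_width", str(max_line_width),
        "--max_line_count", str(max_line_count),
    ]
    if model:
        # Assumption: stands in for the script's modelOption value.
        args += ["--model", model]
    return args


if __name__ == "__main__":
    # Passing a list (for example to subprocess.run(args, check=True)) avoids
    # manual quoting of file names that contain spaces.
    print(build_whisper_args("my clip.wav", max_line_width=16))
```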

bitman wrote on 5/22/2024, 1:02 PM

@Kilo Thanks for the tip, I suppose vertical videos as a final delivery are getting more popular for smartphone end use!

As a side note, it is a PIA to camouflage and integrate vertical video into 'normal' horizontal FHD aspect in a more or less visually satisfying fashion.

bitman wrote on 6/28/2024, 8:11 AM

Some newer Vegas 21 builds (e.g. build 315) may have issues with my script v7. I have a new version available, v8, which checks the Vegas main version only rather than main version + build number; otherwise it gets confused with Sony :)

Here is the link to v8:

https://www.dropbox.com/scl/fi/voojqanq3cz0a7le31e2r/Whisper-Speech-To-Text-v8.cs?rlkey=xeq7v2zlqk8ir0aq154ewv0sw&dl=0
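
The difference between checking the whole version text and checking only the main version can be sketched as follows. This is Python rather than the script's C#, purely as an illustration of the logic, and it is not the code used in v8. The version text format "Version 21.0 (Build 315)" is an assumption, and the helper names are hypothetical. With that format, a plain substring test for "15" also matches build 315, which is presumably how a Vegas 21 build could be mistaken for a Sony-era version.

```python
import re

SONY_UID = "{Svfx:com.sonycreativesoftware:titlesandtext}"    # Vegas 14, 15, 16
MAGIX_UID = "{Svfx:com.vegascreativesoftware:titlesandtext}"  # newer versions


def major_version(version_text: str) -> int:
    """Return the first number in the version text (assumed to be the main version)."""
    match = re.search(r"\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group())


def titles_and_text_uid(version_text: str) -> str:
    """Pick the Titles & Text UID from the main version only."""
    return SONY_UID if major_version(version_text) in {14, 15, 16} else MAGIX_UID


if __name__ == "__main__":
    text = "Version 21.0 (Build 315)"  # assumed format, for illustration only
    print("15" in text)                # the substring test also matches the build number
    print(titles_and_text_uid(text))
```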
