Speech to Text via OpenAI Whisper

Comments

bitman wrote on 5/18/2024, 2:43 AM

@Kilo If you take the latest script (Whisper Speech To Text v7), open it in Notepad++ and go to line 669. There is a statement there that counts the spaces between words to decide how many words go on a line before a line break. It is currently set to 9 (i.e. 9 words), which seems optimal for English in a normal horizontal (landscape) video. For vertical videos you could try lowering this value to 2 or 3. Feel free to experiment!
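The script's own C# code at line 669 is not quoted in the thread, so the following is only a minimal Python sketch of the idea (the function name `wrap_by_word_count` is hypothetical): split the subtitle text on spaces and start a new line after every N words.

```python
def wrap_by_word_count(text: str, words_per_line: int = 9) -> str:
    """Insert a line break after every `words_per_line` words.

    Hypothetical helper: words are found by splitting on whitespace,
    which mirrors the idea of counting the spaces between words.
    """
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    words = text.split()
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "this is a short example of a subtitle line for a vertical video"
    # Prints five lines: three words per line, last line "video"
    print(wrap_by_word_count(sample, words_per_line=3))
```

Note that this only adds line breaks inside one subtitle; it does not change how many words each subtitle (text event) contains.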

APPS: VIDEO: VP 365 suite (VP 22 build 250) VP 21 build 315, VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 17 HDpro XXL, Boris Continuum 2025, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 18, Spectral Layers Pro 10, Audacity, FOTO: Zoner studio X, DXO photolab (8), Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 24H2 (since October 2024)
  • CPU: i9-13900K with Air Cooler: Noctua NH-D15 G2 HBC
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: Gigabyte GeForce RTX 5090 Aorus Xtreme WF AIO 32GB
  • Monitor: LG UltraGear 45GX950A 44.5" WUHD 5K2K OLED monitor (21:9), Resolution: 5120x2160, 165 Hz
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

Kilo wrote on 5/18/2024, 3:59 PM

Thank you so much, bitman. I know you don't have to help, but this is a tremendous time saver for me and my business. Unfortunately I cannot get it to work. When I change line 669 of the code, it seems to make the words go below each other

kind of

like this

but I was hoping to get 2-3 words per text media generator in Vegas Pro. I'm not sure if this is possible, but if you have any suggestions I would be forever grateful! Thanks again!

Kilo wrote on 5/20/2024, 3:00 PM

I've spent days trying to figure this out and have made almost no progress. I need to find a way to cut the number of words down to 2-3 per Vegas Pro text section! If anyone can help in any way, or even just suggest a possible solution, I would do ANYTHING! I'LL PAY SOMEONE, FOR REAL! Please help <3

bitman wrote on 5/21/2024, 1:07 PM

@Kilo To limit the number of words per subtitle without adding line breaks, the contents of the Whisper-generated .SRT file need to be split (including the timestamps) before import. This can be done with extra code in the script, or separately with an external application.

This could be a nice project, but I do not have the time to do it now. It is more like a winter project!
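As a rough sketch of the external-application route, the Python script below reads a Whisper .srt file and splits every subtitle into chunks of at most N words, writing new timestamps and renumbering the cues. It is not part of the Vegas script, and its names (`split_srt`, `split_cue`, `Cue`) are hypothetical. Because a plain .srt has no per-word timing, it divides each subtitle's time span between the chunks in proportion to their character count; this is an approximation, so chunk boundaries may drift slightly from the actual speech.

```python
import re
import sys
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def parse_time(ts: str) -> int:
    """Convert an SRT timestamp to milliseconds."""
    m = TIME_RE.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {ts!r}")
    h, mnt, s, ms = map(int, m.groups())
    return ((h * 60 + mnt) * 60 + s) * 1000 + ms


def format_time(ms: int) -> str:
    """Convert milliseconds back to an SRT timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; multi-line subtitle text is joined with spaces."""
    cues = []
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split("-->", 1)
        cues.append(Cue(parse_time(start), parse_time(end), " ".join(lines[2:])))
    return cues


def split_cue(cue: Cue, max_words: int) -> list[Cue]:
    """Split one cue into chunks of at most max_words, sharing its duration
    in proportion to each chunk's character count (an approximation)."""
    words = cue.text.split()
    if len(words) <= max_words:
        return [cue]
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    total_chars = sum(len(w) for w in words)
    duration = cue.end_ms - cue.start_ms
    out, start, used = [], cue.start_ms, 0
    for chunk in chunks:
        used += sum(len(w) for w in chunk)
        end = cue.start_ms + round(duration * used / total_chars)
        out.append(Cue(start, end, " ".join(chunk)))
        start = end  # next chunk starts where this one ends
    return out


def split_srt(content: str, max_words: int = 3) -> str:
    """Split every cue into chunks of at most max_words and renumber them."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    cues = [part for cue in parse_srt(content) for part in split_cue(cue, max_words)]
    return "\n".join(
        f"{i}\n{format_time(c.start_ms)} --> {format_time(c.end_ms)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    )


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python split_srt.py input.srt output.srt 3
    with open(sys.argv[1], encoding="utf-8-sig") as f:
        result = split_srt(f.read(), int(sys.argv[3]))
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        f.write(result)
```

For example, `python split_srt.py input.srt output.srt 3` would write a copy of input.srt with at most three words per subtitle.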

If you open the latest script (Whisper Speech To Text v7) in Notepad++ and go to line 295, you can change the font and lower the font size to fit more words on your vertical videos. Not really a solution, but I suppose every little bit helps!

Kilo wrote on 5/22/2024, 1:42 AM

After a while of messing around, I found that if you replace what is currently on line 162 in v7 with

sw.WriteLine("whisper " + "\"" + myFile + "\"" + " --word_timestamps True --max_line_width 16 --max_line_count 1 " + modelOption);

it will create .srt files with sections of no more than 16 characters (or whatever you change that number to).
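For anyone calling Whisper outside the Vegas script, the same options can be passed from Python without the hand-built `\"` quoting. This is a minimal sketch, assuming the openai-whisper command-line tool is on PATH; the function name `build_whisper_args` is hypothetical, and the model name `medium` is only a stand-in for whatever the script's `modelOption` supplies.

```python
def build_whisper_args(media_file: str, max_line_width: int = 16,
                       max_line_count: int = 1, model: str = "medium") -> list[str]:
    """Build the whisper CLI arguments from the modified line 162.

    The model name is an assumption standing in for the script's modelOption.
    """
    return [
        "whisper", media_file,
        "--word_timestamps", "True",  # the two line options below require this
        "--max_line_width", str(max_line_width),  # max characters per line
        "--max_line_count", str(max_line_count),  # max lines per subtitle
        "--model", model,
    ]


if __name__ == "__main__":
    args = build_whisper_args(r"C:\Videos\my clip.mp4", max_line_width=16)
    print(args)
    # To actually run it (requires openai-whisper to be installed):
    # import subprocess; subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` keeps a file name with spaces as one argument, which is what the escaped quotes around `myFile` achieve in the C# string. Depending on the installed version, `whisper --help` may also list a `--max_words_per_line` option that limits words rather than characters; check that your version has it before relying on it.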

Thank you again, bitman. If you ever do continue this project, I hope this helps a little. I may keep improving this script with a few other easy customizations, and I will post the results here if I have any positive ones.

bitman wrote on 5/22/2024, 1:02 PM

@Kilo Thanks for the tip, I suppose vertical videos as a final delivery are getting more popular for smartphone end use!

As a side note, it is a PIA to camouflage and integrate vertical video into 'normal' horizontal FHD aspect in a more or less visually satisfying fashion.

bitman wrote on 6/28/2024, 8:11 AM

Some new Vegas 21 builds (e.g. build 315) may have issues with my script v7. I have a new version, v8, which checks only the Vegas main version rather than the main version + build number; otherwise it gets confused with Sony :)

Here is the link to v8:

https://www.dropbox.com/scl/fi/voojqanq3cz0a7le31e2r/Whisper-Speech-To-Text-v8.cs?rlkey=xeq7v2zlqk8ir0aq154ewv0sw&dl=0
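As a rough illustration of checking only the main version, here is a minimal Python sketch. The v8 script's actual C# check is not shown in the thread; the version-string format (e.g. "21.0 (Build 315)") and the set of supported main versions are assumptions, and both function names are hypothetical.

```python
import re


def major_version(version_text: str) -> int:
    """Return the first number in a version string, taken as the main version.

    The input format is an assumption, e.g. "21.0 (Build 315)".
    """
    m = re.search(r"\d+", version_text)
    if m is None:
        raise ValueError(f"no version number in {version_text!r}")
    return int(m.group())


def is_supported(version_text: str, supported_majors: set[int]) -> bool:
    """Check only the main version, so a new build of a known version still passes."""
    return major_version(version_text) in supported_majors


if __name__ == "__main__":
    known = {21, 22}  # hypothetical list of supported main versions
    print(is_supported("21.0 (Build 315)", known))  # True
    print(is_supported("21.0 (Build 300)", known))  # True: build number is ignored
```

Comparing only the leading number means a new build such as 315 does not need its own entry, whereas matching the full "version + build" string would reject every build the script has not seen before.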
