Speech to Text via Whisper openAI

bitman wrote on 11/11/2022, 9:40 AM

Some good news: I have been diving into an alternative to the Vegas Pro "Speech to Text" feature. Although it is a fine feature and easy to use, it has some drawbacks. First of all, it is a Vegas 365-only feature (I hope this may change), so people without a subscription cannot use it. Like all AI-based tools, it depends on the model, and results may vary. It also lacks a way to tune for quality and language, and I suspect it favors English.

So here is an alternative: Whisper from OpenAI.

I have created a simple Vegas script that calls Whisper to convert speech to text. Just place the cursor over an event on the timeline and the script will create result files containing the text. In a future version I may extend this to create subtitles on the timeline from these result files. Feel free to add this yourself, or to add more of Whisper's capabilities such as quality, language and translation options. Refer to the document on Whisper linked at the bottom of this post.
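For anyone who wants to experiment with the result files before subtitle support exists, here is a minimal sketch of reading an .srt file into timed captions. It is written in Python (which Whisper needs anyway) rather than as a Vegas script, and the function names parse_timestamp and parse_srt are made up for this illustration. It assumes plain SRT blocks: an index line, a timing line, then one or more text lines.

```python
import re
from datetime import timedelta

# Matches an SRT timestamp such as 00:01:02,345
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_timestamp(stamp):
    """Convert 'HH:MM:SS,mmm' into a timedelta."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def parse_srt(text):
    """Return a list of (start, end, caption) tuples from SRT-formatted text."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, end = (parse_timestamp(part) for part in lines[1].split("-->"))
        # Join multi-line captions into a single line of text.
        cues.append((start, end, " ".join(lines[2:])))
    return cues
```

Each tuple gives the start time, end time and text of one caption, which is the information a script would need to place subtitles on the timeline.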

Here is the link to the Vegas script called "Whisper Speech to Text":

https://www.dropbox.com/s/8gpb8w1fjj9bnt1/Whisper%20Speech%20To%20Text.cs?dl=0

The only caveat is that it takes quite a bit of effort to get Whisper installed: it depends on Python, Git, FFmpeg, etc., and on setting environment variables. So you need to install a bunch of supporting software before you can use Whisper, but it is doable. For this purpose, I have put together a document on how to install and use Whisper (and the programs it depends on); it has all the links to get you up and running.

Here is the link to the document on Whisper openAI:

https://www.dropbox.com/s/dh62ripb58xth86/AI%20whisper.docx?dl=0

Have fun!

Last changed by bitman

APPS: VIDEO: VP 365 (20), VP 19 post (latest build -643), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 15 HDpro XXL, Boris Continuum 2022.5, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Audio Clean Lab 3, Sound Forge Audio Studio 16, Sound Forge Pro 14, Spectral Layers Pro 8, Audacity, FOTO: Zoner, DXO, Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 22H2
  • CPU: i9-12900K with Cooler: Noctua NH-D15s
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

 

 

Comments

todd-b wrote on 11/11/2022, 9:33 PM

@bitman This is the Nvidia Resolve version that I've been testing; it works standalone to generate subtitles for Vegas. Maybe it would be possible for you or @jetdv to add scripting for Vegas. Playback is at 2x speed. The file it's transcribing is approx. 4m30s long, and it transcribes in 73 seconds. Could you provide a transcription benchmark for the version you're using?

And this is a translation test. I chose a computer-jargon-heavy video as I thought that would show problems. I've only watched it once, but the only problem I noticed is that "Kabylake" becomes "Capylake". Translation of the 6-minute video took 68 seconds. Not sure why the German to English translation was faster than English to English.

bitman wrote on 11/12/2022, 4:13 AM

@todd-b Can you provide the above media files for download so I can benchmark?

Since you have the Resolve version of Whisper, the Vegas script 'Whisper Speech to Text' I posted may already work with it as is; you can try it out.

todd-b wrote on 11/12/2022, 4:18 AM

@bitman https://github.com/octimot/StoryToolkitAI/releases/tag/v0.17.1

Your 3090 should eat this up

Edit: oh, media files. I'm just interested in general in how well your version works. Apparently there are multiple versions, and the one I'm using is slow in comparison.

bitman wrote on 11/15/2022, 10:49 AM

@todd-b I did some benchmarking on speech to text: first the standard Vegas 365 speech to text, then the Vegas script I posted, with extra arguments for the different models.

For example, to change the default multi-language model in the script to the "tiny" English-only model, change

sw.WriteLine("whisper " + myFile);

to

sw.WriteLine("whisper " + myFile + " --model tiny.en");
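To make it easier to see which arguments end up on the command line, here is a small sketch in Python (not part of the Vegas script) that assembles the same kind of call. The helper name build_whisper_command is made up for this illustration; --model, --task and --language are options of the Whisper command-line tool.

```python
def build_whisper_command(media_path, model=None, task=None, language=None):
    """Return the argument list for a Whisper command-line call.

    Hypothetical helper for illustration; options left as None are omitted,
    so Whisper falls back to its own defaults for them.
    """
    args = ["whisper", media_path]
    if model:
        args += ["--model", model]        # e.g. "tiny.en", "base.en", "medium.en"
    if task:
        args += ["--task", task]          # "transcribe" or "translate"
    if language:
        args += ["--language", language]  # spoken language of the audio
    return args


if __name__ == "__main__":
    cmd = build_whisper_command("Kingfisher.wav", model="tiny.en")
    print(" ".join(cmd))
    # To actually run it (requires Whisper to be installed and on the PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one long string avoids quoting problems when the file path contains spaces.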

bitman wrote on 11/15/2022, 10:52 AM

Here are the benchmark results (and accuracy scores):

Kingfisher.wav audio file converted to text (.srt format)
==========================================
Vegas 365 Speech to Text: 42 seconds

Whisper Speech to Text Vegas script, English-only models (model name suffix ".en"):
  --model tiny.en: 17 seconds
  --model base.en: 33 seconds
  --model medium.en: 248 seconds

Whisper Speech to Text Vegas script, default model (no arguments) (*): 82 seconds

Note (*): the default model is the multi-language "base" model.
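To put these timings side by side, the sketch below (Python, purely arithmetic on the numbers above) expresses each Whisper run as a multiple of the Vegas 365 speed. The names VEGAS_365_SECONDS, whisper_runs and speedup_vs_vegas are made up for this illustration.

```python
# Timings in seconds, copied from the benchmark above.
VEGAS_365_SECONDS = 42

whisper_runs = {
    "tiny.en": 17,
    "base.en": 33,
    "medium.en": 248,
    "default (no --model argument)": 82,
}


def speedup_vs_vegas(seconds):
    """How many times faster (>1) or slower (<1) a run is than Vegas 365."""
    return VEGAS_365_SECONDS / seconds


for name, seconds in whisper_runs.items():
    ratio = speedup_vs_vegas(seconds)
    verdict = "faster" if ratio > 1 else "slower"
    print(f"{name}: {seconds} s, {ratio:.2f}x the speed of Vegas 365 ({verdict})")
```

On these numbers, tiny.en comes out at about 2.47x and base.en at about 1.27x the speed of Vegas 365, while the default run (0.51x) and medium.en (0.17x) are slower.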

Last changed by bitman on 11/16/2022, 2:17 PM, changed a total of 2 times.

bitman wrote on 11/15/2022, 10:59 AM

The kingfisher.wav audio file was generated via Vegas's own text to speech tool (English US, female voice Jenny) using a Wikipedia text about kingfishers. I used this generated wav file as the source for the speech to text benchmarks.

You can run the benchmark yourself; here is the original text:

Kingfishers or Alcedinidae are a family of small to medium-sized, brightly colored birds in the order Coraciiformes. They have a cosmopolitan distribution, with most species found in the tropical regions of Africa, Asia, and Oceania but also can be seen in Europe. They can be found in deep forests near calm ponds and small rivers. The family contains 114 species and is divided into three subfamilies and 19 genera. All kingfishers have large heads, long, sharp, pointed bills, short legs, and stubby tails. Most species have bright plumage with only small differences between the sexes. Most species are tropical in distribution, and a slight majority are found only in forests.

They consume a wide range of prey usually caught by swooping down from a perch. While kingfishers are usually thought to live near rivers and eat fish, many species live away from water and eat small invertebrates. Like other members of their order, they nest in cavities, usually tunnels dug into the natural or artificial banks in the ground. Some kingfishers nest in arboreal termite nests. A few species, principally insular forms, are threatened with extinction. In Britain, the word "kingfisher" normally refers to the common kingfisher.

 

Last changed by bitman on 11/15/2022, 11:13 AM, changed a total of 1 times.

bitman wrote on 11/15/2022, 11:32 AM

I noticed a bug in the Vegas 365 native speech-to-text transcript: a complete sentence was omitted from the .srt file:

The family contains 114 species and is divided into three subfamilies and 19 genera. 

In the kingfisher example, Whisper was more accurate, did not have a missing sentence, and was mostly faster (*).

(*) The speed depends on the model used, as seen in the benchmark. Note that Whisper needs to download a model into its cache the first time that model is used, which adds some seconds or minutes. Subsequent transcriptions using the same model are faster, as Whisper does not need to download it again.
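A check like this can be automated when the original text is available. The sketch below (Python, standard library only) flags reference sentences whose words are mostly absent from a transcript. The function names and the 0.6 threshold are assumptions made for this illustration, not how the comparison above was done.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so formatting differences are ignored."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def missing_sentences(reference, transcript, threshold=0.6):
    """Return reference sentences whose words are mostly absent from the transcript."""
    transcript_words = normalize(transcript)
    missing = []
    for sentence in split_sentences(reference):
        words = normalize(sentence)
        matcher = SequenceMatcher(None, words, transcript_words, autojunk=False)
        # Count how many of the sentence's words were matched, in order.
        matched = sum(block.size for block in matcher.get_matching_blocks())
        if words and matched / len(words) < threshold:
            missing.append(sentence)
    return missing
```

Feeding it the kingfisher text and the text extracted from an .srt file would list any sentence that was dropped. Small transcription errors stay below the threshold and are not reported.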

todd-b wrote on 11/15/2022, 6:30 PM

Now that's a comprehensive benchmark! 😀👍

Would you have any idea what the lowest VRAM requirement is, and does that dictate which models can be used? The large model is 3 GB; I'm guessing it all needs to stay in VRAM.

I was interested in how fast and how well the Vegas version worked. I wonder if it's just not as good as Whisper, or if they compromised by choosing speed over accuracy and using a smaller AI model.

So I guess we need a Windows exe version coupled with a Vegas script. The route you took is beyond most people. I was following step-by-step instructions for installing the Python version and failed to get it to work, mainly because I was blindly following a guide with no idea what anything did. I'm guessing something was left out or incorrect.

bitman wrote on 11/16/2022, 2:44 AM

@todd-b I feel your frustration! It takes a bit of effort to install everything needed to make Whisper work for Vegas, but if you manage, you can also use it standalone in Windows. I am using Windows 11 (22H2). It is best to follow the Whisper document (link in the original post), which contains all the install instructions (if you have not already followed it).

One of the main reasons something like Python or FFmpeg does not work after install is that the Path environment variable is not set for the application. This is something you sometimes must add manually (if the install package did not do it for you). You cannot call an application from the Windows console unless you are in the same directory as the application, which is why you need to add the application's path to the environment variables: then you can start it from the console from anywhere.
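A quick way to see which of these programs the console can actually find is a small check like the sketch below (Python, standard library only). The tool list and the name find_tools are assumptions made for this illustration; adjust the names to whatever your install uses.

```python
import shutil

# Programs the Whisper setup in this thread relies on.
REQUIRED_TOOLS = ["python", "git", "ffmpeg", "whisper"]


def find_tools(names):
    """Map each tool name to its resolved location, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools(REQUIRED_TOOLS).items():
        status = location if location else "NOT FOUND on PATH"
        print(f"{name:8} {status}")
```

Any tool reported as not found needs its folder added to the Path environment variable, or needs to be (re)installed.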

Joelson wrote on 11/16/2022, 4:56 AM

@bitman Thanks for the procedure and the script... Unfortunately I couldn't make it work here.

@todd-b The latest version of Subtitle Edit has a BETA option that uses Whisper to convert speech to text. You need to follow some steps and install Whisper using CMD to activate the option, but if you follow the instructions Subtitle Edit provides when you try to use it, I believe everything will work fine. If you have difficulties, let me know and I'll make a screen recording showing you step by step.

The biggest advantage of using Whisper in Subtitle Edit is that you can use all of Subtitle Edit's tools to edit the .srt file.

todd-b wrote on 11/16/2022, 4:56 PM

@Joelson That's really well-written, polished software and a pleasure to use, and it's still beta. Installation is very simple too: I love how it downloads and installs third-party modules itself, with no need to restart. This is most likely the current answer for Vegas people who want automated AI-generated subs but don't want a Vegas subscription.

Quoting Joelson:
> The biggest advantage of using Whisper in Subtitle Edit is that you can use all the tools of the Subtitle Edit to edit the .srt file.

Yeah, that turns the lack of integration from a negative into a positive: fix any errors in purpose-built software before export. One Whisper problem seems to be that when it encounters a lot of voices at one time, it can lose sync, and if you only imported a wav file you won't know it lost sync. A low-resolution video could be used instead to assess sync issues with its built-in video player, if the goal is to export a perfect .srt without the need to edit it within Vegas.

https://github.com/SubtitleEdit/subtitleedit/releases

todd-b wrote on 11/18/2022, 7:36 PM

@Joelson Is it as it appears, you can't actually do direct translations via whisper in Subtitle Edit currently?

It looks like you have to use whisper to get AI generated subs in the original language, then use google translate to translate the text. There's 2 points of potential error instead of 1. I tried ticking the translate box in the whisper prompt but it says 'no text'

bitman wrote on 11/19/2022, 4:47 AM

@todd-b @Joelson Translation in Vegas using my Vegas script "Whisper Speech to Text" works, with automatic translation to English (if you add --task translate to the whisper command in the original script).

Below is a link to a video showing how the script was able to translate from German:

https://www.dropbox.com/s/ms42ljnslel210p/German.mp4?dl=0

 

Last changed by bitman on 11/19/2022, 4:48 AM, changed a total of 1 times.

bitman wrote on 11/19/2022, 5:04 AM

@wwaag Maybe just a thought, you could add whisper to the happy otter toolset, seems like a perfect match to me! It certainly would benefit from easier installation and UI panels!

Joelson wrote on 11/19/2022, 9:03 AM

Quoting bitman:
> One of the main reasons something does not work like python or FFmpeg after install is usually the environment variable path is not set for the application. This is typically something you sometimes must add manually (if it was not included in an install package). You cannot call an application from the windows console if you are not in the same spot (=directory) as where the application is stored - that is why you need to tell the path to the application in the environment variables so you can start the application from the console from everywhere.

@bitman 

I have good news. I was able to get your script to work here.

But I believe I have this problem with the environment variables you mentioned. I have two local disks (C and D). But the script only works correctly if the files are on local disk C.

Can you tell me how to fix this problem? Or send me a link showing step by step what I have to do.

I too tested the translation feature and it works fine. The video had audio in Portuguese and was translated into English. How do I choose other languages? For example, I want to translate a video that has the audio in English into Portuguese.

bitman wrote on 11/19/2022, 11:20 AM

@Joelson Good to hear! As far as I know, the translation (used with the extra translate argument) only goes one way: from foreign spoken language to English text.

There is a language argument you can add, but I would guess it only helps the analysis run faster. By default (i.e. without a model argument) the model used is the multi-language 'base' model. This is the model I used in the original script. It auto-detects the language and is a good compromise between speed and accuracy. In my document you can find the extra model arguments.

Pure English speech is probably better served by an English-only model than by the multi-language default. To use one, add a model argument:

--model base.en or --model tiny.en

bitman wrote on 11/19/2022, 11:36 AM

@Joelson with regard to environment variables: here are a few steps you can follow:

1) on your keyboard: press windows key + s

2) this will bring up the search window, next type: system variables into the search bar.

3) the System Properties panel opens, select the advanced tab (if not already shown)

4) click on Environment Variables

5) click on Path (under User variables), then click the Edit button. On a new line, type the full path (C:\...) of the folder where the application's .exe is located, for applications such as Python or FFmpeg

6) you may need to restart your PC

Note: sometimes you need to add a path in the system variables as well.

 

Last changed by bitman on 11/19/2022, 11:36 AM, changed a total of 1 times.

Joelson wrote on 11/19/2022, 11:44 AM

Quoting todd-b:
> @Joelson Is it as it appears, you can't actually do direct translations via whisper in Subtitle Edit currently?
>
> It looks like you have to use whisper to get AI generated subs in the original language, then use google translate to translate the text. There's 2 points of potential error instead of 1. I tried ticking the translate box in the whisper prompt but it says 'no text'

@todd-b Just use the Auto Translate option in Subtitle Edit.

The biggest advantage of this option is that text generated by Whisper can be translated into a wide variety of languages and not just English, which is apparently the only translation available in Whisper, as @bitman mentioned.

 

 

todd-b wrote on 11/19/2022, 6:51 PM

Quoting bitman:
> @todd-b @Joelson translation in Vegas using my Vegas script "whisper speech to text" works with automatic translation to English (if you add --task translate in the original script):
>
> Below is the link to a video how the script was able to translate from German
>
> https://www.dropbox.com/s/ms42ljnslel210p/German.mp4?dl=0

@bitman Very nice, but we still need a Windows executable version to marry with your script to help the majority of Vegas users. I tried your script with the Resolve Nvidia Windows exe, but there is no communication with it.

Quoting Joelson:
> @todd-b Just use the option Auto Translate of the Subtitle Edit.
>
> The biggest advantage of this option is that text generated by Whisper can be translated into a wide variety of languages and not just English, which is apparently the only translation available in Whisper, as @bitman mentioned.

@Joelson But isn't there also a downside to using Whisper to produce text in the original language and then Google Translate to convert that text to another language? I'd rather Google Translate not be part of the process, although in the few translations I've done using Subtitle Edit it hasn't shown any glaring issues. I would rather Whisper do it all, the thinking being that I'd only need to worry about Whisper inaccuracies, not Whisper + Google Translate inaccuracies. It's still beta though.

I would rather use Subtitle Edit as it makes fixing errors so easy, but here is the Resolve version in standalone mode. This is Russian to English.

 

bitman wrote on 11/21/2022, 3:57 AM

@todd-b Indeed, it is currently not the most convenient install procedure, certainly not for the less computer-savvy, which I presume is the majority of Vegas users, but it is doable for those who are already dabbling with scripting. That said, I am working on a new variant of the script with some option panels.
