how can I test speech to text?

Mindmatter wrote on 10/10/2022, 7:50 AM

Hi all,
None of the trial versions include the speech-to-text feature. So how can I test it?

Thanks!

AMD Ryzen 9 5900X, 12x 3.7 GHz
32 GB DDR4-3200 MHz (2x16GB), Dual-Channel
NVIDIA GeForce RTX 3070, 8GB GDDR6, HDMI, DP, studio drivers
ASUS PRIME B550M-K, AMD B550, AM4, mATX
7.1 (8-channel) Surround-Sound, Digital Audio, onboard
Samsung 970 EVO Plus 250GB, NVMe M.2 PCIe x4 SSD
be quiet! System Power 9 700W CM, 80+ Bronze, modular
2x WD red 6TB
2x Samsung 2TB SSD

Comments

jetdv wrote on 10/10/2022, 8:23 AM

Upload your audio file here and let someone who has a subscription version send you the results?

RogerS wrote on 10/10/2022, 8:31 AM

Do Vegas 365 for one month?

vkmast wrote on 10/10/2022, 9:01 AM

FYI, VEGAS Pro 365 monthly subscription:

"Subscription and cancellation conditions: VEGAS Pro 365 will be available immediately after payment and activation. The charge for the minimum term is payable as a single sum upon conclusion of the contract. The minimum term begins on the date of purchase. The contractual period of VEGAS Pro 365 will be automatically extended by one month at a time until you cancel the agreement. You will be informed well in advance if the extension rate or taxes included change. A cancellation is possible up to 1 day before the end of the contract period. To cancel the contract, please send an email stating your customer number to: infoservice@magix.net"

https://www.vegascreativesoftware.com/us/product-comparison/#productMenu

Mindmatter wrote on 10/10/2022, 2:52 PM

Good idea, thanks all!

Former user wrote on 10/10/2022, 6:34 PM

@jetdv are you able to do anything with this GitHub project, as far as making it compatible with Vegas?

https://github.com/octimot/StoryToolkitAI

jetdv wrote on 10/10/2022, 6:47 PM

@Former user, I'll take a look at it.

jetdv wrote on 10/11/2022, 7:41 AM

That would definitely take a lot of work as it's written in Python. And the "whisper" portion is also in Python but is not compiled to a DLL that any other app could just access to call the routines. Not saying it can't be done. But it would take a lot of time and effort.

Former user wrote on 10/11/2022, 6:34 PM

@jetdv I don't understand any of it. I'm waiting for him to post detailed instructions for Windows installation. It seems it can be used standalone with any NLE, but integration into the various editors is the difficult part. Thanks for having a look!

jetdv wrote on 10/12/2022, 8:02 AM

@Former user, let me know when he does post the detailed instructions. I'd also be interested in looking at that.

Subtitler22 wrote on 12/15/2022, 6:48 AM

Hello everyone, new member of this forum here. I also want to test the Speech to Text feature in Vegas Pro 365 for many languages, but there is a huge difference in price between the monthly and annual subscriptions. I am tempted to pay for the annual one, but there is no money-back guarantee in case it doesn't meet my expectations.
I saw in another post that Speech to Text is based on Microsoft Azure, which is not cheap, so is there a limit on how many hours I can use Vegas Pro every month?
Do you think the accuracy of the transcription is satisfactory for languages other than English?

RogerS wrote on 12/15/2022, 11:27 PM

There's no hourly limit that I can find in Vegas. There does seem to be a maximum clip length of 40 minutes per pass (you can cut a longer clip into two pieces and run it twice). I think it's good for non-English languages, though I have only tested it with Japanese.
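A rough sketch of where the cuts would fall if you split a long clip to stay under that limit. The 40-minute cap is only what was observed above, not a documented figure, and `split_points` is a hypothetical helper, not part of Vegas or its scripting API.

```python
import math

# Assumed cap, taken from the roughly 40-minute limit observed above.
MAX_CLIP_SECONDS = 40 * 60


def split_points(duration_s, max_len_s=MAX_CLIP_SECONDS):
    """Return (start, end) pairs, in seconds, that cover the clip in
    equal pieces, each no longer than max_len_s."""
    if duration_s <= 0:
        return []
    pieces = math.ceil(duration_s / max_len_s)  # fewest pieces that fit
    step = duration_s / pieces                  # equal length per piece
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(pieces)]


if __name__ == "__main__":
    # A 60-minute clip needs two passes of 30 minutes each.
    for start, end in split_points(60 * 60):
        print(f"{start / 60:.0f} min to {end / 60:.0f} min")
```

Equal pieces are just one choice; in practice you would probably nudge each cut to a pause in the speech so a sentence is not split across two passes.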

I'm also using Subtitle Edit with Whisper after learning about it here. The analysis takes place on your computer rather than on a server, so it's slow (1 min of audio takes at least 2 min to process), but there are no length limits. Be sure to install the beta of Subtitle Edit, which fixes formatting issues with Whisper output.

Last changed by RogerS on 12/15/2022, 11:29 PM, changed a total of 1 times.

Custom PC (2022) Intel i5-13600K with UHD 770 iGPU with latest driver, MSI z690 Tomahawk motherboard, 64GB Corsair DDR5 5200 ram, NVIDIA 2080 Super (8GB) with latest studio driver, 2TB Hynix P41 SSD and 2TB Samsung 980 Pro cache drive, Windows 11 Pro 64 bit https://pcpartpicker.com/b/rZ9NnQ

ASUS Zenbook Pro 14 Intel i9-13900H with Intel graphics iGPU with latest ASUS driver, NVIDIA 4060 (8GB) with latest studio driver, 48GB system ram, Windows 11 Home, 1TB Samsung SSD.

VEGAS Pro 21.208
VEGAS Pro 22.239

Try the
VEGAS 4K "sample project" benchmark (works with VP 16+): https://forms.gle/ypyrrbUghEiaf2aC7
VEGAS Pro 20 "Ad" benchmark (works with VP 20+): https://forms.gle/eErJTR87K2bbJc4Q7

bitman wrote on 12/16/2022, 7:24 AM

@Mindmatter you can always try out the Microsoft Azure demo itself to see roughly how it would work in Vegas Pro 365 (as VP uses it)

https://azure.microsoft.com/en-us/products/cognitive-services/speech-to-text/#overview

APPS: VIDEO: VP 365 suite (VP 22 build 194) VP 21 build 315, VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 17 HDpro XXL, Boris Continuum 2025, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 18, Spectral Layers Pro 10, Audacity, FOTO: Zoner studio X, DXO photolab (8), Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 24H2 (since October 2024)
  • CPU: i9-13900K (upgraded my former CPU i9-12900K),
  • Air Cooler: Noctua NH-D15 G2 HBC (September 2024 upgrade from Noctua NH-D15s)
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

bitman wrote on 12/16/2022, 7:32 AM

@Mindmatter @Former user

Or use my Vegas script for Whisper. It is not nearly as handy as Vegas 365, because installing and configuring Whisper and its supporting software (Python and other components) takes real effort, but it gets the job done, and in some ways does it better. Once everything is installed, the script is even easier to use than Vegas 365. Obviously, if you are not comfortable with all that setup hassle, going for Vegas 365 is a no-brainer. See my post:

https://www.vegascreativesoftware.info/us/forum/speech-to-text-via-whisper-openai--137928/#ca863141

@jetdv have you tried my script yet? By the way, I must thank you as I learned a few things from watching your videos on Vegas scripting in order to figure out a few things while I wrote the script!

Last changed by bitman on 12/16/2022, 7:55 AM, changed a total of 2 times.

Subtitler22 wrote on 12/16/2022, 8:15 AM

@RogerS thanks for the link about Whisper AI and the Subtitle Edit. I had an old version with VOSK only and it wasn't that good, but with the latest version I see some improvements. The Whisper option doesn't seem to work for me on version 3.6.10 and 3.6.11 Beta. It is probably an issue on my computer so I need to investigate.

I am interested in transcribing Japanese videos, and I have been testing this YouTube trailer in different applications to see which one does a perfect job. It has Japanese subtitles on YouTube, so I can use those as the reference when comparing the output of transcription tools. I would appreciate it if anyone could take the time to transcribe it using Vegas Pro 20 365, as I can't do it without first paying annually or monthly.

If there is another tool that does a better job than Vegas Pro 20 365, I would really appreciate the effort.

Subtitler22 wrote on 12/16/2022, 8:21 AM

"@Mindmatter you can always try out the Microsoft Azure demo itself to see how it would be more or less in Vegas Pro 365 (as VP uses it)"

https://azure.microsoft.com/en-us/products/cognitive-services/speech-to-text/#overview

Thanks, I tried it a few days ago and it seems to do a good job.

RogerS wrote on 12/16/2022, 9:07 AM

Whisper works in Subtitle Edit. Install .10 and then paste the beta contents over it (which is still .10). Download the model (the base one is fine) and give it a try!

I mainly do Japanese and English actually.

If you can get me a link to a video I can download I'd be willing to transcribe it through Vegas for you. Please send me a message on this forum with the link.

I like Whisper as it goes past 40 minutes and Subtitle Edit's tools are first-rate for fixing up the files. I'd try Bitman's script but it's just too many steps for me to get through.

Subtitler22 wrote on 12/16/2022, 9:58 AM

Thanks for the suggestion. I posted a link to a Japanese trailer above. I really appreciate it.

RogerS wrote on 12/16/2022, 10:09 AM

I don't see a link, I see an embedded video. I can't get that into Vegas.

Subtitler22 wrote on 12/16/2022, 1:58 PM

I am going to try @bitman's script tomorrow and hopefully I will manage to get it to work. It would be interesting to see how it works on the Vegas Pro trial version.

Subtitler22 wrote on 12/17/2022, 7:47 AM

I managed to get Whisper working on my computer using @bitman's instructions and got the trailer transcription OK. It was really worth the effort.

Subtitler22 wrote on 12/22/2022, 1:31 PM

@RogerS If you have a GPU with at least 4GB of memory, you can lower the transcription time by running Whisper from PowerShell and making it transcribe on the GPU instead of the CPU.
I bought an 8GB GPU and installed it today, and it can transcribe a 1-hour Japanese video in 30 minutes or less, depending on how much speech there is, using --model medium.
You mentioned that 1 minute of audio takes you 2 minutes to process. Using a GPU, you can expect 1 minute of audio to process in 30 seconds or less.
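For reference, a minimal sketch of what such a command looks like, built from Python rather than typed into PowerShell. It assumes the open-source `whisper` command-line tool is installed and on PATH with a CUDA-capable PyTorch build; `trailer.mp4` and the helper names are placeholders, not anything from bitman's script.

```python
import shutil
import subprocess


def whisper_command(media_path, model="medium", language="Japanese", device="cuda"):
    """Build the argument list for the whisper CLI.
    device="cuda" asks for the GPU; device="cpu" forces the CPU."""
    return [
        "whisper", media_path,
        "--model", model,
        "--language", language,
        "--device", device,
    ]


def transcribe(media_path, **options):
    """Run whisper on one file; raises if the CLI is missing or exits with an error."""
    if shutil.which("whisper") is None:
        raise RuntimeError("whisper CLI not found on PATH")
    subprocess.run(whisper_command(media_path, **options), check=True)


if __name__ == "__main__":
    # Placeholder file name; prints the command instead of running it.
    print(" ".join(whisper_command("trailer.mp4")))
```

The printed line is the same thing you would type at the PowerShell prompt. Whether the medium model fits depends on your GPU memory, so treat the defaults here as a starting point.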

Former user wrote on 12/22/2022, 7:28 PM

@bitman @Subtitler22 or anyone else using bitman's Vegas version of Whisper: if you have time, could you download the clip and add the subtitles your version creates? Don't modify anything; I'm interested in raw output. Yellow = DaVinci Resolve version (standalone mode), and white is Subtitle Edit. (Actually, you could just paste your .srt subtitle file instead.)

Watch this clip, don't read the text under the clip.

The story as I understand it (I don't speak Russian): the Russians have developed new radar software for their anti-missile systems to identify and destroy a NATO missile called HIMARS. They have been training on drones; at this point they have not shot any HIMARS out of the sky.

I don't think either set of subtitles gets that across properly. The Resolve version seems to absorb a lot of speech and then spit out a paraphrased version all at once, while Subtitle Edit gives better real-time output, but I think its fault is the double translation. Currently Subtitle Edit transcribes to Russian, then uses Google Translate to convert to English. It sounds like a really stupid idea: two imperfect AIs combined to create more errors. I used the same dictionary for both (medium multilingual); the error with NATO and HIMARS is surely the doing of Google Translate.
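One way to take the second translator out of the chain, sketched under the assumption that you run the open-source `whisper` command-line tool directly: its `translate` task outputs English from the foreign-language audio in a single pass. I don't know whether the Resolve tool works this way internally; the file name and helper name below are placeholders.

```python
def translate_command(media_path, model="medium", source_language="Russian"):
    """Build a whisper CLI call that translates speech straight to English.
    "--task translate" replaces the default "--task transcribe"."""
    return [
        "whisper", media_path,
        "--model", model,
        "--language", source_language,
        "--task", "translate",
    ]


if __name__ == "__main__":
    # Placeholder file name; prints the command instead of running it.
    print(" ".join(translate_command("clip.mp4")))
```

Comparing this output with the transcribe-then-Google-Translate route on the same clip would show which step introduces the NATO and HIMARS errors.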

RogerS wrote on 12/22/2022, 8:35 PM

"If you have a GPU with memory at least 4GB, you can lower the transcription time by making Whisper transcribe using the GPU instead of using the CPU, and using PowerShell.
I bought 8GB GPU and installed it today and it can transcribe a Japanese video of 1 hour in 30 minutes or less, depending on how much speech there is, using the --model medium.
You mentioned that 1 minute audio takes you 2 minutes to process. Using a GPU you can expect 1 minute audio to process in 30 seconds or less."

I do have GPUs with more than 4GB of RAM. When I read up on Whisper's implementation in Subtitle Edit, it said it was CPU only. I was wondering if there was a way to leverage the GPU. How do you do that? PowerShell looks like a command prompt, so does this not involve Subtitle Edit at all?

Whisper doesn't use the CPU very efficiently, only using 4 cores. I figured out I can run 3 at once to max out my CPU and with the batch feature could leave it running all day. As a bonus it heated my room without the use of a heater (drawing about 200 watts for ~10 hours straight).
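The same idea as a rough script, for anyone running the `whisper` command-line tool outside Subtitle Edit. The four-cores-per-run figure is only my observation above, so it is an adjustable assumption here, and the helper names are made up for illustration.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption from the observation above: one Whisper run keeps about 4 cores busy.
CORES_PER_RUN = 4


def parallel_runs(logical_cores, cores_per_run=CORES_PER_RUN):
    """How many Whisper runs to start at once so the CPU is roughly full."""
    return max(1, logical_cores // cores_per_run)


def transcribe_on_cpu(media_path):
    """Run the whisper CLI on one file, forcing CPU inference."""
    subprocess.run(["whisper", media_path, "--device", "cpu"], check=True)


def run_batch(media_paths, runs=None):
    """Work through the files, keeping `runs` transcriptions going at a time."""
    runs = runs or parallel_runs(os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=runs) as pool:
        # list() waits for every run and re-raises the first failure, if any.
        list(pool.map(transcribe_on_cpu, media_paths))


if __name__ == "__main__":
    print(parallel_runs(os.cpu_count() or 1), "runs at once on this machine")
```

Threads are enough here because each one only waits on a separate whisper process; the heavy lifting happens in those processes.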

Former user wrote on 12/23/2022, 12:25 AM

@RogerS Try the Resolve Nvidia Windows version in standalone mode. It is a local installation (extract the files to a directory and run the exe), so it won't interfere with your other software. https://github.com/octimot/StoryToolkitAI/releases

I have found the multilingual English dictionary (about 1.5 GB) to be a good all-rounder for transcribing and translating to English. It's alpha software, so it has bugs. For example, when trying to get a transcript for a 3.5-hour video, only English-small would get to the end; the medium and large dictionaries, as well as others, failed at various points between 1 and 2 hours into the video. I'm not sure if the problem is with this fork or universal.

Best, though, would be to test Bitman's Vegas-compatible version, but the install sure looks like a process if you're not familiar with Python. When the Windows version is here ALL Vegas users can rejoice.