"Subscription and cancellation conditions: VEGAS Pro 365 will be available immediately after payment and activation. The charge for the minimum term is payable as a single sum upon conclusion of the contract. The minimum term begins on the date of purchase. The contractual period of VEGAS Pro 365 will be automatically extended by one month at a time until you cancel the agreement. You will be informed well in advance if the extension rate or taxes included change. A cancellation is possible up to 1 day before the end of the contract period. To cancel the contract, please send an email stating your customer number to: infoservice@magix.net"
That would definitely take a lot of work, as it's written in Python. The "whisper" portion is also in Python and is not compiled to a DLL that another app could simply load to call its routines. Not saying it can't be done, but it would take a lot of time and effort.
Former user
wrote on 10/11/2022, 6:34 PM
@jetdv I don't understand any of it; I'm waiting for him to post detailed instructions for a Windows installation. It seems it can be used as a standalone tool alongside any NLE, but integrating it into individual editors is the difficult part. Thanks for having a look!
Hello everyone, I'm a new member of this forum. I also want to test the Speech to Text feature in Vegas Pro 365 with several languages, but there is a huge price difference between the monthly and annual subscriptions. I am tempted to pay for the annual one, but there is no money-back guarantee in case it doesn't meet my expectations. I saw in another post that Speech to Text is based on Microsoft Azure, which is not cheap, so is there a limit on how many hours I can use it each month? Do you think the transcription accuracy is satisfactory for languages other than English?
There's no hourly limitation that I can find in Vegas. There does seem to be a maximum clip length of 40 minutes per pass (you can cut a longer clip into two pieces and run it twice). I think it's good for non-English languages, though I have only tested it with Japanese.
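If a recording runs past the roughly 40-minute limit, the split can be planned ahead of time. A minimal sketch in Python, assuming ffmpeg is installed and on PATH; the 40-minute figure is the limit observed here, not a documented value, and the function names and output naming scheme are hypothetical:

```python
from pathlib import Path

# Observed (not officially documented) per-clip limit for VEGAS Speech to Text.
MAX_SECONDS = 40 * 60


def segment_bounds(total_seconds, max_seconds=MAX_SECONDS):
    """Return (start, end) pairs that cover the clip in pieces of at most max_seconds."""
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def ffmpeg_cut_commands(src, total_seconds, max_seconds=MAX_SECONDS):
    """Build one ffmpeg command per piece; -c copy cuts without re-encoding."""
    path = Path(src)
    commands = []
    for i, (start, end) in enumerate(segment_bounds(total_seconds, max_seconds), start=1):
        out = path.with_name(f"{path.stem}_part{i:02d}{path.suffix}")
        commands.append([
            "ffmpeg", "-ss", str(start), "-i", str(path),
            "-t", str(end - start), "-c", "copy", str(out),
        ])
    return commands


if __name__ == "__main__":
    # A 70-minute clip becomes a 40-minute piece and a 30-minute piece.
    for cmd in ffmpeg_cut_commands("interview.mp4", 70 * 60):
        print(" ".join(cmd))
```

With stream copy, ffmpeg cuts on keyframes, so a piece boundary may land slightly off the exact second. Each command list can be passed to `subprocess.run` as-is.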
I'm also using Subtitle Edit with Whisper after learning about it here. The analysis takes place on your computer rather than on a server, so it's slow (1 minute of audio takes at least 2 minutes to process), but there are no length limits. Be sure to install the Subtitle Edit beta, which fixes formatting issues with Whisper output.
Or use my Vegas script to run Whisper. It's not nearly as handy as Vegas 365, because Whisper takes effort to set up and requires a lot of things to install and configure, but it gets the job done and in some ways does it better. Once Whisper and all its supporting software (Python and so on) are installed, the Vegas script is even easier to use than Vegas 365. Obviously, if you are not comfortable dealing with the hassle of everything Whisper requires, going with Vegas 365 is a no-brainer. See my post:
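Before trying a script like this, it can help to confirm the supporting pieces are actually in place. A minimal check in Python, assuming the requirements are the openai-whisper package (import name `whisper`), PyTorch and ffmpeg on PATH; the exact list for the Vegas script may differ, and the Python version floor used here is an assumption:

```python
import importlib.util
import shutil
import sys


def check_whisper_prerequisites():
    """Report which pieces of a local openai-whisper setup are present."""
    status = {
        "python>=3.8": sys.version_info >= (3, 8),  # assumed minimum version
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "whisper package": importlib.util.find_spec("whisper") is not None,
        "torch package": importlib.util.find_spec("torch") is not None,
    }
    # Only ask PyTorch about the GPU if PyTorch is installed at all.
    status["CUDA GPU visible to torch"] = False
    if status["torch package"]:
        import torch
        status["CUDA GPU visible to torch"] = bool(torch.cuda.is_available())
    return status


if __name__ == "__main__":
    for item, ok in check_whisper_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```

A MISSING line for "CUDA GPU visible to torch" on a machine with an NVIDIA card usually means a CPU-only PyTorch build is installed.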
@jetdv have you tried my script yet? By the way, I must thank you as I learned a few things from watching your videos on Vegas scripting in order to figure out a few things while I wrote the script!
@RogerS thanks for the link about Whisper AI and the Subtitle Edit. I had an old version with VOSK only and it wasn't that good, but with the latest version I see some improvements. The Whisper option doesn't seem to work for me on version 3.6.10 and 3.6.11 Beta. It is probably an issue on my computer so I need to investigate.
I am interested in transcribing Japanese videos, and I have been testing this YouTube trailer in different applications to see which one does a perfect job. It has Japanese subtitles on YouTube, so I can use them as the reference for a perfect result and check whether a transcription tool can reproduce it. I would appreciate it if anyone could take the time to transcribe it using Vegas Pro 20 360, as I can't do it without first paying for an annual or monthly subscription.
If there is another tool that does a better job than Vegas Pro 20 360, I would really appreciate hearing about it.
Whisper works in Subtitle Edit. Install .10 and then paste the beta's contents over it (the beta is still .10). Download the model (the base one is fine) and give it a try!
I mainly do Japanese and English actually.
If you can get me a link to a video I can download I'd be willing to transcribe it through Vegas for you. Please send me a message on this forum with the link.
I like Whisper as it goes past 40 minutes and Subtitle Edit's tools are first-rate for fixing up the files. I'd try Bitman's script but it's just too many steps for me to get through.
I am going to try @bitman's script tomorrow and hopefully I will manage to get it to work. It would be interesting to see how it works with the Vegas Pro trial version.
@RogerS If you have a GPU with at least 4GB of memory, you can cut transcription time by running Whisper from PowerShell and having it use the GPU instead of the CPU. I bought an 8GB GPU and installed it today, and with --model medium it can transcribe a 1-hour Japanese video in 30 minutes or less, depending on how much speech there is. You mentioned that 1 minute of audio takes you 2 minutes to process; with a GPU you can expect 1 minute of audio to process in 30 seconds or less.
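For reference, a GPU run like this can be driven from the openai-whisper command-line tool. A sketch in Python that only builds and prints the command, assuming openai-whisper is installed with a CUDA-enabled PyTorch build; the file name `trailer.mp4` and the helper name are hypothetical, and this is not necessarily the exact command used in the post:

```python
import shlex


def whisper_gpu_command(audio, model="medium", language="ja", output_dir="."):
    """Build an openai-whisper CLI call that runs on an NVIDIA GPU via --device cuda."""
    return [
        "whisper", audio,
        "--model", model,          # "medium", as in the post
        "--device", "cuda",        # GPU instead of CPU; needs a CUDA build of PyTorch
        "--language", language,    # skip auto-detection when the language is known
        "--task", "transcribe",
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Paste the printed line into PowerShell from the folder holding the video.
    print(shlex.join(whisper_gpu_command("trailer.mp4")))
```

Whisper already picks the GPU automatically when PyTorch can see one, so `--device cuda` mainly makes the choice explicit and fails loudly if the GPU is not usable.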
Former user
wrote on 12/22/2022, 7:28 PM
@bitman @Subtitler22 or anyone else using Bitman's Vegas version of Whisper: if you have time, could you download the clip and add the subtitles your version creates? Don't modify anything; I'm interested in the raw output. Yellow = DaVinci Resolve version (standalone mode), and white is Subtitle Edit (you could also just paste your .srt subtitle file instead).
Watch this clip, don't read the text under the clip.
The story as I understand it (I don't speak Russian): the Russians have developed new radar software for their anti-missile systems to identify and destroy a NATO missile called HIMARS. They have been training on drones, and at this point they have not shot any HIMARS out of the sky.
I don't think either set of subtitles gets that across properly. The Resolve version seems to absorb a lot of speech and then spit out a paraphrased version all at once, while Subtitle Edit gives better real-time output, but I think its fault is the two-step translation. Currently Subtitle Edit transcribes to Russian and then uses Google Translate to convert to English. It sounds like a really poor idea: two imperfect AIs combined to create more errors. I used the same dictionary for both (medium multilingual), so the error with NATO and HIMARS is surely the doing of Google Translate.
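Whisper itself can skip the two-step approach: its translate task produces English text directly from the source-language audio. A sketch in Python that builds such a call for the openai-whisper CLI, assuming a recent release (`--output_format` is not present in the earliest versions); the helper and file names are hypothetical, and whether the result beats the Google Translate route for this clip has not been tested here:

```python
def whisper_translate_command(audio, source_language="ru", model="medium"):
    """Build an openai-whisper call that outputs English subtitles directly.

    --task translate makes the multilingual model emit English text from the
    source-language audio, so no second machine-translation pass is needed.
    English-only models (names ending in ".en") cannot do this.
    """
    return [
        "whisper", audio,
        "--model", model,
        "--language", source_language,
        "--task", "translate",
        "--output_format", "srt",
    ]


if __name__ == "__main__":
    print(" ".join(whisper_translate_command("himars_clip.mp4")))
```

Whisper's translate task only targets English, which matches this use case.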
I do have GPUs with more than 4GB of RAM. When I read about Whisper's implementation in Subtitle Edit, it said it was CPU-only. I was wondering if there was a way to leverage the GPU. How do you do that? PowerShell looks like a command prompt, so does this not involve Subtitle Edit at all?
Whisper doesn't use the CPU very efficiently, using only 4 cores. I figured out I can run 3 instances at once to max out my CPU, and with the batch feature I could leave it running all day. As a bonus, it heated my room without a heater (drawing about 200 watts for ~10 hours straight).
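The three-at-once setup can also be scripted outside Subtitle Edit's batch feature. A sketch in Python, assuming the openai-whisper CLI, whose `--threads` option caps the torch threads each run uses for CPU inference; three workers with four threads each mirror the numbers in this post and are not tuned values, and the helper names are hypothetical:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def cpu_command(audio, model="medium", threads=4):
    """One CPU-only Whisper run limited to `threads` torch threads."""
    return ["whisper", audio, "--model", model, "--device", "cpu",
            "--threads", str(threads)]


def run_with_subprocess(cmd):
    """Launch one Whisper process, wait for it, and return its exit code."""
    return subprocess.run(cmd).returncode


def run_batch(files, workers=3, run=run_with_subprocess):
    """Keep up to `workers` Whisper processes busy; exit codes come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: run(cpu_command(f)), files))
```

Calling `run_batch(["ep01.wav", "ep02.wav", "ep03.wav", "ep04.wav"])` would start three runs immediately and the fourth as soon as one finishes. Threads are enough here because each worker just waits on an external process.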
Former user
wrote on 12/23/2022, 12:25 AM
@RogerS Try the Resolve NVIDIA Windows version in standalone mode. It is a local installation (extract the files to a directory and run the exe), so it won't interfere with your other software. https://github.com/octimot/StoryToolkitAI/releases
I have found the multilingual English dictionary (about 1.5 GB) to be a good all-rounder for transcribing and translating to English. It's alpha software, so it has bugs. For example, when I tried to get a transcript of a 3.5-hour video, only English-small made it to the end; the medium and large dictionaries, as well as others, failed at various points between 1 and 2 hours into the video. I'm not sure if the problem is specific to this fork or universal.
Best, though, would be to test Bitman's Vegas-compatible version, but the install sure looks like a process if you're not familiar with Python. When the Windows version arrives, ALL Vegas users can rejoice.