Speech to Text via Whisper openAI

Comments

Subtitler22 wrote on 12/17/2022, 7:56 AM

@bitman Thank you so much for the Whisper instructions. I got it to work with a bit of a struggle, but it was totally worth the effort. I am transcribing from cmd, and it takes a lot of resources, with high CPU usage. I had to restart a few times and start again. I will probably split long videos into smaller parts. Not using --model seems to work best. My PC has an Intel i5, so perhaps I need to upgrade to an i7 or i9.
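
Splitting a long recording into parts, as mentioned above, comes down to computing start and end offsets for each part. A minimal Python sketch; the function name and the 10-minute default are illustrative assumptions, not something taken from the Whisper instructions:

```python
def chunk_ranges(total_seconds, chunk_seconds=600):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    ranges = []
    start = 0
    while start < total_seconds:
        # The last part may be shorter than chunk_seconds.
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


# A 25-minute video split into 10-minute parts
print(chunk_ranges(25 * 60))  # [(0, 600), (600, 1200), (1200, 1500)]
```

Each range could then be cut out with a tool such as FFmpeg before transcription. Cutting in the middle of a sentence may hurt accuracy near the boundaries, so natural pauses are probably better split points.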

bitman wrote on 12/19/2022, 4:21 AM

@Subtitler22 I am glad you like it. If I am not mistaken, leaving out the --model argument uses the default "small" model, which is one step up from "base" (better, but slower). The larger the model, the slower the process and the more VRAM it requires, but the better the accuracy.

In my v2 script (Whisper Speech To Text v2) I use the "small" model as the "balanced" selection; I guess I could just have omitted --model in that case.

APPS: VIDEO: VP 365 (22 build 93, 21 - build 315), VP 365 20, VP 19 post (latest build -651), (uninstalled VP 12,13,14,15,16 Suite,17, VP18 post), Vegasaur, a lot of NEWBLUE plugins, Mercalli 6.0, Respeedr, Vasco Da Gamma 16 HDpro XXL, Boris Continuum 2024, Davinci Resolve Studio 18, SOUND: RX 10 advanced Audio Editor, Sound Forge Pro 17, Spectral Layers Pro 10, Audacity, FOTO: Zoner, DXO, Luminar, Topaz...

  • OS: Windows 11 Pro 64, version 23H2
  • CPU: i9-13900K (upgraded my former CPU i9-12900K), Air Cooler: Noctua NH-D15s
  • RAM: DDR5 Corsair 64GB (5600-40 Vengeance)
  • Graphics card: ASUS GeForce RTX 3090 TUF OC GAMING (24GB) 
  • Monitor: LG 38 inch ultra-wide (21x9) - Resolution: 3840x1600
  • C-drive: Corsair MP600 PRO XT NVMe SSD 4TB (PCIe Gen. 4)
  • Video drives: Samsung NVMe SSD 2TB (980 pro and 970 EVO plus) each 2TB
  • Mass Data storage & Backup: WD gold 6TB + WD Yellow 4TB
  • MOBO: Gigabyte Z690 AORUS MASTER
  • PSU: Corsair HX1500i, Case: Fractal Design Define 7 (PCGH edition)
  • Misc.: Logitech G915, Evoluent Vertical Mouse, shuttlePROv2

 

 

Subtitler22 wrote on 12/19/2022, 6:13 AM

@bitman I have tested a few languages with --model small and the results seem accurate enough. With --model base they are just not acceptable. My problem now is that my i5 computer is not good enough to handle Whisper and Python. I need to restart and avoid running any other applications apart from cmd or PowerShell and the Task Manager (to check resource usage).
Are there any recommendations for which CPU and GPU to use in a new computer build to cope with --model large, in case I want to cover all possible Whisper options?
How does Vegas Pro 20 365 Speech To Text (non-English languages) compare to Whisper in terms of accuracy? Perhaps it is cheaper to just pay for the subscription instead of building a new computer.

bitman wrote on 12/20/2022, 3:41 AM

@Subtitler22 It is definitely cheaper to subscribe to Vegas 365 than to build a new PC. On the other hand you will obviously enjoy working with Vegas and other applications more with better hardware.

On the subject of accuracy for foreign languages, I can say from my own experience with Dutch (the Flemish Belgian variant) that Whisper is far superior to what Vegas 365 offers (via Microsoft Azure). In fact, the automatic language detection in Vegas 365 often fails with an error popup after analysis. I have to select Dutch (Belgian) explicitly for it to work in 365, and the result is still worse than Whisper's default --model small.

The fact alone that you cannot tune the accuracy in the current Vegas 365 speech-to-text feature is often a deal breaker.

There may also be a privacy concern: if I am not mistaken, Vegas 365 uploads your speech to external servers (probably Microsoft) to be processed in the cloud. The obvious benefit is that the heavy burden of transcription falls on their powerful hardware instead of your own limited hardware. The downside is that it taxes your network, and your speech ends up on their servers.

Whisper, on the other hand, downloads the model once and then processes everything locally, which obviously taxes your own system much more than Vegas 365, as you have already experienced.

Subtitler22 wrote on 12/20/2022, 5:20 AM

@bitman Thanks a lot for the detailed information. I realized yesterday that I was using my CPU (Intel i5) for the transcription and NOT the GPU (mine is an NVIDIA GT 1030 2GB), which is supposed to be faster.
To make the GPU do the work, I had to use PyTorch and add --device CUDA.
With 2GB, the tiny and base models ran really fast, but the small model (needs 4GB) didn't work. To use --model small I had to add --device cpu.

In short, instead of building a new computer, I will probably buy a GPU with at least 4GB. I can always use the new GPU on a new PC build in the future.
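
To pick a model for a given card, the approximate VRAM figures listed in the Whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large) can be turned into a small lookup. This is only a sketch: the function name and the 1 GB safety headroom are assumptions, and real memory use depends on the PyTorch build and on what else is using the GPU.

```python
# Approximate VRAM needs in GB, as listed in the Whisper README.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def largest_model_for(vram_gb, headroom_gb=1.0):
    """Return the largest model that fits, or None if even 'tiny' does not.

    headroom_gb is an assumed safety margin, not an official figure.
    """
    usable = vram_gb - headroom_gb
    best = None
    for name in MODEL_ORDER:
        if APPROX_VRAM_GB[name] <= usable:
            best = name
    return best


print(largest_model_for(2))   # base
print(largest_model_for(4))   # small
print(largest_model_for(12))  # large
```

With the assumed headroom, a 2 GB card lands on "base", which matches the experience described above, and a 12 GB card covers "large".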

bitman wrote on 12/20/2022, 7:32 AM

@Subtitler22 I must thank you for the CUDA idea (using the GPU to speed things up). Until now I always ran Whisper via the command line or the Whisper Vegas script without the extra argument "--device cuda". I vaguely recall reading something about acceleration, but I did not pursue it; I was happy that Whisper worked "as is" in the first place. I am looking into it.

Last changed by bitman on 12/20/2022, 10:05 AM, changed a total of 2 times.

Subtitler22 wrote on 12/20/2022, 9:26 AM

@bitman credit goes to this YouTube video. I think I will buy a 12GB GPU to cover all options.

bitman wrote on 12/20/2022, 10:04 AM

@Subtitler22 Here is an update on the use of CUDA. Some observations:

  1. The argument --device CUDA in capitals is wrong; you have to use lowercase: --device cuda
  2. You need PyTorch installed to make use of CUDA
  3. If PyTorch is installed, you do not need the argument --device cuda, as Whisper will use PyTorch and CUDA by default; this means I do not have to change the current script (v2) to enjoy the GPU acceleration.
  4. If PyTorch is installed and you still want to use the CPU, you can use --device cpu
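
The points above can be sketched as a small helper that assembles the command line. It only uses the --model and --device arguments discussed in this thread; the helper name and the sample file name are hypothetical, and lowercasing the device is simply a guard against the "CUDA" mistake from point 1.

```python
def whisper_command(audio_path, model=None, device=None):
    """Build the argument list for a Whisper command-line call.

    Leaving model or device as None omits the flag, so Whisper
    falls back to its own defaults.
    """
    cmd = ["whisper", str(audio_path)]
    if model is not None:
        cmd += ["--model", model]
    if device is not None:
        # The device name must be lowercase ("cuda", not "CUDA").
        cmd += ["--device", device.lower()]
    return cmd


print(whisper_command("sample.wav", model="large", device="CUDA"))
# ['whisper', 'sample.wav', '--model', 'large', '--device', 'cuda']
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where Whisper is installed and on the PATH.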

After installing PyTorch, the Whisper acceleration with CUDA is impressive.

I ran a quick test with my 18s Dutch audio sample on my PC, with PyTorch installed and "--model large" (="Best" in my script):

Without GPU acceleration (with argument --device cpu): 109 seconds

With GPU acceleration (the default, with or without --device cuda): 18 seconds (about 6x faster)
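
For reference, the arithmetic behind these figures, using only the numbers quoted above (the variable names are just for illustration):

```python
audio_s = 18   # length of the Dutch audio sample, in seconds
cpu_s = 109    # processing time with --device cpu
gpu_s = 18     # processing time with CUDA

speedup = cpu_s / gpu_s    # how much faster the GPU run is
cpu_rtf = cpu_s / audio_s  # processing time per second of audio on the CPU
gpu_rtf = gpu_s / audio_s  # the same on the GPU

print(f"speedup: {speedup:.1f}x")       # speedup: 6.1x
print(f"CPU: {cpu_rtf:.1f}x real time")  # CPU: 6.1x real time
print(f"GPU: {gpu_rtf:.1f}x real time")  # GPU: 1.0x real time
```

In other words, with the large model the GPU run keeps pace with the audio, while the CPU run takes about six times the length of the clip.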

Last changed by bitman on 12/20/2022, 12:12 PM, changed a total of 2 times.

bitman wrote on 12/20/2022, 12:08 PM

@Former user @Vegas_Pro_Brasil

I redid and updated the "kingfisher" benchmark (you can find it a bit earlier in this thread) after installing PyTorch. The speed improvement is spectacular.

Last changed by bitman on 12/20/2022, 12:10 PM, changed a total of 2 times.

Subtitler22 wrote on 12/26/2022, 10:51 AM

@bitman @Vegas_Pro_Brasil @Former user

I ran into a problem with Whisper when there was a long section with no speech: when the speech resumed, it didn't transcribe it and just kept repeating the last transcribed text.
The help output lists the option --no_speech_threshold, which has a default value of 0.6.
After experimenting with lower values, down to 0.275, this seems to help get it back on track. It took longer to transcribe, but that was a small price to pay to get it working again.

If you run into a similar situation, just add --no_speech_threshold 0.275 or any other value that works for you.
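
A rough sketch of why a lower threshold can help, based on my understanding of how the reference Whisper code decides whether a segment is silence: a segment is skipped when its no-speech probability is above --no_speech_threshold and its average log probability is not above --logprob_threshold (default -1.0). The function below is a simplified illustration with made-up sample values, not the library's actual code.

```python
def should_skip_segment(no_speech_prob, avg_logprob,
                        no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Return True if the segment would be treated as silence and dropped."""
    skip = no_speech_prob > no_speech_threshold
    # A confident transcription overrides the silence decision.
    if avg_logprob > logprob_threshold:
        skip = False
    return skip


# Hypothetical segment: moderately likely to be silence, low-confidence text.
print(should_skip_segment(0.4, -1.5))                             # False
print(should_skip_segment(0.4, -1.5, no_speech_threshold=0.275))  # True
```

With the default of 0.6 the doubtful segment is kept (and may come out as repeated text); with 0.275 it is dropped as silence.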

RogerS wrote on 2/6/2023, 7:56 PM

Would this new app be easier to integrate into Vegas than the current mix of files? It's called WhisperDesktop, has source code available, and here's a video of it in use:

I'm getting 30fps on an NVIDIA 1050, which is so fast.

Custom PC (2022) Intel i5-13600K with UHD 770 iGPU with latest driver, MSI z690 Tomahawk motherboard, 64GB Corsair DDR5 5200 ram, NVIDIA 2080 Super (8GB) with latest studio driver, 2TB Hynix P41 SSD and 2TB Samsung 980 Pro cache drive, Windows 11 Pro 64 bit

Dell XPS 15 laptop (2017) 32GB ram, NVIDIA 1050 (4GB) with latest studio driver, Intel i7-7700HQ with Intel 630 iGPU (latest available driver), dual internal SSD (1TB; 1TB), Windows 10 64 bit

VEGAS Pro 19.651
VEGAS Pro 20.411
VEGAS Pro 21.208
VEGAS Pro 22.93

Try the
VEGAS 4K "sample project" benchmark (works with VP 16+): https://forms.gle/ypyrrbUghEiaf2aC7
VEGAS Pro 20 "Ad" benchmark (works with VP 20+): https://forms.gle/eErJTR87K2bbJc4Q7

Dave-Wallin-Eddy wrote on 2/15/2023, 11:19 PM

I had high hopes for WhisperDesktop, but I tried it on 3 different computers and it simply crashes when loading the "models". StoryToolkitAI, on the other hand, is working fine. If code could be added to it to detect Vegas and not just Resolve, it would be great(er).

RogerS wrote on 2/15/2023, 11:57 PM

Interesting, perhaps load the models manually? I have it working on a GTX 1050 (mobile) and RTX 2080 (desktop). Const-Me has also now been integrated into the latest Subtitle Edit beta. https://github.com/SubtitleEdit/subtitleedit/releases

Subtitler22 wrote on 2/22/2023, 1:52 PM

I have been busy trying to figure out how to get better results from Whisper AI. I thought that if I could extract only the vocals from a video file, without any surrounding noise or music, it might produce better transcriptions. I found a useful feature in iZotope RX 10 Standard called Music Rebalance that can isolate the vocals from the other sounds. I did some tests with audio files and it definitely makes an improvement.

Dave-Wallin-Eddy wrote on 2/23/2023, 9:23 PM

@bitman FYI: your v2 script seems to load fine. It opens the GUI and lets me select what I want, but it "ends" almost instantly. Where does your script save the srt file? It is not in the same directory as the video file. Maybe it is something I am (not) doing?

And this is the error:

---------------------------------------------------------
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.IO.FileNotFoundException: Could not find file 'I:\IS2013.mp4.srt'.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.StreamReader..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean checkHost)
   at System.IO.StreamReader..ctor(String path, Encoding encoding)
   at EntryPoint.MakeLinkedList(Vegas myVegas, String myPathPlusFileName)
   at EntryPoint.FromVegas(Vegas vegas)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at ScriptPortal.Vegas.ScriptHost.ScriptManager.Run(Assembly asm, String className, String methodName)
   at ScriptPortal.Vegas.ScriptHost.RunScript(Boolean fCompileOnly)

fr0sty wrote on 2/23/2023, 9:43 PM

There's something in the works soon that will enable all users of VEGAS, sub or perpetual, to access this and other VEGAS Hub features.

Systems:

Desktop

AMD Ryzen 7 1800x 8 core 16 thread at stock speed

64GB 3000mhz DDR4

Geforce RTX 3090

Windows 10

Laptop:

ASUS Zenbook Pro Duo 32GB (9980HK CPU, RTX 2060 GPU, dual 4K touch screens, main one OLED HDR)

bitman wrote on 2/24/2023, 5:48 AM

@Dave-Wallin-Eddy The srt file is saved in the same directory as the audio source; however, I was able to reproduce your issue by placing the source audio on a drive other than the C-drive (in which case the srt file fails to save). I see your video (or audio) source is in the root of the "I" drive rather than in a folder on the C: drive. I tested the script with audio sources in folders on the C-drive. Possibly the script, or the software installed to make Whisper work, has issues when the source is not on the C-drive...

Try putting your source audio in a folder on the C-drive. Also, if it does not work the first time, try again; it may need time to download the models first.
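
The file name in the error message earlier in the thread ('I:\IS2013.mp4.srt') suggests that the script looks for a file named after the source with ".srt" appended, in the same folder. A small sketch of that naming convention, inferred only from the error message; the helper name is hypothetical and this is not the script's actual code:

```python
from pathlib import PureWindowsPath


def expected_srt_path(source):
    """Return the srt path the script appears to look for: <source>.srt."""
    src = PureWindowsPath(source)
    # Append ".srt" to the full file name, keeping the original extension.
    return src.with_name(src.name + ".srt")


print(expected_srt_path(r"I:\IS2013.mp4"))         # I:\IS2013.mp4.srt
print(expected_srt_path(r"C:\Videos\IS2013.mp4"))  # C:\Videos\IS2013.mp4.srt
```

If no file with that name appears next to the source after a run, Whisper most likely did not produce any output, which points to the installation rather than to the Vegas script.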

Last changed by bitman on 2/24/2023, 6:23 AM, changed a total of 1 times.

Dave-Wallin-Eddy wrote on 2/24/2023, 10:26 PM

Yes, after I posted this I did move the file to the C drive. But the exact same error occurred.

 


bitman wrote on 2/25/2023, 7:42 AM

@Dave-Wallin-Eddy Maybe a silly question on my part, but did you install everything else that is needed for Whisper to work? The Vegas Whisper script is only the "hook" in Vegas that provides input for Whisper (and, if needed, inserts the subtitles in Vegas).

You need to install a lot of extra software, such as FFmpeg, Python, Git and Whisper itself (via Git). It is all explained in the document (see the link at the beginning of this thread).

Even if you have installed all the executables, do not forget to add them to the PATH environment variable, so the system can find them when they are called from the folder that holds your audio.
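
A quick way to check the PATH setup is to ask Python which of the tools it can find. A minimal sketch using the standard library; the executable names ("ffmpeg", "python", "git", "whisper") are assumptions about how the tools are named on your system.

```python
import shutil


def missing_tools(tools=("ffmpeg", "python", "git", "whisper"),
                  which=shutil.which):
    """Return the names of the tools that cannot be found on the PATH."""
    return [name for name in tools if which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools were found on PATH.")
```

Anything reported as not found either is not installed or its folder still has to be added to the PATH environment variable.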

Former user wrote on 3/3/2023, 10:58 PM

@bitman would you know if whisper can separate people, even if currently not implemented but the data is there?

eg

Bob: What a nice day!

John: It sure is Bob!

Claire: What a day to be alive!

It is the only reason I use Premiere for captions and transcripts involving more than 1 person.

@RogerS I tried the version you're using. I found it to be very fast, but it gave the most errors. How have you found it, and what model are you using now?

RogerS wrote on 3/3/2023, 11:23 PM

Hi @Former user, I don't know about separating people. I now have 3 iterations of Whisper on my system in SubtitleEdit (OpenAI standalone, Const-me and CPP). You could ask on any of their GitHub pages.

I found the Const-me one useful for English with the medium or large model, and quick enough even on my laptop GPU. I recently did a 10-minute video whose subtitles I had been putting off for years, and the result was close to perfect right in SubtitleEdit.

For Japanese it messes up and repeats lines too much. Others reported the same on GitHub so I'm hopeful there's an update.

At the moment all these implementations seem to be in flux so I'm hopeful there will be bugfixes forthcoming.

Last changed by RogerS on 3/4/2023, 3:33 AM, changed a total of 1 times.

Former user wrote on 3/3/2023, 11:40 PM

@RogerS I tried it with translation a number of times and I get the same: it stops translating and repeats the same line. Glad it's a known problem that will be addressed soon. That can also occur with the Resolve version, just not as frequently.

RogerS wrote on 3/4/2023, 3:33 AM

I don't know if it will be addressed soon. Const-me doesn't seem to be in active development, and this bug was likely inherited from CPP, which will hopefully address it. I downloaded the third option, though I haven't really tested it.

Subtitle Edit is ready to go for these implementations so whenever we get a fixed or functional new one it can be added.

bitman wrote on 3/4/2023, 5:13 AM

@bitman would you know if whisper can separate people, even if currently not implemented but the data is there?

@Former user Not that I recall. I do know that you can separate the voices of different people in iZotope RX 10 Advanced. Maybe run it through RX 10 first, separate out the person you need, and use Whisper next...

section at 8:23 (text navigation and multi speaker detection):

Last changed by bitman on 3/4/2023, 5:18 AM, changed a total of 3 times.
