Place the text event on a track above the video and slide it to approximately the right location. You can zoom in and use the scrubber tool to find the appropriate starting point. Click there to set the cursor. You can now slide the text event to that starting position and it will snap to the cursor. Drag the end of the text event left to shorten it or right to lengthen it as necessary.
Sounds like an interesting task. If it were mine, I think I would use markers to define a duration first (i.e. listen to the tune, and enclose a desired phrase on the timeline with markers, "my Bonnie lies over the ocean" for example). Then insert a video track into the timeline.
Then, with video preview enabled, right-click the blank video track at a point that is smack dab in the middle of your markers, select the appropriate text generator plugin (I forget exactly what it's called), type in "my Bonn . . .", and close the plugin. Position the newly created text event in the middle of your markers if it's not there already. Right-click that text event and select Edit Generated Media, then use the nifty position tool to fine-tune where the text appears on the screen (you can watch the position change in the video preview if your timeline cursor is positioned over your text event). Make certain your placement and text size allow your phrase to fit safely within the boundaries of the viewing screen.
Then, just to be safe, close the text generator, and play that section of the timeline to make certain the position is where you want it so that the text appears/disappears at points appropriate to someone dependent on the text to sing along (you'll want the text to come up a bit ahead of the tune, I'd guess).
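If it helps to work the timing out on paper first, here is a rough sketch in Python of the arithmetic behind "come up a bit ahead of the tune". It is not tied to the editor in any way. The function name, the half-second lead-in, and the quarter-second hold are all my own assumptions, so adjust them to taste.

```python
def text_event_span(marker_in, marker_out, lead_in=0.5, hold=0.25):
    """Return (start, end) in seconds for a text event around a sung phrase.

    marker_in / marker_out: marker times enclosing the phrase (seconds).
    lead_in: how early the text appears before the phrase (assumed value).
    hold: how long the text lingers after the phrase ends (assumed value).
    """
    if marker_out <= marker_in:
        raise ValueError("marker_out must come after marker_in")
    # Never start before the beginning of the timeline.
    start = max(0.0, marker_in - lead_in)
    end = marker_out + hold
    return start, end


# Example: a phrase marked from 10 s to 14 s.
start, end = text_event_span(10.0, 14.0)
print(f"text event: {start:.2f}s to {end:.2f}s")
```

You would still place and trim the event by hand; the numbers just tell you roughly where the edges should land relative to your markers.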
To add a professional touch, you'll probably want to add fades at the beginning and end of each text event.
Unless you are subtitling Wagner's Ring, or creating a bouncing ball, I bet you'll be an expert at this by the third phrase.
As usual, there are plenty of ways to get the job done.
I might also try copying the text event, changing the color of the characters, and inserting a wipe transition between them. The wipe starts when the first word is sung, and ends when the last word is sung.
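To picture what that wipe is doing, here is a small Python sketch of the idea, assuming the wipe moves at a constant rate between the first and last sung word. Real singing is rarely that even, so treat it as an approximation. The function name and the example times are made up for illustration.

```python
def wipe_progress(t, first_word, last_word):
    """Fraction of the phrase (0.0 to 1.0) recolored at time t.

    Assumes a linear wipe that starts when the first word is sung
    and finishes when the last word is sung (times in seconds).
    """
    if last_word <= first_word:
        raise ValueError("last_word must come after first_word")
    fraction = (t - first_word) / (last_word - first_word)
    # Clamp so the wipe is empty before the phrase and full after it.
    return min(1.0, max(0.0, fraction))


# Example: phrase sung from 10 s to 14 s, sampled halfway through.
print(wipe_progress(12.0, 10.0, 14.0))
```

If the words in a phrase are sung unevenly, you could split the phrase into shorter events and give each its own wipe.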