Apparently, 16 is the default thread count for 64-bit. I reduced my thread count for 64-bit work from 16 to 4, to match the 4 cores that my processor has.
Two minutes fifty-five seconds on the timeline rendered to the MainConcept Internet HD 720p codec in 88 seconds with 16 threads and 87 seconds with 4 threads. The difference is within experimental error. I was not having any failures in either case.
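For anyone who wants to put a number on "within experimental error", here is a rough Python sketch using only the figures above. The variable and function names are made up for illustration. Whether a gap this small is really noise depends on how much the time varies from run to run, which I did not measure, so repeat renders would be needed to be sure.

```python
# Figures taken from the test above; names are illustrative only.
timeline_s = 2 * 60 + 55            # 2:55 of timeline, in seconds
render_times_s = {16: 88, 4: 87}    # thread count -> render time in seconds


def relative_difference(a, b):
    """Gap between two timings as a fraction of the slower one."""
    return abs(a - b) / max(a, b)


diff = relative_difference(render_times_s[16], render_times_s[4])

# Seconds of timeline rendered per second of wall-clock time.
speed = {threads: timeline_s / t for threads, t in render_times_s.items()}

print(f"difference between runs: {diff:.1%}")
for threads, ratio in speed.items():
    print(f"{threads} threads: {ratio:.2f}x realtime")
```

That works out to a gap of about 1% between the two runs, with both rendering at roughly twice realtime.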
And don't forget to try even lower thread counts: if you use GPU acceleration for rendering, 2-3 threads may make things even faster. Lots of testing has been done on this in the past, and experimentation is the key.