The NaturallySpeaking Vocabulary Builder accepts only a few word processing file formats. Many people have older documents in other formats that would be valuable input to the Vocabulary Builder. If you have only a few such documents, most word processors can convert them to MS-DOS text format.
It may be desirable to combine many small documents into a single document for building a vocabulary. One way to do this is at an MS-DOS prompt, using a command such as:
copy *.txt largedoc.tex
This copies all your ".txt" files at once into the file largedoc.tex. The output file is given a different extension so that it does not itself match *.txt while the copy is running. Afterward, rename largedoc.tex to largedoc.txt so Vocabulary Builder can process it. If you have a very large amount of text, break it into smaller chunks by running "copy a*.txt alarge.tex", then "copy b*.txt blarge.tex", and so on.
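For readers who prefer a script to the MS-DOS prompt, the same combining step can be sketched in Python. This is only an illustration, not a tool mentioned above: the function name combine_text_files and its parameters are hypothetical. It works on raw bytes so the original character encoding is left untouched, and it skips the output file if that file happens to match the pattern.

```python
from pathlib import Path


def combine_text_files(src_dir, out_path, pattern="*.txt"):
    """Concatenate all files in src_dir matching pattern into out_path.

    Hypothetical helper for illustration; returns the names of the
    files that were combined, in the order they were written.
    """
    src_dir, out_path = Path(src_dir), Path(out_path)
    # Exclude the output file itself in case it lives in src_dir
    # and matches the pattern (the reason the DOS example uses .tex).
    sources = sorted(
        p for p in src_dir.glob(pattern)
        if p.is_file() and p.resolve() != out_path.resolve()
    )
    with out_path.open("wb") as out:
        for p in sources:
            data = p.read_bytes()
            out.write(data)
            # Keep documents from running together on one line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return [p.name for p in sources]
```

To break a large collection into chunks as described above, the function could be called once per pattern, for example with pattern="a*.txt" and an output of alarge.txt, then pattern="b*.txt" and blarge.txt.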
We have dealt with many different word processing formats while getting people started with NaturallySpeaking. We have found Conversions Plus by DataViz to be an outstanding tool for this task, as it converts entire directories of documents at a time from many different formats to any other format. It converts roughly 10 short reports per second. The program costs about $100, which may be hard to justify for an individual who needs to convert files only once for their own use. As an alternative, for approximately $1/MB (less for higher volumes) you can e-mail us zipped files and we will return the documents converted to text. E-mail us for details.
Many document sets have standard headers and footers. We have programs available that strip this information (e.g., delete the first 8 lines from each file) so that it is not analyzed by the Vocabulary Builder. This is particularly worthwhile when the documents are relatively short and the header/footer information makes up more than 5% of the text. The same techniques can help delete patient/client names from these text files.
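The kind of stripping described above can be sketched in a few lines of Python. This is not the program referred to in the text, only a minimal illustration under stated assumptions: the function names strip_header_footer and remove_names are hypothetical, the header and footer are assumed to be a fixed number of lines, and the names to delete are assumed to be supplied as a list.

```python
import re


def strip_header_footer(text, header_lines=8, footer_lines=0):
    """Drop a fixed number of lines from the start and end of text."""
    lines = text.splitlines(keepends=True)
    # Clamp at zero so a short document yields an empty result
    # instead of a wrong slice.
    end = max(len(lines) - footer_lines, 0)
    return "".join(lines[header_lines:end])


def remove_names(text, names):
    """Delete each listed name wherever it appears as whole words."""
    if not names:
        return text
    # Longest names first so "John Smith" is matched before "John".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub("", text)
```

Note that remove_names leaves the surrounding spaces in place, which should not matter for vocabulary analysis. A simple list of names will also miss misspellings and variants, so the result is worth spot-checking before the files are shared.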
Once you have larger files containing extensive samples of your vocabulary, you can edit them to better reflect your current vocabulary needs. For instance, “2004” may be underrepresented in your past documents, so you might replace all instances of “2002” with “2004”.
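A replacement like this can be done in any text editor, but for large files a short script may be more convenient. The following Python sketch is an illustration only; the function name replace_token is hypothetical. It replaces the old text only where it stands alone, so that a longer number that happens to contain "2002" is left unchanged.

```python
import re


def replace_token(text, old, new):
    """Replace old with new only where old is not part of a longer word or number.

    Returns the edited text and the number of replacements made.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(old) + r"(?!\w)")
    # A function replacement inserts new literally, with no
    # backslash or group-reference processing.
    return pattern.subn(lambda match: new, text)
```

For example, replace_token(text, "2002", "2004") changes "in 2002," to "in 2004," but leaves a record number such as 120025 alone.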