Cleaning up a Word document so you can convert it to HTML or an eBook takes several steps.

 

  1. First, get rid of every piece of Word-specific formatting. No program other than Word understands it.
  2. Pull out any excessive, micro-managing formatting code. When I read an eBook, I like to change the font, the font size, and the page margins. I can’t do that if the code pins me to one format. When my Grandma looks at a web page, she needs to bump the font size up until everything is basically a header. She can’t do that if the page’s HTML specifies fixed sizes. Any micro-managing is excessive, so don’t do it.
  3. Make sure the HTML doesn’t have extra tags, unclosed tags, closing tags with no matching opening tags, or other ugly stuff.
  4. Word likes to use “smart quotes.” Replace them with straight quotes. Not all programs handle the curly ones.
  5. Word also likes to use a curly apostrophe. Replace those with straight apostrophes.
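Items 4 and 5 are easy to script once the text is out of Word. Here is a minimal Python sketch, assuming the text is already in a string. The names `QUOTE_MAP` and `straighten_quotes` are made up for this example, and it only handles the four most common curly characters.

```python
# Map the four common curly characters to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2018": "'",  # left curly single quote
    "\u2019": "'",  # right curly single quote, also the curly apostrophe
})


def straighten_quotes(text):
    """Return text with curly quotes and apostrophes made straight."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    print(straighten_quotes("\u201cIt\u2019s fine,\u201d she said."))
```

Run it on the text after you have removed the Word formatting, not before, so you don’t have to repeat it.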

You can “clean up” your Word document in many different ways, but only a few methods are useful if you want to convert the document to a web page, an eBook, or basically anything other than a Word document or a PDF.

How not to clean up your Word document: Save the document as “Web Page, Filtered (html)”

[Image: save as Web Page, Filtered]

Word has a save option called Web Page, Filtered. This saving option removes a bunch of Word-specific document formatting that only Word understands.

You can use this save feature only in these circumstances.

  • You don’t care what your resulting product looks like.
  • The resultant product is a “quick and dirty” assignment, and everyone is okay with that.

Why can’t you use this for anything more than emergencies?

The Web Page, Filtered (html) save option removes ONLY the Word-specific formatting from the file. That covers just No. 1 on the list of five things that have to happen before the code is clean and ready.

Check out all the stuff that’s left in a document after using this save option.

[Image: over-controlling code]

How not to clean up your Word document: Visually go over the document and make sure it looks the same.

Visually inspecting the document isn’t an option. Why? Look at this graphic. How many inconsistencies can you catch in it?

[Image: formatting inconsistencies]

How many did you find? Did you get them all? How can you be sure?

Correct ways to clean up a Word document

To correctly and completely clean up a Word document, you can do one of two things.

  1. Save it out as a text file, or copy the whole thing into Notepad.
  2. Select all of the text and clear its styles. Go to the Home tab, click the little pull-out box at the bottom right of the Styles group, select all your text, and then hit Clear All.
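If you have a lot of files, the first option can be scripted. A .docx file is a zip archive with the main text in `word/document.xml`, so the Python sketch below pulls out the paragraph text and drops everything else. The function name `docx_to_plain_text` is made up for this example. It ignores tabs, line breaks inside a paragraph, headers, and footnotes, and it assumes a simple document, so treat it as a starting point, not a replacement for Save As.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by Word's main document XML.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_plain_text(docx_file):
    """Return the text of a .docx file, one paragraph per line.

    docx_file may be a path or an open binary file object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs; bold, fonts, and sizes are simply not copied.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Write the returned string to a .txt file and carry on from there in a text editor.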

[Image: clear styles]

Once you’ve cleaned up your file

Once you’ve cleaned up your file, try to follow these rules. Download this checklist for cleaning up Word files.

DO NOT
  • Do not put the file back in Word or another word processor. You might accidentally add Word stuff.
  • Do not change the font: not the typeface, not the size, not the color.
  • Do not use tabs.
  • Do not use curly apostrophes or curly quotes.

DO
  • Edit the file in Notepad, Dreamweaver, or another text editor.
  • Use the heading style tags for Heading 1, Heading 2, …
  • Go ahead and hit Ctrl+I for italics, Ctrl+U for underlines, and Ctrl+B for bolding a couple of words in a paragraph.
  • Use ordered and unordered lists. (Unordered lists are bullets. Ordered lists have numbers or letters in them.)
  • Check your HTML and make sure it looks very boring. Opening paragraph tags should just be <p>. If the paragraph tag carries modifying information, it’s over-controlling. The same is true of the heading tags.
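The “boring HTML” check, along with the check for unclosed and unmatched tags, can be roughed out in Python with the standard library’s `html.parser`. This is only a sketch. The names `BoringHtmlChecker` and `check_html` are made up for this example, and it is stricter than HTML itself, because it also complains about closing tags such as `</p>` that browsers let you leave out.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Tags that should carry no attributes at all.
PLAIN_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class BoringHtmlChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags still waiting to be closed
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag in PLAIN_TAGS and attrs:
            names = ", ".join(name for name, _ in attrs)
            self.problems.append(f"<{tag}> has attributes: {names}")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"</{tag}> has no matching opening tag")
            return
        # Anything opened after the matching tag was left unclosed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> was never closed")


def check_html(html):
    """Return a list of problems found in an HTML string."""
    checker = BoringHtmlChecker()
    checker.feed(html)
    checker.close()
    for tag in reversed(checker.open_tags):
        checker.problems.append(f"<{tag}> was never closed")
    return checker.problems
```

An empty list means the markup passed. For example, `check_html('<p style="font-size:12pt">Hi</p>')` reports that `<p>` has a `style` attribute.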