Word and PDF to ePub conversion with the click of a button—is it really that simple?
It’s everywhere: this notion that your manuscript can be automatically converted into an EPUB- or Kindle-ready file just by using a single piece of software. Can this really be true? Can one of these programs, many of which are downloadable for free, convert your Word or PDF file into something that you can sell on Amazon, Barnes & Noble, or Apple’s iBookstore?
Suppose you have either a Word or PDF version of your book, and you want to publish on the Amazon Kindle Store, the Barnes and Noble NOOK Book Store, the Apple iBookstore, the Google eBookstore, and any other online retail site that might sell eBooks. What you need is a clean file that meets all of the minimum standards of the required formats, but also looks like a professional book!
As to the first concern, Amazon accepts three file types for the Kindle platform: .prc, .mobi, and .azw. Amazon has specific publishing guidelines for the acceptable formatting of these files (see http://kindlegen.s3.amazonaws.com/AmazonKindlePublishingGuidelines.pdf). There is only one file format for EPUB retailers, which is, not surprisingly, .epub! The minimum standards for .epub files are promulgated by the International Digital Publishing Forum (IDPF). A handy validation tool for .epub files can be found here: http://threepress.org/document/epub-validate.
Let’s not forget about the second problem, though, which is creating a book that looks professional and takes advantage of the functionality of today’s eReaders. So can a single piece of software really make your eBook come out looking like your print book? After all, what’s the use of a one-size-fits-all “meatgrinder” conversion program, if the end result is something less than professional? And, if the output does need further tweaking, can the program really be called “automatic?”
We decided it was time to put six eBook conversion methods to the test. Please be aware, none of these methods has been touted by its own proprietors as being “automatic,” and we hope these companies save their angry letters for Congress and/or the producers of The Hangover Part II. But should you do some quick searches online for ebook or ePub converters, you will find a number of resources and articles, which suggest to authors that such tools will serve as a quick, easy, and often free way to make an ebook. We are examining these products in order to provide a much-needed warning to those authors about to take that first precarious step into the jungle of do-it-yourself electronic publishing: those who travel here should be wary when they hear the seductive whisperings of “shortcuts.”
*For our testing, we used Adobe Digital Editions (ADE) to review EPUB files on our PC-based system. More information on ADE, including a free download, can be found at http://www.adobe.com/products/digitaleditions/. We also reviewed the EPUB output on our Barnes & Noble Nook Color device. To test Kindle files, we used a Kindle device, as well as Kindle Previewer 1.61, which is a free download from Amazon.
Now, on to the testing!
Aspose.Words EXPRESS is free and relatively easy to acquire, once the user realizes that it is not actually the full version ($1,499!) of Aspose.Words for .NET.
It is worth noting that, contrary to some reports out there, this is NOT a plug-in for Microsoft Word; it is a standalone program. Also, Aspose.Words EXPRESS will NOT convert your books from PDF files. Finally, it will also not render a Kindle-ready file. This software is EPUB-only.
At first glance, the interface is simple. Specify the name and location on your computer of a .doc or .docx file, click convert, and a message appears that an EPUB file was created. When we opened the resulting EPUB file in Adobe Digital Editions, it was immediately apparent that some very important elements were missing. The title and author name were not filled in, and neither were the chapter names. There was no table of contents, either device-navigated or inline. We then ran an IDPF validation test on the file. It didn’t pass because all of the text was crammed into a single .HTML file, which turned out to be bigger than IDPF’s maximum file size.
For Aspose.Words EXPRESS, the magic word is pre-formatting. Pre-formatting is what you would need to do to your Microsoft Word file before it will be ready for conversion. And indeed, the Aspose.Words main screen has a menu item called Settings, which gives the user more control over the output. Reminiscent of Adobe InDesign’s EPUB export options, you can tell Aspose.Words to break chapters according to where the heading levels appear in your Word document. For instance, you could tell it to break before each Heading 1 or Heading 2, or both. This method would assume, of course, that the Word file had been pre-formatted so that the chapter headings were already marked in this way. As another pre-formatting measure, the book’s metadata (e.g., title, author, publisher) would need to be coded into the Word file. In Word 2010, this is accomplished with what are called the “backstage” features of the File Menu. All told, this can leave one needing to study up on Microsoft Word in order to take advantage of the conversion program.
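For readers curious about what "breaking before each heading" amounts to, here is a minimal Python sketch of the idea. It is not how Aspose.Words does it internally (we have no view into that); the function name split_at_headings and the sample markup are our own assumptions, and a regular expression like this is only adequate for simple, well-formed markup.

```python
import re

def split_at_headings(html, levels=(1,)):
    """Split an HTML string into chunks, starting a new chunk before
    each heading of the given levels (hypothetical helper)."""
    # Build a character class such as "1" or "12" for <h1>, <h2>, ...
    level_class = "".join(str(n) for n in levels)
    # Zero-width lookahead: split *before* the heading so that each
    # heading stays at the start of its own chunk.
    pattern = re.compile(r"(?=<h[%s][\s>])" % level_class, re.IGNORECASE)
    chunks = [chunk.strip() for chunk in pattern.split(html)]
    # Drop empty pieces (for example, when the text starts with a heading).
    return [chunk for chunk in chunks if chunk]

sample = ("<p>Front matter</p><h1>One</h1><p>a</p>"
          "<h2>Sub</h2><p>b</p><h1>Two</h1><p>c</p>")
for part in split_at_headings(sample):
    print(part)
```

Anything that comes before the first heading is kept as its own chunk, which is roughly where front matter would land. Passing levels=(1, 2) would also break before each Heading 2.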
Lest we appear to be dwelling only on the negative, there were a few things this converter did well straight out of the gate. Our images looked really good. (The default resolution is 92 dpi.) In all but one case, the software preserved the position, relative size, and clarity of the images. It even floated the images, allowing text to flow to the left or right! Once we pre-formatted the Table of Contents page as a Heading, it rendered in the EPUB file with hyperlinks (it kept the table’s page numbers, however, which are irrelevant to an ebook).
Book Glutton is an organization that offers several types of services, from web design to EPUB and all points in between. Like Feedbooks (reviewed below), these guys will help you distribute your ebook online. Unlike Feedbooks, however, you can actually sell your book through Book Glutton. Their HTML to EPUB Converter runs online through their website and costs $5 per conversion. As the name suggests, your book must be in HTML form before you can begin. Sounds simple enough, right? Ah, but then you read further and find that your book’s chapters must be contained in separate HTML files, formatted according to XHTML 1.1 standards, and referenced by an index that is used by the converter to build a table of contents. Any images or external style data (i.e., CSS) must be stored in separate folders. Lastly, you need to package these files into a .zip file, which can then be uploaded to the conversion program. Whew! That’s a lot of work!
The upshot of this process is that you’re doing most of the formatting yourself. You are assembling the elements of an EPUB file, and paying Book Glutton five bucks to polish it up and make sure it will pass IDPF validation. This is not to disparage Book Glutton’s contribution; it is valuable to have professionals on your side, and $5 is inexpensive for this kind of service (consider it one less Venti Cinnamon Dolce Frappuccino at Starbucks). However, this conversion method cannot even begin to approach any known definition of automatic.
We tried saving our Word files as HTML files but kept getting an error message from Book Glutton. We came very close to giving up, but then remembered that it’s possible to import Word files straight into Adobe Dreamweaver. For those not familiar, this program is a powerful website design tool. An EPUB file is really like a mini-website in the way the files are organized, and sure enough, Dreamweaver knew exactly how to translate our Word files into proper HTML, the primary language of the web. (Unfortunately, if you don’t already have Dreamweaver, you would have to find another way or shell out $385 to buy the program.)
In the end, however, we were unable to reach any conclusions about Book Glutton. Unfortunately, there appeared to be a glitch in their website interface when it came time to order the final EPUB output file. It was hard to say for sure whether it was the fault of our file (which passed the “preflight” validation) or the website itself. We emailed the company for help, but as of this article’s release, we have yet to hear back from Book Glutton.
Of all the methods reviewed in this article, Calibre is by far the best established. Calibre has been the big kid on the DIY block for a long time, with near-constant updates to its software. (Its developer, Kovid Goyal, has literally issued three minor fixes to the program since yesterday, when drafting of this article began!) Calibre is a standalone piece of software and is free to download. One caveat must be issued early, however: Calibre does NOT convert from MS Word files.
So we downloaded Calibre and uploaded our PDF file to its “library” interface. Easy enough. The conversion took a long time, presumably because there were a large number of images. Unfortunately, the book didn’t pass validation; it seems Calibre had not added any alt= attributes to the images’ HTML tags. This is a required element for every image, according to IDPF standards. However, all we had to do was open the file in Sigil, an open-source EPUB editing program, and resave the file. After that simple step, poof! The converted book passed its validation with flying colors.
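To show what kind of fix is involved, here is a rough Python sketch that gives every image tag lacking an alt attribute an empty one. This is our own illustration, not a description of what Sigil does under the hood; the function name add_missing_alt is hypothetical, and the regular expressions assume reasonably tidy markup (an attribute such as data-alt, or the text "alt=" inside another attribute's value, would fool it).

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
HAS_ALT = re.compile(r"\balt\s*=", re.IGNORECASE)

def add_missing_alt(html, default_alt=""):
    """Return html with alt="..." added to any <img> tag lacking one."""
    def fix(match):
        tag = match.group(0)
        if HAS_ALT.search(tag):
            return tag  # already has an alt attribute; leave it alone
        # Insert right after "<img" so self-closing tags stay intact.
        return tag[:4] + ' alt="%s"' % default_alt + tag[4:]
    return IMG_TAG.sub(fix, html)

print(add_missing_alt('<p><img src="cover.png"/></p>'))
```

An empty alt text satisfies a validator but tells a reader nothing, so a real description of each image is the better long-term fix.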
On the plus side, Calibre rendered all the images clearly and proportionately. Unlike Aspose.Words EXPRESS, however, Calibre did not preserve the text-wrapping around images. In addition, our drop caps were not preserved—the first letter of each chapter was left out entirely. Also, strangely enough, certain centered elements (e.g., chapter heads) were rendered as left-aligned in the output.
So instead of pre-formatting, which most of the other programs require, Calibre may sometimes require cleanup after the fact—a bit of post-formatting, if you will. (Yet another opportunity to mention Sigil. Check out this free program for cleaning up your already-converted EPUB file.)
In Calibre’s defense, its conversion platform offers a great many options and tweaks, and it is quite possible that some of the drawbacks mentioned here could be cured by adjusting them. However, as the thrust of this article is to examine these programs as “automatic conversion” tools, such trial-and-error experiments are beyond our present scope. Still, for those who have the time and patience to spare, the workflow of PDF-to-Calibre-to-Sigil can be a powerful and useful approach.
Based in France, Feedbooks bills itself as a “cloud publishing and distribution service.” In this context, cloud simply means “online clearinghouse.” Feedbooks stores thousands of eBooks on its servers and distributes them through its own websites and those of its affiliates. In short, Feedbooks’ goal is to be an all-in-one solution, not just for converting your print book to EPUB (or Kindle), but also for publishing and distributing your book. But here’s the catch: all of the self-published books offered on the site are FREE. You can’t get paid for selling your book on Feedbooks, period.
Okay, but what about the conversion? First, you need to register with the site. This part is easy: you are prompted for an email address, user name, and password. Select “Publish” from the dropdown menu, and you are taken to an online editor. Here, you must upload your book piece-by-piece into a series of text editor boxes. That’s right, folks—they won’t accept your polished PDF or Word file as-is. You have to re-build your book using copy and paste. You tell the interface where each chapter and section break occurs, and then populate those sections with plain text. The editor box offers some word processor-like features, like font styles, drop caps, and bullet lists. It will also let you edit the HTML source of your text.
Unfortunately, we did not complete our test conversion with Feedbooks. Either the website’s editor program was being too fussy (lots of load-time waiting), or all those cups of coffee were starting to wear off, and we just got too cranky to continue. Since the company does not release the final converted file, and the process is leading you to distribute your book for free, it did not seem worth continuing the lengthy process of transferring and formatting the book data using their program. We will simply say, if you have the time to spare, and you are eager to distribute a free book, go to Feedbooks and try it out. At the very least, it’s worth browsing their vast catalogue of downloadable eBooks.
Like Feedbooks, eBookBurn offers a conversion interface that is embedded in their website. Unlike Feedbooks, the conversion is not free ($19 per conversion), but you get to keep your files—one EPUB file and one Kindle-ready file. Through reading the FAQ page, we learned that there are no do-overs in this process; you pay $19 per conversion, no matter how your book turned out.
So with our new mantra, “Better get it right the first time” repeating in our heads, we began uploading our book. Like Feedbooks, each chapter has to be loaded in separately. We were relieved to find, however, that with eBookBurn we could upload Word files directly from our computer if we chose to, bypassing that whole cut-and-paste scenario. (eBookBurn does NOT allow PDF files in its uploader. Presumably, text from a PDF file would have to be copied and pasted into the text editor pane.) We broke up our Word file into separate chapters and saved them under unique and creative titles, like “Chapter 1,” “Chapter 2,” and so on. The instructions recommended that we remove our images and upload them separately, using the “Insert Images” interface. This process involves posting your images to a website—your website—each under its own unique URL, and then providing eBookBurn with the web addresses where each image resides. This sounded especially taxing for a Monday, so we crossed our fingers and left the images where they lay: right there in our Word files. The remaining steps of the process breezed by quickly. We uploaded each chapter, clicked the “Convert” button, and it was time to pay our dues: nineteen bucks, our whole Chipotle budget for the week! Looks like we’ll be eating lunch out of the vending machine again…
At first glance, the output looked very clean. The images all appeared where they were supposed to, and at the proper proportions. Ha! Looks like we didn’t need that whole “Insert Images” thing after all….Or did we? (More on that later.) The fonts were all styled correctly, and were rendered at the proper sizes. The Device Table of Contents (the TOC that does not appear on the book’s main viewing panel but is accessed through the reading device’s control menu) came out looking good in both the EPUB and Kindle versions. The converter even created an in-book Table of Contents in the Kindle version—a required item, according to Amazon’s specifications. We are not sure why the converter placed the in-book TOC at the end of the book, but come to think of it, is there really any better reason for it to be at the beginning instead?
We decided to take a closer look at the output, using what the Supreme Court might call heightened scrutiny. First, the images: we noticed that the conversion did not retain the text-wrapping around the images as it appeared in our Word files (only Aspose.Words EXPRESS seems to do that). And when we ran our EPUB file through the validation test, it failed because some of the images had the same names in the HTML code (according to IDPF specs, items referenced by unique id tags cannot be repeated). This may be a rare problem; how many books have repeated images, after all? (One would imagine, though, that the software could be coded to assign non-repeating id tags, so that the multiple instances of the same image could have their own unique tags.)
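For the curious, a repeated id is easy to spot with a few lines of Python. The sketch below is our own and only looks at id attributes within a single chunk of markup; we are assuming the duplicates live in the HTML itself, as the validator message suggested. The names IdCollector and duplicate_ids are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collects the value of every id attribute it encounters."""
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)

def duplicate_ids(html):
    """Return a sorted list of id values that appear more than once."""
    collector = IdCollector()
    collector.feed(html)
    counts = Counter(collector.ids)
    return sorted(i for i, n in counts.items() if n > 1)

page = ('<p><img id="fig1" src="a.png"/></p>'
        '<p><img id="fig1" src="a.png"/></p>')
print(duplicate_ids(page))
```

Once the offenders are known, renaming the second and later occurrences (fig1-2, fig1-3, and so on) is the sort of thing a converter could do automatically.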
eBookBurn also converted em-dashes into hyphens! While this may seem like a small thing to the untrained eye, my boss assures me that in the publishing business this is considered a typographical sin of biblical proportion. Also, paragraphs were rendered in the EPUB version with an extra line break between them.
We decided to peer behind the curtain, so to speak, and look at the EPUB file’s code. Much to our surprise, right there in the HTML style tag were the words, “Created by AbiWord, a free, Open Source word processor.” In other words, eBookBurn has punted part of the work to this free program. This is not unheard of, however (see Jutoh, reviewed below). We ran a small sample conversion through our own free copy of AbiWord, and the HTML output did not contain the formatting errors mentioned above; the mystery of the em-dashes, paragraph spacing, and text-wrapping problems remains unsolved.
We are not sure whether some clever pre-formatting would have prevented these formatting errors. With eBookBurn’s pay-per-conversion system, it’s just too expensive to experiment with different strategies. Post-formatting would also seem to be a mixed bag: sure, the paragraph spacing could be fixed through CSS styling. However, the hyphen problem would have to be fixed line-by-line in the HTML code, because a quick find-and-replace maneuver would also swap out the hyphens you would otherwise want to keep for em-dashes.
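A partial shortcut is possible only if the converter left some clue behind. The Python sketch below is a heuristic of our own, not something eBookBurn offers: it assumes the former em-dashes survive either as a hyphen with a space on each side or as a double hyphen, and it restores only those. A closed-up single hyphen (word-word) looks exactly like an ordinary hyphenated compound, so those cases would still need a human eye. The function name restore_em_dashes is hypothetical.

```python
import re

# A hyphen (or two) with a space on each side, between two words.
SPACED_HYPHEN = re.compile(r"(?<=\w) -{1,2} (?=\w)")
# A closed-up double hyphen between two words.
DOUBLE_HYPHEN = re.compile(r"(?<=\w)--(?=\w)")

EM_DASH = "\u2014"

def restore_em_dashes(text):
    """Replace likely stand-ins for em-dashes; leave plain hyphens alone."""
    text = SPACED_HYPHEN.sub(EM_DASH, text)
    return DOUBLE_HYPHEN.sub(EM_DASH, text)

print(restore_em_dashes("a well-known author - or so they say--maybe"))
```

Note that "well-known" comes through untouched, which is the whole point: the pattern is deliberately narrow so that legitimate hyphens are not swept up.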
Luckily, Jutoh has a free trial version available. Otherwise, we were going to have to raid the break-room couch cushions for $39 worth of change…Unlike Feedbooks’ “cloud-based” application, Jutoh is a standalone program that downloads to, and runs from, your computer. Once we had our trial version installed, it was time to poke around. The interface is well organized, with plenty of features to adjust the output of the conversion.
Right away, we noticed that Jutoh does not accept Word or PDF files as input sources. Jutoh is apparently optimized for HTML input. This was a problem for us, given the parameters of our test. However, Word does give you the option to save your files as .rtf (Rich Text Format) or .odt (Open Document Text), and Jutoh does accept these file types. We tried converting our Word file both ways and got very different results. First of all, when importing an .rtf file, Jutoh flashes the message that it is using Calibre to convert the file, a nifty trick indeed! Our image-laden .rtf file did not convert at all. We received an error message and decided to try a simpler file that had no images. This file converted successfully, but with limitations that paralleled the other conversion programs: no drop caps, extra line breaks between paragraphs, no in-line table of contents, and every one of the separate chapters that it created was titled “Chap 1.”
Next, we tried uploading an .odt file into Jutoh. Once again, there were errors aplenty, and the file would not upload. The errors had to do with SVG (Scalable Vector Graphics) tags not rendering.
For its Kindle output, Jutoh exports its EPUB file to Kindlegen, Amazon’s free conversion program! (The conversion-of-a-conversion approach to design.)
Time for a third strategy: convert our Word file to HTML using Word’s “save as” feature. Then, import the HTML file into Jutoh. The third time was the charm. Once the file had been imported, we used the program to pre-format our book. Mostly this involved splitting the one big HTML file into separate chapters. This was easy to do with Jutoh’s interface, a What You See Is What You Get (WYSIWYG) editor that looks and feels a bit like a word processor.
We did have one complaint about Jutoh: it doesn’t seem to give you access to the underlying HTML code, as some other WYSIWYG editors do. This became a problem for us when we tried to adjust the typeface of our chapter headings. Somehow, Microsoft Word must have applied the same style tags to the headings as to the body text. So we started by selecting the chapter head text (e.g., “Chapter One”), and hit the bold button at the top of the editor screen. No problem there. However, when we tried to increase the size of the text, Jutoh seemed to want to enlarge all of the surrounding body text, too. We couldn’t find a way around this. If we could get into the HTML code and adjust the paragraph tags for the headings, it seems likely we could adjust the chapter headings independently.
All in all, Jutoh seems like a fairly good option. Some control over the underlying HTML code would be nice, for sure. However, this WYSIWYG-style program is still a much better way to go than an embedded, online program, which, although somewhat more “automatic,” gives you much less control of your output.
Reviewing this program is like writing an about-the-author blurb for an old friend. It’s hard to be objective, because we have been using Mobipocket Creator to format Kindle books from the beginning. Perhaps it’s best to start with the negative. Mobipocket Creator is for creating eBooks in the Mobipocket format (no rocket science there). Amazon adopted this format for its Kindle device. None of the other major players followed suit. Therefore, this program is for Kindle books only (and some smart phones and PDAs). It will not generate an EPUB file, and therefore cannot be used to format a book for the Barnes & Noble Nook or Apple’s iBookstore.
Time to put this old friend to the test. First, we ran a conversion using a PDF file as the source. We imported the file into the converter, added some metadata (title, author name, etc.), chose a cover image, and hit the “Build” button. Voila! It seemed so easy, we couldn’t remember why we don’t use the PDF method more often; the output looked very clean and true-to-form. The images were rendered well, and—with a few exceptions—they flowed well with the text. (As a side note, Kindle does not yet support text-wrapping with images; probably the images that were wrapped in the original PDF were the sources of the resultant “glitchiness” in the output file.)
And then it hit us: what happened to the paragraph indents? Why are none of the headings centered? The answer: PDFs do not always play nice. It is hard to coax good text formatting out of PDF files. However, Mobipocket Creator does give you the HTML file as output; conceivably, one could always load that HTML file into a good text editor (e.g., Notepad++) and manually re-style all of the text formatting.
We decided to take a look at the HTML file ourselves to try and track down some of the glitches. First off, Mobipocket Creator formatted many of the PDF file’s paragraphs using the line-break tag (<br/>) instead of the paragraph tag (<p>). This would account for the missing indents. Also, there were no page breaks in the HTML file.
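As an illustration of the cleanup this implies, here is a small Python sketch that rewraps break-separated text in paragraph tags. It is our own rough approach, not a feature of Mobipocket Creator, and it assumes that every run of line-break tags marks a paragraph boundary, which will not be true for verse or addresses. The function name br_to_paragraphs is hypothetical.

```python
import re

# One or more <br>, <br/> or <br /> tags, with any whitespace around them.
BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*)+", re.IGNORECASE)

def br_to_paragraphs(fragment):
    """Wrap each piece of text between line-break tags in <p>...</p>."""
    pieces = [piece.strip() for piece in BR_RUN.split(fragment)]
    return "\n".join("<p>%s</p>" % piece for piece in pieces if piece)

print(br_to_paragraphs("First paragraph.<br/>Second paragraph.<br /><br>Third."))
```

With real paragraph tags in place, indents and spacing can then be controlled from the stylesheet rather than line by line.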
We decided to test drive Mobipocket Creator using a Word file. Maybe the output would be a little better. Indeed, the output of the Word file conversion was worlds apart from the PDF version. It had good-looking chapter headings, properly indented paragraphs, well-placed images…the whole package seemed to be there.
When we popped the hood to look at the HTML file, we found a bit of a mess. When converting an ordinary file to HTML, Microsoft Word tends to dump in a ton of unnecessary code. And this is exactly what seems to have happened here: Mobipocket took the HTML output straight from Word’s own internal conversion, extraneous tags and all. For most people trying to convert a book on their own, this will not be a big deal. Who really cares what the code looks like, as long as the book looks good, right? This is mostly true.
But messy code is harder to edit, and with either method, Word or PDF, you have to do some cleanup work yourself if your file is going to meet Amazon’s minimum standards. Neither method seems to generate a Device Table of Contents, which is required by Amazon specifications. Amazon also insists that your in-book TOC not have page numbers. These vestigial remnants of your print version will have to be manually removed from the HTML file.
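Stripping those page numbers need not be done entirely by hand. The Python sketch below is our own assumption about what the entries look like once reduced to plain text: a title, then dot leaders, a tab, or a run of spaces, then the number. The HTML that Word actually produces for a table of contents is messier, so treat this as a starting point; the function name strip_page_number is hypothetical.

```python
import re

# A trailing page number, preceded by dot leaders, a tab, or 2+ spaces.
# A single space is not enough, so "Chapter 12" keeps its number.
TRAILING_PAGE = re.compile(
    r"(?:\s*[.\u2026]{2,}\s*|\t+|\s{2,})\d+\s*$"
)

def strip_page_number(entry):
    """Remove a trailing print page number from one TOC entry."""
    return TRAILING_PAGE.sub("", entry)

for line in ["Chapter One ........ 17", "Chapter 2\t31", "Chapter 12"]:
    print(strip_page_number(line))
```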
If you want to take your print book to the eBook marketplace, at this time, there is probably no automatic solution. For a very simple, text-only book, you may be able to come close using one of these products, or even some true upload-and-go converters, such as the one directly accessible when you upload your file to Amazon’s Kindle Direct Publishing interface. However, if your book has images, tables, or a multi-level Table of Contents, you are much better served by using a two-stage process: pre-formatting followed by post-conversion cleanup. While a couple of the methods mentioned in this examination may get you partway there, we have found that the most reliable solution for EPUB is to use two stages: Adobe InDesign software for pre-formatting and conversion to EPUB, followed by the Sigil program for later adjustments. Sigil is a wonderful tool (and free!), which allows you to edit an EPUB file without “unpacking” its contents first. InDesign, however, is most certainly not free. The most recent version (InDesign CS5.5) retails for $649. Those adventurous enough to try this method, and go it alone through the EPUB jungle, should take along a trustworthy travel guide: a copy of Elizabeth Castro’s EPUB Straight to the Point, Peachpit Press (2011).
Kindle files are a bit simpler, and therefore easier to create. Mobipocket Creator does a fine job of this, especially when converting straight from a Word file. The final output will invariably need some tweaking, however; you will need to crack open the HTML file that Mobipocket Creator generates in the conversion. There are plenty of online resources available to help you through this task, although some knowledge of the HTML language is necessary.
If wading through all of this technical jargon has gotten you feeling overwhelmed, don’t worry; you’re not alone! For many people, the best and most sane option is to hire a professional designer. There’s just no way around it—books with complex content are difficult to format as eBooks. In addition, we feel it’s important to note that there can be a big difference between ebook conversion and ebook design. (That’s a matter for a separate article.) You may be better off leaving this work to the pros, who can help you take your book to market looking as close to your original print version as possible.