PDF to EPUB conversion tools: 30% average

Please note that this report dates from a few years ago, so the results might be different at this moment. We keep this page alive to create awareness for possible issues.

 

Discrepancies also found among EPUB validation tools  

Publishers wanting to convert their library of books to ebooks may run into problems: in a test run by the VIGC, tools for converting PDFs to EPUB files scored an average of 30%, with the lowest-performing tool scoring just 10%. Even more significant, the four EPUB validation tools we used produced different results. The VIGC will continue its search for good tools, to support publishers and printers.

There’s no doubt that the popularity of ebooks is soaring. This trend leaves publishers facing a big challenge – they have to convert their whole back catalogue to make it available as ebooks. An obvious approach is to take the print-ready PDF files and convert them to EPUB files, the standard file format for ebooks. For printers, this presents a potential new service they can offer their customers. On the internet you can find a lot of tools for converting PDFs to EPUB files. Unfortunately, it’s not that straightforward.

Conversion tools struggle to make the grade

The VIGC assessed 13 tools. The test began with a perfect PDF/X-4 file, which each tool converted to an EPUB file. The next step was to validate each EPUB file with four different validation tools, to check whether it conformed to the EPUB specifications. Then, all files were checked visually in five different EPUB viewers. In some cases there was a final conversion, e.g. from EPUB to Amazon Kindle or to Apple iBooks.
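To give an idea of what 'conforming to the EPUB specifications' involves at the most basic level, here is a minimal sketch in Python. It checks only two container rules: the archive's first entry must be an uncompressed file named mimetype containing application/epub+zip, and META-INF/container.xml must be present. The function name basic_epub_problems is our own invention for illustration, this is not one of the validation tools used in the test, and a real validator checks far more than this.

```python
import zipfile


def basic_epub_problems(path_or_file):
    """Return a list of container-level problems (empty if none were found).

    Illustrative only: covers two packaging rules, not full EPUB validation.
    """
    problems = []
    with zipfile.ZipFile(path_or_file) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            first = entries[0]
            # The mimetype entry must be stored without compression.
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if archive.read(first) != b"application/epub+zip":
                problems.append("'mimetype' content is not application/epub+zip")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```

A file that passes these two checks can still fail a full validation run, so the sketch is only useful as a quick first filter.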

We used a very challenging print-ready test file that contained text and images, and we added all kinds of typographic tricks. We ended up with a book of nearly 30 pages. In total, we tested 65 different elements, ranging from simple italics and OpenType features such as ligatures to mathematical formulas.

Ligatures prove a challenge

An important issue highlighted by our test is the difficulty in converting ligatures. A ligature is a ‘combined glyph’ – two or sometimes three letters that have been joined together for aesthetic reasons. A good example is the combination of the letters ‘f’ and ‘i’, e.g. in the word ‘profiles’. In some EPUB files we found ‘profles’ instead of ‘profiles’ – the ‘i’ had been dropped. Some conversion tools didn’t recognize the ligature and made an incorrect conversion. The automatic use of ligatures is the default in Adobe InDesign, i.e. InDesign will automatically replace certain letter combinations with the ligature, so you can imagine how often ligatures occur in a document, and how often the conversion will go wrong with some tools. Finding these errors means scanning the converted files manually for missing letters, which simply isn’t feasible.
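Part of that manual check could in principle be automated. The sketch below, in Python, assumes you already have the plain text of the source document and of the converted EPUB as two strings (extracting that text is not shown). It looks for words in the converted text that match a source word with a ligature damaged, such as ‘profles’ for ‘profiles’. The function names and the list of letter combinations are our own assumptions for illustration, and the heuristic only catches the two damage patterns it is told about.

```python
import re
import unicodedata

# Letter combinations commonly set as ligatures (an assumed, non-exhaustive list).
LIGATURE_SEQUENCES = ("ffi", "ffl", "ff", "fi", "fl")


def damaged_variants(word):
    """Spellings of `word` with one ligature reduced to its first letter or lost."""
    variants = set()
    for lig in LIGATURE_SEQUENCES:
        start = word.find(lig)
        while start != -1:
            end = start + len(lig)
            variants.add(word[:start] + lig[0] + word[end:])  # e.g. "fi" -> "f"
            variants.add(word[:start] + word[end:])           # ligature dropped entirely
            start = word.find(lig, start + 1)
    variants.discard(word)
    return variants


def words_of(text):
    # NFKC turns ligature code points such as U+FB01 back into plain letters.
    normalized = unicodedata.normalize("NFKC", text).lower()
    return set(re.findall(r"[a-z]+", normalized))


def find_ligature_damage(source_text, converted_text):
    """Map each source word to the suspicious spellings found in the converted text."""
    source_words = words_of(source_text)
    converted_words = words_of(converted_text)
    report = {}
    for word in sorted(source_words):
        # Ignore variants that are legitimate words elsewhere in the source.
        hits = sorted((damaged_variants(word) & converted_words) - source_words)
        if hits:
            report[word] = hits
    return report
```

For example, calling find_ligature_damage("The profiles in the office", "The profles in the office") reports ‘profiles’ with the suspicious spelling ‘profles’. The word pattern only covers unaccented Latin letters, so text in other languages would need a broader pattern.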

Validation misses the mark

Validation tools pose an even bigger problem. Validation was a standard practice in our test, a simple way to partially check the quality of the conversion. We used four different validation tools and got different results: an EPUB file that passed one tool could fail another. The differences were inconsistent too – it wasn’t a case of one tool always disagreeing with the other three. Based on our results, publishers face a big challenge in ensuring that EPUB files – and subsequently the ebooks themselves – have been converted accurately.
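If you run several validation tools yourself, it helps to list the files on which they disagree. The following Python sketch assumes you have recorded each tool's verdict per file as True (valid) or False (invalid). The tool names and file names in the usage example are placeholders, not the tools or results from our test.

```python
def validator_disagreements(results):
    """Return the files on which the recorded verdicts differ.

    `results` maps a tool name to a dict of {file name: True/False}.
    The return value maps each disputed file to the verdict of every tool
    that checked it.
    """
    all_files = set()
    for verdicts in results.values():
        all_files.update(verdicts)

    disputed = {}
    for name in sorted(all_files):
        per_tool = {tool: verdicts[name]
                    for tool, verdicts in results.items()
                    if name in verdicts}
        # More than one distinct verdict means the tools disagree.
        if len(set(per_tool.values())) > 1:
            disputed[name] = per_tool
    return disputed


# Placeholder data, purely to show the shape of the input.
example = {
    "validator_a": {"book1.epub": True, "book2.epub": True},
    "validator_b": {"book1.epub": True, "book2.epub": False},
}
print(validator_disagreements(example))
```

Files that appear in the output deserve a closer look by hand, since at least one tool accepted them and at least one rejected them.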

