Here’s a short guide on creating decent ebooks from scans using Adobe Acrobat. This will not be of interest to 98% of you, but I want to record it somewhere for those of you who may do this in the future. It is written by Iliyan Georgiev, who made the recent PoDIS ebook. Comments are welcome, as usual.
The one piece of software you’ll need that can’t be downloaded for free is Adobe Acrobat, though even this application has a 30-day free trial.
1. Scan the pages of the book using a scanner (a digital camera is a good alternative).
2. Crop the scanned images (and split the pages, if you scanned two pages at once). It’s better for an ebook to have smaller page margins. Also, cropping removes black areas and other artifacts resulting from scanning. An excellent (JPEG-only) batch cropping tool for Windows is JPEGCrops. It has some disadvantages, however, so in practice it’s best to use JPEGCrops to estimate approximate cropping parameters (width, height, x-offset, y-offset) and XnView‘s batch processing mode for the actual cropping. Both applications are free and have portable versions.
3. Assemble all images into a PDF file. Adobe Acrobat has an option to combine multiple files into a single PDF. Use the highest quality settings for the creation.
4. (OPTIONAL) Rearrange/merge/delete pages. Acrobat has excellent tools to achieve these. This can be useful for books that are published in two volumes or for extending the book with additional information, such as errata listings, images, high quality cover pages, etc.
5. Manage blank pages. It might be tempting to delete blank pages inside the book. Such pages are always intentionally left blank by the publishers, as they are important for the printing order. This is particularly important for the first few pages, as well as for the chapters. Many books are created in such a way that all chapters start on an even/odd page, and the large majority have the inner pages typeset for being printed on a specific side (left/right). If you want to optimize the page count anyway, keep in mind how the book would appear when printed out (also using “2 pages per sheet” printing).
6. Number the pages. This is an often-overlooked, but very useful, option. Apart from the default page numbering, the PDF format supports logical page numbering. This can be used to synchronize the PDF page numbers with the actual book page numbers. This is very easy to do in Acrobat and should always be done. To do this, select the necessary pages, right click on them and choose “Number Pages…”.
7. Run OCR (optical character recognition) on the PDF. This is an extremely easy way to make your scanned pages searchable and the text copy/paste-able. Acrobat has a good and easy to use built-in OCR tool. You will find it in the Document menu (Tools pane in Acrobat X). Be sure to disable image resampling, as by default OCR will resample the images, which can easily increase the file size by a huge amount! Keep in mind that OCR is a compute-intensive process and can easily take a couple of hours for a larger book.
8. Optimize document. Acrobat has an option to optimize scanned documents. This runs some image-processing algorithms on the scanned images and compresses them aggressively when it detects text. This is a vital step to keep the size of the document low. It can reduce the file size by a factor of 20! It will also make the antialiasing to look better when pages are minified, if the resolution of the original scans is high enough. This process is also compute-intensive and can easily take an hour for a larger book.
9. (OPTIONAL) Reduce the file size further by using Acrobat’s other optimization options, from which the image downsampling is the most important.
At this point the most important steps are done and you can end here and go to sleep if you see the sunrise through the window. Go on if it’s only 4 AM.
10. (OPTIONAL) Setting the initial view. Open the document properties on the Initial View tab. Here, you can set the initial page, zoom level and which panes (e.g. the bookmarks pane, see below) should be active when the document is opened.
11. (OPTIONAL) Create a PDF table of contents (TOC). The PDF format has a useful (hierarchical) bookmarking feature with a dedicated Bookmarks pane which exists also in Adobe Reader. This feature can be used to reconstruct the book’s TOC for easy document navigation. One simple way to achieve this is the following:
11.a Go to the book’s Contents page, select the chapter title’s text and hit CTRL+B (or right click and choose to add a bookmark from the context menu). Repeat this for each chapter.
11.b Structure the created bookmarks. Rearrange the bookmarks to follow the order and structure of the book’s TOC.
11.c Link the bookmarks to pages. To do this, go over all pages of the book sequentially and every time a new chapter starts, right click on the corresponding bookmark and set the destination to the current page.
12. (OPTIONAL) Create hyperlinks inside the document. The PDF format also supports hyperlinks which can perform actions (e.g. jump to a page or a web site) when clicked. Links can be either rectangles (drawn with a corresponding tool) or text. To create text links, select the text, right click on it and choose to crate a link. There are options to set the link’s appearance and behavior.
You’re done! You have the perfect ebook and you’re late for work!