20090911

Slicing scannings

Last year our department got a nice and shiny Xerox WorkCentre with scan-to-mail capabilities. Wonders of technology: now I can scan the chapters I use from books that weigh tons. But... of course, the machine scans two pages at a time, and thus creates a PDF in which every page is a two-page spread.

This doesn't work well with the way I print booklets, so my working solution was to use Ghostscript and ImageMagick from the command line. Automation at its finest.

First, scan whatever you have to scan. In my case the scan came out upside down, with a black scan line on the left side. Then:
gs -sDEVICE=jpeg -q -dBATCH -dNOPAUSE -r300x300 -sOutputFile=Page%03d.jpg -dFirstPage=1 PdfFile.pdf
This splits your PDF file into individual pages and saves them as JPEG files (Page001.jpg, Page002.jpg, ...) at 300 dpi. The following step may be skipped: my pages were upside down and needed rotating, yours may not.
mogrify -rotate 180 Page*.jpg
Now we want to get rid of that black rectangle, which sits on the right of every page after the rotation. This command cuts the right side, and I could not be bothered to read ImageMagick's manual again, so if your black scan zone is on the other side, rotate as in the previous step, cut, and rotate back ;) Note that mogrify needs its options before the files to process.
mogrify -gravity East -chop 150x0 Page*.jpg
I suggest you try first on a single page (with convert, not mogrify, so the original stays untouched), changing the 150 pixels to be chopped as you see fit:
convert Page001.jpg -gravity East -chop 150x0 Test.jpg
Now look at the image dimensions, either by right-clicking and viewing the image properties or by opening the file in an image editor (such as GIMP). Divide the width by two and round up. This is the cropping size.
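If you would rather not do the arithmetic by hand, here is a minimal sketch of the "divide by two and round up" step. The helper name half_width and the example width of 4809 pixels are made up for illustration. Rounding up should matter because, as far as I can tell, -crop without an offset tiles the image, so a width rounded down would leave a third sliver one pixel wide.

```shell
# Hypothetical helper: half of a pixel width, rounded up.
half_width() {
    # (w + 1) / 2 with integer division is the ceiling of w / 2
    echo $(( ($1 + 1) / 2 ))
}

# Example with an assumed width of 4809 pixels
half_width 4809   # prints 2405
```

The width itself can be read from the command line with identify -format '%w' Test.jpg instead of an image editor. With that number in hand, the crop is: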
mogrify -crop CroppingSize Page*.jpg
This will generate pairs of files named PageN-0.jpg and PageN-1.jpg (if everything worked correctly). Now delete the PageN.jpg files from the previous step, and you are ready to convert all the PageN-M.jpg files into PDF files.
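Deleting the uncut pages by hand gets tedious, and a careless rm Page*.jpg would take the halves with it. A small sketch of a safer pattern follows, assuming the three-digit numbering produced by Page%03d.jpg above. It runs in a scratch directory with empty stand-in files, so nothing real is touched.

```shell
# Scratch directory with empty stand-ins for the real files
demo=$(mktemp -d)
touch "$demo"/Page001.jpg "$demo"/Page001-0.jpg "$demo"/Page001-1.jpg
touch "$demo"/Page002.jpg "$demo"/Page002-0.jpg "$demo"/Page002-1.jpg

# Exactly three digits followed by .jpg: matches Page001.jpg,
# but not Page001-0.jpg or Page001-1.jpg
rm -- "$demo"/Page[0-9][0-9][0-9].jpg

# Only the four halves should be left
ls "$demo"
```

In the real working directory the same pattern is just rm Page[0-9][0-9][0-9].jpg. Once only the PageN-M.jpg files remain, the conversion is: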
mogrify -format pdf -page A4 Page*.jpg
To end the procedure, merge them all into one (big!) PDF file:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=output.pdf Page*.pdf
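For the record, the whole procedure could be collected into one script. What follows is only a sketch under a few assumptions of mine: the names slice_scan, run, DRY_RUN, CHOP and CROP are invented here, the rm step relies on the three-digit page numbers, and by default the script only prints the commands so you can check them before letting it loose. CROP keeps the CroppingSize placeholder, which you have to replace with your own value.

```shell
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the commands (default for this sketch)
CHOP=${CHOP:-150}           # pixels to chop from the right, as in the post
CROP=${CROP:-CroppingSize}  # placeholder from the post; set it to your own value

# Print the command in dry-run mode, execute it otherwise
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

slice_scan() {
    run gs -sDEVICE=jpeg -q -dBATCH -dNOPAUSE -r300x300 -sOutputFile=Page%03d.jpg -dFirstPage=1 "$1"
    run mogrify -rotate 180 Page*.jpg        # skip if your scan is not upside down
    run mogrify -gravity East -chop "${CHOP}x0" Page*.jpg
    run mogrify -crop "$CROP" Page*.jpg
    run rm -- Page[0-9][0-9][0-9].jpg        # drop the uncut pages
    run mogrify -format pdf -page A4 Page*.jpg
    run gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=output.pdf Page*.pdf
}

slice_scan PdfFile.pdf
```

Running it with DRY_RUN=0 and a real CROP value executes the steps instead of printing them. I would still test the chop and crop values on a single page first.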
Now you are ready to follow the steps here to turn it into a booklet. Hope you found this useful!

Written by Ruben Berenguel