Friday, September 7, 2012

JPEG scans to PDF

Long ago I scanned pages of paper to JPEG images at 150dpi. More recently I have been scanning to PDF, and today I finally got around to converting all the old scans into the new format. This presented a challenge because the scan files do not all have the same dimensions.

My first challenge was figuring out the dimensions of each JPEG file. This was not so hard once I realized that I was already using Portable Anymap (PNM) intermediates between the JPEG file and the PostScript version that gets turned into a PDF file. The intermediate format carries the dimensions of the image in plain text in its header, which I could just read.
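
As a rough illustration of reading that header, here is a minimal shell sketch. The function name pnm_dims is made up for this example. It assumes the layout djpeg emits: the magic number on the first line, "width height" on the second, and no "#" comment lines in between.

```shell
#!/bin/sh
# pnm_dims (hypothetical helper): print "WIDTH HEIGHT" from the header of a
# PNM stream on stdin. Assumes line 1 is the magic number and line 2 is
# "width height", with no comment lines in the header.
pnm_dims() {
    read -r magic || return 1
    read -r width height || return 1
    case "$magic" in
        P[1-6]) printf '%s %s\n' "$width" "$height" ;;
        *) echo "not a PNM stream: $magic" >&2; return 1 ;;
    esac
}

# A hand-made header is enough to try it; no pixel data is needed.
printf 'P6\n1275 1650\n255\n' | pnm_dims   # prints "1275 1650"
```

With a real scan the same function could be fed from djpeg, for example: djpeg page.jpg | pnm_dims (page.jpg being a placeholder file name).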

The rest of the challenges were minor in comparison, either because I had solved them in the past or because a quick search online solved them.

This first script converts an incoming Portable Anymap to a properly sized PostScript file with a 150dpi version of the image as the entire page: pnmtops-150dpi-filter
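#!/usr/bin/perl -w
use strict;

# PNM header: magic number on the first line, "width height" on the second.
binmode(STDIN);
my $ver = <STDIN>;
my $size = <STDIN>;

my ($width, $height) = ($size =~ m{^(\d+) (\d+)$})
    or die "unexpected PNM header\n";

# Page size in inches at 150dpi, plus one point (1/72 inch) of slack.
$width = $width / 150.0 + 1.0 / 72.0;
$height = $height / 150.0 + 1.0 / 72.0;

local *OUT;
open(OUT, "|-", "pnmtops", "-dpi", "150", "-equalpixels", "-nocenter",
        "-width", $width, "-height", $height, "-setpage")
    or die "cannot run pnmtops: $!\n";
binmode(OUT);

my $buff;

# Pass the header through unchanged, then copy the rest of the image.
print OUT $ver;
print OUT $size;
while(read(STDIN, $buff, 8192)) {
    print OUT $buff;
}
close(OUT) or die "pnmtops failed\n";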

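To make the page-size arithmetic in the filter concrete, here is a small sketch of the same calculation in shell. The function name page_inches is invented for this example, and awk is used only because plain sh has no floating point. The 1/72 inch term is one PostScript point, matching what the Perl script adds.

```shell
#!/bin/sh
# page_inches PIXELS (hypothetical helper): page dimension in inches for a
# 150dpi scan, plus one PostScript point (1/72 inch), as in the Perl filter.
page_inches() {
    awk -v px="$1" 'BEGIN { printf "%.4f\n", px / 150.0 + 1.0 / 72.0 }'
}

page_inches 1275   # 8.5 inch wide scan: prints 8.5139
page_inches 1650   # 11 inch tall scan:  prints 11.0139
```

So a letter-sized scan of 1275 by 1650 pixels ends up on a page roughly one point larger than 8.5 by 11 inches in each direction.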

The above script gets used by the script that I use to do the conversion of directories to pdf files: scan2pdf
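#!/bin/sh
# Both arguments are required: the input directory and the output PDF.
if [ -z "$2" ]; then
  echo "converts a directory of 150dpi scan images to a properly fit pdf"
  echo "use: input-directory output-file"
  exit 1
fi

# Each JPEG becomes one PostScript page; ps2pdf14 reads them all from stdin.
for x in "$1"/*.jpg; do
  djpeg "$x" | pnmtops-150dpi-filter
done |
ps2pdf14 - "$2"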

This seems to be working well for me; it even produces PDF files with varying page sizes.

