Sunday, April 4, 2010

joining pdf files

I want to be able to split and join PDF files so that I can remix virtual paper the way I do physical paper, at least on page boundaries. Most of the pages I deal with come from scans, so they are actually images of pages.

After doing a brief search I came across an iText tutorial on merging PDF files. I then took the example code and vastly simplified it for my own purposes.

This code requires the iText jar file and Java 6 or later (Arrays.copyOfRange was added in Java 6).

package org.yi.happy.pdf;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;

public class PdfJoinMain {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("use: output input...");
            return;
        }

        // The first argument is the output file; the rest are inputs.
        String[] pdfs = Arrays.copyOfRange(args, 1, args.length);
        OutputStream output = new FileOutputStream(args[0]);
        try {
            // PdfCopy copies imported pages into the document written to output.
            Document document = new Document();
            PdfCopy writer = new PdfCopy(document, output);
            document.open();
            for (String pdf : pdfs) {
                FileInputStream input = new FileInputStream(pdf);
                try {
                    PdfReader reader = new PdfReader(input);
                    // iText page numbers start at 1.
                    for (int p = 1; p <= reader.getNumberOfPages(); p++) {
                        PdfImportedPage page = writer
                                .getImportedPage(reader, p);
                        writer.addPage(page);
                    }
                } finally {
                    input.close();
                }
            }
            document.close();
            writer.close();
        } finally {
            output.close();
        }
    }
}

The code opens the output PDF file, then for each input PDF file, copies each page from the input file into the output file, and finally closes the output file. The try ... finally constructs ensure that the files get closed even if an error occurs. When the program is invoked with fewer than two arguments it prints a usage message and exits.
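
On Java 7 or later, the same open, copy, close shape could be written with try-with-resources instead of explicit finally blocks. The sketch below is only an illustration of the resource handling: the class name CloseSketch and the method concat are hypothetical, and the raw byte copy is a stand-in for the iText page loop (concatenating bytes like this does not produce a valid merged PDF).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch only: shows the open/copy/close structure with try-with-resources.
// The byte copy stands in for the page-copy loop and is NOT a way to join PDFs.
class CloseSketch {
    static void concat(Path target, List<Path> sources) throws IOException {
        // The output is closed when this block exits, normally or by exception.
        try (OutputStream output = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            for (Path source : sources) {
                // Each input is closed before the next one is opened.
                try (InputStream input = Files.newInputStream(source)) {
                    int n;
                    while ((n = input.read(buffer)) != -1) {
                        output.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

The nesting mirrors the original listing: one outer resource for the output and one inner resource per input.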

I feel that this program may be counter-intuitive, since the target comes first on the command line instead of last. With copy and move (cp and mv), for instance, the target comes last.
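
A minimal sketch of the target-last ordering, assuming the only change wanted is how the command line is interpreted. The class JoinArgs and its parse method are hypothetical names, not part of the program above; the result would feed the same join loop.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original program: it only changes
// how the command line is read, so the output file is named last.
class JoinArgs {
    final String output;
    final String[] inputs;

    private JoinArgs(String output, String[] inputs) {
        this.output = output;
        this.inputs = inputs;
    }

    /**
     * Parses "input... output". Returns null when fewer than two
     * arguments are given, so the caller can print a usage message.
     */
    static JoinArgs parse(String[] args) {
        if (args.length < 2) {
            return null;
        }
        int last = args.length - 1;
        // Everything before the final argument is an input file.
        return new JoinArgs(args[last], Arrays.copyOfRange(args, 0, last));
    }

    public static void main(String[] args) {
        JoinArgs parsed = parse(args);
        if (parsed == null) {
            System.out.println("use: input... output");
            return;
        }
        System.out.println("output: " + parsed.output);
        System.out.println("inputs: " + Arrays.toString(parsed.inputs));
    }
}
```

With this ordering, an invocation would read like cp: the inputs first, then the file to create.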

The splitting code is similar, and will be posted tomorrow.
