To figure out what to put on a disk, I need to work out which keys are missing from a particular volume set, and which are not stored on any volume set at all.
There are two volume sets, offsite and onsite.
I have a list of keys in the store (store.lst), and a table of what keys from the store exist on what volumes (volume.grid: volume-set, volume, file-name, key).
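The comm steps used later only give correct results when both inputs are sorted in the same order that sort produces. A minimal sketch of a precheck, assuming store.lst holds one key per line; ensure_sorted is a hypothetical helper name used here for illustration only:

```shell
# Hypothetical helper: make sure a key list is sorted the way comm(1) expects.
# Assumption: the file has one key per line.
ensure_sorted() {
    # sort -c only checks the order; it exits non-zero if the file is unsorted.
    if ! sort -c "$1" 2>/dev/null; then
        # -o makes it safe to write the sorted result back onto the input file.
        sort -o "$1" "$1"
    fi
}
```

Calling `ensure_sorted store.lst` before the first comm is enough. It should run under the same locale as the other commands, since sort and comm must agree on collation order.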
First I make a list of what keys are in each volume set already:
    grep $'^onsite\t' volume.grid | cut -d $'\t' -f 4 | sort > on-onsite.lst
    grep $'^offsite\t' volume.grid | cut -d $'\t' -f 4 | sort > on-offsite.lst
Then I take those sets away from the store list (outstanding = store - exist).
    sort on-onsite.lst | comm -13 - store.lst > outstanding-onsite.lst
    sort on-offsite.lst | comm -13 - store.lst > outstanding-offsite.lst
Now I take the intersection of those sets to find the keys not on either disk set, and the difference to find the keys that are only on one of the disk sets.
    comm -12 outstanding-onsite.lst outstanding-offsite.lst > outstanding-common.lst
    comm -23 outstanding-onsite.lst outstanding-offsite.lst > outstanding-just-onsite.lst
    comm -13 outstanding-onsite.lst outstanding-offsite.lst > outstanding-just-offsite.lst
And finally make the list in the order of preference for burning, giving priority to keys that are not on the backup disks yet.
    cat outstanding-common.lst outstanding-just-onsite.lst > outstanding-onsite.lst
    cat outstanding-common.lst outstanding-just-offsite.lst > outstanding-offsite.lst
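To see the whole thing work end to end, here is a small sketch that runs the same steps on made-up data. The keys (k1 to k5), volume names and file names are hypothetical. Unlike the commands above, the intermediate results get missing-*.lst names so the final lists do not overwrite them, and the script uses a plain tab variable instead of $'\t' so it also runs under sh.

```shell
# Hypothetical sample data, only to exercise the pipeline end to end.
work=$(mktemp -d) && cd "$work" || exit 1
export LC_ALL=C              # keep sort and comm on the same collation order
tab=$(printf '\t')           # portable stand-in for $'\t'

printf '%s\n' k1 k2 k3 k4 k5 | sort > store.lst
# columns: volume-set, volume, file-name, key (tab-separated)
printf 'onsite\tv1\ta.dat\tk1\n'   > volume.grid
printf 'onsite\tv1\tb.dat\tk2\n'  >> volume.grid
printf 'offsite\tv9\ta.dat\tk1\n' >> volume.grid
printf 'offsite\tv9\tc.dat\tk3\n' >> volume.grid

for set in onsite offsite; do
    # keys already on this volume set; -u drops keys held on several volumes
    grep "^${set}${tab}" volume.grid | cut -f 4 | sort -u > "on-$set.lst"
    # outstanding = store - exist
    comm -13 "on-$set.lst" store.lst > "missing-$set.lst"
done

comm -12 missing-onsite.lst missing-offsite.lst > common.lst        # on neither set
comm -23 missing-onsite.lst missing-offsite.lst > just-onsite.lst   # missing onsite only
comm -13 missing-onsite.lst missing-offsite.lst > just-offsite.lst  # missing offsite only

# keys with no backup at all come first in each burn list
cat common.lst just-onsite.lst  > outstanding-onsite.lst
cat common.lst just-offsite.lst > outstanding-offsite.lst
```

With this sample data, k4 and k5 are on neither set, so they lead both lists; k3 follows in the onsite list and k2 in the offsite list.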
I am going to embed this into a Java program, but in a pinch this data processing is not hard to do with standard shell tools.
There were 141k keys in the store list and 80k keys in the resulting outstanding lists, and the commands did not take long to run.