Tuesday, April 6, 2010

ISO9660 Size Estimating

When I was designing the format for the backup disks, I decided on a flat directory of numbered files because it was simple and usually worked well. It also made each block addressable by the underlying operating system, which externalized some complexity, and it kept the impact of media defects to a minimum, so many partially burned disks are still useful.

My first thought for filling the disks was to estimate the image size from just the total size of the files being put on the disk. This worked well except when there were large numbers of small files (on the order of 100k files).

total = 2048 \cdot \sum{\lceil\frac{size}{2048}\rceil}
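As a rough illustration of the formula above, here is a minimal Perl sketch. The helper name naive_image_bytes and the three sample sizes are made up for this example; the only thing taken from the post is rounding each file up to whole 2048-byte blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: bytes of file data after rounding each file up to
# whole 2048-byte blocks. Directory and volume overhead are ignored, which
# is why this estimate drifts when there are many small files.
sub naive_image_bytes {
    my @sizes = @_;
    my $total = 0;
    $total += ceil($_ / 2048) * 2048 for @sizes;
    return $total;
}

# Illustrative sizes only: 1 byte, exactly one block, one byte over a block.
print naive_image_bytes(1, 2048, 2049), "\n";   # prints 8192
```

The 1-byte file costs as much as the 2048-byte one, and the 2049-byte file costs two blocks, giving 2048 + 2048 + 4096 bytes.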

For a while I used this estimator with a pad factor based on the number of files, which worked fairly well but sometimes needed adjustment.

Last June I did some research, found the ISO9660 filesystem specifications, and wrote up a better estimator for my specific case, which assumes:
  • All files are 8.3 format.
  • There are no extended formats.
  • All the files are in the root directory.
The resulting estimator for image size in 2 KiB blocks is:
174 + \lfloor\frac{count}{42}\rfloor + \sum{\lceil\frac{size}{2048}\rceil}
where 174 is the disk overhead, including padding, super blocks, and path tables; count is the number of file names on the disk; and size is the size of each file in bytes. This can be calculated incrementally with just two variables: a count and the running sum of per-file block counts.
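The incremental form might look something like the following sketch. The helper names add_file and estimated_blocks are hypothetical and not from my backup scripts; the constants 174, 42, and 2048 are the ones in the formula above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# The only two pieces of state the estimator needs.
my $file_count  = 0;   # number of file names added so far
my $data_blocks = 0;   # running sum of ceil(size / 2048)

# Hypothetical helper: account for one more file of $size bytes.
sub add_file {
    my ($size) = @_;
    $file_count++;
    $data_blocks += ceil($size / 2048);
    return;
}

# Hypothetical helper: current estimate in 2 KiB blocks, that is
# 174 + floor(count / 42) + sum of per-file blocks.
sub estimated_blocks {
    return 174 + floor($file_count / 42) + $data_blocks;
}

# Same input as the 1000-file example: 1000 files of 2048 bytes each.
add_file(2048) for 1 .. 1000;
print estimated_blocks(), "\n";   # prints 1197
```

Because the estimate is cheap to recompute after every file, a disk-filling loop could check it before each addition and stop once the next file would no longer fit.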

This estimator has come out exactly right every time.
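I have not spelled out where the 42 in the directory term comes from. One reading that is consistent with the ECMA-119 directory record layout, offered here as an assumption rather than as the original derivation: every name is a full 12-character 8.3 name recorded with a ";1" version suffix (14 bytes), each record has a 33-byte fixed part plus one padding byte after an even-length name, there is no system use data, and records do not cross a 2048-byte block boundary.

record = 33 + 14 + 1 = 48
\lfloor\frac{2048}{48}\rfloor = 42

Under the same assumption, the root directory's own two entries take 34 bytes each, so the first block has room for only 41 file records:

68 + 41 \cdot 48 = 2036 \le 2048 < 2084 = 68 + 42 \cdot 48
blocks = 1 + \lceil\frac{count - 41}{42}\rceil = 1 + \lfloor\frac{count}{42}\rfloor

That matches the directory size used in the Perl example at the end of the post, with the extra 1 folded into the 174 in the formula. Shorter names would give shorter records, so the 42 would be a conservative figure in that case.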

Someday I may even use the ECMA-119 Standard (which was adopted as the ISO9660 standard) to write a disk image directly instead of populating a directory and using mkisofs to make the image, but I have more important things to do before that.

Here is the same estimator as a Perl code example, computing the image size for 1000 files of 2048 bytes each.
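#!/usr/bin/perl -w
use strict;
use POSIX qw(ceil floor);

# Add up a list of numbers.
sub sum {
    my $out = 0;
    for (@_) {
        $out += $_;
    }
    return $out;
}

# 1000 files of 2048 bytes each.
my @sizes = ( 2048 ) x 1000;
my $file_count = @sizes;

# Each file occupies a whole number of 2048-byte blocks.
my $data_size = sum(map { ceil($_ / 2048) } @sizes);
# Blocks used by the root directory.
my $dir_size = floor( $file_count / 42 ) + 1;
# Fixed overhead; this 173 plus the + 1 above is the 174 in the formula.
my $overhead = 173;

my $size = $overhead + $dir_size + $data_size;

# Print the estimate in 2 KiB blocks, followed by a newline.
$\ = "\n";
print $size;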
