Monday, December 6, 2010

wide character text file conversion

My LG Neon phone has a feature where it will save all the text messages as a text file on the memory card. However, this is not a nice UTF-8 text file, but rather one that is almost UTF-16 with reversed byte order, which confuses the local text editors on my Mac.

Upon investigation of the raw text file I found that the format follows
0xff 0xfe ( char 0x00 )*
where the data I want is the char bytes.
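For reference, this kind of investigation can be done with od; here is a tiny sample file built in the same format (the sample path and contents are made up):

```shell
# build a two-character sample in the observed format: 0xff 0xfe header,
# then each character followed by a 0x00 byte ("hi" encoded this way)
printf '\377\376h\000i\000' > /tmp/sample.txt
od -A x -t x1z /tmp/sample.txt
```

The hex dump shows the ff fe header followed by the character/null pairs.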

I sketched the format as a state machine (diagram not preserved) and wrote a small Perl script to convert the file:
#!/usr/bin/perl -w
use strict;

my $char;

sub unexpected;
sub read_header_2;

sub read_header_1 {
    read(\*STDIN, $char, 1) or return \&unexpected;
    ord($char) == 0xfe or return \&unexpected;
    return \&read_header_2;
}

sub read_character;

sub read_header_2 {
    read(\*STDIN, $char, 1) or return undef;
    ord($char) == 0xff or return \&unexpected;
    return \&read_character;
}

sub write_character;

sub read_character {
    read(\*STDIN, $char, 1) or return undef;
    return \&write_character;
}

sub read_null;

sub write_character {
    print $char;
    return \&read_null;
}

sub read_null {
    read(\*STDIN, $char, 1) or return undef;
    ord($char) == 0x00 or return \&unexpected;
    return \&read_character;
}

sub unexpected {
    # report errors on STDERR so they do not mix with the converted output
    print STDERR "unexpected situation\n";
    if(length $char) {
        print STDERR "found character: " . ord($char) . "\n";
    }
    else {
        print STDERR "found end of stream\n";
    }
    return undef;
}

my $state = \&read_header_1;
while($state) {
    $state = &$state();
}
which worked perfectly, and implemented the state machine directly.

Now I can save my text messages and have them readable.
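Since the 0xff 0xfe header is in fact the UTF-16 little-endian byte order mark, the same conversion can likely also be done with iconv, assuming it is installed; a sketch using a made-up sample file:

```shell
# make a sample file in the phone's format ("hi" behind the 0xff 0xfe header)
printf '\377\376h\000i\000' > /tmp/messages.txt

# the header is a UTF-16 byte order mark, which iconv reads to pick the
# byte order, so no hand-rolled state machine is needed
iconv -f UTF-16 -t UTF-8 /tmp/messages.txt > /tmp/messages-utf8.txt
cat /tmp/messages-utf8.txt
```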

Friday, December 3, 2010

shell script state machine

The basic structure of a shell script state machine is an endless loop with a case branch structure inside that responds to a current state variable. To end the loop, "break" out of the while loop.

state="start"
while true; do
  case "$state" in 
  "start")       
    echo "initial state"
    state="next"
    ;;
  "next") 
    echo "next state"
    state="done"
    ;;
  "done") 
    echo "done"
    break
    ;;
  *)
    echo "invalid state \"$state\""
    break
    ;;
  esac
done

running the above state machine prints
initial state
next state
done

This structure is useful for programs that are easier to express using a flow chart (state chart) than a linear structure, such as an interactive script with retry options.

temporary file space snippet

Often I need a temporary directory for a shell script to do work in, one that gets cleaned up even if the script is aborted by the user with CTRL-C. I spent an hour or so figuring out how to make the cleanup happen reliably, and now use the following snippet to make and clean up the temporary directory.

tmp=/tmp/$$
mkdir $tmp || exit 1;
cleanup() { rm -r -f "$tmp"; }
trap "cleanup" EXIT
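A variant of the same snippet, assuming mktemp is available, avoids the predictable /tmp/$$ name:

```shell
# mktemp -d creates a uniquely named directory with safe permissions,
# so another user cannot guess and pre-create the path
tmp=$(mktemp -d) || exit 1
cleanup() { rm -r -f "$tmp"; }
trap "cleanup" EXIT
echo "working in $tmp"
```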

Monday, November 22, 2010

encoding video for the LG Neon

I like to load videos on my phone to take with me to share with others.

Since switching to an LG Neon phone, and getting the videos on the memory card accessible, I found that some of the videos would not play on this phone, and others did not play well. In response I decided to figure out how to encode good video for this phone.

To choose a good target video format for the phone, I decided to check what format the phone records video in. Low resolution video had a size of 176x144 and used the h263 codec; high resolution video had a size of 320x240 and used the mpeg4 codec. The video frame rate ranged from 9.86 to 11.19 frames per second, and the audio track was 8000 Hz, 1 channel.

With this information I started trying different encoding settings.

Down sampling the audio made it sound flat, and keeping the sample rate the same as the source video did not seem to have an effect on playback, so I kept the source sample rate.

Using a video frame rate of 15 instead of 10 made playback jumpy.

So, I settled on a video size of 320x240 using the mpeg4 codec with a frame rate of 10 frames per second, and the audio rate kept the same as the source. I put this into a conversion script:
#!/bin/sh
for file in "$@"; do
   ffmpeg -i "$file" -vcodec mpeg4 -s 320x240 -r 10 -acodec libfaac "$file".3gp
done

Sunday, November 21, 2010

converting AMR audio files on the command line

I found some more ringtone restrictions on the LG Neon phone.

If you wish to use the bug to set a ring tone, then the file must show up in the music player. For a file to show up in the music player, it seems that it must be an MP3 file over 500 KB in size (or 30 seconds long).


The ring tone I wish to use is a 3 second amr file, so this is what I did.

Convert the amr file to a wave file for editing
ffmpeg -i josie-ring.amr josie-ring.wav

Open the resulting wave file in an audio editor and duplicate the audio until it loops past 30 seconds, then save over the wav file.

Convert the wave file to a full size music mp3
ffmpeg -i josie-ring.wav -ab 128 -ac 2 -ar 44100 josie-ring.mp3

Finally send the resulting mp3 file to the phone (I use bluetooth, but I could also put it on the memory card and transfer it off that onto the phone).

Saturday, November 20, 2010

custom ringtones on LG Neon

I got a new phone last night because the screen on my old phone has partially failed. The new phone is an LG Neon TE356.

On my old phone the ring tone was one of my friends going "Ring Ring", which is not a ring tone I could possibly download from the ring tone store; also, the phone only allows DRM-protected mp3 files to be used as ring tones.

After a little bit of searching I found a solution using a bug in the phone software.

  • Start the MP3 player (9. Tools -> 1. MP3 Player).
  • Start playing the default song that came with the phone (1. All songs -> Beautiful Sky).
  • While the song is playing, set the desired MP3 as the ring tone or message tone. (Options -> 3. Search -> MP3 File -> Options -> 3. Set As -> 1. Ringtone -> 1. Yes)

So now I can have my ring tone back.

Sunday, November 7, 2010

disabling idle sleep timeouts

I left the server on a shelf for a bit and tried to connect to it remotely, only to find it asleep.

So I turned off the only sleep timeout I could find:
sudo gconftool-2 \
 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 /apps/gnome-power-manager/timeout/sleep_computer_battery --set --type integer 0

and restarted gdm
sudo /etc/init.d/gdm restart

hopefully the server doesn't go to sleep any longer.

Creating a user account with a previously set password

The user accounts that will exist on the new home server already exist on the previous server, so when they get created the passwords should be copied forward, although this might be a good time to get them set again by the users.

the existing password hashes are stored in /etc/shadow:
happy:$1$xxxxxxxx$xxxxxxxxxxxxxxxxxxxxxx:14667:0:99999:7:::

To create a user account with the same password as listed there, just provide the password hash:
sudo useradd happy -p '$1$xxxxxxxx$xxxxxxxxxxxxxxxxxxxxxx' -s /bin/bash -m

The entire contents of the home directory could also be copied to bring application settings forward as well. Make sure the copied files end up owned by the user they were copied for; the user IDs will not necessarily be the same.
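The copy-and-fix-ownership step can be sketched like this. The paths here are a scratch-directory stand-in so the sketch runs without root; on the real server the source would be the old /home and the chown would target the actual user:

```shell
# stand-in directories; on the server these would be the old and new /home
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/happy/.config"
echo "setting" > "$src/happy/.config/app.conf"

# -a preserves modes, times, and symlinks, but ownership still needs fixing
cp -a "$src/happy" "$dst/"

# on the server this would be: sudo chown -R happy:happy /home/happy
chown -R "$(id -un)" "$dst/happy"

ls "$dst/happy/.config"   # lists app.conf
```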

Ubuntu 10.10 disabling suspend on lid close

The default settings in Ubuntu on a laptop include suspending when the cover is closed, which is reasonable for normal laptop usage. However, I am building a server from a laptop with a broken screen, so it will be closed and in the corner most of the time, and not being suspended would be a good thing.

The method that I used on Ubuntu 10.04 did not seem to work, and the control panel does not seem to have the options available on this laptop, so I had to figure out another way to do it.

After reading how to change some other power manager settings, I came up with a command that changes the settings I want. However, this only changed them while my user was logged in at the console; changing the defaults required the commands I had used before, plus a restart of gdm to load the changed settings.

# make the changes
sudo gconftool-2 \
 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 /apps/gnome-power-manager/buttons/lid_ac --set --type string nothing
sudo gconftool-2 \
 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 /apps/gnome-power-manager/buttons/lid_battery --set --type string nothing

# restart gdm
sudo /etc/init.d/gdm restart

The current settings can be viewed with
gconftool-2 /apps/gnome-power-manager/buttons -R

Now the laptop does not suspend when the cover is closed.

Installing Ubuntu 10.10 and verifying the hard disk in the process

I have a 160GB hard drive with some bad blocks that I wish to install Ubuntu onto. The installer does not have an option to verify the disk during formatting, so I needed to boot the live CD and do that step myself. I had actually gone through the install on this disk already without marking the bad blocks, so there is a damaged file system on the disk due to the bad blocks.

After the initial installation round the partition table is configured, so I just re-create the root filesystem with "mkfs.ext4 -c -V /dev/sda1" to get a blank root with all the bad blocks marked.

After making the file system, start the installer and tell it to do manual partitions, then tell it to use the freshly formatted partition as root and to not format it.

The rest of the install proceeds as normal.

Saturday, November 6, 2010

three implementations of a stack in C

I decided to write up three implementations of the basic stack data structure in C, complete with tests.

#include <stdio.h>
#include <stdlib.h>

void check(char *message, int success) {
 printf("%s %s\n", success ? "pass" : "FAIL", message);
}

/*
 * array fixed stack. create an array of length one greater than the maximum
 * stack depth, and call aInit on it to set it up.
 */
void aInit(int *stack) {
 stack[0] = 0;
}

void aPush(int *stack, int value) {
 stack[++stack[0]] = value;
}

int aPeek(int *stack) {
 if(stack[0] == 0) {
  return 0;
 }

 return stack[stack[0]];
}

int aPop(int *stack) {
 if(stack[0] == 0) {
  return 0;
 }

 return stack[stack[0]--];
}

int aSize(int *stack) {
 return stack[0];
}

void aPrint(int *stack) {
 int i;

 /* print the stack from top to bottom */
 for(i = stack[0]; i > 0; i--) {
  printf("%d ", stack[i]);
 }
 printf("\n");
}

void test_a() {
 int stack[10];
 aInit(stack);
 check("a size", aSize(stack) == 0);

 aPush(stack, 1);
 aPush(stack, 2);
 check("a size", aSize(stack) == 2);

 check("a peek", aPeek(stack) == 2);
 check("a peek", aPeek(stack) == 2);

 check("a pop", aPop(stack) == 2);
 check("a size", aSize(stack) == 1);

 aPush(stack, 3);
 check("a size", aSize(stack) == 2);
 check("a peek", aPeek(stack) == 3);
}

/*
 * struct fixed stack
 */
struct sStack {
 int size;
 int data[10];
};

void sInit(struct sStack *stack) {
 stack->size = 0;
}

int sSize(struct sStack *stack) {
 return stack->size;
}

void sPush(struct sStack *stack, int value) {
 stack->data[stack->size] = value;
 stack->size += 1;
}

int sPeek(struct sStack *stack) {
 if(stack->size == 0) {
  return 0;
 }

 return stack->data[stack->size - 1];
}

int sPop(struct sStack *stack) {
 if(stack->size == 0) {
  return 0;
 }

 stack->size -= 1;
 return stack->data[stack->size];
}

void sPrint(struct sStack *stack) {
 int i;

 /* print the stack from top to bottom */
 for(i = stack->size; i > 0; i--) {
  printf("%d ", stack->data[i - 1]);
 }
 printf("\n");
}

void test_s() {
 struct sStack stack;
 sInit(&stack);
 check("s size", sSize(&stack) == 0);

 sPush(&stack, 1);
 sPush(&stack, 2);
 check("s size", sSize(&stack) == 2);

 check("s peek", sPeek(&stack) == 2);
 check("s peek", sPeek(&stack) == 2);

 check("s pop", sPop(&stack) == 2);
 check("s size", sSize(&stack) == 1);

 sPush(&stack, 3);
 check("s size", sSize(&stack) == 2);
 check("s peek", sPeek(&stack) == 3);
}

/*
 * linked dynamic stack
 */
struct lNode {
 struct lNode *next;
 int data;
};

struct lStack {
 struct lNode *head;
};

void lInit(struct lStack *stack) {
 stack->head = NULL;
}

int lSize(struct lStack *stack) {
 int size = 0;
 struct lNode *i;

 i = stack->head;
 while(i != NULL) {
  size += 1;
  i = i->next;
 }
 return size;
}

void lPush(struct lStack *stack, int value) {
 struct lNode *head;

 head = (struct lNode*)calloc(1, sizeof(struct lNode));
 head->data = value;
 head->next = stack->head;

 stack->head = head;
}

int lPeek(struct lStack *stack) {
 if(stack->head == NULL) {
  return 0;
 }

 return stack->head->data;
}

int lPop(struct lStack *stack) {
 struct lNode *head;
 int out;

 head = stack->head;

 if(head == NULL) {
  return 0;
 }

 stack->head = head->next;

 out = head->data;

 free(head);

 return out;
}

void lPrint(struct lStack *stack) {
 struct lNode *i;

 for(i = stack->head; i != NULL; i = i->next) {
  printf("%d ", i->data);
 }
 printf("\n");
}

void lDestroy(struct lStack *stack) {
 struct lNode *head;
 struct lNode *next;

 head = stack->head;
 while(head != NULL) {
  next = head->next;
  free(head);
  head = next;
 }
 stack->head = NULL;
}

void test_l() {
 struct lStack stack;
 lInit(&stack);
 check("l size", lSize(&stack) == 0);

 lPush(&stack, 1);
 lPush(&stack, 2);
 check("l size", lSize(&stack) == 2);

 check("l peek", lPeek(&stack) == 2);
 check("l peek", lPeek(&stack) == 2);

 check("l pop", lPop(&stack) == 2);
 check("l size", lSize(&stack) == 1);

 lPush(&stack, 3);
 check("l size", lSize(&stack) == 2);
 check("l peek", lPeek(&stack) == 3);

 lDestroy(&stack);
}

int main() {
 test_a();
 test_s();
 test_l();
 return 0;
}

I hope the code is clean and clear, but there is a good chance that it is not.

Monday, October 25, 2010

Ubuntu 10.10 VNC Login Screen

I figured out how to get a graphical login screen over VNC on Ubuntu 10.10 today. The method that worked before Ubuntu 10.04 stopped working when XDMCP support was removed from gdm (source).

This procedure starts from a fresh install of Ubuntu-Desktop-10.10.

install xdm, vnc4server, and xinetd.
sudo apt-get install xdm vnc4server xinetd
When asked during installation what the default display manager should be, keep the setting as gdm.

Configure xdm to be able to answer XDMCP requests by commenting out the following line in /etc/X11/xdm/xdm-config (shown here already commented out):
! SECURITY: do not listen for XDMCP or Chooser requests
! Comment out this line if you want to manage X terminals with xdm
!DisplayManager.requestPort:    0
Configure XDM to answer XDMCP requests from localhost, and to listen to just localhost by adding the following lines to /etc/X11/xdm/Xaccess:
localhost
LISTEN localhost

Configure XDM to not bring up a physical display by commenting out the following line in /etc/X11/xdm/Xservers:
#:0 local /usr/bin/X :0 vt7 -nolisten tcp

Configure the startup script to allow XDM to start despite gdm taking care of the screen by removing /etc/X11/default-display-manager:
sudo mv /etc/X11/default-display-manager /etc/X11/default-display-manager.disable

Add the VNC port definition to /etc/services if it has not already been added:
vnc 5900/tcp

Configure the VNC incoming port by creating /etc/xinetd.d/vnc:
service vnc
{
        only_from = localhost 192.168.0.0/24
        disable = no
        id = vnc
        socket_type = stream
        protocol = tcp
        wait = no
        user = nobody
        server = /usr/bin/Xvnc4
        server_args = -inetd -query localhost -once -SecurityTypes=None -pn -fp /usr/share/fonts/X11/misc/,/usr/share/fonts/X11/75dpi/,/usr/share/fonts/X11/100dpi/ -desktop Ubuntu
        log_on_failure += USERID
}
In this configuration connections are restricted to the local network (192.168.0.*).

After all these pieces are done, restart the services to load the new configurations:
sudo /etc/init.d/xdm restart
sudo /etc/init.d/xinetd restart

Now you should be able to use VNC to get a login screen.

There is a problem with gnome in this setup where it has a keyboard shortcut assigned to 'd', which can be fixed by going into System -> Preferences -> Keyboard Shortcuts and disabling, or reassigning the "Hide all normal windows and set focus to the desktop" shortcut key (source). This may happen because the default key binding is Mod4+D, and there is no Mod4 modifier key on the VNC connection.

Thursday, October 14, 2010

removing duplicate files

The challenge: Remove from one directory any file that exists in another directory, however the name of the files can not be used.

my solution:

first, make a list of the hashes of the files in each directory (run this once in each directory, changing the output name for the second)
find . -type f -print0 | xargs -0 shasum -a 256 > ~/tmp/list1.txt
edit the file lists if necessary.

take a set intersection of the hashes
cut -f 1 -d ' ' list1.txt | sort > hash1.txt
cut -f 1 -d ' ' list2.txt | sort > hash2.txt
comm -12 hash1.txt hash2.txt > clean.txt
the result is a list of just the hashes common to both lists.
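A toy run of the intersection step, with letters standing in for the hashes (comm requires its inputs to be sorted):

```shell
# -12 suppresses the lines unique to either file, leaving only the
# lines common to both
printf 'a\nb\nc\n' > /tmp/hash1.txt
printf 'b\nc\nd\n' > /tmp/hash2.txt
comm -12 /tmp/hash1.txt /tmp/hash2.txt   # prints the common lines: b and c
```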

list all the files with the selected hashes, using a perl script
#!/usr/bin/perl -w
use strict;

if(@ARGV != 2) {
    die "use: select hashes list\n";
}

my %hashes;
local *IN;
open(IN, "<", $ARGV[0]) or die "open: $!";
while(<IN>) {
    chomp;
    $hashes{$_} = 1;
}
close(IN);

open(IN, "<", $ARGV[1]) or die "open: $!";
while(<IN>) {
    my ($hash) = m,^(\S+), or next;
    $hashes{$hash} or next;
    print $_;
}
close(IN);

invoked like
perl select.pl clean.txt list1.txt | cut -c 67- > remove.txt
which results in a list of names of files in the first list that also exist in the second list (the SHA-256 hex digest is 64 characters, plus two separator characters, so the file name starts at column 67).

finally, remove the files
while IFS= read -r x ; do rm -- "$x"; done < ~/tmp/remove.txt

Tuesday, September 7, 2010

scanbuttond: filtering syslog

In a previous post I started setting up scanner button support on Debian; tonight I filtered out the messages that appear in the log every two seconds stating that there are no scanners attached.

I opted to just drop all the scanbuttond messages, so I added a file /etc/rsyslog.d/scanbuttond.conf with the contents
:app-name, equals, "scanbuttond"  ~
and that dropped all the scanbuttond messages.

Another night I will make the buttons do something.

Wednesday, September 1, 2010

furniture building work

I was working at a furniture warehouse, and when they tried me on building I just went with it, and read the directions as I went. Apparently that makes me a natural at building furniture, so I get to do it more now. Building simple things based on directions seems meditative.

Thursday, August 5, 2010

vacation

My perfect vacation is hanging out with my friends wherever they may be for long periods of time. My friends are spread out over at least three provinces so this involves more than a little travel. Also, I go to a few festivals with some of them, which makes the time even better, because of the people and atmosphere combined.

I go to music festivals more for the people than the music.

Wednesday, July 14, 2010

gtkpod on macbook

I can finally use gtkpod on my MacBook laptop, though it is not completely painless. This may be the last item that was keeping the Linux laptop around.

This was done by running gtkpod on Debian in a virtual machine running in VirtualBox; the iPod file system was attached with sshfs, and the destination where I copied the music was also attached using sshfs.

First I made sure I could SSH into the MacBook, this was done a while ago and I can't exactly remember how to turn on the SSH server.

Then I installed VirtualBox.

Next I installed a basic Debian instance, and installed the openssh-server, sshfs, and gtkpod packages.

I ssh into the virtual machine with the -X option to enable X11 forwarding, make sure iTunes is not running, mount the iPod file system over sshfs, and launch gtkpod. At this point gtkpod is on the screen.

Monday, July 5, 2010

server from laptop, round three

I re-started the building of the home server from the broken laptop, using Debian Lenny this time. After a basic install, I have added openssh-server and avahi-daemon to allow for remote access.

Samba will follow shortly.

Sunday, July 4, 2010

Push API vs Pull API

I have found that Pull APIs are much harder to implement than Push APIs, but often the Pull API is much easier to use.

First I shall show an example of each, suppose I want an API for generating the numbers from 1 to 100.

As a Pull API I get the following.

public class RangePull {
    private int i;

    public RangePull() {
        i = 1;
    }

    public boolean hasNext() {
        return i <= 100;
    }

    public Integer next() {
        int out = i;
        i++;
        return out;
    }

    public static void main(String[] args) {
        RangePull r = new RangePull();
        while (r.hasNext()) {
            Integer i = r.next();
            // do something with i
            System.out.println(i);
        }
    }
}
as a Push API I get the following.
public class RangePush {
    public interface Visitor {
        void accept(int i);
    }

    public void visit(Visitor v) {
        for (int i = 1; i <= 100; i++) {
            v.accept(i);
        }
    }

    public static void main(String[] args) {
        RangePush r = new RangePush();
        r.visit(new Visitor() {
            @Override
            public void accept(int i) {
                // do something with i
                System.out.println(i);
            }
        });
    }
}

The push version is smaller, and in some ways simpler, but the control is inverted so it is not as versatile.

For example, only the pull version can be used to do a side by side comparison of the range generator results, since two of them would be iterating at the same time.

It is also trivial to turn a Pull API into a Push API, by just looping over the result generator and calling back to the visitor.

I have found that the complexity of changing a Push API into a Pull API lies in breaking up the algorithm at the result-emitting step, which often requires it to be re-worked as a state machine. This is especially challenging when the algorithm is recursive.

Friday, July 2, 2010

scanner buttons (introduction)

I am lazy when it comes to using my scanner: it has buttons on the front, and I expect to be able to scan using those buttons without having the screen on the computer disrupted. This is not the case on my Mac, but it is the configuration I have on my Linux laptop. I have also recently found VirtualBox to work nicely enough to play with, so I may be able to set up the scanner in Linux in a box on the Mac.

So far I have installed scanbuttond, and I can see that the scanner is detected and button presses are detected.

apt-get install scanbuttond

Later I will look up the rest of the configuration and return to this mini-project.

Tuesday, June 8, 2010

coffee pancakes

I tried instant coffee in my pancakes this morning; I just made it part of the dry ingredients.

The batter was brown, and after topping with molasses I couldn't even taste the coffee.

I think it worked well enough, though I do not know if it worked for the wake up effect.

Sunday, June 6, 2010

server from laptop, round two

I got gifted a larger hard drive for the laptop based home server, 160 GB, the original disk is 40 GB.

I decided to do a fresh install using the newest ubuntu-desktop (ubuntu-alternate) distribution, 10.04, and have encountered some challenges.

After the first install the hard disk would not boot, which I found was due to a bad block, so I eventually figured out how to create the file system on the disk myself and verify the disk in the process.

After the second install the system froze when the graphical display came up, so far I have found how to boot into a rescue mode with a text prompt, from which I installed openssh-server, and successfully logged into the server.

At this point it is time to leave it for the night and sleep, next I shall figure out how to disable the graphical display.

The easiest option would seem to be installing ubuntu-server instead, but I wish to serve graphical desktops from this server as well.

Today I was doing some asterisk call routing debugging remotely, so I needed a SIP client.

Initially I used Siphon on my jail broken iPod Touch, which worked well enough to do the testing I needed to do.

Afterwards I decided to try and find a working sip client for the Mac I am using...

I started by looking at SIP clients on Wikipedia. First I tried Blink; it was noted as recommended by Apple, but it didn't successfully connect to the SIP server, so I moved on.

Linphone took too long to install.

Finally I tried Telephone; it was quite small and simple, and it just worked.

So now I have a working SIP client on my mac laptop.

Saturday, June 5, 2010

windows installation challenge

I have been given a challenge: reinstall Windows XP on a laptop that cannot boot from a CD-ROM or floppy.
I think I have found a way, and will post more when I actually do it.

  • make the hard disk dos bootable.
  • copy the windows install files to it.
  • boot from the hard disk.
  • start the windows install.
Soon the results will be posted, with more details.

Wednesday, June 2, 2010

purpose vs reward

I found a video that seems to explain to me why money does not seem to be very motivating. It is all about purpose instead of reward.

(embedded video not preserved)

Found via Coding Horror: The Vast and Endless Sea.

Friday, May 21, 2010

index bloom filter

I experimented with a bloom filter to reduce the number of index files I need to search when looking for the volumes that contain a particular block.

I found that the filter was quite effective at reducing the number of indexes to search for single blocks, but when I went to search for several thousand blocks I would get at least one false positive in each index. Given the probabilities involved, this stands to reason.
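This can be made concrete: if each index reports a false positive with probability p per queried block, the chance of at least one false positive across n queried blocks is 1 - (1 - p)^n. A quick check with awk, using a made-up 0.1% per-query rate and 5000 blocks:

```shell
# probability of at least one false positive across n independent queries
awk 'BEGIN { p = 0.001; n = 5000; printf "%.3f\n", 1 - (1 - p)^n }'
# prints 0.993
```

So with thousands of blocks, a per-index false positive is nearly certain even at a low per-query rate.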

parallel index searching

I tried searching the archive system index files in parallel (two at a time) and the wall time went down a little bit. This is a sign that the problem is likely IO bound, since there are no locks involved.

real    1m39.428s
user    2m23.523s
sys     0m7.205s
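One low-effort way to get this kind of parallelism from the shell is xargs -P (supported by GNU and BSD xargs); in this sketch grep stands in for the real per-index search program:

```shell
# two toy index files, both containing the wanted key; in the real setup
# each file would be a volume index and grep would be the search tool
mkdir -p /tmp/idxdemo
printf 'aaa\nkey one\n' > /tmp/idxdemo/1.idx
printf 'key two\n'      > /tmp/idxdemo/2.idx

# -P 2 runs up to two searches at once, one index file per invocation (-n 1);
# grep -l prints the names of the files that contain the key
find /tmp/idxdemo -name '*.idx' -print0 | xargs -0 -P 2 -n 1 grep -l 'key'
```

Both file names are printed here since both contain the key; the order may vary because the two searches run concurrently.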

Tuesday, May 18, 2010

Commons-CLI repeated options

I was curious about how repeated options are handled in commons-cli... Here is a test with the conclusion of that curiosity.

    /**
     * An option that may be specified more than once.
     * 
     * @throws ParseException
     */
    @Test
    public void testMultipleInstance() throws ParseException {
        Options o = new Options().addOption("a", "apple", true, "Apples");

        String[] args = { "-a", "one", "-a", "two" };
        CommandLine c = new GnuParser().parse(o, args);

        assertArrayEquals(new String[] { "one", "two" }, c
                .getOptionValues("apple"));

        assertEquals("one", c.getOptionValue("apple"));
    }

Commons-CLI test

I started using commons-cli yesterday, and was curious about the case where an option is given, but not defined. In these situations I end up either writing tests to check the condition, or small example programs to exercise the condition. In this case I wrote a JUnit test.

It turns out that options that are not defined raise an error condition.

package org.yi.happy.archive;

import static org.junit.Assert.assertArrayEquals;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
import org.apache.commons.cli.UnrecognizedOptionException;
import org.junit.Test;

/**
 * Experimental tests for the commons-cli library.
 */
public class CommandLineTest {
    /**
     * Show what happens when an invalid option is given.
     * 
     * @throws ParseException
     */
    @Test(expected = UnrecognizedOptionException.class)
    public void testBadOption() throws ParseException {
        Options o = new Options().addOption("a", "apple", true, "Apples");

        String[] arguments = { "--orange", "5" };
        new GnuParser().parse(o, arguments);
    }

    /**
     * End the argument list to catch an option with dashes.
     * 
     * @throws ParseException
     */
    @Test
    public void testArguments() throws ParseException {
        Options o = new Options().addOption("a", "apple", true, "Apples");

        String[] arguments = { "--", "--orange", "5" };
        CommandLine c = new GnuParser().parse(o, arguments);

        assertArrayEquals(new String[] { "--orange", "5" }, c.getArgs());
    }

}

Tuesday, May 11, 2010

compressed index searching

I am continuing to try to make searching my disk indexes faster. This time I tried compressing the index files to reduce the IO required to get them into memory; however, the run time went up.

First, the uncompressed index has a size of 3017824 K, and the times of two consecutive scans are
real    2m35.547s
user    2m18.172s
sys     0m6.146s

real    2m36.304s
user    2m17.152s
sys     0m6.242s

Next, the compressed index has a size of 1164228 K, and the times of two consecutive scans are
real    3m20.342s
user    3m4.939s
sys     0m8.805s

real    3m20.269s
user    3m3.866s
sys     0m8.699s

Since the system time went up I wonder if some extra buffering would work, also since this machine has two cores I wonder if I could do the scan in parallel.

Uncompressed with buffering makes it slower:
real    2m42.024s
user    2m15.048s
sys     0m6.214s

Compressed with buffering is slower too:
real    3m34.062s
user    3m3.281s
sys     0m5.488s

I ran each timing multiple times, and the times were mostly the same for each run.

So far straight reading of uncompressed data is fastest when scanning the whole data set.

At a later time I will try and make a parallel version.

Saturday, May 8, 2010

Sustainable Test-Driven Development

I just watched "InfoQ: Sustainable Test-Driven Development" and saw some things I really should do in my own tests.

One thing I should do is make builders for when I am putting structures together. I also like the idea of methods that hide the details of what is going on, so that the tests essentially declare the intent or feature.

Friday, May 7, 2010

tee and digest output stream

I was working on storing files in Happy Archive, the way it is constructed currently I store files by writing the contents to an output stream that makes chunks, encodes, encrypts, and stores the chunks.

I also wanted a hash of the whole data stream to add to the file meta-data.

I considered making a decorator output stream that would calculate the hash of all the data that passed through it, but then also decided that it would be simpler to build a sink that does the hashing, and a tee to send the data to the two sinks (hashing, and encoding). Another argument for not using a filter is that the hashing does not change the data that is passing through.

The tee looks like this:
package org.yi.happy.archive;

import java.io.IOException;
import java.io.OutputStream;

/**
 * An output stream that writes to two output streams.
 */
public class TeeOutputStream extends OutputStream {
    private final OutputStream out1;
    private final OutputStream out2;

    /**
     * create an output stream that writes to two output streams.
     * 
     * @param out1
     *            the first stream to write to.
     * @param out2
     *            the second stream to write to.
     */
    public TeeOutputStream(OutputStream out1, OutputStream out2) {
        this.out1 = out1;
        this.out2 = out2;
    }

    @Override
    public void write(int b) throws IOException {
        try {
            out1.write(b);
        } finally {
            out2.write(b);
        }
    }

    @Override
    public void write(byte[] b) throws IOException {
        try {
            out1.write(b);
        } finally {
            out2.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        try {
            out1.write(b, off, len);
        } finally {
            out2.write(b, off, len);
        }
    }

    @Override
    public void flush() throws IOException {
        try {
            out1.flush();
        } finally {
            out2.flush();
        }
    }

    @Override
    public void close() throws IOException {
        try {
            out1.close();
        } finally {
            out2.close();
        }
    }
}

Also the hashing output stream looks like this:
package org.yi.happy.archive;

import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;

import org.yi.happy.archive.key.HashValue;

/**
 * An output stream that calculates the digest of whatever is written to it.
 */
public class DigestOutputStream extends OutputStream {
    private MessageDigest md;
    private HashValue hash;
    private long size;

    /**
     * set up an output stream that calculates the digest of whatever is written
     * to it.
     * 
     * @param md
     *            the {@link MessageDigest} to use. The {@link MessageDigest} is
     *            assumed to be freshly created and not shared.
     */
    public DigestOutputStream(MessageDigest md) {
        this.md = md;
        this.hash = null;
        this.size = 0;
    }

    @Override
    public void write(int b) throws IOException {
        if (md == null) {
            throw new ClosedException();
        }

        md.update((byte) b);
        size += 1;
    }

    @Override
    public void write(byte[] b) throws IOException {
        if (md == null) {
            throw new ClosedException();
        }

        md.update(b);
        size += b.length;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (md == null) {
            throw new ClosedException();
        }

        md.update(b, off, len);
        size += len;
    }

    @Override
    public void close() throws IOException {
        if (md == null) {
            return;
        }

        hash = new HashValue(md.digest());
        md = null;
    }

    /**
     * Get the final digest value.
     * 
     * @return the final digest value.
     * @throws IllegalStateException
     *             if {@link #close()} has not been called.
     */
    public HashValue getHash() throws IllegalStateException {
        if (hash == null) {
            throw new IllegalStateException();
        }
        return hash;
    }

    /**
     * @return the number of bytes written to this stream.
     */
    public long getSize() {
        return size;
    }
}

HashValue is a value object that represents strings of bytes that are hashes. ClosedException is an IOException.

I also noticed that there is a DigestOutputStream in the Java standard library which is an output stream filter.
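The same sink-plus-tee shape can be sketched in Python with hashlib (a rough equivalent of the idea, not the project's actual classes):

```python
import hashlib
import io

class DigestSink:
    """A write sink that hashes and counts everything written to it."""
    def __init__(self):
        self._md = hashlib.sha256()
        self.size = 0

    def write(self, data):
        self._md.update(data)
        self.size += len(data)

    def hexdigest(self):
        return self._md.hexdigest()

class TeeSink:
    """A write sink that forwards each write to two other sinks."""
    def __init__(self, out1, out2):
        self.out1 = out1
        self.out2 = out2

    def write(self, data):
        self.out1.write(data)
        self.out2.write(data)

digest = DigestSink()
store = io.BytesIO()  # stands in for the chunking/encoding sink
tee = TeeSink(digest, store)
tee.write(b"hello ")
tee.write(b"world")
# both sinks have now seen the full stream
```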

Wednesday, May 5, 2010

pancake syrup

To go with the pancakes that I make, I make syrup too.

Take a cup (250 ml) of brown sugar, add two cups of water, and boil the mix down until it just starts to foam, or thickens slightly to not pour like water off the spoon.

Remove from heat and add some vanilla for flavor and serve.

If it is too thick, boil it again and add some water.

If it is too runny, boil it again.

Tuesday, May 4, 2010

Mac OSX Quicktime Codecs

For playing video on various platforms I often use VLC. It worked well until I ran into a file using an audio codec that was not included in it. Since I installed it on this Mac from a binary package, I do not know how to add more codecs to it.

In response I tried the other players that were installed here, DivX Player and QuickTime Player; neither could handle the audio codec either, but they gave a useful error message, which I searched for and found the Perian codec pack for Mac OS X.

It worked perfectly, all the media that I have is now recognized by the player that came with the computer.

Tuesday, April 27, 2010

ubuntu custom kernel build script

As I customized my kernel on a ubuntu laptop, I found myself looking up and running the same commands each time I made a change, so I wrote myself a little script to do the work.

#!/bin/bash
cd /usr/src/linux
make-kpkg clean
INSTALL_MOD_STRIP=1 CONCURRENCY_LEVEL=3 make-kpkg --initrd \
 --append-to-version=-custom kernel_image kernel_headers modules_image

I put the script in my path and called it build-kernel, so I could use it with
sudo build-kernel

This made things easy.

Monday, April 26, 2010

searching for many lines in a text file

I spent the day working on my archive system project trying to improve the speed of searching the indexes for batches of keys.

I remember this being slow even when using a relational database, taking several minutes when the list and index were both in the database. If the list was not in the database it was painfully slow.

I have the indexes, sorted by key, one file per volume (768 files).

I have a list of keys to find in a file (24309 entries).

I tried a binary search of each index file for each key; it takes about 27 seconds per file, and I have not wanted to wait to see how long the entire search takes (a quick estimate is 6 hours). I would expect that if the indexes were merged into one sorted file it would still take about 27 seconds to search it and get the whole answer, but that merge is extra management to deal with. I am already caching reads in the binary search; the caching only improved the performance slightly when it was added.

To do a straight linear scan of all the files and pick out the lines I am looking for takes 177 seconds (almost 3 minutes).

Just reading all of the files with no processing
cat */* > /dev/null
takes 63 seconds.

The size of the data set on disk is almost 3G.
excited:~/archive.d/index$ du -h
1.5G ./offsite
1.4G ./onsite
2.9G .

The total number of lines of text in the data set is 26283702.

I am not even sure it is possible to do better than the linear scan without merging all of the files together, which I would rather not do, to save storing the volume-set and name fields.
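The linear scan can be sketched like this (a simplification of the idea; it assumes tab-separated index lines with the key in the first column, which may not match the real file layout exactly):

```python
# Read every index line once and keep the ones whose key is in the
# wanted set; set membership makes each line a constant-time check.
def scan_index(lines, wanted):
    found = []
    for line in lines:
        key = line.split("\t", 1)[0]
        if key in wanted:
            found.append(line)
    return found

lines = ["a\tvol1", "b\tvol1", "c\tvol2"]
hits = scan_index(lines, {"a", "c"})
```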

reformat iso date

I was given a tiny challenge, write a function that changes "yyyy-mm-dd HH:MM" into "mm/dd/yyyy".

First I wrote a test:
package org.yi.happy;

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DateReformatTest {
    @Test
    public void test1() {
        String have = DateReformat.isoToAccess("2010-02-01 14:40");

        assertEquals("02/01/2010", have);
    }

}

Then I wrote the code, and used a regular expression to do the work:
package org.yi.happy;

public class DateReformat {

    public static String isoToAccess(String value) {
        return value.replaceAll(
                "^(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d) \\d\\d:\\d\\d$",
                "$2/$3/$1");
    }

}

All it does is capture some groups of digits, then re-arranges them. If it does not match it just returns the original string.
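The same transformation can be sketched in Python with re.sub, which also returns the original string when the pattern does not match:

```python
import re

def iso_to_access(value):
    # capture year, month, day; require (and drop) the time;
    # rearrange the groups into mm/dd/yyyy
    return re.sub(
        r"^(\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}$",
        r"\2/\3/\1",
        value)
```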

Saturday, April 24, 2010

binary search of a sorted text file

This is my second time implementing a binary search of a sorted text file; the previous time was many years ago in Perl, this time it is in Java.

Here is the code to do a binary search of a sorted text file.

/**
 * Find the position of the start of the first line in the file that is
 * greater than or equal to the target line, using a binary search.
 * 
 * @param file
 *            the file to search.
 * @param target
 *            the target to find.
 * @return the position of the first line that is greater than or equal to
 *         the target line.
 * @throws IOException
 *             if reading the file fails.
 */
public static long search(RandomAccessFile file, String target)
        throws IOException {
    /*
     * because we read the second line after each seek there is no way the
     * binary search will find the first line, so check it first.
     */
    file.seek(0);
    String line = file.readLine();
    if (line == null || line.compareTo(target) >= 0) {
        /*
         * the start is greater than or equal to the target, so it is what
         * we are looking for.
         */
        return 0;
    }

    /*
     * set up the binary search.
     */
    long beg = 0;
    long end = file.length();
    while (beg <= end) {
        /*
         * find the mid point.
         */
        long mid = beg + (end - beg) / 2;
        file.seek(mid);
        file.readLine();
        line = file.readLine();

        if (line == null || line.compareTo(target) >= 0) {
            /*
             * what we found is greater than or equal to the target, so look
             * before it.
             */
            end = mid - 1;
        } else {
            /*
             * otherwise, look after it.
             */
            beg = mid + 1;
        }
    }

    /*
     * The search falls through when the range is narrowed to nothing.
     */
    file.seek(beg);
    file.readLine();
    return file.getFilePointer();
}

I tested this on a 33 MB text file; it took about 15 ms to find a line where grep took 600 ms. I will be trying to make this faster using caching, and will likely post the result later.
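The same "first line greater than or equal to the target" search is easy to show over an in-memory sorted list in Python with the bisect module (a sketch of the search idea only, not the byte-offset file handling above):

```python
import bisect

# Sorted lines, as they would appear in the index file.
lines = ["apple", "banana", "cherry", "date"]

# bisect_left returns the index of the first element that is
# greater than or equal to the target -- the same contract as
# the file search above.
pos = bisect.bisect_left(lines, "banana")
```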

Friday, April 23, 2010

reordering columns in a text grid

I just wanted to quickly move a column around in a grid file (tab separated text data file), and had to look up a few commands to do it.

cut - I use this program quite often to select columns from a grid.

paste - join two files line by line with corresponding lines separated with a tab.

file=2007-01-23-1520
out=index.txt
paste <(cut -d $'\t' -f 3 $file) <(cut -d $'\t' -f 1,2,4,5,6 $file) | sort > $out

Basically the two cuts run in parallel and are piped into paste, which reads a line at a time from each and outputs the pasted lines; the rest of the command line sorts the result and saves it to a file.

I am doing this to try and do a binary search on the text file.
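The same column move can be sketched in Python, one line at a time (column numbers are 1-based to match cut -f):

```python
# Put column 3 first, then columns 1, 2, 4, 5, 6, matching the
# paste/cut pipeline above.
def reorder(line, order=(3, 1, 2, 4, 5, 6)):
    fields = line.split("\t")
    return "\t".join(fields[i - 1] for i in order)

row = "a\tb\tc\td\te\tf"
moved = reorder(row)
```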

Monday, April 19, 2010

finding and visualizing disk usage

I was working with someone to clean up their hard disk because it was getting full, and it was not very apparent where the space was going, so I introduced them to WinDirStat, and it showed the sore spots for usage.

If you are on Linux, the equivalent that came first is KDirStat.

I like it when cleaning up, but usually I end up using
du | sort -n
and working my way from the bottom of the list.

Sunday, April 18, 2010

large set calculations on the command line

I was working on Happy Archive a little bit, burning some backup disks.

To figure out what to put on a disk, I need to figure out which keys are not stored in a particular volume set, and not stored on any volume set.

There are two volume sets, offsite and onsite.

I have a list of keys in the store (store.lst), and a table of what keys from the store exist on what volumes (volume.grid: volume-set, volume, file-name, key).

First I make a list of what keys are in each volume set already:
grep $'^onsite\t' volume.grid | cut -d $'\t' -f 4 | sort > on-onsite.lst
grep $'^offsite\t' volume.grid | cut -d $'\t' -f 4 | sort > on-offsite.lst
Then I take those sets away from the store list (outstanding = store - exist).
sort on-onsite.lst | comm -13 - store.lst > outstanding-onsite.lst
sort on-offsite.lst | comm -13 - store.lst > outstanding-offsite.lst
Now I take the intersection of those sets to find the keys not on either disk set, and the difference to find the keys that are only on one of the disk sets.
comm -12 outstanding-onsite.lst outstanding-offsite.lst > outstanding-common.lst
comm -23 outstanding-onsite.lst outstanding-offsite.lst > outstanding-just-onsite.lst
comm -13 outstanding-onsite.lst outstanding-offsite.lst > outstanding-just-offsite.lst
And finally make the list in the order of preference for burning, giving priority to keys that are not on the backup disks yet.
cat outstanding-common.lst outstanding-just-onsite.lst > outstanding-onsite.lst
cat outstanding-common.lst outstanding-just-offsite.lst > outstanding-offsite.lst

I am going to be embedding this into a Java program, but in a pinch this data processing is not hard to do.

There were 141k keys in the store list, and 80k keys in the resulting outstanding lists, and it did not take long to run these commands.
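The same set algebra is easy to mirror with Python sets (a toy-sized sketch; the real lists are files of keys, one per line):

```python
# store = all keys; on_* = keys already on each volume set
store = {"k1", "k2", "k3", "k4"}
on_onsite = {"k1", "k2"}
on_offsite = {"k1", "k3"}

# outstanding = store - exist   (like comm -13)
outstanding_onsite = store - on_onsite
outstanding_offsite = store - on_offsite

# intersection: not on either disk set   (like comm -12)
common = outstanding_onsite & outstanding_offsite
# difference: only missing from one set  (like comm -23 / -13)
just_onsite = outstanding_onsite - outstanding_offsite
just_offsite = outstanding_offsite - outstanding_onsite
```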

Wednesday, April 14, 2010

streaming radio capture

I was offered the challenge of capturing an mp3 stream of live radio for later listening, so I threw together a short shell script for that purpose.

#!/bin/sh
url=http://out1.cbc.icy.abacast.com:80/cbc-cbcr1hfx-96
time=60
out=stream-`date +%Y-%m-%d-%H%M`.mp3

curl -s -G "$url" > "$out" &
curl_pid=$!

function cleanup() {
  kill $curl_pid
}

trap cleanup EXIT

sleep $time

# cleanup gets implicitly called here

url is the raw stream address, found in the mp3 playlist; the example stream is CBC Radio Halifax.

time is the length to record in seconds.

out is what to name the output file (stream-yyyy-mm-dd-HHMM.mp3).

Tuesday, April 13, 2010

dumping a table from a sqlite database

While trying to extract the text messages from an iPhone backup, I found a need to extract a table to a text file to make it easier to work with. This extraction ended up being the final result, since it was good enough and could be imported into a spreadsheet for easy viewing.

The table extraction just does a grid escape of the results of a select statement.
#!/usr/bin/env python
import sys, sqlite3

if len(sys.argv) != 3:
  print "expected database file name and table name"
  sys.exit()
file = sys.argv[1]
table = sys.argv[2]

def grid_escape(x):
  if x is None:
    return "\\N"
  x = unicode(x)
  return x.replace("\\", "\\\\").replace("\t", "\\t").replace("\n","\\n")

conn = sqlite3.connect(file)
c = conn.cursor()

c.execute('select * from ' + table)
print "\t".join([col[0] for col in c.description]).encode('utf-8')
for row in c:
  print "\t".join(map(grid_escape,row)).encode('utf-8')

conn.close()
The grid escape function demonstrates how simple it is to encode that format, which is part of why I like it; it is trivial to produce in many languages.
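Decoding is just as small; here is a sketch of the matching decoder (Python 3, written for this post rather than taken from the script above):

```python
# Undo the grid escape: \\ -> backslash, \t -> tab, \n -> newline,
# \r -> carriage return, and a bare \N field means NULL (None).
def grid_unescape(x):
    if x == "\\N":
        return None
    escapes = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}
    out = []
    i = 0
    while i < len(x):
        if x[i] == "\\" and i + 1 < len(x):
            out.append(escapes[x[i + 1]])
            i += 2
        else:
            out.append(x[i])
            i += 1
    return "".join(out)
```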

Monday, April 12, 2010

setting gnome options

While searching for how to disable the face browser (list of available users on the login screen) in gnome 2.22, I found the command line method for updating system wide gnome settings.

First to disable the face browser and shutdown buttons on the gnome login screen
sudo gconftool-2 --direct \
 --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 --type bool --set /apps/gdm/simple-greeter/disable_user_list true
sudo gconftool-2 --direct \
 --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 --type bool --set /apps/gdm/simple-greeter/disable_restart_buttons true

And while initially installing the server I wanted to change the power settings for the laptop lid, ideally from the command line; I see that I can do that with these commands,
sudo gconftool-2 --direct \
 --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 --type string --set /apps/gnome-power-manager/buttons/lid_ac blank
sudo gconftool-2 --direct \
 --config-source xml:readwrite:/etc/gconf/gconf.xml.defaults \
 --type string --set /apps/gnome-power-manager/buttons/lid_battery blank
to make the closing of the lid just blank the screen instead of going to sleep. I actually set these through the power configuration applet when I did them the first time.

Sunday, April 11, 2010

samba listening to ipv4 and ipv6

I was having a problem with Windows boxes connecting to the server that is being built. After some investigating I figured out that Samba was not listening on IPv4, just IPv6.

The solution that worked for me I found on "Flameeyes's Weblog: Tip of the day: if Samba 3.4 fails to work...".

I had to add listen directives to smb.conf.
[global]
interfaces = 127.0.0.1 eth0
bind interfaces only = yes

Now the file server is operational for all the computers on the network.

simple samba configuration

I am building a new home server from a laptop with a smashed screen, and I just configured the file server part of it.

After stripping out the comments and most of the configuration directives from the stock smb.conf file, I was left with a nice short file that works. The objective was to have the home directory available for users that are configured, and a public directory for anyone; also there should be no permission restrictions on the public directory, so people can modify each other's files.

The resulting /etc/samba/smb.conf file is
[global]
dns proxy = no
map to guest = Bad User
guest account = guest
unix extensions = no

server string = Home server
log file = /var/log/samba/log.%m
max log size = 1000
syslog = 0

[homes]
browsable = no
read only = no
valid users = %S

[public]
read only = no
guest ok = yes
force user = guest
path = /export/public

I had also created a guest account for this setup.
sudo useradd guest -m -p \*

And of course the public directory to share
sudo mkdir /export
sudo mkdir /export/public
sudo chown guest /export/public

The server is behind a home router, which acts as a firewall; I would not put a configuration like this directly attached to the internet.

Saturday, April 10, 2010

dumping the list of tables in a sqlite database

I was poking through an iPhone backup for a friend, trying to recover the text message log after he accidentally deleted a bunch of messages. I found that the text messages were stored in a SQLite database. It also turned out that the backup was before the time that the messages were deleted, so most of the deleted messages were intact.

I decided to try and dump the whole database to the simple grid format discussed previously. First I needed a list of tables to dump, and I decided to make this an exercise in learning some Python database API.

Here is a Python program to dump the list of tables in a SQLite database.

#!/usr/bin/python
import sys, sqlite3

if len(sys.argv) != 2:
  print "expected database file name"
  sys.exit()
file = sys.argv[1]

conn = sqlite3.connect(file)
c = conn.cursor()

c.execute('select name from sqlite_master where type=\'table\'')
for table in c:
  print table[0]

conn.close()

Wednesday, April 7, 2010

VNC Login on Ubuntu 9.10

I am setting up a new home server to replace the existing one, in the hope of reducing power consumption. The new server is a laptop with a broken screen.

I just configured a VNC Login screen, so here are the notes that I made. (The previous post used an earlier version of Ubuntu.)

The first requirement for the VNC login screen is a XDMCP server, such as GDM. The new GDM configuration applet does not provide much, but after a quick search I found a guide on Enabling XDMCP on Karmic Koala.

I created the file /etc/gdm/custom.conf containing
[xdmcp]
Enable=true
and restarted gdm
sudo restart gdm
which enabled XDMCP.

I also installed a VNC server
sudo apt-get install vnc4server
which will provide the X server for VNC logins.

Finally, an inetd needed to be installed
sudo apt-get install xinetd
so that the vnc service can be configured.

To configure the VNC service I created the configuration file /etc/xinetd.d/vnc
service vnc
{
    type = UNLISTED
    port = 5900
    only_from = localhost 192.168.0.0/24
    disable = no
    socket_type = stream
    protocol = tcp
    wait = no
    user = nobody
    server = /usr/local/bin/xvnc-inetd
}
and the script /usr/local/bin/xvnc-inetd
#!/bin/sh
exec /usr/bin/Xvnc \
 -inetd \
 -query localhost \
 -once \
 -SecurityTypes=None \
 -pn \
 -extension XFIXES
which was named in the configuration. The script is just to keep the line lengths down, making things look cleaner.

Finally I restarted xinetd.
sudo /etc/init.d/xinetd restart

And it works.

The problems I have with the setup at the moment are that the login screen lists the users instead of asking for the name to be typed, and that it allows shutdown of the system from the login screen.

VNC Login on Ubuntu

I have a laptop running Ubuntu Desktop 9.04, which has been sitting on a shelf getting occasional remote use. I am also in the process of building a replacement home server for the house from a broken laptop, in the hope of replacing the current home server with one that uses less power. So, I went about figuring out how to get a gdm login screen over VNC, for remote graphical login.

I found that I needed to use the vnc4server package instead of the other options for vnc servers on the laptop so that the keyboard would translate properly into the remote session.

sudo apt-get install vnc4server

I already knew that I needed to turn on XDMCP, but figuring out how was a little bit of a challenge since each display manager has the option in a different place, and it also changes with versions of the display managers. I eventually found a graphical tool to turn it on under "System -> Administration -> Login Window", in there I just set "Remote -> Style -> Same as Local" and XDMCP was turned on.

Now I needed to have an inetd to launch the vncserver from, on incoming connections.

sudo apt-get install xinetd

and add a vnc service name to /etc/services (if it is not already listed in that file)

vnc 5900/tcp

and add the vnc service to xinetd (/etc/xinetd.d/vnc)

service vnc
{
        only_from = localhost 192.168.0.0/24
        disable = no
        id = vnc
        socket_type = stream
        protocol = tcp
        wait = no
        user = nobody
        server = /usr/bin/Xvnc4
        server_args = -inetd -query localhost -once -SecurityTypes=None -extension XFIXES -pn -fp /usr/share/fonts/X11/misc/,/usr/share/fonts/X11/75dpi/,/usr/share/fonts/X11/100dpi/
        log_on_failure += USERID
}

The long server_args line could be put into a script somewhere to make it shorter, if I feel the need later.

Connections are restricted to the local network only; after restarting xinetd it finally worked.

I will be setting this up again on the server that I am building, and if something is missing I will update it then.

Tuesday, April 6, 2010

ISO9660 Size Estimating

When I was coming up with the format for the backup disks I decided on a flat directory of numbered files because it was simple and usually worked well. Also it made each block addressable by the underlying operating system, externalizing some complexity, and allowed for media defects to have the least amount of impact, allowing for many partially burned disks to still be useful.

The first thought I had for filling the disks was to estimate the image size based on just the total size of the files being put on the disk, this worked well except where there were large numbers of small files (on the order of 100k files).

total = \sum{\left\lceil\frac{size}{2048}\right\rceil} \cdot 2048

For a while I used this estimator with a pad factor based on the number of files, which worked fairly well but sometimes needed adjustment.

Last June I did some research and found the ISO9660 Filesystem specifications, and wrote up a better estimator for my specific case.
  • All files are 8.3 format.
  • There are no extended formats
  • All the files are in the root directory
The resulting estimator for image size in 2 KiB blocks is
174 + \lfloor\frac{count}{42}\rfloor + \sum{\lceil\frac{size}{2048}\rceil}
where the 174 is the disk overhead including padding, super blocks, and path tables; count is the number of file names on the disk, and size is the size of each file. This can be calculated incrementally with just two variables, a count and the sum of sizes so far.

This estimator came out to be right on every time.

Some day I may even use the ECMA-119 Standard (which got approved as the ISO9660 standard) to directly write a disk image instead of populating a directory and using mkisofs to make the image, but I have more important things to do before that.

Here is that same estimator as a Perl code example, calculating 1000 instances of a 2048-byte file.
#!/usr/bin/perl -w
use strict;
use POSIX;

sub sum {
    my $out = 0;
    for(@_) {
        $out += $_;
    }
    return $out;
}

my @sizes = ( 2048 ) x 1000;
my $file_count = @sizes;

my $data_size = sum(map { ceil($_ / 2048) } @sizes);
my $dir_size = floor( $file_count / 42 ) + 1;
my $overhead = 173;

my $size = $overhead + $dir_size + $data_size;

$\ = "\n";
print $size;
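The same calculation ports directly to Python, as a quick cross-check of the formula (written for this post, not part of the original tooling):

```python
import math

def iso9660_size_blocks(sizes):
    """Estimate the ISO9660 image size in 2 KiB blocks for a flat
    directory of 8.3-named files, using the estimator above:
    174 + floor(count / 42) + sum(ceil(size / 2048))."""
    data = sum(math.ceil(s / 2048) for s in sizes)
    directory = len(sizes) // 42
    return 174 + directory + data

# 1000 files of 2048 bytes each, as in the Perl example.
blocks = iso9660_size_blocks([2048] * 1000)
```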

Monday, April 5, 2010

splitting a PDF file into one page chunks

Yesterday I found how to split and merge PDF files using the iText Java library.

Today I will post the split program, which I simplified and cleaned up from an example.

package org.yi.happy.pdf;

import java.io.FileOutputStream;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;

public class PdfSplitMain {
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("use: infile [outbase]");
            return;
        }

        String inFile = args[0];

        String outBase;
        if (args.length < 2) {
            if (inFile.endsWith(".pdf")) {
                outBase = inFile.substring(0, inFile.length() - 4) + "-";
            } else {
                outBase = inFile;
            }
        } else {
            outBase = args[1];
        }

        PdfReader reader = new PdfReader(inFile);
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                String outFile = outBase + String.format("%04d", i) + ".pdf";

                Document document = new Document();
                FileOutputStream output = new FileOutputStream(outFile);
                try {
                    PdfCopy writer = new PdfCopy(document, output);
                    document.open();
                    PdfImportedPage page = writer.getImportedPage(reader, i);
                    writer.addPage(page);
                    document.close();
                    writer.close();
                } finally {
                    output.close();
                }
            }
        } finally {
            reader.close();
        }
    }
}

Basically the input file is opened, and each page from it is written to a separate output file.

After figuring out these programs, with the intention of being able to remix PDF files along page boundaries, I learned that the Preview application of the Mac that I am using can edit PDF files very easily, so I will probably be using that instead most of the time.

Sunday, April 4, 2010

joining pdf files

I want to be able to split and join PDF files to allow me to remix virtual paper like I do physical paper, at least on page boundaries. Most of the pages I deal with come from scans, so are actually images of pages.

After doing a brief search I came across an iText tutorial on merging PDF files. I then took the example code and vastly simplified it for my own purposes.

This code requires the iText jar file and Java 5.

package org.yi.happy.pdf;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;

public class PdfJoinMain {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("use: output input...");
            return;
        }

        String[] pdfs = Arrays.copyOfRange(args, 1, args.length);
        OutputStream output = new FileOutputStream(args[0]);
        try {
            Document document = new Document();
            PdfCopy writer = new PdfCopy(document, output);
            document.open();
            for (String pdf : pdfs) {
                FileInputStream input = new FileInputStream(pdf);
                try {
                    PdfReader reader = new PdfReader(input);
                    for (int p = 1; p <= reader.getNumberOfPages(); p++) {
                        PdfImportedPage page = writer
                                .getImportedPage(reader, p);
                        writer.addPage(page);
                    }
                } finally {
                    input.close();
                }
            }
            document.close();
            writer.close();
        } finally {
            output.close();
        }
    }
}

The code opens the output PDF file, then for each input PDF file, copies each page from the input file into the output file, and finally closes the output file. There are try ... finally constructs to ensure that the files get closed. When the program is invoked with no arguments it prints help and exits.

I feel that this program may actually be counter-intuitive since the target comes first on the command line, instead of last; take copy and move (cp and mv) for instance: the target comes last on the command line.

The splitting code is similar, and will be posted tomorrow.

Saturday, April 3, 2010

grid file format

I do data analysis on the command line at times. One of the tools that gets much use is cut, which extracts delimited columns from lines. Another that gets well used is Perl, to make simple summary or translation filters.

When doing this kind of data analysis I like to use tab and newline delimited text, which I like to call grid. I find CSV harder to work with because its delimiters can also show up in the values.

grid := line*
line := field* ( cr | lf | cr lf )
field := ( [ not cr lf tab "\" ] | "\r" | "\n" | "\t" | "\\" )* | "\N"

This is actually an 8-bit clean format, any binary value can be put in the field as long as the escape character (backslash), newline characters, and tab character are escaped. Finally this format can also support database NULL values, with "\N". I typically use UTF-8 encoded text in this format.

I originally saw this format in the MySQL manual, it is the import data format.

The main reason I use this format is that it is extremely easy to parse and write, and it also works with many of the stock text processing tools that come with the shell.

Friday, April 2, 2010

reversing log

I was tutoring, and one particular type of problem came up as difficult, solving for a variable in a log.

2 = \log x
To solve this, first raise 10 to the power of each side, which will reverse the log. The hard part seems to be here.
10^2 = 10^{\log x}
and simplify (10^{\log x} = x).
100 = x

To make it harder, multiply the x by a constant inside the log.
2 = \log 2x
put both sides as exponents of 10
10^2 = 10^{\log 2x}
simplify
100 = 2x
divide both sides by 2
\frac{100}{2} = \frac{2x}{2}
simplify
50 = x

and this time add something to the x inside the log
2 = \log\left( 2x + 1 \right)
put both as exponents of 10
10^2 = 10^{\log\left( 2x + 1 \right)}
simplify
100 = 2x + 1
subtract 1 from both sides
100 - 1 = 2x + 1 - 1
simplify
99 = 2x
divide both sides by two
\frac{99}{2} = \frac{2x}{2}
simplify
\frac{99}{2} = x

What can I do to make it harder? Two layers of log? Two layers of logs of different bases? That should be plenty hard.

2 = \ln\log x
The \ln is the outside log, so it needs to be dealt with first; make both sides exponents of e
e^2 = e^{\ln\log x}
simplify
e^2 = \log x
make both sides exponents of 10
10^{e^2} = 10^{\log x}
simplify
10^{e^2} = x

And now for one that looks a bit scary, the log is in an exponent.
2 = e^{\log x}
First there is the e^x to deal with, so take the natural logarithm of both sides.
\ln 2 = \ln e^{\log x}
simplify
\ln 2 = \log x
and put both sides on an exponent of 10, as in the previous solutions
10^{\ln 2} = 10^{\log x}
simplify
10^{\ln 2} = x

It is all a matter of removing one layer at a time to get to the thing being solved for.
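The worked answers above can be spot-checked numerically (a quick verification added here, not part of the tutoring session):

```python
import math

# 2 = log10(x)           ->  x = 10**2 = 100
x1 = 10 ** 2
# 2 = log10(2x)          ->  x = 10**2 / 2 = 50
x2 = (10 ** 2) / 2
# 2 = log10(2x + 1)      ->  x = (10**2 - 1) / 2 = 99/2
x3 = (10 ** 2 - 1) / 2
# 2 = ln(log10(x))       ->  x = 10**(e**2)
x4 = 10 ** (math.e ** 2)
```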


Thursday, April 1, 2010

revealing the algebraic fallacy, 2 != 1

Previously, using only basic algebraic manipulation, I showed that 2 = 1, which is of course false. Now I am going to show where it went wrong, by doing the same steps as before, using the same forms as before.

a = b

Substitute b for a throughout the rest of the derivation, since they are the same.

b = b

we can do whatever we want to an equation as long as we do the same thing to both sides, so first add b

b + b = b + b

partially simplify

2b = b + b

subtract 2b

2b - 2b = b + b - 2b

partially group like terms

2b - 2b = b + (1-2)b

simplify

2b - 2b = b + (-1)b

adding a negative number is the same as subtracting

2b - 2b = b - b

factor out (b - b)

2(b - b) = 1(b - b)

divide by (b - b), which is also known as zero.

\frac{2(b - b)}{b - b} = \frac{1(b - b)}{b - b}

re-factor to isolate the one, now we have zero over zero, anything can happen

2\left(\frac{b - b}{b-b}\right) = 1\left(\frac{b - b}{b-b}\right)

reduce the one, and we are just not allowed to get here

2(1) = 1(1)

simplify

2 = 1

Division by zero is not allowed, and zero over zero is just undefined.
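As an aside (my own addition, not part of the original puzzle), floating-point arithmetic makes the same point in code: dividing by zero never produces a usable number.

```javascript
// In IEEE 754 floating point, 0/0 is "not a number", and a nonzero
// value divided by zero is infinite; neither behaves like a normal value.
console.log(0 / 0);             // NaN
console.log(2 / 0);             // Infinity
console.log(0 / 0 === 0 / 0);   // false: NaN is not even equal to itself
```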

Monday, March 29, 2010

pancakes

After buying pancake mix for a while and finding it expensive, I got a recipe for pancake mix from my mother, and an extremely tolerant recipe at that.

  • First take two cups (500 ml) of flour; white or whole wheat works, rice flour alone does not work well, but 1/4 rice flour mixed with the rest white or whole wheat works well.
  • Add two tablespoons (30 ml) of baking powder; add more to get fluffier pancakes, less to get flatter ones.
  • Add two tablespoons (30 ml) of sugar; I currently use Turbinado Brown sugar (raw spun sugar cane), but any sugar works.
  • Add a bit of salt (1 ml), optional.
  • Mix well.

Now you have pancake mix.

To make the batter, just mix it with an equal part of fluid, including 1/8 part oil; increase the fluid to get thinner pancakes, decrease it to get thicker pancakes. I usually use water for the fluid, but milk works well too, even part milk. I haven't tried rice milk in the batter yet, but it should work fine as well.

Bring a frying pan to medium heat, so that a drop of water boils on contact, then pour the batter onto the pan and cook, flipping when the bottom half is cooked.

Personally I have a few common variations that I prepare.
  • Add raisins to the pancake batter, or other fruit, such as blueberries.
  • Add cocoa to the mix, for chocolate pancakes.
  • Replace a shot of the fluid with hard liquor for spiked pancakes; they go well at some parties.
  • Combinations of the above.
  • Leave out the sugar for deep-fry batter.

Enjoy.

Sunday, March 28, 2010

algebraic fallacy, 2 = 1

Using only basic algebraic manipulation, and trying not to skip any steps, I am going to show that 2 = 1 .

a = b

we can do whatever we want to an equation as long as we do the same thing to both sides, so first add a

a + a = b + a

simplify

2a = b + a

subtract 2b

2a - 2b = a + b - 2b

group like terms

2a - 2b = a + (1-2)b

simplify

2a - 2b = a + (-1)b

adding a negative number is the same as subtracting

2a - 2b = a - b

factor out (a - b)

2(a - b) = 1(a - b)

divide by (a - b)

\frac{2(a - b)}{a-b} = \frac{1(a - b)}{a-b}

re-factor to isolate the one

2\left(\frac{a - b}{a-b}\right) = 1\left(\frac{a - b}{a-b}\right)

reduce the one

2(1) = 1(1)

simplify

2 = 1

where did it go wrong? The answer will be given in a later post.

My class was given this puzzle in high school one day, and I was not the first one to see the mistake, but it was quite memorable.

Saturday, March 27, 2010

identifying multiples of 9 (nine)

I was given a simple method to identify multiples of 9 in grade school.

Given a number, it is a series of digits d_1d_2d_3...d_n.

Add up all the digits \sum_{i=1}^nd_i.

If the result has more than one digit, repeat the process on it. If the final digit is nine, then the original number is a multiple of nine.

For example, take 81, it is the digits 8 and 1, add the digits, 8 + 1, the result is 9, therefore 81 is a multiple of 9.

For another example, take 123456: its digits are 1, 2, 3, 4, 5, and 6; add the digits, 1 + 2 + 3 + 4 + 5 + 6, and the result is 21, which has more than one digit, so add its digits, 2 + 1, giving 3; therefore 123456 is not a multiple of 9.
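The whole procedure fits in a few lines of code. A small sketch (my own illustration; the function name is made up), for positive integers:

```javascript
// Repeatedly replace n with the sum of its digits until one digit
// remains; the original number is a multiple of nine exactly when
// that final digit is 9 (or the number was 0 to begin with).
function isMultipleOfNine(n) {
  while (n > 9) {
    let sum = 0;
    for (const d of String(n)) {
      sum += Number(d);
    }
    n = sum;
  }
  return n === 9 || n === 0;
}

console.log(isMultipleOfNine(81));     // true:  8 + 1 = 9
console.log(isMultipleOfNine(123456)); // false: 1+2+3+4+5+6 = 21, 2+1 = 3
```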

minimal Apache HTTPD WebDAV configuration

I recently decided to see what minimal configuration Apache HTTPD needs to allow MacOSX to attach using WebDAV; in other words, the minimal configuration to make an HTTP-based file server with Apache HTTPD.

The environment:
  • MacOSX client and server
  • The Apache HTTPD modules installed at /usr/libexec/apache2/
  • A working directory at /Users/happy/2010-03-23-dav
  • A data directory at /Users/happy/2010-03-23-dav/data
  • Apache HTTPD version Apache/2.2.13
The minimal configuration ended up being:

# start with "apachectl -f /Users/happy/2010-03-23-dav/httpd.conf -k start"

ServerRoot "/Users/happy/2010-03-23-dav"
Listen 8090

LoadModule dav_module /usr/libexec/apache2/mod_dav.so
LoadModule dav_fs_module /usr/libexec/apache2/mod_dav_fs.so

DAVLockDB logs/DAVLock
<Directory "/Users/happy/2010-03-23-dav/data">
DAV On
</Directory>

DocumentRoot "data"

PidFile logs/httpd.pid
LockFile logs/accept.lock 

This configuration does no authentication or access control, and runs the server as the user that launched it (I launched it as a regular user).

After also doing the same on a Linux machine (the only change was the path names), I found that it is the network link, rather than the protocol, that limits the file transfer speed between the two computers.

Friday, March 26, 2010

setting up blogger to format latex expressions

I want to be able to post mathematical expressions and have them formatted nicely.

After some searching I found yourequations.com which had a nice little script to do the job, and did some digging on how Google Docs was doing the rendering of the expressions.

From there I found Google Chart Tools.

After gathering all that information I wrote up a script block to add to the blog template.

To install this just add it to a Blogger layout block after the Blog Posts layout block.

<script type="text/javascript">
var tags = [ "pre", "code" ];
for(var i = 0; i < tags.length; i++) {
  var eqn = document.getElementsByTagName(tags[i]);
  for (var j = 0; j < eqn.length; j++) {
    var e = eqn[j];
    if (e.getAttribute("lang") != "eq.latex") { 
      continue;
    }
    if (e.innerHTML.match(/<img.*?>/i)) {
      continue;
    }

    var str = e.innerHTML.
      replace(/<br>/gi,"").
      replace(/<br \/>/gi,"").
      replace(/&lt;/gi,"<").
      replace(/&gt;/gi,">").
      replace(/&amp;/gi,"&");

    var url_str = escape(str).
      replace(/\+/g, "%2B");

    e.innerHTML = "<img " +
      "src=\"http://chart.apis.google.com/chart" +
      "?cht=tx&chf=bg,s,ffffff00" +
      "&chl=" + url_str + "\" " +
      "title=\"" + str + "\" alt=\"" + str + "\" " +
      "class=\"eq_latex\" align=\"middle\" " +
      "border=\"0\" />";
  }
}
</script>
After installing the layout block just enclose the expressions in <pre lang="eq.latex"> and </pre>, or <code lang="eq.latex"> and </code>. For example, <code lang="eq.latex">\int_{0}^{1}xdx</code> renders as \int_{0}^{1}xdx.

setting up blogger to format source code

I want to be able to post source code with basic syntax highlighting.

After some searching I found a JavaScript module that would do it using markup. After more searching I found a Google hosted copy of the script to use on blogger, in use on JQuery HowTo.

The software used to do this is google-code-prettify: syntax highlighting of code snippets in a web page. It keeps it simple and does what it says it does well.

To install the functionality I added the following block of code to a Blogger layout block after the Blog Posts layout block.

<!--
from: http://jquery-howto.blogspot.com/2009/02/new-code-highlighter-for-bloggercom.html
-->
<script src="http://www.gstatic.com/codesite/ph/2429932258724909799/js/prettify/prettify.js">
</script>
<script type="text/javascript">
prettyPrint();
</script>

<style type="text/css">

/* Pretty printing styles. Used with prettify.js. */

.str { color: #080; }
.kwd { color: #008; }
.com { color: #800; }
.typ { color: #606; }
.lit { color: #066; }
.pun { color: #660; }
.pln { color: #000; }
.tag { color: #008; }
.atn { color: #606; }
.atv { color: #080; }
.dec { color: #606; }
pre.prettyprint { padding: 2px; border: 1px solid #888; overflow:auto; }

@media print {
  .str { color: #060; }
  .kwd { color: #006; font-weight: bold; }
  .com { color: #600; font-style: italic; }
  .typ { color: #404; font-weight: bold; }
  .lit { color: #044; }
  .pun { color: #440; }
  .pln { color: #000; }
  .tag { color: #006; font-weight: bold; }
  .atn { color: #404; }
  .atv { color: #060; }
}
</style>

To use it in a post, just wrap the code in <pre class="prettyprint"> and </pre>, or <code class="prettyprint"> and </code>, and ensure that the ampersands and angle brackets are properly escaped as &amp;, &lt;, and &gt;.

Update: Added automatic scroll bars for wide lines.

Introduction

I have decided that it is time for me to start a professional blog. I plan to post easy-to-follow examples of things I can do in math, programming, and application configuration, among other topics that come up.