Monday, December 6, 2010

wide character text file conversion

My LG Neon phone can save all of its text messages as a text file on the memory card. However, this is not a nice UTF-8 text file, but rather one that is almost UTF-16 with reversed byte order, which confuses the text editors on my Mac.

Inspecting the raw file, I found that it follows the format
0xfe 0xff ( char 0x00 )*
where the data I want is the char bytes.
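
To make the layout concrete, here is a minimal sketch that checks a buffer against this pattern and pulls out the char bytes. The helper name strip_wide and the sample text are made up for illustration, and accepting the two header bytes in either order is an assumption of the sketch, not something the phone documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the char bytes from a buffer laid out as
# two header bytes followed by (char, 0x00) pairs, or undef on a mismatch.
sub strip_wide {
    my ($bytes) = @_;
    # Assumption: accept the header bytes 0xff and 0xfe in either order.
    return undef
        unless $bytes =~ /\A(?:\xff\xfe|\xfe\xff)((?:.\x00)*)\z/s;
    my $pairs = $1;
    # Drop the 0x00 that follows each char byte.
    $pairs =~ s/(.)\x00/$1/gs;
    return $pairs;
}

# Build a sample buffer with each header order and convert it back.
for my $header ("\xff\xfe", "\xfe\xff") {
    my $sample = $header . join('', map { $_ . "\x00" } split //, "hi there");
    print strip_wide($sample), "\n";    # prints "hi there"
}
```

A buffer with a missing header or a dangling char byte at the end makes strip_wide return undef rather than guess.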

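Following this state machine (read the two header bytes, then alternate between reading a character, writing it, and reading the null byte after it), I wrote a small Perl script to convert the file: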
#!/usr/bin/perl
use strict;
use warnings;

# Treat both streams as raw bytes.
binmode STDIN;
binmode STDOUT;

my $char;

sub unexpected;
sub read_header_2;

sub read_header_1 {
    read(\*STDIN, $char, 1) or return \&unexpected;
    ord($char) == 0xfe or return \&unexpected;
    return \&read_header_2;
}

sub read_character;

sub read_header_2 {
    # End of stream inside the header is an error, as in read_header_1.
    read(\*STDIN, $char, 1) or return \&unexpected;
    ord($char) == 0xff or return \&unexpected;
    return \&read_character;
}

sub write_character;

sub read_character {
    read(\*STDIN, $char, 1) or return undef;
    return \&write_character;
}

sub read_null;

sub write_character {
    print $char;
    return \&read_null;
}

sub read_null {
    read(\*STDIN, $char, 1) or return undef;
    ord($char) == 0x00 or return \&unexpected;
    return \&read_character;
}

sub unexpected {
    # Report on STDERR so diagnostics do not end up in the converted output.
    print STDERR "unexpected situation\n";
    if(length $char) {
        print STDERR "found character: ", ord($char), "\n";
    }
    else {
        print STDERR "found end of stream\n";
    }
    return undef;
}

# Each state returns a reference to the next state, or undef to stop.
my $state = \&read_header_1;
while($state) {
    $state = $state->();
}
This worked perfectly and implements the state machine directly.

Now I can save my text messages and have them readable.
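
Since a char byte followed by 0x00 is how little-endian UTF-16 stores code points below U+0100, the same conversion can probably also be done with Perl's core Encode module. This is an alternative sketch, not the script above: the helper name wide_to_utf8 is hypothetical, it skips the two header bytes without checking them, and it assumes the whole file fits in memory. Unlike the state machine, it re-encodes bytes above 0x7f as proper UTF-8 instead of copying them through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: drop the two header bytes, treat the rest as
# little-endian UTF-16, and re-encode it as UTF-8.
sub wide_to_utf8 {
    my ($bytes) = @_;
    return undef if length($bytes) < 2;    # too short to hold the header
    my $body = substr($bytes, 2);          # skip the header, unchecked
    my $text = decode('UTF-16LE', $body);  # byte pairs -> Perl characters
    return encode('UTF-8', $text);         # characters -> UTF-8 bytes
}

# Sample buffer using the header order the script checks (0xfe, then 0xff).
print wide_to_utf8("\xfe\xffh\x00i\x00"), "\n";    # prints "hi"
```

The state machine still has the advantage of reporting exactly where a file deviates from the expected format.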
