Friday, October 11, 2013

Multi-media message save format and parsing

I have yet to own a phone that can save multi-media messages, so I took it upon myself to save them manually.

I use a very simple text-based key-value format to save them. Each key is free text on an unindented line, and its value is the run of lines that follow it, each indented by one space. Values may contain arbitrary text, including multiple lines.

small
 value
multi-line
 line 1
 line 2
 line 3

Using this format, I just chose some field names for the parts of the message and separated messages with a field named "done". Field names may be freely repeated.

type
 incoming
subject
 A Picture/Video Message!
date
 2012-11-27 23:38
from
 +15555555555
 Sender Name
to
 +15656565656
 My Name
text
 the text of the message goes here...
image
 imagejpeg_3.jpg
 5e35003a0e65df1065982c07587cf147ccb76b5d.jpg
done

The first line of a from or to field is the number, and the second line is the name of the contact. The first line of an image field is the file name used in the message, and the second is the file name of the image in the directory. The other field names are type, subject, date, text, and slide.
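
Writing a record in this format takes only a few lines. The following is a minimal sketch, assuming each field is held as an object with a name and a value that is an array of lines. The helper name formatRecord is made up for illustration, and the sketch assumes field names are not blank and do not start with a space.

```javascript
// Hypothetical helper: turns a list of { name, value } fields into the
// text of one record. `value` is assumed to be an array of lines.
function formatRecord(fields) {
  var lines = [];
  for(var i = 0; i < fields.length; i++) {
    // the field name goes on an unindented line
    lines.push(fields[i].name);
    for(var j = 0; j < fields[i].value.length; j++) {
      // every value line gets exactly one leading space
      lines.push(" " + fields[i].value[j]);
    }
  }
  // a field named "done" closes the record
  lines.push("done");
  return lines.join("\n") + "\n";
}

var record = formatRecord([
  { name: "type", value: ["incoming"] },
  { name: "from", value: ["+15555555555", "Sender Name"] },
  { name: "text", value: ["line 1", "line 2"] }
]);
```

Because an empty value line is written as a single space, blank lines inside a text value stay distinct from the blank lines between fields.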

I split the parsing of the records into usable data structures into two phases, both written in Google Apps Script: the first extracts the blocks of fields, and the second turns the fields into a data structure.

function forEachRecord(blob, body) {
  var fields = new Array();
  var field = { name: "", value: new Array() };
  
  // split on either Unix or Windows line endings
  var lines = blob.getDataAsString().split(/\r?\n/);
  for(var lineIndex = 0; lineIndex < lines.length; lineIndex++) {
    var line = lines[lineIndex];

    // indented lines are the field value
    if(line.substr(0, 1) == " ") {
      field.value.push(line.substr(1));
      continue;
    }
    
    // blank lines are completely ignored
    if(line == "") {
      continue;
    }
    
    // starting a new field
    field = { name: line, value: new Array() };
    
    // records end when a field name of "done" is found
    if(line == "done") {
      body(fields);
      fields = new Array();
      continue;
    }

    // this field is part of the record
    fields.push(field);
  }
  
  if(fields.length) {
    body(fields);
  }
}

function forEachMessage(blob, body) {
  forEachRecord(blob, function(fields) {
    var m = {
      type: "",
      subject: "",
      date: "",
      from: new Array(),
      to: new Array(),
      body: new Array(),
    };
    
    for(var fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
      var f = fields[fieldIndex];
      
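      if(f.name == "type") {
        m.type = f.value[0];
        continue;
      }
      
      // the subject is a single line, like the type
      if(f.name == "subject") {
        m.subject = f.value[0];
        continue;
      }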
      
      if(f.name == "to") {
        m.to.push({ address: f.value[0], name: f.value[1] });
        continue;
      }
      
      if(f.name == "from") {
        m.from.push({ address: f.value[0], name: f.value[1] });
        continue;
      }
      
      if(f.name == "date") {
        m.date = f.value[0];
        continue;
      }
      
      if(f.name == "image") {
        m.body.push({ type: "image", name: f.value[0], file: f.value[1] });
        continue;
      }
      
      if(f.name == "text") {
        m.body.push({ type: "text", value: f.value.join("\n") });
        continue;
      }
      
      if(f.name == "slide") {
        m.body.push({ type: "slide" });
        continue;
      }
      
      // if we get here we found an unknown field
    }
    
    body(m);
  });
}

Because each message is handed to a callback, the code that uses it reads much like an ordinary loop.

forEachMessage(file.getBlob(), function(message) {
  messages.push(message);
});

Turning the parsed messages into an easily human readable output is not difficult from here, and that was my overall objective.
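
As an illustration of that last step, here is a minimal sketch of a plain-text formatter. It assumes the message shape built by forEachMessage above. The names formatContact and formatMessage, and the output layout, are made up for this example.

```javascript
// Hypothetical helper: shows a contact as "Name <number>", or just the
// number when the record had no name line.
function formatContact(c) {
  return c.name ? c.name + " <" + c.address + ">" : c.address;
}

// Hypothetical formatter: turns one parsed message into readable text.
function formatMessage(m) {
  var out = [];
  out.push(m.date + " (" + m.type + ") " + m.subject);
  for(var i = 0; i < m.from.length; i++) {
    out.push("From: " + formatContact(m.from[i]));
  }
  for(var j = 0; j < m.to.length; j++) {
    out.push("To: " + formatContact(m.to[j]));
  }
  for(var k = 0; k < m.body.length; k++) {
    var part = m.body[k];
    if(part.type == "text") {
      out.push(part.value);
    } else if(part.type == "image") {
      // refer to the saved file rather than the name inside the message
      out.push("[image: " + part.file + "]");
    } else if(part.type == "slide") {
      out.push("----");
    }
  }
  return out.join("\n");
}

var sample = {
  type: "incoming",
  subject: "A Picture/Video Message!",
  date: "2012-11-27 23:38",
  from: [{ address: "+15555555555", name: "Sender Name" }],
  to: [{ address: "+15656565656", name: "My Name" }],
  body: [
    { type: "text", value: "the text of the message goes here..." },
    { type: "image", name: "imagejpeg_3.jpg", file: "5e35003a0e65df1065982c07587cf147ccb76b5d.jpg" }
  ]
};
var readable = formatMessage(sample);
```

In Apps Script the result could be passed to Logger.log or written to a document.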
