Tuesday, May 11, 2010

compressed index searching

I am continuing to try to make searching my disk indexes faster. This time I tried compressing the index files to reduce the IO required to get them into memory; however, the run time went up.

First, the uncompressed index has a size of 3017824 K, and the times of two consecutive scans are:
real    2m35.547s
user    2m18.172s
sys     0m6.146s

real    2m36.304s
user    2m17.152s
sys     0m6.242s

Next, the compressed index has a size of 1164228 K, and the times of two consecutive scans are:
real    3m20.342s
user    3m4.939s
sys     0m8.805s

real    3m20.269s
user    3m3.866s
sys     0m8.699s
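
To put those two sets of numbers side by side, here is a rough sketch of the arithmetic in Python, using the first run of each pair. The helper and variable names are made up for illustration. The compressed index is about 39% of the original size, yet the extra user time (about 47 s) is slightly more than the extra wall-clock time (about 45 s), which suggests the scan is CPU-bound and the decompression work costs more than the saved IO.

```python
def to_seconds(minutes, seconds):
    """Convert the m/s pair printed by `time` into plain seconds."""
    return minutes * 60 + seconds

# Index sizes in K, copied from the figures above.
uncompressed_k = 3017824
compressed_k = 1164228

# First run of each pair of scans.
plain = {
    "real": to_seconds(2, 35.547),
    "user": to_seconds(2, 18.172),
    "sys": to_seconds(0, 6.146),
}
packed = {
    "real": to_seconds(3, 20.342),
    "user": to_seconds(3, 4.939),
    "sys": to_seconds(0, 8.805),
}

size_ratio = compressed_k / uncompressed_k
slowdown = packed["real"] / plain["real"]
extra = {key: packed[key] - plain[key] for key in plain}

print(f"compressed size: {size_ratio:.1%} of uncompressed")
print(f"wall-clock slowdown: {slowdown:.2f}x")
for key, value in extra.items():
    print(f"extra {key} time: {value:.3f}s")
```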

Since the system time went up, I wonder if some extra buffering would help. Also, since this machine has two cores, I wonder if I could do the scan in parallel.
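
As a minimal sketch of what "extra buffering" could look like, the Python below reads a file in large fixed-size chunks, either directly or through a decompressor. This is not the code behind these timings: the post does not say which compressor or language was used, so gzip, the `scan` function, and the 1 MiB chunk size are all assumptions for illustration.

```python
import gzip
import os
import tempfile

def scan(path, chunk_size=1 << 20, compressed=False):
    """Read the whole file in chunk_size pieces and return the bytes seen.

    gzip is an assumed stand-in for whatever compressor the index uses.
    """
    opener = gzip.open if compressed else open
    total = 0
    with opener(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)  # a real scan would search the chunk here
    return total

# Small demonstration on throwaway files.
sample = b"index entry\n" * 10000
with tempfile.TemporaryDirectory() as workdir:
    plain_path = os.path.join(workdir, "index.bin")
    packed_path = os.path.join(workdir, "index.bin.gz")
    with open(plain_path, "wb") as out:
        out.write(sample)
    with gzip.open(packed_path, "wb") as out:
        out.write(sample)
    print(scan(plain_path), scan(packed_path, compressed=True))
```

Larger chunks mean fewer read calls, which is the part of the cost that shows up as system time; they do nothing for the CPU time spent decompressing.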

Uncompressed with buffering makes it slower:
real    2m42.024s
user    2m15.048s
sys     0m6.214s

Compressed with buffering is slower too:
real    3m34.062s
user    3m3.281s
sys     0m5.488s

I ran each timing multiple times, and the times were mostly the same for each run.

So far, straight reading of the uncompressed data is the fastest way to scan the whole data set.

At a later time I will try to make a parallel version.
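
One possible shape for that parallel version, sketched in Python under several assumptions: the index is the uncompressed file, it can be cut into independent byte ranges, and the "scan" is just counting a single delimiter byte (so no match can straddle a range boundary). The names `split_ranges`, `count_in_range`, and `parallel_count` are hypothetical; the real index format is not described here.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor

def split_ranges(size, parts):
    """Split [0, size) into up to `parts` contiguous (start, end) byte ranges."""
    if size <= 0 or parts <= 0:
        return []
    step = -(-size // parts)  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]

def count_in_range(path, start, end, needle=b"\n", chunk_size=1 << 20):
    """Count occurrences of a single byte within [start, end) of the file."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            count += chunk.count(needle)
            remaining -= len(chunk)
    return count

def parallel_count(path, workers=2, needle=b"\n"):
    """Scan each byte range in its own process and add up the counts."""
    ranges = split_ranges(os.path.getsize(path), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_in_range, path, s, e, needle)
                   for s, e in ranges]
        return sum(f.result() for f in futures)

if __name__ == "__main__":
    # Throwaway file standing in for the index; two workers for two cores.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"record\n" * 1000)
    try:
        print(parallel_count(tmp.name, workers=2))
    finally:
        os.unlink(tmp.name)
```

Splitting by byte offset only works on the uncompressed file; a single gzip stream cannot be entered at an arbitrary offset, so a compressed index would have to be stored as separately compressed pieces first.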
