Matt from off the internets contacted me about a simple little bug in Surftrackr, which I would have caught if I'd just thought about it a bit more carefully. This is roughly equivalent to something a co-worker said to me once upon a time: "I pressed the wrong combination of keys." I did both. I failed to think, and I pressed the wrong combination of keys.
Let's suppose you upload a file called access.log via the Surftrackr admin interface. It will end up in a directory named after the date:
logs/2008/02/01/access.log
If you upload a second file with the same name, Django itself takes care of the name-clash, so the latter file will be saved as:
logs/2008/02/01/access_.log
Note the underscore in the name. Multiple underscores are added, as required, to ensure uniqueness. So far, so good.
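To make the renaming concrete, here is a minimal sketch of the idea. It is not Django's actual code; `unique_name` and the `existing` set are hypothetical names used only for illustration.

```python
import os

def unique_name(name, existing):
    """Append underscores to the stem of `name` until it is not in `existing`."""
    stem, ext = os.path.splitext(name)
    while stem + ext in existing:
        stem += '_'
    return stem + ext

# Upload a file called access.log three times into the same directory.
existing = set()
for _ in range(3):
    saved = unique_name('access.log', existing)
    existing.add(saved)
    print(saved)
```

Each pass through the loop finds the earlier names already taken, so the saved names grow by one underscore at a time.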
If you've set up your Surftrackr installation to parse logfiles using a cron task (as recommended), the program media/logfiles.py will do the following:
If you think about that, you'll see the bug: once a file has been moved out of the way, any subsequent file uploaded on the same day no longer clashes with anything, so Django lets it keep its original name, e.g. access.log, rather than renaming it access___.log or whatever. The name of the newly-uploaded file may therefore already be in preferences_filelock, in which case the new file won't be processed. Checking on the name alone is not the right way to do it.
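The failure can be reproduced in a few lines. This is only a sketch under stated assumptions: the `locked_names` set stands in for whatever preferences_filelock stores, `should_process` is a hypothetical helper, and the order of steps is inferred from the description above rather than taken from logfiles.py.

```python
def should_process(name, locked_names):
    """Name-only check: skip any file whose name has been seen before."""
    return name not in locked_names

upload_dir = set()    # names currently sitting in logs/2008/02/01/
locked_names = set()  # assumed stand-in for the names in preferences_filelock

# First upload: saved as access.log, processed, recorded, then moved away.
upload_dir.add('access.log')
first = should_process('access.log', locked_names)
locked_names.add('access.log')
upload_dir.discard('access.log')

# Second upload on the same day: the directory is empty again, so there is
# no clash and the file keeps its original name.
second_name = 'access_.log' if 'access.log' in upload_dir else 'access.log'
upload_dir.add(second_name)
second = should_process(second_name, locked_names)

print(second_name, first, second)
```

The second file is a different file with the same name, and the name-only check cannot tell the two apart.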
I thought about using MD5 hashing to checksum the uploaded files, but that's a bit expensive, particularly if your logfiles.py is fired off every few minutes. The solution is much simpler, and is included in the latest release. If you don't want to download the whole thing again, you can do this yourself:
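For comparison, a checksum-based check might look something like the sketch below. It uses Python's standard hashlib module; `md5_of_file` is a hypothetical helper, not part of Surftrackr. The cost is plain to see: every byte of every file has to be read each time the check runs.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Files are compared by content, so the name no longer matters.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'access.log')
    second = os.path.join(tmp, 'access_.log')
    for path in (first, second):
        with open(path, 'wb') as handle:
            handle.write(b'example log line\n')
    same = md5_of_file(first) == md5_of_file(second)

print(same)
```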
Edit the file preferences/models.py and locate this line:
upload_logfile = models.FileField(upload_to='logs/%Y/%m/%d', blank=True, null=True, \
Change it to this:
upload_logfile = models.FileField(upload_to='logs/%Y/%m/%d/%s', blank=True, null=True, \
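Note the /%s in the upload_to value. On systems whose strftime supports it, %s expands to the number of seconds since the epoch, so the name of the upload directory now changes every second rather than every day, which should ensure uniqueness. (%s is a platform extension rather than a standard strftime directive, so check that it works on your server.) If you make this change, it's a good idea to restart your Apache process; otherwise mod_python keeps its cache of the old code and it will take an indeterminate length of time for the new code to be used.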
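The effect of the extra path component can be seen by expanding both patterns for two uploads made five minutes apart. This is a sketch, not Django's code: `expand_upload_to` is a hypothetical helper, the timestamps are made up for the example, and %s is substituted by hand (using UTC) so that the example does not depend on the platform's strftime.

```python
import time

def expand_upload_to(pattern, timestamp):
    """Expand an upload_to-style pattern for a given Unix timestamp.

    %s (seconds since the epoch) is replaced by hand because it is a
    platform extension; the remaining directives go through strftime.
    """
    pattern = pattern.replace('%s', str(int(timestamp)))
    return time.strftime(pattern, time.gmtime(timestamp))

first_upload = 1201860000    # 2008-02-01 10:00:00 UTC
second_upload = 1201860300   # five minutes later

for pattern in ('logs/%Y/%m/%d', 'logs/%Y/%m/%d/%s'):
    print(expand_upload_to(pattern, first_upload),
          expand_upload_to(pattern, second_upload))
```

With the old pattern both uploads land in logs/2008/02/01; with the new one each gets its own directory, so the two access.log files can no longer be confused by path.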
Another thing Matt suggested is to tail the logfile on the server and load it into Surftrackr "live" rather than requiring an upload. This is a neat idea and I'll be working on it soon.
There were also a couple of omissions in the installation docs:
I feel it is worth mentioning in the "1: Requirements" section of "How to install Django and Surftrackr", that the following additional python library packages may be required...
On my Debian box, I had to do the following:
apt-get install python-mysqldb
apt-get install python-pygooglechart
I've updated the installation docs with these instructions. Thanks to Matt for this valuable feedback, and if you have any suggestions of your own or (heaven forfend!) you've spotted any other bugs or mis-features, please contact me by email.
Simon Burns
3 Feb 2008