If you've used the Django framework for your web scripts, you'll appreciate how powerful, flexible and easy to use it is. For me, it encapsulates the Unix ethos of simplicity combined with power, and I've been inclined to use it for every web project possible, including this website.
Sometimes, I need to write command-line scripts to populate a database with information from some other source, like a webpage or third-party database. Since Django uses MySQL (among others) for its backend storage, it's fairly easy to use Perl or any other language for this task. Problem is, it's very boring and fiddly, so now I tend to use Django instead, which is really easy.
A couple of jobs ago, I was asked to put together a script which would monitor Amazon's website for price changes, and email shop managers when this occurred. The idea was they could then match Amazon's price on selected key titles. As an aside, I think this is a bit rude and sneaky, but I'm just the programmer.
The Django model looks like this:
from django.db import models
from intranet.phonebook.models import Location

class Book(models.Model):
    isbn = models.CharField(maxlength=16)
    title = models.CharField(maxlength=255)
    shop = models.ManyToManyField(Location, filter_interface=True)

    def __str__(self):
        return self.title + ' [' + self.isbn + ']'

    class Meta:
        ordering = ['title', 'isbn']

    class Admin:
        list_display = ['title', 'isbn']
        ordering = ['title', 'isbn']
        list_filter = ['shop']
        search_fields = ['title', 'isbn']

class Price(models.Model):
    amount = models.FloatField(max_digits=6, decimal_places=2)
    date = models.DateField()
    book = models.ForeignKey(Book)

    def __str__(self):
        return str(self.amount)

    class Meta:
        ordering = ['date']

    class Admin:
        list_display = ['book', 'date', 'amount']
        ordering = ['book', 'date', 'amount']
        date_hierarchy = 'date'
        search_fields = ['book', 'amount']
        list_filter = ['book']
That's all fairly standard stuff, but if you need to check the details take a look at the Django Project's documentation page.
The Location object is imported from the company phonebook. For reference, it looks like this:
class Location(models.Model):
    division = models.ForeignKey(Division)
    region = models.ForeignKey(Region)
    name = models.CharField(maxlength=100)
    delivery_address = models.TextField(blank=True)
    postal_address = models.TextField()
    phone = models.CharField(maxlength=50)
    fax = models.CharField(maxlength=50, blank=True)
    map = models.URLField(blank=True)
    aerial_image = models.URLField(blank=True, verify_exists=False)

    def __str__(self):
        return self.name

    class Meta:
        ordering = ['name']

    class Admin:
        list_display = ['name']
        ordering = ['name']
The details of each book are filled in from the standard Django admin interface.

Command-line script
The command-line script is written in Python, as is Django itself. I've often heard Python derided as a "beginner's language" or "just for teaching", as though being easy to use is a drawback. As a Python beginner, I have to say it's been an easy ride so far. Anyway, the key thing is to start with a few lines of magic to make the Django database API available to your script:
import sys, os, re, urllib2, datetime, random
from time import sleep
# Preamble so we can use Django's DB API
sys.path.append('/data1/')
os.environ['DJANGO_SETTINGS_MODULE'] = 'intranet.settings'
# Load up Django
from django.db import models
from intranet.amazon.models import Price, Book
You only need import sys, os on the first line — the other modules are there for other purposes later in the code, as is from time import sleep. The next few lines merit some explanation:
| Line | What it does |
| --- | --- |
| sys.path.append('/data1/') | The "intranet" project (of which this "amazon" application is a part) lives in /data1 on my file system. Adjust to your requirements. |
| os.environ['DJANGO_SETTINGS_MODULE'] = 'intranet.settings' | The full path to the settings.py file for the "intranet" project is /data1/intranet/settings.py. This line tells Django to read the project's details, such as the database name and password, from that file. The value is a dotted Python module name, resolved against sys.path like any other import, which is why it reads 'intranet.settings' and not 'intranet/settings.py' as you might expect. We're using Python, not bash. |
That's all you need to start using the Django API in your script. Everything after that point depends on what you want to accomplish with it. In my case, I was scraping some information from Amazon's website (a book price) and recording that price against the book object/record in the DB.
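To see why the settings value is a dotted module name and not a file path, here is a minimal sketch that runs without Django installed. It builds a throwaway package (the name demo_intranet and the temporary directory are made up for illustration, standing in for "intranet" and /data1/), then performs the same ordinary import that Django carries out on the variable's value. It is written for Python 3, unlike the Python 2 script in this post.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for /data1/: a throwaway directory holding a
# package called "demo_intranet" with a settings.py inside it.
project_root = tempfile.mkdtemp()
package_dir = os.path.join(project_root, "demo_intranet")
os.mkdir(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("")
with open(os.path.join(package_dir, "settings.py"), "w") as handle:
    handle.write("DATABASE_NAME = 'demo'\n")

# Same two steps as the preamble: extend the import path, then name the
# settings module in dotted form.
sys.path.append(project_root)
os.environ["DJANGO_SETTINGS_MODULE"] = "demo_intranet.settings"

# Make sure the import system notices the files created a moment ago.
importlib.invalidate_caches()

# The variable's value is handed to a normal import, roughly like this.
settings = importlib.import_module(os.environ["DJANGO_SETTINGS_MODULE"])
print(settings.DATABASE_NAME)
```

If the directory were missing from sys.path, the import would fail with an ImportError, which is the usual symptom of getting the first preamble line wrong. Note that on Django 1.7 and later, a standalone script also has to call django.setup() after setting the variable and before importing any models.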
Here's the whole thing. It loads up a list of book objects; iterates through them, scraping a price from Amazon's UK site; then stores the price in the database. The "sleep" command is there so it doesn't hit Amazon too hard — I realise they can probably take it, but I didn't want to be too rude.
import sys, os, re, urllib2, datetime, random
from time import sleep

# Preamble so we can use Django's DB API
sys.path.append('/data1/')
os.environ['DJANGO_SETTINGS_MODULE'] = 'intranet.settings'

# Load up Django
from django.db import models
from intranet.amazon.models import Price, Book

price = re.compile(r'<td><b class="price">.(\d+\.\d{2,2})</b>')

print "Starting at: " + str(datetime.datetime.now())

proxy_handler = urllib2.ProxyHandler({'http': 'http://172.17.2.1:3128/'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)

# Iterate books; use ISBN to get price from Amazon; save price in DB
books = Book.objects.all()
for book in books:
    mvps = urllib2.urlopen('http://www.amazon.co.uk/exec/obidos/ASIN/' + book.isbn)
    page = mvps.read().split("\n")
    for line in page:
        if price.search(line):
            thenum = price.search(line)
            print "ISBN: " + book.isbn + " Price: " + str(thenum.group(1))
            price_obj = Price.objects.create(amount=thenum.group(1), date=datetime.datetime.now(), book=book)
            price_obj.save()
    sl_time = random.randint(5,60)
    print "Sleeping for " + str(sl_time) + " seconds"
    sleep(sl_time)

print "Finishing at: " + str(datetime.datetime.now())
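The fetch-and-parse pieces of the script can be tried on their own, without Django or a network connection. What follows is a rough Python 3 rendering (urllib2 became urllib.request in Python 3), not the code that actually ran. The function names make_opener and extract_price and the sample markup are invented for illustration. Unlike the loop above, which records every matching line, this sketch returns on the first match so a page containing the pattern twice yields only one price.

```python
import re
import urllib.request
from decimal import Decimal

# Same pattern as the script: the lone "." skips the currency symbol.
PRICE_RE = re.compile(r'<td><b class="price">.(\d+\.\d{2})</b>')


def make_opener(proxy_url):
    """Build a urllib opener that sends HTTP requests through proxy_url."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)


def extract_price(html):
    """Return the first price found in html as a Decimal, or None."""
    for line in html.split("\n"):
        match = PRICE_RE.search(line)
        if match:
            return Decimal(match.group(1))
    return None


# Hypothetical markup in the shape the pattern expects, not a real page.
# "\u00a3" is the pound sign.
sample_page = '<tr>\n<td><b class="price">\u00a37.99</b></td>\n</tr>'
print(extract_price(sample_page))
```

Returning a Decimal instead of the raw string keeps the two decimal places exact, which matters once prices are compared from one day to the next. The opener is only constructed here; nothing is requested until its open() method is called with a URL.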
All the real Django-specific action is contained in these two lines:
price_obj = Price.objects.create(amount=thenum.group(1), date=datetime.datetime.now(), book=book)
price_obj.save()
That's all that's required to create a new price object, relate it to the relevant book, and save it to the database. Strictly speaking, create() already saves the new object, so the explicit save() call is redundant, though harmless. Mailing the info to the shops was handled by another script, which I might post here sometime. It's a bit more complex and shows off more of the API and how it integrates with standard Python tasks. Stay tuned :)
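For anyone curious what the create() call amounts to underneath, here is a sketch using Python's built-in sqlite3 module in place of the ORM. The table and column names (amazon_book, amazon_price, book_id) follow Django's default naming for an app called "amazon", but the schema is an approximation I have assumed for illustration, and the book row is made up.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE amazon_book ("
    "id INTEGER PRIMARY KEY, isbn TEXT NOT NULL, title TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE amazon_price ("
    "id INTEGER PRIMARY KEY, amount TEXT NOT NULL, date TEXT NOT NULL, "
    "book_id INTEGER NOT NULL REFERENCES amazon_book (id))"
)

# A made-up book row standing in for one entered through the admin.
book_id = conn.execute(
    "INSERT INTO amazon_book (isbn, title) VALUES (?, ?)",
    ("0000000000", "Example Title"),
).lastrowid

# Roughly what Price.objects.create(amount=..., date=..., book=book) does:
# one INSERT carrying the book's primary key in the foreign-key column.
conn.execute(
    "INSERT INTO amazon_price (amount, date, book_id) VALUES (?, ?, ?)",
    ("7.99", datetime.date.today().isoformat(), book_id),
)
conn.commit()

# Read the price back together with the book it belongs to.
rows = conn.execute(
    "SELECT b.title, p.amount FROM amazon_price p "
    "JOIN amazon_book b ON b.id = p.book_id"
).fetchall()
print(rows)
```

The point of the comparison is how much bookkeeping the ORM hides: the foreign-key column, the parameter quoting and the commit all disappear behind a single create() call.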