Scripts on Scripts: June 2012

Thursday, 28 June 2012

Python, datetimes and timezones

Datetimes, UTC and Timezones

This week I was fortunate enough to be able to explore the issue of datetimes and making them timezone-aware in Python. I didn't want to use a ready-made module like pytz or dateutil to do this. My discoveries ended up not being used in production for my work, but they were very useful nonetheless!

The conventional wisdom is that datetime information for applications should be stored according to the Coordinated Universal Time (UTC) standard. This is especially important for a website with international users that stores its datetime information in a database. When the data is first captured, say by a form being filled out, it is probably in the client's local time. The application then stores it internally in UTC. When it is retrieved later, it is converted back to the client's local time. For example:

a user's client sends datetime data from the "NZDT" timezone
the app stores a calculated datetime in the database in the "UTC" timezone
when the user wants the data back, it is converted back again to a datetime for the local "NZDT" timezone.

In this way, there will always be a consistent timezone for the storage of data, and a relevant timezone for retrieved data to be displayed in.

Timezones are an interesting thing. They are based around the idea of offsets from the UTC (aka GMT) datetime value, which is always used as the base. As an example, the normal timezone for New Zealand is "NZST", which has an offset of +12 hours. You can find a list of timezone codes here. Timezones are further complicated by Daylight Savings time, which may add or subtract additional hours to the UTC offset. Once Daylight Savings kicks in in my country, the timezone becomes "NZDT" and the offset becomes +13 hours (+12 plus 1).
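The offset arithmetic can be sketched with plain timedelta values. This is only a minimal illustration and not part of my script: the names OFFSETS, to_utc and from_utc are made up for the example, and the datetimes are naive.

```python
from datetime import datetime, timedelta

# UTC offsets for the timezone codes discussed above (illustrative only).
OFFSETS = {
    "UTC": timedelta(hours=0),
    "NZST": timedelta(hours=12),
    "NZDT": timedelta(hours=13),  # +12 standard, plus 1 for Daylight Savings
}


def to_utc(local_dt, zone_code):
    """Convert a naive local datetime to a naive UTC datetime."""
    return local_dt - OFFSETS[zone_code]


def from_utc(utc_dt, zone_code):
    """Convert a naive UTC datetime back to a naive local datetime."""
    return utc_dt + OFFSETS[zone_code]


local = datetime(2012, 6, 28, 0, 0, 0)  # midnight in June, so NZST applies
stored = to_utc(local, "NZST")
print(stored)                    # 2012-06-27 12:00:00
print(from_utc(stored, "NZST"))  # 2012-06-28 00:00:00
```

Subtracting the offset gives the value to store, and adding it back gives the value to display.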

Apply this to Python

Python has an abstract class named tzinfo, which you are meant to implement yourself for setting timezones on datetime.datetime objects. After reading the Python docs for tzinfo online, I was a bit puzzled about how exactly to do this. Luckily, I found a great answer on StackOverflow (like usual) and proceeded from there. Here it is:
import datetime
from datetime import tzinfo, timedelta

class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name


    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)


    def dst(self, dt):
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst


    def tzname(self, dt):
        return self.name

I found the Daylight Savings start and end dates for my country, New Zealand, that I would need as well for calculations. These rules stated that:

"Daylight Saving commences on the last Sunday in September, when 2.00am becomes 3.00am, and it ends on the first Sunday in April, when 3.00am becomes 2.00am."

So, here is my implementation of the timezones I needed, in a dictionary. This is for NZ time without daylight savings (NZST), NZ time including daylight savings (NZDT), and UTC time (keyed as GMT).

tzone = {"GMT": Zone(0, False, "GMT"),
         "NZST": Zone(12, False, "NZST"),
         "NZDT": Zone(12, True, "NZDT"),
         }

So, say you had an ordinary datetime value like:

dt = datetime.datetime(2012, 6, 28, 0, 0, 0)

This is a timezone-naive datetime object, since there is nothing set for the optional parameter at the end, "tzinfo". To make it timezone-aware for my local timezone, which in June is NZST (Daylight Savings only runs from late September to early April), you'd do as follows:

dt = dt.replace(tzinfo=tzone["NZST"])

Then to get this back as UTC datetime:
utc_dt = dt.astimezone(tz=tzone["GMT"])

To verify this (remember the local offset is +12 hours):

print(dt.strftime("%d/%m/%Y %H:%M:%S %Z"))

28/06/2012 00:00:00 NZST

print(utc_dt.strftime("%d/%m/%Y %H:%M:%S %Z"))

27/06/2012 12:00:00 GMT
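As an aside, on Python 3.2 or later the standard library ships a ready-made fixed-offset class, datetime.timezone, so a hand-written tzinfo subclass is not needed for simple cases like this one. A minimal sketch, assuming that Python version; the names NZST and UTC below are my own choices:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones: an offset plus a display name.
NZST = timezone(timedelta(hours=12), "NZST")
UTC = timezone(timedelta(0), "UTC")

pattern = "%d/%m/%Y %H:%M:%S %Z"
dt = datetime(2012, 6, 28, 0, 0, 0, tzinfo=NZST)
utc_dt = dt.astimezone(UTC)

print(dt.strftime(pattern))      # 28/06/2012 00:00:00 NZST
print(utc_dt.strftime(pattern))  # 27/06/2012 12:00:00 UTC
```

Like the Zone class, these objects know nothing about Daylight Savings rules, so you still have to pick the right offset for the date yourself.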

Next, I wrote two functions that would give me the datetime values for the start and end of the daylight savings period for any given year, according to those pesky rules above. Then I thought: "How the heck am I going to test that my program can set a datetime with a correct timezone based on a correct daylight savings setting?" A divine inspiration hit me, and I decided to loop through all the days in the year 2012, and for each day: do just that!
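For reference, the two rule dates can be worked out with a simple day-by-day search. This is a standalone sketch and not the version in my script further down; last_sunday and first_sunday are hypothetical helper names.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() counts Monday as 0 and Sunday as 6


def last_sunday(year, month):
    """Return the date of the last Sunday in the given month."""
    # Find the last day of the month, then step back until a Sunday.
    if month == 12:
        day = date(year + 1, 1, 1)
    else:
        day = date(year, month + 1, 1)
    day -= timedelta(days=1)
    while day.weekday() != SUNDAY:
        day -= timedelta(days=1)
    return day


def first_sunday(year, month):
    """Return the date of the first Sunday in the given month."""
    day = date(year, month, 1)
    while day.weekday() != SUNDAY:
        day += timedelta(days=1)
    return day


print(last_sunday(2012, 9))   # 2012-09-30
print(first_sunday(2012, 4))  # 2012-04-01
```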

For each day between 1/1/2012 till 31/12/2012 (inclusive), my script would calculate two datetime objects: one for the datetime at midnight (00:00:00) and one for after 3am, when Daylight Savings might have kicked in. And it did just that.

So, I added some more calculations to turn each of those local datetimes into datetimes with the timezone set as UTC. This was so I could check that UTC conversion worked too. Below is the script in full, gentle reader.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from datetime import datetime, tzinfo, timedelta
from calendar import monthrange as cal_monthrange

class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name


    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)


    def dst(self, dt):
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst


    def tzname(self, dt):
        return self.name


def get_start_month_get_last_sunday(year, month, NZDT, get_naive=True):
    """ Returns a datetime.datetime object for the start of the Daylight Savings period for NZ. """
    days_in_month = cal_monthrange(year, month)[1]
    days = [datetime(year, month, day) for day in range(20, days_in_month + 1)]
    # Keep Sundays only: weekday() counts Monday as 0 and Sunday as 6.
    days = [day for day in days if day.weekday() == 6]
    last_sunday = days[-1]
    if get_naive:
        ret_val = datetime(year, month, last_sunday.day, 2, 0, 0)
    else:
        ret_val = datetime(year, month, last_sunday.day, 2, 0, 0, tzinfo=NZDT)
    return ret_val

def get_end_month_first_sunday(year, month, NZST, get_naive=True):
    """ Returns a datetime.datetime object for the end of the Daylight Savings period for NZ. """
    days = [datetime(year, month, d) for d in range(1, 8)]
    days = [day for day in days if day.weekday() == 6] # keep Sundays only (Sunday is 6)
    first_sunday = days[0]
    if get_naive:
        ret_val = datetime(year, month, first_sunday.day, 3, 0, 0)
    else:
        ret_val = datetime(year, month, first_sunday.day, 3, 0, 0, tzinfo=NZST)
    return ret_val


def is_dst(dt, tzone, dst_start, dst_end):
    """ Returns True if the datetime is within the Daylight Savings period. """
    if dt > dst_start:
        dst = True # After the September start: DST runs on into next year
    elif dt > dst_end:
        dst = False # Between the April end and the September start
    else:
        dst = True # Before the April end: DST began in the previous year
    return dst


def get_local_zoned_dt(dt, tzone, dst_start, dst_end):
    """ Returns the datetime.datetime object with its tzinfo assigned, based on the daylight savings period. """

    return (dt.replace(tzinfo=tzone["NZDT"])
            if is_dst(dt, tzone, dst_start, dst_end)
            else dt.replace(tzinfo=tzone["NZST"]))


def create_csv_for_year(start, end, dst_start, dst_end, tzone):
    """ Loop through the datetimes in a datetime range. Write data to a CSV file for inspection. """

    display_patn = "%d/%m/%Y %H:%M:%S %Z"
    one_day = timedelta(days=1)
    after_3am = timedelta(hours=3, minutes=1)
    logf = open("data/log.csv", "w")
    hdrs = "Local DT,UTC DT,Later DT,Later UTC DT"
    logf.write(hdrs + "\n")
    ind = start
    while ind <= end:
        # Get datetime values for local and UTC time.
        ind = get_local_zoned_dt(ind, tzone, dst_start, dst_end)
        ind_utc = ind.astimezone(tz=tzone["GMT"])
        # Get datetime values for local and UTC time, 3 hours later.
        later_dt = ind + after_3am
        later_dt = get_local_zoned_dt(later_dt, tzone, dst_start, dst_end)
        later_dt_utc = later_dt.astimezone(tz=tzone["GMT"])

        row = [ind.strftime(display_patn),
                ind_utc.strftime(display_patn),
                later_dt.strftime(display_patn),
                later_dt_utc.strftime(display_patn),
                ]
        logf.write(",".join(row) + "\n")
        ind += one_day
    logf.close()

if __name__ == "__main__":
    tzone = {"GMT": Zone(0, False, "GMT"),
            "NZST": Zone(12, False, "NZST"),
            "NZDT": Zone(12, True, "NZDT")
            }
    # Set timezone naive query start and end parameters.
    start = datetime(2012, 1, 1, 0, 0, 0)
    end = datetime(2012, 12, 31, 23, 59, 59)
    # Get timezone naive datetime values for the start and end of the Daylight
    # Savings Period.

    dst_start_naive = get_start_month_get_last_sunday(start.year, 9, None, get_naive=True)
    dst_end_naive = get_end_month_first_sunday(start.year, 4, None, get_naive=True)

    # Get local datetime values for query start & end.
    start = get_local_zoned_dt(start, tzone, dst_start_naive, dst_end_naive)
    end = get_local_zoned_dt(end, tzone, dst_start_naive, dst_end_naive)
    # Get local datetime values for Daylight Savings start & end.
    dst_start = get_local_zoned_dt(dst_start_naive, tzone, dst_start_naive, dst_end_naive)
    dst_end = get_local_zoned_dt(dst_end_naive, tzone, dst_start_naive, dst_end_naive)
    # Write data for datetimes in the query range to a CSV file.
    create_csv_for_year(start, end, dst_start, dst_end, tzone)
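Instead of eyeballing a CSV file, the same idea can be expressed as a check in code. The sketch below is a simplified, standalone variation and not the script above: nz_offset_hours and sunday_on_or_after are hypothetical helpers, they work on naive local wall-clock times, and they ignore the ambiguous or missing hour at each changeover.

```python
from datetime import datetime, timedelta


def sunday_on_or_after(day):
    """Return the first Sunday on or after the given datetime."""
    return day + timedelta(days=(6 - day.weekday()) % 7)


def nz_offset_hours(local_dt):
    """Return 13 during Daylight Savings and 12 otherwise."""
    year = local_dt.year
    # DST ends at 3.00am on the first Sunday in April.
    dst_end = sunday_on_or_after(datetime(year, 4, 1)).replace(hour=3)
    # DST starts at 2.00am on the last Sunday in September, which is the
    # first Sunday on or after 24 September.
    dst_start = sunday_on_or_after(datetime(year, 9, 24)).replace(hour=2)
    in_standard_time = dst_end <= local_dt < dst_start
    return 12 if in_standard_time else 13


# Walk through 2012 and record each day where the offset at 03:01 differs
# from the offset seen on the previous day.
changes = []
day = datetime(2012, 1, 1)
previous = nz_offset_hours(day)
while day.year == 2012:
    later = day + timedelta(hours=3, minutes=1)
    current = nz_offset_hours(later)
    if current != previous:
        changes.append((later, current))
    previous = current
    day += timedelta(days=1)

for when, offset in changes:
    print("%s -> UTC+%d" % (when, offset))
```

For 2012 this should report exactly two changeovers, one on 1 April and one on 30 September.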

Sunday, 17 June 2012

Mercurial, Apache and mod_wsgi

In a previous post, you might have read that I was given a mission to set up a Mercurial code repository for users within a private network, to be accessed via passwords. What this meant was, use Apache web server to serve up a Mercurial repository as a live URL, with basic HTTP authentication via passwords.

Normally I access things stored in Mercurial through ssh protocol, not HTTP, so this was something I hadn't tried in a long time. My memory doesn't stretch back more than six months! So, I wrote down what I had to do to achieve this. I needed to make Mercurial, Apache, mod_wsgi and Python play nicely together.

I thought I would share the things I did to get things going. I'm assuming that you already know something about:

Apache HTTP server
Linux, the command line and apt for packages
Python and WSGI
Mercurial version control system (cloning)

If you're interested, the server was running slightly older versions of things: Ubuntu 10.04 and Python 2.6.x. That's why I went with the slightly old mod_wsgi for Apache. Even though I did this on Ubuntu, I dare say these instructions would be almost identical for someone using Linux Mint, Crunchbang or Debian.

Here are the steps to follow, for a project named "booklister":

On the server

Make sure you have copied your project's source files over to the server first, e.g. by zipping them and using scp

Install Apache and Mercurial

apt-get install apache2 mercurial

Install the mod_wsgi interface for Python
apt-get install libapache2-mod-wsgi

Restart Apache
/etc/init.d/apache2 restart

Install the Python module for Mercurial

pip install mercurial

Create an "hgusers" group for all users who will be working with the repository

groupadd hgusers

Add users into the group, using this format:
usermod -a -G GROUPNAME USERNAME

usermod -a -G hgusers scott
usermod -a -G hgusers anotheruser

You can view the users in the new group:
cat /etc/group

Set up password access for Mercurial via Apache: create user passwords for Apache authentication using the htpasswd utility

htpasswd -cb /etc/apache2/hgpasswd scott scottspassword
htpasswd -b /etc/apache2/hgpasswd anotheruser anotherpassword

Create a directory which will hold the Mercurial repositories.
Set permissions on it for the Apache user and the hgusers group
cd /var/lib
mkdir hg_repos
chown -R www-data:hgusers hg_repos
chmod -R g+rwx hg_repos

Initialise a Mercurial repository inside that directory, and copy source project files into it. Add them all into version control.

cd /var/lib/hg_repos
hg init booklister
cd booklister
cp -r /home/scott/temp/booklister/* .
hg add

Create an hgrc file (for Mercurial) in the repository's .hg directory

vim .hg/hgrc

[ui]
username = "scott@mycompany.com"

[trusted]
groups = hgusers
users = scott, anotheruser

[web]
allow_read = scott, anotheruser
allow_push = scott, anotheruser
allow_archive = gz, zip, bz2
push_ssl=False

Create an hgweb.wsgi file (for mod_wsgi) in the repository directory

vim hgweb.wsgi

#!/usr/bin/env python

""" File for Mercurial, Apache and mod_wsgi http access.
Enable demandloading to reduce startup time.
"""

# Enable demandimport before importing hgweb, so the lazy loading applies to it.
from mercurial import demandimport
demandimport.enable()
from mercurial.hgweb import hgweb

# Path to repo or hgweb config to serve (see 'hg help hgweb')
config = "/var/lib/hg_repos/booklister"
application = hgweb(config)
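If Apache and mod_wsgi misbehave, it can help to rule out Mercurial by temporarily serving a trivial WSGI application instead. The sketch below is generic WSGI and nothing specific to hgweb; the response text and the fake_start_response helper are made up for the example.

```python
def application(environ, start_response):
    """Minimal WSGI app: reply with plain text so the plumbing can be checked."""
    body = b"mod_wsgi is working\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


# Call the app directly, without a web server, to see what it returns.
captured = {}


def fake_start_response(status, headers):
    """Stand-in for the server's callback: just remember what was sent."""
    captured["status"] = status
    captured["headers"] = dict(headers)


result = application({}, fake_start_response)
print(captured["status"])  # 200 OK
print(result)
```

mod_wsgi looks for a callable named application in the script file, which is why both this sketch and hgweb.wsgi bind that name.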

Make an initial commit for the new files, with a comment

hg commit -m "First commit"

Correct permissions on the repository files that may be new or changed

cd /var/lib/hg_repos
chown -R www-data:hgusers booklister
chmod -R g+rwx booklister

Edit the Apache config file (or a virtual host file). Add Directory and Location blocks.

vim /etc/apache2/apache2.conf

WSGIScriptAlias /booklister /var/lib/hg_repos/booklister/hgweb.wsgi
WSGIDaemonProcess booklister user=www-data group=hgusers threads=1 processes=10
<Directory /var/lib/hg_repos/booklister>
Options ExecCGI FollowSymlinks
AddHandler wsgi-script .wsgi
AllowOverride None
Order deny,allow
deny from all
allow from vvv.ww.xxx.y/zz
</Directory>

ScriptAlias /booklister "/var/lib/hg_repos/booklister/hgweb.wsgi"
<Location /booklister>
AuthType Basic
AuthUserFile /etc/apache2/hgpasswd
AuthName "Book Lister project repo"
Require valid-user
</Location>

Restart Apache
/etc/init.d/apache2 restart

Locally

Clone the remote repository in a desired location. You'll now have a local copy!

cd /home/scott/ws/py
hg clone http://myserver/booklister

Enter your user name and password when prompted:

scott
scottspassword

Create a local rc file for Mercurial. You can do this in your home ~ directory, or in the cloned project's folder. I usually put mine in the project's folder.

cd booklister
vim .hg/hgrc

[ui]
username = Scott Davies <scott@mycompany.com>
password = scottspassword

[paths]
default = http://myserver/booklister

Try making a change to a file, then try to push the change to the repository

hg commit -m "Test change"
hg push

pushing to http://myserver//booklister
http authorization required
realm: Book Lister project repo
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Wednesday, 13 June 2012

Latest Python on Crunchbang Linux

Introductory Spiel

It's been an interesting week and a half for me. I've been writing a system script in Python for backing up some source code files, transferring them between various servers and then setting them up in the Mercurial version control system as repositories. Because of this mission, I have (re)discovered the wonderful usefulness of a number of Python modules, including:

subprocess: for setting up an ssh connection and port forwarding.
ElementTree: for storing server logins and directory paths in XML.
paramiko: for ssh logins, executing Bash scripts and sftp transfers.
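To give a flavour of the ElementTree part, here is a minimal sketch of reading server details from XML. The XML layout, the host name and the field names are all invented for illustration; my real config file looked different.

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout: one <server> element per machine.
CONFIG_XML = """
<servers>
    <server name="staging">
        <host>staging.example.com</host>
        <user>scott</user>
        <repo_dir>/var/lib/hg_repos</repo_dir>
    </server>
</servers>
"""

root = ET.fromstring(CONFIG_XML)
servers = {}
for node in root.findall("server"):
    # Index each server's settings by its name attribute.
    servers[node.get("name")] = {
        "host": node.findtext("host"),
        "user": node.findtext("user"),
        "repo_dir": node.findtext("repo_dir"),
    }

print(servers["staging"]["host"])      # staging.example.com
print(servers["staging"]["repo_dir"])  # /var/lib/hg_repos
```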

Anyway, the end result of all this was that I needed to test out some user and ssh logins with keys. (I'll also need to test HTTP password access to a Mercurial repository, done through Apache and mod_wsgi. But I digress!) I didn't want to mess up my nice, tidy Ubuntu install, so I thought: give me a virtual machine running a lightweight Linux with Python. I'll be able to mess that up all I like by setting up dummy user accounts!

After trying out a few lightweight distros I settled on Crunchbang Linux as the Linux to install in my VM. It's derived from Debian, a distro I used to like a lot. By default, it runs the lightweight Openbox window manager and uses only an awesomely low amount of RAM. So I installed Crunchbang 10, ran most of the options in the handy post-installation script, and then...

I realised it has the same problem I usually had with Debian, which is that its default Python installed version is rather old, i.e. 2.6.6 from over 16 months ago. Right, I thought, time to solve this issue. What I did was download and compile Python 2.7.3 from source, then make a shortcut to it.

Do the Python Dance

Here are the steps I followed, gentle blog reader:

Switch to root-ish user:

sudo -i

(Enter password)

Install Debian packages for Python:

apt-get install build-essential zlib1g-dev libncurses5-dev

apt-get install libgdbm-dev libbz2-dev libreadline5-dev

apt-get install libssl-dev libdb-dev libxml2 libxslt1-dev

apt-get install python-setuptools python-dev python-pip

# If you know you'll inevitably use MySQLdb too:

apt-get install libmysqlclient-dev

pip install virtualenv

There is a problem with a libsqlite3 package version for Python. To solve it, start Synaptic and search for libsqlite3-0. (Note: this removes the Iceweasel browser, but this sucks anyway.)

Go to >Packages >Force version >3.7.3-1 (stable)

Click >Apply, and apply again

Then install the trouble-maker in the terminal:

apt-get install libsqlite3-dev

Browse to a directory, like /home/scott/downloads. Download the latest Python source, and unzip it.

wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

tar xvf Python-2.7.3.tgz

I wanted to store my latest Python version in the /opt directory. So I moved it there.

mv Python-2.7.3 /opt

cd /opt/Python-2.7.3/

Then run the compile and install commands. (These can take a while.)

./configure --prefix=/opt/python27 --enable-shared

make

make install

Linux needs to find the new Python shared library file. Test that you can run the Python 2.7 interpreter with:

LD_LIBRARY_PATH=/opt/python27/lib /opt/python27/bin/python

If this works, you will want to make it work permanently by:

(a) editing shell profile files:

vim ~/.profile (for a non-root user, vim ~/.bashrc instead)

add this:

export LD_LIBRARY_PATH=/opt/python27/lib

(b) Setting this in the config file so the system can find the library information:

vim /etc/ld.so.conf

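add this line (just the directory path, since the include keyword in this file is for pulling in other config files):

/opt/python27/lib

then refresh the linker cache:

ldconfig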

(c) Quitting out of a shell, then starting it up again. Then type:

/opt/python27/bin/python

-you should see the Python 2.7 prompt!

Make a link (shortcut) to run Python 2.7.

ln -s /opt/python27/bin/python /usr/bin/python27

-then you can just type python27 to get the Python you need and love! :-D

Afterwards, it's a good idea to quit being root user, and as a normal user set up a virtualenv that uses your new Python version only. I created my virtualenv in my /home/scott/code/py folder, and named it very originally, env27.

cd ~/code/py

virtualenv -p /opt/python27/bin/python env27

cd env27

source bin/activate

python

Python 2.7.3 (default, Jun 13 2012, 22:26:54)

[GCC 4.4.5] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>

Monday, 4 June 2012

Why I Like Pylint!

I was playing around last week with some code checkers for Python. Three of the main ones seem to be PyChecker, Pylint and Pyflakes. The one I decided to go with was Pylint. Its package can be installed quickly via pip or easy_install.

The great thing I found about a code validator is that it points out the mistakes, or the things that are wrong or unnecessary in your code, that you might have missed yourself. When a file gets above 250 lines for me, I have to admit I start to forget some of the things I might have written at the beginning. Pylint finds, for example, variables that are created or even assigned values, but never used. It also pointed out to me that it is dangerous to use a list as a default argument in a method signature, e.g. def my_method(self, my_val=[]), because the same list object is shared between calls. (In this case, it's better to use something like my_val=None.) I found I also have a nasty habit of making "wild imports" of the contents of modules when I'm tired, i.e. a statement like from my_module import *, which is bad! It's better to be specific, like from my_module import a, b, c.
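The default list problem is easy to see in a few lines. This is a small standalone demonstration, with function names made up for the example:

```python
def append_bad(item, items=[]):
    """The default list is created once and shared between calls."""
    items.append(item)
    return items


def append_good(item, items=None):
    """A fresh list is created on each call that doesn't pass one in."""
    if items is None:
        items = []
    items.append(item)
    return items


first_bad = append_bad(1)
second_bad = append_bad(2)
print(second_bad)               # [1, 2]
print(first_bad is second_bad)  # True

first_good = append_good(1)
second_good = append_good(2)
print(first_good, second_good)  # [1] [2]
```

The second call to append_bad still sees the item added by the first call, which is rarely what you want.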

The first thing I did with Pylint was follow the easy tutorial on the official site. This gives you some common things to try out on the command line. I found Pylint was much easier to run on the CLI (especially when you're first learning) than integrated with Eclipse and Pydev, which is my usual IDE of choice. You just run Pylint on a Python file and view the output, which typically is a long ASCII report with more information than you'll really care about. It does, however, give you a sometimes funny comment on the quality of your code. My first result for this was:

Global evaluation
-----------------
Your code has been rated at -2.5/10

Great eh? :-D Encouraging for the first time validation of any of my Python scripts! After fixing five of the things it warned me about, I did get that rating back to 10/10.

Anyway, after playing around I found it was helpful to run files on the CLI with these two options:

pylint --reports=n --include-ids=y my_file.py

Then of course, I found there is a .pylintrc config file you can set in your home directory to contain the most common options you want Pylint to use by default every time you run it (so you won't have to put token options in your pylint command statement).

To create it, you can copy the example .pylintrc from the examples directory in the Pylint source and paste it, or generate one with pylint --generate-rcfile (which I didn't do; call me a geek, but I'm starting to love reading the source of OSS programs).

vim ~/.pylintrc

In the REPORTS section, I altered these:

output-format=colorized

include-ids=yes

reports=no

The next time I ran Pylint on a file, it found the configuration file for my user and ran using the options set in there. But then I also realised that sometimes I will want to run Pylint on all the Python files in a package or directory, not just one. So, I needed a second .pylintrc file for this, because I wanted it to write text files containing the Pylint checking output.

cp ~/.pylintrc ~/.pylintrc2
vim ~/.pylintrc2

I went to the REPORTS section again, and altered these options:

files-output=yes
output-format=text

When running Pylint again, I needed to point it to the second .pylintrc file instead of the default. I then received text output files (one for each Python file) containing lots of lovely warning messages. I ran Pylint on all the Python scripts in a directory (and its subdirectories!):

pylint --rcfile=~/.pylintrc2 my_python_dir

Well, now I've been using Pylint for a week, I can't imagine ever not using it! It's so helpful and invaluable at finding the bad things I miss myself. I guess it does have its downside, which is sometimes providing an overload of warnings about issues that won't actually stop a program running. It also can't find some things that are imported, e.g. functions from a wildcard import statement. However, it's fast and reliable and follows PEP8 recommendations. Merci beaucoup, Logilab Pylint people!