Thursday 28 June 2012

Python, datetimes and timezones

Datetimes, UTC and Timezones


This week I was fortunate enough to explore datetimes in Python and how to make them timezone-aware. I didn't want to use a ready-made module like pytz or dateutil to do this. My discoveries ended up not being used in production at work, but they were very useful nonetheless!

The conventional wisdom is that datetime information for applications should be stored according to the Coordinated Universal Time (UTC) standard. This is especially important for a website with international users that stores its datetime information in a database. When the data is first captured, say by a form being filled out, it is probably in the client's local time. The application then stores it internally in UTC. When it is later retrieved, it is converted back to the client's local time. For example:
  • a user's client sends datetime data from the "NZDT" timezone
  • the app stores a calculated datetime in the database in the "UTC" timezone 
  • when the user wants the data back, it is converted back again to a datetime for the local "NZDT" timezone. 
In this way, there will always be a consistent timezone for the storage of data, and a relevant timezone for retrieved data to be displayed in.
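
As a rough sketch of that round trip, the snippet below uses the standard library's fixed-offset datetime.timezone class (available from Python 3.2) rather than a hand-written tzinfo subclass. The sample date and the variable names are made up for illustration, and the +13 hour offset assumes the date falls within New Zealand's Daylight Savings period.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+13 offset stands in for "NZDT"; the name is only a label.
NZDT = timezone(timedelta(hours=13), "NZDT")

# 1. The client submits a datetime in its local timezone.
submitted = datetime(2012, 1, 15, 9, 30, tzinfo=NZDT)

# 2. The app converts it to UTC before storing it.
stored = submitted.astimezone(timezone.utc)

# 3. On retrieval, the stored value is converted back for display.
displayed = stored.astimezone(NZDT)

print(stored.isoformat())     # 2012-01-14T20:30:00+00:00
print(displayed.isoformat())  # 2012-01-15T09:30:00+13:00
```

The stored and displayed values describe the same instant; only the offset attached to them differs.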

Timezones are an interesting thing. They are based on the idea of offsets from UTC (aka GMT), which is always used as the base. As an example, the normal timezone for New Zealand is "NZST", which has an offset of +12 hours. You can find a list of timezone codes here. Timezones are further complicated by Daylight Savings time, which may add or subtract hours from the UTC offset. Once Daylight Savings kicks in in my country, the timezone becomes "NZDT" and the offset becomes +13 hours (+12 plus 1).
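
The offset arithmetic can be illustrated with plain timedelta values. This is only a sketch: the sample UTC value and the variable names are assumptions, and no timezone objects are involved yet.

```python
from datetime import datetime, timedelta

# A naive datetime that we agree to read as UTC (sample value for illustration).
utc_value = datetime(2012, 6, 28, 12, 0, 0)

nzst_offset = timedelta(hours=12)               # standard time: UTC+12
nzdt_offset = nzst_offset + timedelta(hours=1)  # daylight savings adds one hour: UTC+13

print(utc_value + nzst_offset)  # 2012-06-29 00:00:00
print(utc_value + nzdt_offset)  # 2012-06-29 01:00:00
```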

Apply this to Python


Python has an abstract base class named tzinfo, which you are meant to subclass yourself in order to set timezones on datetime.datetime objects. After reading the Python docs for tzinfo online, I was a bit puzzled about how exactly to do this. Luckily, I found a great answer on StackOverflow (as usual) and proceeded from there. Here it is:
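import datetime
from datetime import timedelta, tzinfo


class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset  # base UTC offset in hours, excluding daylight savings
        self.isdst = isdst    # True if this zone is a daylight savings zone
        self.name = name      # display name, e.g. "NZST"

    def utcoffset(self, dt):
        # Total offset from UTC: the base offset plus any daylight savings shift.
        return timedelta(hours=self.offset) + self.dst(dt)

    def dst(self, dt):
        # Daylight savings is a fixed one-hour shift when the flag is set.
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst

    def tzname(self, dt):
        return self.name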

I also found the Daylight Savings start and end rules for my country, New Zealand, which I would need for the calculations. The rules state:
"Daylight Saving commences on the last Sunday in September, when 2.00am becomes 3.00am, and it ends on the first Sunday in April, when 3.00am becomes 2.00am."

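One possible way to turn that rule into concrete dates is sketched below. The helper names last_sunday and first_sunday are invented for this example, and it works on plain dates only, leaving the 2am and 3am switch times aside.

```python
import calendar
from datetime import date

def last_sunday(year, month):
    """Return the date of the last Sunday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # weekday() numbers Monday as 0 and Sunday as 6.
    days_since_sunday = (d.weekday() + 1) % 7
    return d.replace(day=last_day - days_since_sunday)

def first_sunday(year, month):
    """Return the date of the first Sunday in the given month."""
    d = date(year, month, 1)
    days_until_sunday = (6 - d.weekday()) % 7
    return d.replace(day=1 + days_until_sunday)

print(last_sunday(2012, 9))   # 2012-09-30 (Daylight Savings starts)
print(first_sunday(2012, 4))  # 2012-04-01 (Daylight Savings ends)
```
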
So, here is my implementation of the timezones I needed, in a dictionary. It covers NZ time without daylight savings (NZST), NZ time including daylight savings (NZDT), and UTC (stored under the key "GMT").

tzone = {"GMT": Zone(0, False, "GMT"),
         "NZST": Zone(12, False, "NZST"),
         "NZDT": Zone(12, True, "NZDT"),
         }

So, say you had an ordinary datetime value like:
dt = datetime.datetime(2012, 6, 28, 0, 0, 0)
This is a timezone-naive datetime object, since nothing is set for the optional parameter at the end, "tzinfo". To make it timezone-aware for my local timezone (NZST, since late June falls outside Daylight Savings), you'd do as follows:
dt = dt.replace(tzinfo=tzone["NZST"])
Then, to get this as a UTC datetime:
utc_dt = dt.astimezone(tz=tzone["GMT"])

To verify this (remember the local offset is +12 hours):
print(dt.strftime("%d/%m/%Y %H:%M:%S %Z"))
28/06/2012 00:00:00 NZST
print(utc_dt.strftime("%d/%m/%Y %H:%M:%S %Z"))
27/06/2012 12:00:00 GMT
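
The same check can be run for a date in January, when Daylight Savings is in effect in New Zealand and the offset is +13 hours. This is a sketch only: the Zone class is repeated in condensed form so the snippet runs on its own, and the sample date is arbitrary.

```python
from datetime import datetime, timedelta, tzinfo

class Zone(tzinfo):
    """Fixed-offset zone, condensed from the class shown earlier."""
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name

    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)

    def dst(self, dt):
        return timedelta(hours=1) if self.isdst else timedelta(0)

    def tzname(self, dt):
        return self.name

GMT = Zone(0, False, "GMT")
NZDT = Zone(12, True, "NZDT")

# Midnight on a summer date, labelled with the daylight savings zone.
summer = datetime(2012, 1, 15, 0, 0, 0, tzinfo=NZDT)
summer_utc = summer.astimezone(GMT)

print(summer.strftime("%d/%m/%Y %H:%M:%S %Z"))      # 15/01/2012 00:00:00 NZDT
print(summer_utc.strftime("%d/%m/%Y %H:%M:%S %Z"))  # 14/01/2012 11:00:00 GMT
```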

Next, I wrote two functions that would give me the datetime values for the start and end of the daylight savings period for any given year, according to those pesky rules above. Then I thought: "How the heck am I going to test that my program can set a datetime with a correct timezone based on a correct daylight savings setting?" A divine inspiration hit me, and I decided to loop through all the days in the year 2012, and for each day: do just that!

For each day from 1/1/2012 to 31/12/2012 (inclusive), my script would calculate two datetime objects: one for midnight (00:00:00) and one for just after 3am, by which time any Daylight Savings change for that day would have kicked in. And it did just that.
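
A stripped-down sketch of that loop is shown below. It hard-codes the 2012 switch times from the rule quoted earlier and only reports the days on which the zone differs between midnight and 03:01. The zone_name helper and its boundary handling are simplifying assumptions, not the logic of the full script.

```python
from datetime import datetime, timedelta

# Assumed 2012 switch times as naive local datetimes, from the quoted rule.
dst_end = datetime(2012, 4, 1, 3, 0, 0)
dst_start = datetime(2012, 9, 30, 2, 0, 0)

def zone_name(dt):
    """Hypothetical helper: NZST between the April and September switches, else NZDT."""
    return "NZST" if dst_end <= dt < dst_start else "NZDT"

changes = []
day = datetime(2012, 1, 1)
while day <= datetime(2012, 12, 31):
    midnight = day
    after_3am = day + timedelta(hours=3, minutes=1)
    # Record only the days where the zone changes between the two sample times.
    if zone_name(midnight) != zone_name(after_3am):
        changes.append((day.date().isoformat(), zone_name(midnight), zone_name(after_3am)))
    day += timedelta(days=1)

print(changes)
# [('2012-04-01', 'NZDT', 'NZST'), ('2012-09-30', 'NZST', 'NZDT')]
```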

I then added some more calculations to turn each of those local datetimes into datetimes with the timezone set to UTC. This was so I could check that UTC conversion worked too. Below is the script in full, gentle reader.

#!/usr/bin/env python
#-*- coding: utf-8 -*-


from datetime import datetime, tzinfo, timedelta
from calendar import monthrange as cal_monthrange


class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name
       
       
    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)
       
       
    def dst(self, dt):
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst
           
           
    def tzname(self, dt):
        return self.name
        
        
def get_start_month_get_last_sunday(year, month, NZDT, get_naive=True):
    """ Returns a datetime.datetime object for the start of the Daylight Savings period for NZ. """
    days_in_month = cal_monthrange(year, month)[1]
    days = [datetime(year, month, day) for day in range(20, days_in_month + 1)]
    days = [day for day in days if day.weekday() == 6]  # keep Sundays only (Monday is 0, Sunday is 6)
    last_sunday = days[-1]
    if get_naive:
        ret_val = datetime(year, month, last_sunday.day, 2, 0, 0)
    else:
        ret_val = datetime(year, month, last_sunday.day, 2, 0, 0, tzinfo=NZDT)
    return ret_val


def get_end_month_first_sunday(year, month, NZST, get_naive=True):
    """ Returns a datetime.datetime object for the end of the Daylight Savings period for NZ. """
    days = [datetime(year, month, d) for d in range(1, 8)]
    days = [day for day in days if day.weekday() == 6]  # keep Sundays only (Monday is 0, Sunday is 6)
    first_sunday = days[0]
    if get_naive:
        ret_val = datetime(year, month, first_sunday.day, 3, 0, 0)
    else:
        ret_val = datetime(year, month, first_sunday.day, 3, 0, 0, tzinfo=NZST)
    return ret_val
   

def is_dst(dt, tzone, dst_start, dst_end):
    """ Returns True if the datetime is within the Daylight Savings period. """
    if dt > dst_start:
        dst = True  # In DST zone for next year
    elif dt > dst_end:
        dst = False # Not in DST zone for current year
    else:
        dst = True  # In DST zone started in previous year
    return dst
   

def get_local_zoned_dt(dt, tzone, dst_start, dst_end):
    """ Returns the datetime.datetime object with its tzinfo assigned, based on the daylight savings period. """
    return (dt.replace(tzinfo=tzone["NZDT"])
            if is_dst(dt, tzone, dst_start, dst_end)
            else dt.replace(tzinfo=tzone["NZST"]))
           
           
def create_csv_for_year(start, end, dst_start, dst_end, tzone):
    """ Loop through the datetimes in a datetime range. Write data to a CSV file for inspection. """
    display_patn = "%d/%m/%Y %H:%M:%S %Z"
    one_day = timedelta(days=1)
    after_3am = timedelta(hours=3, minutes=1)
    hdrs = "Local DT,UTC DT,Later DT,Later UTC DT"
    # The "data" directory must already exist; the file is closed automatically.
    with open("data/log.csv", "w") as logf:
        logf.write(hdrs + "\n")
        ind = start
        while ind <= end:
            # Get datetime values for local and UTC time.
            ind = get_local_zoned_dt(ind, tzone, dst_start, dst_end)
            ind_utc = ind.astimezone(tz=tzone["GMT"])
            # Get datetime values for local and UTC time, 3 hours and 1 minute later.
            later_dt = ind + after_3am
            later_dt = get_local_zoned_dt(later_dt, tzone, dst_start, dst_end)
            later_dt_utc = later_dt.astimezone(tz=tzone["GMT"])

            row = [ind.strftime(display_patn),
                   ind_utc.strftime(display_patn),
                   later_dt.strftime(display_patn),
                   later_dt_utc.strftime(display_patn),
                   ]
            logf.write(",".join(row) + "\n")
            ind += one_day


if __name__ == "__main__":
    tzone = {"GMT": Zone(0, False, "GMT"),
            "NZST": Zone(12, False, "NZST"),
            "NZDT": Zone(12, True, "NZDT")
            }
    # Set timezone naive query start and end parameters.
    start = datetime(2012, 1, 1, 0, 0, 0)
    end = datetime(2012, 12, 31, 23, 59, 59)   
    # Get timezone naive datetime values for the start and end of the Daylight
    # Savings Period.
    dst_start_naive = get_start_month_get_last_sunday(start.year, 9, None, get_naive=True)
    dst_end_naive = get_end_month_first_sunday(start.year, 4, None, get_naive=True)
   
    # Get local datetime values for query start & end.
    start = get_local_zoned_dt(start, tzone, dst_start_naive, dst_end_naive)
    end = get_local_zoned_dt(end, tzone, dst_start_naive, dst_end_naive)
    # Get local datetime values for Daylight Savings start & end.
    dst_start = get_local_zoned_dt(dst_start_naive, tzone, dst_start_naive, dst_end_naive)
    dst_end = get_local_zoned_dt(dst_end_naive, tzone, dst_start_naive, dst_end_naive)
    # Write data for datetimes in the query range to a CSV file.
    create_csv_for_year(start, end, dst_start, dst_end, tzone)
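
A possible variation, which is not part of the script above, is to let a single tzinfo subclass decide for itself whether a datetime falls in Daylight Savings, instead of choosing between separate NZST and NZDT objects. The sketch below is only an illustration: NZZone and its helper are invented names, the rule is the one quoted earlier, and the ambiguous hour around each switch is not handled.

```python
from datetime import datetime, timedelta, tzinfo

def _sunday_on_or_after(dt):
    """Move forward to the nearest Sunday (weekday() == 6), keeping the time."""
    return dt + timedelta(days=(6 - dt.weekday()) % 7)

class NZZone(tzinfo):
    """Hypothetical zone that picks NZST or NZDT from the datetime itself."""
    def _in_dst(self, dt):
        naive = dt.replace(tzinfo=None)
        # First Sunday in April at 3am, and last Sunday in September at 2am.
        dst_end = _sunday_on_or_after(datetime(dt.year, 4, 1, 3))
        dst_start = _sunday_on_or_after(datetime(dt.year, 9, 24, 2))
        return not (dst_end <= naive < dst_start)

    def utcoffset(self, dt):
        return timedelta(hours=12) + self.dst(dt)

    def dst(self, dt):
        return timedelta(hours=1) if self._in_dst(dt) else timedelta(0)

    def tzname(self, dt):
        return "NZDT" if self._in_dst(dt) else "NZST"

nz = NZZone()
winter = datetime(2012, 6, 28, tzinfo=nz)
summer = datetime(2012, 12, 25, tzinfo=nz)

print(winter.strftime("%d/%m/%Y %H:%M:%S %Z"), winter.utcoffset())  # 28/06/2012 00:00:00 NZST 12:00:00
print(summer.strftime("%d/%m/%Y %H:%M:%S %Z"), summer.utcoffset())  # 25/12/2012 00:00:00 NZDT 13:00:00
```

With this shape, the caller attaches one zone object and never has to work out which of the two names applies.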
