Saturday 7 July 2012

Pymongo and Datetimes for MongoDB

You might have read in an earlier post how I recommended storing datetime values in a database with a UTC timezone set. Well, I thought I'd go over doing this with a real database. MongoDB is quite a trendy NoSQL database at the moment, and one Python module you can use for accessing a MongoDB database is PyMongo.

Installing


Follow the steps on the MongoDB website for installing it in your OS (I'm only going through this for Linux, particularly Ubuntu). Once you've updated and altered your /etc/apt/sources.list, install it!
sudo apt-get install mongodb

Then check it's running. Hopefully, you should see an OK message.
sudo service mongodb status

Activate your virtualenv for your Python version, then install Pymongo.
source /pathtomyenv/bin/activate
pip install pymongo

I just want to sort out timezones!


OK, so here's a bit of background. Pymongo makes some helpful assumptions for you when dealing with timezone information for datetimes. It assumes you want to store datetimes in a MongoDB database with UTC timezone value, and will convert things automatically for you (although there are some options you can set). Here's a little diagram that hopefully illustrates the main ideas:



So let's see what happens when we create a datetime object with a local timezone set to it, and send it to MongoDB. I'm making a database named "school_info", with a collection inside it named "students". I want to insert a document for a student with a datetime object as a field value (for "birth_date"). I'm going to put a complete Python source listing at the end for you, but here is the relevant part first:

tzone = {"GMT": Zone(0, False, "GMT"),
            "NZST": Zone(12, False, "NZST"),
            "NZDT": Zone(12, True, "NZDT")
            }
student = {"first_name": "Jane",
    "birth_date": datetime(2003, 3, 17, tzinfo=tzone["NZDT"]),
}
...
# Open connection to MongoDB
conn = Connection()
# Database is named "school_info".
db = conn.school_info
# Insert into the "students" collection
db.students.insert(student)

This snippet will add the student record to the students collection, with the datetime value adjusted from local "NZDT" timezone to a "UTC" timezone value. When we retrieve the student document again from MongoDB, we should be able to see this:
student_check = db.students.find_one({"first_name": "Jane"})
print(student_check["birth_date"].strftime("%d/%m/%Y %H:%M:%S, timezone: %Z"))

But - the output is this!
16/03/2003 11:00:00, timezone:

-Note how there is no timezone assigned! It's a naive representation of a datetime WITHOUT its UTC timezone. The datetime value has gone back by the UTC offset value for the local timezone, which was +12+1 hours for NZDT. What makes this worse is that we can't convert this naive datetime (i.e. having no timezone assigned to it) to a local timezone-aware datetime.

The solution: try giving the retrieved datetime object a UTC timezone first, then try converting it to a local timezone value.
utc_b_date = student_check["birth_date"].replace(tzinfo=tzone["GMT"])

Then it SHOULD convert back to local the timezone:
local_b_date = utc_b_date.astimezone(tzone["NZDT"])
print(local_b_date.strftime("%d/%m/%Y %H:%M:%S, timezone: %Z"))

The output:
17/03/2003 00:00:00, timezone: NZDT

Here is the full code for this datetime/timezone storage example, as promised. (Scroll to the end of this example, and you'll see a further example which is a unittest to prove this is the way things happen.)

#!/usr/bin/env python


"""A script to show default MongoDB storage of datetime and timezone values if you don't calculate and explicitly set UTC values on datetimes before storing them. """


from datetime import datetime, tzinfo, timedelta

from pymongo import Connection


class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name
       
       
    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)
       
       
    def dst(self, dt):
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst
           
           
    def tzname(self, dt):
         return self.name


if __name__ == "__main__":
    tzone = {"GMT": Zone(0, False, "GMT"),
            "NZST": Zone(12, False, "NZST"),
            "NZDT": Zone(12, True, "NZDT")
            }
    display_patn = "%d/%m/%Y %H:%M:%S, timezone: %Z"

    student = {"first_name": "Jane",
        "birth_date": datetime(2003, 3, 17, tzinfo=tzone["NZDT"]),
    }
    print("The student will be added with this birth_date:")
    print(student["birth_date"].strftime(display_patn))
    msg = """-Note: this will be stored by Pymongo in MongoDB automatically with
a UTC timezone\n"""
    print(msg)
   
    # Open connection to MongoDB
    conn = Connection()
    # Database is named "school_info". Clear the students collection first.
    db = conn.school_info
    db.students.drop()
    # Insert into the "students" collection
    db.students.insert(student)

    student_check = db.students.find_one({"first_name": "Jane"})
    print("When the student is retrieved from MongoDB, the birth_date is:")
    print(student_check["birth_date"].strftime(display_patn))
    msg = """-Note how there is no timezone assigned! It's a naive representation of a datetime WITHOUT its UTC timezone.\n"""
    print(msg)
   
    print("Try converting the retrieved value to the local timezone:")
    try:
        local_b_date = student_check["birth_date"].astimezone(
tzone["NZDT"])
    except ValueError as e:
        msg = """--> Error: {0}.\n-This causes an error because a naive datetime can't be converted to a timezoned datetime.\n""".format(str(e))
        print(msg)

    print("Now, try giving the retrieved value a UTC timezone first.")
    utc_b_date = student_check["birth_date"].replace(
tzinfo=tzone["GMT"])
    print("Then it SHOULD convert back to local the timezone.")
    local_b_date = utc_b_date.astimezone(tzone["NZDT"])
    print(local_b_date.strftime(display_patn))

The output:
The student will be added with this birth_date:
17/03/2003 00:00:00, timezone: NZDT
-Note: this will be stored by Pymongo in MongoDB automatically with a UTC timezone

When the student is retrieved from MongoDB, the birth_date is:
16/03/2003 11:00:00, timezone:
-Note how there is no timezone assigned! It's a naive representation of a datetime WITHOUT its UTC timezone.

Try converting the retrieved value to the local timezone:
--> Error: astimezone() cannot be applied to a naive datetime.
-This causes an error because a naive datetime  can't be converted to a timezoned datetime.

Now, try giving the retrieved value a UTC timezone first.
Then it SHOULD convert back to local the timezone.
17/03/2003 00:00:00, timezone: NZDT

Let's Get Testing


If we take a local-timezoned datetime, and convert it to a UTC-timezoned datetime, that UTC datetime value should match what MongoDB stores and returns to us later, right? Right! So let's prove it with a unit test case. We can pull information back out of MongoDB, then use operations to establish the local and UTC timezones calculated upon our datetime values.


#!/usr/bin/env python


"""A simple script to test datetime timezone storage, retrieval and conversion for Pymongo."""


import unittest
from datetime import datetime, tzinfo, timedelta

from pymongo import Connection


class Zone(tzinfo):
    """ Sets some properties for the tzinfo abstract class. """
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name
       
       
    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)
       
       
    def dst(self, dt):
        if self.isdst:
            dst = timedelta(hours=1)
        else:
            dst = timedelta(0)
        return dst
           
           
    def tzname(self, dt):
         return self.name


class Tester(unittest.TestCase):


    def setUp(self):
        # Open connection to MongoDB
        conn = Connection()
        # Database is named "school_info"
        self.db = conn.school_info
       
       
    def tearDown(self):
        # Clear test data
        self.db.students.drop()
       
   
    def test_local_and_utc(self):
        """Test case! """
        tzone = {"GMT": Zone(0, False, "GMT"),
                "NZST": Zone(12, False, "NZST"),
                "NZDT": Zone(12, True, "NZDT")
                }
        display_patn = "%d/%m/%Y %H:%M:%S, timezone: %Z"

        # Local client timezone datetime
        local_b_date = datetime(2003, 3, 17, tzinfo=tzone["NZDT"])
        # Convert to a UTC timezone datetime for MongoDB storage. Because it
        # will be UTC already, it won't be converted to a different calculated
        # datetime by Pymongo.
        utc_b_date = local_b_date.astimezone(tzone["GMT"])
        student = {"first_name": "Jane",
                    "birth_date": utc_b_date,
        }
        # Insert into the "students" collection
        self.db.students.insert(student)
       
        # Retrieve student again for checking
        student_check = self.db.students.find_one({"first_name": "Jane"})
        # Set properly as UTC timezone
        utc_b_date_check = student_check["birth_date"].replace(
tzinfo=tzone["GMT"])
        # Then convert to local timezone
        local_b_date_check = utc_b_date.astimezone(tzone["NZDT"])
       
        # Check that retrieved timezone-aware data matches earlier data before storage
        self.assertEqual(local_b_date, local_b_date_check)
        self.assertEqual(utc_b_date, utc_b_date_check)
   
   
if __name__ == "__main__":
    unittest.main()

And the output:
.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK