Understanding Test Driven Development with Django

Test-driven Development (TDD) has been getting a lot of attention these days. While I understand the importance of testing, I was sceptical of Test-Driven Development for a long time. I mean, why not Development-Driven Testing or Develop Then Test Later? I thought figuring out the tests before one can write even a single line of code would be impossible.

I was so wrong.

Test-driven Development will help you immensely in the long run. We will soon see how. We will approach TDD from a sceptical viewpoint and then try to create a simple URL shortener like bit.ly using TDD. I will conclude with my evaluation of the pros and cons of the technique, which you may jump to directly as well.

What is TDD?

Test-driven development (TDD) is a form of software development where you first write the test, run the test (which will fail first) and then write the minimum code needed to make the test pass. We will elaborate on these steps in detail later but essentially this is the process that gets repeated.

This might sound counter-intuitive. Why do we need to write tests when we know that we have not written any code and we are certain that it will fail because of that? It seems like we are not testing anything at all.

But look again. Later, we do write code that merely satisfies these tests. That means that these tests are not ordinary tests; they are more like specifications. They are telling you what to expect. These tests or specifications will directly come from your client’s user stories. You are writing just enough code to make it work.

For instance, your client needs a website (in Django) with an Article model that specifies a headline and a body. The very first thing to do in TDD would be to write a test which checks this specification. Then you run the test. Watch it fail. Then write three lines of code in models.py to create an Article class with a headline and body.

But wait a minute. Isn’t this cheating? Your client actually wanted you to create a Blog application. But your three lines are nowhere near the functionality of a blog. This is an uncomfortable thought that bothers you when you start with TDD.

Just ignore it for now. It is not that much of a big deal. In fact, even your client wouldn’t be too bothered about it. They would be happy that you are working on their application one spec at a time. They can actually watch the progress (and trust me they love that, especially management).

In any form of engineering, this is a common way of building things - there is a plan and we build it bit-by-bit based on it. Before a building is constructed, a blueprint is drawn and progress is made floor-by-floor. Prior to every mobile phone being manufactured, there are detailed CAD models to ensure that every part fits each other perfectly. So why should software engineering be any different?

How to do TDD?

Now, let’s see how test-driven development is done. Just follow this simple procedure:

1. Decide what the code will do: This is usually told to you by the client.
2. Write a test: The test should pass only if the code does that.
3. Run the test: It will fail.
4. Write some code: Just enough to make it pass.
5. Run the test: If it fails, go to step 4.
6. Refactor the code: Tests will ensure that it doesn’t break specs.
7. Rinse and Repeat: Take another spec/user-story/feature and go to step 1.
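As a rough illustration of steps 2 to 5 outside Django, here is a minimal sketch that uses only the standard unittest module. The function name `shorten` and its placeholder body are assumptions made up for this example; it is not the code we build later in this post.

```python
import unittest


def shorten(url):
    """Hypothetical function under test: just enough code to pass."""
    return ""


class ShortenTest(unittest.TestCase):
    def test_short_url_is_shorter(self):
        url = "http://www.example.com/"
        self.assertLess(len(shorten(url)), len(url))


def run_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShortenTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("passed" if result.wasSuccessful() else "failed")
```

Deleting the body of `shorten` (or the function itself) takes you back to step 3: the same test fails, which is exactly what you want to see before writing the code.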

Now that you know that tests are like specifications, you might be seeing some method in this madness. But this method is more familiar to you than you might think.

This is, in fact, quite similar to the scientific method, which is the basis of modern science. Let’s recollect what the scientific method teaches us:

1. Define a question
2. Gather information and resources (observe)
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyse the data
6. Interpret the data and draw conclusions that serve as a starting point for new hypothesis (go to step 3)
7. Publish results
8. Retest (frequently done by other scientists)

Compare this with the previous steps for TDD and notice the similarities. Tests in TDD take the role of experiments in Science. Your theory is only good if the experiments are repeatable and verifiable. You will see that the same will hold good for tests in your project’s source code when you work with other developers.

In fact, let’s look at collaborative software development happening in large open source projects. Almost all of them would have a good collection of tests. While writing test cases seems like a good idea, are there any good reasons to write tests before code?

Why do TDD?

So it seems that TDD is not an arbitrary practice after all. But we still don’t have to follow it. There are plenty of ways to develop software.

But TDD does come with its own set of advantages, some of which are obvious and some which are not:

It is live documentation that grows, lives and changes with your code.
Improves design
Catches future errors
Long-term time savings
Reduces technical debt and hence risk
Avoid manual one-off tests. Eventually, you will add and re-add test data to test by hand. Too hacky.

These are advantages gathered from various developers who practice TDD. Each of them merits a detailed explanation. But to summarise, TDD brings with it a lot of the benefits of testing by making it a mandatory part of your development cycle. You might think that you will add tests later, but sometimes you never get around to doing it.

Code written by passing simple, focused test cases tends to be more modular and hence better designed. It is a pleasant side effect of the process but you will certainly notice it.

How is TDD done?

To understand TDD better we will try to implement an entire Django project by writing test cases first and then the code. We will be creating a URL shortening service which takes a long URL and converts it into a short one (possibly for fitting into a Twitter message).

You can also watch the screencast below to see how the site is created.

User Stories

Imagine that after a long call with your client, you have distilled their needs into the following user stories:

The short URL must always be shorter than the original URL.
If you give the short URL, you must be able to recover the original URL.
The home page must have a form to enter the long URL.
Submitting the home page form should show the short URL.
Clicking on the short URL must redirect to the original (long) URL.

We have also roughly ordered the stories so that the core functionality comes first.

Create the Project

Note: You will need at least Python 2.7 and Django 1.6 to follow the rest of this post. Earlier versions of Python and Django had some differences in unit testing tools.

Typically, URL shortener sites will have a short name like http://ti.ny. So let’s call our project tiny:

django-admin.py startproject tiny
cd tiny
./manage.py startapp shorturls

Configure DATABASES in tiny/settings.py to use a simple file-based sqlite3 database and add the app shorturls in INSTALLED_APPS as well. Next, synchronise the database with:

./manage.py syncdb

Writing the First Test

Add your first test case to shorturls/tests.py:


from django.test import TestCase
from .models import Link


class ShortenerText(TestCase):
    def test_shortens(self):
        """
        Test that urls get shorter
        """
        url = "http://www.example.com/"
        l = Link(url=url)
        short_url = Link.shorten(l)
        self.assertLess(len(short_url), len(url))

This test creates a simple Link object given a (long) URL. It creates a short URL by using a static method called shorten() and asserts that it is shorter in length. Note that we are not saving the Link object into the database. Whenever a test can avoid touching the database, it must jump at the opportunity. Django uses an in-memory database while testing but even that can take some time. The faster the unit tests are, the more likely you are to use them.

Also, notice that we get a better error message when we use the assert...() functions (see the Sidenote below).

Run:

./manage.py test shorturls

As expected, it fails. Now build a model for this to work in shorturls/models.py:

from django.db import models

class Link(models.Model):
    url = models.URLField()

    @staticmethod
    def shorten(long_url):
        return ""

We are cheating by returning a zero length string but we know that this will pass the test.

./manage.py test shorturls

Sidenote: Choosing the right assertion

There are several assert functions provided by the unittest module in Python and several more provided by Django in the TestCase class. Initially, they might feel redundant. After all, assert is a keyword built into Python. Every conceivable assertion function can be replaced by an assert keyword checking for the truth of a Python expression.

The real difference is when the assertion fails. Here are three equivalent assertions along with the typical error message when it fails. Compare them yourself:

    assert len(url) < len(docs_url)
AssertionError

    self.assertTrue(len(url) < len(docs_url))
AssertionError: False is not true

    self.assertLess(len(url), len(docs_url))
AssertionError: 101 not less than 58

Clearly, the self.assert...() functions have a clearer error message. So it is worthwhile to familiarise ourselves with the assert API unless you plan to use something like pytest.
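If you want to see these messages for yourself without running a whole test suite, the following sketch captures them with plain Python. The helper name `failure_message` is made up for this example, and the numbers 101 and 58 are simply the lengths from the error message above.

```python
import unittest

tc = unittest.TestCase()  # a bare instance is enough to call assert methods


def failure_message(check):
    """Run a zero-argument check and return its AssertionError message."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return None


def plain_assert():
    assert 101 < 58


messages = [
    failure_message(plain_assert),
    failure_message(lambda: tc.assertTrue(101 < 58)),
    failure_message(lambda: tc.assertLess(101, 58)),
]
for message in messages:
    print(repr(message))
```

The second and third messages are "False is not true" and "101 not less than 58". The message of the bare assert depends on how the code is run (a tool like pytest rewrites it), which is part of the reason the choice matters less there.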

Second Test: Recovering the url

“But you wouldn’t clap yet. Because making something disappear isn’t enough; you have to bring it back. That’s why every magic trick has a third act, the hardest part, the part we call “The Prestige”.” — Christopher Priest, The Prestige

Add another test to shorturls/tests.py:

def test_recover_link(self):
    """
    Tests that the shortened then expanded url is the same as original
    """
    url = "http://www.example.com/"
    l = Link(url=url)
    short_url = Link.shorten(l)
    l.save()
    # Another user asks for the expansion of short_url
    exp_url = Link.expand(short_url)
    self.assertEqual(url, exp_url)

Our bluff gets called. We now need a real way to shorten urls and recover them. Unlike what might come to you intuitively, the URL is not mapped to a more compact encoding like a zip file. The allowable character set for a URL is pretty limited. Any kind of ‘string compression’ would reach its limits.

Instead we do something very simple. We know that, once saved into the database, each Link object can be uniquely identified by an integer - its primary key. Simply add this primary key to the domain’s URL and we have a short URL that can be mapped back to the original URL with a database lookup.
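Before looking at the Django version, the idea can be sketched without a database at all. The class below is a toy stand-in, assuming a dictionary plays the role of the table and a counter plays the role of the auto-incrementing primary key; the name `InMemoryLinks` is invented for this illustration.

```python
class InMemoryLinks:
    """Toy stand-in for a database table with an auto-incrementing key."""

    def __init__(self):
        self._by_id = {}
        self._by_url = {}
        self._next_id = 1

    def shorten(self, url):
        # Reuse the existing key if the URL was stored before.
        if url not in self._by_url:
            self._by_url[url] = self._next_id
            self._by_id[self._next_id] = url
            self._next_id += 1
        return str(self._by_url[url])

    def expand(self, slug):
        # The slug is just the key written as text.
        return self._by_id[int(slug)]


links = InMemoryLinks()
slug = links.shorten("http://www.example.com/")
print(slug, links.expand(slug))
```

Shortening the same URL twice returns the same slug, which is the behaviour get_or_create() gives us in the model.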

Change shorturls/models.py to this:

from django.db import models

class Link(models.Model):
    url = models.URLField()

    @staticmethod
    def shorten(link):
        l, _ = Link.objects.get_or_create(url=link.url)
        return str(l.pk)

    @staticmethod
    def expand(slug):
        link_id = int(slug)
        l = Link.objects.get(pk=link_id)
        return l.url

Now the tests should pass.

Third Test: Home Page With a Form

Add to shorturls/tests.py:

from django.core.urlresolvers import reverse
...

def test_homepage(self):
    """
    Tests that a home page exists and it contains a form.
    """
    response = self.client.get(reverse("home"))
    self.assertEqual(response.status_code, 200)
    self.assertIn("form", response.context)

This test fails because we don’t have any views mapped in our urls.py. In the spirit of minimum effort, let’s use Django’s class based view. Since the submission of a form would create a new Link object, let’s use a CreateView instead of a TemplateView. A CreateView will generate the form for free and it will be useful later on.

Replace contents of shorturls/views.py with:

from django.views.generic.edit import CreateView
from .models import Link


class LinkCreate(CreateView):
    model = Link
    fields = ["url"]

Create shorturls/templates/shorturls/link_form.html:

<form method="post">{% csrf_token %}
{{ form.as_p }}
<input type="submit" value="Shorten" />
</form>

Replace contents of tiny/urls.py with:

from django.conf.urls import patterns, include, url
from shorturls.views import LinkCreate

urlpatterns = patterns('',
    url(r'^$', LinkCreate.as_view(), name='home'),
)

Now the test should pass.

Fourth Test: Form Returns a Short URL

Add to shorturls/tests.py:

def test_shortener_form(self):
    """
    Tests that submitting the forms returns a Link object.
    """
    url = "http://example.com/"
    response = self.client.post(reverse("home"),
                                {"url": url}, follow=True)
    self.assertEqual(response.status_code, 200)
    self.assertIn("link", response.context)
    l = response.context["link"]
    short_url = Link.shorten(l)
    self.assertEqual(url, l.url)
    self.assertIn(short_url, response.content)

(This test is designed to work with Django URLField’s default behaviour to add trailing slashes. This needs to be agreed with your client, of course. In my case, I simply had to ask the mirror.)

Now we need to think of what gets shown when the form is submitted. Obviously, there would be the short URL. Once again we can use a ready made class based view for this - the DetailView.

Add to shorturls/views.py:

from django.shortcuts import redirect
from django.views.generic import DetailView
...

class LinkCreate(CreateView):
    model = Link
    fields = ["url"]

    def form_valid(self, form):
        # Check if the Link object already exists
        prev = Link.objects.filter(url=form.instance.url)
        if prev:
            return redirect("link_show", pk=prev[0].pk)
        return super(LinkCreate, self).form_valid(form)


class LinkShow(DetailView):
    model = Link

Change tiny/urls.py to:

from django.conf.urls import patterns, url
from shorturls.views import LinkCreate
from shorturls.views import LinkShow

urlpatterns = patterns('',
    url(r'^$', LinkCreate.as_view(), name='home'),
    url(r'^link/(?P<pk>\d+)$', LinkShow.as_view(), name='link_show'),
)

Now add the template for the DetailView. Create shorturls/templates/shorturls/link_detail.html with the following contents:

<p> Short Link: /r/{{ object.id }}
<p> Original Link: {{ object.url }}

Note that we haven’t created the short link redirection code for /r/ yet.

The only hitch we have now is that Django doesn’t know where to go after the form is submitted. One way to solve that is by adding get_absolute_url() to the Link model.

Change shorturls/models.py to:

from django.db import models
from django.core.urlresolvers import reverse


class Link(models.Model):
    url = models.URLField()

    def get_absolute_url(self):
        return reverse("link_show", kwargs={"pk": self.pk})

    # shorten() and expand() remain unchanged from the previous version

Now if the form is submitted, it gets redirected to the new DetailView. The tests should pass.

Fifth Test: Short URL Must Redirect To The Long URL

Our next and final test actually tests if the short URLs work:

def test_redirect_to_long_link(self):
    """
    Tests that a short URL redirects to the original long URL.
    """
    url = "http://example.com"
    l = Link.objects.create(url=url)
    short_url = Link.shorten(l)
    response = self.client.get(
        reverse("redirect_short_url",
                kwargs={"short_url": short_url}))
    self.assertRedirects(response, url)

The final bit of the puzzle is the reverse lookup, that is, redirecting the user from the short URL to the original URL. As you might have guessed, it is time for yet another class based view - RedirectView.

Add to shorturls/views.py:

from django.views.generic.base import RedirectView
...


class RedirectToLongURL(RedirectView):

    permanent = False

    def get_redirect_url(self, *args, **kwargs):
        short_url = kwargs["short_url"]
        return Link.expand(short_url)

Add to tiny/urls.py:

from shorturls.views import RedirectToLongURL
...

    url(r'^r/(?P<short_url>\w+)$', RedirectToLongURL.as_view(),
              name='redirect_short_url'),

Now the tests should pass.

Refactoring For Shorter URLs

We will now see how tests prevent regression. The short URLs we generate are nothing but primary keys of Link instances in the database. This scheme works swimmingly well until a smart guy notices that we are wasting too many characters compared to other URL shorteners.

A shortener like bit.ly creates short URLs which are a mix of letters (lower and upper case), numbers and symbols (like hyphens and underscores). However, we use only numbers. Mathematically speaking, we can use a base higher than 10 if we have more than ten symbols. This can lead to shorter numbers. For example, the decimal “255” can be represented as “FF” in hexadecimal, which saves one character!
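To get a feel for the savings, here is a small sketch that counts how many symbols a number needs in a given base. The helper name `digits_needed` is invented for this example, and base 62 is an assumption: ten digits plus 52 letters, ignoring hyphens and underscores.

```python
def digits_needed(n, base):
    """Count how many symbols a non-negative integer needs in a given base."""
    count = 1
    while n >= base:
        n //= base
        count += 1
    return count


for base in (10, 16, 62):
    print(base, digits_needed(1000000, base))
```

The millionth link needs seven characters in decimal, five in hexadecimal and only four in base 62.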

Notice that we have deliberately avoided testing the short URL format. This gives us flexibility to use any short URL representation we like and we are not limited to just decimals. So we create a module which can convert a decimal to a higher base with more symbols and back.

Create a new file shorturls/basechanger.py with the following contents:

CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGUIJKLMNOPQRSTUVWXYZ"
BASE = len(CHARS)


def decimal2base_n(n):
    if n >= BASE:
        return decimal2base_n(n // BASE) + CHARS[n % BASE]
    else:
        return CHARS[n]


def base_n2decimal(n):
    if len(n) > 1:
        return base_n2decimal(n[:-1]) * BASE + CHARS.index(n[-1])
    else:
        return CHARS.index(n[0])

If you suspect anything wrong, just play along ;) . Since all the URL shortening and expansion logic resides in the models (as it should), we need to change that as well.

Change shorturls/models.py to:

from django.db import models
from django.core.urlresolvers import reverse
from .basechanger import decimal2base_n, base_n2decimal


class Link(models.Model):
    url = models.URLField()

    def get_absolute_url(self):
        return reverse("link_show", kwargs={"pk": self.pk})

    @staticmethod
    def shorten(link):
        l, _ = Link.objects.get_or_create(url=link.url)
        return str(decimal2base_n(l.pk))

    @staticmethod
    def expand(slug):
        link_id = int(base_n2decimal(slug))
        l = Link.objects.get(pk=link_id)
        return l.url

Now running your tests should work as if nothing happened. Your smart guy will be pleased that the short URLs are now shorter. It is a win-win, folks!

Note that we have not removed the hard-coded short URL path in our DetailView template. Let’s do that as well.

First, add a method to shorturls/models.py:

def short_url(self):
    return reverse("redirect_short_url",
                   kwargs={"short_url": Link.shorten(self)})

Change shorturls/templates/shorturls/link_detail.html:

<p> Short Link: <a href="{{ object.short_url }}">{{ object.short_url }}</a>
<p> Original Link: {{ object.url }}

Subtle Bug: More Tests Needed

Everything goes great for a while until user testing, when you get a strange bug report:

Critical: Some short URLs get redirected to the wrong site.

By following TDD, the only way to fix your code would be to create a test to reproduce your bug. Since the problem only happens after a certain number of short URLs are created, you design a test to create a large number of short URLs and compare it with the original URL.

Add a new test case to shorturls/tests.py:

import random, string
...


def test_recover_link_n_times(self):
    """
    Tests multiple times that after shortening and expanding
    the original url is recovered.
    """
    TIMES = 100
    for i in xrange(TIMES):
        uri = "".join(random.sample(string.ascii_letters, 5))
        url = "https://example.com/{}/{}".format(i, uri)
        l = Link.objects.create(url=url)
        short_url = Link.shorten(l)
        long_url = Link.expand(short_url)
        self.assertEqual(url, long_url)

Running the test gives you a mysterious error.

AssertionError: 'https://example.com/55/KPAOz' != u'https://example.com/42/mDcHw'

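The hint of what is wrong is in the URLs themselves. After running a debugger, you realise that the symbol ‘U’ appears twice in CHARS: once in its proper place (index 56) and once where ‘H’ should be (index 43). Since primary keys start at 1, the link numbered 55 in the loop has the key 56 and is given the same short URL as the link numbered 42, which has the key 43. A simple typo that was actually overlooked when I was writing this article.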

I learnt two lessons from this. First, to never underestimate testing. The more tests you can write, the better your code becomes. Second, to never attempt to list the alphabet by hand. We are no longer in kindergarten and we are certainly not expected to remember all of it. That’s why Python has a string module with such silly strings kept ready for you.
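A check like the following would have caught the typo without a debugger. It is only a sketch: the names `TYPED` and `EXPECTED` are made up here, and the string is the hand-typed alphabet copied from basechanger.py.

```python
import string
from collections import Counter

# The hand-typed alphabet, copied from basechanger.py.
TYPED = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGUIJKLMNOPQRSTUVWXYZ"
EXPECTED = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Symbols that occur more than once, and symbols that never occur.
duplicates = sorted(c for c, n in Counter(TYPED).items() if n > 1)
missing = sorted(set(EXPECTED) - set(TYPED))
print("duplicates:", duplicates)
print("missing:", missing)
```

It reports one duplicated symbol and one missing symbol, which is enough to make two different primary keys collide on the same short URL.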

So the fix is a simple change in the first line of shorturls/basechanger.py:

import string
CHARS = string.digits+string.ascii_lowercase+string.ascii_uppercase

So once again, we have a 100% success rate in our tests.
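If you want a quick sanity check that does not need the database, a round trip over a range of integers works too. This is a sketch under assumptions: `encode` and `decode` are iterative equivalents written for this example, not the functions in basechanger.py.

```python
import string

CHARS = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(CHARS)


def encode(n):
    """Iterative equivalent of decimal2base_n for non-negative integers."""
    digits = []
    while True:
        n, remainder = divmod(n, BASE)
        digits.append(CHARS[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def decode(text):
    """Inverse of encode: fold the symbols back into an integer."""
    value = 0
    for symbol in text:
        value = value * BASE + CHARS.index(symbol)
    return value


assert len(set(CHARS)) == BASE  # every symbol is unique
assert all(decode(encode(n)) == n for n in range(10000))
```

With the hand-typed alphabet the second assertion fails, because two different integers are encoded to the same symbol.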

The completed source code can be found on GitHub.

So What’s Wrong With TDD?

The biggest problem I faced while learning TDD was wearing two different hats alternately. The Tester Hat first makes you think of the most specific way to break your code and the Code Hat wants you to write terse code that works in the most general way possible. Perhaps, after some practice, the cognitive load was greatly reduced. But, I would leave you with this pithy statement for some thought:

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” — Brian Kernighan

Of course, TDD practitioners themselves acknowledge that TDD is not for exploratory programming. In other words, you cannot create a map if you don’t know where you are or where you want to go. It is also expected that you have a certain amount of expertise in the language (in this case, Python/Django), to clearly see how the test case should be written in advance. So I wouldn’t expect a beginner to learn TDD along with learning programming.

It goes without saying that, owing to its counter-intuitive nature, TDD takes some time to get comfortable with.

Conclusion

TDD is a design technique that needs a little bit of extra time for planning ahead. Some studies put this at a 15-35% increase in development time. So, I wouldn’t use it for one-time scripts or quick work. But I will strongly consider it whenever I need to build production quality sites.

I might also not consider it for version 1 of something that I am working on. Maybe version 2 onwards, when the development will get faster and I am more familiar with the domain.

It doesn’t necessarily improve the client’s experience. If they are more interested in development time rather than the quality of code, then they might not appreciate TDD designed code itself. This is when BDD is much more effective in engaging them and “showcasing” the effectiveness of the methodology.

Django tests are pretty fast. So I am not too slowed down by the runs. I work on the models first, then the views, then the system tests and finally the templates. I prefer smaller, faster, non-redundant, black-boxy and functionality-oriented tests. Perhaps you want to read that again. It takes a while to design a good test case.

However, Pytest is a better option if you would rather not learn all the legacy JUnit-style functions. It is a lot better than the default Django testing tool and I would prefer to use Pytest as my default test runner.

TDD relies on the inherent human nature to fix things. Writing tests is important but not essential, so we might forget them. TDD will actually show a ‘broken’ status if any test fails, and this is a great incentive to fix things.

I tend to make a lot of design decisions while writing test cases. This is a good side effect of TDD. A lot of corner cases that are missed while coding get attention upfront, so by the time you are coding you are aware of them, rather than the other way around. This is why the design is better with TDD.

It takes a lot of time to be good at TDD.


An Easy Guide to Install Python or Pip on Windows

[Updated 2017-04-15] These steps might not be required in latest Python distributions which are already shipped with pip.

I enjoy studying other people’s development environments. They give me plenty of insights into their aesthetic choices and their productivity shortcuts. But I have seen a lot of horrible environments cobbled together by flaky batch files that are one update away from blowing up, often leaving no option but to reinstall everything from scratch. Unfortunately, they have all been Windows environments.

Recently, I tried to install Python and pip on a Windows laptop. Though I installed Python on Windows XP and Windows Servers for several years until 2010, there have been a lot of changes since, both in the Windows world (such as PowerShell) and in the Python world. So there is a lot of confusing information out there for the Python beginner. And then there is Python 3. Sigh!

Suffice to say I wanted to share what I learnt in a video aimed at the Python beginner or anyone struggling to install Python on Windows. The video and the high-level transcript follow:

What is covered?

Installing Python 2 or Python 3 on Windows 7
Installing pip from PowerShell
Setting up a virtualenv
Installing via pip

How to install Python / Pip on Windows 7 (or 8)

Download the MSI installer from http://www.python.org/download/. Select 32 bit or 64 bit based on the System Settings which opens by pressing Win+Break
Run the installer. Be sure to check the option to add Python to your PATH while installing.
Open PowerShell as admin by right clicking on the PowerShell icon and selecting ‘Run as Admin’
To solve permission issues, run the following command:
```
Set-ExecutionPolicy Unrestricted
```
Enter the following commands in PowerShell to download the bootstrap scripts for easy_install and pip:
```
mkdir c:\envs
cd c:\envs

(new-object System.Net.WebClient).DownloadFile('https://bootstrap.pypa.io/ez_setup.py',   'c:\envs\distribute_setup.py')

(new-object System.Net.WebClient).DownloadFile('https://raw.github.com/pypa/pip/master/contrib/get-pip.py', 'c:\envs\get-pip.py')

python c:\envs\distribute_setup.py
python c:\envs\get-pip.py
```
Once these commands run successfully, you can delete the scripts get-pip.py and distribute_setup.py

HTTP Issue with distribute_setup.py?
[Updated 2015-03-05] The script distribute_setup is no longer available at its old location. Thanks to Rudhra for sharing the new location.
Now typing easy_install or pip should work. If it doesn’t, it means the Scripts folder is not in your path. Run the next command in that case (note that this command must be run only once or your PATH will get longer and longer). Make sure to replace c:\Python33\Scripts with the correct location of your Python installation:
```
setx PATH "%PATH%;C:\Python33\Scripts"
```
Close and reopen PowerShell after running this command.
To create a Virtual Environment, use the following commands:
```
cd c:\python
pip install virtualenv
virtualenv acme
.\acme\Scripts\activate.ps1
pip install IPython
ipython3
```
That’s it! I suppose the same steps work for Windows 8 as well. But I don’t have a machine to try it out. Do let me know if this worked for you.
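Since the setx command in the steps above should be run only once, it can help to check first whether the Scripts folder is already on your PATH. The function below is a hedged sketch of such a check; the name `scripts_dir_on_path` is invented for this post and the folder C:\Python33\Scripts is only the example location used earlier.

```python
import ntpath


def scripts_dir_on_path(path_value, scripts_dir):
    """Return True if scripts_dir is one of the ';'-separated PATH entries."""
    def normalise(entry):
        # ntpath applies Windows rules (case-insensitive, backslashes)
        # regardless of the operating system this runs on.
        return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))

    wanted = normalise(scripts_dir)
    return any(normalise(entry) == wanted
               for entry in path_value.split(";") if entry.strip())


print(scripts_dir_on_path(r"C:\Windows;c:\python33\scripts",
                          r"C:\Python33\Scripts"))
```

On a real machine you would pass os.environ["PATH"] as the first argument and run setx only if the function returns False.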


Binary Clock - Django Nibbles

Django can be used to build great websites. But it is also really good for solving small problems quickly. Introducing a new series called “Django Nibbles”, to help you learn an aspect of Django, say Templates, through a short and simple problem. You can either solve the problem yourself or follow my step-by-step solution.

Q. Create a page that displays a binary clock showing the current time (when the page was loaded).

For instance, if the time is “23:55:02” then we should see:

○ ○ ○ ○ ○ ○
○ ○ ● ● ○ ○
● ● ○ ○ ○ ●
○ ● ● ● ○ ○

Each column represents one decimal digit of the time, written in binary, when read from top to bottom. For more details, read the Binary Clock wiki page.

Django Feature: Views

Time Given: 1 Hour

Go ahead. Try it yourself before reading further!

A. We will be using Python 3 (not 2.7) and Django 1.6 for solving this. Both are the latest versions at the time of writing.

Setting up the project

This section can be skipped if you know the basics of setting up a project in Django (which is simpler in Django 1.6)

On Linux command-line (indicated by the ‘$’ prompt), enter the following to create the binclock project:

$ cd ~/projects
$ django-admin.py startproject binclock
$ cd binclock

Getting the current time

Open binclock/urls.py and replace everything with these lines:

from django.conf.urls import patterns, url

urlpatterns = patterns('',
    url(r'^$', 'binclock.views.home', name='home'),
)

Create a new file binclock/views.py with the following lines:

from django.http import HttpResponse
from django.utils.timezone import localtime, now

def home(request):
    clock = str(localtime(now()))
    return HttpResponse(clock, content_type="text/plain; charset=utf-8")

The home view function returns the current time in plain text. We are importing the now function from django.utils.timezone to get the current time in a timezone-aware format. Calling localtime converts it to the timezone defined in the project’s settings.py file.

Invoking HttpResponse gives you low-level control over the exact string to return to the browser, i.e. the first argument. You can also mention the content-type which, by default, is set to HTML, i.e. “text/html; charset=utf-8”. Plain text is good enough for our purpose. We will be using unicode characters, so we need to mention the charset as well.

Try running the server now:

$ ./manage.py runserver

If you open the browser and point it to the default URL http://127.0.0.1:8000/, you should see the date and time displayed. Unless you are in the UTC timezone, this might be different from your local time.

Let’s change this to your current timezone (find yours from the timezone list). Open binclock/settings.py and find the line starting with TIME_ZONE and change it to your timezone. Since I live in Bangalore, my line would look like this:

TIME_ZONE = 'Asia/Calcutta'

Now refresh the browser page and you should see your current time.
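What localtime does can be imitated with the standard library alone. The sketch below is an illustration, not Django’s implementation: it assumes a fixed UTC+05:30 offset in place of the real 'Asia/Calcutta' zone and uses a hard-coded instant instead of now().

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+05:30 offset standing in for 'Asia/Calcutta' (an assumption
# for this sketch; Django resolves the real zone from TIME_ZONE).
IST = timezone(timedelta(hours=5, minutes=30), "IST")

# A timezone-aware instant in UTC, like the value returned by now().
utc_time = datetime(2013, 11, 16, 18, 25, 2, tzinfo=timezone.utc)

# Same instant, expressed in the local zone, like localtime() does.
local_time = utc_time.astimezone(IST)
print(utc_time.isoformat())
print(local_time.isoformat())
```

Both values describe the same moment; only the wall-clock reading changes, from 18:25:02 to 23:55:02.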

Converting to binary

We will convert to the binary format step-by-step. Notice that each digit has to be converted in Packed BCD format rather than its binary form. For example, 12 in binary is 1100 but we need it in BCD form, which is 0001 0010.
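The difference between the two encodings is easy to see with Python’s format function. This is a small sketch with made-up helper names (`plain_binary` and `packed_bcd`); it produces strings for illustration, whereas the solution that follows works with lists of bits.

```python
def plain_binary(n):
    """Binary representation of the whole number."""
    return format(n, "b")


def packed_bcd(n):
    """Encode each decimal digit separately as four bits."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(plain_binary(12))  # 1100
print(packed_bcd(12))    # 0001 0010
```

In BCD every column of the clock stays a single decimal digit, which is what makes the display readable.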

The Python prompt is indicated by “>>>” (actually, I used IPython). Open up the shell using the ./manage.py shell command:

>>> def bcd_digit(x):
...     return [(x & 8) >> 3, (x & 4) >> 2, (x & 2) >> 1, x & 1]
... 
>>> bcd_digit(5)
[0, 1, 0, 1]

We are using Python’s bitwise operators ‘&’ and ‘>>’ here. To extract the four rightmost bits, we use a bitmask and then shift the result right. We can convert a single decimal digit to a list of binary digits using bcd_digit. But numbers in a clock can have up to two digits.

>>> def bcd_2digits(x):
...     return [bcd_digit(x // 10), bcd_digit(x % 10)]
... 
>>> bcd_2digits(12)
[[0, 0, 0, 1], [0, 0, 1, 0]]

Now we can convert two digit numbers. Let’s try to convert an actual datetime object into binary digits.

>>> from datetime import datetime
>>> def bcd_hhmmss(dt):
...     return bcd_2digits(dt.hour) + bcd_2digits(dt.minute) + \
...         bcd_2digits(dt.second)
... 
>>> t = bcd_hhmmss(datetime(2013,11,16,23,55,2))
>>> t
[[0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0], [0, 0, 1, 0]]

This is close to the output we want. But the rows have to be converted to columns (by transposition). We also need to convert the numbers into unicode characters, for a better appearance.

>>> char_one = chr(0x25CF)
>>> char_zero = chr(0x25CB)
>>> def disp_unicode(clock_matrix):
...     return "\n".join(" ".join(char_one if p == 1 else char_zero for p in line
...                               ) for line in list(zip(*clock_matrix)))
... 
>>> disp_unicode(t)
'○ ○ ○ ○ ○ ○\n○ ○ ● ● ○ ○\n● ● ○ ○ ○ ●\n○ ● ● ● ○ ○'
>>> print(disp_unicode(t))
○ ○ ○ ○ ○ ○
○ ○ ● ● ○ ○
● ● ○ ○ ○ ●
○ ● ● ● ○ ○

The disp_unicode function uses a lot of Python tricks. We can find the transpose of any matrix using the zip(*matrix) trick. We are also using nested generator expressions to iterate row by row and then column by column. The inline if statement (also called a conditional expression) transforms bits into pretty unicode characters.
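
If these tricks are new to you, it may help to see them in isolation. This is only an illustrative sketch on a made-up 2x3 matrix, not code from the clock itself:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# zip(*matrix) pairs up the first items of every row, then the second items, and so on
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# A conditional expression picks one of two values inline
labels = ["on" if bit == 1 else "off" for bit in [1, 0, 1]]
print(labels)  # ['on', 'off', 'on']
```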

Finally, we tie this to our view function in views.py:

def home(request):
    clock = disp_unicode(bcd_hhmmss(localtime(now())))
    return HttpResponse(clock, content_type="text/plain; charset=utf-8")

The full source can be found in github.

Now, try to enjoy reading time a bit ;)

Exercise to Reader: Change the code to show the date instead, in “yyyy-mm-dd” format.


Real-time Applications and will Django adapt to it?

While talking about Django at PyCon India this year, the most common question I got was whether it supports Single Page applications or an event driven architecture. I was quick to point out that such realtime applications are usually handled by separate services at the backend (running on a different port?) and a JavaScript MVC framework on the front. Sometimes, Django supported by a REST library like Django REST or Piston is used as the backend too.

But the more I thought about it, those questions felt more and more important. Then, I had a chance to read about Meteor. I tried some of their tutorials and they were fantastic. I was so impressed at that point that I felt this framework had the power to change web development in a big way. I was even tempted to write an article similar to Why Meteor will kill Ruby on Rails for Django. But then, better sense prevailed and I dug deeper.

So I wrote this post to really understand the Real-time web application phenomenon and pen my thoughts around it. It will look at how a traditional web framework like Django might tackle it or whether it really can. It is not a comprehensive analysis by any means but some of my thoughts regarding it.

What is a Real-time Application?

Real-time web applications are what used to be known as AJAX applications or Single-page applications. Except that, now we are not talking about just submitting a form without refreshing anymore. Upon an event, the server is supposed to push data into the browser enabling a two-way communication (thereby, making the terms Server and Client moot). To the Facebook crowd, this means that the application can notify them about events like your friends liking your post or starting a new chat conversation live - as and when it happens.

The application will give you the impression that it is “alive”, in the sense that traditional websites were pretty much stale after the page has loaded. To use an analogy, traditional websites are like cooked food that gets spoiled with time but real-time websites are like living, breathing organisms - constantly interacting with and responding to the environment. As we will see, both have their own advantages and purposes.

The word Real-time itself comes from the field of Real-time Computing. There are various forms of real-time computing based on how strictly the system treats its response time deadlines. Realtime Web applications are probably the least strict and could be termed as ‘soft realtime’.

What kind of sites need them?

Practically any kind of website can benefit from realtime updates. Off the top of my head, here are some examples:

E-commerce Site: Offers against an item updates its discounted price realtime, user reviews
News Site: Minute-by-minute update of an emerging story, errata of an article, reader reactions, polls
Investment Site: Stock prices, exchange rates, any investment trigger set by user/customer advisor
Utilities Site: Updates your utility usage in real-time, notifies of any upcoming downtimes
Micro-blogging: Pretty much everything

Of course, there are several kinds of sites which would not be ideal for the real-time treatment, such as sites with relatively static content like blogs or wikis. Imagine if a blog site was designed to constantly poll for post updates and it suddenly goes viral. This stretches the limits of the server’s scalability for largely unchanging content. The most optimal approach today would be to pre-compute and cache the article’s contents and serve the comments via Disqus to create an almost complete realtime experience. But, as we will soon see, this could change in the future as the fundamental nature of the web changes.

Websites are Turning into Multiplayer Games

Broadly, our examples have two kinds of real-time updates - from the server itself and from the peers. The former involves notifications like changes in the discounted price or external events like stock price change. But peer updates, from other users, are becoming extremely valuable as user-generated content becomes the basis for many modern sites. Especially for the constant inflow of information which keeps such sites fresh and interesting. In addition, the social factor adds to inherent human tendencies to form groups, to know and interact with each other.

In many ways, this is exactly how a multiplayer game works. There are global events like weather changes or a sudden disaster and there are player generated events like a melee attack or a chat conversation. Games were some of the first programs that I had written, so I am quite fond of them and know quite a bit about how they work. Real-time web applications are designed like multiplayer games, especially those that were designed to work with a central server rather than, say, over a LAN. Multiplayer games have been in existence for about three decades now, so real-time web applications are able to leverage much of the work that has gone into creating them.

Technically, there are several similarities too. The event loop that every game programmer writes while beginning to write a game is at the heart of most event driven servers like Node.js or Tornado. Typically, the game client and server are written in two different languages for e.g. Eve Online using Stackless Python on the server and possibly C++ with Python scripting on the client side. This is because, like web applications, the server side needs to interact with a database for bookkeeping purposes and would be more IO bound rather than CPU/GPU-bound. Thus, the needs are different and games being extremely performance hungry creations, developers often use the best language or tool or framework for the client and server sides. They often end up being different.
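
To give a feel for what such an event loop looks like, here is a toy, single-threaded sketch. The EventLoop class and the event names are invented for illustration; real servers like Node.js or Tornado wait on network IO rather than stopping when the queue is empty:

```python
from collections import deque


class EventLoop:
    """A toy single-threaded event loop (illustrative only)."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def on(self, event_name, handler):
        # Register a callback for a named event
        self.handlers.setdefault(event_name, []).append(handler)

    def emit(self, event_name, payload=None):
        # Queue the event; nothing runs until the loop picks it up
        self.queue.append((event_name, payload))

    def run(self):
        # Keep dispatching until no events are left; handlers may emit more
        while self.queue:
            event_name, payload = self.queue.popleft()
            for handler in self.handlers.get(event_name, []):
                handler(payload)


log = []
loop = EventLoop()
loop.on("chat", lambda text: log.append("chat: " + text))
loop.on("attack", lambda target: loop.emit("chat", target + " was hit!"))
loop.emit("attack", "orc")
loop.run()
print(log)  # ['chat: orc was hit!']
```

Notice that the player generated event (the attack) results in a second event (the chat message), and both are processed one after the other by the same loop.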

Of course, in the case of web applications, the de facto client-side language has been JavaScript. Over the years, several JavaScript APIs exposing the underlying system have emerged which further cemented JavaScript’s position as the client-side language of choice. However, with several languages targeting JavaScript and with browsers supporting source maps, other options like pyjs and ClojureScript have now emerged.

How does Meteor Solve it?

Meteor and Derby claim to be a new breed of web application frameworks which are built for the needs of the real-time web. These frameworks are written in JavaScript to eliminate the need to duplicate the logic in the client and server. While using Django or Rails, model declarations and behaviour had to be written in Python/Ruby on the server side and typically rewritten in JavaScript for MVC frameworks like AngularJS or Knockout on the client side, as well. Depending on the amount of logic shared between the client and server, this would become a development and maintenance nightmare.

These new frameworks also allow automatic synchronisation of data. In other words, part or whole of the server information is replicated between the server and all the connected clients. If you have ever programmed a multiplayer game then you would realise how hard it is to maintain a consistent state across all the clients. By automatically synchronising the database information, you have an extremely powerful abstraction for rapidly creating real-time web applications.

A high-level understanding of how real-time web frameworks work

However, as Drew (the creator of Dropbox) pointed out, treating anything over the network as locally accessible is a “leaky abstraction” due to network latencies. Programmers are very likely to under-engineer for the various challenges that networks can bring up, like a slower mobile connection or even a server crash. Users hate it when the real-time component stops working. In fact, once they start seeing ‘Connection Lost’ on a simple chat window, the worst thing that could happen is that they lose their train of thought. But when an entire site becomes unresponsive, I believe they would start distrusting the site itself. Perhaps it is critical not to rely entirely on the data synchronisation mechanism for all kinds of functionality.

Regarding the advantages of using the same language to share logic between the client and server, the previous discussion about multiplayer games comes to mind. Often the requirements of a web server are quite different from that of a client. Even if you avoid the Callback Hell with Futures, JavaScript might not be everyone’s first choice for server side programming. Until recently, it didn’t matter which language you used at the server as long as it returned the expected results say HTML, XML or JSON. People can get very attached to their favourite language and unsurprisingly so; considering the large amount of time one needs to spend in mastering every nook and corner of a programming language. Expecting everyone to adopt JavaScript might not be possible.

The payoff is, of course, that the shared data structures and logic will reduce the need to write them twice in two different languages. Unlike multiplayer games, this is a big deal in web programming due to the sheer amount of shared bookkeeping happening at both ends. However, is having JavaScript at both ends the only way out? We can think of at least one possible alternative approach. But before that, we need to look at whether we can continue using traditional frameworks.

Can Django Adapt?

Realtime web is a very real challenge that Django faces. There are some elegant solutions like running Django with Gevent. But these solutions look like hacks in the face of a comprehensive solution like Meteor.

The Django community is quite aware of this. At last year’s DjangoCon US, the customary “Why I Hate Django” keynote was by Geoff Schmidt, a principal developer of Meteor. He starts off, on an ominous note, by comparing the advent of real-time apps to a possible extinction event for Django, similar to the asteroid impact which nearly drove dinosaurs to extinction. Actually, Geoff is a big fan of Django and he tried to adapt it to his needs but was not quite happy with the results.

And he is not alone. Guido, in his Pycon keynote Q&A, mentioned how it would be difficult for traditional web frameworks like Django to completely adapt to an event driven model. He believes that newer frameworks might actually tackle that need better.

Before we answer the question whether Django can adapt, we need to find out what Django’s shortcomings are. Or rather, which parts of the framework are beginning to show their age in the new real-time paradigm.

Templates are no longer necessary

To be exact, HTML templates are getting used less and less. “Just send me the meat” seems to be the mantra of real-time applications. Content is often rendered at the client side so the wire essentially carries data in the form of XML or JSON. Originally, when Django was created, not all clients could support client side rendering using JavaScript. But with increasingly powerful mobile clients, the situation is quickly changing.

However this doesn’t mean that templating will no longer be required in frameworks. A case could be made for XML or JSON templates. But Python data structures can be mapped to JSON and back in a straightforward manner (just like JavaScript).
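
As a quick illustration of that mapping, here is a round trip using the standard library’s json module. The article dictionary is made-up sample data:

```python
import json

article = {
    "headline": "Hello",
    "tags": ["django", "realtime"],
    "votes": 3,
    "draft": False,
}

wire_text = json.dumps(article)   # Python dict -> JSON text
restored = json.loads(wire_text)  # JSON text -> Python dict

print(wire_text)
print(restored == article)  # True
```

The mapping is not perfect for every type. For instance, tuples come back as lists, and datetime objects cannot be serialised without extra work.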

However, the previously mentioned Django solutions like Piston and Django REST do not encourage using serialised Python data structures directly for a good reason - Security. Data coming from the outside world cannot be trusted. You will need to define classes with appropriate data types to perform basic type validation. Essentially, you might end up repeating the model class definitions you already wrote for the front end.
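
To show what such basic type validation could look like, here is a minimal sketch in plain Python. The VotePayload class and its fields are hypothetical; this is not how Piston or Django REST actually implement it:

```python
import json


class VotePayload:
    """Hypothetical schema for an incoming vote message."""

    fields = {"link": int, "voter": int}

    @classmethod
    def parse(cls, raw_text):
        data = json.loads(raw_text)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        cleaned = {}
        for name, expected_type in cls.fields.items():
            value = data.get(name)
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, expected_type):
                raise ValueError("bad or missing field: " + name)
            cleaned[name] = value
        return cleaned  # keys that are not declared are silently dropped


print(VotePayload.parse('{"link": 7, "voter": 42, "admin": true}'))
# {'link': 7, 'voter': 42}
```

Notice how the field declarations look suspiciously like a second copy of a model definition, which is exactly the repetition mentioned above.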

HTTP and the WSGI interface

If you read the above examples closely, you will notice that real-time web works best for sites involving short bursts of data. For sites with long form content like blogs, wikis or even news sites, it is probably best to stick with traditional web frameworks or even static content. Even if you have a very high bandwidth connection, it would simply be too chatty to keep checking for updates to published information (unless you expect to have too many typos).

In fact, the web is specifically suited for the dissemination of long-form content. It works best for a request-reply mechanism for retrieval of documents (or hypertext if you like to be pedantic). It is stateless hence suited for caching, wherever possible. In fact, this explains why hacks like long-polling had to be created for the browser to support bidirectional communication until web sockets arrived.

This explains the design of WSGI, a synchronous protocol to handle request and response. Each worker can handle only one request at a time and hence it is not ideally suited for creating realtime applications. Since Django is designed to be deployed on a server with a WSGI interface, this means that asynchronous code needs to bypass this mechanism.
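
To see why, it helps to look at what a WSGI application actually is: a plain callable that takes one request and returns one response. Below is a minimal sketch. The application function and the fake_start_response helper are my own names, and here the callable is invoked directly instead of through a real server:

```python
def application(environ, start_response):
    """A minimal WSGI app: one request in, one response out."""
    body = ("You asked for " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The worker stays busy until this iterable has been fully sent
    return [body]


# Call the app by hand, the way a WSGI server would for each request
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status


chunks = application({"PATH_INFO": "/clock"}, fake_start_response)
print(captured["status"], b"".join(chunks))
```

There is no place in this interface for the server to push something to the browser later on. Once the response has been returned, the conversation is over.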

This seems like a genuine shortcoming of Django. There might be hacks to work around it like frequent polling etc. But it would be much better if the framework can integrate with the asynchronous model.

Today, writing a REST-based API using Django and interacting with it using a JavaScript MVC library seems to be a popular way of creating a single page application. To make it realtime, you might have to fiddle around with Gevent or Tornado and web sockets.

Can you have the cake and eat it too?

It is possible to look at the language impedance mismatch problem in a different way. Why not have your favourite language, even if it is not JavaScript, run on the server and the client? Since JavaScript is increasingly used as a target language (I am refraining from calling it the ‘assembly language of the web’), why not convert the client part of the code in, say Python, into JavaScript?

If we can have an intelligent Python compiler which can say target byte codes for the server part and the shared code; and then target JavaScript (perhaps in asm.js too for better performance) and the shared code for the client part - then it might just work. Other details will also have to be worked out like event loops at both ends which can pass messages asynchronously, data synchronisation through a compact binary format and data validation. It is a project much bigger than writing a library but it might be a decent solution that is general enough to be applied to most languages.

Conclusion

Django is a good solution for a vast majority of web applications today. But the expectations are rapidly changing to show real-time updates. Realtime web applications need a lot of design changes in existing web frameworks like Django. The existing solutions require a lot of components and sometimes repetitive code. Newer frameworks like Meteor and Derby seem to be well suited for these needs and rapid development. But the design and scalability of real-time applications will still be tricky. Finally, if you are a non-JavaScript developer there might still be hope.


Building a Hacker News clone in Django - Part 4 (AJAX and Mixin)

You are reading a post from a four-part tutorial series

Yes, all good things do come to an end. It gets even better when the ending is good. Steel Rumors was a project to help Django beginners progress to the next level from basic tutorials. It had elements which would be useful for most practical sites like user registrations and making CRUD views.

Honestly, I don’t like most video tutorials myself as they need a considerable amount of time to watch. However, if they come with a full transcript then I can skim through the text and decide if it contains tips which are worth watching on video. Sometimes, the additional commentary explaining the context is well worth the watch.

So, I set out to create Steel Rumors as something that everyone including me would enjoy watching. But it turns out creating the transcript is much harder than recording a quick video. In fact, it gets monotonous at times (I really sympathise with those who work in Medical Transcription!).

The videos also get difficult to record in a longer, complex project such as this. I don’t edit out any mistakes or typos I make, since debugging those presents a valuable learning opportunity for beginners. But bringing all the details together while maintaining the continuity can get incredibly demanding.

Also most tutorials would try to show you how to use the most popular package or the easiest way to implement a feature. I deliberately avoided that, perhaps inspired by Learn Python the Hard Way. It might be okay to reinvent the wheel the first time because it will help you understand how wheels work for a lifetime. So, despite many comments telling me that it is easier to use X than Y, I stuck to the alternative which helps you learn the most.

In this tutorial, we will cover some interesting areas like how you can make Django forms work with AJAX and how a simple ranking algorithm works. As always you can choose to watch the video or read the step by step description below or follow both.

I would recommend watching all the previous parts before watching this video.

Did you learn quite a bit from this video series? Then you should sign up for my upcoming book “Building a Social News Site in Django”. It explains in a learn-from-a-friend style how websites are built and gradually tackles advanced topics like testing, security, database migrations and debugging.

Step-by-step Instructions

This is the transcript of the video. In part 3, we created a social news site where users can post and comment about rumours of “Man of Steel” but cannot vote.

The outline of Part 4 of the screencast is:

Voting with FormView
Voting over AJAX
Mixins
Display Voted Status
Ranking algorithm
Background tasks

Voting with FormView

We will add an upvote button (with a plus sign) to each headline. Clicking on this will toggle the user’s “voted” status for a link i.e. voted or did not vote. The safest way to implement it is using a ModelForm for our Vote model.

Add a new form to links/forms.py:
```
    from .models import Vote
    ...

    class VoteForm(forms.ModelForm):
        class Meta:
            model = Vote
```

We will use another generic view called FormView to handle the view part of this form. Add these lines to links/views.py

    from django.shortcuts import redirect
    from django.shortcuts import get_object_or_404
    from django.views.generic.edit import FormView
    from .forms import VoteForm
    from .models import Vote
    ...

    class VoteFormView(FormView):
        form_class = VoteForm

        def form_valid(self, form):
            link = get_object_or_404(Link, pk=form.data["link"])
            user = self.request.user
            prev_votes = Vote.objects.filter(voter=user, link=link)
            has_voted = (prev_votes.count() > 0)

            if not has_voted:
                # add vote
                Vote.objects.create(voter=user, link=link)
                print("voted")
            else:
                # delete vote
                prev_votes[0].delete()
                print("unvoted")

            return redirect("home")

        def form_invalid(self, form):
            print("invalid")
            return redirect("home")

Those print statements will be removed soon and they are definitely not recommended for a production site.

Edit the home page template to add a voting form per headline. Add lines with ‘+’ sign (removing the ‘+’ sign) to steelrumors/templates/links/link_list.html:

    {% for link in object_list %}
    + <form method="post" action="{% url 'vote' %}" class="vote_form">
        <li> [{{ link.votes }}]
      +  {% csrf_token %}
      + <input type="hidden" id="id_link" name="link" class="hidden_id" value="{{ link.pk }}" />
      + <input type="hidden" id="id_voter" name="voter" class="hidden_id" value="{{ user.pk }}" />
      + <button>+</button>
        <a href="{% url 'link_detail' pk=link.pk %}">
          <b>{{ link.title }}</b>
        </a>
        </li>
    + </form>

Add this view in steelrumors/urls.py:
```
    from links.views import VoteFormView

    url(r'^vote/$', auth(VoteFormView.as_view()), name="vote"),  
```
Refresh the browser to see the ‘+’ buttons on every headline. You can vote on them as well. But you can read the voting status only from the console.

Voting with AJAX

You have already copied the static folder of the goodies pack in the previous part. But in case you haven’t, then follow this step.

Create a folder named ‘js’ under steelrumors/static for our JavaScript files. Copy jquery and vote.js from the goodies pack into this folder.
```
    mkdir steelrumors/static/js
    cp /tmp/sr-goodies-master/static/js/* ~/proj/steelrumors/steelrumors/static/js/
```

Add these lines to steelrumors/templates/base.html within <head> block:

      <title>Steel Rumors</title>
      <link rel="stylesheet" type="text/css" href="{{ STATIC_URL }}css/main.css" />
    +  <script src="{{ STATIC_URL }}js/jquery.min.js"></script>
    +  <script src="{{ STATIC_URL }}js/vote.js"></script>
    </head>
    <body>

In views.py delete the entire class VoteFormView and replace with these three classes. We are using a mixin to implement a JSON response for our AJAX requests:

    import json
    from django.http import HttpResponse
    ...

    class JSONFormMixin(object):
        def create_response(self, vdict=dict(), valid_form=True):
            response = HttpResponse(json.dumps(vdict), content_type='application/json')
            # HttpResponse exposes the HTTP status as status_code
            response.status_code = 200 if valid_form else 500
            return response

    class VoteFormBaseView(FormView):
        form_class = VoteForm

        def create_response(self, vdict=dict(), valid_form=True):
            # Fallback used only when the view is not combined with JSONFormMixin
            response = HttpResponse(json.dumps(vdict))
            response.status_code = 200 if valid_form else 500
            return response

        def form_valid(self, form):
            link = get_object_or_404(Link, pk=form.data["link"])
            user = self.request.user
            prev_votes = Vote.objects.filter(voter=user, link=link)
            has_voted = (len(prev_votes) > 0)

            ret = {"success": 1}
            if not has_voted:
                # add vote
                v = Vote.objects.create(voter=user, link=link)
                ret["voteobj"] = v.id
            else:
                # delete vote
                prev_votes[0].delete()
                ret["unvoted"] = 1
            return self.create_response(ret, True)

        def form_invalid(self, form):
            ret = {"success": 0, "form_errors": form.errors }
            return self.create_response(ret, False)

    class VoteFormView(JSONFormMixin, VoteFormBaseView):
        pass
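
If you are wondering why the mixin is listed first in VoteFormView, here is a stripped-down sketch in plain Python, without Django. The class names are simplified stand-ins for the ones above:

```python
class JSONMixin(object):
    def create_response(self, data):
        return "json:" + repr(data)


class BaseView(object):
    def create_response(self, data):
        return "plain:" + repr(data)

    def form_valid(self):
        # Looks up create_response on the instance, following the MRO
        return self.create_response({"success": 1})


class MixedView(JSONMixin, BaseView):
    pass


print(MixedView().form_valid())
# json:{'success': 1}
print([c.__name__ for c in MixedView.__mro__])
# ['MixedView', 'JSONMixin', 'BaseView', 'object']
```

Python searches the classes from left to right, so the mixin’s create_response wins over the one defined in the base view.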

Showing the Voted state

We need some indication to know if the headline was voted or not. To achieve this, we can pass ids of all the links that have been voted by the logged in user. This can be passed as a context variable i.e. voted.

Add this to LinkListView class in links/views.py:

    class LinkListView(ListView):
    ...

        def get_context_data(self, **kwargs):
            context = super(LinkListView, self).get_context_data(**kwargs)
            if self.request.user.is_authenticated():
                voted = Vote.objects.filter(voter=self.request.user)
                links_in_page = [link.id for link in context["object_list"]]
                voted = voted.filter(link_id__in=links_in_page)
                voted = voted.values_list('link_id', flat=True)
                context["voted"] = voted
            return context

Change the home page template again. Add lines with ‘+’ sign (removing the ‘+’ sign) to steelrumors/templates/links/link_list.html:

        <input type="hidden" id="id_voter" name="voter" class="hidden_id" value="{{ user.pk }}" />
      + {% if not user.is_authenticated %}
      + <button disabled title="Please login to vote">+</button>
      + {% elif link.pk not in voted %}
        <button>+</button>
      + {% else %}
      + <button>-</button>
      + {% endif %}
        <a href="{% url 'link_detail' pk=link.pk %}">

Now, the button changes based on the voted state of a headline. Try it on your browser with different user logins.

Calculating Rank Score

We are going to change the sorting order of links from highest voted to highest score. Add a new function to models.py to calculate the rank score:

    from django.utils.timezone import now
    ...

    class Link(models.Model):
    ...

        def set_rank(self):
            # Based on HN ranking algo at http://amix.dk/blog/post/19574
            SECS_IN_HOUR = float(60*60)
            GRAVITY = 1.2

            delta = now() - self.submitted_on
            item_hour_age = delta.total_seconds() // SECS_IN_HOUR
            votes = self.votes - 1
            self.rank_score = votes / pow((item_hour_age+2), GRAVITY)
            self.save()

In the same file, change the sort criteria in the LinkVoteCountManager class. The changed line has been marked with a ‘+’ sign.

    class LinkVoteCountManager(models.Manager):
        def get_query_set(self):
            return super(LinkVoteCountManager, self).get_query_set().annotate(
    +             votes=Count('vote')).order_by('-rank_score', '-votes')

Ranking Job

Calculating the score for all links is generally a periodic task which should happen in the background. Create a file called rerank.py in the project root with the following content:

#!/usr/bin/env python
import os
import time

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "steelrumors.settings")
from links.models import Link

def rank_all():
    # Recompute and save the rank score of every link
    for link in Link.with_votes.all():
        link.set_rank()

def show_all():
    print("\n".join("%10s %0.2f" % (l.title, l.rank_score,
                         ) for l in Link.with_votes.all()))
    print("----\n\n\n")

if __name__=="__main__":
    while 1:
        print("---")
        rank_all()
        show_all()
        time.sleep(5)

This runs every 5 secs in the foreground.

Turn it into a background job:
```
(nohup python -u rerank.py&)
tail -f nohup.out
```
Note that this is a very simplistic implementation of a background job. For a more robust solution, check out Celery.

Watching the News Dive

It is fun to watch the rank scores rise and fall for links. It is almost as fun as watching an aquarium except with numbers. But the ranking function set_rank in models.py has the resolution of an hour. This makes it quite boring.

To see a more dramatic change in rank scores, change the SECS_IN_HOUR constant to a small value like 5.0. Now submit a new link and watch the scores drop like a stone!
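
You can also play with the formula outside Django. The following is a standalone sketch of the same calculation as set_rank. The function name rank_score and its parameters are my own, with secs_per_unit standing in for SECS_IN_HOUR:

```python
def rank_score(votes, age_in_seconds, secs_per_unit=3600.0, gravity=1.2):
    """Standalone version of the score computed in set_rank."""
    item_age = age_in_seconds // secs_per_unit  # whole units only, as in set_rank
    return (votes - 1) / pow(item_age + 2, gravity)


# Same link with 11 votes, seen at different ages
for age in (0, 3600, 6 * 3600, 24 * 3600):
    print(age // 3600, "hours:", round(rank_score(11, age), 3))

# With a 5 second "hour", one minute is enough to sink the score
print(round(rank_score(11, 60, secs_per_unit=5.0), 3))
```

Since the age is floored to whole units, the score stays flat within each hour and only drops when the next hour begins. Shrinking the unit makes those steps come much faster.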

Final Comments

Steel Rumors is far from being a complete Hacker News clone. But it supports voting, submission of links and user registrations. In fact, it is quite useable at this point.

Check out a demo of Steel Rumors yourself.

Hope you enjoyed this tutorial series as much as I did while making them. If you get stuck anywhere make sure you check the github source first for reference. Keep your comments flowing!

Resources

Full Source on Github
