Test-driven Development (TDD) has been getting a lot of attention these days. While I understand the importance of testing, I was sceptical of Test-Driven Development for a long time. I mean, why not Development-Driven Testing or Develop Then Test Later? I thought figuring out the tests before one can write even a single line of code would be impossible.
I was so wrong.
Test-driven Development will help you immensely in the long run. We will soon see how. We will approach TDD from a sceptical viewpoint and then try to create a simple URL shortener like bit.ly using TDD. I will conclude with my evaluation of the pros and cons of the technique, which you may jump to directly if you prefer.
What is TDD?
Test-driven development (TDD) is a form of software development where you first write the test, run the test (which will fail first) and then write the minimum code needed to make the test pass. We will elaborate on these steps in detail later but essentially this is the process that gets repeated.
This might sound counter-intuitive. Why do we need to write tests when we know that we have not written any code and we are certain that it will fail because of that? It seems like we are not testing anything at all.
But look again. Later, we do write code that merely satisfies these tests. That means that these tests are not ordinary tests; they are more like specifications. They are telling you what to expect. These tests or specifications will directly come from your client’s user stories. You are writing just enough code to make it work.
For instance, your client needs a website (in Django) with an Article model that specifies a headline and a body. The very first thing to do in TDD would be to write a test which checks this specification. Then you run the test. Watch it fail. Then write three lines of code in models.py to create an Article class with a headline and body.
But wait a minute. Isn’t this cheating? Your client actually wanted you to create a Blog application. But your three lines are nowhere near the functionality of a blog. This is an uncomfortable thought that bothers you when you start with TDD.
Just ignore it for now. It is not that much of a big deal. In fact, even your client wouldn’t be too bothered about it. They would be happy that you are working on their application one spec at a time. They can actually watch the progress (and trust me they love that, especially management).
In any form of engineering, this is a common way of building things - there is a plan and we build it bit-by-bit based on it. Before a building is constructed, a blueprint is drawn and progress is made floor-by-floor. Prior to every mobile phone being manufactured, there are detailed CAD models to ensure that every part fits each other perfectly. So why should software engineering be any different?
How to do TDD?
Now, let’s see how test-driven development is done. Just follow this simple procedure:
1. Decide what the code will do: This is usually told to you by the client.
2. Write a test: The test should pass only if the code does that.
3. Run the test: It will fail.
4. Write some code: Just enough to make it pass.
5. Run the test: If it fails go to step 4.
6. Refactor the code: Tests will ensure that it doesn’t break specs.
7. Rinse and Repeat: Take another spec/user-story/feature and go to step 1.
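As a rough illustration of steps 2 to 5, here is a minimal sketch in plain Python rather than Django. The `EmptyArticle` and `Article` classes and the `check` helper are hypothetical stand-ins made up for this example; a real Django model would be exercised through `./manage.py test` instead.

```python
import unittest


def check(article_cls):
    """Run the spec against a candidate class; return True if it passes."""

    class ArticleSpec(unittest.TestCase):
        def test_has_headline_and_body(self):
            article = article_cls(headline="Hello", body="World")
            self.assertEqual(article.headline, "Hello")
            self.assertEqual(article.body, "World")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArticleSpec)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


class EmptyArticle(object):
    """Step 3: nothing written yet, so the spec cannot pass."""


class Article(object):
    """Step 4: just enough code to satisfy the spec."""

    def __init__(self, headline, body):
        self.headline = headline
        self.body = body


print(check(EmptyArticle))  # False: the run reports an error
print(check(Article))       # True: the spec is satisfied
```

The same test is run twice without being changed; only the code under test moves from "missing" to "just enough".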
Now that you know that tests are like specifications, you might be seeing some method in this madness. But this method is more familiar to you than you might think.
This is, in fact, quite similar to the Scientific method which is the basis of modern science. Let’s recollect what scientific method teaches us:
1. Define a question
2. Gather information and resources (observe)
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyse the data
6. Interpret the data and draw conclusions that serve as a starting point for new hypothesis (go to step 3)
7. Publish results
8. Retest (frequently done by other scientists)
Compare this with the previous steps for TDD and notice the similarities. Tests in TDD take the role of experiments in Science. Your theory is only good if the experiments are repeatable and verifiable. You will see that the same will hold good for tests in your project’s source code when you work with other developers.
In fact, let’s look at collaborative software development happening in large open source projects. Almost all of them would have a good collection of tests. While writing test cases seems like a good idea, are there any good reasons to write tests before code?
Why do TDD?
So it seems that TDD is not an arbitrary practice after all. But we still don’t have to follow it. There are plenty of ways to develop software.
But TDD does come with its own set of advantages, some of which are obvious and some which are not:
- It is live documentation that grows, lives and changes with your code.
- Improves design
- Catches future errors
- Long-term time savings
- Reduces technical debt and hence risk
- Avoids manual one-off tests. Otherwise, you will eventually add and re-add test data to test by hand, which is too hacky.
These are advantages gathered from various developers who practice TDD. Each of them merits a detailed explanation. But to summarise, TDD brings with it a lot of the benefits of testing by making it a mandatory part of your development cycle. You might think that you will add tests later, but sometimes you never get around to doing it.
Code written to pass simple, focused test cases tends to be more modular and hence better designed. It is a pleasant side effect of the process but you will certainly notice it.
How is TDD done?
To understand TDD better we will try to implement an entire Django project by writing test cases first and then the code. We will be creating a URL shortening service which takes a long URL and converts it into a short one (possibly for fitting into a Twitter message).
You can also watch the screencast below to see how the site is created.
User Stories
Imagine that after a long call with your client, you have distilled their needs into the following user stories:
- The short URL must be always smaller than the original URL.
- If you give the short URL, you must be able to recover the original URL.
- The home page must have a form to enter the long URL.
- Submitting the home page form should show the short URL.
- Clicking on the short URL must redirect to the original (long) URL.
We have also roughly ordered the stories so that the core functionality comes first.
Create the Project
Note: You will need at least Python 2.7 and Django 1.6 to follow the rest of this post. Earlier versions of Python and Django had some differences in unit testing tools.
Typically URL shortener sites will have a short name like http://ti.ny. So let’s call our project tiny:
django-admin.py startproject tiny
cd tiny
./manage.py startapp shorturls
Configure DATABASES in tiny/settings.py to use a simple file-based sqlite3 database and add the app shorturls in INSTALLED_APPS as well. Next, synchronise the database with:
./manage.py syncdb
Writing the First Test
Add your first test case to shorturls/tests.py:
from django.test import TestCase
from .models import Link
class ShortenerText(TestCase):
    def test_shortens(self):
        """
        Test that urls get shorter
        """
        url = "http://www.example.com/"
        l = Link(url=url)
        short_url = Link.shorten(l)
        self.assertLess(len(short_url), len(url))
This test creates a simple Link object given a (long) URL. It creates a short URL by using a static method called shorten() and asserts that it is shorter in length. Note that we are not saving the Link object into the database. Whenever a test can avoid touching the database, it should jump at the opportunity. With SQLite, Django uses an in-memory database while testing but even that can take some time. The faster the unit tests are, the more likely you are to use them.
Also, notice that we get a better error message when we use the assert...() functions (see the Sidenote below).
Run:
./manage.py test shorturls
As expected, it fails. Now build a model for this to work in shorturls/models.py:
from django.db import models
class Link(models.Model):
    url = models.URLField()
    @staticmethod
    def shorten(long_url):
        return ""
We are cheating by returning a zero length string but we know that this will pass the test.
./manage.py test shorturls
Sidenote: Choosing the right assertion
There are several assert functions provided by the unittest module in Python and several more provided by Django in the TestCase class. Initially, they might feel redundant. After all, assert is a keyword built into Python. Every conceivable assertion function can be replaced by an assert statement checking for the truth of a Python expression.
The real difference is when the assertion fails. Here are three equivalent assertions along with the typical error message when it fails. Compare them yourself:
assert len(url) < len(docs_url)
AssertionError
self.assertTrue(len(url) < len(docs_url))
AssertionError: False is not true
self.assertLess(len(url), len(docs_url))
AssertionError: 101 not less than 58
Clearly, the self.assert...() functions have a clearer error message. So it is worthwhile to familiarise ourselves with the assert API unless you plan to use something like pytest.
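To see the difference for yourself outside a test run, here is a small sketch that captures the message carried by each kind of failure. The `Messages` class, the `failure_message` helper and the two lengths are made up for this illustration; only `assertTrue` and `assertLess` come from `unittest`.

```python
import unittest


class Messages(unittest.TestCase):
    def runTest(self):  # lets us create an instance outside a test runner
        pass


def failure_message(check):
    """Call check() and return the text of the AssertionError it raises."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return None  # nothing failed


case = Messages()
url_len, docs_url_len = 101, 58  # illustrative lengths only


def bare_assert():
    assert url_len < docs_url_len


# A plain assert carries no explanation unless you write one yourself.
print(repr(failure_message(bare_assert)))
# → 'False is not true'
print(repr(failure_message(lambda: case.assertTrue(url_len < docs_url_len))))
# → '101 not less than 58'
print(repr(failure_message(lambda: case.assertLess(url_len, docs_url_len))))
```

Only the last message tells you the two values that were compared, which is usually what you need to start debugging.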
Second Test: Recovering the url
“But you wouldn’t clap yet. Because making something disappear isn’t enough; you have to bring it back. That’s why every magic trick has a third act, the hardest part, the part we call “The Prestige”.” — Christopher Priest, The Prestige
Add another test to shorturls/tests.py:
    def test_recover_link(self):
        """
        Tests that the shortened then expanded url is the same as original
        """
        url = "http://www.example.com/"
        l = Link(url=url)
        short_url = Link.shorten(l)
        l.save()
        # Another user asks for the expansion of short_url
        exp_url = Link.expand(short_url)
        self.assertEqual(url, exp_url)
Our bluff gets called. We now need a real way to shorten urls and recover them. Unlike what might come to you intuitively, the URL is not mapped to a more compact encoding like a zip file. The allowable character set for a URL is pretty limited. Any kind of ‘string compression’ would reach its limits.
Instead we do something very simple. We know that, once saved into the database, each Link object can be uniquely identified by an integer - its primary key. Simply add this primary key to the domain’s URL and we have a short URL that can be mapped back to the original URL with a database lookup.
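Before touching the models, the idea can be sketched without a database at all. The `InMemoryLinks` class below is a hypothetical stand-in for the Link table, with a counter playing the role of the auto-incrementing primary key; it is only meant to show the mapping, not how Django stores anything.

```python
import itertools


class InMemoryLinks(object):
    """Hypothetical stand-in for the Link table: integer keys to URLs."""

    def __init__(self):
        self._next_id = itertools.count(1)  # mimics an auto-incrementing key
        self._url_by_id = {}
        self._id_by_url = {}

    def shorten(self, url):
        """Return the key for url as a string, storing the URL if it is new."""
        if url not in self._id_by_url:
            link_id = next(self._next_id)
            self._id_by_url[url] = link_id
            self._url_by_id[link_id] = url
        return str(self._id_by_url[url])

    def expand(self, slug):
        """Look the original URL up again from its key."""
        return self._url_by_id[int(slug)]


links = InMemoryLinks()
slug = links.shorten("http://www.example.com/")
print(slug, links.expand(slug))  # 1 http://www.example.com/
```

Shortening the same URL twice returns the same key, which is the behaviour a get-or-create lookup gives you in the database version.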
Change shorturls/models.py to this:
from django.db import models
class Link(models.Model):
    url = models.URLField()
    @staticmethod
    def shorten(link):
        l, _ = Link.objects.get_or_create(url=link.url)
        return str(l.pk)
    @staticmethod
    def expand(slug):
        link_id = int(slug)
        l = Link.objects.get(pk=link_id)
        return l.url
Now the tests should pass.
Third Test: Home Page With a Form
Add to shorturls/tests.py:
from django.core.urlresolvers import reverse
...
    def test_homepage(self):
        """
        Tests that a home page exists and it contains a form.
        """
        response = self.client.get(reverse("home"))
        self.assertEqual(response.status_code, 200)
        self.assertIn("form", response.context)
This test fails because we don’t have any views mapped in our urls.py. In the spirit of minimum effort, let’s use Django’s class-based views. Since the submission of a form would create a new Link object, let’s use a CreateView instead of a TemplateView. A CreateView will generate the form for free and it will be useful later on.
Replace contents of shorturls/views.py with:
from django.views.generic.edit import CreateView
from .models import Link
class LinkCreate(CreateView):
    model = Link
    fields = ["url"]
Create shorturls/templates/shorturls/link_form.html:
<form method="post">{% csrf_token %}
{{ form.as_p }}
<input type="submit" value="Shorten" />
</form>
Replace contents of tiny/urls.py with:
from django.conf.urls import patterns, include, url
from shorturls.views import LinkCreate
urlpatterns = patterns('',
    url(r'^$', LinkCreate.as_view(), name='home'),
)
Now the test should pass.
Fourth Test: Form Returns a Short URL
Add to shorturls/tests.py:
    def test_shortener_form(self):
        """
        Tests that submitting the forms returns a Link object.
        """
        url = "http://example.com/"
        response = self.client.post(reverse("home"),
                                    {"url": url}, follow=True)
        self.assertEqual(response.status_code, 200)
        self.assertIn("link", response.context)
        l = response.context["link"]
        short_url = Link.shorten(l)
        self.assertEqual(url, l.url)
        self.assertIn(short_url, response.content)
(This test is designed to work with Django URLField’s default behaviour to add trailing slashes. This needs to be agreed with your client, of course. In my case, I simply had to ask the mirror.)
Now we need to think of what gets shown when the form is submitted. Obviously, there would be the short URL. Once again we can use a ready-made class-based view for this - the DetailView.
Add to shorturls/views.py:
from django.shortcuts import redirect
from django.views.generic import DetailView
...
class LinkCreate(CreateView):
    model = Link
    fields = ["url"]
    def form_valid(self, form):
        # Check if the Link object already exists
        prev = Link.objects.filter(url=form.instance.url)
        if prev:
            return redirect("link_show", pk=prev[0].pk)
        return super(LinkCreate, self).form_valid(form)
class LinkShow(DetailView):
    model = Link
Change tiny/urls.py to:
from django.conf.urls import patterns, url
from shorturls.views import LinkCreate
from shorturls.views import LinkShow
urlpatterns = patterns('',
    url(r'^$', LinkCreate.as_view(), name='home'),
    url(r'^link/(?P<pk>\d+)$', LinkShow.as_view(), name='link_show'),
)
Now add the template for the DetailView. Create shorturls/templates/shorturls/link_detail.html with the following contents:
<p> Short Link: /r/{{ object.id }}
<p> Original Link: {{ object.url }}
Note that we haven’t created the short link redirection code for /r/ yet.
The only hitch we have now is that Django doesn’t know where to go after the form is submitted. One way to solve that is by adding get_absolute_url() to the Link model.
Change shorturls/models.py to:
from django.db import models
from django.core.urlresolvers import reverse
class Link(models.Model):
    url = models.URLField()
    def get_absolute_url(self):
        return reverse("link_show", kwargs={"pk": self.pk})
    # shorten() and expand() stay as they were
Now if the form is submitted, it gets redirected to the new DetailView. The tests should pass.
Fifth Test: Short URL Must Redirect To The Long URL
Our next and final test actually tests if the short URLs work:
    def test_redirect_to_long_link(self):
        """
        Tests that the short URL redirects to the original (long) URL.
        """
        url = "http://example.com"
        l = Link.objects.create(url=url)
        short_url = Link.shorten(l)
        response = self.client.get(
            reverse("redirect_short_url",
                    kwargs={"short_url": short_url}))
        self.assertRedirects(response, url)
The final bit of the puzzle is the reverse lookup, or redirecting the user with the short URL to the original URL. As you might have guessed, it is time for yet another class-based view - RedirectView.
Add to shorturls/views.py:
from django.views.generic.base import RedirectView
...
class RedirectToLongURL(RedirectView):
    permanent = False
    def get_redirect_url(self, *args, **kwargs):
        short_url = kwargs["short_url"]
        return Link.expand(short_url)
Add to tiny/urls.py:
from shorturls.views import RedirectToLongURL
...
    url(r'^r/(?P<short_url>\w+)$', RedirectToLongURL.as_view(),
        name='redirect_short_url'),
Now the tests should pass.
Refactoring For Shorter URLs
We will now see how tests prevent regression. The short URLs we generate are nothing but primary keys of Link instances in the database. This scheme works swimmingly well until a smart guy notices that we are wasting too many characters compared to other URL shorteners.
A shortener like bit.ly creates short URLs which are a mix of letters (lower and upper case), numbers and symbols (like hyphens and underscores). However, we use only numbers. Mathematically speaking, we can use a base higher than 10 if we have more than ten symbols. This can lead to shorter numbers. For example, the decimal “255” can be represented as “FF” in hexadecimal, which saves one character!
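The saving is easy to check with a little arithmetic. The `digits_needed` helper below is a hypothetical function written just for this illustration; it counts how many symbols a number needs in a given base, so you can compare base 10 with base 16 and with the 62 symbols available from digits plus lower and upper case letters.

```python
def digits_needed(n, base):
    """Count how many symbols a non-negative integer n needs in base."""
    count = 1
    while n >= base:
        n //= base
        count += 1
    return count


print(format(255, "x"))            # ff (Python prints hex in lower case)
print(digits_needed(255, 10))      # 3
print(digits_needed(255, 16))      # 2
print(digits_needed(1000000, 10))  # 7
print(digits_needed(1000000, 62))  # 4
```

So the millionth link would need seven characters as a decimal key but only four with 62 symbols.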
Notice that we have deliberately avoided testing the short URL format. This gives us flexibility to use any short URL representation we like and we are not limited to just decimals. So we create a module which can convert a decimal to a higher base with more symbols and back.
Create a new file shorturls/basechanger.py with the following contents:
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGUIJKLMNOPQRSTUVWXYZ"
BASE = len(CHARS)
def decimal2base_n(n):
    if n >= BASE:
        return decimal2base_n(n // BASE) + CHARS[n % BASE]
    else:
        return CHARS[n]
def base_n2decimal(n):
    if len(n) > 1:
        return base_n2decimal(n[:-1]) * BASE + CHARS.index(n[-1])
    else:
        return CHARS.index(n[0])
If you suspect anything wrong, just play along ;) . Since all the URL shortening and expansion logic resides in the models (as it should), we need to change that as well.
Change shorturls/models.py to:
from django.db import models
from django.core.urlresolvers import reverse
from .basechanger import decimal2base_n, base_n2decimal
class Link(models.Model):
    url = models.URLField()
    def get_absolute_url(self):
        return reverse("link_show", kwargs={"pk": self.pk})
    @staticmethod
    def shorten(link):
        l, _ = Link.objects.get_or_create(url=link.url)
        return str(decimal2base_n(l.pk))
    @staticmethod
    def expand(slug):
        link_id = int(base_n2decimal(slug))
        l = Link.objects.get(pk=link_id)
        return l.url
Now running your tests should work as if nothing happened. Your smart guy will be pleased that the short URLs are now shorter. It is a win-win, folks!
Note that we have not removed the hard-coded short URL path in our DetailView template. Let’s do that as well.
First, add a method to shorturls/models.py:
    def short_url(self):
        return reverse("redirect_short_url",
                       kwargs={"short_url": Link.shorten(self)})
Change shorturls/templates/shorturls/link_detail.html:
<p> Short Link: <a href="{{ object.short_url }}">{{ object.short_url }}</a>
<p> Original Link: {{ object.url }}
Subtle Bug: More Tests Needed
Everything goes great for a while until user testing, when you get a strange bug report:
Critical: Some short URLs get redirected to the wrong site.
By following TDD, the only way to fix your code would be to create a test to reproduce your bug. Since the problem only happens after a certain number of short URLs are created, you design a test to create a large number of short URLs and compare it with the original URL.
Add a new test case to shorturls/tests.py:
import random, string
...
    def test_recover_link_n_times(self):
        """
        Tests multiple times that after shortening and expanding
        the original url is recovered.
        """
        TIMES = 100
        for i in xrange(TIMES):
            uri = "".join(random.sample(string.ascii_letters, 5))
            url = "https://example.com/{}/{}".format(i, uri)
            l = Link.objects.create(url=url)
            short_url = Link.shorten(l)
            long_url = Link.expand(short_url)
            self.assertEqual(url, long_url)
Running the test gives you a mysterious error.
AssertionError: 'https://example.com/55/KPAOz' != u'https://example.com/42/mDcHw'
The hint of what is wrong is in the URLs themselves. After running a debugger, you realise that link 55 has been expanded to link 42 because their keys map to the same symbol: the letter ‘U’ appears twice in CHARS, once in its proper place and once where ‘H’ should be. A simple typo that was actually overlooked when I was writing this article.
I learnt two lessons from this. First, never underestimate testing. The more tests you can write, the better your code becomes. Second, never attempt to list the alphabet by hand. We are no longer in kindergarten and we are certainly not expected to remember all of it. That’s why Python has a string module with such silly strings kept ready for you.
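The typo can also be pinned down without a debugger by comparing the hand-typed alphabet against the constants in the string module. This is only a sketch of one way to check; the `expected`, `duplicated` and `missing` names are made up for the example.

```python
import string
from collections import Counter

# The hand-typed alphabet from basechanger.py, typo included.
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGUIJKLMNOPQRSTUVWXYZ"
expected = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Symbols that occur more than once, and symbols that never occur.
duplicated = sorted(c for c, n in Counter(CHARS).items() if n > 1)
missing = sorted(set(expected) - set(CHARS))

print(duplicated)  # ['U']
print(missing)     # ['H']
print(CHARS.index("U"), CHARS.rindex("U"))  # 43 56
```

The two positions of ‘U’ line up with the failing test: the link with key 56 is encoded as ‘U’, which decodes back to key 43.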
So the fix is a simple change in the first line of shorturls/basechanger.py:
import string
CHARS = string.digits+string.ascii_lowercase+string.ascii_uppercase
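A quick round-trip check makes the effect of the fix visible. The sketch below uses iterative `encode` and `decode` helpers that take the alphabet as a parameter; these are hypothetical rewrites for illustration, not the functions from basechanger.py, and `broken_keys` simply lists the integers that do not come back unchanged.

```python
import string

BUGGY = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGUIJKLMNOPQRSTUVWXYZ"
FIXED = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n, chars):
    """Convert a non-negative integer to a string using chars as digits."""
    base = len(chars)
    out = chars[n % base]
    while n >= base:
        n //= base
        out = chars[n % base] + out
    return out


def decode(s, chars):
    """Convert a string produced by encode() back to an integer."""
    base = len(chars)
    n = 0
    for c in s:
        n = n * base + chars.index(c)
    return n


def broken_keys(chars, limit=1000):
    """Return the keys below limit that do not survive a round trip."""
    return [n for n in range(limit) if decode(encode(n, chars), chars) != n]


print(broken_keys(FIXED))      # []
print(broken_keys(BUGGY)[:3])  # [56, 118, 180]
```

With the hand-typed alphabet, every key that contains the digit 56 is damaged, which is why the bug only shows up after enough links have been created.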
So once again, we have a 100% success rate in our tests.
The completed source code can be found on GitHub.
So What’s Wrong With TDD?
The biggest problem I faced while learning TDD was wearing two different hats alternately. The Tester Hat first makes you think of the most specific way to break your code, and the Code Hat wants you to write terse code that works in the most general way possible. With some practice, the cognitive load became much lighter. But I would leave you with this pithy statement for some thought:
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” — Brian Kernighan
Of course, TDD practitioners themselves acknowledge that TDD is not for exploratory programming. In other words, you cannot create a map if you don’t know where you are or where you want to go. It is also expected that you have a certain amount of expertise in the language (in this case, Python/Django), to clearly see how the test case should be written in advance. So I wouldn’t expect a beginner to learn TDD along with learning programming.
It goes without saying that, due to its counter-intuitive nature, TDD takes some time to get comfortable with.
Conclusion
TDD is a design technique that needs a little bit of extra time for planning ahead. Some studies put this at a 15-35% increase in development time. So, I wouldn’t use it for one-time scripts or quick jobs. But I will strongly consider it whenever I need to build production-quality sites.
I might also not consider it for version 1 of something that I am working on. Maybe version 2 onwards, when the development will get faster and I am more familiar with the domain.
It doesn’t necessarily improve the client’s experience. If they are more interested in development time rather than the quality of code, then they might not appreciate TDD designed code itself. This is when BDD is much more effective in engaging them and “showcasing” the effectiveness of the methodology.
Django tests are pretty fast. So I am not too slowed down by the runs. I work on the models first, then the views, then the system tests and finally the templates. I prefer smaller, faster, non-redundant, black-boxy and functionality-oriented tests. Perhaps you want to read that again. It takes a while to design a good test case.
However, pytest is a better option, as it does not require learning all the legacy JUnit-style functions. It is a lot better than the default Django testing tool and I would prefer to use pytest as my default test runner.
TDD relies on the inherent human urge to fix things. Writing tests is important but not essential, so we might forget them. TDD will actually show a ‘broken’ status if any test fails; this is a great incentive to fix them.
I tend to make a lot of design decisions while writing test cases. This is a good side effect of TDD. A lot of corner cases that are missed while coding get attention upfront, so by the time you are coding you are already aware of them, rather than the other way around. This is why the design is better with TDD.
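To give a flavour of the difference, here is a rough sketch of how the first test might look in pytest style: plain functions and plain assert statements, with no TestCase class. The `shorten` function is a hypothetical stand-in so that the example runs without Django, and running it under pytest assumes pytest is installed and the file name starts with test_.

```python
def shorten(url):
    """Hypothetical stand-in that pretends every link has the key 1."""
    return "1"


def test_short_url_is_shorter():
    url = "http://www.example.com/"
    # pytest reports the compared values when a plain assert fails
    assert len(shorten(url)) < len(url)


def test_short_url_is_not_empty():
    assert shorten("http://www.example.com/") != ""
```

There are no assert...() functions to memorise here; pytest inspects the plain assert statement to build its failure message.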
It takes a lot of time to be good at TDD.