Get started with Async & Await

    You are reading a post from a two-part tutorial series on Django Channels

    Asyncio

    Asyncio is a co-operative multitasking library that has been part of the Python standard library since version 3.4; the async and await syntax used in this post arrived in version 3.5. Celery is fantastic for running concurrent tasks out of process, but there are times when you need to run multiple tasks in a single thread inside a single process.

    If you are not familiar with async/await concepts (say from JavaScript or C#) then there is a bit of a steep learning curve. However, it is well worth your time as it can speed up your code tremendously (unless it is completely CPU-bound). Moreover, it helps in understanding other libraries built on top of asyncio, like Django Channels.

    This post is an attempt to explain the concepts in a simplified manner rather than try to be comprehensive. I want you to start using asynchronous programming and enjoy it. You can learn the nitty-gritty later.

    All asyncio programs are driven by an event loop, which is pretty much an indefinite loop that calls all registered coroutines in some order until they all terminate. Each coroutine operates cooperatively by yielding control to fellow coroutines at well-defined places. This is called awaiting.

    A coroutine is like a special function which can suspend and resume execution. Coroutines work like lightweight threads. Native coroutines use the async and await keywords, as follows:

    import asyncio
    
    
    async def sleeper_coroutine():
        await asyncio.sleep(5)
    
    
    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.run_until_complete(sleeper_coroutine())
    

    This is a minimal example of an event loop running one coroutine named sleeper_coroutine. When invoked this coroutine runs until the await statement and yields control back to the event loop. This is usually where an Input/Output activity occurs.

    Control comes back to the coroutine at the same line when the activity being awaited is completed (after five seconds). Then the coroutine returns and is considered completed.
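
    On Python 3.7 and later, the same program can also be written with asyncio.run, which creates and closes the event loop for you. The following is a minimal sketch; the shorter default delay and the return value are added here purely for illustration.

```python
import asyncio


async def sleeper_coroutine(delay=0.1):
    # Execution suspends here and control goes back to the event loop.
    await asyncio.sleep(delay)
    return "done"


if __name__ == "__main__":
    # asyncio.run() starts a fresh event loop, runs the coroutine to
    # completion and closes the loop afterwards (Python 3.7+).
    print(asyncio.run(sleeper_coroutine()))
```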

    Explain async and await

    [TLDR; Watch my screencast to understand this section with a lot more code examples.]

    Initially, I was confused by the presence of the new keywords in Python: async and await. Asynchronous code seemed to be littered with these keywords yet it was not clear what they did or when to use them.

    Let’s first look at the async keyword. Commonly used before a function definition as async def, it indicates that you are defining a (native) coroutine.

    You should know two things about coroutines:

    1. Don’t perform slow or blocking operations synchronously inside coroutines.
    2. Don’t call a coroutine directly like a regular function call. Either schedule it in an event loop or await it from another coroutine.

    Unlike a normal function call, invoking a coroutine does not execute its body right away. Instead, the call returns a coroutine object whose body has not started running yet. Invoking the send method of this object starts the execution of the coroutine body.

    >>> async def hi():
    ...     print("HOWDY!")
    ...
    >>> o = hi()
    >>> o
    <coroutine object hi at 0x000001DAE26E2F68>
    >>> o.send(None)
    HOWDY!
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>>
    

    However, when the coroutine returns it will end in a StopIteration exception. Hence it is better to use the asyncio provided event loop to run a coroutine. The loop will handle exceptions in addition to all other machinery for running coroutines concurrently.

    >>> import asyncio
    >>> loop = asyncio.get_event_loop()
    >>> o = hi()
    >>> loop.run_until_complete(o)
    HOWDY!
    

    Next we have the await keyword, which can only be used inside a coroutine. If you call another coroutine, chances are that it might get blocked at some point, say while waiting for I/O.

    >>> async def sleepy():
    ...     await asyncio.sleep(3)
    ...
    >>> o = sleepy()
    >>> loop.run_until_complete(o)
    # After three seconds
    >>>
    

    The sleep coroutine from asyncio module is different from its synchronous counterpart time.sleep. It is non-blocking. This means that other coroutines can be executed while this coroutine is awaiting the sleep to be completed.
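
    The difference between the two kinds of sleep can be measured with a small experiment. In this sketch, the names polite, rude and timed are made up for illustration: three coroutines that await asyncio.sleep overlap their waits, while three that call time.sleep hold the thread and run one after another.

```python
import asyncio
import time


async def polite(delay):
    await asyncio.sleep(delay)   # yields to the event loop while waiting


async def rude(delay):
    time.sleep(delay)            # blocks the thread; no other coroutine can run


async def timed(worker, delay=0.2, count=3):
    """Run `count` copies of `worker` together and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("asyncio.sleep: {:.2f} secs".format(asyncio.run(timed(polite))))
    print("time.sleep:    {:.2f} secs".format(asyncio.run(timed(rude))))
```

    The first figure should be close to a single delay and the second close to three delays, which is why blocking calls do not belong inside coroutines.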

    When a coroutine uses the await keyword to call another coroutine, it acts like a bookmark. When a blocking operation happens, it suspends the coroutine (and all the coroutines that are awaiting it) and returns control back to the event loop. Later, when the event loop is notified of the completion of the blocking operation, execution resumes from the await expression where it paused and continues onward.
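
    The bookmark behaviour becomes visible if each step is recorded. This is only a sketch with hypothetical coroutines inner and outer: when inner suspends on its sleep, the outer coroutine awaiting it is suspended as well, and the event loop moves on to the other task.

```python
import asyncio

events = []


async def inner(name, delay):
    events.append(name + ": inner start")
    await asyncio.sleep(delay)   # suspends inner and whoever is awaiting it
    events.append(name + ": inner done")


async def outer(name, delay):
    events.append(name + ": outer start")
    await inner(name, delay)     # the "bookmark": outer resumes here later
    events.append(name + ": outer done")


async def main():
    # B waits longer than A, so A is resumed and finishes first.
    await asyncio.gather(outer("A", 0.01), outer("B", 0.03))


if __name__ == "__main__":
    asyncio.run(main())
    print("\n".join(events))
```

    Both tasks log their start lines before either logs a done line, since neither can continue until its sleep completes.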

    Asyncio vs Threads

    If you have worked on multi-threaded code, then you might wonder – Why not just use threads? There are several reasons why threads are not popular in Python.

    Firstly, threads need to be synchronized while accessing shared resources or we will have race conditions. There are several types of synchronization primitives like locks but essentially, they involve waiting which degrades performance and could cause deadlocks or starvation.

    A thread may be interrupted any time. Coroutines have well-defined places where execution is handed over i.e. co-operative multitasking. As a result, you may make changes to a shared state as long as you leave it in a known state. For instance you can retrieve a field from a database, perform calculations and overwrite the field without worrying that another coroutine might have interrupted you in between. All this is possible without locks.
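
    Here is a small sketch of that guarantee, using a made-up in-memory balance in place of a database field. Each coroutine performs a read-modify-write with no await in between, so no other coroutine can interleave and the final total is exact without any lock.

```python
import asyncio

balance = {"amount": 0}


async def deposit(times):
    for _ in range(times):
        # No await between the read and the write, so no other
        # coroutine can run in the middle of this update.
        current = balance["amount"]
        balance["amount"] = current + 1
        # The only point where another coroutine may take over.
        await asyncio.sleep(0)


async def main(workers=100, times=100):
    balance["amount"] = 0
    await asyncio.gather(*(deposit(times) for _ in range(workers)))
    return balance["amount"]


if __name__ == "__main__":
    print(asyncio.run(main()))   # 100 workers x 100 deposits
```

    The guarantee only holds between awaits. If an await were placed between reading and writing the value, another coroutine could change it in the meantime and updates could be lost.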

    Secondly, coroutines are lightweight. Each coroutine needs an order of magnitude less memory than a thread. If you can run a maximum of hundreds of threads, then you might be able to run tens of thousands of coroutines given the same memory. Thread switching also takes some time (a few milliseconds). This means you might be able to run more tasks or serve more concurrent users (just like how Node.js works on a single thread without blocking).
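
    To get a feel for how cheap coroutines are, you can start ten thousand of them at once. This is a rough sketch; tiny_task is a made-up name and the timing will vary by machine.

```python
import asyncio
import time


async def tiny_task(i):
    await asyncio.sleep(0.1)   # every task waits concurrently
    return i


async def main(count=10000):
    start = time.perf_counter()
    results = await asyncio.gather(*(tiny_task(i) for i in range(count)))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    finished, elapsed = asyncio.run(main())
    print("{} coroutines finished in {:.2f} secs".format(finished, elapsed))
```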

    The downside of coroutines is that you cannot mix blocking and non-blocking code. So once you enter the event loop, the rest of the code driven by it must be written in asynchronous style, even the standard or third-party libraries you use. This might make using some older libraries with synchronous code somewhat difficult.

    If you really want to call asynchronous code from synchronous or vice versa, then do read this excellent overview of various cases and adaptors you can use by Andrew Godwin.
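
    One adaptor that ships with asyncio itself is run_in_executor, which hands a blocking function to a thread pool so that the event loop stays responsive. The sketch below assumes Python 3.7+ (for get_running_loop); blocking_io and heartbeat are hypothetical stand-ins for a synchronous library call and for other asynchronous work.

```python
import asyncio
import time


def blocking_io(delay):
    # Stand-in for a call into a synchronous library.
    time.sleep(delay)
    return "blocking call finished"


async def heartbeat(ticks, interval):
    # Keeps running while the blocking call sits in a worker thread.
    for _ in range(ticks):
        await asyncio.sleep(interval)
    return ticks


async def main(delay=0.2):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor.
    blocking = loop.run_in_executor(None, blocking_io, delay)
    return await asyncio.gather(blocking, heartbeat(4, delay / 4))


if __name__ == "__main__":
    print(asyncio.run(main()))
```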

    The Classic Web-scraper Example

    Let’s look at an example of how we can rewrite synchronous code as asynchronous code. We will look at a web scraper which downloads pages from a few URLs and measures their sizes. This is a common example because it is heavily I/O bound and shows a significant speedup when handled concurrently.

    Synchronous web scraping

    The synchronous scraper uses Python 3 standard libraries like urllib. It downloads the home page of three popular sites and the fourth is a large file to simulate a slow connection. It prints the respective page sizes and the total running time.

    Here is the code for the synchronous scraper:

    # sync.py
    """Synchronously download a list of webpages and time it"""
    from urllib.request import Request, urlopen
    from time import time
    
    sites = [
        "https://news.ycombinator.com/",
        "https://www.yahoo.com/",
        "https://github.com/",
    ]
    
    
    def find_size(url):
        req = Request(url)
        with urlopen(req) as response:
            page = response.read()
            return len(page)
    
    
    def main():
        for site in sites:
            size = find_size(site)
            print("Read {:8d} chars from {}".format(size, site))
    
    
    if __name__ == '__main__':
        start_time = time()
        main()
        print("Ran in {:6.3f} secs".format(time() - start_time))
    

    On a test laptop, this code took 5.4 seconds to run. It is the cumulative loading time of each site. Let’s see how asynchronous code runs.

    Asynchronous web scraping

    This asyncio code requires installation of a few Python asynchronous network libraries such as aiohttp and aiodns. They are mentioned in the docstring.

    Here is the code for the asynchronous scraper – it is structured to be as close as possible to the synchronous version so it is easier to compare:

    # async.py
    """
    Asynchronously download a list of webpages and time it
    
    Dependencies: Make sure you install aiohttp using: pip install aiohttp aiodns
    """
    import asyncio
    import aiohttp
    from time import time
    
    # Configuring logging to show timestamps
    import logging
    logging.basicConfig(format='%(asctime)s %(message)s', datefmt='[%H:%M:%S]')
    log = logging.getLogger()
    log.setLevel(logging.INFO)
    
    sites = [
        "https://news.ycombinator.com/",
        "https://www.yahoo.com/",
        "https://github.com/",
    ]
    
    
    async def find_size(session, url):
        log.info("START {}".format(url))
        async with session.get(url) as response:
            log.info("RESPONSE {}".format(url))
            page = await response.read()
            log.info("PAGE {}".format(url))
            return url, len(page)
    
    
    async def main():
        tasks = []
        async with aiohttp.ClientSession() as session:
            for site in sites:
                tasks.append(find_size(session, site))
            results = await asyncio.gather(*tasks)
        for site, size in results:
            print("Read {:8d} chars from {}".format(size, site))
    
    
    if __name__ == '__main__':
        start_time = time()
        loop = asyncio.get_event_loop()
        loop.set_debug(True)
        loop.run_until_complete(main())
        print("Ran in {:6.3f} secs".format(time() - start_time))
    

    The main function is a coroutine which triggers the creation of a separate coroutine for each website. Then it awaits until all these triggered coroutines are completed. As a best practice, the web session object is passed to avoid re-creating new sessions for each page.
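
    A detail worth knowing about asyncio.gather is that it returns results in the order the coroutines were passed in, not the order in which they finished. The sketch below shows this without touching the network; the host names and delays are made up, and fake_find_size merely sleeps instead of downloading anything.

```python
import asyncio

# Made-up delays standing in for network latency; no requests are sent.
FAKE_SITES = [
    ("slow.example", 0.03),
    ("fast.example", 0.01),
    ("medium.example", 0.02),
]


async def fake_find_size(url, delay):
    await asyncio.sleep(delay)
    return url, len(url)


async def main():
    tasks = [fake_find_size(url, delay) for url, delay in FAKE_SITES]
    # Results follow the order of `tasks`, whichever finished first.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for url, size in asyncio.run(main()):
        print("{:16s} {}".format(url, size))
```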

    The total running time of this program on the same test laptop is 1.5 s. This is a speedup of 3.6x on the same single core. This surprising result can be better understood if we can visualize how the time was spent, as shown below:

    [Figure: Comparing scrapers. A simplistic representation comparing tasks in the synchronous and asynchronous scrapers.]

    The synchronous scraper is easy to understand. Scraping activity needs very little CPU time and the majority of the time is spent waiting for the data to arrive from the network. Each task is waiting for the previous task to complete. As a result the tasks cascade sequentially like a waterfall.

    On the other hand the asynchronous scraper starts the first task and as soon as it starts waiting for I/O, it switches to the next task. The CPU is hardly idle as the execution goes back to the event loop as soon as the waiting starts. Eventually the I/O completes in the same amount of time but due to the multiplexing of activity, the overall time taken is drastically reduced.
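
    The waterfall and the multiplexed timelines can be reproduced with simulated downloads. In this sketch the delays are invented and fake_download only sleeps: awaiting the downloads one by one takes roughly the sum of the delays, while gathering them takes roughly the longest single delay.

```python
import asyncio
import time

DELAYS = [0.10, 0.15, 0.05]   # invented latencies, in seconds


async def fake_download(delay):
    await asyncio.sleep(delay)
    return delay


async def sequential():
    start = time.perf_counter()
    for delay in DELAYS:
        await fake_download(delay)   # each wait starts after the last ends
    return time.perf_counter() - start


async def concurrent():
    start = time.perf_counter()
    await asyncio.gather(*(fake_download(d) for d in DELAYS))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: {:.2f} secs".format(asyncio.run(sequential())))
    print("concurrent: {:.2f} secs".format(asyncio.run(concurrent())))
```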

    In fact, the asynchronous code can be sped up further. The standard asyncio event loop is written in pure Python and provided as a reference implementation. You can consider faster implementations like uvloop for further speedup (my running time came down to 1.3 secs).

    Concurrency is not Parallelism

    Concurrency is the ability to perform other tasks while you are waiting on the current task. Imagine you are cooking a lot of dishes for some guests. While waiting for something to cook, you are free to do other things like peeling onions or cutting vegetables. Even when one person cooks, typically there will be several things happening concurrently.

    Parallelism is when two or more execution engines are performing a task. Continuing on our analogy, this is when two or more cooks work on the same dish to (hopefully) save time.

    It is very easy to confuse concurrency and parallelism because they can happen at the same time. You could be concurrently running tasks without parallelism or vice versa. But they refer to two different things. Concurrency is a way of structuring your programs while Parallelism refers to how it is executed.

    Due to the Global Interpreter Lock, we cannot run more than one thread of the Python interpreter (to be specific, the standard CPython interpreter) at a time even in multicore systems. This limits the amount of parallelism which we can achieve with a single instance of the Python process.

    Optimal usage of your computing resources requires both concurrency and parallelism. Concurrency will help you avoid idling the processor core while waiting for, say, I/O events, while parallelism will help distribute work among all the available cores.

    In both cases, you are not executing synchronously, i.e. waiting for a task to finish before moving on to another task. Asynchronous systems might seem to be optimal. However, they are harder to build and reason about.

    Why another Asynchronous Framework?

    Asyncio is by no means the first cooperative multitasking or light-weight thread library. If you have used gevent or eventlet, you might find asyncio needs more explicit separation between synchronous and asynchronous code. This is usually a good thing.

    Gevent relies on monkey-patching to change blocking I/O calls to non-blocking ones. This can lead to hard-to-find performance issues due to an unpatched blocking call slowing the event loop. As the Zen says, ‘Explicit is better than Implicit’.

    Another objective of asyncio was to provide a standardized concurrency framework for all implementations like gevent or Twisted. This not only reduces duplicated efforts by library authors but also ensures that code is portable for end users.

    Personally, I think the asyncio module can be more streamlined. There are a lot of ideas which somewhat expose implementation details (e.g. native coroutines vs generator-based coroutines). But it is useful as a standard to write future-proof code.

    Can we use asyncio in Django?

    Strictly speaking, the answer is No. Django is a synchronous web framework. You might be able to run a separate worker process, say in Celery, to run an embedded event loop. This can be used for I/O background tasks like web scraping.

    However, Django Channels changes all that. Django might fit in the asynchronous world after all. But that’s the subject of another post.

    This article contains an excerpt from the upcoming second edition of book "Django Design Patterns and Best Practices" by Arun Ravindran

    Interview with Daniel Roy Greenfeld (PyDanny)

    Daniel Roy Greenfeld needs no introduction to Djangonauts. He is the co-author of the book Two Scoops of Django, which is probably on the shelves of any serious Django practitioner. But PyDanny, as he is fondly known, is also a wonderful fiction author, fitness enthusiast and a lot more.

    Having known Daniel for a while as a wonderful friend and a great inspiration, I am so excited that he agreed to my interview. Let’s get started…

    How did the idea of writing an ice-cream themed book occur?

    The first 50 pages that I wrote were angry and were for a book with an angry name. You see, I was tired of having to pick up after sloppy or unwise coding practices on rescue projects. I was furious and wanted to fix the world.

    However, I was getting stuck in what I wanted to say, or didn’t know things. I kept asking my normal favorite resource for help, Audrey Roy Greenfeld. Eventually she started to write (or rewrite) whole sections and I realized that I wasn’t writing the book alone.

    Therefore I asked Audrey to be my co-author. She’s a cheerful person and said that if she were to accept, the book had to be lightened. That meant changing the name. After a lot of different title names discussed over many ice cream sessions, we decided to use the subject matter at hand. Which worked out well as the ice cream theme made for a good example subject.

    How do you and Audrey collaborate while writing a book?

    We take turns writing original material that interests us. The other person follows them and acts as editor and proofreader. We go back and forth a few hundred times and there you go.

    For tech writing we use Git as version control and LaTeX for formatting. For fiction we use Google docs followed by some Python scripts that merge and format the files.

    What’s the most exciting recent development in Django? Where do you think it can improve?

    I like the new URL system as it’s really nice for beginners and advanced coders alike. While I like writing regular expressions, I’m the exception in this case.

    Where I think Django can improve is having more non-US/Europe/Australian representation within the DSF and in the core membership. In short, most of Django core looks like me, and I think that’s wrong. Many of the best Django (and Python) people look nothing like me, and they deserve more recognition. While having Anna Makarudze on the DSF board is a wonderful development, as a community we can still do better in core.

    In the case of Django’s core team, I believe this has happened because all the major Django conferences are in the US, Europe, and Australia, and from what I’ve seen over the years, participation in those events is how most people get onto the Django core team. The DSF is aware of the problem, but I think more people should raise it as an issue. More people vocalizing this as a problem will get it resolved more quickly.

    With the Ambria fantasy series, you have proven to be a prolific fiction author too. Reminds me of Lewis Carroll, who wrote children’s books and mathematical treatises. What is the difference in the writing process while writing fiction and non-fiction?

    For us, the process is very similar. We both write and we both review our stuff. The difference is that if we make a mistake in our fiction, it’s not as critical. That means that the review process for fiction is a lot easier on us than it is when we write technical books or articles. I can’t begin to tell you what a load that is off my shoulders.

    Why fantasy? Any literary influences?

    We like fantasy because we can just let our imaginations run away with us. For the Ambria series, our influences include Tolkien, Joseph Campbell, Glen Cook, Greek mythology, and various equine and religious studies.

    Do you have a daily writing routine?

    Like coding on a fun project, when we get to write, we get up early and just start working. When we get hungry or thirsty we stop. The day seems to fly by and we are very happy. We try not to mix writing days with coding days, as we like to focus on one thing at a time. Neither of us are big believers in multi-tasking, so sticking to one thing is important to us.

    What’s your favorite part of the writing process?

    Getting to write with my favorite person in the whole world, Audrey Roy Greenfeld. :-)

    Also, having people read our stuff and comment on it, both positively and negatively.

    Do you ever get writer’s block?

    Not usually. Our delays are almost always because of other things getting in the way. We’re very fortunate that way!

    When I do get writer’s block, I try to do something active, be it exercise or fixing something in the house that needs it.

    Considering you can do cartwheels, I am assuming you are pretty fit. Do you think technology folks don’t give it enough importance?

    I’m older than I look, but even dealing with an unpleasant knee injury I move faster and better than 90% of software developers. And when I look at other coders my age, I see people old before their years. I believe youth is fleeting unless you take a little bit of time every day to keep your strength and flexibility.

    Anything else you would like to say?

    To paraphrase Jurassic Park, “Just because you can do a thing doesn’t mean you should do a thing.”

    As software developers, we have skills that let us do amazing things. With enough time and experience, we can do pretty much anything we are asked to do. That said, we should consider whether or not we should always do what we are asked to do.

    For example, the combined power of image recognition, big data, and distributed systems is really fun to play with, but we need to be aware that these tools can be dangerous. In the past year we’ve seen it used to affect opinion and elections, and this is only the beginning. It’s our responsibility to the future to be aware that the tools we are playing with have a lot of power, and that the people who are paying us to use them might not have the best intentions.

    Hence why I like to say, “Just because you can do a thing doesn’t mean you should do a thing.”

    Do check out “Two Scoops of Django 1.11: Best Practices for the Django Web Framework” by Two Scoops Press

    Django Release Schedule and Python 3

    Do long term releases confuse you? For the longest time I was not sure which version of Ubuntu to download - the latest release or the LTS? I see a number of Django developers confused about Django’s releases. So I prepared this handy guide to help you choose (or confuse?).

    Which Version To Use?

    Django has now standardized on a release schedule with three kinds of releases:

    Feature Release: These releases have new features or improvements to existing features. They happen every 8 months and have 16 months of extended support from release. They have version numbers like A.B (note there’s no minor version).

    Long-Term Support (LTS) Release: These are a special kind of feature release, with a longer extended support period of three years from the release date. These releases happen every two years. They have version numbers like A.2 (since every third feature release will be an LTS). LTS releases have a few months of overlap to aid in a smoother migration.

    Patch Release: These releases are bug fixes or security patches. It is recommended to deploy them as soon as possible. Since they have minimal breaking changes, these upgrades should be painless to apply. They have version numbers like A.B.C
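
    The numbering rules above can be turned into a small helper. This is only a sketch of the scheme as described here, under the assumption that it is applied to Django 2.0 and later; classify_release is a made-up name, and older versions such as 1.11 predate the A.2 convention and would be misclassified.

```python
def classify_release(version):
    """Classify a Django 2.0+ version string using the A.B / A.2 / A.B.C rules."""
    parts = [int(piece) for piece in version.split(".")]
    if len(parts) == 3 and parts[2] > 0:
        return "patch release"      # A.B.C
    if parts[1] == 2:
        return "LTS release"        # A.2
    return "feature release"        # A.0 or A.1


if __name__ == "__main__":
    for version in ["2.0", "2.1", "2.2", "2.0.3", "3.2"]:
        print(version, "->", classify_release(version))
```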

    The Django roadmap visualized below should make the release approach clearer:

    [Figure: Django Releases (LTS and feature releases) explained]

    The dates are indicative and may change. This is not an official diagram but something that I created for my understanding.

    The big takeaway is that Django 1.11 LTS will be the last release to support Python 2 and it is supported until April 2020. Subsequent versions will use only Python 3.

    The right Django version for you will be based on how frequently you can upgrade your Django installation and what features you need. If your project is actively developed and the Django version can be upgraded at least once in 16 months, then you should install the latest feature release regardless of whether it is LTS or non-LTS.

    Otherwise, if your project is only occasionally developed then you should pick the most recent LTS version. Upgrading your project’s Django dependency from one feature release to another can be a non-trivial effort. So, read the release notes and plan accordingly.

    In any case, make sure you install Patch Releases as soon as they are released. Now, if you are still on Python 2 then read on.

    Python 3 has crossed tipping point

    When I decided to use Python 3 only while writing my book “Django Design Patterns and Best Practices” in 2015, it was a time when Python 2 versus Python 3 was hotly debated. However, to me Python 3 seemed much cleaner, without arcane syntax like class methods named __unicode__ and classes needing to derive from the object parent class.

    Now, it is quite a different picture. We just saw how Django no longer supports Python 2 except for the last LTS release. This is a big push for many Python shops to consider Python 3.

    Many platforms have upgraded their default Python interpreter. Starting 1st March 2018, Python 3 is announced to be the default “python” in Homebrew installs. Arch Linux switched completely to Python 3 back in 2010.

    Fedora has used Python 3 as its system default since version 23. Even though the python command will launch python3, the symlink /usr/bin/python will still point to python2 for backward compatibility. So it is probably a good idea to use the #!/usr/bin/env python idiom in your shell scripts.

    On 26 April 2018, when Ubuntu 18.04 LTS (Bionic Beaver) is released, it is planned to have Python 3.6 as the default. Further upstream, the next Debian release in testing, Debian 10 (Buster), is expected to transition to Python 3.6.

    Moving to packages, the Python 3 Wall of Superpowers shows that with the backing of 190 out of 200 packages, at the time of writing, we have nearly all popular Python packages on Python 3. The only notable package remaining is supervisor, which is about to turn green in supervisor 4.0 (unreleased).

    Common Python 3 Migration Blockers

    You might be aware of at least one project which is still on Python 2. It could be an open source or an internal project, which may be stuck on Python 2 for a number of reasons. I’ve come across a number of such projects and here is my response to such reasons:

    Reason 1: My Project is too complex

    Some very large and complex projects like NumPy or Django have been migrated successfully. You can learn the migration strategies of projects like Django. Django maintained a common codebase for Python 2 and 3 using the six (2 × 3=6, get it?) library before switching to Python 3 only.

    Reason 2: I still have time

    It is closer than you think. The Python clock shows there is a little more than 2 years and 2 months left for Python 2 support.

    In fact, you have had a lot of time. It has been ten years since Python 3 was announced. That is a lot of overlap to transition from one version to another.

    In today’s ‘move fast and break things’ world, a lot of projects abruptly stop support and ask you to migrate as soon as a new release is out. Python’s long overlap is a far more realistic arrangement for enterprises, which need a lot more planning and testing.

    Reason 3: I have to learn Python 3

    But you already know most of it! You might need about 10 mins to learn the differences. In fact, I have written a post to guide Django coders to Python 3. Small Django/Python 2 projects need only trivial changes to work on Python 3.

    You might see many old blog posts about Python 3 being buggy or slow. Well, that has not been true for a while. Not only is it extremely stable, it is actually used in production by several companies. Performance-wise it has been getting faster in every release, so it is faster than Python 2 in most cases and slower in a few.

    Of course, there are a lot of awesome new features and libraries added to Python 3. You can learn them as and when you need them. I would recommend reading the release notes to understand them. I will mention my favourites soon.

    Reason 4: Nobody is asking

    Some people have the philosophy that if nobody is asking then nobody cares. Well, they do care if the application they run is on an unsupported technology. Better plan for the eventual transition than rush it on a higher budget.

    Are you missing out?

    Image by https://pixabay.com/en/users/GlenisAymara-856260/

    To me, the biggest reason to switch was that all the newest and greatest features were coming to Python 3. My favourite top three exclusive features in Python 3 are:

    • asyncio: One of the coolest technologies I picked up recently. The learning process is sometimes mind-bending. But the performance boost in the right situations is incredible.

    • f-strings: They are so incredibly expressive that you would want to use them everywhere. Instant love!

    • dict: The new compact dict implementation uses less memory and preserves insertion order!

    Yes, FOMO is real.
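
    Two of these features can be tried in a few lines. This sketch uses made-up variable names and needs Python 3.6 or later; note that dict ordering was an implementation detail of CPython 3.6 and became a language guarantee in Python 3.7.

```python
package = "Django"
major, minor = 2, 0

# f-strings evaluate the expressions inside the braces in place.
banner = f"{package} {major}.{minor} runs only on Python {major + 1}"

# dicts remember the order in which keys were inserted.
releases = {}
releases["1.11"] = "LTS"
releases["2.0"] = "feature"
releases["2.1"] = "feature"
releases["2.2"] = "LTS"

if __name__ == "__main__":
    print(banner)
    print(list(releases))
```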

    Apart from my personal reasons, I would recommend everyone to migrate so that the community benefits from investing efforts into a common codebase. Plus we can all be consistent on which Python to recommend to beginners. Because

    There should be one– and preferably only one –obvious way to do it. Although that way may not be obvious at first unless you’re Dutch.

    This article contains an excerpt from the upcoming second edition of book "Django Design Patterns and Best Practices" by Arun Ravindran

    A Gentle Introduction to Creating a Minimal Hugo Site

    When I started using Hugo, I was very impressed by its speed. But I was daunted by the directory structure it creates for a new project. With directory names like archetypes and static, a Hugo site felt unfamiliar and confusing. Fortunately, not every site needs them.

    This post tells you how to start small with just the bare minimum files and directories to build a Hugo site without errors. Being minimal, this site will have only one page (essentially, the home page).

    You can see the finished project on GitHub. Let’s start by looking at only the top-level files of the project:

    .
    ├── config.toml
    ├── content/
    ├── .git/
    ├── .gitignore
    └── themes/
    

    There are only two directories: content, which contains your site’s content like posts or articles, and themes, which contains various themes, i.e. the non-content part of your site like its design and page layouts.

    For Hugo, config.toml contains all the configuration settings of your site like the name of the site, author name, theme etc. For this minimal site, we will only mention two lines:

    baseURL = "http://example.org/"
    theme = "bare"
    

    This is a TOML file. It has a very simple syntax. Each line is written like this:

    key = "value"

    The baseURL value mentions the URL where the site will be published. Strictly speaking, we don’t need it for a minimal site. But Hugo throws an error if baseURL is not specified.

    Next, we mention that we are using the “bare” theme. Essentially, “bare” is a directory inside “themes” directory. We will look at it closely soon.

    The Git files are worth mentioning. I prefer to exclude the generated site (having rendered HTML pages) from Git. Assuming that the generated files will go into a directory named “public”, my .gitignore file is simply this:

    public
    

    Content files

    A content file is usually a text file containing your blog post or article. Typically content files are written in Markdown syntax. But Hugo supports other text formats like Asciidoc, reStructuredText or Org-Mode, which get converted to HTML. This is easier than directly editing HTML files.

    The only content file in this minimal site is _index.md. This filename is special to Hugo and used to specify a page leading to a list of pages. Typically, an index file is used for a home page, a section, a taxonomy or a taxonomy terms listing.

    Our _index.md looks like this:

    +++
    title = "Home sweet home"
    +++
    
    This page has **bold** and *italics* formatting.
    

    The two +++ separators divide the document into two parts: the front matter (first three lines) in TOML format and the rest of the document in Markdown format. Front matter specifies the metadata of the file like its title, date or category. Most text editors will treat this file as a Markdown file (due to the .md extension) and ignore the front matter.
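
    To make the two-part structure concrete, here is a toy splitter written in Python. It is not how Hugo itself parses files; split_front_matter is a hypothetical helper that only illustrates where the +++ lines cut the document.

```python
def split_front_matter(text, delimiter="+++"):
    """Return (front_matter, body) for a document that starts with a delimiter line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != delimiter:
        return "", text                      # no front matter present
    for index in range(1, len(lines)):
        if lines[index].strip() == delimiter:
            front = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return front, body
    return "", text                          # closing delimiter missing


document = '''+++
title = "Home sweet home"
+++

This page has **bold** and *italics* formatting.
'''

front_matter, body = split_front_matter(document)

if __name__ == "__main__":
    print(front_matter)
    print(body)
```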

    Since the _index.md is located inside content directory at the top-level, it will be the first page seen when the user opens the baseURL location i.e. the home page. However, for this page to be rendered, there must be a corresponding template within the theme.

    Theme files

    So far, we have not specified the look and feel of the site. This is typically defined in a separate theme directory (it can also be defined inside a layouts directory within the site, but it's cleaner this way).

    Unlike, say, WordPress themes, you might not be able to download an arbitrary Hugo theme and apply it to your site. This is because a theme makes certain assumptions, for example that your site is a blog and that the posts are one level deep, which might not be true in your case. So, I prefer making my own theme when creating a new kind of site. Besides, you would eventually want to customize your theme anyway.

    The bare theme is located inside the themes directory. Here is the directory structure of that theme:

    themes/
    └── bare/
        ├── layouts/
        │   └── index.html
        └── theme.toml
    

    Note that this is literally a “bare” theme in that it has no stylesheets or images. It can just render a single home page in plain HTML.

    The theme.toml, like config.toml, contains some metadata about this theme. As seen below, it is fairly self-explanatory:

    name = "Bare"
    license = "MIT"
    

    The layouts directory contains templates which specify how your content files should be rendered into HTML. As you might have guessed, index.html at the top-level is used for rendering a top-level _index.md. This file contains:

    <html>
      <body>
        <h1>Welcome!</h1>
    
        <h2>{{ .Title }}</h2>
        {{ .Content }}
    
      </body>
    </html>
    

    This looks like an HTML page except for parts enclosed in double curly braces ( {{ something }} ). These parts are in Go Template language and will be replaced by their values.

    To understand what .Title means you have to understand that the dot in the beginning refers to the current context, which is the current page. In other programming languages this might be written as ThisPage.Title, but here ThisPage is implied and hence omitted.

    The value for .Title comes from the front matter. While rendering “_index.md”, it will be replaced by “Home sweet home” (refer to the _index.md shown earlier). The value for .Content will be the rendered HTML from the Markdown file.
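
    If the substitution still feels abstract, the following Python sketch imitates it with a regular expression. It is a toy stand-in and not the Go Template engine: render is a hypothetical function, it supports only placeholders of the form {{ .Name }}, and the Content value is HTML I typed by hand to represent the converted Markdown.

```python
import re


def render(template, page):
    """Replace {{ .Name }} placeholders with values from the page dict."""
    def lookup(match):
        # The leading dot is the current context, i.e. the page itself,
        # so ".Title" simply means page["Title"].
        return str(page[match.group(1)])

    return re.sub(r"\{\{\s*\.(\w+)\s*\}\}", lookup, template)


template = "<h2>{{ .Title }}</h2>\n{{ .Content }}"
page = {
    "Title": "Home sweet home",
    "Content": "<p>This page has <strong>bold</strong> and <em>italics</em> formatting.</p>",
}
print(render(template, page))
```

    Here the page dictionary is the context that the dot refers to, which is why the template never needs to name it.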

    Most themes will have certain common elements across a site like a header and footer. Since we only need a template for one page, the “bare” theme omits all those flourishes.

    You can check out the finished project on GitHub. You can use it as a landing page or as a starting point for a larger Hugo project.

    Rendering and Publishing

    While developing a Hugo site or previewing a post that is being composed, you might want to use the built-in test server. The files are not generated (in the project folder), but you can browse your site locally in your browser.

    For local development, execute the following command (in the project directory):

    $ hugo server -w
    

    This is what you should see in your browser:

    Minimal site in hugo

    Once you are ready to publish the site, you’ll need to generate the files into an output directory. The following command will generate your site into the “public” directory:

    $ hugo --destination public
    

    Now you can copy the files from the public directory into the web root of “example.com” or whichever domain you mentioned as your baseURL. You can use GitHub Pages to host this site for free. I would recommend reading the instructions in the GitHub Pages documentation.

    Next, I would recommend looking at other Hugo sites to understand how various features of Hugo have been used. Personally, I wrote this post so that I have a minimal starting point when I start a new project. Enjoy exploring Hugo!


    Migrating Blogs (Again) from Pelican to Hugo

    Our universe is in a continuous cycle of creation and destruction. How can this humble site escape that cosmic dance? It is time to change the blogging platform of arunrocks.com again.

    You might remember my earlier lengthy justification of using Pelican. I have done numerous hacks and plugins to twist and bend it to my liking. While it has served me well for the past four years, it is time to move on.

    The reason, of course, is a much better static site generator: Hugo. A fairly young but very actively developed project, Hugo has an extremely compelling set of qualities, which made the choice to migrate my site fairly obvious. Some of them are:

    1. Content Organization: Hugo believes in the philosophy that your content should be arranged in the same way it is intended to appear on the rendered website. This brings tremendous clarity in identifying file locations and reduces the need for hacks.

      By default, some blogging engines (including Pelican) assume that all your “pages” will be finally rendered into the pages folder. A configuration setting can change it to use any URL scheme. But why doesn’t it leave it where it was? The source structure is nested and organized. Why flatten it and lose that information?

    2. Extremely Fast: Even if your site contains hundreds of files, the build time is usually under a second, without cached compilation… inside a virtual machine… without an SSD. Yes, it is unbelievable.

      This makes the edit-preview development cycle a pleasure to use. Tweak an SCSS variable and the browser will live reload almost instantaneously. I make a hundred tweaks after a page gets rendered, so a short feedback loop is ideal for my workflow.

    3. No Stack: This is usually marketed as the “single executable” advantage of Go applications. But it matters much more profoundly in the case of Hugo.

      Many new languages like JavaScript and Python need a pile of third-party libraries to make their tools run (even if they are “batteries included”). While it might be easy to set up initially with a single command like pip install Foo, future invocations might need you to update everything to the latest versions first and then fix whatever breaks.

      Slacking Off T shirt

      Even though dynamic languages do not require a lengthy compilation step, a constant stream of updates can get tiring. “Updating my packages” has become the new “I am compiling”.

      Ok, end of my rant. Hugo is a single self-contained executable (on all platforms). This not only reduces the number of moving pieces that you need to keep track of (my requirements.txt for the site listed 20 packages), it also simplifies the ongoing maintenance of your site.

      I would rather not have the headache of managing code environments just for writing a blog. In Arch Linux, where I develop my site, it is even easier considering it is a rolling distribution. Hugo automatically updates along with the rest of the system. The updates are usually backward compatible and nothing breaks.

    4. Actively Developed: Many static blog generator projects have this problem: in the early days the community is engaged and actively involved, then the interest dwindles, the code gets stale and the backlog of issues grows. Hugo is still in its early days and has an energetic community. Hopefully the interest will continue to grow.

    5. Not too Blog-oriented: My site, like most personal sites, is predominantly a blog. But it is not just a blog, it is a full-fledged site. Hugo has very minimal assumptions about what kind of site you have. For instance, I can convert a JSON file with all my talks into a Talks page. I just need to define a template for a new type of page in my layouts.

    It is not all Peaches and Cream

    Hugo does have some rough edges. But considering the rate of its development, you might find that all these problems have been solved by the time you read this. But, for the record, I did come across these issues:

    1. Lack of Static Assets Pipeline: I use SCSS to create my CSS files. In general, Hugo is unaware of any static assets pipeline which may involve compiling SCSS, minification or compression. I use a Makefile which invokes the compiler as a dependency for build. Sometimes I need to stop live reloading and explicitly call Make. Since everything is fast, it doesn’t matter much.

      Wasabi restaurant at Tysons Corner Center in McLean, Virginia taken by Ben Schumin

      I also found the “active section in menu” pattern, used in most website menus, a bit hard to implement. There are others who have faced similar issues. I resorted to not using this pattern for now.

    2. Outdated Blog Posts: Hugo documentation is extensive and covers a lot of material. But good documentation is very hard to get right. Some sections are confusing, and I often look for better explanations, say, from blogs.

      But one needs to watch out for outdated content. For instance, I needed to change my RSS feed location and one blog recommended changing RSSUri in the config file. This is deprecated. Now, you need to use output formats say like this:

      outputFormats.RSS:
        BaseName: "index"
        Path: "feed/index.xml"
      

    3. Python Tooling: While using Pelican, I was happy that if I needed any additional functionality, I could always write a plugin in Python. Hugo offers no such option. This doesn’t worry me for two reasons. First, Hugo’s template system is very expressive and handles many such needs. Second, ShortCodes offer a much cleaner alternative to plugins.

    4. Inconsistent Casing: Sometimes a variable is mentioned in title case in one place and lower case in another. I would end up wondering whether it is copyright or Copyright or CopyRight. Probably, there is a naming convention. But I haven’t found it yet.

    As you can see, most of these issues are solvable. Hugo is a long way from version 1.0 (currently we have v0.20.7). So this might all be fixed by then.

    Changing a Jet Engine Mid-flight

    Once your site achieves a certain amount of traffic, then making any change is fraught with risk. Every time I change this site’s underlying platform I make a short TODO list. It looks something like this:

    • Set up Redirects: Map the old link structure to the new one.
    • Fix Renders: Markdown is not quite a standard. Each Markdown engine has its own quirks. I manually check the rendered HTML in some cases.
    • Check Missing Images: Things have to move around to adjust for various source layouts. So images, scripts or download links can return 404s.
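
    The last check can be partly automated. Below is a minimal Python sketch that scans a rendered page for site-relative links and reports the ones with no matching file in the generated output directory. It is an illustration rather than the script I actually use: find_missing and LinkCollector are hypothetical names, and the image and download paths in the demo are invented.

```python
import os
import tempfile
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_missing(site_root, html):
    """Return site-relative links in html that have no file under site_root."""
    collector = LinkCollector()
    collector.feed(html)
    missing = []
    for link in collector.links:
        # Only check links such as /images/a.png; skip external
        # and protocol-relative (//host/...) URLs.
        if not link.startswith("/") or link.startswith("//"):
            continue
        path = link.split("#")[0].split("?")[0].lstrip("/")
        target = os.path.join(site_root, path)
        if os.path.isdir(target):
            # A directory URL is served from its index.html.
            target = os.path.join(target, "index.html")
        if not os.path.isfile(target):
            missing.append(link)
    return missing


# Demo with a throwaway directory standing in for "public".
with tempfile.TemporaryDirectory() as public:
    os.makedirs(os.path.join(public, "images"))
    open(os.path.join(public, "images", "logo.png"), "wb").close()
    page = '<img src="/images/logo.png"><a href="/downloads/slides.pdf">Slides</a>'
    print(find_missing(public, page))  # ['/downloads/slides.pdf']
```

    Running something like this over every HTML file in the output directory catches many 404s before they reach the error log, though it cannot verify redirects or external links.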

    Even after all these checks, I still need to watch the error log for omissions. As this happens in my spare time, the whole migration “project” takes months. It is pretty much like changing a jet’s engine mid-flight.

    Some Takeaways

    In short, I would say you should never underestimate the sheer amount of work involved in migrating a site. There is a very good post on site migration by folks at Mozilla that covers many aspects that people tend to miss. Unless you have very good reasons to move, I would suggest that you stick with your blogging platform of choice.

    As for me, time to start thinking about migration to HTTPS 😉
