Ray Tracer in Python (Part 6) - Show Notes of "Firing All Cores"
When it comes to code optimization, there is an overused statement about Premature Optimization. Here is the rarely-shown full quote:
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” – Donald Knuth
I understand this as not worrying about small optimizations all the time. It is best to focus on the areas which have the most return on (time) investment. Usually you need to do extensive performance measurements to find those areas but there are some natural places to look.
So far, we have not been too concerned about performance, especially when it affected readability (else we wouldn’t be writing it in Python). However, in the previous part we got a huge speedup by switching to the PyPy interpreter.
To me the next big opportunity lies in using all the cores. As an experienced Python programmer, I know this means using the multiprocessing module. But I also need to explain why other options, notably Python threads, would not help.
To demonstrate this I wrote a multi-threaded C program, a multi-threaded Python program and a multi-processing Python program, all doing the same thing. What was most entertaining was watching the GIL in action, allowing only one thread to execute at a time.
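A minimal sketch of that experiment in Python (the function names and iteration counts here are my own illustration, not the exact programs from the video). The same worker is run under both `Thread` and `Process`, which conveniently share the same API:

```python
import time
from threading import Thread
from multiprocessing import Process

def count_down(n):
    # A purely CPU-bound loop: no I/O, so threads gain nothing under the GIL
    while n > 0:
        n -= 1

def run_parallel(worker_cls, n=2_000_000, workers=2):
    # worker_cls is either Thread or Process - both share start()/join()
    jobs = [worker_cls(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # On CPython the threaded run takes about as long as a serial one,
    # because the GIL lets only one thread execute bytecode at a time.
    # Processes get separate interpreters, so they can use separate cores.
    print(f"2 threads:   {run_parallel(Thread):.2f}s")
    print(f"2 processes: {run_parallel(Process):.2f}s")
```

On a multi-core machine the process version finishes in roughly half the time of the thread version.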
Race conditions are often explained using a bank ATM example. I think they can be explained in a non-technical way using many real-life scenarios. I wanted to explain it using a cooking example. It’s weird how many asynchronous and parallel concepts can be taught using what we do in the kitchen. So much so that technical job interviews could be made more effective by simply checking your culinary skills (“Sorry we don’t have a whiteboard, just a cutting board for you.").
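Before reaching for analogies, the race itself fits in a few lines of Python. This is a sketch of the classic lost-update bug (the counts are arbitrary, and how often updates are actually lost depends on the interpreter version and its GIL switch interval):

```python
from threading import Thread

counter = 0

def deposit(times):
    global counter
    for _ in range(times):
        # Read-modify-write is not atomic: two threads can read the same
        # old value, each add 1, and one of the updates is lost.
        counter += 1

threads = [Thread(target=deposit, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 200000 when updates are lost to the race
print(counter)
```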
Ideally, the entire explanation should be animated. My drawing skills are decent but I cannot animate even a bouncing ball. Don’t laugh, just try it. Firstly, there are several cues that any animated object gives which can only be detected if you watch it in slow motion. The Twelve Principles of Animation is a good start.
Secondly, it is a lot of work. A good animator will know how to give maximum expression with the minimum frames. Timing is also crucial in animation. Even a subtle timing error makes animations look janky or confusing. Mere mortals would need to spend days to eventually produce a 5 second clip.
So I took the safe approach of showing a slideshow. I think it looks decent. I might post it as a separate clip on race conditions so that it doesn’t get lost in this video.
These are the topics we will cover in this episode:
One of the fascinating ways to build a complex system like a computer is to build it from scratch using simple logic gates (a good book which shows how is The Elements of Computing Systems). The same goes for the late John Conway’s Game of Life, which starts with a simple set of rules but shows very life-like behaviour. This is called emergent complexity. I guess the only way to understand something that seems complex is to understand its basic principles.
Look carefully at the, by now familiar, image that we will be rendering by the end of this part:
Notice that within the pinkish-purple ball there is a reflection of the red ball, which itself reflects the pink ball. This kind of detail that unfolds on closer inspection is what gives realistic renders their beauty. Yet it is governed by the most basic laws of physics, like the laws of reflection.
The law tells us that the angle of reflection is the same as the angle of incidence. But if we apply it to the world of vectors, a rather different-looking formula emerges that can be derived from the very same law: R = D - 2(D·N)N, where D is the incident direction and N is the unit surface normal. The reflected ray is again traced and the process continues.
If you have been following the series closely so far, then you might point out that we have already dealt with this earlier. Yes, diffuse and specular shading are special cases of a material reflecting light. But since we are now considering mirror-like reflecting surfaces it is time to look at the general formula for reflection.
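As a sketch of that general reflection formula, R = D - 2(D·N)N, assuming plain 3-tuples for vectors (the series uses its own vector class):

```python
import math

def reflect(direction, normal):
    """Reflect an incident vector about a unit normal: R = D - 2(D.N)N."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2 * d_dot_n * n for d, n in zip(direction, normal))

# A ray going down-right hits a floor whose normal points straight up;
# it bounces up-right, preserving the angle with the surface.
incoming = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
floor_normal = (0.0, 1.0, 0.0)
print(reflect(incoming, floor_normal))  # x unchanged, y flipped
```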
The computation for each pixel will now increase manyfold, so the overall render time will increase in proportion to the maximum depth of reflections we need.
A plainly colored object is not very interesting, so you will find even the earliest ray-traced images featuring a chessboard pattern:
So I introduce a chessboard pattern generated procedurally into the scene. Procedural textures are fascinating and a lot of fun to make. Compared to image textures, they have almost infinite detail. Sort of like analog versus digital.
The chessboard pattern’s formula is easy to guess so it is an ideal introduction. Once you start playing around there is an entire universe of textures to explore with marble, Voronoi and Perlin noise patterns. Some even go to the extent of building entire scenes with only procedural textures. This is deeply satisfying but probably pointless.
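A minimal version of that easy-to-guess formula might look like this (the function name and text rendering are my own illustration): the parity of the two integer cell indices picks the color.

```python
import math

def checker_color(x, z, square_size=1.0):
    """Return 0 or 1 for a chessboard cell at ground point (x, z)."""
    # Adding the parities of the two cell indices alternates the pattern
    # in both directions - the classic procedural chessboard formula.
    return (int(math.floor(x / square_size)) + int(math.floor(z / square_size))) % 2

# Render a tiny top-down view of the pattern as text art
for z in range(4):
    print("".join("#" if checker_color(x, z) else "." for x in range(8)))
```

Because the pattern is computed from the hit point itself, it has detail at every scale: zoom in as far as you like and the edges stay crisp.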
Most toy ray tracers are happy generating their scene in the main program. But this quickly becomes frustrating when you want to render more than a couple of examples. The straightforward solution would be to define a scene as data, say using JSON, and import the scene given as an argument. This is how games like Doom load levels.
JSON, YAML and other configuration languages are deceptively simple to read, but you could spend a lot of time writing them due to their tiny quirks with commas and whitespace. They are also not suited for scenes generated procedurally, which happens quite a bit in ray tracers. You would soon wish these languages were Turing complete. So I decided to ditch all that and use plain old Python instead.
To be honest, I was not comfortable allowing an arbitrary Python file to describe a scene. But the power and flexibility it allows is a worthwhile tradeoff for the security risk. Inspired by Django, I used importlib to import scene modules.
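One possible way to load such a Python scene file, using a hypothetical `load_scene` helper built on importlib (the actual code in the series may differ):

```python
import importlib.util
import sys

def load_scene(path):
    """Import a scene description from an arbitrary Python file path."""
    # Build a module spec from the file location and execute it - the
    # standard importlib recipe for importing a source file directly.
    spec = importlib.util.spec_from_file_location("scene", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["scene"] = module
    spec.loader.exec_module(module)
    return module

# scene = load_scene("scenes/two_balls.py")
# The ray tracer can then read scene.CAMERA, scene.OBJECTS, and so on,
# and the scene file is free to define its own material classes.
```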
I can now define a new procedural texture material class inside a scene! This makes the ray tracer quite extensible like a plugin system. I love this approach and look forward to trying this in future projects.
Towards the end we see a dramatic 7X speedup of the ray tracer due to the use of PyPy. I also mention my rough rule of thumb to increase Python performance:
Processor Bound? Try PyPy.
IO Bound? Try AsyncIO.
If you are learning to improve the performance of your Python programs, this would be pretty bad advice on its own. Make sure you first profile your program and identify the performance hotspots. Then try different ways to optimize those places. Only after you have tried all that and the performance is still bad should you reach for my rule of thumb for unconventional ways to get great results.
Ray tracing concepts
With this part, I would have covered all the basic ray tracing concepts that I had planned to cover. The next part would be about improving the performance of the ray tracer by using multiple cores.
Many have contacted me asking whether I would be covering topics like Dielectrics, Depth of Field, Anti-aliasing etc. I think there are enough books like Ray Tracing Gems and Ray Tracing in One Weekend which cover all that and much more. If you have followed this series then reading those books would be much easier and you would have a ready implementation to tinker with.
Nevertheless, I may work on a follow-up if people find that useful and time permits. So do let me know.
These are the topics we will cover in this episode:
The first ray-traced animation I had ever seen was in my first year of Engineering. My senior showed us a gold-plated logo of the institution slowly spinning and glinting in the light. It had an amazing level of detail - a triangular frame, a gear wheel on the left, a tower on the right and even a palm tree in the center. We were dying to know how he did it. When he showed us, it blew our minds - it was all code! Every shape was made by combining (or subtracting) primitive solids like spheres, cylinders and cubes. It featured on the masthead of the first edition of our online magazine (distributed on 3 1⁄2 inch floppies).
Of course, these days 3D modelling is mostly done by artists in a graphics program like 3ds Max or Maya. But not everything is shaped by the hand of an artist. Procedural code is still used to create thousands of unique soldiers in an army scene or an infinite world in a multiplayer game. Computer generated art can often surprise or amuse you.
So when I had to design a model of the omnipresent SARS-CoV-2 virus for a poster, even though I was very tempted to use such tools, I opted to generate the model in code. My Python ray tracer (currently in its fourth part of the video series) was decent enough to create 3D-looking spheres. Having only spheres is a big constraint, but you can be more creative under constraints.
There is a process fundamental to artistically recreating any real world object – Deconstruction. When you learn to draw you are asked to “block out” the simple shapes you are trying to draw. So a face becomes an elongated sphere, a nose becomes a triangle etc. If your big shapes are correctly proportioned then it will be a lot easier to add in the details.
In the good old days, computer artists used graph paper. Like my senior, who traced the logo from the college magazine, enlarged it on graph paper and wrote down all the coordinates. These days you could use a 2D drawing program like Inkscape to overlay a photograph and trace over it. But the idea is essentially the same - find all the large blocking shapes that make up your image.
Generating the model in the computer can be a process of trial and error. In the days of slower computers, rendering a simple object took nearly an hour. Even with today’s dramatically faster machines, rendering a complex scene could take hours. There is simply a ton of numbers to crunch.
In the case of the coronavirus, I needed to figure out how to make a crown. Essentially I needed to position points evenly around a circle. People say a lot of things about high school trigonometry and how much of a waste of time it was. But I found it quite useful, not only to arrange the spikes of the crown but also to give them a slight random shift. This means the virus would look slightly different each time you render it. I had a lot of fun animating several renders to a beat in the video.
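That bit of high school trigonometry might look like this in code (`crown_points` and its parameters are my own illustration, not the exact code behind the poster):

```python
import math
import random

def crown_points(count, radius, jitter=0.1, seed=None):
    """Place `count` points evenly around a circle, each nudged slightly."""
    rng = random.Random(seed)
    points = []
    for i in range(count):
        # Evenly spaced angles around the circle, plus a small random
        # shift so that every render of the virus looks slightly different.
        angle = (2 * math.pi * i / count) + rng.uniform(-jitter, jitter)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

spikes = crown_points(12, radius=5.0, seed=42)
print(spikes[0])
```

Fixing the seed makes a particular render reproducible; dropping it gives a fresh variation every time.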
I couldn’t stop playing with the model even after the poster was made. I tweaked it into various color combinations that I found online. Ironically, it looks distinctively charming in any avatar.
It shows that simple things can be more awesome than the most professionally done creations. Especially if you can make your own. Looking forward to seeing what others create with this.
Probably a global lockdown is a good time to resume a series that demands a lot of your time. Not really if your kids are home too. But I’ve been getting enough reminders through comments and emails that I thought it’s time to tackle this challenging episode.
Yes, this is an episode with a lot of Physics and Mathematics. But you will observe it has a very gentle learning curve since there is enough background presented in the problem requirements and in previous episodes. The real challenge is in translating the math to code.
Computers are computing devices, crunching numbers all the time. So you might think implementing math formulas should be straightforward. Unless you are a scientist or a mathematician, in which case you would know the truth - it is not straightforward at all.
I remember that before Doom came out, it was very hard to develop a 3D engine. Most of the available research involved understanding computer graphics textbooks with a lot of math theory. It is a painful task to translate formulas into algorithms and later into performant code. There are many ways you could get it wrong, but only one way to get it right. Even more frustrating is forgetting to account for edge conditions like a negative result, a divide by zero or floating point errors.
I remember referring to one of the most popular books at that time, Numerical Recipes in C++, to understand how to implement some of the algorithms. But I had to abandon the project after a working prototype because making a game engine (alone) is a tonne of work!
In this part, I have used a lot of images and slides to make sure that I don’t skip any explanations (like how cosine and dot products are related). I know that these videos are being watched by all kinds of people - students, experienced programmers and non-programmers. So I have not made any assumptions.
Hopefully you will enjoy this part without any trouble. Even if you find it challenging, it should be fun 😊
These are the topics we will cover in this episode:
In Dec 2019, we saw the release of Django 3.0 with an interesting new feature — support for ASGI servers. I was intrigued by what this meant. When I checked the performance benchmarks of asynchronous Python web frameworks they were ridiculously faster than their synchronous counterparts often by a factor of 3x–5x.
So I set out to test Django 3.0 performance with a very simple Docker setup. The results — though not spectacular — were still impressive. But before that, you might need a little bit of background about ASGI.
Before ASGI there was WSGI
It was 2003, and various Python web frameworks like Zope and Quixote used to ship with their own web servers or had their own home-grown interfaces to talk to popular web servers like Apache.
Being a Python web developer meant a devout commitment to learning an entire stack, and relearning everything if you needed another framework. As you can imagine, this led to fragmentation. PEP 333, “Python Web Server Gateway Interface v1.0”, tried to solve this problem by defining a simple standard interface called WSGI (Web Server Gateway Interface). Its brilliance was in its simplicity.
In fact, the entire WSGI specification can be simplified (conveniently leaving out some hairy details) as the server side invoking a callable object (i.e. anything from a Python function to a class with a __call__ method) provided by the framework or the application. If you have a component that can play both roles, then you have created a “middleware”, or an intermediate layer, in this pipeline. Thus, WSGI components can be easily chained together to handle requests.
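For illustration, here is a minimal WSGI callable and a middleware wrapping it (names are my own; only the two-argument calling convention comes from the spec):

```python
def application(environ, start_response):
    """The entire WSGI contract: a callable taking environ and start_response."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]

def logging_middleware(app):
    """A middleware is just a callable wrapping another WSGI callable."""
    def wrapper(environ, start_response):
        # Do something before/after delegating - here, log the path
        print(environ.get("PATH_INFO", "/"))
        return app(environ, start_response)
    return wrapper

# Chaining is plain function composition
wrapped = logging_middleware(application)
```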
WSGI became so popular that it was adopted not just by the large web frameworks like Django and Pylons but also by microframeworks like Bottle. Your favourite framework could be plugged into any WSGI-compatible application server and it would work flawlessly. It was so easy and intuitive that there was really no excuse not to use it.
Road ‘Blocks’ to Scale
So if we were perfectly fine with WSGI, why did we have to come up with ASGI? The answer will be quite evident if you have followed the path of a web request. Check out my animation of how a web request flows into Django. Notice how the framework waits after querying the database before sending the response. This is the drawback of synchronous processing.
Frankly, this drawback was not obvious or pressing until Node.js came onto the scene in 2009. Ryan Dahl, the creator of Node.js, was bothered by the C10K problem, i.e. why popular web servers like Apache cannot handle 10,000 or more concurrent connections (given typical web server hardware, they would run out of memory). He asked: “What is the software doing while it queries the database?"
The answer was, of course, nothing. It was waiting for the database to respond. Ryan argued that web servers should not be waiting on I/O activities at all. Instead, they should switch to serving other requests and get notified when the slow activity is completed. Using this technique, Node.js could serve orders of magnitude more users while using less memory, on a single thread!
It was becoming increasingly clear that asynchronous event-based architectures are the right way to solve many kinds of concurrency problems. Probably that is why Python’s creator Guido van Rossum himself worked towards language-level support with the Tulip project, which later became the asyncio module. Python 3.5 then introduced the new async and await syntax to support asynchronous event loops (they became reserved keywords in Python 3.7). This has pretty significant consequences for not just how Python code is written but how it is executed as well.
Two Worlds of Python
Though writing asynchronous code in Python might seem as easy as sliding an async keyword in front of a function definition, you have to be very careful not to break an important rule - Do not freely mix synchronous and asynchronous code.
This is because synchronous code can block the event loop running the asynchronous code. Such situations can bring your application to a standstill. As Andrew Godwin writes, this splits your code into two worlds - “Synchronous” and “Asynchronous” - with different libraries and calling styles.
Coming back to WSGI, this means we cannot simply write an asynchronous callable and plug it in. WSGI was written for a synchronous world. We will need a new mechanism to invoke asynchronous code. But if everyone writes their own mechanisms we would be back to the incompatibility hell we started with. So we need a new standard similar to WSGI for asynchronous code. Hence, ASGI was born.
ASGI had some other goals as well. But before that let’s look at two similar web applications greeting “Hello World” in WSGI and ASGI style.
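The two applications might look roughly like this - a minimal sketch following the WSGI and ASGI specifications:

```python
# WSGI style: a synchronous callable, one call per request
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]

# ASGI style: an async callable receiving scope, receive and send
async def asgi_app(scope, receive, send):
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello World"})
```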
Notice the change in the arguments passed into the callables. The scope argument is similar to the earlier environ argument. The send argument corresponds to start_response. But the receive argument is new. It allows clients to nonchalantly slip messages to the server in protocols like WebSockets that allow bidirectional communications.
Like WSGI, the ASGI callables can be chained one after the other to handle web requests (as well as other protocol requests). In fact, ASGI is a superset of WSGI and can call WSGI callables. ASGI also has support for long polling, slow streaming and other exciting response types without side-loading resulting in faster responses.
Thus, ASGI introduces new ways to build asynchronous web interfaces and handle bidirectional protocols. Neither the client nor the server needs to wait for the other to communicate - it can happen any time, asynchronously. Existing WSGI-based web frameworks, being written in synchronous code, cannot support this event-driven way of working.
This also brings us to the crux of the problem with bringing all the async goodness to Django - all of Django was written in synchronous style code. If we need to write any asynchronous code then there needs to be a clone of the entire Django framework written in asynchronous style. In other words, create two worlds of Django.
Well, don’t panic - we might not have to write an entire clone, as there are clever ways to reuse bits of code between the two worlds. But as Andrew Godwin, who leads Django’s Async Project, rightly remarks, “it’s one of the biggest overhauls of Django in its history”. It is an ambitious project involving the reimplementation of components like the ORM, the request handler and the template renderer in asynchronous style. This will be done in phases, over several releases. Here is how Andrew envisions it (not to be taken as a committed schedule):
You might be wondering about the rest of the components, like template rendering, forms, caching etc. They may remain synchronous, or an asynchronous implementation may be fitted in somewhere in the future roadmap. But the above are the key milestones in evolving Django to work in an asynchronous world.
That brings us to the first phase.
Django talks ASGI
In 3.0, Django can work in an “async outside, sync inside” mode. This allows it to talk to all known ASGI servers, such as:
Daphne - an ASGI reference server, written in Twisted
Uvicorn - a fast ASGI server based on uvloop and httptools
Hypercorn - an ASGI server based on the sans-io hyper, h11, h2, and wsproto libraries
It is important to reiterate that internally Django is still processing requests synchronously in a threadpool. But the underlying ASGI server would be handling requests asynchronously.
This means your existing Django projects require no changes. Think of this change as merely a new interface by which HTTP requests can enter your Django application.
But this is a significant first step in transforming Django from the “outside in”. You could also start running Django on an ASGI server, which is usually faster.
How to use ASGI?
Every Django project (since version 1.4) ships with a wsgi.py file, which is the WSGI handler module. While deploying to production, you will point your WSGI server, like gunicorn, to this file. For instance, you might have seen this line in your Docker Compose file:
command: gunicorn mysite.wsgi:application
If you create a new Django project (e.g. by running the django-admin startproject command) then you will find a brand new file asgi.py alongside wsgi.py. You will need to point your ASGI server (like daphne) to this ASGI handler file. For example, the above line would change to:
command: daphne mysite.asgi:application
Note that this requires the presence of an asgi.py file.
Running Existing Django Projects under ASGI
None of the projects created before Django 3.0 will have an asgi.py file. So how do you go about creating one? It is quite easy.
Here is a side-by-side comparison (docstrings and comments omitted) of the wsgi.py and asgi.py for a Django project:
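The two files generated by startproject look essentially like this (docstrings and comments omitted; `mysite` stands in for your project name). Note that this is project boilerplate - it only runs inside a configured Django project:

```python
# mysite/wsgi.py
import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
application = get_wsgi_application()


# mysite/asgi.py - identical apart from the s/wsgi/asgi/ substitution
import os

from django.core.asgi import get_asgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
application = get_asgi_application()
```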
If you are squinting too hard to find the differences, let me help you - everywhere ‘wsgi’ is replaced by ‘asgi’. Yes, it is as straightforward as taking your existing wsgi.py and running a string replacement s/wsgi/asgi/g.
You should take care not to call any blocking synchronous code in the ASGI handler in your asgi.py. For example, if you make a call to some web API within your ASGI handler for some reason, then it must be through an asynchronous callable.
ASGI vs WSGI Performance
I did a very simple performance test of the Django polls project in ASGI and WSGI configurations. Like all performance tests, you should take my results with liberal doses of salt. My Docker setup includes Nginx and PostgreSQL. The actual load testing was done using the versatile Locust tool.
The test case was opening a poll form in the Django polls application and submitting a random vote. It makes n requests per second when there are n users. The wait time is between 1 and 2 seconds.
The results shown below indicate a substantial increase in the number of simultaneous users that can be served when running in ASGI mode compared to WSGI mode.
As the number of simultaneous requests ramps up, the WSGI or ASGI handler will not be able to cope beyond a certain point, resulting in errors or failures. The requests per second after the WSGI failures start varies wildly. The ASGI performance is much more stable, even after failures.
As the table shows, the number of simultaneous users is around 300 for WSGI and 500 for ASGI on my machine. This is about 66% increase in the number of users the servers can handle without error. Your mileage might vary.
I gave a talk about ASGI and Django at BangPypers recently, and there were a lot of interesting questions the audience raised (even after the event). So I thought I would address them here (in no particular order):
Q. Is Django Async the same as Channels?
Channels was created to support asynchronous protocols like WebSockets and long-polling HTTP. Django applications still run synchronously. Channels is an official Django project but not part of Django core.
Django Async project will support writing Django applications with asynchronous code in addition to synchronous code. Async is a part of Django core.
Both were led by Andrew Godwin.
These are independent projects in most cases. You can have a project that uses either or both. For example, if you need to support a chat application over WebSockets, then you can use Channels without using Django’s ASGI interface. On the other hand, if you want to write an async function in a Django view, then you will have to wait for Django’s async support for views.
Q. Any new dependencies in Django 3.0?
Installing just Django 3.0 will install the following into your environment:
The asgiref library is a new dependency. It contains the sync_to_async and async_to_sync function wrappers so that you can call sync code from async and vice versa. It also contains a StatelessServer and a WSGI-to-ASGI adapter.
Q. Will upgrading to Django 3.0 break my project?
Version 3.0 might sound like a big change from its predecessor, Django 2.2. But that is slightly misleading. The Django project does not follow semantic versioning exactly (where a major version number change may break the API), and the differences are explained in the Release Process page.
You will notice very few serious backward incompatible changes in the Django 3.0 release notes. If your project does not use any of them, then you can upgrade without any modifications.
Then why did the version number jump from 2.2 to 3.0? This is explained in the release cadence section:
Starting with Django 2.0, version numbers will use a loose form of semantic versioning such that each version following an LTS will bump to the next “dot zero” version. For example: 2.0, 2.1, 2.2 (LTS), 3.0, 3.1, 3.2 (LTS), etc.
Since the last release, Django 2.2, was a long-term support (LTS) release, the following release had to increase the major version number to 3.0. That’s pretty much it!
Q. Can I continue to use WSGI?
Yes. Asynchronous programming could be seen as an entirely optional way to write code in Django. The familiar synchronous way of using Django would continue to work and be supported.
Even if there is a fully asynchronous path through the handler, WSGI compatibility has to also be maintained; in order to do this, the WSGIHandler will coexist alongside a new ASGIHandler, and run the system inside a one-off eventloop - keeping it synchronous externally, and asynchronous internally.
This will allow async views to make multiple asynchronous requests and launch short-lived coroutines even inside of WSGI, if you choose to run that way. If you choose to run under ASGI, however, you will then also get the benefits of requests not blocking each other and using fewer threads.
Q. When can I write async code in Django?
As explained earlier in the Async Project roadmap, it is expected that Django 3.1 will introduce async views which will support writing asynchronous code like:
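Such a view might look roughly like this - a sketch of the proposed feature, not the final Django 3.1 API (the view name and response are my own illustration):

```python
import asyncio

from django.http import JsonResponse

async def poll_summary(request):
    # `async def` is all it takes: Django detects the coroutine view
    # and runs it on the event loop instead of a worker thread.
    await asyncio.sleep(0.5)  # stand-in for awaiting a slow external API
    return JsonResponse({"status": "ok"})
```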
You are free to mix async and sync views, middleware, and tests as much as you want; Django will ensure that you always end up with the right execution context. Sounds pretty awesome, right?
At the time of writing, the patch is almost certain to land in 3.1, awaiting a final review.
We covered a lot of background about what led to Django supporting asynchronous capabilities. We tried to understand ASGI and how it compares to WSGI. We also found some performance improvements in terms of increased number of simultaneous requests in ASGI mode. A number of frequently asked questions related to Django’s support of ASGI were also addressed.
I believe asynchronous support in Django could be a game changer. It will be one of the first large Python web frameworks to evolve into handling asynchronous requests. It is admirable that it is done with a lot of care not to break backward compatibility.
I usually do not make my tech predictions public (frankly, many of them have been proven right over the years). So here goes my tech prediction - most Django deployments will use async functionality within five years.
That should be enough motivation to check it out!
Thanks to Andrew Godwin for his comments on an early draft. All illustrations courtesy of Old Book Illustrations.