A farewell to Django (and Python?)

For about three years, I’ve been programming (professionally) almost exclusively with Django. It let me work fast as a solo programmer, faster than most other programmers in my country (who are still doing mostly Java and PHP), and gave me the freedom to pick only the jobs I was interested in.

Things are starting to change in the web industry, though. And I’m not talking about some hyped technology that’s supposed to be the future of web programming, but about what the standard user expects of today’s web applications. Most programmers will know what I’m talking about: as client programming gets more and more complex, it’s getting harder (not to say impossible) to stay on the DRY side of things. This situation is very well explained in this article. It’s time to start looking for alternatives to LAMP and its variants.

Recently there was a series of posts on Hacker News discussing how cloudy Python’s future was (here, here and here). I don’t think Python is going away in the near future. I personally consider it the best general purpose programming language, and my weapon of choice in most cases; it’s probably what I’ll be comparing against every other language I try in the next couple of years. That being said, it’s clear that Python (using Django or some other framework) is not the best tool for some of the hottest jobs in the market, complex web applications being one of them. As a side note, it is interesting that the complaint about Python being “too slow” or “too CPU intensive”, which I always dismissed (and still do for the most part) because it isn’t a bottleneck for most applications, has finally found a raison d’être in the battery consumption of mobile devices.

I’m not sure what the future of web programming will look like, but I sure as hell know how the present does: JavaScript.

I never liked JavaScript; I was one of those people who learned it because they had to: the browser speaks JavaScript and there’s no way around it. But saying that I learned it is an overstatement. I just started using it as I needed, without much idea of what was going on, but with the certainty that whatever it was, it was weird. Indeed, as Douglas Crockford puts it in his JavaScript: The Good Parts:

The amazing thing about JavaScript is that it is possible to get work done with it without knowing much about the language, or even knowing much about programming. It is a language with enormous expressive power.

Over the years, it got less painful as I understood a bit more of the language (and after gaining experience with other languages), although I never really took the time to study its foundations and cleaner idioms. That’s what I’ve started to do now, and I must say that I see potential in JavaScript. If you stick to the good parts, that is. For one, it was JavaScript, not on its own merit as a programming language but for historical reasons, that brought features such as dynamic typing and first-class functions (which users of “better” languages had been advocating for years) into the mainstream.

Finally, the day has come when Python is not the best tool for the job, at least not for my job (I plan to stay a web programmer for the time being), and it is time to move on to a better one. The first step is to learn JavaScript. I mean, to learn proper JavaScript.

Django DB optimization for pedestrians

I don’t usually deal with SQL. Ever since I moved to Django, I have never needed to write a line of SQL in my code, and I have very rarely had to look at the database for insights into my application. I think this is a great thing; as long as a web application has no performance issues, it is best to worry more about writing readable code than fast code. But once pages start to take a little too long to load (and a couple of hours usually is a little too long), it’s time to get your hands dirty. As I said, I don’t usually deal with this kind of problem, so I had to google my way out of it. Here I explain how I diagnosed and solved the bottlenecks in my application; the methods are probably far from ideal, but they might be of use to programmers who are as inexperienced in such tasks as I am.

My problem on this occasion wasn’t intensive database access, but simply tables too big to query carelessly; the database contents are updated by a batch process once a week (the application deals mostly with searching and doesn’t modify the data). The database used is MySQL. Some background reading that I found useful was this for Django optimization and this for MySQL optimization.

The first thing to do (after acknowledging that I had a performance problem) was to find the bottlenecks in my program. For this, the Django Debug Toolbar proved to be of great value; upon page load, this app tells you which queries were executed, where in the code they came from, and how long each one took (highlighting the slow ones), and it even shows the SQL EXPLAIN output so you can study how to optimize them. The toolbar really cracked the problem for me. When a query was so slow that waiting for the page to load wasn’t practical, I used the query attribute of the QuerySet to get at the generated SQL instead. I addressed the slow queries in turn, using a different solution in each case:

Django lookups. This is the first place to look; sometimes a Django query can easily be rewritten to be more efficient. The most common case is probably retrieving full model instances when only one or two fields are used; those values can be retrieved with values_list instead.

Indexes. Here is some documentation on how to optimize through indexes, and here is the syntax to do it. This was less useful than I expected when I started reading on the subject. That may be because I didn’t take the time to fully understand the output of the EXPLAIN statement (most of my queries didn’t show signs of possible optimization through indexes anyway). But one place where I found indexes useful was text searches; to take advantage of these indexes in Django, one must use the search query lookup and create a full-text index on the field being searched. For example, to search books by title, one would use code like:

Book.objects.filter(title__search=keyword)

And create an index as:

CREATE FULLTEXT INDEX title_index ON app_book(title);

MySQL configuration tuning. This is probably the hardest task to take on without previous experience. Here are some pointers; the most useful tool I found for tweaking MySQL is the MySQLTuner script, which diagnoses your system and suggests configuration improvements.

Results caching. Frequent queries (or subqueries) that hold a small enough result set are ideal candidates for caching, using the low-level cache API. I used a custom admin command to refresh the cache contents periodically.
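
The results caching described above could look roughly like the following. This is only a sketch under stated assumptions, not necessarily the code used in this project: cached_result, the key name and the one-hour timeout are invented for the example. The cache argument is any object with get(key, default) and set(key, value, timeout) methods, which is the shape of Django's low-level cache object (django.core.cache.cache); a small in-memory stand-in is included so the snippet runs without a configured Django project.

```python
import time

_MISSING = object()  # sentinel, so a cached None still counts as a hit


def cached_result(cache, key, compute, timeout=3600):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()
        cache.set(key, value, timeout)
    return value


class DictCache:
    """In-memory stand-in with the same get/set shape as Django's cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at < time.monotonic():
            return default  # entry has expired
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, time.monotonic() + timeout)


calls = []


def expensive_query():
    # Placeholder for a slow ORM query that returns a small result set.
    calls.append(1)
    return ["Fiction", "History", "Science"]


cache = DictCache()
first = cached_result(cache, "book_categories", expensive_query)
second = cached_result(cache, "book_categories", expensive_query)  # cache hit
```

In a Django project the same function would be called with the object imported from django.core.cache, and a periodic command could refresh the entry by calling cache.set directly.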

Summary tables. I had several huge tables from which I only needed a small subset of records (and of fields per record), and they were impossible to query as is. I created a script that builds summary tables with the records I needed, and I use those in Django instead. I run the script after the database is updated. Here is a quick introduction to writing mysql-python scripts.
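
To make the summary table idea concrete, here is a minimal sketch. It swaps in the sqlite3 module from the standard library for MySQL and mysql-python, purely so that it runs anywhere; the table names, columns and the year filter are all hypothetical. The core statement, CREATE TABLE ... AS SELECT, is available in MySQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical "huge" source table; only a few columns and rows are needed.
cur.execute(
    "CREATE TABLE raw_book (id INTEGER PRIMARY KEY, title TEXT, "
    "publisher TEXT, year INTEGER, notes TEXT)"
)
cur.executemany(
    "INSERT INTO raw_book (title, publisher, year, notes) VALUES (?, ?, ?, ?)",
    [
        ("Old Atlas", "Acme", 1950, "long text"),
        ("New Atlas", "Acme", 2010, "long text"),
        ("Cookbook", "Bravo", 2011, "long text"),
    ],
)


def rebuild_summary(cursor):
    """Recreate the summary table from scratch, e.g. after the weekly update."""
    cursor.execute("DROP TABLE IF EXISTS book_summary")
    cursor.execute(
        "CREATE TABLE book_summary AS "
        "SELECT id, title, year FROM raw_book WHERE year >= 2000"
    )


rebuild_summary(cur)
conn.commit()

# The application then queries the small table instead of the huge one.
rows = cur.execute(
    "SELECT id, title, year FROM book_summary ORDER BY id"
).fetchall()
```

On the Django side, one plausible way to use such a table is a model whose Meta sets db_table to the summary table's name, so that queries stay in the ORM.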

It’s interesting to note that even though I had to write some mysql-python scripts and rewrite some of my Django queries, my code remained SQL-agnostic all along. This reinforced my view that if you are resorting to raw SQL in your code, you probably need to take a deeper look at Django’s ORM.

Deploying a django project on nginx with gunicorn

Back when I started working with Django, one of the things I found rather difficult was deploying a project the way the documentation suggested, using apache and mod_wsgi (and previously mod_python), at least when it came to serving several projects and their media from the same server. Doing all those things with nginx and gunicorn is shockingly easy.

To add a new site on nginx I just create a new file in /etc/nginx/sites-enabled, with the contents:

server {
    listen 80;
    server_name myserver.com;

    location / {
        proxy_pass http://127.0.0.1:8888;
    }
}

Now, supposing I want to have access and error logs, and to serve my site’s and the admin site’s media, I would add some lines to the server block:

access_log /var/www/django/myproject/access.log;
error_log /var/www/django/myproject/error.log;

location /media {
    root /var/www/django/myproject;
}

location /admin_media {
    alias /usr/local/lib/python2.7/dist-packages/django/contrib/admin/media;
}

Here /media and /admin_media are the MEDIA_URL and ADMIN_MEDIA_PREFIX set in my settings.py. That is, I have a media folder in my project root directory, and the admin media is served directly from the Django installation.

To actually run my django site on http://127.0.0.1:8888 I create a gunicorn configuration file in my project root (where manage.py is located):

bind = "127.0.0.1:8888"
workers = 3

Assuming I named the file gunicorn.conf.py, gunicorn is run from the same directory with:

gunicorn_django -c gunicorn.conf.py -D

And that’s it.

When the code is updated, there’s no need to restart nginx; instead, gunicorn is restarted (not so prettily) with:

kill -HUP <main gunicorn process id>
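
Looking up the process id by hand can be avoided if gunicorn writes it to a pid file. The sketch below assumes a pidfile setting has been added to gunicorn.conf.py (for example pidfile = "/tmp/myproject-gunicorn.pid", a path made up for this example); the two helper functions are hypothetical names, not part of gunicorn. It only works on Unix-like systems, where SIGHUP exists.

```python
import os
import signal


def read_master_pid(pidfile_path):
    """Read the gunicorn master process id from its pid file."""
    with open(pidfile_path) as pidfile:
        return int(pidfile.read().strip())


def reload_gunicorn(pidfile_path):
    """Send SIGHUP to the master process, the same as `kill -HUP <pid>`."""
    os.kill(read_master_pid(pidfile_path), signal.SIGHUP)
```

After a code update, calling reload_gunicorn with the configured pid file path has the same effect as the kill command above.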

To serve multiple sites, just use a different port (8888 in the example) in each one’s configuration file.
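
As an illustration, the gunicorn.conf.py of a second, hypothetical project could look like this (port 8889 is an arbitrary choice; any free port works):

```python
# gunicorn.conf.py for a second, hypothetical project.
# Same settings as the first site, but bound to its own port. The nginx
# server block for this site would point proxy_pass at the same address,
# i.e. http://127.0.0.1:8889.
bind = "127.0.0.1:8889"
workers = 3
```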