Interview for This is not a Monad tutorial

unbalancedparen interviewed me for his blog This is not a Monad tutorial. I elaborated on some of the topics I usually write about in this blog, like Python, Clojure and Open Source programming.

Here’s the link to the interview.

Advertisements

A farewell to Django (and Python?)

For about three years, I’ve been programming (professionally) almost exclusively with Django. It let me work fast as a solo programmer, faster than most of other programmers in my country (which are still doing mostly Java and PHP), and have the freedom to pick only the jobs I was interested in.

Things are starting to change in the web industry, though. And I’m not talking about some hyped technology that’s supposed to be the future of web programming, but about what the standard user expects of today’s web applications. Most programmers will know what I’m talking about: as client programming gets more and more complex, it’s getting harder (not to say impossible) to stay on the DRY side of things. This situation is very well explained in this article. It’s time to start looking for alternatives to LAMP and its variants.

Recently there was a series of posts in Hacker News discussing how cloudy Python’s future was (here, here and here). I don’t think Python is going away in the near future. I personally consider it the best general purpose programming language, and my weapon of choice in most cases; it’s probably what I’ll be comparing against every other language I try in the next couple of years. That being said, it’s clear that Python (using Django or some other framework) is not the best tool for some of the hottest jobs in the market, complex web applications being one of them. As a side note, it is interesting that the case of Python being “too slow” or “too CPU intensive”, which I always disregarded (and still do for the most part) for not being a bottleneck for most applications, has finally found a raison d’être in the battery consumption rate of mobile devices.

I’m not sure what the future of web programming will look like, but I sure as hell know how the present does: JavaScript.

I never liked JavaScript; I was one of those people that learned it because they had to: the browser speaks JavaScript and there’s no way around it. But saying that I learned it is overstating, I just started using it as I needed without much idea of what was going on, but with the certainty that whatever it was, it was weird. Indeed, as Douglas Crockford puts it in his JavaScript: The Good Parts:

The amazing thing about JavaScript is that it is possible to get work done with it without knowing much about the language, or even knowing much about programming. It is a language with enormous expressive power.

Over the years, it got less painful as I understood a bit more of the language (and after having experience with other languages) although I never really took the time to study its foundations and cleaner idioms. That’s what I’ve started to do now, and I must say that I see potential in JavaScript. If you stick to the good parts, that is. For one, it was JavaScript, not on its own merit as a programming language but for historical reasons, that managed for features such as dynamic typing and first class functions (that users of “better” languages have been advocating for for years) to go mainstream.

Finally the day came when Python is not the best tool for the job, at least not for my job (I plan to stay as a web programmer for the time being) and to move on to a better one. The first step, is to learn JavaScript. I mean to learn proper JavaScript.

Django DB optimization for pedestrians

I don’t usually deal with SQL. Ever since I moved to django, I never had the need to write a SQL line in my code, and very rarely I had to look at the database for insights on my application. This is a great thing I think; as long as you don’t have performance issues developing a web application, the best is to worry more about writing readable rather than fast code. But once pages start to take a little too long to load (and a couple of hours usually is a little too long) then it’s time to get your hands dirty. As I said, I don’t usually deal with this kind of problem, so I had to google my way out of it. I explain here how I diagnosed and solved the bottlenecks on my application; the methods used probably are far from ideal but might be of use to inexperienced programmers in such tasks (as I am).

My problem in this opportunity wasn’t intensive database access, but simply tables too big to query carelessly; the database contents are updated by a batch process once a week (the application deals mostly with searching and doesn’t modify the data). The database used is MySQL. Some background reading that I found useful was this for django optimization and this for mysql optimization.

The first thing to do (after acknowledging that I had a performance problem) was to detect the bottlenecks on my program. To accomplish this, django debug toolbar proved to be of great value; upon page load, this app will tell you what queries were executed, where in the code, the time each one took (highlighting the slow ones) and even the sql EXPLAIN output to study how to optimize it. The toolbar really cracked the problem for me. When the query was so slow that waiting for the page to load wasn’t convenient, I used the query attribute of the QuerySet API. I addressed the slow queries in turn, using different solutions in each case:

Django lookups. This is the first place to look; sometimes a django query can be easily rewritten to be more efficient. The most common case is probably retrieving a full model where only one or two fields are being used; instead those values can be retrieved using values_list.

Indexes. Here is some documentation on how to optimize through indexes, and here is the syntax to do it. This was of less use of what I initially thought when I started reading on the subject. This may be because I didn’t took the time to fully understand the output of the EXPLAIN statement (most of my queries didn’t present signs of possible optimization through indexes anyway). But one place I found indexes useful was for text searches; to take advantage of these indexes in django, one must use the search query lookup and create an index for the field being searched. For example, to search books by title, one would use code like:

Book.objects.filter(title__search=keyword)

And create an index as:

CREATE FULLTEXT INDEX title_index ON app_book(title);

MySQL configuration tuning. This is probably the hardest task without previous experience. Here are some pointers; the most useful tool I found for tweaking mysql is the MySQLTuner script, which diagnoses your system and suggests configuration improvements.

Results caching. Frequent queries (or subqueries) that hold a small enough result set are ideal candidates for caching, using the low-level cache API. I used a custom admin command to refresh the cache contents periodically.

Summary tables. I had several huge tables from which I only needed a small subset of records (and of fields per record), and were impossible to query as is. I created a script to make summary tables with the records I needed and use those in django instead. I run the script after the database is updated. Here is a quick introduction to writing mysql-python scripts.

It’s interesting to note that even though I had to make some mysql-python scripts, and rewrite some of my django queries, my code remained SQL-agnostic all along, so this helped me to reinforce the notion that if you are resorting to raw sql in your code, you probably need to take a deeper look at django’s ORM.

Python doesn’t treat me like I’m stupid

I don’t know how it works worldwide, but here in Argentina, most programmers not only use Java, but are also convinced that Java is the best language available if you can do without C/C++ efficiency, and therefore don’t bother to learn other languages. There’s no such concept as the “programmer toolbox”. And this is not enforced only by the Software Industry (where non technical issues, like programmer availability come to play[1]), but sadly by a noticeable part of the academic community.

There’s a fact that I think makes this situation worse, and is that Java is a pretty dated language. What I mean is there are too different problems;  The first one is that programmers don’t bother to study other languages, so they use one tool to solve all problems. This is indeed a big issue, but more related to engineering and science than to programming. The second one is that there clearly are better languages to do what Java is supposed to do best: general purpose object oriented programming, and if you may, web applications back-end programming (which is what most programmers end up doing here). And those better languages are not obscure novelties as some like to suggest, but languages that have been around as long as Java has. Ruby and Python are the examples I’m most familiar with, but there are others, notably lisp.

I don’t know about any Java programmer that having tried a higher level language such as Python, wanted to go back to using Java. Bruce Eckel who wrote the books that are used for reference on Java and C++ in the programming courses that teach object-oriented programming at my college  noticed almost 10 years ago that sticking to Java was a huge waste of programmer’s time. A lot has been written about the advantages of Python over Java; a specially concise example are Eckel’s slides Why I love Python[2]. I also like a couple of screencasts by Sean Kelly, that focus more on web development (where Java is specially useless)[3].

I’m not interested in repeating all those points; most of them are related to language features like dynamic typing or native types. Some of them can be overcome by enhancing the language with new features, and this is what Java designers have been doing: they have been patching bad language design decisions ever since it first came out.

But there’s one particular aspect that I think defines on principle why Python is a better language than Java, and it won’t change no matter what features are added or removed. And it is the philosophy behind the language, the way they think about the programmer. Quoting Eckel: Java treats programmers like they are stupid, Python doesn’t.

Java design and libraries consistently make a huge effort making it difficult for a programmer to do bad things. If a feature was a potential means for a dumb programmer to make bad code, they would take away the feature altogether. And what’s worse, at times they even used it in the language implementation but prohibited the programmer to do so[4]. Paul Graham remarkably pointed this[5] as a potential flaw of Java before actually trying the language:

Like the creators of sitcoms or junk food or package tours, Java’s designers were consciously designing a product for people not as smart as them.

But this approach is an illusion; no matter how hard you try, you can’t always keep  bad programmers from writing bad programs. Java is a lousy replacement to the learning of algorithms and good programming practices.

The implications of this way of thinking are not limited to language features. The design decisions of the underlying language have a great effect on the programmer’s idiosyncrasy. It’s common to see Java designs and patterns that make an unnecessary effort in prohibiting potential bad uses of the resulting code, again not trusting the programmer’s judgment, wasting time and putting together very rigid structures.

The bottom line is that by making it hard for stupid programmers to do bad stuff, Java really gets in the way of smart programmers trying to make good programs. Python on the other hand, is focused on making it extremely easy to program well[6]. It gives you the choice; If you want to do crappy code is your responsibility.

[1] Paul Graham, Revenge of the Nerds.
[2] Bruce Eckel, Why I love Python. The points are extended in these interviews.
[3] Sean Kelly, Recovery from addiction, Better web app development.
[4] Bruce Eckel, The Zen of Python.
[5] Paul Graham, Java’s Cover.
[6] Tim Peters, The Zen of Python.

Deploying a django project on nginx with gunicorn

Back when I started  working with django, one of the things that I found was sort of difficult to do was deploying a project, as suggested by the documentation, using apache and mod_wsgi (and previously with mod_python), at least when it came to serve several projects and its media at the same server. Doing all those things with nginx and gunicorn is shockingly easy.

To add a new site on nginx I just touch a new file in the /etc/nginx/sites-enabled, with the contents:

server {
    listen 80;
    server_name myserver.com;

    location / {
        proxy_pass http://127.0.0.1:8888;
    }
}

Now, supposing I want to have acces and error logs, and serve my site’s and the admin site’s media, I would add some lines to the server:

access_log /var/www/django/myproject/access.log;
error_log /var/www/django/myproject/error.log;

location /media {
    root /var/www/django/myproject;
}

location /admin_media {
    alias /usr/local/lib/python2.7/dist-packages/django/contrib/admin/media;
}

Where /media and /admin_media are the MEDIA_URL and ADMIN_MEDIA_PREFIX pointed by my settings.py. That is, I have a media folder in my project root directory and the admin media is served directly from the django installation.

To actually run my django site on http://127.0.0.1:8888 I create a gunicorn configuration file in my project root (where manage.py is located):

bind = "127.0.0.1:8888"
workers = 3

Assuming I named the file gunicorn.conf.py, gunicorn is run from the same directory doing:

gunicorn_django -c gunicorn.conf.py -D

And that’s it.

When the code is updated, there’s no need to restart nginx; instead, gunicorn is restarted (not so prettily) with:

kill -HUP <main gunicorn process id>

To serve multiple sites, just use differents ports (8888 in the example) in each one’s configuration file.