Interview for This is not a Monad tutorial

unbalancedparen interviewed me for his blog This is not a Monad tutorial. I elaborated on some of the topics I usually write about in this blog, like Python, Clojure and Open Source programming.

Here’s the link to the interview.

Advertisements

A farewell to Django (and Python?)

For about three years, I’ve been programming (professionally) almost exclusively with Django. It let me work fast as a solo programmer, faster than most of other programmers in my country (which are still doing mostly Java and PHP), and have the freedom to pick only the jobs I was interested in.

Things are starting to change in the web industry, though. And I’m not talking about some hyped technology that’s supposed to be the future of web programming, but about what the standard user expects of today’s web applications. Most programmers will know what I’m talking about: as client programming gets more and more complex, it’s getting harder (not to say impossible) to stay on the DRY side of things. This situation is very well explained in this article. It’s time to start looking for alternatives to LAMP and its variants.

Recently there was a series of posts in Hacker News discussing how cloudy Python’s future was (here, here and here). I don’t think Python is going away in the near future. I personally consider it the best general purpose programming language, and my weapon of choice in most cases; it’s probably what I’ll be comparing against every other language I try in the next couple of years. That being said, it’s clear that Python (using Django or some other framework) is not the best tool for some of the hottest jobs in the market, complex web applications being one of them. As a side note, it is interesting that the case of Python being “too slow” or “too CPU intensive”, which I always disregarded (and still do for the most part) for not being a bottleneck for most applications, has finally found a raison d’être in the battery consumption rate of mobile devices.

I’m not sure what the future of web programming will look like, but I sure as hell know how the present does: JavaScript.

I never liked JavaScript; I was one of those people that learned it because they had to: the browser speaks JavaScript and there’s no way around it. But saying that I learned it is overstating, I just started using it as I needed without much idea of what was going on, but with the certainty that whatever it was, it was weird. Indeed, as Douglas Crockford puts it in his JavaScript: The Good Parts:

The amazing thing about JavaScript is that it is possible to get work done with it without knowing much about the language, or even knowing much about programming. It is a language with enormous expressive power.

Over the years, it got less painful as I understood a bit more of the language (and after having experience with other languages) although I never really took the time to study its foundations and cleaner idioms. That’s what I’ve started to do now, and I must say that I see potential in JavaScript. If you stick to the good parts, that is. For one, it was JavaScript, not on its own merit as a programming language but for historical reasons, that managed for features such as dynamic typing and first class functions (that users of “better” languages have been advocating for for years) to go mainstream.

Finally the day came when Python is not the best tool for the job, at least not for my job (I plan to stay as a web programmer for the time being) and to move on to a better one. The first step, is to learn JavaScript. I mean to learn proper JavaScript.

Django DB optimization for pedestrians

I don’t usually deal with SQL. Ever since I moved to django, I never had the need to write a SQL line in my code, and very rarely I had to look at the database for insights on my application. This is a great thing I think; as long as you don’t have performance issues developing a web application, the best is to worry more about writing readable rather than fast code. But once pages start to take a little too long to load (and a couple of hours usually is a little too long) then it’s time to get your hands dirty. As I said, I don’t usually deal with this kind of problem, so I had to google my way out of it. I explain here how I diagnosed and solved the bottlenecks on my application; the methods used probably are far from ideal but might be of use to inexperienced programmers in such tasks (as I am).

My problem in this opportunity wasn’t intensive database access, but simply tables too big to query carelessly; the database contents are updated by a batch process once a week (the application deals mostly with searching and doesn’t modify the data). The database used is MySQL. Some background reading that I found useful was this for django optimization and this for mysql optimization.

The first thing to do (after acknowledging that I had a performance problem) was to detect the bottlenecks on my program. To accomplish this, django debug toolbar proved to be of great value; upon page load, this app will tell you what queries were executed, where in the code, the time each one took (highlighting the slow ones) and even the sql EXPLAIN output to study how to optimize it. The toolbar really cracked the problem for me. When the query was so slow that waiting for the page to load wasn’t convenient, I used the query attribute of the QuerySet API. I addressed the slow queries in turn, using different solutions in each case:

Django lookups. This is the first place to look; sometimes a django query can be easily rewritten to be more efficient. The most common case is probably retrieving a full model where only one or two fields are being used; instead those values can be retrieved using values_list.

Indexes. Here is some documentation on how to optimize through indexes, and here is the syntax to do it. This was of less use of what I initially thought when I started reading on the subject. This may be because I didn’t took the time to fully understand the output of the EXPLAIN statement (most of my queries didn’t present signs of possible optimization through indexes anyway). But one place I found indexes useful was for text searches; to take advantage of these indexes in django, one must use the search query lookup and create an index for the field being searched. For example, to search books by title, one would use code like:

Book.objects.filter(title__search=keyword)

And create an index as:

CREATE FULLTEXT INDEX title_index ON app_book(title);

MySQL configuration tuning. This is probably the hardest task without previous experience. Here are some pointers; the most useful tool I found for tweaking mysql is the MySQLTuner script, which diagnoses your system and suggests configuration improvements.

Results caching. Frequent queries (or subqueries) that hold a small enough result set are ideal candidates for caching, using the low-level cache API. I used a custom admin command to refresh the cache contents periodically.

Summary tables. I had several huge tables from which I only needed a small subset of records (and of fields per record), and were impossible to query as is. I created a script to make summary tables with the records I needed and use those in django instead. I run the script after the database is updated. Here is a quick introduction to writing mysql-python scripts.

It’s interesting to note that even though I had to make some mysql-python scripts, and rewrite some of my django queries, my code remained SQL-agnostic all along, so this helped me to reinforce the notion that if you are resorting to raw sql in your code, you probably need to take a deeper look at django’s ORM.

ifile.it API wrapper for Python

Here’s a Python wrapper I’ve made to upload files to the ifile.it servers using their API. The code sends the files using the poster package.

'''
Wrapper for the ifile.it API. Uses the poster api.
'''
import urllib2
import json
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib

class Ifileit(object):
    """ Wrapper class for the ifile.it API. """
    
    UPOLOADED_FILE_URL = 'http://ifile.it/{ukey}'
    GET_UPLOAD_URL = 'http://ifile.it/api-fetch_upload_url.api'
    FETCH_API_KEY_URL = 'https://secure.ifile.it/api-fetch_apikey.api'
    PING_URL = 'http://ifile.it/api-ping.api'
    
    USER = ''
    PASSWORD = ''
    API_KEY = ''
    
    @classmethod
    def ping(cls):
        """ Returns True if the ifile.it server is up. """
        
        response = cls._open_and_check(cls.PING_URL)
        return response.get('message', '') == 'pong'
    
    @classmethod
    def upload(cls, the_file):
        """ 
        Uploads the_file to the ifile.it server. It should be a file-like 
        object as required by poster. 
        """
        
        #Needed by poster
        register_openers()
    
        post_data = {'Filedata' : the_file}
        post_data.update(cls._get_akey())
        datagen, headers = multipart_encode(post_data)
        
        request = urllib2.Request(cls._determine_upload_url(), 
                                                    datagen, headers)
        response = cls._open_and_check(request) 
        
        return cls.UPOLOADED_FILE_URL.format(ukey=response['ukey'])
    
    @classmethod
    def _determine_upload_url(cls):
        """ Gets the upload url from ifile.it API. """
        
        return cls._open_and_check(cls.GET_UPLOAD_URL)['upload_url']
    
    @classmethod
    def _open_and_check(cls, url, data=None):
        """ 
        Opens the given url string or Request object and checks for the status
        parameter in the response.  If it's 'ok' returns a dict with the 
        response. otherwise raises IfileitApiError.
        
        Data should be a dictionary of POST arguments or None for a GET request.
        """
        
        if data:
            data = urllib.urlencode(data)
        
        response = json.loads(urllib2.urlopen(url, data).read())
    
        if response['status'] != 'ok':
            raise IfileitApiError()
        
        return response
    
    @classmethod
    def _get_akey(cls):
        """ 
        If the user info is set, return the akey parameter for the POST API 
        calls. Otherwise returns an empty dict.
        """
        
        if cls.USER and cls.PASSWORD:
            if not cls.API_KEY:
                response = cls._open_and_check(cls.FETCH_API_KEY_URL, 
                                               {'username' : cls.USER,
                                                'password' : cls.PASSWORD})
                cls.API_KEY = response['akey']
            
            return {'akey' : cls.API_KEY }
        
        return {}

class IfileitApiError(Exception):
    pass

Python doesn’t treat me like I’m stupid

I don’t know how it works worldwide, but here in Argentina, most programmers not only use Java, but are also convinced that Java is the best language available if you can do without C/C++ efficiency, and therefore don’t bother to learn other languages. There’s no such concept as the “programmer toolbox”. And this is not enforced only by the Software Industry (where non technical issues, like programmer availability come to play[1]), but sadly by a noticeable part of the academic community.

There’s a fact that I think makes this situation worse, and is that Java is a pretty dated language. What I mean is there are too different problems;  The first one is that programmers don’t bother to study other languages, so they use one tool to solve all problems. This is indeed a big issue, but more related to engineering and science than to programming. The second one is that there clearly are better languages to do what Java is supposed to do best: general purpose object oriented programming, and if you may, web applications back-end programming (which is what most programmers end up doing here). And those better languages are not obscure novelties as some like to suggest, but languages that have been around as long as Java has. Ruby and Python are the examples I’m most familiar with, but there are others, notably lisp.

I don’t know about any Java programmer that having tried a higher level language such as Python, wanted to go back to using Java. Bruce Eckel who wrote the books that are used for reference on Java and C++ in the programming courses that teach object-oriented programming at my college  noticed almost 10 years ago that sticking to Java was a huge waste of programmer’s time. A lot has been written about the advantages of Python over Java; a specially concise example are Eckel’s slides Why I love Python[2]. I also like a couple of screencasts by Sean Kelly, that focus more on web development (where Java is specially useless)[3].

I’m not interested in repeating all those points; most of them are related to language features like dynamic typing or native types. Some of them can be overcome by enhancing the language with new features, and this is what Java designers have been doing: they have been patching bad language design decisions ever since it first came out.

But there’s one particular aspect that I think defines on principle why Python is a better language than Java, and it won’t change no matter what features are added or removed. And it is the philosophy behind the language, the way they think about the programmer. Quoting Eckel: Java treats programmers like they are stupid, Python doesn’t.

Java design and libraries consistently make a huge effort making it difficult for a programmer to do bad things. If a feature was a potential means for a dumb programmer to make bad code, they would take away the feature altogether. And what’s worse, at times they even used it in the language implementation but prohibited the programmer to do so[4]. Paul Graham remarkably pointed this[5] as a potential flaw of Java before actually trying the language:

Like the creators of sitcoms or junk food or package tours, Java’s designers were consciously designing a product for people not as smart as them.

But this approach is an illusion; no matter how hard you try, you can’t always keep  bad programmers from writing bad programs. Java is a lousy replacement to the learning of algorithms and good programming practices.

The implications of this way of thinking are not limited to language features. The design decisions of the underlying language have a great effect on the programmer’s idiosyncrasy. It’s common to see Java designs and patterns that make an unnecessary effort in prohibiting potential bad uses of the resulting code, again not trusting the programmer’s judgment, wasting time and putting together very rigid structures.

The bottom line is that by making it hard for stupid programmers to do bad stuff, Java really gets in the way of smart programmers trying to make good programs. Python on the other hand, is focused on making it extremely easy to program well[6]. It gives you the choice; If you want to do crappy code is your responsibility.

[1] Paul Graham, Revenge of the Nerds.
[2] Bruce Eckel, Why I love Python. The points are extended in these interviews.
[3] Sean Kelly, Recovery from addiction, Better web app development.
[4] Bruce Eckel, The Zen of Python.
[5] Paul Graham, Java’s Cover.
[6] Tim Peters, The Zen of Python.

Open Library and the beauty of Python

I’m currently working on a books site, so I recently was in the need to interact with the Open Library’s APIs, specifically to retrieve a book’s cover based on its ISBN. The API puts a limit on how many cover requests an IP can do when looking by ISBN, but not if you look by the OLID (Open Library book id), so I decided to see how hard it was to get a book’s OLID once I got its ISBN.

It’s useful every once in a while to recall how hard it was to accomplish certain things on certain cluttered and rigid languages, and how easy it is when you can use something as powerful as Python1. Well, I ended up with something like:

import urllib2
import json

URL = 'http://openlibrary.org/api/books?bibkeys=ISBN:{isbn}&format=json&jscmd=data'

def get_olid(isbn):

    response = urllib2.urlopen(URL.format(isbn=isbn))
    js_dict = json.loads(response.read())
    key = 'ISBN:{isbn}'.format(isbn=isbn)
    if js_dict.has_key(key):
        return js_dict[key]['identifiers']['openlibrary'][0]

But the real beauty of Python this time was not the clearness or succinctness of the code, nor how useful the libraries were, but rather how easy it was to figure out how to use those libraries. Just checked the function signatures on the online documentation (I could have used the help function, but I like the docs better), and started to play with them on the shell until I got the result I expected. It even saved me the time to read thoroughly the OL reference, since I could see in the shell the contents of the response they are sending me on each call.

1. Yes, there are no silver bullets, but there are some real good bullets and some real bad bullets for each job, and I find Python to be a great general purpose bullet :P.