I don’t usually deal with SQL. Ever since I moved to django, I never had the need to write a SQL line in my code, and very rarely I had to look at the database for insights on my application. This is a great thing I think; as long as you don’t have performance issues developing a web application, the best is to worry more about writing readable rather than fast code. But once pages start to take a little too long to load (and a couple of hours usually is a little too long) then it’s time to get your hands dirty. As I said, I don’t usually deal with this kind of problem, so I had to google my way out of it. I explain here how I diagnosed and solved the bottlenecks on my application; the methods used probably are far from ideal but might be of use to inexperienced programmers in such tasks (as I am).
My problem in this opportunity wasn’t intensive database access, but simply tables too big to query carelessly; the database contents are updated by a batch process once a week (the application deals mostly with searching and doesn’t modify the data). The database used is MySQL. Some background reading that I found useful was this for django optimization and this for mysql optimization.
The first thing to do (after acknowledging that I had a performance problem) was to detect the bottlenecks on my program. To accomplish this, django debug toolbar proved to be of great value; upon page load, this app will tell you what queries were executed, where in the code, the time each one took (highlighting the slow ones) and even the sql EXPLAIN output to study how to optimize it. The toolbar really cracked the problem for me. When the query was so slow that waiting for the page to load wasn’t convenient, I used the query attribute of the QuerySet API. I addressed the slow queries in turn, using different solutions in each case:
Django lookups. This is the first place to look; sometimes a django query can be easily rewritten to be more efficient. The most common case is probably retrieving a full model where only one or two fields are being used; instead those values can be retrieved using values_list.
Indexes. Here is some documentation on how to optimize through indexes, and here is the syntax to do it. This was of less use of what I initially thought when I started reading on the subject. This may be because I didn’t took the time to fully understand the output of the EXPLAIN statement (most of my queries didn’t present signs of possible optimization through indexes anyway). But one place I found indexes useful was for text searches; to take advantage of these indexes in django, one must use the search query lookup and create an index for the field being searched. For example, to search books by title, one would use code like:
And create an index as:
CREATE FULLTEXT INDEX title_index ON app_book(title);
MySQL configuration tuning. This is probably the hardest task without previous experience. Here are some pointers; the most useful tool I found for tweaking mysql is the MySQLTuner script, which diagnoses your system and suggests configuration improvements.
Results caching. Frequent queries (or subqueries) that hold a small enough result set are ideal candidates for caching, using the low-level cache API. I used a custom admin command to refresh the cache contents periodically.
Summary tables. I had several huge tables from which I only needed a small subset of records (and of fields per record), and were impossible to query as is. I created a script to make summary tables with the records I needed and use those in django instead. I run the script after the database is updated. Here is a quick introduction to writing mysql-python scripts.
It’s interesting to note that even though I had to make some mysql-python scripts, and rewrite some of my django queries, my code remained SQL-agnostic all along, so this helped me to reinforce the notion that if you are resorting to raw sql in your code, you probably need to take a deeper look at django’s ORM.