A context manager is not the answer.
Think twice before mock.patch.
And, God, whatever that is, don’t put it in __init__.py
We came from different backgrounds: I had been programming Django for years, working on applications with increasingly complex UIs and moving from spaghetti jQuery to client-side MVC frameworks such as Backbone; Martin was already getting into Node.js development, using AngularJS after trying other client frameworks. We both regarded the current state of web development, centered on REST servers and MV* clients, as one of unstable equilibrium. Some problems were evident to us: inherent duplication (same models, same validations) and continuous context switches between front end and back end code. The latter was partially solved by Node.js, which lets programmers use the same language on both sides. But we felt there wasn’t enough effort put into exploiting the potential of the platform to erase, or at least reduce, the gap between client and server in web applications. That was the direction we wanted to take with Invisible.js, acknowledging the limitations of being a couple of developers working in our free time.
With that goal in mind, we started out with some months of research on frameworks and technologies, most of which we weren’t familiar with or hadn’t used yet; then we built a couple of prototypes to test them out. After that, we had a better picture of how to lay out the development of Invisible.js. We weren’t out to build a full-stack framework like Derby or Meteor, trying to cover every aspect of web development; rather, we wanted to pull together the awesome modules available in Node.js (express, browserify, socket.io) to achieve client/server model reuse as gracefully as possible. In that sense, the nodejitsu blog was a great source of inspiration.
As a side note, the framework is named after Invisible, a progressive rock group led by Luis Alberto Spinetta in the ’70s.
Invisible.js stands on a more or less MEAN stack; it’s actually not tied at all to AngularJS, but we usually choose it as our front end framework, as we think it’s the best one and it places no constraints on the models it observes, which makes it a good fit for Invisible (as opposed to Backbone, for example). As for the database, we appreciate the short distance between a JSON object and a Mongo document, plus MongoDB has a nice, flexible Node.js driver; but certainly an interesting branch of development for Invisible.js would be to add support for other data stores.
The main components of Invisible.js are the models. The developer defines them with their methods and attributes and registers them in Invisible; this way they are exposed on both client and server, and augmented with methods to handle database access, real-time events and validations. The implementation of those methods changes depending on where the call is made, as the figure shows.
What goes under the hood is that the Invisible server, which replaces the express one, exposes a dynamically generated browserify bundle that contains all the registered model definitions for the client. It also exposes the REST controllers to handle the CRUD methods that they call.
We’re very pleased with the result of our work; most of the things we’ve tried worked out, and we went further than we expected. Indeed, we feel that Invisible.js not only solves its initial goal of exposing reusable models, but also that it’s simple to use and gives a lot of non-trivial stuff out of the box, with a few lines of code.
Apart from security, we still have to see how well the framework scales, both in resources and in code base, as it’s used in medium and big applications. We hope other developers will find it interesting enough to give it a try and start collaborating, so that it turns into something more than a toy.
The topic of authentication in REST architectures is a debatable one; there are several ways to do it, not all of them practical, not all of them RESTful; there is no standard and a lot of room for confusion. Ever since I got into REST, this was the one thing that wasn’t evident to me, even after a decent amount of research. Recently I got the time to dive deeper into the problem, evaluate the alternatives thoroughly and draw my conclusions. While they may be inaccurate to some degree, I gather them here since I found no single place that presents the topic in a friendly fashion.
First let’s establish some ground rules for the analysis, to avoid a lot of the usual confusion.
That being said, let’s look at the authentication methods available.
Digest is intended to be a more secure alternative to HTTP Basic, and could be considered if we were not using HTTPS, which we are. Without a secure connection, the method is vulnerable to man-in-the-middle attacks: you’d be sending credentials hashed with a weak algorithm, and you wouldn’t be allowed to use a strong encryption method to store the passwords. Moreover, it’s more complex than Basic and you still have to deal with the browser login box. So we rule out Digest.
A classic resource on RESTful authentication is the homonymous Stack Overflow question. The top-voted answer there mentions the problems of using Basic Auth and proposes a custom method based on storing a session id in a cookie. I don’t mind having a narrowly scoped session (for example, one with an expiration date), but if you’re rolling a custom method, I don’t see any advantage in using cookies over an Authorization header, whether mimicking Basic Auth or with different logic.
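To make the comparison concrete, here is a minimal sketch of such a custom method carried in the Authorization header instead of a cookie. All names here (SERVER_SECRET, issue_token, verify_token, the "Token" scheme) are invented for illustration; they are not from any standard.

```python
import hmac
import hashlib

# Hypothetical server-side secret; in a real deployment this would be
# generated and stored securely, not hard-coded.
SERVER_SECRET = b"change-me-in-production"

def issue_token(user_id):
    # Sign the user id so the server can later verify the token statelessly.
    sig = hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id + ":" + sig

def verify_token(token):
    user_id, _, sig = token.partition(":")
    expected = hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks.
    return user_id if hmac.compare_digest(sig, expected) else None

# The client then sends this on every request, with no cookie involved:
headers = {"Authorization": "Token " + issue_token("alice")}
```

The server-side logic is identical to what a cookie-based session would do; only the transport (an explicit header instead of browser-managed cookies) changes.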
OpenID provides federated authentication by letting the user log in to an application using his account from another provider such as Google or Yahoo. It is, in theory, a more adequate approach than OAuth for delegating credentials management to a third-party provider, but it’s harder to implement, and I haven’t found a single source discussing how it may be used as a method for REST authentication.
OAuth is probably the biggest source of confusion: there are two widely deployed versions, with a lot of debate behind them, and several workflows to handle different scenarios. What’s more, OAuth is an authorization standard that in some cases may be bent into doing authentication.
The most common use case of OAuth is a user authorizing a consumer application to access his data on a third-party application (e.g. Facebook) without giving away his credentials. This authorization scheme can be used as a form of delegated authentication: if the consumer is granted access to the user’s data, then the identity of the user is proven. While this works, it has some pitfalls: first, it assumes that having access to user data equals being the user, which isn’t necessarily true (it is not enforced by the protocol); but more importantly, it gives the consumer application access to data that shouldn’t be required for authentication (e.g. photos, contacts). That’s why this is referred to as pseudo-authentication. It’s worth noting that OpenID Connect is being developed as a complement to OAuth to solve this problem.
There are cases where you want to handle credentials yourself, so you don’t need the third-party provider in the workflow. Some articles suggest using OAuth1 2-legged auth or the OAuth2 Client Credentials grant, but I’ve found that both of them solve only the authorization part, providing an access token to include in the requests, while leaving authentication (how you establish identity when requesting that token) to be handled by some other method. Thus, they’re not of much use for the problem at hand.
The OAuth2 Resource Owner Password Credentials flow does solve authentication when you are in control of the credentials. It exchanges an initial request carrying user and password for a token that can be used to authenticate (and authorize) subsequent requests. This is an alternative to Basic Auth, slightly better in the sense that you include credentials only on the first call (thus you don’t need to store them in the client). It’s also a standard with a simple implementation, and it avoids the browser interaction problem of the standard HTTP methods, making it the better choice in this scenario.
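A toy sketch of that exchange may help; the function names and in-memory stores below are made up for illustration, and a real implementation would hash passwords and expire tokens.

```python
import secrets

# Credentials are exchanged once for a bearer token; later requests
# carry only the token.
USERS = {"alice": "s3cret"}   # in practice, store a password hash
TOKENS = {}                   # access token -> username

def token_endpoint(username, password):
    # Initial request: exchange user and password for an access token.
    if USERS.get(username) != password:
        return {"error": "invalid_grant"}
    token = secrets.token_urlsafe(32)
    TOKENS[token] = username
    return {"access_token": token, "token_type": "bearer"}

def authenticate(headers):
    # Subsequent requests carry "Authorization: Bearer <token>".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    return TOKENS.get(token) if scheme == "Bearer" else None

grant = token_endpoint("alice", "s3cret")
user = authenticate({"Authorization": "Bearer " + grant["access_token"]})
```

The point of the flow is visible in the last two lines: the password appears exactly once, at the token endpoint; everything after that uses the opaque token.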
Meteor.js recently introduced the Secure Remote Password protocol as a way to handle authentication in web applications. It’s hailed as the one method that guarantees security without HTTPS, but SRP itself only provides a way to log a user in without his credentials ever reaching the application server. Upon user registration, a verifier is stored instead of the password; for authentication, the user sends some parameters derived from the password that can be checked against that verifier. The credentials indeed are never sent and can’t be guessed from the parameters, but you still need a secure channel when registering the verifier. An attacker that gets hold of the verifier can recover passwords with a dictionary attack; an interesting case of this is the attack on Blizzard’s servers in 2012.
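The verifier idea and the dictionary attack can be sketched as follows. This is a deliberate oversimplification, not the real SRP protocol (which involves ephemeral values and modular arithmetic); it only shows why a stolen verifier is still attackable.

```python
import hashlib
import os

def make_verifier(password, salt):
    # The server stores only this derived value, never the password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)

salt = os.urandom(16)
verifier = make_verifier("hunter2", salt)   # stored at registration time

# An attacker holding salt and verifier can hash candidate passwords
# until one matches -- a classic dictionary attack.
dictionary = ["password", "123456", "hunter2"]
cracked = next((p for p in dictionary if make_verifier(p, salt) == verifier), None)
```

The derivation is one-way, so the verifier can't be inverted directly; but nothing stops an offline guessing loop, which is why the registration step still needs a secure channel and strong passwords.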
Avoiding password management is generally a good idea for application developers. With that in mind, I’d start by looking at delegated and federated authentication for securing my RESTful APIs. OAuth is formally less appropriate, but simpler and more widely used than OpenID, which some declare to be dead, so it looks like the safer bet. If you want to handle credentials yourself, OAuth2’s Resource Owner Password Credentials flow is probably the best choice.
I finally took the time to start fiddling with Node.js, and as I expected from such a young and dynamic technology, I ran into some gotchas and configuration headaches, so I’ll put down some notes here that might be helpful for other people getting started with Node.
As I moved along the tutorial, I ran into the first problem: it uses the formidable node module to handle the file uploads, which is not compatible with the most recent versions of Node (the current one is 0.10.5). Looking into this I found out a couple of interesting facts:
So I needed to install version 0.8.something.
At this point I started to feel uncomfortable messing around with different versions in a global Node installation. Nor did I like having to sudo every time I needed to install a new module. There is some misleading advice around the web suggesting a chown on your /usr/local folder as a way to avoid this, which didn’t look all that good. Coming from Python and virtualenv, I like to handle my installations locally. This is the simplest way to do it that I’ve found.
There are several modules that allow handling multiple Node versions, the most popular being nvm and n. I found n was difficult to configure to work with the local installation, so I switched to nvm instead. The code needed to install it and switch to 0.8 was something like:
wget -qO- https://raw.github.com/creationix/nvm/master/install.sh | sh
echo '[[ -s ~/.nvm/nvm.sh ]] && . ~/.nvm/nvm.sh' >> .bashrc
nvm install 0.8
nvm alias default 0.8
For about three years, I’ve been programming (professionally) almost exclusively with Django. It let me work fast as a solo programmer, faster than most other programmers in my country (who are still doing mostly Java and PHP), and gave me the freedom to pick only the jobs I was interested in.
Things are starting to change in the web industry, though. And I’m not talking about some hyped technology that’s supposed to be the future of web programming, but about what the standard user expects of today’s web applications. Most programmers will know what I’m talking about: as client programming gets more and more complex, it’s getting harder (not to say impossible) to stay on the DRY side of things. This situation is very well explained in this article. It’s time to start looking for alternatives to LAMP and its variants.
Recently there was a series of posts in Hacker News discussing how cloudy Python’s future was (here, here and here). I don’t think Python is going away in the near future. I personally consider it the best general purpose programming language, and my weapon of choice in most cases; it’s probably what I’ll be comparing against every other language I try in the next couple of years. That being said, it’s clear that Python (using Django or some other framework) is not the best tool for some of the hottest jobs in the market, complex web applications being one of them. As a side note, it is interesting that the case of Python being “too slow” or “too CPU intensive”, which I always disregarded (and still do for the most part) for not being a bottleneck for most applications, has finally found a raison d’être in the battery consumption rate of mobile devices.
I don’t usually deal with SQL. Ever since I moved to Django, I’ve never had to write a line of SQL in my code, and only very rarely have I had to look at the database for insights on my application. This is a great thing, I think; as long as you don’t have performance issues developing a web application, it’s best to worry about writing readable rather than fast code. But once pages start to take a little too long to load (and a couple of hours usually is a little too long), it’s time to get your hands dirty. As I said, I don’t usually deal with this kind of problem, so I had to google my way out of it. I explain here how I diagnosed and solved the bottlenecks in my application; the methods used are probably far from ideal, but they might be of use to programmers as inexperienced in such tasks as I am.
My problem in this case wasn’t intensive database access, but simply tables too big to query carelessly; the database contents are updated by a batch process once a week (the application deals mostly with searching and doesn’t modify the data). The database used is MySQL. Some background reading that I found useful was this for Django optimization and this for MySQL optimization.
The first thing to do (after acknowledging that I had a performance problem) was to detect the bottlenecks in my program. To accomplish this, django debug toolbar proved to be of great value; upon page load, this app will tell you what queries were executed, where in the code, the time each one took (highlighting the slow ones) and even the SQL EXPLAIN output, to study how to optimize it. The toolbar really cracked the problem for me. When a query was so slow that waiting for the page to load wasn’t convenient, I used the query attribute of the QuerySet API. I addressed the slow queries in turn, using different solutions in each case:
Django lookups. This is the first place to look; sometimes a django query can be easily rewritten to be more efficient. The most common case is probably retrieving a full model where only one or two fields are being used; instead those values can be retrieved using values_list.
Indexes. Here is some documentation on how to optimize through indexes, and here is the syntax to do it. This was of less use than I initially thought when I started reading on the subject. This may be because I didn’t take the time to fully understand the output of the EXPLAIN statement (most of my queries didn’t show signs of possible optimization through indexes anyway). But one place I found indexes useful was for text searches; to take advantage of these indexes in Django, one must use the search query lookup and create an index for the field being searched. For example, to search books by title, one would use code like:
Book.objects.filter(title__search='some title')
And create an index as:
CREATE FULLTEXT INDEX title_index ON app_book(title);
MySQL configuration tuning. This is probably the hardest task without previous experience. Here are some pointers; the most useful tool I found for tweaking MySQL is the MySQLTuner script, which diagnoses your system and suggests configuration improvements.
Results caching. Frequent queries (or subqueries) that hold a small enough result set are ideal candidates for caching, using the low-level cache API. I used a custom admin command to refresh the cache contents periodically.
Summary tables. I had several huge tables from which I only needed a small subset of records (and of fields per record), and which were impossible to query as is. I created a script to build summary tables with the records I needed and use those in Django instead. I run the script after the database is updated. Here is a quick introduction to writing mysql-python scripts.
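A sketch of the idea, with the standard library's sqlite3 standing in for MySQL and invented table and field names: after each batch update, copy only the needed records and fields from the huge table into a small summary table, and point the queries at that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A stand-in for the huge source table, with fields we never query.
conn.executescript("""
    CREATE TABLE app_book (id INTEGER PRIMARY KEY, title TEXT,
                           year INTEGER, big_unused_field TEXT);
    INSERT INTO app_book VALUES
        (1, 'Dune', 1965, '...'),
        (2, 'Neuromancer', 1984, '...'),
        (3, 'Frankenstein', 1818, '...');
""")

# Rebuild the summary table with only the records and fields we need.
conn.executescript("""
    DROP TABLE IF EXISTS app_book_summary;
    CREATE TABLE app_book_summary AS
        SELECT id, title FROM app_book WHERE year > 1900;
""")

rows = conn.execute("SELECT title FROM app_book_summary ORDER BY id").fetchall()
```

The application's models are then defined against app_book_summary, so the ORM code never touches the huge table at all.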
It’s interesting to note that even though I had to write some mysql-python scripts and rewrite some of my Django queries, my code remained SQL-agnostic all along. This reinforced the notion that if you are resorting to raw SQL in your code, you probably need to take a deeper look at Django’s ORM.
Design patterns are probably one of the most misused and misunderstood programming techniques out there, and I argue this is because they’re studied out of the context in which they were conceived, or worse, because it is assumed that this context never changes. This ultimately causes a particular set of patterns to be regarded as a general-purpose tool whose use is good programming practice.
This is not a critique of the Gang of Four book, which I believe is one of the best books on object-oriented software design. The authors were very aware of the consequences of misusing the techniques they were introducing, and said as much in its first pages.
The context of a design pattern
A design pattern is a general reusable solution to a commonly occurring problem within a given context in software design; a set of descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context. Much of the misuse of design patterns comes from ignoring the context part.
If you change the problem domain, the programming paradigm or the language applied, then the set of appropriate patterns also changes. Indeed, since patterns are reusable, proven solutions to common programming problems, if a problem goes away by changing one of those variables, then the pattern is no more. Similarly, as some problems disappear, new ones arise when the context is changed. A detailed study of this argument is made by Peter Norvig in his slides Design Patterns in Dynamic Languages, where he reaches the conclusion that most of the GoF patterns are simpler or invisible in Lisp or Dylan.
Some argue that while the implementation gets simpler in some languages, the patterns are still there, so they remain useful in the sense that they provide a vocabulary for programmers to describe a design. This might be true in some cases, but not in most. Usually, the higher the level of the language’s features, the higher the design abstractions, and the smaller the need to refer to patterns that are foreign to the domain of those abstractions.
For example, in a statically typed language such as Java one might need to introduce interfaces or class hierarchies to achieve polymorphism, whereas in a dynamically typed one everything is polymorphic; the feature is taken for granted by programmers, and since it’s built in, it’s no longer needed to describe the design. Another example, more related to what’s commonly accepted as a design pattern, is iterators. In C++, when not using the STL, one might find the need to introduce iterators as part of the program design. In Java, where every standard data structure comes with its own iterator, one might use them a lot, but they stop being a part of the design worth discussing. Lastly, in Python, where the iterator implementation is completely hidden by the control and data structures, one might never even see an iterator-related piece of code.
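The Python end of that progression fits in a few lines: the iterator protocol exists (iter/next), but everyday code never shows it, let alone treats it as a design element.

```python
# A plain list; no iterator class anywhere in sight.
squares = [n * n for n in range(5)]

# Explicit protocol, roughly what the for statement expands to:
it = iter(squares)
first = next(it)

# Implicit: the idiomatic form hides the iterator entirely.
total = 0
for s in squares:
    total += s
```

The same traversal that would be a named Iterator participant in a C++ design is here an invisible implementation detail of the for loop.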
Choosing a programming language
This is part of a broader problem, but it is directly related to the one I’m addressing here. Commonly in the software industry (and unfortunately in the academic community too), the people in charge of design and architectural decisions on a software project don’t consider the language as a parameter of the solution. Sometimes they just pick the one they’re most comfortable with, but often this decision is made by management based on non-technical factors, such as programmer availability. Paul Graham puts it very clearly in one of his essays:
The pointy-haired boss miraculously combines two qualities that are common by themselves, but rarely seen together: (a) he knows nothing whatsoever about technology, and (b) he has very strong opinions about it.
Suppose, for example, you need to write a piece of software. The pointy-haired boss has no idea how this software has to work, and can’t tell one programming language from another, and yet he knows what language you should write it in. Exactly. He thinks you should write it in Java.
Why does he think this? Let’s take a look inside the brain of the pointy-haired boss. What he’s thinking is something like this. Java is a standard. I know it must be, because I read about it in the press all the time. Since it is a standard, I won’t get in trouble for using it. And that also means there will always be lots of Java programmers, so if the programmers working for me now quit, as programmers working for me mysteriously always do, I can easily replace them.
Well, this doesn’t sound that unreasonable. But it’s all based on one unspoken assumption, and that assumption turns out to be false. The pointy-haired boss believes that all programming languages are pretty much equivalent.
I won’t get into how viable it is for the employee to disobey the manager’s bad decisions or how valid the manager’s point of view is. But as a programmer I too think that you can’t let the suits make technical decisions for you.
When professors are closer to the industry than to science, this too has an effect on their approach to teaching. They will tend to select as teaching languages those more widely used and (at least where I study) impose that choice, thus creating a vicious cycle in which the student never develops the need to compare languages and choose the one better suited for each job.
Be it by choice or by obligation, the lack of the habit of choosing a programming language for each project causes the context to be regarded as invariable. If the context is assumed static, then the patterns for that context are the only patterns ever applied (ever taught, for that matter), and so they end up looking universal or more relevant than others. The fact that most GoF patterns are never needed when programming in languages such as Python is seen by the naive as a shortcoming of the language, rather than the opposite.
Patterns smell funny
Some programmers (especially those recently introduced to the technique) think that the more patterns it has, the better a design is. A less extreme position is that, given two ways of solving a problem, the one that uses a known pattern is better. This, I suppose, is based on the aforementioned idea that patterns introduce a common vocabulary that lets programmers speak at a higher level, and also on the widespread belief that they are some sort of advanced programming technique.
On the contrary, I think that a design that introduces many patterns smells funny, to use the parlance of Refactoring (and Frank Zappa). Fowler and Beck argue that certain structures in code suggest the possibility of refactoring, and call those bad smells; they don’t necessarily imply that the code is wrong, but it probably is.
On the one hand, a pattern introduces foreign abstractions into the problem domain, increasing the design’s complexity. On the other, while it often makes the design more flexible in one specific direction, it does so by sacrificing flexibility in the others. This tends to be a recipe for disaster when programmers believe they can predict how a system will evolve in the long run. Graham holds a similar position:
When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I’m using abstractions that aren’t powerful enough– often that I’m generating by hand the expansions of some macro that I need to write.
He is, of course, speaking from the standpoint of a Lisp programmer. Indeed, if we accept the idea that patterns exist to overcome the lack of certain language features, most patterns are presumably needed because of the absence of macros. Still, there are many contexts in which using Lisp might not be the best technical decision, so we should apply a more general principle.
First, we should abandon the idea that patterns improve the design of software, and use them only when they are really needed, that is, when there’s no simple solution to a problem with the current abstractions of our domain. Second, we should be wary of a proliferation of patterns, since it might be a sign that the language (or architecture) we are using is not the right tool for the job. And we should be prepared to leave the tool when necessary.
 Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of Reusable Object-Oriented software.
 Design patterns should not be applied indiscriminately. Often they achieve flexibility and variability by introducing additional levels of indirection, and that can complicate a design and/or cost you some performance. A design pattern should only be applied when the flexibility it affords is actually needed.
 Software design pattern.
 Despite the book’s size, the design patterns in it capture only a fraction of what an expert might know. It doesn’t have any patterns dealing with concurrency or distributed programming or real-time programming. It doesn’t have any application domain-specific patterns. It doesn’t tell you how to build user interfaces, how to write device drivers, or how to use an object-oriented database. Each of these areas has its own patterns, and it would be worthwhile for someone to catalog those too.
The choice of programming language is important because it influences one’s point of view. Our patterns assume Smalltalk/C++-level language features, and that choice determines what can and cannot be implemented easily. If we assumed procedural languages, we might have included design patterns called “Inheritance,” “Encapsulation,” and “Polymorphism.” Similarly, some of our patterns are supported directly by the less common object-oriented languages. CLOS has multi-methods, for example, which lessen the need for a pattern such as Visitor. In fact, there are enough differences between Smalltalk and C++ to mean that some patterns can be expressed more easily in one language than the other.
 Peter Norvig, Design Patterns in Dynamic Languages.
 Paul Graham, Revenge of the Nerds.
 Martin Fowler, Refactoring: Improving the Design of Existing Code.