One of the pitfalls of being a polyglot is that it is easy to draw parallels between technologies too quickly, and incorrectly. In this post, we shall compare Java servlet filters, WSGI middleware and Django middleware.

The J2EE story

The notion of a servlet is well understood in Java: it is a piece of code that handles an incoming request and produces a response. The most popular kind of servlet is the HttpServlet. Filters are powerful tools that wrap a servlet to provide “pre” or “post” functionality, i.e. to do something just before the servlet is invoked or right after it returns. A few common examples of operations that are usually configured as filters are as follows:

  • Authentication
  • Output compression
  • Logging

And here is an example of the structure of a filter implementation:

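import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public final class RealFilter implements Filter {
    // Servlet API versions before 4.0 also require init() and destroy()
    // to be implemented; they are omitted here for brevity.
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {

       // Do the "pre" work

       // Now let the real implementation take over
       chain.doFilter(request, response);

       // Do the "post" work
    }
}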

Things get interesting when multiple filters are used. A programmer is allowed to stack any number of filters on top of a servlet. The “pre” parts of the filters are executed in order before control is transferred to the servlet, and the “post” parts are executed in reverse order. So if filters f1, f2 and f3 are configured in that sequence along with a servlet s1, here is what really happens internally on each request:

f1.doFilter {
    // Do the "pre" work of f1

    f2.doFilter {
        // Do the "pre" work of f2

        f3.doFilter {
            // Do the "pre" work of f3

            s1.service()

            // Do the "post" work of f3
        }

        // Do the "post" work of f2
    }

    // Do the "post" work of f1
}

As you might have guessed, the invocation of chain.doFilter() from within a filter is crucial for this behaviour. It transfers control either to the next filter or, if there are no more filters to process, to the servlet itself. This paradigm has the following implications:

  1. It is possible to short-circuit the inbound path. If a filter does not call chain.doFilter(), control never reaches the servlet or any of the filters further down the chain.
  2. It is not possible to short-circuit on the way out, unless an exception is thrown and your filter does not catch it.
  3. The call stack gets deeper as you add filters. Examining the call stack from within the servlet will actually reveal all the filters.

The actual servers (web containers, app servers, etc.) are permitted to introduce their own filters and add them implicitly. This, however, is not of interest to this discussion.

WSGI middleware

The Python world has a lot of frameworks for developing web applications, but for a long time it had no standard cross-framework, cross-server API. Though WSGI (PEP 333) has existed for a long time, it did not gain traction until recently. This twist in the evolution means that most application developers still code against their frameworks, and WSGI's utility has been reduced to acting as a specification that framework developers and Python server implementors code against. Regardless, WSGI supports the concept of middleware, which operates in much the same fashion as a Java filter: a middleware wraps an application and can do work before and after calling it.
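To make the parallel with filters concrete, here is a minimal sketch of two stacked WSGI middlewares. The names TraceMiddleware, application and run are made up for this illustration, and the environ passed in is far smaller than what a real server would provide.

```python
def application(environ, start_response):
    """A tiny WSGI application that plays the role of the servlet."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


class TraceMiddleware:
    """Hypothetical middleware that records "pre" and "post" steps."""

    def __init__(self, app, name, trace):
        self.app = app
        self.name = name
        self.trace = trace

    def __call__(self, environ, start_response):
        self.trace.append("pre " + self.name)
        # Rough equivalent of chain.doFilter(): hand over to the wrapped app
        result = self.app(environ, start_response)
        self.trace.append("post " + self.name)
        return result


def run(app):
    """Call a WSGI app with a minimal environ and return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


trace = []
# f1 is the outermost middleware, so it is applied last
stacked = TraceMiddleware(TraceMiddleware(application, "f2", trace), "f1", trace)
status, body = run(stacked)
print(trace)
```

One caveat: a WSGI application returns an iterable, so the “post” step in this sketch runs when the wrapped application returns, which may be before the body has been fully generated.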

Django middleware

Django, a popular web development framework, also has the notion of middleware. It happens to be one of those frameworks that got on the WSGI bandwagon quite late in the game (it had rudimentary support for quite some time, but high-quality support came in after 2009). Interestingly, Django's middleware is quite different from anything we have seen so far. To understand the difference, let us first look at the structure of a middleware:

class MyMiddleware(object):
    def process_request(self, request):
        ...

    def process_view(self, request, view_func, view_args, view_kwargs):
        ...

    def process_response(self, request, response):
        ...

    def process_exception(self, request, exception):
        ...

The biggest difference is that Django middlewares are iterative and not stacked. The split interface for request and response is something of a giveaway of what happens internally. Here is an oversimplified version of how middlewares are executed in the Django world:

for m in middlewares:
    m.process_request(request)

response = invoke_the_view(request)

for m in reversed(middlewares):
    response = m.process_response(request, response)

Note that the rules for short-circuiting are much more complex than in the Java model, and one must read the entire documentation to understand them. So what are the implications of using an iterative model as opposed to a stacked model?

  1. The depth of the call stack is fixed regardless of the number of middlewares used.
  2. Short-circuiting is possible both on the inbound and outbound paths.
  3. The stack frame (local variables) cannot be used to maintain context across the “pre” and “post” phases of a single request.
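The iterative model and its inbound short-circuit can be sketched in a few lines. This is a simplified illustration and not Django's actual handler: handle and Recorder are hypothetical names, and the sketch assumes that a non-None return value from process_request skips the remaining request hooks and the view, while every process_response hook still runs.

```python
def handle(request, middlewares, view):
    """Hypothetical dispatcher illustrating the iterative model."""
    response = None
    for m in middlewares:
        response = m.process_request(request)
        if response is not None:
            break  # short-circuit: skip later middlewares and the view
    if response is None:
        response = view(request)
    # Each hook is called from this one frame, so the stack never nests
    for m in reversed(middlewares):
        response = m.process_response(request, response)
    return response


class Recorder:
    """Hypothetical middleware that logs which hooks were called."""

    def __init__(self, name, log, block=False):
        self.name = name
        self.log = log
        self.block = block

    def process_request(self, request):
        self.log.append("request " + self.name)
        return "blocked by " + self.name if self.block else None

    def process_response(self, request, response):
        self.log.append("response " + self.name)
        return response


log = []
middlewares = [
    Recorder("m1", log),
    Recorder("m2", log, block=True),
    Recorder("m3", log),
]
result = handle({}, middlewares, lambda request: "view output")
print(result)
print(log)
```

Note that m3 never sees the request yet still gets its process_response hook called, which is exactly the situation where “pre” and “post” work can no longer be assumed to come in pairs.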

 

What got me writing this post was implication #3. A common instrumentation practice is to have a filter that measures the time taken by a request. The obvious way to implement it is with the following steps:

  1. Start a timer
  2. Do the actual work
  3. Stop the timer
  4. Measure the difference
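In a stacked model these four steps map directly onto a local variable. Here is a minimal sketch written as a WSGI middleware; TimingMiddleware, slow_app and the report callback are hypothetical names chosen for this example.

```python
import time


class TimingMiddleware:
    """Hypothetical WSGI middleware that times the wrapped application."""

    def __init__(self, app, report):
        self.app = app
        self.report = report  # callback that receives the elapsed seconds

    def __call__(self, environ, start_response):
        started = time.perf_counter()  # 1. start a timer
        try:
            return self.app(environ, start_response)  # 2. do the actual work
        finally:
            # 3 and 4. stop the timer and measure the difference
            self.report(time.perf_counter() - started)


def slow_app(environ, start_response):
    time.sleep(0.01)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done"]


timings = []
timed = TimingMiddleware(slow_app, timings.append)
body = timed({}, lambda status, headers, exc_info=None: None)
print(timings)
```

The start time lives in the stack frame of the call, so nothing has to be attached to the request. The try/finally also means the measurement is reported even when the wrapped application raises.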

Something as straightforward as this gets convoluted in the Django way, since there is no place available to store the context (short of polluting the request object). The added flexibility of being able to short-circuit even in the “post” phase also means that the instrumentation path could be skipped altogether. Overall, it definitely feels more complex than it should be.
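For comparison, here is a rough sketch of the same timer using the split process_request/process_response interface. The _timing_started attribute and the samples list are assumptions made for this illustration, and a SimpleNamespace stands in for a real request object.

```python
import time
from types import SimpleNamespace


class TimingMiddleware:
    """Hypothetical timer written against the split request/response hooks."""

    def __init__(self):
        self.samples = []  # elapsed seconds, one entry per timed request

    def process_request(self, request):
        # No shared stack frame, so the start time rides on the request
        request._timing_started = time.perf_counter()
        return None

    def process_response(self, request, response):
        started = getattr(request, "_timing_started", None)
        # The attribute is missing if process_request never ran for this request
        if started is not None:
            self.samples.append(time.perf_counter() - started)
        return response


mw = TimingMiddleware()

# Normal path: both hooks run for the same request
request = SimpleNamespace()
mw.process_request(request)
response = mw.process_response(request, "response")

# Short-circuited path: only the response hook runs
skipped = mw.process_response(SimpleNamespace(), "early response")
print(len(mw.samples))
```

The second call shows why the guard is needed: when the request hook was skipped, there is nothing to measure against and the sample is silently lost.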

tl;dr

If you are writing application-agnostic middleware for your Django application, then you are probably better off just writing a WSGI middleware. Resort to Django middleware only if it has something to do with application behaviour.