An if-substring-in-string Django Template Construction

Here's a quick tip for Django template hackers. It's a known fact of Django templates that the syntax is purposefully limited. I've been living with the need for an if-substring-in-string construction. Of course, I could write a custom template tag, but work is quite busy. So on a whim and a 10 minute break I tried this yesterday, and it worked well for me. Take a look and let me know what you think.

First, the problem.

Bookmark Formatting

I have a custom app for pulling in my Delicious bookmarks and formatting them as link posts here on this site. I include the HTML I want in the original bookmark and have a bit of template code to insert preview images generated by ShrinkTheWeb. When I link a YouTube video, I do a quick cut-n-paste of the video to embed the player, and in these cases, I don't want to include the preview image. In Python, this would be super simple:


if 'http://www.youtube.com/' not in link_url:
    show_thumb()

However, Django templates don't have a similar syntax.

A Solution Using {% with %}

So here's what I did:


{% with link_url|slice:":23" as short_url %}
  {% ifnotequal short_url "http://www.youtube.com/" %}
    My ShrinkTheWeb image goes here.
  {% endifnotequal %}
    Other HTML common to all links goes here.
{% endwith %}

What do others think? A decent solution? There are some issues. It only matches the start of the URL, not a substring anywhere in it, and it gets awkward if I want to handle multiple video sites. At that point, I would be better off using link categories or a custom tag. But for a 10-minute fix, I think it's quite nice.
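For the multiple-video-sites case, the check is easier to express in Python than in template syntax. The following is a minimal sketch, not code from this site: the function name, the prefix tuple, and the Vimeo entry are all assumptions for illustration. A function like this could be wrapped in a custom template filter or tag.

```python
# Hypothetical list of video sites whose links should skip the preview image.
# Only the YouTube prefix comes from the post; the Vimeo one is illustrative.
VIDEO_SITE_PREFIXES = (
    "http://www.youtube.com/",
    "http://vimeo.com/",
)


def is_video_link(link_url, prefixes=VIDEO_SITE_PREFIXES):
    """Return True if link_url starts with any known video-site prefix."""
    # str.startswith accepts a tuple, so one call covers every site.
    return link_url.startswith(tuple(prefixes))
```

The template would then only need a single true/false test, however many sites are in the list.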

Link | Posted by deryck on July 3, 2009 | 4 comments

A Django Auth Backend for Second Life

I've started hacking away at a personal project of mine around Second Life. More on that in the days to come, but I did want to share some code created last night while playing around with Second Life logins. I've worked up a Django authentication backend for authenticating users on a Django-based site against Second Life's login process. I've created a Google code project for the code, so cleverly named slauth.

This is code of the "release early, release often" variety. There are no docs, no tests, not even a README. I just wanted to get this up while I had 5 minutes today. I welcome feedback, and I'm certain I will be working on this as the larger project evolves. I'm not even certain I'll use this in the final project. I feel uncomfortable taking a username and password for another "site," but without a proper login API for site-to-site authentication, this seems to be the only viable route. This uses the same XMLRPC auth process as the Second Life viewer code, which seemed to legitimize it a little for me (since this is how third-party viewers have to authenticate). It's certainly better than page scraping the response of the Second Life web site's login form. If there were a way to register your application with the login process, I would be totally cool with this. Then, the user could verify a site as being legitimate -- or at least more legitimate than any joe running this auth backend. ;)

Having said all that, it's pretty easy to authenticate via this package. Just make sure the module lives on your PythonPath, and then add slauth.backends.SLAuthBackend to your Django AUTHENTICATION_BACKENDS setting. You'll even be able to log in through the Django admin with your full SL username ("Anders Falworth" in my case, as an example). Of course, you won't get into the admin until you make the SL account "staff". (This app creates a stub Django user account for each successful SL login. You can then check is_staff on that account, log in again with the SL account, and you'll get into the Django admin site.)
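As a rough illustration, the settings entry might look like the sketch below. Keeping Django's stock ModelBackend in the list is my assumption here, made so that regular Django accounts keep working. Only the slauth path comes from the package itself.

```python
# settings.py (sketch). Django tries each backend in order until one
# returns a user.
AUTHENTICATION_BACKENDS = (
    'slauth.backends.SLAuthBackend',
    # Assumption: keep the default backend so ordinary Django users can log in.
    'django.contrib.auth.backends.ModelBackend',
)
```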

This is the main class that does all the work:


from django.contrib.auth.models import User

from slauth.utils import valid_sl_login, get_or_create_sl_user

class SLAuthBackend:
    """
    A Second Life authentication backend for Django-based sites.
    """
    def authenticate(self, **kwargs):
        """
        Use kwargs to make the authenticate method more flexible.

        Django's admin app assumes username/password logins, so
        allow first and last in one username.  For example,
        username could be 'Bob Smith' and this method will split
        that apart into the first and last names SL login expects.

        So either of the following would work:

            >>> from django.contrib.auth import authenticate
            >>> authenticate(first_name='Bob', last_name='Smith', 
                                               password='foo')
            >>> authenticate(username='Bob Smith', password='foo')
        """
        first_name = kwargs.get('first_name', '')
        last_name = kwargs.get('last_name', '')
        password = kwargs.get('password', '')

        username = kwargs.get('username', '')
        if ' ' in username:
            # Split on the first space only, so a username with more
            # than one space doesn't raise ValueError on unpacking.
            first_name, last_name = username.split(' ', 1)

        authenticated = valid_sl_login(first_name, last_name, password)

        if authenticated:
            user = get_or_create_sl_user(first_name, last_name)
            return user
        return None

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None

You could certainly use parts of this without using Django, even though it's written with Django in mind. There is a utils module with sl_login, which returns the response from a login attempt, and valid_sl_login, which returns True or False depending on whether the login attempt succeeded.

Please have at the code at its Google project home if you need or want to play with Second Life logins via Django or Python. Comments, suggestions, and of course, contributions are always welcome. This code is released under the GNU GPL v2.

Link | Posted by deryck on March 3, 2008 | 0 comments

Facebook/Washington Post, Performance Tuning

This final post about my group's work (at Washington Post.Newsweek Interactive) on our Facebook Platform app The Compass is long overdue. But now the time has come! Let's talk Postgresql and Apache performance.

In the first two posts on this subject, I wrote about the Facebook Platform itself and the Compass' architecture. In this post, we'll look at some of the challenges we encountered while serving the app and areas we focused on to improve our Postgresql and Apache performance.

NOTE: All of this is anecdotal, based on my experience with this app. I'm no performance guru and don't hold myself up as such. I think, too, that different applications have different needs, and what a Facebook app requires might not be optimal for other situations.

Caching Limitations

As I mentioned last time, all of the FBML we load into a profile is cached and served by Facebook, but the hits to our application pages are hits to our servers as well. The first thing that comes to mind with Django is, "well, make sure you have caching enabled." There are a couple of reasons why this doesn't work as well as one would like.

First, the caching for a Django site is bypassed when the request contains GET or POST data. Every request from Facebook contains POST data. Each callback request has a few fb_sig* parameters that are POSTed to your page to verify the request comes from Facebook. This is great for security and passing data from Facebook back to your application, but it kills the normal caching process for Django-based sites.

Second, each request can potentially be unique. In our case, the only Facebook canvas pages we serve are the one that submits the compass survey questions and the one to display the Flash map of your friends who have installed the compass. It's hard to do much low-level caching of Django querysets because you don't want to inadvertently give the user someone else's data. We do a little of this, though. See, for example, what we do here when we display the compass based on your last answer:


cache_key = 'compass_entries_%s' % facebook.user
compass_entries = cache.get(cache_key)
if not compass_entries:
    compass_entries = Compass.objects.filter(user__exact=facebook.user).order_by('-id')[:10]
    cache.set(cache_key, compass_entries, 60 * 15)

We also reset these entries in the cache when a user resubmits the survey. So we save a few DB hits if the same user retakes the survey a few times back to back. However, there's just not much in common across users to really take advantage of Django's cache. We're pretty well left to raw DB performance.
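The get-or-load and reset-on-resubmit pattern can be sketched without Django at all. Everything below is illustrative: DictCache is a stand-in I made up that mimics only the get/set calls used above, and get_entries, reset_entries, and load_entries are hypothetical names. One small difference from the snippet above is the `is None` check, which avoids re-querying when a user legitimately has no entries.

```python
class DictCache:
    """In-memory stand-in for a cache backend (timeouts are ignored)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def cache_key_for(user):
    # Per-user key, so one user never sees another user's entries.
    return 'compass_entries_%s' % user


def get_entries(cache, user, load_entries):
    """Return cached entries for user, loading them only on a miss."""
    key = cache_key_for(user)
    entries = cache.get(key)
    if entries is None:
        entries = load_entries(user)
        cache.set(key, entries, 60 * 15)
    return entries


def reset_entries(cache, user, entries):
    """Overwrite the cached entries after the user resubmits the survey."""
    cache.set(cache_key_for(user), entries, 60 * 15)
```

Because the key includes the user, the saving only shows up when the same user comes back within the timeout, which is the limitation described above.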

Bypass the ORM

One of the first things we did to help performance was to bypass Django's ORM. We store the user's answer to each question via a save method on the form that is submitted. Using the ORM this would look something like:


from politicompass.models import Compass

def save(self, uid):
    for i in range(1, 11):
        # Form fields are named q1..q10; look up the cleaned answer
        # for question i rather than storing the field name itself.
        answer = self.clean_data.get('q%s' % i)
        compass = Compass(user=uid, question_id=i, answer=answer)
        compass.save()

We refactored this before launch to bypass the ORM and execute the INSERTs in one connection:


from django.db import connection

def save(self, uid):
    sql = ""
    params = []
    for i in range(1, 11):
        # Look up the cleaned answer for question i (fields q1..q10).
        answer = self.clean_data.get('q%s' % i)
        # Keep the %s placeholders in the SQL and pass the values
        # separately, so the database driver handles the quoting.
        sql += "INSERT INTO facebook_compasses (user, question_id, answer) VALUES (%s, %s, %s);"
        params.extend([uid, i, answer])

    # Send all ten INSERTs in a single call, then commit once.
    cursor = connection.cursor()
    cursor.execute(sql, params)
    connection._commit()

There were other performance-conscious moves we made along these lines, and still, once the app started to grow in popularity, we had users submitting that form in such numbers that our DB server load stayed at a freakishly high level. (NOTE: Prior to Facebook, we normally ran at about a .20-.35 load. Once the Facebook app launched, our load jumped up into the 3.00-4.30 range depending on site activity.)

Tuning Postgresql

I had already tuned Postgresql once for some spikes we had encountered when some of our apps were linked up by MSN and MSNBC. These tunings included raising the max_connections limit and bumping up the amounts for the following settings:


shared_buffers
work_mem
maintenance_work_mem
max_stack_depth

The most significant of these for us was shared_buffers. With the hits we had received from MSN and MSNBC, raising shared_buffers to about 1.6 GB (we have 8 GB on the box) and increasing max_connections was enough to keep us humming along nicely. With the Facebook traffic we had to increase shared_buffers to about half the available RAM on the box and everything dropped back to a sane level. We are running on Solaris, so we had to increase the amount of shared memory available from the kernel in order to give so much RAM to shared_buffers, but again, once this happened, the load recovered amazingly well.

Hits Under Facebook

Just to toss out some raw numbers, when we first loaded our app to Facebook, we were doing about 5-10 hits a second during peak usage. We ended up doing about 2.5 million hits the first week, just from Facebook alone. We run 4 other sites off the same server. This is a single Postgresql server. We do have our two web servers behind a load balancer, and our static media is served from the normal media.washingtonpost.com setup. Needless to say, there are certainly higher numbers that other sites boast, but the single DB, with some tuning and planning, survived the spike pretty well.

Currently, we're doing about 10 million hits a month from this setup, and we're really at its limits now. To do much more, we'll have to look at replicating the database. Having said that, were it not for the Facebook traffic and a similar Newsweek app bypassing the cache for reasons outlined above, I think we could easily do twice the traffic on the same setup. Caching really saves on DB load, so use it wherever you can.

Apache Tuning

Luckily, we never felt the Facebook traffic from an Apache standpoint. I will point out, for the sake of LAMP stack completeness, that the best trick I learned for Apache is to set MaxRequestsPerChild to something in the range of 500. This keeps Apache memory size down while also serving a decent amount of traffic per process. And if you don't know this already, never serve a Django-based site with DEBUG=True. Not only is it bad from a security standpoint, but Django in DEBUG mode stores the queries run in memory, so you can quickly eat up your RAM if you forget to turn this off.

Again, this is just my experience of tuning our stack, so YMMV, but I hope sharing this info will prove useful.

Link | Posted by deryck on August 15, 2007 | 6 comments

Washington Post and Facebook, Part Two

Last time, I wrote an introduction to our development efforts around Washington Post's first app for the Facebook Platform. See that post to get an idea of what Platform is and why it's interesting. In this post, I'd like to talk more about how we used Django to serve the application. In part three, I'll talk about some performance-tuning lessons learned through the course of this development and deployment.

Callback Architecture

Facebook's Platform is based on a callback architecture. The application is hosted on Facebook, users connect to and interact with the application through Facebook, but any page for the application is returned from a callback URL running on our own servers. To help illustrate this, let's look at the process for registering an app on Facebook.

The figure below shows the first few questions on the setup page for our app, The Compass.

Edit screen for The Compass

Notice there is a "Callback URL" and a "Canvas Page URL". The callback URL is the base URL on the developer's server (washingtonpost.com in our case); the canvas page URL is the base URL on Facebook's server. When you install our app on Facebook, you are redirected to the canvas page URL, which in turn fetches content from the callback. You can have any number of callback pages extending off the base. If you went to apps.facebook.com/thecompass/foo/, then that page would fetch content from specials.washingtonpost.com/politicompass/foo/.
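The canvas-to-callback mapping amounts to swapping one base URL for the other and keeping the rest of the path. Here is a small sketch of that idea. The function name is hypothetical, and the real fetch happens on Facebook's side, so this only illustrates the mapping.

```python
# Base URLs taken from the example above.
CANVAS_BASE = "apps.facebook.com/thecompass/"
CALLBACK_BASE = "specials.washingtonpost.com/politicompass/"


def callback_url_for(canvas_url):
    """Map a canvas page URL to the callback URL that supplies its content."""
    if not canvas_url.startswith(CANVAS_BASE):
        raise ValueError("not a canvas URL for this app: %r" % canvas_url)
    # Keep whatever path follows the canvas base and append it to the callback base.
    return CALLBACK_BASE + canvas_url[len(CANVAS_BASE):]
```

So any page under the canvas base has a matching page under the callback base, which is why the Django URL configuration only has to deal with the callback side.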

Now you can't go directly to specials.washingtonpost.com/politicompass/ because without the POST data Facebook submits to the callback URL, the application won't work. If you hit our server directly without coming through Facebook, we redirect to the Facebook URL for our app. In fact, every time Facebook hits our callback URL there is a little setup that has to be done for each request. To encapsulate this neatly, I've got an init function that is called at the start of every view function.

The init function looks like this:


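from django.http import HttpResponse, HttpResponseRedirect

# Facebook, config, and FACEBOOK_HOSTNAME come from the project's own
# client library and settings (those imports are not shown here).
def init_facebook(request):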
    facebook = Facebook(config.API_KEY, config.SECRET)
    facebook.set_facebook_url(host=FACEBOOK_HOSTNAME)

    # Ensure we're running inside the Facebook frame.
    # All Facebook platform frame pages send a POST with fb_sig.
    if not request.POST.get('fb_sig'):
        return HttpResponseRedirect(facebook.facebook_url)

    user = request.POST.get('fb_sig_user')
    if not user:
        return HttpResponse('<fb:redirect url="%s" />' % facebook.get_install_url())

    facebook.user = user
    facebook.session_key = request.POST.get('fb_sig_session_key')
        
    if facebook.session_key != facebook.get_session():
        return HttpResponse('<fb:redirect url="%s" />' % facebook.get_install_url())
    return facebook
  

init_facebook is not a view function itself. It is called from within a view function, even though it is passed the request object like a view function. To see this in action, let's look at the first few lines of the "index" view function:


def index(request):
    facebook = init_facebook(request)
    if not hasattr(facebook, 'api_key'):
        return facebook
  

I check the returned object to see if it's a Facebook object or not. If not, then it's an HttpResponse object, and I need to return that to the requesting client. Once the Facebook session has been set up (all of this through the Python client library), then we go about the business of processing forms, saving to the DB, etc. There are only a couple of views used. An "index" view is used for the initial canvas page, which delivers the 10 questions and submits the answers to the database, and a "friend" view builds the friend canvas page and provides XML to the Flash map on the friend page.
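Since every view starts with the same init-then-check step, one possible variation is to move it into a decorator. This is only a sketch of that alternative, not how the app is actually written. The with_facebook name and the extra facebook argument passed to the view are my own assumptions.

```python
from functools import wraps


def with_facebook(init):
    """Build a decorator that runs init(request) before the wrapped view."""
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            result = init(request)
            # Same test as in the index view: only a Facebook object has
            # api_key, so anything else is a response to send straight back.
            if not hasattr(result, 'api_key'):
                return result
            return view(request, result, *args, **kwargs)
        return wrapper
    return decorator
```

A view would then be declared as `@with_facebook(init_facebook)` followed by `def index(request, facebook):`, and the redirect cases never reach the view body.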

Profile Publish Model

If it's not readily obvious, every time a user hits one of our canvas pages, they end up submitting a request to our servers. There is also a part of our application that lives on the user's profile page. Once you submit the 10 question survey, we place a compass image on your profile page. Facebook doesn't call another site's URL directly from a user's profile. They use a publish model. To place something on the profile, we have to explicitly call profile.setFBML. I'm doing this right after a successful save of the compass questions to the DB.

There are advantages to the publish model, both for Facebook users and developers. For us, it means our servers don't get hit with every profile page view on Facebook. profile.setFBML takes a string of FBML as its chief argument. That FBML is cached and served from Facebook's servers. For users, this means that their profile is a little safer from being hijacked by an application. The disadvantage of this is that you have to have the user initiate the action that changes the content on the profile. This wasn't a problem for us, but would create problems for something like last.fm that would want their app to dynamically update a playlist on the user's profile.

To be continued....

Even though the profile hits are cached and served from Facebook's server farm, the canvas page traffic has still been intense. Next time, I'll go over some things we learned in tuning Apache and Postgresql given the sudden bump in traffic.

Link | Posted by deryck on June 6, 2007 | 5 comments

Washington Post and Facebook Platform Development

A little over a week ago, my group at WPNI was involved in developing an application for Facebook's latest version of their platform. If you haven't yet heard about this, Facebook Platform now lets developers create full blown Facebook applications just like Facebook's own Photos, Events, and Notes applications. Any developer can create any kind of application which can run on Facebook itself. Our first application is called The Compass. (You must be a Facebook user to view or use the app.)

Rob does a much better job than I ever could of explaining the ideas, the creative process, and the Compass itself. I thought I would try to add a little on the technical aspects of the app, specifically the Facebook/washingtonpost.com integration, deploying on Django, and server performance issues from the Facebook traffic.

Facebook Platform

Facebook has had a developer's API since late last year. This API allowed developers outside of Facebook a way to create web or desktop applications that could integrate data from Facebook's social network. A Facebook user could go to a website I created, and given the proper Facebook credentials, login to my site and have some or all of their social data from Facebook follow them. For example, their list of friends, groups, event information, and so on.

The current, updated version of this API contains all the methods that existed before --


facebook.auth.getSession
facebook.friends.get
facebook.groups.get
facebook.events.get
  

but also, there are new methods specific to loading an app within the context of Facebook itself --


facebook.feed.publishStoryToUser
facebook.feed.publishActionOfUser
facebook.profile.setFBML
facebook.profile.getFBML
  

The latter methods are the hooks for publishing to a section on the user's profile. Calling those methods (through the Facebook client library of your choice) would allow you to publish a string of FBML into the user's profile. FBML is where the new version of the Facebook platform gets interesting and begins to separate itself from other simple widget platforms.

FBML

FBML, or Facebook Markup Language, is essentially HTML (stripped of script and other potentially dangerous tags) with a few Facebook-specific tags. These fb tags allow the developer to access a Facebook user's data in a generic fashion. The real advantage of this approach is that with just the UID of a Facebook user, I can load related data in a Facebook page or on the user's profile without having to do any processing on my end.

For example, this --


{% for friend in friends %}
    <fb:profile-pic uid="{{ friend }}" />
    <fb:userlink uid="{{ friend }}">
      <fb:name uid="{{ friend }}" useyou="false" />
    </fb:userlink>
{% endfor %}
  

is an example in Django template syntax using FBML that would produce the following --

Facebook renders the FBML tags for you, which has some significant advantages. To name the most obvious one, the picture shown is always the user's current profile pic.

Note: in the code example above, "friends" is a list of Facebook UIDs returned by calling facebook.friends.get.
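To make clear what our side actually sends, here is the same loop written as a plain Python function. It is only a sketch with a hypothetical name (friends_fbml). The app itself uses the Django template above, and Facebook does the final rendering of the fb tags.

```python
def friends_fbml(friends):
    """Build the FBML string for a list of Facebook UIDs."""
    lines = []
    for uid in friends:
        # One picture and one linked name per friend; Facebook fills in
        # the actual image and name when it renders the tags.
        lines.append('<fb:profile-pic uid="%s" />' % uid)
        lines.append('<fb:userlink uid="%s">' % uid)
        lines.append('  <fb:name uid="%s" useyou="false" />' % uid)
        lines.append('</fb:userlink>')
    return "\n".join(lines)
```

Note that nothing here looks up a name or a picture URL. The UID is the only piece of user data our code has to handle.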

Python Client Libraries

Everything we do at WPNI uses the Django web framework. Being a Python framework, this means everything we do is written in Python (short of several shell scripts for updating code, managing deploying, etc.). The initial example library that we received from Facebook was written in PHP. There was an existing Python version of the Facebook API hosted on Google Code, but it didn't have the new methods mentioned above nor did it seem to work all that well for me. So to get started with Facebook, I had to rewrite that PHP client library in Python. I did borrow a couple methods from the original Python library, but mine is largely a one to one copy of the current PHP library hosted from the Facebook developers page.

Since I was under a bit of deadline pressure, I didn't port every method; otherwise I would release my version. I did notice that the current Python library on Google Code has been updated. I haven't tried it to see if it works any better. I had hoped to spend a little time after our application's launch to finish out this library and do a little more Facebook development, but other deadlines are pressing down on our team. It looks like I may not get back to this, but if that changes, I'll post here with new info. If I ever have the chance to finish out the library, I'll certainly make it public.

To be continued....

Okay, this got to be a longer post than I first imagined it would be. I'll break this into sections. In a day or two, I'll post on how the Facebook/Django integration actually works, and a day or two after that, I'll post on the server issues we experienced and offer some performance tips for scaling a mod_python/Postgresql application.

Link | Posted by deryck on June 3, 2007 | 4 comments

Any Django People Coming To LinuxWorld?

Anyone working on/with or just interested in Django coming to LinuxWorld? If so, let me know and we'll see about an informal meetup.

Link | Posted by deryck on August 11, 2006 | 0 comments

Django .95 Is Here!

Yes, Django .95 has just been released. For those who don't run off SVN trunk, there will be some upgrade issues. Read Removing The Magic on the Django wiki carefully. The changes to Django are well worth any upgrade inconvenience. And some things are just plain easier to get done post magic removal, as I found out this week back-porting a Google site map generator app I wrote while running from SVN.

Congratulations to Adrian, Jacob, and all who contribute code for the release!

Link | Posted by deryck on July 29, 2006 | 0 comments

Django/AJAX Beating

James Bennett is taking a bit of a beating for his Django/AJAX suggestions. A lot of the criticism is unmerited Rails envy, I imagine. Rails has RJS — great! Django is not Rails.

If you want to build more than toy apps, you'll need something more sophisticated than these little server-side helper functions. And if you just want partial page updates or DHTML UI tricks, any JavaScript toolkit can make this quick and painless for you.

I also don't see why when someone says "Django's AJAX support shouldn't look like RJS," people hear, "Django isn't going to include AJAX support." AJAX, for all its usefulness as a term, is used in many different ways. I think the confusion in this case is due to the same word being used to mean two completely different things.

Link | Posted by deryck on July 4, 2006 | 0 comments

New Job with Naples Daily News

I've taken a new job with Naples Daily News. I'll be a developer in the new media department building cool stuff with Django. I'm excited about the work and about working with Rob Curley, Eric Moritz, and the rest of the team.

I'll still be in Alabama working from home, since I just bought a house 3 months ago. I'll travel a bit more now, making monthly trips to Naples, which is a really beautiful and interesting city.

I think this is one of the best moves I've ever made. When I first started working with Django, I was so impressed by what I heard about World Online and really wanted to be doing the same kind of work and be in the same kind of environment. Now, I get that chance. How cool is this!

I finish at the library on Tuesday, July 11, and start work with Naples News the next day.

Link | Posted by deryck on July 4, 2006 | 0 comments

Now Django Powered

I finally got this site converted to Django, more or less. There are a few static pages lying around, but on the whole, I'm Django powered now. It was quite a hack job, which I'll (maybe) relate later.

My hosting service Jump Domain doesn't really advertise Python or Django support. It can be done, though it's probably not for the faint of heart. Scott with Jump Domain has been ultra helpful and I'll try to talk with him more about what can be done to improve support for Django and Python.

Link | Posted by deryck on July 1, 2006 | 0 comments