Facebook/Washington Post, Performance Tuning

This final post about my group's work (at Washington Post.Newsweek Interactive) on our Facebook Platform app The Compass is long overdue. But now the time has come! Let's talk Postgresql and Apache performance.

In the first two posts on this subject, I wrote about the Facebook Platform itself and the Compass' architecture. In this post, we'll look at some of the challenges we encountered while serving the app and areas we focused on to improve our Postgresql and Apache performance.

NOTE: All of this is anecdotal, based on my experience with this app. I'm no performance guru and don't hold myself up as such. I think, too, different applications have different needs, and the requirements of something like Facebook could not be optimal for other situations.

Caching Limitations

As I mentioned last time, all of FBML we load into a profile is cached and served by Facebook, but the hits to our application pages are hits to our servers as well. The first thing that comes to mind with Django is, "well, make sure you have caching enabled." There are a couple reasons why this doesn't work as well as one would like.

First, the caching for a Django site is bypassed when the request contains GET or POST data. Every request from Facebook contains POST data. Each callback request has a few fb_sig* parameters that are POSTed to your page to verify the request comes from Facebook. This is great for security and passing data from Facebook back to your application, but it kills the normal caching process for Django-based sites.

Second, each request can potentially be unique. In our case, the only Facebook canvas pages we serve are the one that submits the compass survey questions and the one to display the Flash map of your friends who have installed the compass. It's hard to do much low-level caching of Django querysets because you don't want to inadvertently give the user someone else's data. We do a little of this, though. See, for example, what we do here when we display the compass based on your last answer:


cache_key = 'compass_entries_%s' % facebook.user
compass_entries = cache.get(cache_key)
if not compass_entries:
    compass_entries = Compass.objects.filter(user__exact=facebook.user).order_by('-id')[:10]
    cache.set(cache_key, compass_entries, 60 * 15)

We also reset these entries in the cache when a user resubmits the survey. So we save a few DB hits if the same user retakes the survey a few times back to back. However, there's just not much in common across users to really take advantage of Django's cache. We're pretty well left to raw DB performance.

Bypass the ORM

One of the first things we did to help performance was to bypass Django's ORM. We store the user's answer to each question via a save method on the form that is submitted. Using the ORM this would look something like:


from politicompass.models import Compass
def save(self, uid):
    q1 = self.clean_data.get('q1')
    q2 = self.clean_data.get('q2')
    q3 = self.clean_data.get('q3')
    q4 = self.clean_data.get('q4')
    q5 = self.clean_data.get('q5')
    q6 = self.clean_data.get('q6')
    q7 = self.clean_data.get('q7')
    q8 = self.clean_data.get('q8')
    q9 = self.clean_data.get('q9')
    q10 = self.clean_data.get('q10')

    for i in range(1,11):
        answer = 'q%s' % i
        compass = Compass(user=uid, question_id=i, answer=answer)
        compass.save()

We refactored this before launch to bypass the ORM and excecute the INSERTs in one connection:


from django.db import connection
from politicompass.models import Compass
def save(self, uid):
    q1 = self.clean_data.get('q1')
    q2 = self.clean_data.get('q2')
    q3 = self.clean_data.get('q3')
    q4 = self.clean_data.get('q4')
    q5 = self.clean_data.get('q5')
    q6 = self.clean_data.get('q6')
    q7 = self.clean_data.get('q7')
    q8 = self.clean_data.get('q8')
    q9 = self.clean_data.get('q9')
    q10 = self.clean_data.get('q10')

    sql = ""
    for i in range(1,11):
        answer = 'q%s' % i
        sql += "INSERT INTO facebook_compasses (user, question_id, answer) VALUES (%s, %s, %s);" % (uid, i, answer)

    cursor = connection.cursor()
    cursor.execute(sql)
    connection._commit()

There were other performance-conscious moves we made along these lines, and still, once the app started to grow in popularity, we had users submitting that form in such numbers that our DB server load stayed at a freakishly high level. (NOTE: Prior to Facebook, we normally ran at about a .20-.35 load. Once the Facebook app launched, our load jumped up into the 3.00-4.30 range depending on site activity.)

Tuning Postgresql

I had already tuned Postgresql once for some spikes we had encountered when some of our apps were linked up by MSN and MSNBC. These tunings included raising the max_connections limit and bumping up the amounts for the following settings:


shared_buffers
work_mem
maintenance_work_mem
max_stack_depth

The most significant of these for us was shared_buffers. With the hits we had received from MSN and MSNBC, raising shared_buffers to about 1.6 GB (we have 8 on the box) and increasing max_connections was enough to keep us humming along nicely. With the Facebook traffic we had to increase shared_buffers to about half the available RAM on the box and everything dropped back to a sane level. We are running on Solaris and so we had to have our box increase the amount of shared memory available from the kernel in order to give so much RAM to shared_buffers, but again, once this happened, the load recovered amazingly well.

Hits Under Facebook

Just to toss out some raw numbers, when we first loaded our app to Facebook, we were doing about 5-10 hits a second during peak usage. We ended up doing about 2.5 million hits the first week, just from Facebook alone. We run 4 other sites off the same server. This is a single Postgresql server. We do have our two web servers behind a load balancer, and our static media is served from the normal media.washingtonpost.com setup. Needless to say, there are certainly higher numbers that other sites boast, but the single DB, with some tuning and planning, survived the spike pretty well.

Currently, we're doing about 10 million hits a month from this setup, and we're really at its limits now. To do much more, we'll have to look at replicating the database. Having said that, were it not for the Facebook traffic and a similar Newsweek week app bypassing the cache for reasons outlined above, I think we could easily do twice the traffic on the same setup. Caching really saves on DB load, so use it all you can if possible.

Apache Tuning

Luckily, we never felt the Facebook traffic from an Apache stand point. I will point out, for the sake of LAMP stack completeness, that the best trick I learned for Apache is to set MaxRequestsPerChild to something in the range of 500. This keeps Apache memory size down while also serving a decent amount of traffic per process. And if you don't know this already, never serve a Django-based site with DEBUG=True. Not only is it bad from a security stand point, but Django in DEBUG mode stores the queries run in memory, so you can quickly eat up your RAM if you forget to turn this off.

Again, this is just my experience of tuning our stack, so YMMV, but I hope sharing this info will prove useful.

Posted by deryck on August 15, 2007

Comments

Jacob Kaplan-Moss August 15, 2007 at 10:31 p.m.

Interesting stuff, Deryck -- thanks.

One tip: use cursor.executemany() instead of your bundled INSERT statements. It'll be faster, but more importantly won't open you up to the eggregious SQL injection errors you've got in your code. The code as you've got it here leaves you wide open for some pretty nasty problems -- imagine if q1 is "1); DELETE FROM auth_user;"...

http://dpaste.com/16983/ is more like the code you ought to be using.

Brian Harring August 16, 2007 at 3:54 a.m.

The 500 limit is a bit low imo, unless you've got a lot of vhosts in use...

deryck August 16, 2007 at 9 a.m.

@Jacob

Thanks for dropping by! :-)

I think I'm safe from SQL injection because I validate the form first. The answers are from a ChoiceField which can only return ints. I check that the form is valid before I call save. But I probably should have mentioned this.

As for executemany, I actually considered that, but when I looked at the psycopg2 source, it looked to me like executemany just iterated over the list calling execute for each item. I actually thought mine was a bit quicker, doing this in one execute. If I misread or misunderstood what's happening with these methods, please correct me.

But now, I do like that transaction decorator, which stupid me, didn't even think of.

@Brian

Welcome also to you, sir! (/me is glad for some readers)

Again, this was just my experience with trial and error. I tried all sorts of values from 250-2000 and 500 seemed to be a sweet spot for us in terms of memory usage per process and server load. I don't have a lot of virtual hosts. About 10 total, split in 5 for one Apache instance and 5 for another (the staging server for the live environment), so I'm not sure why this works well, but for us, it does.

Jacob Kaplan-Moss August 16, 2007 at 10:05 a.m.

@Deryck: yeah, if you're validating the form then you're probably OK on the injection front.

Personally, though, I'd never use string interpolation for SQL; it's just too dangerous. Say you make a small mistake in your validation that allows raw strings to creep through. If you're using bound arguments the bug stays minor, but if you're interpolating SQL strings then the small bug turns into a big security hole. I'm by no means an expert here, though; there's a lot of room for differing opinions.

I hadn't realized that psycopg2 iterates over the query instead of using low-level features of libpq; that's kinda janky. Other backends will probably make executemany() work faster...

Of course, if you want *really* fast bulk inserts and you're OK being PostgreSQL only, you might want to take a look at COPY -- it's quite a bit faster than INSERT. I think psycopg2 exposes COPY as cursor.copy_from and cursor.copy_to, but that's off the top of my head; could be wrong.

OK, so more information than you needed. With apologies to Twain, sorry about the long comment; I didn't have time to write anything shorter.

deryck August 16, 2007 at 10:24 a.m.

@Jacob

Thanks for the follow up!

I certainly agree that avoiding string interpolation is best if possible. We're in *complete* agreement about the dangers, and that's another reason I should have been more explicit about what I was doing rather than just a small excerpt. In this case, performance was more of the killer for me. There are other security precautions, too, i.e. having to submit from the Facebook canvas page, which aren't clearly outlined here.

I just didn't see a way to do it in bulk without string interpolation, given what I said about executemany. (Again, if I'm wrong on that, maybe a psycopg2 guru will stop by and correct me.) And for now, I am okay with Postgresql only, as I don't think we'd get much by-in for open sourcing our apps (Rob would, of course, but I don't know about the rest of the company) or have a need to migrate off Postgresql. So I may look into COPY then.

Prasanth Nair September 9, 2007 at 8:38 a.m.

Deryck, if you don't mind me asking, what are your server specs (CPU, Processor) and where are you hosting? Very useful comment on shared buffers. Thanks.

Post a comment

Comments temporarily closed.