Today I’m gonna talk about how we’re using Python to make a content management system that doesn’t suck. Who’s we? It’s me and Ian here. We’re using the tentative company name Unstoppable Rocket.
So, what’s a CMS? CMS stands for Content Management System. You’ve probably heard this before. But what does it mean to manage content? I’ll use an example: let’s say you’re a museum. Ian here works for the Cleveland Museum of Natural History, so this is for real. Museums have lots of exhibits coming and going! So if the Tesla Coil exhibit starts in 3 weeks, maybe you want it on the web site calendar now, and then a Coming Soon page in a couple weeks, and then a Featured Exhibit page in 3 weeks. What you want is to be able to schedule content changes instead of having people change things in real time. Likewise, anywhere you have admission prices (which may be on a bunch of pages), you want to show the right seasonal price. A good content management system will keep track of where all the prices are mentioned, so you don’t have to go hunting them down. And then you might want to support roles like Editor, Publisher, System Administrator, and other boring enterprise-y stuff.
So this is a pretty crowded market. In fact if you just Google for “CMS”, there’s not a single software result on the first page. It’s all pages helping you figure out how to wade through the hundreds of choices. As you can see, having a stupid name is almost a prerequisite. In fact, CMS Matrix lists a Pagoda CMS... that isn’t ours! Some Czech outfit is trying to take over our name. But I did some detective work and ours was first.
So how can we stand out?
First of all, we can share our code and let people modify and redistribute it. But a lot of the most popular CMSs are already open source: Drupal, Joomla!, Plone... so we need something else.
Here’s one big chance to get ahead. A lot of content management systems were developed before this “Web 2.0” style of web application development became popular. So they require lots of clicking and waiting for pages to reload. We’re developing with this mode of interaction in mind, so we can use new (but stable) tools like JavaScript frameworks to our advantage. The older competitors have to retrofit their whole code base to incorporate this.
We hear a lot about “web frameworks” these days. It’s just a fancy way of referring to some software that was made to construct other software in a particular style. And one thing we’re trying to do in Pagoda is capitalize on the success of an existing Python web framework (TurboGears). A lot of content management systems are their own framework! And frameworks require learning. Pagoda is not a framework. Instead, it uses an existing framework with a community around it. So if you’re a TurboGears developer, you can become a Pagoda developer with very little overhead.
We’re taking the “less is more” approach to stand out. We’re not trying to do everything involved in running a web site. The biggest example is themes. Lots of CMS’s have default templates and then swappable themes for those templates. But we’re not really targeting the type of sites that would use themes. Imagine a big museum or a university department downloading Bob’s Sparkly Green Theme. They’re going to already have professional templates and styles developed in-house! So we don’t have any default templates or themes for Pagoda. You still need a programmer for that. We just want to do one thing well.
And our single biggest drive while developing Pagoda is to make it humane. That means using understandable terminology, making things discoverable, and trying to read the user’s mind - what are they trying to accomplish? Where are they going to look for this feature? A lot of developers ignore this, but we’re focusing on it.
We’ve been developing this since February. Here’s what we’ve learned since then.
One idea behind a few Python web frameworks is to reuse existing libraries. TurboGears and Pylons both take this approach. This is sweet because developers can specialize in a specific component and make something that just does one thing well. But it also creates a moving target! TurboGears started out recommending Kid and SQLObject, but now everyone recommends Genshi and SQLAlchemy. The “best” tool is not constant! Best of breed also means that components are glued together, sometimes against their will. So if libraries are using ugly hacks to cram pieces together, that means our code relies on those ugly hacks. And they’ve bitten us.
We’ve learned that modeling content is not easy. Especially if you want the content model to support some pretty key features, like being multilingual and revisionable. I can totally understand now why the Zope people skipped this problem altogether and invented their own database! Each individual problem is not so hard to tackle with SQL: multiple languages - have a content identifier and a locale column; revisions - have a revision number; polymorphism - use a JOIN (SQLAlchemy will abstract this with inheritance). But put these features together and you have a pretty serious challenge to solve.
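To make the "each problem alone is not so hard" point concrete, here is a minimal sketch of just the multilingual and revision pieces, using the standard library's sqlite3. The table name, column names, and sample rows are made up for illustration; this is not Pagoda's actual schema.

```python
import sqlite3

# Hypothetical single-table schema: one row per (content item, locale, revision).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event (
        content_id INTEGER NOT NULL,  -- identifies the piece of content
        locale     TEXT    NOT NULL,  -- e.g. 'en' or 'es'
        revision   INTEGER NOT NULL,  -- increases with every edit
        title      TEXT    NOT NULL,
        PRIMARY KEY (content_id, locale, revision)
    )
    """
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        (1, "en", 1, "Tesla Coil Exibit"),   # typo, fixed in revision 2
        (1, "en", 2, "Tesla Coil Exhibit"),
        (1, "es", 1, "Bobina de Tesla"),
    ],
)


def latest_title(conn, content_id, locale):
    """Return the title of the newest revision for one locale, or None."""
    row = conn.execute(
        "SELECT title FROM event "
        "WHERE content_id = ? AND locale = ? "
        "ORDER BY revision DESC LIMIT 1",
        (content_id, locale),
    ).fetchone()
    return row[0] if row else None
```

Asking for latest_title(conn, 1, "en") picks up the corrected English title without touching the Spanish row. It's once you add polymorphism and locale-independent fields on top of this that one table stops being enough.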
When we first started out, one of our goals was to not go too crazy with the plugins. We weren’t going to try to anticipate everything people will want to customize and then make all the built-in defaults swappable. And while this is still somewhat true, we realized something while developing our defaults. If we want plugins to live among our built-ins as first-class citizens, then we can’t treat our built-ins too specially! We have to make our built-ins plugins themselves that just happen to be enabled by default.
For plugins, we’re using setuptools entry points and namespace packages. Entry points are nice, simple, and open-ended. They’re basically like having multi-valued dictionaries of Python objects aggregated in a particular namespace from various packages. If you ask pkg_resources for the ‘pagoda.content_types’ entry point group, it’ll have keys like ‘page’ and ‘event’ which are mapped to their corresponding controller classes. Namespace packages are like ‘virtual modules’ where objects from separately-installed Python packages can all be available in one imported module.
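Here is a rough sketch of what that lookup can look like. It uses the standard library's importlib.metadata rather than pkg_resources, purely so the example has no third-party dependency. The module paths in the comment and the load_content_types helper are hypothetical, not Pagoda's real code.

```python
from importlib.metadata import entry_points

# A plugin would advertise itself in its setup.py along these lines
# (module and class names here are invented for illustration):
#
#     entry_points={
#         "pagoda.content_types": [
#             "page = pagoda_page.controllers:PageController",
#             "event = pagoda_event.controllers:EventController",
#         ],
#     }


def load_content_types(group="pagoda.content_types"):
    """Collect every installed entry point in `group` as {name: object}."""
    try:
        # Python 3.10+ lets us select a group directly.
        found = entry_points(group=group)
    except TypeError:
        # Older versions return a plain dict keyed by group name.
        found = entry_points().get(group, [])
    # ep.load() imports the advertised module and returns the object.
    return {ep.name: ep.load() for ep in found}
```

With no plugins installed this just returns an empty dict. With the hypothetical packages above installed, the keys would be 'page' and 'event'.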
setuptools isn’t perfect, however. Whenever something imports pkg_resources, and your package uses entry points, it’ll import your package to scan for them! This means if we mess up Pagoda (say, with an ImportError), and someone installs it and just tries to run ipython, their ipython won’t run because it happens to import pkg_resources, which runs our broken code! This sucks!
I used to think pair programming would be really annoying. I always imagined myself just wanting to ignore the other person and hack away at my own code. But in practice, Ian and I produce our best code this way. We discuss every word in our comments and always speak up if something’s ugly or doesn’t conform to the style guide. The result is that we have a growing code base and I don’t have the urge to rewrite any of it, because it’s really solid!
Here’s the more technical part. This is how we designed our polymorphic, multilingual, revisioned content model.
So here’s a simple content type, let’s say it has 2 fields - a title and a date. It could be an event, for example, or a blog article.
Over time, the event will be updated. You’ll fix typos, maybe change the date, that kind of thing. And to keep track of the changes we’ll have a metadata table that knows about every revision to every piece of content. It’ll have stuff like the revision comment, who made the change, a timestamp. I’ll just show that every revision has that metadata record with that dark spot. Now let’s make it multilingual...
So we have an identically formed record, but the values are different. Any English text will be different in the Spanish locale. This doesn’t require a whole new table, just a ‘locale’ column. Why are the blocks misaligned? This is to show that the English and Spanish records are independently revisioned! When I change the English translation, it does not create or modify any of the Spanish translation records. But this could be improved! Notice that in this model, every time we make a translation or an update, we’ll be copying all the data into the new record. Even the data that isn’t dependent on the locale at all! Date values, for example, do not differ between locales. So for this content type we can have another table keeping track of the locale-independent values...
Now we have multiple tables. When we want to get a revision at a particular time, we just take a slice at that time and get the latest revision in each table and locale. So to get the state of the content at this red line, we use the blue English record, yellow Spanish record, and the green locale-independent record - those are the last revisions on or before the red line. That set of records is the state of the content at that time. Now let’s make it polymorphic! What does that mean? Well, there are some fields that all content types will have. We can store all those in one table that can be shared among all the different content types.
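The time slice itself can be sketched in plain Python, leaving the database out of it. The Revision class, the table names 'localized' and 'shared', and the sample data are all assumptions made for this example; the real thing does this in SQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    table: str              # which table the record lives in (hypothetical names)
    locale: Optional[str]   # None for locale-independent records
    timestamp: int          # when this revision was made
    values: dict            # the fields stored in this record


def state_at(revisions, when):
    """Pick the latest revision on or before `when` per (table, locale)."""
    latest = {}
    for rev in revisions:
        if rev.timestamp > when:
            continue  # made after the slice, so not visible yet
        key = (rev.table, rev.locale)
        if key not in latest or rev.timestamp > latest[key].timestamp:
            latest[key] = rev
    return latest


# Invented history for one event: two English edits, one Spanish
# translation, and two revisions of the locale-independent date.
history = [
    Revision("localized", "en", 1, {"title": "Tesla Coil Exibit"}),
    Revision("localized", "en", 4, {"title": "Tesla Coil Exhibit"}),
    Revision("localized", "es", 2, {"title": "Bobina de Tesla"}),
    Revision("shared", None, 1, {"date": "2008-06-01"}),
    Revision("shared", None, 3, {"date": "2008-06-08"}),
]
```

Calling state_at(history, 3) gives the first English record, the Spanish record, and the second date record, which is the "red line" picture: one record per table and locale, each the last one at or before the slice.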
I’m calling it the ‘polymorphic record’ here. In Pagoda we have a polymorphic table we call the Node table, and it stores information about the site tree, like URLs and parent pages. We can use the same time slice method to reconstruct individual revisions from multiple tables. And this is generic, so we can add any number of tables depending on how complex our objects are, and even have localized polymorphic records. This will result in very little duplication when making changes.
Yeah, so that’s complex - but not complicated. It will be very difficult for any Active Record style ORM to abstract much of that. What you need is an ORM using the Data Mapper pattern.
SQLAlchemy handles this beautifully. Classes can be mapped against complex SELECT statements, not just tables. And in addition to simple inheritance using JOINs, we can define inheritance conditions that get translated into custom SQL constructs. It’s all very neat. So we can use the revision model I just showed you and let SQLAlchemy programmatically generate the right SQL, with any number of tables taking part in the content composition. To the programmer using our models, it just looks like one normal Python class. They can easily query only specific revisions and locales, and easily make new revisions from the record instance they’re working with.
I took a few screenshots before we drove out here tonight.
Here’s what you see when you log in. We don’t plan on this Control Panel list growing much.
Here’s an unfinished Edit view. There would be options and stuff to change around on the right there. That tree on the left can have any content type: calendars, web pages, wiki entries, blog articles...
Here’s how we manage workflow. If a bunch of people are submitting changes to make to the live site, a publisher can come here and release them all at once.
The end!