As you might have read in the site history, this site is in its third generation. The basic challenge in maintaining a site is making it look like one coherent site rather than a collection of random pages. This matters to visitors because every site, no matter how simple, elegant or seemingly intuitive, requires some learning to navigate. Besides standardizing on interaction patterns, the right dose of internal links is essential to give a sense of connectedness.
Technologically speaking, the first two avatars of my site represented different stages of evolution in website management. The first was a collection of static HTML files that used frames to avoid duplicating repeating elements such as headers, footers and navigation. The second was largely about a frame-free design and some eye candy. At that point I had little idea of scripting technologies on either the server side or the client side, so I ended up writing PHP code that generated JavaScript code. This still had the problem that reskinning the site meant editing every page. There was also some overly complex XML configuration, plus supporting PHP code, to generate the navbar.
The need for a CMS became painfully obvious when I was playing the part of a content producer. When I'm writing content, I really should not have to bother with the visual styling of the page; the presentational aspect must be controlled largely outside the content creation process. On the technical side, this resonates well with the idea of LSM. Separating content from the actual HTML also makes it much easier to produce good HTML. If you are wondering what "good" means and want to know more, please refer to this excellent presentation. Then there is the problem of keeping the sitemaps feed up to date.
I looked into a grab bag of technologies (in mid 2007), such as Smarty, Drupal, Symfony, Forrest and WordPress, to solve different problems. I soon realized that even if I were to pull the best parts from each of them, I'd still not get what I wanted. Plus, making them all work together would be complex, since I was not familiar with any of them. I had also been burnt enough by critical security updates for Coppermine; software installs and updates in a shared hosting environment are a nightmare. Finally, I do care about performance. Though the chances of my getting slashdotted are pretty slim, I still care about how quickly the pages load for each visitor.
The first and probably the most important realization I had is that, ideally, a page that serves static content should be a static HTML file. There is content creation, then a build process, and finally the generated files are dropped onto a web server. The only dynamic code in the generated pages is a few small pieces of PHP that include common elements like the navigational area and the footer. The navigational area itself is generated from a magic file; it is probably the only exception to the static-page rule.
I know for a fact that my primary editor is going to be vim. As much as I hate XML, I ended up choosing it as the format for my content. Each content file has some basic metadata followed by one massive node for the actual content. It says nothing about where the page will fit in the site or what the container shell looks like.
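To make the shape of such a content file concrete, here is a minimal sketch in Python using the standard library's ElementTree. The element names (`page`, `meta`, `title`, `description`, `content`) are assumptions for illustration, not the site's actual schema. The point it shows is that the file carries metadata and a body, but no URL and no layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical content file: the element names (page, meta, title,
# description, content) are illustrative assumptions, not a real schema.
SAMPLE = """<page>
  <meta>
    <title>Site history</title>
    <description>How this site evolved</description>
  </meta>
  <content>
    <p>The first version used frames.</p>
    <p>The second dropped them.</p>
  </content>
</page>"""


def load_content(xml_text: str) -> dict:
    """Split a content file into a metadata dict and its body element."""
    root = ET.fromstring(xml_text)
    # Each child of <meta> becomes one key/value pair.
    meta = {child.tag: (child.text or "").strip() for child in root.find("meta")}
    # The body is kept as an element tree for a later templating step.
    body = root.find("content")
    return {"meta": meta, "body": body}


page = load_content(SAMPLE)
print(page["meta"]["title"])  # Site history
print(len(page["body"]))      # number of child elements in the body
```

Note that nothing in the file says which URL it lives at; that mapping is kept elsewhere.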
The piece of information that ties a piece of content to a URL is stored outside that content as a simple key-value pair. This started off as XML, but I later moved to YAML since it is so much simpler. Keeping this information separate lets me control which pages are currently active and which are not: just knock off the key-value pair and the page goes away.
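A rough sketch of that URL-to-content mapping follows, assuming that each key is a URL path and each value is a content file (the direction of the mapping and the file names are hypothetical). To stay within the standard library, the tiny reader below handles only flat `key: value` lines and whole-line or trailing `#` comments; it is a stand-in for a real YAML parser such as PyYAML's `yaml.safe_load`, not a replacement for one.

```python
# Hypothetical site map: each line maps a URL path to a content file.
# Commenting out (or deleting) a line deactivates that page.
SITE_MAP = """\
# url: content file
/index.php: content/home.xml
/history.php: content/history.xml
# /old-gallery.php: content/gallery.xml
"""


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Minimal stand-in for a YAML loader: flat 'key: value' lines only.

    Anything after '#' is treated as a comment, so keys and values
    must not contain '#' themselves.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        mapping[key.strip()] = value.strip()
    return mapping


active_pages = parse_flat_yaml(SITE_MAP)
print(sorted(active_pages))  # the gallery page is absent
```

Because the commented-out gallery line never reaches the mapping, the build simply never generates that page, which is the "knock off the pair" behaviour described above.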
The actual stitching of the core page content into an HTML grid, along with the right title and meta tags, is done using XSLT. The generated file is a PHP file that is mostly HTML; the only pieces of PHP in it are includes and a couple of function calls (which take no input arguments). This file is then run through tidy, an HTML validator and cleanup tool. The whole sequence of steps is driven by a Perl script.
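The stitching step itself is XSLT, which Python's standard library does not provide. So the sketch below swaps in `string.Template` purely to show the *shape* of the output: a static HTML shell with the title and meta tags filled in, the content body dropped into the grid, and only includes plus argument-free PHP calls left as dynamic code. The include file names (`header.php`, `navbar.php`, `footer.php`) and the `last_modified()` function are hypothetical.

```python
import html
from string import Template

# Stand-in for the XSLT stylesheet (string.Template instead of XSLT).
# Include file names and last_modified() are hypothetical.
PAGE_SHELL = Template("""\
<!DOCTYPE html>
<html>
<head>
  <title>$title</title>
  <meta name="description" content="$description">
</head>
<body>
  <?php include 'header.php'; ?>
  <?php include 'navbar.php'; ?>
  <div id="content">
$body
  </div>
  <p>Last modified: <?php last_modified(); ?></p>
  <?php include 'footer.php'; ?>
</body>
</html>
""")


def stitch(meta: dict, body_html: str) -> str:
    """Place one page's metadata and body HTML into the shared shell."""
    return PAGE_SHELL.substitute(
        title=html.escape(meta["title"]),
        description=html.escape(meta["description"]),
        body=body_html,  # already-serialized HTML from the content file
    )


page_php = stitch(
    {"title": "Site history", "description": "How this site evolved"},
    "<p>The first version used frames.</p>",
)
print(page_php)
```

In the real pipeline the resulting `.php` file would then be passed to tidy before deployment.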
Generating the navigation is probably the most complex piece involved. The navigation area is generated in PHP from a CSV file. This CSV file and the sitemaps feed are generated by examining each content file referred to in the site management config file. A nice little trick for figuring out the last modified time of a file is to use SVN keyword substitution inside each content file; this also lets me show the last modified time on each page. This, again, is done in Perl.
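The site does this in Perl; the following Python sketch shows the same idea under a few stated assumptions. Subversion expands `$LastChangedDate$` (alias `$Date$`) on commit when the file's `svn:keywords` property includes it, producing text like `$LastChangedDate: 2007-08-15 21:03:52 +0530 (Wed, 15 Aug 2007) $`. The sketch parses that stamp, writes a navigation CSV with a hypothetical `url,title` layout, and emits a sitemap in the sitemaps.org 0.9 format. `BASE_URL` and the column layout are placeholders, not the site's real values.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Matches an expanded SVN date keyword and captures "date time offset".
SVN_DATE = re.compile(
    r"\$(?:LastChangedDate|Date): "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})"
)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://example.com"  # hypothetical site root


def last_modified(text: str) -> datetime:
    """Return the timezone-aware time from an expanded SVN date keyword."""
    match = SVN_DATE.search(text)
    if match is None:
        raise ValueError("no expanded SVN date keyword found")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")


def build_nav_csv(pages: list[tuple[str, str]]) -> str:
    """(url, title) pairs in menu order -> CSV text read by the navbar PHP."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "title"])  # hypothetical column layout
    writer.writerows(pages)
    return out.getvalue()


def build_sitemap(entries: list[tuple[str, datetime]]) -> str:
    """(url path, last modified) pairs -> sitemaps.org XML document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path, modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = BASE_URL + path
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


source = "<!-- $LastChangedDate: 2007-08-15 21:03:52 +0530 (Wed, 15 Aug 2007) $ -->"
stamp = last_modified(source)
nav_csv = build_nav_csv([("/index.php", "Home"), ("/history.php", "Site history")])
sitemap = build_sitemap([("/history.php", stamp)])
```

Since the timestamp comes from the repository rather than the file system, it survives checkouts and rsync copies unchanged, which is what makes it usable both for the sitemap and for the "last modified" line on each page.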
The build process itself is driven by good old Makefiles. rsync copies the generated payload to its destination, and that step is just another rule in the Makefile.
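Part of make's appeal here is that it rebuilds only what changed. The Python sketch below mirrors make's basic decision rule, not its full semantics (no pattern rules, phony targets or recursive dependencies). A target is rebuilt if it is missing or if any prerequisite has a newer modification time. The file names in the demo are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Make's basic rule: rebuild if the target is missing or any
    prerequisite was modified more recently than the target."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


# Demo with hypothetical file names and explicit timestamps.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "history.xml")
    output = os.path.join(workdir, "history.php")
    open(source, "w").close()
    missing_target = needs_rebuild(output, [source])    # no output yet
    open(output, "w").close()
    os.utime(source, (1000, 1000))
    os.utime(output, (2000, 2000))
    up_to_date = not needs_rebuild(output, [source])    # output is newer
    os.utime(source, (3000, 3000))
    edited_source = needs_rebuild(output, [source])     # source edited later
```

Combined with the SVN keyword trick, this means an untouched content file produces no new output, so the rsync step only has changed pages to transfer.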