Tuesday, December 11, 2007

Video System Upgrades

I recently discovered a defect in our video server code that was causing some servers to use 100% of their CPU. On Friday we released an update that fixes the problem, dramatically improving our quality of service and raising the maximum number of viewers who can watch your channel. Here's a little chart of CPU usage over time, which shows the big drop on Friday:

[Chart: video server CPU usage over time, with a sharp drop on Friday]

You probably won't notice a big difference (unless you have 1000+ viewers on your channel), but rest assured we'll be ready when you become the lonelygirl15 of Justin.tv!

Saturday, December 1, 2007

Unintended consequences

We recently decided to make Justin.tv events searchable as part of the enhanced site-wide schedule we're working on.

We thought that adding something to search would require just a couple of pieces of work: a new template for that type of search result, and adding events to our search index. It shouldn't be more than a 30-minute project, all told.

Of course, as soon as I began implementing it, I noticed a complication: every other kind of searchable item on Justin.tv can be sorted by page views, in addition to newness and best text match. So now we had to decide whether to:
  • Special-case the sort code and remove the page view sort for events
  • Add a page views counter for events, requiring a small change to the database schema and a small amount of code in several places. [1]
Neither of these options is particularly difficult, but only because we got lucky and the original change was small as well. We decided to add the counter, if only to satisfy our own OCD. The "just a template" change ended up touching 6 or 7 different parts of the site. This is not a fluke; it's typical. New features nearly always require changes you would never have dreamed of before implementation.
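To make the two options concrete, here is a minimal sketch of a search-result sorter, assuming a hypothetical `SearchResult` record with `created_at`, `text_score`, and `page_views` fields and a hypothetical `available_orders` helper (none of these names come from the actual Justin.tv code). The first option shows up as a branch on the result kind; the second lets one code path sort every kind of result.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SearchResult:
    # Hypothetical record; field names are illustrative assumptions.
    kind: str            # e.g. "channel", "video", or "event"
    title: str
    created_at: datetime
    text_score: float    # how well the result matches the query text
    page_views: int = 0  # option 2: events get a counter like everything else

# One key function per sort order offered on the search page.
SORT_KEYS = {
    "newest": lambda r: r.created_at,
    "best_match": lambda r: r.text_score,
    "page_views": lambda r: r.page_views,
}

def available_orders(kind):
    # Option 1: special-case events, which would have no page-view counter.
    if kind == "event":
        return ["newest", "best_match"]
    return list(SORT_KEYS)

def sort_results(results, order):
    # With a counter on every kind, one code path sorts everything, highest first.
    return sorted(results, key=SORT_KEYS[order], reverse=True)
```

The special case looks harmless in isolation, but every place that lists or validates sort orders has to remember it, which is part of why adding the counter can be the tidier choice.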

This is why second-system syndrome is so hard to avoid. It's like invading Vietnam or Iraq: at first everything seems perfectly fine, but the deeper in you get, the more unforeseen complications emerge. Eventually you find yourself under fire from all directions, and you just want to get it over with before you bleed out any more than you already have.

[1] As a side note, this is why standard databases suck: part of the reason I wanted to avoid adding a page view counter to events is that it requires a schema change. Of course, I could have created a page_views table that solves the problem generically, but that would have been more work to set up in the beginning, when I had no idea I'd have this problem. And because schemas are costly to change in a running system, switching to that solution now would take yet more work.
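The generic page_views table the footnote mentions might look something like this sketch, which uses Python's built-in `sqlite3` module. The table layout, the `item_type`/`item_id` key, and the `record_view`/`view_count` helpers are assumptions for illustration, not the actual Justin.tv schema. Because the counter lives in its own table keyed by item type, making a new kind of item countable needs no schema change.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical schema: one counter table for every kind of item,
    # instead of a page_views column on each item table.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS page_views (
            item_type TEXT NOT NULL,     -- e.g. 'channel', 'video', 'event'
            item_id   INTEGER NOT NULL,
            views     INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (item_type, item_id)
        )
        """
    )

def record_view(conn, item_type, item_id):
    # Create the row on the first view, then bump the counter.
    conn.execute(
        "INSERT OR IGNORE INTO page_views (item_type, item_id, views) VALUES (?, ?, 0)",
        (item_type, item_id),
    )
    conn.execute(
        "UPDATE page_views SET views = views + 1 WHERE item_type = ? AND item_id = ?",
        (item_type, item_id),
    )

def view_count(conn, item_type, item_id):
    # Items that have never been viewed have no row; report zero.
    row = conn.execute(
        "SELECT views FROM page_views WHERE item_type = ? AND item_id = ?",
        (item_type, item_id),
    ).fetchone()
    return row[0] if row else 0
```

Counting views for events then just means calling `record_view(conn, "event", event_id)`. The trade-off is the one the footnote describes: the generic table takes more setup up front, and sorting by page views becomes a join against it.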