Lately, RTH has been experiencing slow response times and intermittent timeouts. We contacted our hosting provider, Webfaction, to find out what was going on, and they identified some issues on the server. Webfaction has been working on their end to reduce the average load and to mitigate load spikes.
Of course, some of those load spikes are courtesy of RTH itself. Notwithstanding the frequently seen claim that this site has only five readers, our overall traffic and site activity (e.g. commenting) have been growing steadily, as has the size of our database.
So we also asked Webfaction to let us know whether any RTH queries were running slowly. They identified two queries that were not being optimized to use the fast-lookup indexes we had created. (I'm still not 100% sure I understand why, but it has something to do with how MySQL handles queries that sort results in descending order.)
To get around this, I tweaked the database structure, changed the queries and refactored the site code so the query results are already sitting in a table instead of having to be calculated on the fly. That change alone sped up page generation significantly.
We also determined that the RTH database connection pool had too many connections. During periods of high traffic, the slow queries mentioned above were keeping each database connection open longer than was ideal, while the large number of open database connections was causing overall congestion.
That congestion increased the time it took to run each query, which in turn increased the number of queries trying to run at the same time. The result was a gridlock death spiral that caused the site to time out completely.
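One general way to break that feedback loop is a hard cap on concurrent database connections, so that excess requests wait outside the database instead of adding to the congestion inside it. The sketch below is an illustration under assumptions, not RTH's actual setup: the post doesn't say which pool software was used or what the old and new sizes were. `BoundedPool` and `fake_query` are hypothetical names, and a `threading.BoundedSemaphore` stands in for a pool of real connections. Callers that can't get a slot within the timeout fail fast rather than queueing indefinitely.

```python
import threading
import time


class BoundedPool:
    """Hypothetical stand-in for a connection pool with a hard cap."""

    def __init__(self, size, timeout):
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = timeout
        self._lock = threading.Lock()
        self.active = 0  # queries currently "inside" the database
        self.peak = 0    # highest concurrency observed

    def run(self, query):
        # Wait for a free slot, but give up after `timeout` seconds
        # instead of piling yet another query onto a congested database.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no database connection available")
        try:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            return query()
        finally:
            with self._lock:
                self.active -= 1
            self._slots.release()


def fake_query():
    time.sleep(0.01)  # stands in for a short, well-indexed query
    return "ok"


pool = BoundedPool(size=3, timeout=5.0)
results = []


def worker():
    results.append(pool.run(fake_query))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 20 requests are served, but no more than 3 ever run at once.
```

The cap trades a short wait outside the database for less contention inside it: with fewer queries competing, each one finishes sooner and frees its slot sooner. The right size depends on the workload and the server.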
Between the hosting provider's server load mitigation strategy and the tweaks we made to reduce our own contribution to the load, the site now seems to be running much more quickly in general. More importantly, we haven't experienced any periods of gridlock and timeouts under heavy traffic since making these changes.
Optimizing a site for traffic is a constant process of identifying and clearing bottlenecks, which then exposes the next bottleneck once traffic increases again. In that spirit, we'll continue to monitor the site's performance and look for further efficiencies as traffic warrants.
In the meantime, we'll continue to enjoy the idea that these are the good kinds of problems for a website to have.