Site under volume
This is one of those “request for thoughts” posts that’s only going to be of interest if you’re wonky about servers/WordPress/MT/etc. and want to talk about things like caching through .htaccess and mysql optimization. Sooooo I’m just going to throw the more tag up now.
Whew. Thinned out that readership pretty quick.
We got fairly well brutalized, as you saw if you were here today. Our server load today averaged – averaged – 40 for about eight hours. 40. On a brand new dual proc box donations bought us. We took 200k+ hits today in a few hours with the Soriano rumor. Generating a non-cached page (like on the backend, for us authors) took 30-60s if the database connection didn’t time out.
I am not shitting you. With WP-cache on and everything. Essentially it was all mysqld, too – one proc was almost entirely idle while mysql thrashed its disk (disk was ~30% busy)(no, really). I’m increasingly convinced WordPress can’t scale – that once you’re as large as we are, you’re SOL because it thrashes so hard generating pages.
So! I’m about ready to hit the eject button and port to Movable Type unless we can figure a way to get WordPress to stay up under load.
But here’s my call – is there a fix? Am I really going to reach past the increasingly weird .htaccess mods I’ve been pulling off and into mysql optimization? Is there possibly some crazy bug?
Or is it MT or bust?
DMZ
When the server was purchased, DMZ did mention that the break-even point for purchasing over leasing was something as short as 8 months, so owning their own server makes a lot of sense.
How much RAM is in the db server? A good solution is to add more, and then increase the MySQL buffer sizes. Some keys you might want to look at are “read_buffer_size”, “sort_buffer_size”, and “key_buffer_size”. When these are bigger, MySQL will hit the disk less often.
This is what I’ve got for a server with 2GB RAM. Please keep in mind that I (more or less) have no idea what I’m doing, so YMMV.
[my.cnf]
set-variable = read_buffer_size=1M
set-variable = sort_buffer_size=8M
#
# Set key_buffer to 5 – 50% of your RAM depending on how much
# you use MyISAM tables, but keep key_buffer_size + InnoDB
# buffer pool size < 80% of your RAM
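If it helps, you can check what the server is actually running with before touching anything. These are standard statements in the mysql shell, and the variable names are the same ones mentioned above:
mysql> SHOW VARIABLES LIKE '%buffer%';
mysql> SHOW VARIABLES LIKE 'query_cache%';
mysql> SHOW STATUS LIKE 'Key_read%';
The last one reports Key_reads vs. Key_read_requests; if a large fraction of key read requests are having to go to disk, the key buffer is too small.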
DMZ, if it’ll help, I’ve got a WordPress/MySQL installation preconfigured with my webhost. If you want me to compare various config files, or whatnot, let me know.
At this level of headache, isn’t someone extremely well-funded supposed to ring our virtual doorbell and offer us $bazillion and take over the back end problems for us?
Yes!….in 1998. Ten years ago, you’d have sold out for hundreds of millions of dollars (most of it in stock that if you didn’t sell quickly would’ve been worthless paper by the middle of 2000). Today, well, you’re debugging MySQL indexing over breakfast. Welcome to the dotcom crater.
It’s hard to say anything specific about indexing without being elbow-deep in the back-end of USSM.
That’s very true. As a general rule of thumb (really general, so you guys who spend your days looking at query plans can just look away now, ok?) you want to index any column that shows up in the WHERE or ORDER BY clause of any frequently-executed query. My guess would be the most important one for your purposes would be whatever determines the “topic” of a post (so that, for example, we see comments on this thread in this page and not all the other topics and all the other comments for those topics). There may be some others, depending on how denormalized your tables are and what queries go into assembling a page (the comment timestamp or ordinal might be another one, since that appears to create the ordering of the comments on a page). In general indexes improve Select performance and degrade Insert/Update performance, but USSM sees vastly more reads than writes so that’s the right tradeoff (you don’t want to index everything, however, because you’re creating more work for MySQL and your server, and that’s already the bottleneck).
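As a purely illustrative example (I’m assuming the stock WordPress schema here, where comments live in wp_comments and the column that ties a comment to its post is comment_post_ID), adding an index is a one-liner:
-- index name and column choice are just an example
CREATE INDEX comment_post_date ON wp_comments (comment_post_ID, comment_date_gmt);
Whether that particular index actually helps depends on what the page-building query looks like, which is where EXPLAIN comes in.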
I’ll leave the details to the other people here who clearly know more about MySQL than I do. I’m more of a SQL Server guy (I do as much as possible in stored procs, and since MySQL didn’t have that feature until recently I never bothered to learn it).
And how much RAM is in this server?
#54: Generally true.
I’d add the proviso that the table has to be large enough for it to matter. Indexing forces you to make two reads: one for the index (very efficient), and another to fetch the actual row from the table. A table scan just reads the data directly in one pass. I join Oracle tables with thousands of records and the index doesn’t help; when I’m in the millions, it helps. It really all depends on the DBMS.
As for the inserts, I guess it’s true for USSM that there aren’t “many”. But, dropping the index and rebuilding might be worthwhile.
An 80MB mysql installation shouldn’t be a problem. Mine at Baseball Toaster is 650 MB right now, on a shared server, and it’s not an issue.
I think joser and some of the other comments are on the right track: you’ve most likely got some query that’s doing a table scan, and you need to build an index on that table.
Is the source code for WordPress available for inspection somewhere? If you pair up the queries that might be causing the trouble with some “SHOW INDEX FROM tablename” results, you might see where you’d need to build that missing index.
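For example, in the mysql shell (wp_comments is just a stand-in for whatever table the suspect query hits):
mysql> EXPLAIN SELECT * FROM wp_comments WHERE comment_post_ID = 1234 ORDER BY comment_date_gmt;
mysql> SHOW INDEX FROM wp_comments;
In the EXPLAIN output, a row with type ALL and nothing in the key column means a full table scan; SHOW INDEX lists the indexes that already exist, so you can see what’s missing.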
Email me at support at baseballtoaster.com if you have more questions.
Okay, so, I’m back.
Our box has 2 gigs of RAM on it. I don’t know what the values for buffer sizes are.
It’s holding up amazingly well today.
And w/r/t size – the issue isn’t that the USSM install is 80MB; it’s way larger than that. My point was that it’s been a little weird, while researching scaling, to find a reference to an “amazingly huge” 50MB comments table and realize ours is more than 50% larger than that.
And in general, I don’t think this is a mysql issue, though I’m still a little confused about why it’s not multi-threading, which would help.
33, 34 — while probably a good idea in the abstract sense, I doubt moving old threads out to flat files will make any real difference. People aren’t bringing USSM down by reading the archives — they’re bringing it down by refreshing continuously and commenting incessantly on the top thread of the day.
Looking at some of the links posted earlier, this mentions
and this says
Oops, the code in that second quote botched the HTML. Let’s try that last bit again:
and this says
WordPress’ post queries always ask for posts with post_date_gmt <= the current timestamp
That latter link offers a plugin to fix the issue (it sounds like it just substitutes a literal value for the 'now' variable) which might be worth a try, combined with turning on the qcache.
We suspect. I still haven’t found any good tools to monitor what’s actually going on within mysql that’s so brutal.
It’s a fairly good bet, though. Use of WP-Caching does seem to dramatically slow them down, though it also does seem that they’re still able to generate a new page if they’re annoying enough about it.
OK, I give up, clearly wordpress is not going to let me quote that (either because it is trying to prevent some kind of PHP injection attack, or because I’m effectively executing one, I’m not sure). At any rate, have a look at that link — there’s a plugin suggested that is supposed to fix the problem (apparently by substituting a literal for the ‘now’ variable) when combined with turning on the qcache.
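Roughly, the issue as I understand it (this is a sketch, not the exact query WordPress runs): every page view issues something along the lines of
SELECT * FROM wp_posts WHERE post_status = 'publish' AND post_date_gmt <= '2006-11-13 18:32:07';
and because that timestamp literal changes every second, no two requests look identical to MySQL’s query cache, so it almost never gets a hit. Pin the literal to something coarser, which is apparently what the plugin does, and consecutive requests become byte-for-byte identical, cacheable queries. Once the qcache is on, you can see whether it’s actually being used with SHOW STATUS LIKE 'Qcache%';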
Okay. Thanks.
That link is gold. If I wasn’t so depressed about the team, I’d be overjoyed.
vmstat for BSD:
Look for pi and po (same as si and so on Linux). Also, keep an eye on sr (scan rate – the number of times the system is scanning memory for pages which it can free up) – this is a direct indicator of a memory issue.
You’ll want to turn on the slow query log if it’s not set up that way already, but when the site is getting hammered, chances are it’ll over-report. The other thing that comes to mind is ‘show processlist;’, in the mysql shell tool, which will describe the queries it’s currently acting on. Usually this tells you what you’d expect, but occasionally you’ll learn something useful, and it will show you how long each active process has been hanging around.
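For reference, the slow query log is a couple of my.cnf lines (the log path here is just an example):
[my.cnf]
log-slow-queries = /var/log/mysql/slow.log
long_query_time = 2
That logs anything taking longer than two seconds. And ‘show full processlist;’ is worth knowing about too, since the plain version truncates long query text.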
Well… it may yet be, in conjunction with how WordPress works. With all the info posted so far, if you’re still seeing one CPU used WAY more than another I would STRONGLY suspect it’s one single process or query that’s the problem. Multithreading over SMPs only works if separate threads are generated. If it’s one single thread that never splits, the process will not utilize the other CPU.
Jimmie
model name : Intel(R) Xeon(TM) CPU 3.40GHz 🙂
I’ve seen some enormous MT sites, including a few we run at work, and I/O bottlenecks from rebuilds just aren’t a factor. That includes full rebuilds, which are only necessary when you need to make some sort of global template change; the incidental rebuilds as new entries are posted and comment counts get incremented don’t even amount to background noise.
You’re still somewhat dependent on a database for comments, but you’re no longer at its mercy to do simple things like loading the front page (or any article pages, if you put the comments off by themselves).
Really? That seems counter-intuitive.
Also: have you seen how smooth the site’s running today? Same volume as yesterday.
MySQL optimization strikes me as a total rathole vis-a-vis the fundamental question, which is: why is a read-heavy, write-light site built so that reads are expensive and writes are cheap? In other words, serving an article page with comments, which is by far your most common operation, should be your cheapest. It should never hit the database. You don’t have customized modules, dynamic content, ajax, or any other complexity. You’ve got a big hunk of text that doesn’t change until someone new comments, and yet you go through the work of rebuilding that page (identically) every time someone hits the article URL. Bad architecture.
Go with an architecture that writes articles to disk and let apache take care of serving it. PHP should only do work when there’s work to be done.
So that’s a vote for.. MT?
What is it that seems counterintuitive?
Well, the difference between MT and WordPress, as it’s always been explained to me, is that while WP is (to whatever degree) dynamically generated, MT has static pages to serve but for each change or comment has to regenerate that page. It seems like that operation would be more than a trivial drain on system resources.
The site has been running MUCH better today for me. Maybe enabling the query cache (and whatever else was done) hit the root of the problem.
How is it for everyone else?
I know! Isn’t it awesome?
Well, the difference between MT and WordPress, as it’s always been explained to me, is that while WP is (to whatever degree) dynamically generated, MT has static pages to serve but for each change or comment has to regenerate that page. It seems like that operation would be more than a trivial drain on system resources.
Except that WP has to do exactly the same work to regenerate the page, and it’s doing it for every request. A far better design for a middleware layer (call it X — WordPress, Movable Type, whatever):
1. Page is first requested
2. X queries the db and generates the page
3. The page is cached somewhere and Apache serves it
Now every subsequent user agent that requests the page gets the cached version from Apache, with no db hit or middleware work required
When a user submits a comment, the cached version is thrown out and a new page is regenerated to be sent down to the submitter. That page goes into the cache for all subsequent requests
I know nothing about MT or WP, but it sounds like MT does this the right way — the work is only done when there’s a write, not when there’s a read.
hey guys, ive had a few beers here, but since i do web dev (not networking) for a living ill chirp in my not thought-out thoughts: mysql optimization is a waste of time. it isnt going to save you any more resources in your time than hardware resources. if you dont have any bandwidth issues, if its all server performance, then just get another box and split the loads. you guys seem to be growing at a steady rate and pinching performance barely gets you by. my sugg is to just spring for a new box. with load balancing it should be fine. maybe get some cheap machines at re-pc and multiply that sh*t. – another donation round before the new season- its ok, we love your content, we dont like your outages!
My counter-argument to that is that on Day 1, USSM was essentially unusable under extreme load and on Day 2, after I turned on query caching, making some reasonable guesses about settings, it was perfectly usable all day long under the same request load.
If I can do something like that and it saves us from having to buy another $2000 server, that’s worth doing.
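For anyone following along at home, “turning on query caching” boils down to a few my.cnf lines. These particular numbers are illustrative guesses for a 2GB box, not necessarily what we’re running:
[my.cnf]
query_cache_type = 1
query_cache_size = 32M
query_cache_limit = 1M
query_cache_type = 1 caches every cacheable SELECT, query_cache_size is the memory set aside for cached result sets, and query_cache_limit keeps any single oversized result from crowding everything else out.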
sid/#71:
I was thinking about a possible approach along those lines: limit the work the DB does and put the strain on Apache instead.
Store each comment as its own flat file. The database holds the name of the file, but not the comment itself. When the application builds a page, it goes to the database to get the file names for that page’s comments, and pulls them in as a series of server-side includes.
I did this for my archived site here:
http://www.tangotiger.net/archives/stud0141.shtml
Each of those comments you see on that webpage is stored as its own file. If you were to see the actual source code for the page, it would be almost entirely a series of include statements.
#80: Whether the client receives their comments via a separately included flat file or via inclusion into the original HTML via the database doesn’t really seem like it should make a difference.
All a database is is a bunch of flat files with a bunch of double hash tables on top of it for quicker searching, sorting, etc. So either way, you’re still reading the comment from disk and therefore hitting your main bottleneck — disk seek time and disk I/O. Except now you’re excluding yourself from nice things like query caching and stuff, which can minimize disk usage.
The only time this would be applicable is if you were doing an insane amount of writes on MyISAM (or similar) tables, which lock the entire table whenever it’s being written to, meaning no one can read from the table easily. And in that case, I’d just say go to a better table type, like InnoDB. =)
Unless you’re storing your comments in CLOBs inside MySQL. In my experience those are f*&^ed up performance-wise and need some fixing; in that case, I’d say yeah, this would be a good workaround until those bugs in MySQL are fixed.
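If table locking ever did turn out to be the culprit, the switch is a one-liner per table (wp_comments is just the obvious candidate; note it rewrites the whole table, so do it during a quiet stretch):
ALTER TABLE wp_comments ENGINE = InnoDB;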
#80: the work the DB does is almost totally in scanning the tables or indices for matching rows and merging them for joins. Once it’s got them, actually pumping out the data is negligible. Since Derek said the CPU was spiking we can assume that’s the bottleneck rather than I/O. The problem is that it’s doing the same expensive joins over and over again on every hit. It sounds like DMZ put some caching in the DB layer to prevent excessive querying, so that’s good. But I persist in the belief that any site that executes redundant queries on every page request is cruising for growing pains, and should be moved over to a page-building model. An easy first step might be dropping in mod_cache, which could cache your pages rather than rebuilding them every time. The problem is you have to figure out the expiration model so the pages are rebuilt after someone comments, and I have no idea whether mod_cache supports explicit expirations.
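For what it’s worth, a minimal mod_cache setup is only a few lines of httpd.conf (this assumes Apache 2.x with mod_cache and mod_disk_cache loaded, and the path and expiry times are just examples):
CacheEnable disk /
CacheRoot /var/cache/apache/pages
CacheDefaultExpire 60
CacheMaxExpire 300
As far as I can tell, expiry there is purely time-based, which is exactly the catch above: out of the box there’s no clean way to say “throw this page out because someone just commented,” so you’d be trading freshness for load.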