Site under volume
This is one of those “request for thoughts” posts that’s only going to be of interest if you’re wonky into servers/etc/Wordpress/MT/etc and want to talk about things like caching through .htaccess and mysql optimization. Sooooo I’m just going to throw the more tag up now
Whew. Thinned out that readership pretty quick.
We got fairly well brutalized, as you saw if you were here today. Our server load today averaged – averaged – 40 for about eight hours. 40. On a brand new dual proc box donations bought us. We took 200k+ hits today in a few hours with the Soriano rumor. Generating a non-cached page (like on the backend, for us authors) took 30-60s if the database connection didn’t time out.
I am not shitting you. With WP-cache on and everything. Essentially it was all mysqld, too – one proc was almost entirely idle while mysql thrashed its disk (disk was ~30% busy)(no, really). I’m increasingly convinced Wordpress can’t scale – that once you’re as large as we are, you’re SOL because it thrashes so hard generating pages.
So! I’m about ready to hit the eject button and port to Movable Type unless we can figure a way to get Wordpress to stay up under load.
But here’s my call – is there a fix? Am I really going to reach past the increasingly weird .htaccess mods I’ve been pulling off and into mysql optimization? Is there possibly some crazy bug?
Or is it MT or bust?
DMZ


I suspect you’ve seen this article on scaling via load balancing multiple machines. But just in case you haven’t…. there it is.
Excuse me, sir. Seeing as how the V.P. is such a V.I.P., shouldn’t we keep the P.C. on the Q.T.? ‘Cause if it leaks to the V.C. he could end up M.I.A., and then we’d all be put out in K.P.
I imagine this is a Dell server. What type of RAID back-end are you using? How many spindles?
My profession is enterprise disk storage, so I may be able to help there. Do you know what kind of seek times you’re seeing on the disk?
What base OS are you running? It would seem that MySQL, at least on your system isn’t threading across multiple processors. I don’t think that’s a MySQL limitation, as I’ve seen it scale across procs before. Which MySQL storage engine are you using? Table types?
I’d be happy to supply that, but I don’t know how I’d find that stuff out. We’re running BSD. No RAID: there’s a disk, and there’s the disk I back stuff up on to. We’re a tiny shop.
As a baseline for my level of tech knowledge, I’m able to monitor the site via top/systat and I’m not horrible with SQL, but it’s slow going.
And TIF — WTF are you talking about? I’m baffled.
I’ve heard WordPress doesn’t set up good database indexes. I don’t know if that’s still true, but if so, that’s your problem. Any way to log the SQL statements that take the most time? If you can get that, building good indexes should be a piece of cake.
I’ve got access to the DB, if there’s a way to log I’ll go do it.
Please, keep talking about computers, its turning me on
Most likely the data being rendered for the site is coming out of memory (database caching), so the disks shouldn’t be a huge issue other than writes for comments and occasional reads for older threads, *unless* you have insufficient memory and it’s paging to disk. Don’t misunderstand me – you would get better performance out of a RAID set, but if the picture you posted when you purchased was what I thought (a Dell 1850 or whatever they’re calling that line now), you only have room for two disks and RAID 0 isn’t all that great (performance better, reliability worse by introducing two possible points of failure for the same data).
I’d check your memory performance using vmstat, paying attention to “swap -> so” (pages of memory swapped to disk) and “swap -> si” (pages of memory swapped in from disk). I usually like a 3 second interval with 200 or so iterations.
Another issue would be why MySQL is only using one processor. I’d check to make sure your version of Linux (assume this is what you’re running) is running a multiprocessor kernel. Linux has to be able to use both of them in order for MySQL to use both.
If your MySQL processes are thrashing, then brasten is almost certainly right–whatever query it is that’s fetching comments is probably doing a full-table-scan or something like it, and once you get several thousand of those going it doesn’t matter how new your box is. You might try rebuilding the indexes by doing an analyze (article here: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html) and see if that helps you any.
Also found this on the MySQL site:
MySQL Platform Selection
BSD.
Well, now I don’t have good stats of course, since we’re way, way off load.
vmstat doesn’t show me those two things – is there a flag I’m missing?
So the natural next question is – is there a good way to monitor mysql performance and see what’s thrashing it so badly?
TIF was quoting Robin Williams from Good Morning, Vietnam. Sort of mocking all the tech talk TLAs.
Which flavour of BSD, DMZ? Nevermind, I see it’s FreeBSD. I’m not as savvy with FreeBSD as I am with Linux (it’s been ~8 years since I’ve even had it installed anywhere), but are you certain you’ve installed the SMP kernel? I know in Linux you have to explicitly request the SMP kernel during install, and a quick Google on FreeBSD dual-CPU suggests that the default kernel in FreeBSD isn’t SMP aware, either. As Jerry mentioned, MySQL should be using both CPUs, and it would appear it isn’t. From my quick Google search:
I would start there, assuming you’ve not done so already.
Jimmie the Ducati 1098 loving Geek
I just finished watching Good Morning, Vietnam for the first time ever and thus I can shed a little light on TIF’s comment above – it’s a Robin William’s quote from the movie
I’ve been looking into improving WP speeds as well lately, but without using WP-cache as I’ve yet to see that make any significant improvements. Honest to goodness, I’m surprised you’ve still got WP running with 200k hits.
I recently found this article which talks about significantly improving WP’s speed by using something called APC, which worked great once he tweaked it a little (he gives instructions for his tweaks). It looks promising, but I can’t vouch for it personally as I haven’t tried it yet.
I’m not well versed on the backend stuff, but I swear I’m going to get WP to work flawlessly on a large site one of these days!
Is there an easy way to see if that’s already been done? I didn’t build the kernel myself, and I’m leary of tampering with it unless I have to (I’ve been trying to find this through reading documentation and I haven’t gotten anywhere)
Just to remind you again, if you switch to MT, you’re going to have the CPU spending all its time rebuilding the pages, rather than PHP. Yes, you can do PHP with MT, but it’s just like running WP. And I’m starting to think PHP is a dead-end, anyway. I’ve never liked its scalability compared to other server-side languages (ASP, JSP).
I think everyone is on the right track, tho — improve MySQL’s performance and everything should be easier. I think it’s a config issue.
I know there’s a MySQL monitor out there. Just need to remember where it is.
DMZ, type ‘dmesg’ at the command prompt (no quotes) and look for something like this:
If the output of dmesg is too much to scroll back, you can try:
dmesg | grep -i smp
or
dmesg | grep cpu
and see what shows up.
Jimmie
I found a couple – mytop, mtop… if I have time, I’m going to look into that tomorrow. We need to pay off a tech wonk. Or, I’m going to nut up, learn all this and put it on my resume if I go back into the IT workplace.
Sweet! Notable lines from dmesg:
* OH YEAH
Great news, then all you have to worry about is tuning MySQL. Since I know exactly FA about databases, I’ll shut up now, other than tossing this link out there for you:
http://wikitest.freebsd.org/MySQL
Good luck, sorry I couldn’t help out.
Jimmie
Wheeeeeeeeeee! I can already see that I am going to be having some fun during the rest of the winter meetings.
I was idly musing earlier about what it would take to write a an AJAX app specifically for serving game threads (which would apply in these winter meeting rumor threads too)… the client keeps track of the most recent comment it has displayed, and passes in id for it back on the request, so the server only has to push down the incremental changes. You could even run the client on a timer so it would fetch automatically say, once a minute (which might stop people from hitting the refresh button obsessively). Kind of like how finance.google does its stock quotes. Hmmmm….
Anyway, it does sound like your MySQL install is out of whack. I’ve never been particularly impressed with MySQL but it should be able to use both processors at the very least (and your webserver should be making use of both of them as well). And as someone else said, I wouldn’t be surprised if you’ve got some indexing issues. Doesn’t Wordpress document this stuff?
Wordpress’ documentation and support in general is…
… inconsistent?
A few thoughts from a professional software engineer:
-Do you administer your MySql database yourself? Is MySql’s query cache properly configured? Do you have your indexes designed properly for all tables? (Improper indexes can degrade performance by, oh, three orders of magnitude)
-WordPress sucks. It just doesn’t scale well; sorry dude. One thing that would probably make a huge difference, though, is to implement a paging mod that serves up comments in those long threads at 25 per page instead of 150+ in one pop. That way old pages in a post get cached and aren’t getting dynamically served up every single time; only at most the most recent 25 posts. That’d lower your MySql bandwidth by, oh, at least 84% or so.
-More later I’m late this morning…
Paged comments plugin:
http://www.keyvan.net/code/paged-comments/
I think there’s another one someone else wrote too.
DMZ, if you are looking to get out of WordPress, let me offer you a suggestion–Expression Engine (which is what my site uses, as does BBTF):
http://www.pmachine.com/ee/pricing_personal/
I know BattersBox, which probably is on the same scale as USSM, uses GeekLog.
***
I can’t help you with your box, and not really sure why you aren’t going with a Web Host. On my site, I get 2,000 GB of bandwidth a month for the basic plan, and I only use 2 GB. I’ll guess you generate up to 100(?) GB a month, which puts you well within the 2,000 GB level.
On the other hand, getting your hands dirty is a great way to get your resume up-to-date. It could turn out to be a huge positive at the end of the day.
This isn’t a bandwidth issue, it’s an issue of the software on the machine the site is hosted on. Your site is shared on a box with a hundred others. USSM is on its own dedicated machine.
w/r/t paged comments:
the plugin doesn’t work. We’ve tried it. It blows up. I don’t know why, and I haven’t investigated thoroughly, but it doesn’t.
In order – sort of!
No idea!
No idea!
Interesting – the only issues I see in Wordpress tables:
in the user table:
UNIQUE and INDEX keys should not both be set for column `user_login`
in the options_values:
UNIQUE and INDEX keys should not both be set for column `option_id`
… and that appears to be a PHP myAdmin problem, rather than an actual db issue.
Though maybe not on the user table.
So here’s the scariest thread of the last ~12h or so: in reading the Wordpress scalability information I’ve found, I keep coming across stuff like “hey, can WP handle lots of comments?” “Sure, why I’ve seen comment databases as large as 50 megs!” “Wow, that’s amazing!” (I look, we’re at 80)
Would it make sense to prune threads that are more than a week old to flat files? It seems to me that few new comments would be added anyway and it would be a huge reduction in database size.
That’s an interesting idea. I have no idea how I’d do that.
8:30 and load’s to 3. Wheeeeeeeee.
Either do I, but it’s the kind of thing I imagine someone’s done before and as such might already be a WordPress mod somewhere.
If it is I haven’t come across it yet.
Fun MySQL –
when I run ‘mysqladmin extended’ to look at some of this stuff, I don’t see any values for the qcache stuff at all – no hits, no misses, nothing. Does that mean query caching’s turned off? Really?
#31:
That tells me that the indexes are probably just set to the primary key of each table. It’s redundant because the primary key is indexed anyway. It’s bad because it means there aren’t any actual indexes set up on the table, and if anywhere in your web-app you’re doing a “SELECT {yada} FROM {yada} WHERE {uh oh}”, you’ll have a performance hit if your WHERE clause is referencing a column that’s not the primary key because nothing else is indexed.
This is a book I recommend, with a free chapter online
I just cranked query caching on. We’ll see how that does.
#28: yes, I know.
The first part of my post made a suggestion of getting out of WordPress and onto something that will scale.
My second part was asking why is he on a dedicated server, since a shared server may offer better pros than cons.
#33: Nah, MySql should have no problem with 80 megs if it’s set up correctly. There are websites online that get tens of millions of hits a day and use MySql (obviously with better hardware). Slashsot.org is one… and they get so much traffic that any website it links to immediately gets “slashdotted,” or in other words, melted into a chaotic pile of smoldering plastic.
#37: Yeah, that’s bad. The query cache is really important for performance. Get that puppy turned on!
Well, disk busy dropped through the floor immediately, that’s something.
Your post discussed bandwidth and suggested getting a web host. The problems we’ve been experiencing aren’t bandwidth related. I’m sure USSM has a web host, they just have their own dedicated machine at that host. Adding more sites to the machine they’re on wouldn’t help anything.
I’m pretty sure we’re talking about the sizes of WordPress databases, not MySQL databases. There are many, many, many MySQL databases bigger than 50 megs, obviously. I’m not sure how many WordPress sites are chugging along with databases this big, and that’s the issue.
In the user table, I show three indexes:
name, type, field
–
primary, primary, (ID)
user_login, unique, (user_login)
user_login_key, index, (user_login)
boy, why would they set it up that way? that’s weird.
I’m going to let the new, query-caching beast run for a minute and get some breakfast.
It’s hard to say anything specific about indexing without being elbow-deep in the back-end of USSM. It’s kind of heavy on the propeller-head end of things. You may not need to bother, though. The difference between having a query cache and no query cache at all is pretty darn big — it might be enough, especially if the bottleneck was disk I/O.
I didn’t even manage to get to breakfast before traffic spiked, load went over 20, and I turned WP-cache back on. Jeeeeeeeez.
At this level of headache, isn’t someone extremely well-funded supposed to ring our virtual doorbell and offer us $bazillion and take over the back end problems for us?
#43: Brian, I already offered a possible solution to the software issue, and that’s to change software (WordPress to EE). That was the first part of my post, and directly related to the issue at hand.
The second part was an aside, and unrelated to the issue at hand. Ok, so he has a single dedicated machine, which sounds like he’s responsible for, for maintaining. He bought it. He’s not leasing it directly from the webhost. I was asking why have such an arrangement.
And it may be related to the issue at hand, since some webhosts offer WordPress preconfigured, and will administer it themselves. Not to mention doing disaster recovery, load balancing, etc.
Can we, for purposes of this discussion, assume that the scope of the question is limited to how best to run USSM on the resources at hand?
When the server was purchased, DMZ did mention that the break-even point for purchasing over leasing was something as short as 8 months, so owning their own server makes a lot of sense.
How much RAM is in the db server? A good solution is to add more, and then increase the MySQL buffer sizes. Some keys you might want to look at are “read_buffer_size”, “sort_buffer_size”, and “key_buffer_size”. When these are bigger, MySQL will hit the disk less often.
This is what I’ve got for a server with 2GB RAM. Please keep in mind that I (more or less) have no idea what I’m doing, so YMMV.
[my.cnf]
set-variable = read_buffer_size=1M
set-variable = sort_buffer_size=8M
#
# Set key_buffer to 5 – 50% of your RAM depending on how much
# you use MyISAM tables, but keep key_buffer_size + InnoDB
# buffer pool size
DMZ, if it’ll help, I’ve got a WordPress/MySQL installation preconfigured with my webhost. If you want me to compare various config files, or whatnot, let me know.
At this level of headache, isn’t someone extremely well-funded supposed to ring our virtual doorbell and offer us $bazillion and take over the back end problems for us?
Yes!….in 1998. Ten years ago, you’d have sold out for hundreds of millions of dollars (most of it in stock that if you didn’t sell quickly would’ve been worthless paper by the middle of 2000). Today, well, you’re debugging MySQL indexing over breakfast. Welcome to the dotcom crater.
It’s hard to say anything specific about indexing without being elbow-deep in the back-end of USSM.
That’s very true. As a general rule of thumb (really general, so you guys who spend your days looking at query plans can just look away now, ok?) you want to index any column that shows up in the WHERE or ORDER BY clause of any frequently-executed query. My guess would be the most important one for your purposes would be whatever determines the “topic” of a post (so that, for example, we see comments on this thread in this page and not all the other topics and all the other comments for those topics). There may be some others, depending on how denormalized your tables are and what queries go into assembling a page (the comment timestamp or ordinal might be another one, since that appears to create the ordering of the comments on a page). In general indexes improve Select performance and degrade Insert/Update performance, but USSM sees vastly more reads than writes so that’s the right tradeoff (you don’t want to index everything, however, because you’re creating more work for MySQL and your server, and that’s already the bottleneck).
I’ll leave the details to the other people here who clearly know more of MySQL than I do. I’m more of a SQL Server guy (I do everything as stored procs as much as possible, and since MySQL didn’t have that feature until recently I never bothered to learn it).
And how much RAM is in this server?
#54: Generally true.
I’d add a provision that the table should be large enough. Indexing forces you to make two reads, one for the index (very efficient), and the other to access the data in the table. A table scan will do the two at the same time. I join Oracle tables with thousands of records, and the index doesn’t help. When I’m in the millions, it helps. It really all depends on the DBMS.
As for the inserts, I guess it’s true for USSM that there aren’t “many”. But, dropping the index and rebuilding might be worthwhile.
An 80MB mysql installation shouldn’t be a problem. Mine at Baseball Toaster is 650 MB right now, on a shared server, and it’s not an issue.
I think joser and some of the other comments are on the right track: you’ve most likely got some query that’s doing a table scan, and you need to build an index on that table.
Is the source code for WordPress available for inspection somewhere? If you pair up the queries that might be causing the trouble with some “SHOW INDEXES ON tablename” results, you might see where you’d need to build that missing index.
Email me at support at baseballtoaster.com if you have more questions.
Okay, so, I’m back.
Our box has 2 gigs of RAM on it. I don’t know what the values for buffer sizes are.
It’s holding up amazingly well today.
And w/r/t size – the issue isn’t that the USSM install is 80mb, it’s way larger than that. My point was that it’s been a little weird, while researching scaling, to find some reference to an amazingly huge comments table of 50mb and realize we’re over 50% larger than that.
And in general, I don’t think this is a mysql issue, though I’m still a little confused about why it’s not multi-threading, which would help.
33, 34 — while probably a good idea in the abstract sense, I doubt putting old threads to flat files will make any real difference. People aren’t bringing USSM down by reading the archives — they’re bringing it down by refreshing continuously and commenting incessantly on the top thread of the day.
Looking at some of the links posted earlier, this mentions
and this says
Oops, the code in that second quote botched the HTML. Let’s try that last bit again:
and this says
WordPress’ post queries always ask for posts with post_date_gmtThat latter link offers a plugin to fix the issue (it sounds like it just substitutes a literal value for the 'now' variable) which might be worth a try, combined with turning on the qcache.
We suspect. I still haven’t found any good tools to monitor what’s actually going on within mysql that’s so brutal.
It’s a fairly good bet, though. Use of WP-Caching does seem to dramatically slow them down, though it also does seem that they’re still able to generate a new page if they’re annoying enough about it.
OK, I give up, clearly wordpress is not going to let me quote that (either because it is trying to prevent some kind of PHP injection attack, or because I’m effectively executing one, I’m not sure). At any rate, have a look at that link — there’s a plugin suggested that is supposed to fix the problem (apparently by substituting a literal for the ‘now’ variable) when combined with turning on the qcache.
Okay. Thanks.
That link is gold. If I wasn’t so depressed about the team, I’d be overjoyed.
vmstat for BSD:
Look for pi and po (same as si and so on Linux). Also, keep an eye on sr (scan rate – the number of times the system is scanning memory for pages which it can free up) – this is a direct indicator of a memory issue.
You’ll want to turn on the slow query log if it’s not set up that way already, but when the site is getting hammered, chances are it’ll over-report. The other thing that comes to mind is ’show processlist;’, in the mysql shell tool, which will describe the queries it’s currently acting on. Usually this tells you what you’d expect, but occasionally you’ll learn something useful, and it will show you how long each active process has been hanging around.
Well… it may yet be, in conjunction with how Wordpress works. With all the info posted so far, if you’re still seeing one CPU used WAY more than another I would STRONGLY suspect it’s one single process or query that’s the problem. Multithreading over SMPs only works of separate threads are generated. If it’s one single thread that never splits, the process will not utilize the other CPU.
Jimmie
model name : Intel(R) Xeon(TM) CPU 3.40GHz
I’ve seen some enormous MT sites, including a few we run at work, and I/O bottlenecks from rebuilds just aren’t a factor. That includes full rebuilds, which are only necessary when you need to make some sort of global template change; the incidental rebuilds as new entries are posted and comment counts get incremented don’t even amount to background noise.
You’re still somewhat dependent on a database for comments, but you’re no longer at its mercy to do simple things like loading the front page (or any article pages, if you put the comments off by themselves).
Really? That seems counter-intuitive.
Also: have you seen how smooth the site’s running today? Same volume as yesterday.
MySQL optimization strikes me as a total rathole vis-a-vis the fundamental question, which is: why is a read-heavy, write-light site optimized for read-expensive, write-cheap? In other words, serving an article page with comments, which is by far your most common operation, should be your cheapest. It should never hit the database. You don’t have customized modules, dynamic content, ajax, or any other complexity. You’ve got a big hunk of text that doesn’t change until someone new comments, and yet you go through the work of rebuilding that page (identically) every time someone hits the article URL. Bad architecture.
Go with an architecture that writes articles to disk and let apache take care of serving it. PHP should only do work when there’s work to be done.
So that’s a vote for.. MT?
What is it that seems counterintuitive?
Well, the difference between MT and Wordpress, as it’s always been explained to me, is that while WP is (to whatever degree) dynamically generated, MT has static pages to serve but for each change or comment has to regenerate that page. It seems like that operation would be more than a trivial drain on system resources.
The site has been running MUCH better today for me. Maybe enabling the query cache (and whatever else was done) hit the root of the problem.
How is it for everyone else?
I know! Isn’t it awesome?
Well, the difference between MT and Wordpress, as it’s always been explained to me, is that while WP is (to whatever degree) dynamically generated, MT has static pages to serve but for each change or comment has to regenerate that page. It seems like that operation would be more than a trivial drain on system resources.
Except that WP has to do exactly the same work to regenerate the page, and it’s doing it for every request. A far better design for a middleware layer (call it X — Wordpress, Movable Type, whatever):
1. Page is first requested
2. X queries the db and generates the page
3. The page is cached somewhere and Apache serves it
Now every subsequent user agent that requests the page gets the cached version from Apache, with no db hit or middleware work required
When a user submits a comment, the cached version is thrown out and a new page is regenerated to be sent down to the submitter. That page goes into the cache for all subsequent requests
I know nothing about MT or WP, but it sounds like MT does this the right way — the work is only done when there’s a write, not when there’s a read.
hey guys, ive had a few beers here, but since i do web dev (not networking) for a living ill chirp in my not thought-out thoughts: mysql optimization is a waste of time. it isnt going to save you any more resources in your time than hardware resources. if you dont have any bandwidth issues, if its all server performance, then just get another box and split the loads. you guys seem to be growing at a steady rate and pinching performance barely gets you buy. my sugg is to just spring for a new box. with load balancing it should be fine. maybe get some cheap machines at re-pc and multiply that sh*t. – another donation round before the new season- its ok, we love your content, we dont like your outages!
My counter-argument to that is that on Day 1, USSM was essentially unusable under extreme load and on Day 2, after I turned on query caching, making some reasonable guesses about settings, it was perfectly usable all day long under the same request load.
If I can do something like that and it saves us from having to buy another $2000 server, that’s worth doing.
sid/#71:
I was thinking about a possible idea on that in terms of limiting the DB doing work, and putting the strain on Apache instead.
Store each comment as its own flat file. The database will hold the name of the file, but not the comment itself. When the application requests a page, it goes to the database to get the names of the files for the comments, and it will request it as a series of server-side includes.
I did this for my archived site here:
http://www.tangotiger.net/archives/stud0141.shtml
Each of those comments you see on that webpage is stored as its own file. If you were to see the actual source code for the page, it would be almost entirely a series of include statements.
#80: Whether the client receives their comments via a separately included flat file or via inclusion into the original HTML via the database doesn’t really seem like it should make a difference.
All a database is is a bunch of flat files with a bunch of double hash tables on top of it for quicker searching, sorting, etc. So either way, you’re still reading the comment from disk and therefore hitting your main bottleneck — disk seek time and disk I/O. Except now your excluding yourself from nice things like query caching and stuff, which can minimize disk usage.
The only time that this would be applicable was if you were doing an insane amount of writes on myISAM (or similar) tables, which lock the entire table whenever it’s getting written to, meaning no one could read from the table easily. And in this case, I’d just say go to a better table style, like InnoDB. =)
Unless you’re storing your comments in CLOBs inside MySql. In my experience they’re f*&^ed up performance-wise and need some fixing. In this case, I’d say yeah, this would be a good workaround until those bugs in MySql are fixed.
#80: the work the DB does is almost totally in scanning the tables or indices for matching rows and merging them for joins. Once it’s got them, actually pumping out the data is negligible. Since Derek said the CPU was spiking we can assume that’s the bottleneck rather than I/O. The problem is that it’s doing the same expensive joins over and over again on every hit. It sounds like DMZ put some caching in the DB layer to prevent excessive querying, so that’s good. But I persist in the belief that any site that executes redundant queries on every page request is cruising for growing pains, and should be moved over to pagebuilding model. An easy first step might be dropping in mod_cache, which could cache your pages rather than rebuild. The problem is you have to figure out the expiration model so the pages are rebuilt after someone comments, and I have no idea whether mod_cache supports explicit expirations.