Tactics for Using PHP in a High-Load Site

No two sites are alike. You really need to get a tool like JMeter and benchmark your site to see where the problem points will be. You can spend a lot of time guessing and improving, but you won't see real results until you measure and compare your changes.

For example, for many years, the MySQL query cache was the solution to all of our performance problems. If your site was slow, MySQL experts suggested turning the query cache on. It turns out that if you have a high write load, the cache is actually crippling, because every write invalidates the cached results for the affected tables. If you turned it on without testing, you'd never know.

And don't forget that you are never done scaling. A site that handles 10 req/s will need changes to support 1000 req/s. And if you're lucky enough to need to support 10,000 req/s, your architecture will probably look completely different as well.

Databases

  • Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries (see the sketch after this list). It's smart enough to use server-side prepares and other optimizations for you as well.
  • You probably don't want to break your database up at this point. If you do find that one database isn't cutting it, there are several techniques to scale up, depending on your app. Replicating to additional servers typically works well if you have more reads than writes. Sharding is a technique for splitting your data over many machines.
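
A minimal sketch of PDO with placeholders; the DSN, credentials, and the articles table are assumptions for illustration:

<?php
// Minimal PDO sketch; connection details and schema are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Placeholders keep user input out of the SQL string entirely.
$authorId = 42; // e.g. from validated request input
$stmt = $pdo->prepare('SELECT id, title FROM articles WHERE author_id = ?');
$stmt->execute([$authorId]);
$articles = $stmt->fetchAll(PDO::FETCH_ASSOC);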

Caching

  • You probably don't want to cache in your database. The database is typically your bottleneck, so adding more I/O to it is usually a bad thing. There are several PHP caches out there, such as APC and Zend, that accomplish similar things.
  • Measure your system with caching on and off. I bet your cache is heavier than serving the pages straight.
  • If it takes a long time to build your comments and article data from the db, integrate memcache into your system. You can cache the query results in a memcached instance. It's important to remember that retrieving the data from memcache must be faster than assembling it from the database for you to see any benefit.
  • If your articles aren't dynamic, or only have simple dynamic changes after they're generated, consider writing out HTML or PHP to the disk. You could have an index.php page that looks on disk for the article: if it's there, it streams it to the client; if it isn't, it generates the article, writes it to the disk, and sends it to the client. Deleting files from the disk would cause pages to be re-written. If a comment is added to an article, delete the cached copy -- it will be regenerated. A sketch of this pattern follows the list.
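
A minimal sketch of that disk-cache pattern; the cache path and the render_article() helper are hypothetical:

<?php
// Sketch of the disk-cache pattern above; paths and render_article() are assumptions.
$slug = basename($_GET['article'] ?? '');        // strip any path components
$cacheFile = __DIR__ . '/cache/' . $slug . '.html';

if (is_file($cacheFile)) {
    readfile($cacheFile);                        // cache hit: stream straight from disk
} else {
    $html = render_article($slug);               // assumed helper that builds the page
    file_put_contents($cacheFile, $html);        // save it for the next request
    echo $html;
}
// When a comment is added, unlink($cacheFile) so the next request regenerates it.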

Using PHP for medium to high-load public websites

PHP works fine for just about any size server. The question isn't really the programming language but the infrastructure you set up. 1000-5000 users is not very many unless they are all banging on the site at the same time. Are they doing a lot of DB queries or consuming a lot of CPU resources? If so, then you may want to look at a dedicated MySQL server for the DB queries.

I have nothing against frameworks, but you are usually shoehorning your problem into their solution. Careful design on your part, with common routines and so on, is usually just as good as a framework in my opinion. However, some people are more comfortable working within a framework because it removes some of the plumbing issues.

A lot of large sites use PHP. It may not be obvious because they hide the extension of the scripts in the URLs.

What are the best practices for handling high-load file I/O?

If the requirement is for the client to interface with a CSV file, you don't need to actually use the CSV file as the datastore. Instead, use a database, do all your work in a database, and let PHP generate the CSV file on demand.

So, if a client needs to access http://example.com/SUBSCRIBERS.CSV, just have PHP handle SUBSCRIBERS.CSV and use something like:

header("Content-type: text/csv");
$data = get_subscriber_data();
foreach ($data as $row) {
// $row is an array of columns
print implode(',', $row);
}

Why is PHP apt for high-traffic websites?

What you'll usually find is that PHP is not as slow as you think. The reason a lot of sites are slow is that the hosts are overloaded.

But one primary benefit of PHP over a compiled language is ease of maintenance. Because PHP is designed from the ground up for HTTP traffic, there's less to build than with most compiled languages. Plus, merging in changes becomes easier, as you don't need to recompile and restart the server (as you would with a compiled binary)...

I've done a considerable number of benchmarks on both, and for anywhere under about 50k requests per second (based upon my numbers) there really isn't a significant gain to using a compiled binary (FastCGI). Sure, it's a little faster using compiled C, but unless you're talking Facebook-level traffic, that's not really going to mean significant $$$. And it's definitely not going to offset the relatively rapid rate of development that PHP affords in comparison to C (which more than likely will require many times the code, since it's not memory-managed)...

PHP, if properly written, can be quite scalable. The limiting factors are typically in your database engine, and that's going to be a common factor no matter what technology you use...

What would be considered average or high load for a website using MySQL on the backend?

More than the number of queries, what probably matters is what they do, and how: if you have millions of rows in your tables and are not using the right indexes, your server will fall over... If your queries are ultra-optimized, with the right indexes, and/or you don't have much data, your server will live.

You might want to use EXPLAIN on the queries that run most often, to see if at least those are optimized and using indexes ;-)
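
For instance, a hypothetical check through PDO (the table, column, and connection details are assumptions):

<?php
// Does the hot articles query use an index? Run EXPLAIN and inspect the plan.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$plan = $pdo->query('EXPLAIN SELECT * FROM articles WHERE author_id = 42')
            ->fetchAll(PDO::FETCH_ASSOC);
foreach ($plan as $row) {
    // "key" shows the index MySQL chose; an empty value means a full table scan
    echo $row['table'], ': key=', $row['key'], ', rows=', $row['rows'], "\n";
}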

Then, you will probably want to add some kind of caching mechanism, like APC or memcached; at least if you can...

For instance, the lists of states and countries probably never change: they could be cached, so you don't hit the database thousands of times, but just, say, once a day or once an hour. A sketch with memcached follows.
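
A minimal memcached sketch; the key name, one-hour TTL, and connection details are arbitrary assumptions:

<?php
// Cache the rarely-changing country list so most requests skip the database.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$countries = $mc->get('countries');
if ($countries === false) {
    // Cache miss: query once, then serve from memory for the next hour.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
    $countries = $pdo->query('SELECT id, name FROM countries ORDER BY name')
                     ->fetchAll(PDO::FETCH_ASSOC);
    $mc->set('countries', $countries, 3600);
}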

Best practices for withstanding launch day traffic burst

To prepare for a traffic spike (or peak), I would first determine whether you are ready through some simple performance testing with something like JMeter.

It is easy to set up and get started with, and it will give you early metrics on whether you will handle the expected peak load.

However, given your time constraints, other steps to take would be to prepare static versions of the content that will attract the most attention (such as press releases on your launch day). Also ensure that you are making the best use of client-side caching (one fewer request to your server can make all the difference); a sketch of the relevant headers follows. The web is already designed for extremely high scalability, and effective use of content caching is your best friend in these situations.
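
A minimal sketch of the caching headers a PHP page can send; the one-hour lifetime is an arbitrary assumption:

<?php
// Tell browsers and proxies they may reuse this response for an hour.
header('Cache-Control: public, max-age=3600');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');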

When things calm down, there is an excellent podcast on high scalability on Software Engineering Radio about the design of the new Guardian website.

Good luck on the launch.


