Best File System for Serving 1 GB Files Using Nginx Under Moderate Write Load, Read-Performance-Wise

Normal disk read/write values for an Ubuntu server

How would you determine a maximum value at which to set an alarm on the I/O activity of an Ubuntu/Linux server hosting up to 4 sites running Apache, MySQL, and up to 4 Tomcats?

Set it at the value where the expected cost of the problems you're alarming about exceeds the cost of having to pay attention to the alarm.

What number is that? That depends on a lot of things, including:

Which problems are you trying to avoid?

Do you worry about performance? If so, do you worry more about latency or throughput? How's the tradeoff between interactive and batch-job performance?

Do you worry about wear-and-tear and the lifespan of the media? Do you worry about how often you have to restore backups?

Do you worry about the price of the disks? How much value is better disks going to bring to your operation?

How much can the writes be deferred? How much reading is preventable through caching? How lax can you be with respect to isolation (the I in ACID)?

If you really want the best disk for your situation, these are some of the questions you probably want to ask yourself. In your position, I'd probably pick a random disk from the low to lower-mid price range and see how it works out. Then you'll have experience to learn from, so you know what to do differently next time (if anything), and it won't have cost you much.
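If you want a concrete starting point for the alarm itself, here is a minimal sketch of the kind of check you could put behind it. It samples /proc/diskstats the same way iostat derives %util; the device name and the 80% threshold are illustrative assumptions, not recommendations:

    import time

    DEVICE = "sda"        # hypothetical device name; check /proc/diskstats on your box
    THRESHOLD = 80.0      # hypothetical alarm threshold: % of time the disk was busy

    def busy_ms(device):
        """Return cumulative milliseconds the device has spent doing I/O."""
        with open("/proc/diskstats") as stats:
            for line in stats:
                fields = line.split()
                if fields[2] == device:
                    return int(fields[12])  # field 13: time spent doing I/Os (ms)
        raise ValueError(f"device {device!r} not found in /proc/diskstats")

    def io_utilization(device, interval=1.0):
        """Sample twice and turn the delta into a %-busy figure, like iostat's %util."""
        before = busy_ms(device)
        time.sleep(interval)
        after = busy_ms(device)
        return (after - before) / (interval * 1000.0) * 100.0

    if __name__ == "__main__":
        util = io_utilization(DEVICE)
        status = "ALERT" if util > THRESHOLD else "OK"
        print(f"{status}: {DEVICE} was busy {util:.0f}% of the last second")

Run it from cron or your monitoring agent and tune the threshold against the questions above, not against a number someone else picked.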

What is the best buffer size when using BinaryReader to read big files (>1 GB)?

"Sequential File Programming Patterns and Performance with .NET" is a great article in I/O performance improvement.

Page 8 of the PDF shows that bandwidth is constant for buffer sizes bigger than eight bytes. Bear in mind that the article was written in 2004 and the test drive was a "Maxtor 250 GB 7200 RPM SATA disk"; results will differ with more recent I/O technology.

If you are looking for the best performance, take a look at pinvoke.net or page 9 of the PDF: the un-buffered file performance measurements show better results:

In un-buffered I/O, the disk data moves directly between the application’s address space and the device without any intermediate copying.
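On Linux, the closest analogue of what the paper calls un-buffered I/O is opening the file with O_DIRECT, which bypasses the kernel page cache. A minimal Python sketch, Linux-only and illustrative (alignment requirements vary by device):

    import mmap
    import os

    def read_unbuffered(path, chunk_size=1024 * 1024):
        """Sequentially read a file with O_DIRECT, bypassing the page cache.

        O_DIRECT requires the buffer, file offset, and request length to be
        aligned to the device's logical block size; an anonymous mmap gives a
        page-aligned buffer, which satisfies that on typical hardware.
        """
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)  # Linux-only flag
        buf = mmap.mmap(-1, chunk_size)                # page-aligned scratch buffer
        total = 0
        try:
            while True:
                n = os.readv(fd, [buf])                # may be short on the final read
                if n == 0:
                    break
                total += n
        finally:
            buf.close()
            os.close(fd)
        return total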

Summary

  • For single disks, use the defaults of the .NET framework – they deliver excellent performance for sequential file access.
  • Pre-allocate large sequential files (using the SetLength() method) when the file is created. This typically improves speed by about 13% when compared to a fragmented file.
  • At least for now, disk arrays require un-buffered I/O to achieve the highest performance - buffered I/O can be eight times slower than un-buffered I/O. We expect this problem will be addressed in later releases of the .NET framework.
  • If you do your own buffering, use large request sizes (64 KB is a good place to start). Using the .NET framework, a single processor can read and write a disk array at over 800 Mbytes/s using un-buffered I/O.
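To see how that last point plays out on your own hardware, here is a minimal Python sketch that measures sequential read throughput at a few buffer sizes. The file name is a placeholder, and the OS page cache will inflate repeated runs unless the file is much larger than RAM (or you drop the caches between runs):

    import time

    def throughput(path, buffer_size):
        """Read `path` sequentially in `buffer_size` chunks; return MB/s."""
        start = time.perf_counter()
        total = 0
        # buffering=0 disables Python's own layer, so only our chunk size matters
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(buffer_size)
                if not chunk:
                    break
                total += len(chunk)
        elapsed = time.perf_counter() - start
        return total / elapsed / 1e6

    if __name__ == "__main__":
        test_file = "big.bin"  # hypothetical large test file
        for size in (4 << 10, 64 << 10, 1 << 20):  # 4 KB, 64 KB, 1 MB
            print(f"{size >> 10:>5} KB buffer: {throughput(test_file, size):8.1f} MB/s")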

performance: is it better to read all files once, or use boost::filesystem functions over and over again?

You can safely assume that the operating system will cache the directory contents anyway, so that access through file system APIs will come down to memory operations.

So the answer to your question "is it faster?" is likely "No, not measurably".

On the other hand, consider that a directory's contents can change over time, even within a very short window. Reading directory contents eagerly or lazily is therefore not so much a question of speed as of semantics: depending on what you are doing, you may find that you must (or must not) snapshot the entire directory up front.
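To illustrate the semantic difference rather than the speed difference, here is a small Python sketch (the path is a placeholder): an eager snapshot gives a stable view you can iterate repeatedly, while a lazy listing reflects whatever the directory contains at the moment you iterate.

    from pathlib import Path

    def eager_snapshot(directory):
        """Materialize the listing once; later changes to the directory are invisible."""
        return sorted(Path(directory).iterdir())

    def lazy_listing(directory):
        """Query the file system on each iteration; reflects concurrent changes."""
        yield from Path(directory).iterdir()

    # A snapshot gives a stable view to iterate over several times;
    # the lazy version always shows the directory as it is *now*.
    files = eager_snapshot("/tmp")          # hypothetical path
    for entry in lazy_listing("/tmp"):
        print(entry.name)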

Does Django scale?

  1. "What are the largest sites built on Django today?"

There isn't any single place that collects information about traffic on Django-built sites, so I'll have to take a stab at it using data from various locations. First, we have a list of Django sites on the front page of the main Django project page, and then a list of Django-built sites at djangosites.org. Going through the lists and picking some that I know have decent traffic, we see:

    • Instagram: What Powers Instagram: Hundreds of Instances, Dozens of Technologies.

• Pinterest: Alexa rank 37 (as of 21 April 2015) and 70 million users in 2013

• Bitbucket: 200 TB of code and 2,500,000 users

    • Disqus: Serving 400 million people with Python.

    • curse.com: 600k daily visits.

    • tabblo.com: 44k daily visits, see Ned Batchelder's posts Infrastructure for modern web sites.

    • chesspark.com: Alexa rank about 179k.

• pownce.com (no longer active): Alexa rank about 65k.
      Mike Malone of Pownce, in his EuroDjangoCon presentation on Scaling Django Web Apps says "hundreds of hits per second". This is a very good presentation on how to scale Django, and makes some good points including (current) shortcomings in Django scalability.

• HP had a site built with Django 1.5: ePrint center. However, as of November 2015 the entire website had been migrated, and this link is just a redirect. It was a worldwide service handling subscriptions to Instant Ink and related services HP offered (*).

  2. "Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?"

    Yes, see above.

  3. "Could a site like Stack Overflow run on Django?"

    My gut feeling is yes but, as others answered and Mike Malone mentions in his presentation, database design is critical. Strong proof might also be found at www.cnprog.com if we can find any reliable traffic stats. Anyway, it's not just something that will happen by throwing together a bunch of Django models :)
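To make that last point concrete: a classic scaling trap in Django (as in any ORM) is the N+1 query pattern, where careless model access turns one page view into hundreds of queries. A minimal sketch with hypothetical models, assumed to live in an installed app:

    # Hypothetical models, for illustration only.
    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=100)

    class Post(models.Model):
        author = models.ForeignKey(Author, on_delete=models.CASCADE)
        title = models.CharField(max_length=200)

    def titles_naive():
        # N+1 queries: one for the posts, then one per post to fetch its author.
        return [f"{p.title} by {p.author.name}" for p in Post.objects.all()]

    def titles_joined():
        # One query: select_related pulls each author in via a SQL JOIN.
        return [f"{p.title} by {p.author.name}"
                for p in Post.objects.select_related("author")]

Getting this kind of thing right across a schema is what "database design is critical" means in practice.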

There are, of course, many more sites and bloggers of interest, but I have got to stop somewhere!


There is also a blog post about using Django to build the high-traffic site michaelmoore.com, described as a top-10,000 website; see its Quantcast and compete.com stats.


(*) The author of the edit that added this reference used to work as an outsourced developer on that project.


