Scripting HTTP More Efficiently

Why don't we make HTTP requests more efficient?

Because HTTP was designed as a way to retrieve documents, i.e. pages of text. Only later was this enriched with images, scripts and other external resources.

Not every request for a document needs all of its related resources; for example, text-only crawlers, or browsers that already have those resources cached, just want to retrieve the document itself.

As for inlining the external resources, yes, that can be done using <script> and <style> elements, and using inline (Base64-encoded) image data; see How to display Base64 images in HTML?.
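For illustration, here is a minimal Python sketch of building such a Base64 data URI for an image; the file name logo.png and the PNG content type are placeholders for this example only:

import base64

# Read an image and build a data URI that can be embedded directly in an
# <img src="..."> attribute, so the image needs no separate HTTP request.
# "logo.png" is only a placeholder file name for this example.
with open("logo.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

img_tag = '<img src="data:image/png;base64,%s" alt="logo">' % encoded
print(img_tag)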

With HTTP/2 and a compatible browser/server pair, one can also use server push, which does exactly what you would expect the older HTTP versions to support. See HTTP 2 will support server push, what does this mean?.

Also, as technology evolves, features can be added to a protocol - if that protocol is open to backwards-compatible changes. This particular feature can't easily be retrofitted into HTTP/1.1 in a way that keeps older browsers and servers working.

What is the fastest way to send 100,000 HTTP requests in Python?

A Twisted-less solution (Python 2):

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

concurrent = 200

def doWork():
    # Worker thread: pull URLs off the queue until the program exits.
    while True:
        url = q.get()
        status, url = getStatus(url)
        doSomethingWithResult(status, url)
        q.task_done()

def getStatus(ourl):
    # Issue a HEAD request so only the status line and headers are transferred.
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path)
        res = conn.getresponse()
        return res.status, ourl
    except:
        return "error", ourl

def doSomethingWithResult(status, url):
    print status, url

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in open('urllist.txt'):
        q.put(url.strip())
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

This one is slightly faster than the Twisted solution and uses less CPU.
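For reference, the code above targets Python 2 (urlparse, httplib and Queue no longer exist under those names). A rough Python 3 equivalent of the same idea, using only the standard library, might look like the sketch below; it is an illustration, not a drop-in replacement:

from urllib.parse import urlparse
from concurrent.futures import ThreadPoolExecutor
import http.client

concurrent = 200

def get_status(ourl):
    # Issue a HEAD request so only the status line and headers come back.
    try:
        url = urlparse(ourl)
        conn = http.client.HTTPConnection(url.netloc)
        conn.request("HEAD", url.path or "/")
        res = conn.getresponse()
        return res.status, ourl
    except Exception:
        return "error", ourl

with open('urllist.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

# A thread pool replaces the hand-rolled Queue/Thread plumbing above.
with ThreadPoolExecutor(max_workers=concurrent) as pool:
    for status, url in pool.map(get_status, urls):
        print(status, url)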

What are the benefits of concatenating all JavaScript files into one before sending them to the client?

Combining multiple JS files into one file has the following benefits:

  1. Browsers can download a single file more efficiently and faster than multiple smaller files. One HTTP connection downloading one file is usually faster than many HTTP connections downloading smaller files.
  2. The browser has a limit on how many simultaneous connections it will make to the same domain; if it reaches that limit, some connections have to wait until others finish, which delays downloads. Downloading fewer files makes it less likely to hit this limit. The limit applies to all connections to a domain (downloads of JS files, CSS files, frames, Ajax calls, etc.).
  3. Server scalability can be increased because each page load requires fewer HTTP connections to serve the content.
  4. There are cases where version control and the interaction between version upgrades and browser JS file caching can be simpler with one larger JS file. When all your JS files are concatenated, you can assign a single version number to that combined JS file (like jQuery does with its versions). Then, any change to the JS anywhere causes a bump in the version number for the master combined file. Since a given browser gets the entire combined file all or nothing, there is never an opportunity for a browser to accidentally get one version of one file fresh from the server and another version of another file from a stale browser cache. Also, maintaining one master version number is a lot simpler than versioning lots of smaller files.

Minifying a JS file makes it smaller to download and parse, which improves download performance.

If you are both combining multiple files AND minifying, the minifying can be more effective. When minifying multiple small files separately, you cannot minify variable names that are shared between the different files - they must retain their original names. But, if you combine all the JS files and then minify, you can minify all symbols that are shared among the different JS files (as long as they aren't shared externally).
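As a concrete illustration of the combining step, here is a minimal build-script sketch in Python; the file names and the external terser command are assumptions for the example, not part of the original answer:

import subprocess

# Hypothetical list of source files; order matters if files depend on each other.
sources = ["jquery.js", "utils.js", "app.js"]

# Concatenate all JS files into one bundle, inserting ";" between files so a
# file that ends without a semicolon cannot break the file that follows it.
with open("bundle.js", "w") as out:
    for name in sources:
        with open(name) as src:
            out.write(src.read())
            out.write(";\n")

# Optionally minify the combined bundle with an external tool such as terser
# (assumed to be installed). Minifying after combining gives the minifier a
# whole-bundle view of the code.
subprocess.run(["terser", "bundle.js", "-o", "bundle.min.js", "--compress", "--mangle"],
               check=True)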


Obviously, there are some limits here and things don't get arbitrarily better if the whole world puts their JS into one file. Some things to think about when deciding what to package together into one file:

  1. You don't want a large group of your pages to be parsing and executing a large block of code that they will not use. This is obviously a tradeoff because if the code is being effectively cached, then it's not so much a download issue, but rather just a runtime efficiency issue. Each project will have to decide where to draw that tradeoff line.

  2. You may not want to package code that is revised fairly regularly with code that hardly ever changes because this degrades the efficiency of browser caching if the large combined JS is always changing.

  3. In a team environment with multiple projects sharing code, it is very important to think about packaging things into combined and minified chunks that work for the largest number of projects sharing the code. You generally want to optimize the packaging for the broader needs, not just for a single project.

  4. Mobile access often has smaller caches, slower CPUs and slower connections, so it's important to consider the needs of your most-accessed mobile pages when deciding how to package things, too.


And some downsides to combining and minifying:

  1. Directly debugging the minified site can be quite difficult as many symbols have lost their meaningful names. I've often found it necessary to have a way of serving an unminified version of the site (or at least some files) for debugging/troubleshooting reasons.

  2. Error messages in browsers will refer to the combined/minified file, not to the actual source files, so it can be more difficult to track down which code is causing a given browser error that has been reported.

  3. The combined and minimized site has to be tested to make sure no issues were caused by these extra steps.

Asynchronous HTTP requests in PHP

Yes, depending on the traffic of your site, spawning a separate PHP process for running a script could be devastating. It would be more efficient to use shell_exec() to start a background process that saves the output to a filename you already know, but even this could be resource intensive.

You could also have a request queue stored in a database. A single, separate background process would pull the job, execute it, and save the output, possibly setting a flag in the DB that your web process could check.

If you're going to use the DB queue approach, use the curl_multi_* family of functions to send all queued requests at once. This limits the execution time of each iteration in your background process to the time of the longest request.
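To make the queue idea concrete, here is a rough sketch of the background-worker pattern in Python (used here only because Python is the language used elsewhere on this page); the jobs table layout and the use of sqlite are assumptions, and a PHP worker built on the curl_multi_* functions would follow the same shape:

import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema: jobs(id INTEGER PRIMARY KEY, url TEXT, status TEXT, output TEXT)
db = sqlite3.connect("queue.db")

def fetch(job):
    # Run one queued request and return the job id with its result.
    job_id, url = job
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return job_id, "done", resp.read().decode("utf-8", "replace")
    except Exception as exc:
        return job_id, "error", str(exc)

# The web-facing code only INSERTs pending rows; this separate worker pulls
# them, runs the requests concurrently, and flags each row so the web process
# can later check whether its job has finished.
jobs = db.execute("SELECT id, url FROM jobs WHERE status = 'pending'").fetchall()

with ThreadPoolExecutor(max_workers=10) as pool:
    for job_id, status, output in pool.map(fetch, jobs):
        db.execute("UPDATE jobs SET status = ?, output = ? WHERE id = ?",
                   (status, output, job_id))
db.commit()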


