Is There a Ruby HTTP Client Library with a Response Cache?

Is there a Ruby http client library with a response cache?

You might want to check the list of "Ruby HTTP clients features" (archived version from January 2015) for a complete overview.
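As one concrete option (my addition, not from the original answer): the faraday-http-cache gem provides a standards-aware response cache as Faraday middleware. A minimal sketch, assuming that gem's documented setup:

require 'faraday'
require 'faraday/http_cache'

# Responses are cached according to their Cache-Control/Expires headers
# and revalidated with conditional requests once they go stale.
client = Faraday.new('http://example.com') do |builder|
  builder.use :http_cache          # defaults to an in-memory store
  builder.adapter Faraday.default_adapter
end

client.get('/') # network hit
client.get('/') # served from the cache while still fresh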

What are my options for caching fetched web pages?

So I've thought and I've thought, and looked at the source code for Mechanize and for VCR, and I've decided that I'm really just over-thinking the problem. The following works just fine for my needs. (I'm using DataMapper, but translating it into an ActiveRecord model would be straightforward):

class WebCache
  include DataMapper::Resource

  property :id,               Serial
  property :serialized_key,   Text
  property :serialized_value, Text
  property :created_at,       DateTime
  property :updated_at,       DateTime

  # Class method (so it can be called as WebCache.with_db_cache, as in the
  # example usage below): look up akey; on a miss, evaluate the block,
  # cache the result, and return it.
  def self.with_db_cache(akey)
    serialized_key = YAML.dump(akey)
    if (r = self.all(:serialized_key => serialized_key)).count != 0
      # cache hit: return the de-serialized value
      YAML.load(r.first.serialized_value)
    else
      # cache miss: evaluate the block, serialize and cache the result
      yield(akey).tap {|avalue|
        self.create(:serialized_key => serialized_key,
                    :serialized_value => YAML.dump(avalue))
      }
    end
  end
end

Example usage:

def fetch(uri)
  WebCache.with_db_cache(uri) {|uri|
    # arrive here only on cache miss
    Net::HTTP.get_response(URI(uri))
  }
end

Commentary

I previously believed that a proper web-caching scheme would observe and honor header fields like Cache-Control, If-Modified-Since, etc., as well as automatically handle timeouts and other web pathologies. But an examination of actual web pages made it clear that truly static data was frequently marked with short cache times. So it makes more sense to let the caller decide how long something should be cached and when a failing query should be retried.
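For example, the expiry decision can live at the call site. Here is a minimal sketch of that idea (my own illustration, not part of the original class): a hypothetical max_age parameter, in seconds, that treats older rows as misses:

def self.with_db_cache(akey, max_age: nil)
  serialized_key = YAML.dump(akey)
  r = self.all(:serialized_key => serialized_key).first
  # a row counts as a hit only if it is younger than max_age
  if r && (max_age.nil? || (DateTime.now - r.created_at) * 86_400 < max_age)
    YAML.load(r.serialized_value)
  else
    r.destroy if r # evict the stale entry
    yield(akey).tap {|avalue|
      self.create(:serialized_key => serialized_key,
                  :serialized_value => YAML.dump(avalue))
    }
  end
end

A caller might then write WebCache.with_db_cache(uri, max_age: 3600) { ... } to refetch anything older than an hour.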

At that point, the code became very simple.

Moral: don't over-think your problems.

Ruby HTTP response script sending large gzipped content

There are a number of issues I see with your code, some conceptual and some technical, but without more information about the error you receive it might be impossible to offer a correct response.

My initial thought is that the issue is caused by the fact that you are opening gzipped files without the binary mode flag, so that file reading stops at the first EOF character and newline markers might be converted.

A few technical things to consider:

  1. Your loop is infinite. You should really set up signal traps to allow you to exit the script (catching ^C, for example).

  2. Zip files are usually binary files. You should use a binary mode to open the file, or use the IO.binread method if you're loading the whole file into memory.

  3. You're loading the whole file into memory before sending it. That's fine for small files, but it isn't the best approach for larger ones. Loading 50MB into RAM for each client, while serving 100 clients, means 5GB of RAM... (see the streaming sketch below).
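To illustrate that last point, a minimal sketch of my own (the names and chunk size are arbitrary) that streams a file in fixed-size chunks so memory use stays constant:

# Stream `path` to `socket` 64KB at a time instead of slurping it whole.
def stream_file(socket, path)
  File.open(path, 'rb') do |f|      # binary mode, per point 2
    while (chunk = f.read(65_536))  # read returns nil at EOF, ending the loop
      socket.write(chunk)
    end
  end
end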

Considering the first two technical points, I would tweak the code a bit, like so:

require 'socket'

# assuming the original script created the server something like this:
server = TCPServer.new(3000)

keep_running = true
trap('INT') { keep_running = false; raise ::SystemExit }

begin
  while keep_running
    Thread.start(server.accept) do |client|
      date = Time.now.strftime("%d-%m-%Y_%H-%M-%S")
      file = "#{date}_mt_dump.txt"
      puts date
      puts "Accepting connection"
      #client = server.accept
      #resp = "OKY|So long and thanks for all the fish!|OKY"
      ticket_id = "1235"

      partial_data = ""
      i = 1024
      firstrun = "yes"
      fd = File.open(file, 'wb') # 'wb', not 'bw': write, binary mode
      puts "Attempting receive loop"

      puts "Ready to transfer contents to the client"
      f = File.open("output.txt.gz", 'rb') # 'rb', not 'br': read, binary mode
      puts "Opened file output.txt.gz; size: #{f.size}"
      resp = f.read(f.size)
      f.close

      headers = ["HTTP/1.1 200 OK",
                 "Content-Encoding: gzip",
                 "Content-Type: text/xml;charset=UTF-8",
                 "Content-Length: #{resp.bytesize}\r\n\r\n"].join("\r\n")
      # write, not puts: puts would append a newline and corrupt the body offset
      client.write headers

      #puts all_data.join()
      fd.close unless fd == nil

      puts "Start data transfer"
      client.write resp
      client.close
      puts "Closed connection"
      puts "\n"
    end
  end
rescue SystemExit
  puts "exiting... please note that existing threads will be brutally stopped, as we will not wait for them..."
rescue => e
  puts e.message
  puts e.backtrace
end

As to my more general pointers:

  1. Your code is opening a new thread per connection. While this is okay for a small load of concurrent connections, your script might grind to a halt if you have a lot of concurrent connections. The context-switching alone (moving between threads) could potentially create a DoS situation.

    I recommend that you use a Reactor pattern, where you have a pool of threads (a minimal thread-pool sketch follows this list). Another option is to fork a few processes listening to the same TCPSocket.

  2. You don't read the data from the socket and you don't parse the HTTP request - this means that someone could potentially fill up the system buffer, which you never empty, by continuously sending data.

    It would be better if you read the information from the socket, or emptied its buffer, as well as disconnected from any malformed or malicious connections.

    Also, most browsers aren't too happy when the response comes in before the request...

  3. You don't catch any exceptions nor print any error messages. This means that your script might throw an exception that will break everything apart. For instance, if your 'server' reaches the 'open file limit' for its process, the accept method will throw an exception which will shut down the whole script, including existing connections.
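To make point 1 concrete, here is a rough thread-pool sketch of my own (the names and pool size are arbitrary, and handle_connection is a hypothetical stand-in for your per-connection logic):

require 'socket'

POOL_SIZE = 16
queue  = Queue.new                 # thread-safe FIFO of accepted sockets
server = TCPServer.new(3000)

# A fixed pool of workers: accepted connections are queued and handled
# by whichever worker is free, so load never spawns unbounded threads.
workers = POOL_SIZE.times.map do
  Thread.new do
    while (client = queue.pop)
      begin
        handle_connection(client)  # your per-connection logic goes here
      rescue => e
        puts e.message
      ensure
        client.close rescue nil
      end
    end
  end
end

loop { queue << server.accept }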

I'm not sure why you aren't using one of the many HTTP servers available for Ruby - be it the built-in WEBrick (don't use it for production) or one of the native Ruby community gems, such as Iodine.

Here's a short example using Iodine, which has an easy-to-use HTTP server written in Ruby (no need to compile anything):

require 'iodine/http'

# cache the file, since it's the only response ever sent
file_data = IO.binread "output.txt.gz"

Iodine.on_http do |request, response|
  begin
    # set any headers
    response['content-type'] = 'text/xml;charset=UTF-8'
    response['content-encoding'] = 'gzip'
    response << file_data
    true
  rescue => e
    Iodine.error e
    false
  end
end

# if in irb, call `exit` to start the server:
exit

Or, if you insist on writing your own HTTP server, you can at least use one of the available IO reactors, such as Iodine (I wrote it for Plezi), to help you handle the thread pool and IO management (you can also use EventMachine, but I don't like it so much - then again, I'm biased, as I wrote the Iodine library):

require 'iodine'
require 'stringio'

class MiniServer < Iodine::Protocol

  # cache the file, since it's the only data sent,
  # and make it available to all the connections.
  def self.data
    @data ||= IO.binread 'output.txt.gz'
  end

  # The on_open callback is called when a connection is established.
  # We'll use it to initialize the HTTP request's headers Hash.
  def on_open
    @headers = {}
  end

  # the on_message callback is called when data is sent from the client to the socket.
  def on_message input
    input = StringIO.new input
    l = nil
    headers = @headers # easy access
    # loop over the lines and parse the HTTP request.
    while (l = input.gets)
      unless l.match /^[\r]?\n/
        if l.include? ':'
          l = l.strip.downcase.split(':', 2)
          headers[l[0]] = l[1]
        else
          headers[:method], headers[:query], headers[:version] = l.strip.split(/[\s]+/, 3)
          headers[:request_start] = Time.now
        end
        next
      end
      # keep the connection alive if the HTTP version is 1.1 or if the connection is requested to be kept alive
      keep_alive = (headers['connection'].to_s.match(/keep/i) || headers[:version].to_s.match(/1\.1/)) && true
      # refuse any file uploads or forms; make sure the request is a GET request
      return close if headers['content-length'] || headers['content-type'] || headers[:method].to_s.match(/get/i).nil?
      # all is well, send the file.
      write ["HTTP/1.1 200 OK",
             "Connection: #{keep_alive ? 'keep-alive' : 'close'}",
             "Content-Encoding: gzip",
             "Content-Type: text/xml;charset=UTF-8",
             "Content-Length: #{self.class.data.bytesize}\r\n\r\n"].join("\r\n")
      write self.class.data
      return close unless keep_alive

      # reset the headers, in case another request comes in
      headers.clear
    end
  end

end

Iodine.protocol = MiniServer
# if running within a larger application, consider:
# Iodine.force_start!
# The server starts automatically when the script ends.
# On irb, use `exit`:
exit

Good Luck!

Save a response from API call to use in a test so I don't have to continuously repeat requests to API

If I understand correctly, your Rails app is using an external API, like a Google/Facebook/Twitter API, that kind of thing.

Caching the views won't work, because it only caches the template: it saves the time spent rendering the view again, but it validates that the cache is warm by hashing the data, so the code will still hit the API to verify that the hashes match.

The best way for you is to use a class that does all the API calls and caches them in the Rails cache with a timeout period: you don't want your cache to be too stale, but at the same time you will sacrifice some accuracy to save some money (e.g. only make a single call every 5, 15, or 30 minutes, whichever you pick).

Here's a sample of what I have in mind, but you should modify it to match your needs:

module ApiWrapper
  class << self
    def some_method(some_key) # if keys are needed, like an id or something
      Rails.cache.fetch("some_method/#{some_key}", expires_in: 5.minutes) do
        # assuming ApiLibrary is the external library handler
        ApiLibrary.call_external_library(some_key)
      end
    end
  end
end

Then in your code, call that wrapper; it will only contact the external API if the stored value in the cache has expired.

The call will be something like this:

# assuming 5 is the id or value you want to fetch from the api
ApiWrapper.some_method(5)

You can read more about caching methods in the Rails guide for caching.



Update:

I just thought of another way: for your testing (like RSpec tests) you could stub the API calls, and this way you'll skip the whole API call, unless you are testing the API itself. Using the same API library I wrote above, we can stub ApiLibrary itself:

allow(ApiLibrary).to receive(:call_external_library).and_return({ data: 'some fake data' })

PS: the data hash key is part of the return value; it's the whole hash, not just the string.
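For context, the stub might sit inside a spec like this (a hypothetical example using the names from the wrapper above):

RSpec.describe ApiWrapper do
  it 'returns stubbed API data without hitting the network' do
    allow(ApiLibrary).to receive(:call_external_library)
      .and_return({ data: 'some fake data' })

    expect(ApiWrapper.some_method(5)).to eq({ data: 'some fake data' })
  end
end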

Get response headers from Ruby HTTP request

The response object actually contains the headers.

See "Net::HTTPResponse" for more infomation.

You can do:

response['Cache-Control']

You can also call each_header or each on the response object to iterate through the headers.

If you really want the headers outside of the response object, call response.to_hash.
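Putting those together, a minimal sketch (using example.com as a stand-in URL):

require 'net/http'

response = Net::HTTP.get_response(URI('http://example.com'))

response['Cache-Control']  # single header lookup (name is case-insensitive)

# iterate over every header
response.each_header do |key, value|
  puts "#{key}: #{value}"
end

response.to_hash           # => plain Hash mapping header name => array of values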

How do I read the header without reading the rest of the HTTP resource?

Depending on what you actually want to accomplish:

Don't care about the body at all:

Use HEAD instead of GET:

uri = URI('http://example.com')
http = Net::HTTP.start uri.host, uri.port
request = Net::HTTP::Head.new uri
response = http.request request
response.body
# => nil

Load the body conditionally

Using blocks with net/http will let you hook before the body is actually loaded:

uri = URI('http://example.com')
res = nil

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri

  http.request request do |response|
    res = response
    break
  end
end

res
# => #<Net::HTTPOK 200 OK readbody=false>
res['Content-Type']
# => "text/html; charset=UTF-8"

How may I detect requests generated by other requests using Ruby and Puma?

Maybe you can parametrize your resource, like my-image.jpg?dir=dir1, and parse the params in your middleware. A similar approach is used by Rails to cache the assets. You can also use a crypto function that encrypts and decrypts the info in the params, like image.jpg?info=HughYF65fFj7t..., and then decrypt the info in your middleware and use the info you sent.
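A rough Rack sketch of the query-param idea (my own illustration; the param name, env key, and class name are made up):

require 'rack'

# Middleware that lifts a `dir` query param into the env, so the app
# can tell requests generated on behalf of another resource apart
# from direct requests.
class SubRequestTagger
  def initialize(app)
    @app = app
  end

  def call(env)
    params = Rack::Utils.parse_query(env['QUERY_STRING'].to_s)
    env['myapp.parent_dir'] = params['dir'] if params['dir']
    @app.call(env)
  end
end

# config.ru:
#   use SubRequestTagger
#   run MyApp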


