Enforce File Upload Size on Heroku

Enforce file upload size on Heroku?

Heroku limits the request entity to a fairly small size (30 MB) and also caps every request/response cycle at 30 seconds. Both are hard rules and must be lived with.

Rails by default spools file uploads to disk and translates them to Tempfile instances before handing them off to your application.
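If you want to reject oversized requests yourself before Rails ever spools the body to disk, one option is a small Rack middleware that checks Content-Length and returns 413. Here's a minimal sketch; the class name and the 10 MB cap are placeholders of mine, not anything Heroku or Rails provides:

    # Hypothetical middleware -- the class name and 10 MB cap are placeholders.
    class UploadSizeLimiter
      MAX_BYTES = 10 * 1024 * 1024  # pick something under Heroku's own limit

      def initialize(app)
        @app = app
      end

      def call(env)
        if env["CONTENT_LENGTH"].to_i > MAX_BYTES
          # Reject before Rails turns the body into a Tempfile
          return [413, { "Content-Type" => "text/plain" }, ["Payload Too Large"]]
        end
        @app.call(env)
      end
    end

    # config/application.rb:
    #   config.middleware.insert_before Rack::Runtime, UploadSizeLimiter

Note that Content-Length can be absent on chunked requests, so treat this as a first line of defense rather than a guarantee.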

How to Upload Large Files on Heroku (Particularly Videos)

Update:

OP here. I'm still not exactly sure why I was getting this particular 413 error, but I was able to come up with a solution that works using the s3_swf_upload gem. The implementation involves Flash, which is less than ideal, but it was the only solution (out of 3 or 4 that I tried) that I could get working.

As Neil pointed out (thanks, Neil!), the error I should have been getting is "H12 - Request timeout". And I did end up running into this error after repeated trials. The problem occurs when you try to upload large files to the Heroku server from your controller (using a web dyno), because it takes too long for the server to respond to the POST request.

The proper approach is to send the file directly to S3 without passing through Heroku.

Here's a high-level overview of my approach:

  1. Use the s3_swf_upload gem to supply a direct upload form to S3.
  2. Detect when the file is done uploading with the JavaScript callback function provided in the gem.
  3. Using JavaScript, send Rails a POST message to let your server know the file is done uploading.
  4. The controller that responds to the JavaScript POST does two things: (a) assigns an s3_key attribute to the video object (served up as a param in the form); (b) initiates a background task using the delayed_job gem (both are sketched after this list).
  5. The background task retrieves the file from S3. I used the aws-sdk gem to accomplish this, because it was already included in s3_swf_upload. Note that this is distinctly different from the aws-s3 gem (in fact, they conflict with one another).
  6. After the file has been retrieved from S3, I used the vimeo gem to upload it to Vimeo (still in the background).
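To make steps 4 and 5 concrete, here's a rough sketch of the controller and background task. It assumes a Video model with an s3_key column, the delayed_job gem, and the newer aws-sdk-s3 client rather than the aws-sdk v1 gem mentioned above; the controller, action, and method names are placeholders, and the Vimeo call itself is left out:

    # Hypothetical controller that the JavaScript callback POSTs to (step 4)
    class VideosController < ApplicationController
      def upload_complete
        video = Video.find(params[:id])
        video.update!(s3_key: params[:s3_key])  # (a) remember where the file lives
        video.delay.transfer_to_vimeo           # (b) delayed_job background task
        head :ok
      end
    end

    # Hypothetical background task (steps 5-6)
    class Video < ActiveRecord::Base
      def transfer_to_vimeo
        s3 = Aws::S3::Client.new
        path = Rails.root.join("tmp", File.basename(s3_key)).to_s
        s3.get_object(bucket: ENV["S3_BUCKET"], key: s3_key, response_target: path)
        # hand the local file at `path` to the vimeo gem here (call omitted)
      end
    end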

The implementation above works, but it isn't perfect. For files that are close to 500 MB in size, you'll still run into R14 errors in your worker dynos. This occurs because Heroku only allots 512 MB of memory per dyno, so you can't load the entire file into memory at once. The way around this problem is to implement some sort of chunking in the final step, where you retrieve the file from S3 and upload it to Vimeo piece by piece. I'm still working on this part, and I'd love to hear any suggestions you might have.
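For the chunking piece, one possibility (assuming the aws-sdk-s3 client; the bucket and key are placeholders) is to stream the object to disk or handle it chunk by chunk instead of reading the whole body into memory:

    require "aws-sdk-s3"

    s3 = Aws::S3::Client.new

    # Option 1: write straight to disk without buffering the whole body in RAM
    s3.get_object(bucket: "my-bucket", key: "videos/big.mov",
                  response_target: "/tmp/big.mov")

    # Option 2: handle each chunk yourself (e.g. to forward it piece by piece)
    File.open("/tmp/big.mov", "wb") do |file|
      s3.get_object(bucket: "my-bucket", key: "videos/big.mov") do |chunk|
        file.write(chunk)
      end
    end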

Hopefully this might help someone. Feel free to ask me any questions. Like I said, my solution isn't perfect so feel free to add your own answer if you think it could be better.

Large file upload to amazon s3 failing after 30 second limit set by heroku

After many months on this issue, I found a gem that works well by uploading directly to Amazon S3, without any complex Flash and JavaScript stuff. It also integrates with CarrierWave.
The gem is called carrierwave_direct.

It works without a problem; however, if you are using Rails 3.0.x, check out this page for a solution.

If you are using Rails 3.1.x, you are all set to go.
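For reference, a carrierwave_direct setup looks roughly like this; the uploader and field names are placeholders, and the fog/S3 credentials are assumed to be configured elsewhere:

    # app/uploaders/video_uploader.rb -- names are placeholders
    class VideoUploader < CarrierWave::Uploader::Base
      include CarrierWaveDirect::Uploader   # adds the direct-to-S3 behaviour
    end

    # In the view, render a form that posts straight to S3, bypassing your dynos:
    #   <%= direct_upload_form_for @uploader do |f| %>
    #     <%= f.file_field :video %>
    #     <%= f.submit %>
    #   <% end %>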

Protecting Puma/rails against large form payload attacks (something like LimitRequestBody)?

It sounds like you're still choosing the stack for this feature/app, given that you're talking about switching between nginx or Apache, and/or Passenger or Heroku.

The best way to combat this ahead of time is client-side validation of file sizes. Now obviously, if you're worried about attacks, it's easy for someone to bypass this. So another option is to upload your files to S3 from the client side and set up a callback system to your Rails app. This keeps the traffic off of your main web servers and allows you to process only the files you deem "safe".
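One way to do the direct-to-S3 part with a server-enforced size cap is a presigned POST whose policy limits the content length, so S3 itself rejects oversized files. A sketch assuming the aws-sdk-s3 gem; the bucket name, key prefix, and the 50 MB cap are placeholders:

    require "aws-sdk-s3"
    require "securerandom"

    bucket = Aws::S3::Resource.new.bucket("my-upload-bucket")
    post = bucket.object("uploads/#{SecureRandom.uuid}").presigned_post(
      content_length_range: 0..(50 * 1024 * 1024)   # S3 rejects anything larger
    )
    # Hand post.url and post.fields to the client-side form; after the upload
    # succeeds, have the client POST the key back to your Rails callback endpoint.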

Finally, if you choose to have users upload to your server: you've mentioned the file-size limits that nginx and Apache give you, and Heroku has a 30 MB limit and a 30-second timeout on its systems. If you're seeing repeat large-upload offenders and need to throttle the number of requests and/or ban users, you'll want to use Rack::Attack. I've used the gem a ton. It's simple to work with and is effective for what you're talking about.
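For illustration, a Rack::Attack initializer along these lines would throttle and block abusive upload traffic; the /uploads path and the limits are assumptions you'd adjust to your own routes:

    # config/initializers/rack_attack.rb -- path and limits are assumptions
    class Rack::Attack
      # At most 5 upload attempts per IP per minute
      throttle("uploads/ip", limit: 5, period: 60) do |req|
        req.ip if req.post? && req.path == "/uploads"
      end

      # Refuse requests with absurdly large bodies aimed at the upload endpoint
      blocklist("oversized uploads") do |req|
        req.post? && req.path == "/uploads" &&
          req.content_length.to_i > 100 * 1024 * 1024
      end
    end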

The next level up from something like this is blocking at the network level, which you'll never get with Heroku, so you'd have to roll your own servers; and if we're talking network-level security and attack mitigation, my recommendation is to hire a system admin who knows how to handle that!


As an aside, I'm happy that you're thinking about security from the beginning, but designing for someone abusing a file upload feels like premature optimization (to varying degrees, obviously, depending on what type of app you're building).

I want to deploy my text scraping program to Heroku, but the file it uses is stored on my PC

Here's my suggestion.

  1. Host the text file on a site like Pastebin, as long as it doesn't contain any confidential information. This allows you to update it freely without needing to re-deploy your app each time you add to it.
  2. Once you've uploaded/pasted the text into a "paste" and saved it, you'll be able to get the "raw" link that returns the content of the file when requested.
  3. Use requests to fetch the file from within your app and parse it however you need to.

    import requests

    resp = requests.get("https://pastebin.com/raw/LjcPg3UL")
    resp.raise_for_status()
    # if all entries are on individual lines (iter_lines yields bytes, so decode)
    mywords = [line.decode() for line in resp.iter_lines()]
    # if comma-separated or otherwise
    # mywords = resp.text.split(",")

Now you have all your content in a list to work with in your app.

Edit:
Since you want to accomplish this with larger files, you could host the file on Dropbox and follow the instructions from here to get the raw link. However, if you're dealing with that large of a file, you're going to notice significant overhead. If the file is going to be that large, I'd suggest the added precaution of utilizing requests' stream parameter (details), so the request line becomes

resp = requests.get("https://www.dropbox.com/s/FILE_ID/filename.extension?raw=1", stream=True)

This will read chunks of the file instead of reading the entire file at once, which will help cut down on memory consumption.

How to deal with large file uploads, Client - Node.js + Heroku - Cloudinary?

From Heroku:

Hi,

While there aren't any size restrictions on requests such as these, you're most likely going to run into the 30-second timeout problem.
The only way around the timeout is with long polling, as described here. I'm not aware of any drop-in implementations of this for file uploads, though.
I'm guessing this traffic is coming from an app or something that makes it impractical to implement Cloudinary's direct upload out of the box. Are there other reasons you couldn't adapt Cloudinary's direct upload solution? I'd like to get their input on this as well.

Thanks,

Chad

Can I host images in heroku? Or do I need S3?

The short answer: if you allow users or admins to upload images, you should not use Heroku's file system for this as the images will suddenly vanish.

As explained in the Heroku documentation:

Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted.

This means that user-uploaded images on the Heroku filesystem are not only wiped out with every push, but also with every dyno restart, which occasionally happens (even if you ping them frequently to prevent them from going to sleep).

Once you start using a second web dyno, it will not be able to read the other dyno's filesystem, so then images would only be visible from one dyno. This would cause weird issues where users can sometimes see images and sometimes they don't.

That said, you can temporarily store images on the Heroku filesystem if you implement a pass-through file upload to an external file store.
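A pass-through upload just means the controller takes the Tempfile Rails gives it and immediately ships it to the external store, so nothing depends on the ephemeral disk surviving. A sketch assuming the aws-sdk-s3 gem; the controller, param, and bucket names are placeholders:

    # Hypothetical pass-through controller -- names are placeholders
    class ImagesController < ApplicationController
      def create
        upload = params[:image]  # ActionDispatch::Http::UploadedFile
        key = "images/#{SecureRandom.uuid}-#{upload.original_filename}"
        obj = Aws::S3::Resource.new.bucket(ENV["S3_BUCKET"]).object(key)
        obj.upload_file(upload.tempfile.path)  # copy off the ephemeral filesystem
        redirect_to root_path, notice: "Uploaded to #{obj.public_url}"
      end
    end

Keep in mind the upload still flows through your dyno, so very large files will run into the 30-second router timeout discussed above.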


