Best Practices to Create and Download a Huge Zip (From Several Blobs) in a Webapp

For large content that won't fit in memory at once, stream the content from the database to the response.

This kind of thing is actually pretty simple. You don't need AJAX or websockets, it's possible to stream large file downloads through a simple link that the user clicks on. And modern browsers have decent download managers with their own progress bars - why reinvent the wheel?

If you're writing a servlet from scratch for this, get hold of the database BLOB, obtain its input stream, and copy the content through to the HTTP response output stream. If you have the Apache Commons IO library, you can use IOUtils.copy(); otherwise you can write the copy loop yourself.
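A minimal sketch of that copy loop. In a real servlet the input would come from something like blob.getBinaryStream() and the output would be response.getOutputStream(); in-memory streams stand in here so the snippet is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BlobCopy {
    // Manual equivalent of IOUtils.copy(): pump bytes from the BLOB's input
    // stream to the response output stream through a fixed-size buffer, so
    // only one buffer's worth of data is ever held in memory.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the BLOB stream and the servlet response stream.
        byte[] blob = "blob content".getBytes();
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(blob), response);
        System.out.println(copied); // prints 12
    }
}
```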

Creating a ZIP file on the fly can be done with a ZipOutputStream. Wrap one around the response output stream (from the servlet, or whatever your framework gives you), then fetch each BLOB from the database, calling putNextEntry() first and then streaming the BLOB content as described above.
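A sketch of that on-the-fly ZIP, using the JDK's java.util.zip. The map of named streams stands in for your BLOB lookups, and the OutputStream would be response.getOutputStream() in a servlet:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipOnTheFly {
    // Write one ZIP entry per named stream directly to `out`. Nothing is
    // buffered to disk and only one copy buffer is held in memory, so the
    // archive can be far larger than the heap.
    public static void writeZip(Map<String, InputStream> blobs, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            for (Map.Entry<String, InputStream> e : blobs.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                int n;
                while ((n = e.getValue().read(buffer)) != -1) {
                    zip.write(buffer, 0, n);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, InputStream> blobs = new LinkedHashMap<>();
        blobs.put("a.txt", new ByteArrayInputStream("first".getBytes()));
        blobs.put("b.txt", new ByteArrayInputStream("second".getBytes()));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeZip(blobs, out);

        // Read the archive back to show both entries made it in.
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(out.toByteArray()))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                System.out.println(entry.getName()); // prints a.txt then b.txt
            }
        }
    }
}
```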

Potential Pitfalls/Issues:

  • Depending on the download size and network speed, the request might take a long time to complete. Firewalls, proxies, etc. can get in the way and terminate the request early.
  • Hopefully your users are on a decent corporate network when requesting these files. It would be far worse over remote/dodgy/mobile connections (if the connection drops out after downloading 1.9 GB of 2.0 GB, users have to start again).
  • It can put a fair bit of load on your server, especially when compressing huge ZIP files. It might be worth turning compression down or off when creating the ZipOutputStream if this is a problem.
  • ZIP files over 4 GB (the classic ZIP format limit; some tools already struggle at 2 GB) can cause issues with some ZIP programs. Java 7 writes ZIP64 extensions when needed, so that version of Java will write the huge ZIP correctly, but will the clients have programs that support large ZIP files? I've definitely run into issues with these before, especially on old Solaris servers.
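The compression trade-off mentioned above is a one-line knob on ZipOutputStream. A small sketch comparing a fully compressed archive against one written with compression turned off (larger output, much less CPU spent on the server):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipLevelDemo {
    // Zip the same payload at a given deflate level and return the output size.
    public static int zippedSize(byte[] payload, int level) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.setLevel(level); // Deflater.NO_COMPRESSION .. Deflater.BEST_COMPRESSION
            zip.putNextEntry(new ZipEntry("data.bin"));
            zip.write(payload);
            zip.closeEntry();
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000]; // highly compressible zeros
        int stored = zippedSize(payload, Deflater.NO_COMPRESSION);
        int packed = zippedSize(payload, Deflater.BEST_COMPRESSION);
        // Turning compression off trades bandwidth for CPU.
        System.out.println(stored > packed); // prints true
    }
}
```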

Download multiple files with a single action

HTTP does not support downloading more than one file in a single request.

There are two solutions:

  • Open one window per file (this would be done with JavaScript) to initiate each download
  • Preferred solution: create a server-side script that zips the files into a single download

Generating ZIP files in azure blob storage

There are a few ways you can do this **from an azure-batch-only point of view**. (For the initial part, your own code owns whatever ZIP API it uses to zip the files; once the archive is in blob storage and you want to use it on the nodes, the options below apply.)

For the initial part of your question I found this, which could come in handy: https://microsoft.github.io/AzureTipsAndTricks/blog/tip141.html (but this is mainly for the idea; you will know best and need to design your solution space accordingly)

In options 1 and 3 below you need to make sure your code handles unzipping or unpacking the ZIP file. Option 2 is the Batch built-in feature for *.zip files, at both pool and task level.

  • Option 1: You could add your *.rar or *.zip file as an Azure Batch resource file and then unzip it at the start-task level, once the resource file has been downloaded. See: Azure Batch Pool start task to download a resource file from Blob / File Share.

  • Option 2: The best option, if you have a .zip (not a .rar) file in play, is the Azure Batch application packages feature, documented here: https://learn.microsoft.com/en-us/azure/batch/batch-application-packages

The application packages feature of Azure Batch provides easy
management of task applications and their deployment to the compute
nodes in your pool. With application packages, you can upload and
manage multiple versions of the applications your tasks run, including
their supporting files. You can then automatically deploy one or more
of these applications to the compute nodes in your pool.

  • https://learn.microsoft.com/en-us/azure/batch/batch-application-packages#application-packages

An application package is a .zip file that contains the application binaries and supporting files that are required for your
tasks to run the application. Each application package represents a
specific version of the application.

  • With regards to size: refer to the maximum allowed blob size linked in the document above.

  • Option 3: (Not sure if this will fit your scenario.) A long shot, but you could also mount a blob container as a virtual drive when nodes join the pool, via the mount feature in Azure Batch, and then write code in the start task (or similar) to unzip from the mounted location.

Hope this helps :)

best way to download large files from azure cloud storage

Your assumption is correct: if you want to use an ActionResult you would need to download the file to the web role first and then stream it down to the client. If you can, avoid this, particularly with large files, and leave it up to Azure Storage; then Microsoft has to worry about handling the requests, and you don't have to pay for more web roles if you get lots of traffic.

This works well if all of the files you're hosting are public, but gets a little trickier if you want to secure the files (look into shared access signatures if that is what you want to do).

Have you tried setting the content type on the blob? Depending on how you've uploaded the files to blob storage, it may not be set. If you're uploading the blobs through your own code, you can access this through CloudBlob.Attributes.Properties.ContentType (from MSDN).

How to download a zip file in a struts2 application

You can't write to the client file system like that; in your case, the server is on your own machine, but don't be fooled: it's a server path, not a client one. You need to write to the response.

You can't use both a Struts2 result and writing to the OutputStream together: when manually forging the response, you must bypass the framework conventions and return the result yourself. The correct result for this case is Action.NONE:

<action name="Download" class="com.cdac.action.DownloadAction" />

public String execute() {
    /*
       do your stuff
    */
    return NONE;
}

You're in luck: here is a kick-off example I wrote a long time ago, explaining the whole thing (including the need to deal with duplicate filenames in the same ZIP).

Also try setting the Content-Length header properly (to let the browser draw a realistic progress bar).

Create Zip archive from multiple in memory files in C#

Use ZipEntry and PutNextEntry() for this (the ZipOutputStream here is SharpZipLib's, not the BCL's). The following shows how to do it for files on disk; for an in-memory object, just use a MemoryStream instead of a FileStream.

FileStream fZip = File.Create(compressedOutputFile);
ZipOutputStream zipOStream = new ZipOutputStream(fZip);
foreach (FileInfo fi in allfiles)
{
    ZipEntry entry = new ZipEntry(fi.Name);
    zipOStream.PutNextEntry(entry);
    FileStream fs = File.OpenRead(fi.FullName);
    try
    {
        byte[] transferBuffer = new byte[1024];
        int bytesRead;
        do
        {
            bytesRead = fs.Read(transferBuffer, 0, transferBuffer.Length);
            zipOStream.Write(transferBuffer, 0, bytesRead);
        }
        while (bytesRead > 0);
    }
    finally
    {
        fs.Close();
    }
}
zipOStream.Finish();
zipOStream.Close();

Java multithreaded file downloading performance

To answer my own questions:

  1. The increased CPU usage was due to a while() {} loop that was waiting for the threads to finish. As it turns out, awaitTermination is a much better alternative to wait for an Executor to finish :)
  2. (And 3 and 4) This seems to be the nature of the beast; in the end I achieved what I wanted to do by using careful synchronization of the different threads that each download a chunk of data (well, in particular the writes of these chunks back to disk).
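The awaitTermination pattern from point 1, as a self-contained sketch. The trivial tasks here are stand-ins for the chunk-download jobs; the point is that shutdown() plus awaitTermination() blocks until the executor drains, instead of burning CPU in a while() {} poll:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitDemo {
    // Submit `tasks` jobs, then block until the pool finishes them all.
    public static int runAll(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(done::incrementAndGet); // stand-in for a chunk download
        }
        pool.shutdown();                              // no new tasks; queued ones still run
        pool.awaitTermination(30, TimeUnit.SECONDS);  // block instead of busy-waiting
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(8)); // prints 8
    }
}
```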

Can I do direct streaming of ZipFile in vaadin?

The Java JDK and Apache commons-compress don't let you stream ZIP archives lazily, so I implemented a Java ZIP library [1] to handle that. The current limitation is that it doesn't support ZIP64 extensions, which means it can't compress files bigger than 4 GiB or produce archives bigger than 4 GiB. I'm working on that.

[1] https://github.com/tsabirgaliev/zip


