GZIP Compression on static Amazon S3 files
Files should be compressed before being uploaded to Amazon S3.
For some examples, see:
- Serving Compressed (gzipped) Static Files from Amazon S3 or Cloudfront
- How to: Gzip compression of CSS and JS files on S3 with s3cmd
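The pre-compress-then-upload approach from those articles can be sketched as follows; the bucket name and file are placeholders, and the `aws s3 cp` line is shown commented out since it assumes valid credentials and a real bucket:

```shell
# Demo setup: create a small stylesheet (stand-in for real assets).
printf 'body { color: #333; }\n' > styles.css

# Compress with maximum compression, keeping the original file.
gzip -9 -c styles.css > styles.css.gz

# Verify integrity before uploading.
gzip -t styles.css.gz && echo "styles.css.gz is valid"

# Upload with the Content-Encoding header set so browsers decompress it
# transparently (bucket name is hypothetical):
# aws s3 cp styles.css.gz s3://my-bucket/styles.css \
#     --content-encoding gzip --content-type text/css
```

Setting `--content-encoding gzip` at upload time is what lets S3 (or CloudFront in front of it) serve the object as a normal stylesheet.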
Running a COPY command to load gzipped data in S3 into Redshift
One of your gzipped files is not properly formed. The gzip format stores a CRC-32 checksum and the uncompressed length in a trailer at the end of the file, and the file cannot be fully decompressed without that trailer.
If the file was not fully written (for example, because you ran out of disk space), you will get exactly this error when you attempt to COPY it into Redshift.
Speaking from experience… ;-)
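The trailer behaviour described above is easy to demonstrate locally, and the same `gzip -t` check can be used to validate every file before running COPY:

```shell
# Create a valid gzip file, then a deliberately truncated copy.
seq 1 1000 | gzip > good.gz
head -c 50 good.gz > truncated.gz   # cuts off the trailer (CRC-32 + size)

# gzip -t tests integrity without writing any output.
gzip -t good.gz && echo "good.gz: OK"
gzip -t truncated.gz 2>/dev/null || echo "truncated.gz: corrupt"
```

Running `gzip -t` over a batch before loading is much cheaper than diagnosing a failed COPY afterwards.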
How can I pipe a tar compression operation to aws s3 cp?
When using split, you can use the environment variable $FILE to get the generated file name.
See split man page:
--filter=COMMAND
write to shell COMMAND; file name is $FILE
For your use case you could use something like the following:
--filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE'
(the single quotes are needed; otherwise the shell would substitute the variable immediately instead of passing it through to split)
This will generate the following object names on S3:
backup.tgz.partx0000
backup.tgz.partx0001
backup.tgz.partx0002
...
Full example:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE' -
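The $FILE mechanism can be tried locally without AWS by swapping the aws command inside `--filter` for a plain redirect; the names below mirror the example above (note that `--filter` requires GNU split):

```shell
# Generate some test data to stand in for the tar stream.
seq 1 100000 > payload.txt

# Each chunk is piped to the filter command with $FILE set to the
# generated suffix (x0000, x0001, ...), so this writes
# chunk.partx0000, chunk.partx0001, and so on.
split -b 64k -d -a 4 --filter 'cat > chunk.part$FILE' payload.txt

# The chunks concatenate back to the original byte-for-byte.
cat chunk.part* | cmp - payload.txt && echo "chunks reassemble cleanly"
```

This is also how you would restore the real backup: concatenate the downloaded parts and pipe the result through `tar -xzf -`.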
Decompress a zip file in AWS Glue
Glue can do the decompression, but it wouldn't be optimal: the gzip format is not splittable, which means only one executor can work on the file. More info about that here.
You can instead decompress the files with a Lambda function and then invoke a Glue crawler on the new folder.
Compress file on S3
S3 does not support streaming compression, nor is it possible to compress an already-uploaded file remotely.
If this is a one-time process, I suggest downloading the file to an EC2 instance in the same region, compressing it there, and then uploading it to your destination.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
If you need this more frequently, see:
Serving gzipped CSS and JavaScript from Amazon CloudFront via S3
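If the object is too large to stage on the instance's disk, the download/compress/upload round trip can also be done as a stream, since `aws s3 cp` accepts `-` for stdin and stdout. Below is a local sketch of that pipeline, with plain files standing in for the S3 objects; the commented command shows the assumed real-world form with a hypothetical bucket:

```shell
# Real-world streaming form (hypothetical bucket, requires credentials):
# aws s3 cp s3://my-bucket/big.log - | gzip | aws s3 cp - s3://my-bucket/big.log.gz

# Local stand-in: stream the data through gzip without a full on-disk copy.
seq 1 100000 > big.log
cat big.log | gzip > big.log.gz

# Confirm the compressed stream round-trips to the original.
gzip -dc big.log.gz | cmp - big.log && echo "round-trip OK"
```

The streaming form never holds the whole file on local disk, so a small instance can compress objects much larger than its storage.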
How can I compress / gzip my minified .js and .css files before publishing to AWS S3?
You can add the code needed to gzip-compress the files to your upload script. For example:
function Gzip-FileSimple
{
    param
    (
        [String]$inFile = $(throw "Gzip-File: No filename specified"),
        [String]$outFile = $($inFile + ".gz"),
        [switch]$delete  # Delete the original file after compressing
    )

    trap
    {
        Write-Host "Received an exception: $_. Exiting."
        break
    }

    if (!(Test-Path $inFile))
    {
        "Input file $inFile does not exist."
        exit 1
    }

    Write-Host "Compressing $inFile to $outFile."

    # Renamed from $input, which is a reserved automatic variable in PowerShell.
    $inStream = New-Object System.IO.FileStream $inFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $buffer = New-Object byte[]($inStream.Length)
    $byteCount = $inStream.Read($buffer, 0, $inStream.Length)

    if ($byteCount -ne $inStream.Length)
    {
        $inStream.Close()
        Write-Host "Failure reading $inFile."
        exit 2
    }
    $inStream.Close()

    $output = New-Object System.IO.FileStream $outFile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
    $gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)

    $gzipStream.Write($buffer, 0, $buffer.Length)
    $gzipStream.Close()
    $output.Close()

    if ($delete)
    {
        Remove-Item $inFile
    }
}
From this site: Gzip creation in PowerShell