Using Gzip to Compress Files to Transfer with the AWS CLI

GZIP Compression on static Amazon S3 files

Files should be compressed before being uploaded to Amazon S3.

For some examples, see:

  • Serving Compressed (gzipped) Static Files from Amazon S3 or CloudFront
  • How to: Gzip compression of CSS and JS files on S3 with s3cmd
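
As a minimal sketch (the file and bucket names are hypothetical), compressing a stylesheet locally and uploading it with the matching Content-Encoding metadata could look like this:

gzip -9 -k styles.css                      # writes styles.css.gz and keeps the original
aws s3 cp styles.css.gz s3://my-bucket/styles.css \
    --content-encoding gzip \
    --content-type text/css

Setting Content-Encoding: gzip on the object is what lets browsers (and CloudFront) decompress it transparently.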

Running a COPY command to load gzipped data from S3 into Redshift

One of your gzipped files is not properly formed. Gzip stores information needed for decompression (including the trailing CRC and size check) at the end of the file, and the file can't be fully expanded without it.

If the file does not get fully written, e.g., you run out of disk space, then you get the error you're seeing when you attempt to load it into Redshift.

Speaking from experience… ;-)
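
One way to catch this before running COPY is to test every archive locally; a small sketch (the path is hypothetical):

for f in /data/to_load/*.gz; do
    gzip -t "$f" || echo "corrupt or truncated: $f"
done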

How can I pipe a tar compression operation to aws s3 cp?

When using split, you can use the environment variable $FILE to get the generated file name.
See the split man page:

--filter=COMMAND
write to shell COMMAND; file name is $FILE

For your use case you could use something like the following:

--filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE'

(the single quotes are needed; otherwise the environment variable substitution would happen immediately)

This will generate the following file names on S3:

backup.tgz.partx0000
backup.tgz.partx0001
backup.tgz.partx0002
...

Full example:

tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE' -
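
To restore such a backup, the parts can be downloaded and concatenated back into a single stream; a sketch assuming the same bucket and prefix as above:

aws s3 cp s3://backups/ . --recursive --exclude "*" --include "backup.tgz.part*"
cat backup.tgz.partx* | tar -xzf -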

Decompress a zip file in AWS Glue

Glue can do the decompression, but it wouldn't be optimal, because the gzip format is not splittable (meaning only one executor can work on it). More info about that here.

You can try decompressing the files with a Lambda function and then invoking a Glue crawler on the new folder.
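
The same idea expressed with the CLI, as a sketch only (the bucket, key, and crawler names are hypothetical; a Lambda function would make the equivalent SDK calls):

aws s3 cp s3://raw-bucket/input/data.csv.gz - | gunzip \
    | aws s3 cp - s3://raw-bucket/decompressed/data.csv
aws glue start-crawler --name decompressed-data-crawler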

Compress file on S3

S3 does not support stream compression, nor is it possible to compress an uploaded file remotely.

If this is a one-time process, I suggest downloading the file to an EC2 instance in the same region, compressing it there, and then uploading it to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html
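
A sketch of that one-time approach, with hypothetical bucket and file names:

# on an EC2 instance in the same region as the buckets
aws s3 cp s3://source-bucket/big-file.csv .
gzip -9 big-file.csv                      # produces big-file.csv.gz
aws s3 cp big-file.csv.gz s3://destination-bucket/big-file.csv.gz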

If you need this more frequently

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3

How can I compress / gzip my minified .js and .css files before publishing to AWS S3?

You can add the code needed to gzip-compress the files to your upload script.

Some example PowerShell code could be this:

function Gzip-FileSimple
{
    param
    (
        [String]$inFile = $(throw "Gzip-File: No filename specified"),
        [String]$outFile = $($inFile + ".gz"),
        [switch]$delete # Delete the original file after compressing
    )

    trap
    {
        Write-Host "Received an exception: $_. Exiting."
        break
    }

    if (! (Test-Path $inFile))
    {
        "Input file $inFile does not exist."
        exit 1
    }

    Write-Host "Compressing $inFile to $outFile."

    # Read the whole input file into a buffer.
    # ($inStream is used rather than $input, which is a PowerShell automatic variable.)
    $inStream = New-Object System.IO.FileStream $inFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $buffer = New-Object byte[]($inStream.Length)
    $byteCount = $inStream.Read($buffer, 0, $buffer.Length)

    if ($byteCount -ne $inStream.Length)
    {
        $inStream.Close()
        Write-Host "Failure reading $inFile."
        exit 2
    }
    $inStream.Close()

    # Write the buffer through a GzipStream into the output file.
    $outStream = New-Object System.IO.FileStream $outFile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
    $gzipStream = New-Object System.IO.Compression.GzipStream $outStream, ([IO.Compression.CompressionMode]::Compress)

    $gzipStream.Write($buffer, 0, $buffer.Length)
    $gzipStream.Close()
    $outStream.Close()

    if ($delete)
    {
        Remove-Item $inFile
    }
}

From this site: Gzip creation in PowerShell
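
A shell-based alternative for the same task, shown only as a sketch (the dist/ path, bucket name, and content types are assumptions):

for f in dist/*.min.js dist/*.min.css; do
    gzip -9 -k "$f"                       # keep the original, write "$f.gz"
    case "$f" in
        *.js)  type=application/javascript ;;
        *.css) type=text/css ;;
    esac
    aws s3 cp "$f.gz" "s3://my-bucket/${f#dist/}" \
        --content-encoding gzip --content-type "$type"
done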


