How to display only files from aws s3 ls command?
You can't do this with just the aws command, but you can easily pipe it to another command to strip out the portion you don't want. You also need to remove the --human-readable flag to get output that is easier to work with, and the --summarize flag to remove the summary data at the end.
Try this:
aws s3 ls s3://mybucket --recursive | awk '{print $4}'
Edit: to take spaces in filenames into account:
aws s3 ls s3://mybucket --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//'
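If you'd rather do the field-stripping without awk/sed, the same logic can be sketched in Python. The sample line below is an assumption of the usual `aws s3 ls --recursive` output format (date, time, size, then the key):

```python
# Sketch: extract the key (filename) from an `aws s3 ls --recursive` output
# line. The key is everything after the first three fields and may itself
# contain spaces -- the same effect as blanking $1-$3 in the awk one-liner.
def extract_key(line: str) -> str:
    # Split on whitespace at most 3 times; the remainder is the full key.
    return line.split(None, 3)[3]

line = "2023-01-15 10:30:00      41603 path/to/my file.txt"
print(extract_key(line))  # path/to/my file.txt
```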
how to get list of files on aws bucket with a specific pattern
Sadly you can't do this with s3 ls. But you could possibly use Exclude and Include Filters along with --dryrun for commands that support filters. For example, for s3 cp:
aws s3 cp s3://my_bucket ./. --recursive --dryrun --exclude "*" --include "my*.txt"
--dryrun (boolean) Displays the operations that would be performed using the specified command without actually running them.
This should print out all the objects that would be copied if the command were run without the --dryrun option.
Since you have a large bucket, test it out on a small bucket first to get a feel for the command, as its output format differs from that of s3 ls.
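If you prefer to filter a plain listing client-side, Python's fnmatch module uses the same shell-style glob syntax as --include. A minimal sketch, assuming you already have the keys (e.g. parsed from `aws s3 ls --recursive`):

```python
import fnmatch

# Sketch: keep only the keys matching a shell-style glob such as "my*.txt",
# mirroring the --include filter but applied client-side.
def matching_keys(keys, pattern):
    return [k for k in keys if fnmatch.fnmatch(k, pattern)]

keys = ["my1.txt", "my_data.txt", "other.txt", "my.csv"]
print(matching_keys(keys, "my*.txt"))  # ['my1.txt', 'my_data.txt']
```

Note one difference from real path globbing: fnmatch's `*` also matches `/`, so a pattern can match keys in "subdirectories".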
AWS S3 list all files with specific content type
You would need to:
- List the objects in the bucket
- For each object, call aws s3api head-object --bucket xxx --key xxx
It will return:
{
    "AcceptRanges": "bytes",
    "LastModified": "2014-03-10T21:59:20+00:00",
    "ContentLength": 41603,
    "ETag": "\"eca134ebe408fdb1f3494d7d916bf027\"",
    "VersionId": "null",
    "ContentType": "image/jpeg",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}
You would need some shell-scripting skills to be able to do this with the AWS CLI. It would be easier to accomplish with a scripting language, such as Python:
import boto3

s3_resource = boto3.resource('s3')

for object in s3_resource.Bucket('BUCKETNAME').objects.all():
    # Note: .get() issues a GET request per object to read its ContentType
    print(object.key, object.get()['ContentType'])
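To answer the original question (list only the objects with a specific content type), a filter on top of that loop can be sketched as below. The (key, content type) pairs are hypothetical sample data standing in for the (object.key, ContentType) results the loop would collect:

```python
# Sketch: given (key, content_type) pairs -- e.g. collected by the boto3
# loop above -- keep only the keys with the wanted content type.
def keys_with_content_type(pairs, wanted):
    return [key for key, ctype in pairs if ctype == wanted]

pairs = [
    ("photo.jpg", "image/jpeg"),
    ("notes.txt", "text/plain"),
    ("scan.jpg", "image/jpeg"),
]
print(keys_with_content_type(pairs, "image/jpeg"))  # ['photo.jpg', 'scan.jpg']
```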
Extract only file names from an Amazon S3 bucket
Combine awk and sed into one command, something like
aws s3 ls <bucket-address-directory-path> | sed -nr 's/.* ([^ ]*.csv)000.*/\1/p'
or
aws s3 ls <bucket-address-directory-path> | awk 'NF>3 { sub(/000$/,"", $4); print $4}'
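The same post-processing can be sketched in Python; the input line below is an assumed sample of the `aws s3 ls` output these one-liners target (keys ending in `.csv000`):

```python
# Sketch: pull the filename field from an `aws s3 ls` line and drop a
# trailing "000" suffix, like the sed/awk one-liners above.
def csv_name(line: str) -> str:
    name = line.split()[3]           # 4th whitespace-separated field
    return name.removesuffix("000")  # strip "000" only if present (Py 3.9+)

line = "2023-01-15 10:30:00 41603 report.csv000"
print(csv_name(line))  # report.csv
```

Note this simple split assumes the key contains no spaces, just like the `print $4` approach.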
Search S3 bucket for file extension and size
Here's a Python script that will count objects by extension and compute the total size by extension:
import boto3

s3_resource = boto3.resource('s3')

sizes = {}
quantity = {}

for object in s3_resource.Bucket('jstack-a').objects.all():
    if not object.key.endswith('/'):
        extension = object.key.split('.')[-1]
        sizes[extension] = sizes.get(extension, 0) + object.size
        quantity[extension] = quantity.get(extension, 0) + 1

for extension, size in sizes.items():
    print(extension, quantity[extension], size)
It goes a bit funny if there is an object without an extension: split('.')[-1] then returns the whole key, so the object is counted under its full name.
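The grouping logic can be sketched as a pure function (made-up keys below) that also handles keys without an extension by bucketing them under a separate label:

```python
# Sketch: group (key, size) pairs by extension; keys with no '.' are
# counted under "(none)" instead of being misreported under their full name.
def totals_by_extension(objects):
    sizes, counts = {}, {}
    for key, size in objects:
        ext = key.rsplit('.', 1)[-1] if '.' in key else '(none)'
        sizes[ext] = sizes.get(ext, 0) + size
        counts[ext] = counts.get(ext, 0) + 1
    return sizes, counts

sizes, counts = totals_by_extension([("a.txt", 10), ("b.txt", 5), ("README", 7)])
print(sizes)   # {'txt': 15, '(none)': 7}
print(counts)  # {'txt': 2, '(none)': 1}
```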