How to display only files from aws s3 ls command?
You can't do this with just the aws command, but you can easily pipe it to another command to strip out the portion you don't want. You also need to remove the --human-readable flag to get output that is easier to work with, and the --summarize flag to remove the summary data at the end.
Try this:
aws s3 ls s3://mybucket --recursive | awk '{print $4}'
Edit: to take spaces in filenames into account:
aws s3 ls s3://mybucket --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//'
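If you'd rather do the field-stripping without awk/sed, the same logic can be sketched in Python. The sample line below is an assumption of the usual `aws s3 ls --recursive` output format (date, time, size, then the key):

```python
# Sketch: extract the key (filename) from an `aws s3 ls --recursive` output
# line. The key is everything after the first three fields and may itself
# contain spaces -- the same effect as blanking $1-$3 in the awk one-liner.
def extract_key(line: str) -> str:
    # Split on whitespace at most 3 times; the remainder is the full key.
    return line.split(None, 3)[3]

line = "2023-01-15 10:30:00      41603 path/to/my file.txt"
print(extract_key(line))  # path/to/my file.txt
```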
how to get list of files on aws bucket with a specific pattern
Sadly you can't do this with s3 ls. But you could possibly use Exclude and Include Filters along with --dryrun for commands that support filters. For example, for s3 cp:
aws s3 cp s3://my_bucket ./. --recursive --dryrun --exclude "*" --include "my*.txt"
--dryrun (boolean) Displays the operations that would be performed using the specified command without actually running them.
This should print out all the objects that would be copied if the command were run without the --dryrun option.
Since you have a large bucket, test it out on a small bucket first to get a feel for the command, as its output format differs from that of s3 ls.
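If you prefer to filter a plain listing client-side, Python's fnmatch module uses the same shell-style glob syntax as --include. A minimal sketch, assuming you already have the keys (e.g. parsed from `aws s3 ls --recursive`):

```python
import fnmatch

# Sketch: keep only the keys matching a shell-style glob such as "my*.txt",
# mirroring the --include filter but applied client-side.
def matching_keys(keys, pattern):
    return [k for k in keys if fnmatch.fnmatch(k, pattern)]

keys = ["my1.txt", "my_data.txt", "other.txt", "my.csv"]
print(matching_keys(keys, "my*.txt"))  # ['my1.txt', 'my_data.txt']
```

Note one difference from real path globbing: fnmatch's `*` also matches `/`, so a pattern can match keys in "subdirectories".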
AWS S3 list all files with specific content type
You would need to:
- List the objects in the bucket
- For each object, call aws s3api head-object --bucket xxx --key xxx
It will return:
{
    "AcceptRanges": "bytes",
    "LastModified": "2014-03-10T21:59:20+00:00",
    "ContentLength": 41603,
    "ETag": "\"eca134ebe408fdb1f3494d7d916bf027\"",
    "VersionId": "null",
    "ContentType": "image/jpeg",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}
You would need some shell-scripting skills to be able to do this with the AWS CLI. It would be easier to accomplish with a scripting language, such as Python:
import boto3

s3_resource = boto3.resource('s3')

for object in s3_resource.Bucket('BUCKETNAME').objects.all():
    # Note: .get() issues a GET request per object to read its ContentType
    print(object.key, object.get()['ContentType'])
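To answer the original question (list only the objects with a specific content type), a filter on top of that loop can be sketched as below. The (key, content type) pairs are hypothetical sample data standing in for the (object.key, ContentType) results the loop would collect:

```python
# Sketch: given (key, content_type) pairs -- e.g. collected by the boto3
# loop above -- keep only the keys with the wanted content type.
def keys_with_content_type(pairs, wanted):
    return [key for key, ctype in pairs if ctype == wanted]

pairs = [
    ("photo.jpg", "image/jpeg"),
    ("notes.txt", "text/plain"),
    ("scan.jpg", "image/jpeg"),
]
print(keys_with_content_type(pairs, "image/jpeg"))  # ['photo.jpg', 'scan.jpg']
```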
Extract only file names from an Amazon S3 bucket
Combine awk and sed into one command, something like
aws s3 ls <bucket-address-directory-path> | sed -nr 's/.* ([^ ]*.csv)000.*/\1/p'
or
aws s3 ls <bucket-address-directory-path> | awk 'NF>3 { sub(/000$/,"", $4); print $4}'
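The same post-processing can be sketched in Python; the input line below is an assumed sample of the `aws s3 ls` output these one-liners target (keys ending in `.csv000`):

```python
# Sketch: pull the filename field from an `aws s3 ls` line and drop a
# trailing "000" suffix, like the sed/awk one-liners above.
def csv_name(line: str) -> str:
    name = line.split()[3]           # 4th whitespace-separated field
    return name.removesuffix("000")  # strip "000" only if present (Py 3.9+)

line = "2023-01-15 10:30:00 41603 report.csv000"
print(csv_name(line))  # report.csv
```

Note this simple split assumes the key contains no spaces, just like the `print $4` approach.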
Search S3 bucket for file extension and size
Here's a Python script that will count objects by extension and compute the total size by extension:
import boto3

s3_resource = boto3.resource('s3')

sizes = {}
quantity = {}

for object in s3_resource.Bucket('jstack-a').objects.all():
    if not object.key.endswith('/'):
        extension = object.key.split('.')[-1]
        sizes[extension] = sizes.get(extension, 0) + object.size
        quantity[extension] = quantity.get(extension, 0) + 1

for extension, size in sizes.items():
    print(extension, quantity[extension], size)
It goes a bit funny if there is an object without an extension: split('.')[-1] then returns the whole key, so the object is counted under its full name.
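The grouping logic can be sketched as a pure function (made-up keys below) that also handles keys without an extension by bucketing them under a separate label:

```python
# Sketch: group (key, size) pairs by extension; keys with no '.' are
# counted under "(none)" instead of being misreported under their full name.
def totals_by_extension(objects):
    sizes, counts = {}, {}
    for key, size in objects:
        ext = key.rsplit('.', 1)[-1] if '.' in key else '(none)'
        sizes[ext] = sizes.get(ext, 0) + size
        counts[ext] = counts.get(ext, 0) + 1
    return sizes, counts

sizes, counts = totals_by_extension([("a.txt", 10), ("b.txt", 5), ("README", 7)])
print(sizes)   # {'txt': 15, '(none)': 7}
print(counts)  # {'txt': 2, '(none)': 1}
```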