Amazon S3 listing directories
When you list with a delimiter, S3 groups keys that share a prefix up to that delimiter and reports them as "Common Prefixes":
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectListing.html#getCommonPrefixes%28%29
public List<String> getCommonPrefixes()
Gets the common prefixes included in this object listing. Common prefixes are only present if a delimiter was specified in the original request.
Each common prefix represents a set of keys in the S3 bucket that have been condensed and omitted from the object summary results. This allows applications to organize and browse their keys hierarchically, similar to how a file system organizes files into directories.
For example, consider a bucket that contains the following keys:
"foo/bar/baz"
"foo/bar/bash"
"foo/bar/bang"
"foo/boo"
If calling listObjects with the prefix="foo/" and the delimiter="/" on this bucket, the returned S3ObjectListing will contain one entry in the common prefixes list ("foo/bar/") and none of the keys beginning with that common prefix will be included in the object summaries list.
Returns: The list of common prefixes included in this object listing, which might be an empty list if no common prefixes were found.
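The grouping described in the Javadoc can be imitated locally to see what ends up where. The sketch below is only a simulation in plain Python, not the S3 implementation; the function name list_with_delimiter is made up for illustration and it assumes a non-empty delimiter.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Imitate a delimiter listing: split keys into objects and common prefixes."""
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No delimiter after the prefix: reported as a normal object.
            contents.append(key)
        else:
            # Delimiter found: the key is rolled up into a common prefix.
            rolled_up = prefix + rest[:cut + len(delimiter)]
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
    return contents, common_prefixes


keys = ["foo/bar/baz", "foo/bar/bash", "foo/bar/bang", "foo/boo"]
contents, common_prefixes = list_with_delimiter(keys, prefix="foo/", delimiter="/")
print(contents)         # ['foo/boo']
print(common_prefixes)  # ['foo/bar/']
```

The three keys under "foo/bar/" collapse into a single common prefix, and only "foo/boo" is left in the object list, which matches the example in the quoted documentation.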
Retrieving subfolders names in S3 bucket from boto3
S3 is an object store; it doesn't have a real directory structure. The "/" is largely cosmetic.
One reason people want a directory structure is that they can maintain, prune, or add a tree in the application. In S3, you treat such a structure as a sort of index or search tag.
To manipulate objects in S3, you need boto3.client or boto3.resource.
To list objects (a single call returns at most 1,000 keys):
import boto3
s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket = 'bucket-name')
http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.list_objects
If the S3 object names use '/' as a separator, you can limit the response to keys that begin with a specified prefix. list_objects_v2, the more recent version of list_objects, accepts a Prefix parameter for this.
To limit the results to items under a certain sub-folder:
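import boto3
s3 = boto3.client("s3")
BUCKET = "bucket-name"  # placeholder bucket name
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix="DIR1/DIR2",  # add a trailing "/" to avoid also matching e.g. "DIR1/DIR2x"
    MaxKeys=100,
)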
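To get the sub-folder names themselves rather than the objects, one possible approach is to add Delimiter="/" and read the CommonPrefixes entries of each response. The following is a minimal sketch, not a definitive implementation: list_subfolders is a made-up helper name, and the client argument is assumed to be a boto3 S3 client (or anything with a compatible list_objects_v2 method).

```python
def list_subfolders(client, bucket, prefix=""):
    """Return the immediate 'sub-folder' prefixes under the given prefix."""
    kwargs = {"Bucket": bucket, "Prefix": prefix, "Delimiter": "/"}
    subfolders = []
    while True:
        response = client.list_objects_v2(**kwargs)
        # CommonPrefixes is absent when nothing was rolled up on this page.
        for entry in response.get("CommonPrefixes", []):
            subfolders.append(entry["Prefix"])
        if not response.get("IsTruncated"):
            break
        # More results are available: continue from where this page stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return subfolders


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   print(list_subfolders(boto3.client("s3"), "bucket-name", "DIR1/DIR2/"))
```

Note the trailing "/" on the prefix in the usage comment; without it, the listing would report "DIR1/DIR2/" itself as the only common prefix.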
Another option is to use Python's os.path functions to extract the folder prefix from each key. The problem is that this requires listing objects from directories you don't want.
import os
s3_key = 'first-level/1456753904534/part-00014'
filename = os.path.basename(s3_key)
foldername = os.path.dirname(s3_key)
# if the key uses an unconventional delimiter such as '#'
s3_key = 'first-level#1456753904534#part-00014'
filename = s3_key.split("#")[-1]
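Applied to a whole listing, the same idea can group keys by their parent folder on the client side. This is a rough sketch under the assumption that keys use "/" as the separator; group_by_folder is a made-up name, and all sample keys except the first are hypothetical. It uses posixpath so the result does not depend on the local operating system.

```python
import posixpath
from collections import defaultdict


def group_by_folder(keys):
    """Map each parent 'folder' to the file names found directly under it."""
    folders = defaultdict(list)
    for key in keys:
        folders[posixpath.dirname(key)].append(posixpath.basename(key))
    return dict(folders)


keys = [
    "first-level/1456753904534/part-00014",
    "first-level/1456753904534/part-00015",  # hypothetical key
    "first-level/1456753904999/part-00001",  # hypothetical key
]
grouped = group_by_folder(keys)
print(sorted(grouped))
# ['first-level/1456753904534', 'first-level/1456753904999']
```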
A reminder about boto3: boto3.resource is a nice high-level API. There are pros and cons to using boto3.client versus boto3.resource. If you develop an internal shared library, using boto3.resource will give you a black-box layer over the resources used.
S3: get all files at a specific directory level
Set the delimiter argument to / in your request. See the GET Bucket (List Objects) documentation.
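In Python, one way to do this is with the boto3 paginator for list_objects_v2. With a delimiter set, keys deeper than the requested level are rolled into CommonPrefixes, so the Contents entries are the objects at that level only. The sketch below is illustrative: keys_at_level is a made-up helper, and the bucket name and prefix in the usage comment are placeholders.

```python
def keys_at_level(pages):
    """Collect object keys from listing pages that were requested with a delimiter."""
    keys = []
    for page in pages:
        # Contents is missing on pages that hold no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   paginator = boto3.client("s3").get_paginator("list_objects_v2")
#   pages = paginator.paginate(Bucket="bucket-name", Prefix="DIR1/", Delimiter="/")
#   print(keys_at_level(pages))
```

If a zero-byte placeholder object named "DIR1/" exists, it will appear in the result as well and may need to be filtered out.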
Listing files in a specific folder of a AWS S3 bucket
Everything in S3 is an object. To you, it may be files and folders. But to S3, they're just objects.
Objects that end with the delimiter (/ in most cases) are usually perceived as a folder, but that's not always the case. It depends on the application. Again, in your case, you're interpreting it as a folder. S3 is not. It's just another object.
In your case above, the object users/<user-id>/contacts/<contact-id>/ exists in S3 as a distinct object, but the object users/<user-id>/ does not. That's the difference in your responses. Why they're like that, we cannot tell you, but someone made the object in one case, and didn't in the other. You don't see it in the AWS Management Console because the console is interpreting it as a folder and hiding it from you.
Since S3 just sees these things as objects, it won't "exclude" certain things for you. It's up to the client to deal with the objects as they should be dealt with.
Your Solution
Since you're the one who doesn't want the folder objects, you can exclude them yourself by checking whether the last character of the key is a /. If it is, ignore that object in the response.
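As a small illustration, the check could look like this in Python. The sketch assumes the listing is a list of dictionaries with a "Key" entry, as boto3 returns under Contents; drop_folder_placeholders is a made-up name and the sample keys use hypothetical IDs.

```python
def drop_folder_placeholders(objects, delimiter="/"):
    """Keep only objects whose key does not end with the delimiter."""
    return [obj for obj in objects if not obj["Key"].endswith(delimiter)]


# Hypothetical listing: one folder placeholder and one real object.
listing = [
    {"Key": "users/123/contacts/456/"},
    {"Key": "users/123/contacts/456/photo.jpg"},
]
files_only = drop_folder_placeholders(listing)
print([obj["Key"] for obj in files_only])
# ['users/123/contacts/456/photo.jpg']
```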
How to get a list of all folders that list in a specific s3 location using spark in databricks?
Just use dbutils.fs.ls(ls_path).
Amazon S3: How to get a list of folders in the bucket?
For the sake of example, assume I have a bucket in the USEast1 region called MyBucketName, with the following keys:
temp/
temp/foobar.txt
temp/txt/
temp/txt/test1.txt
temp/txt/test2.txt
temp2/
Working with folders can be confusing because S3 does not natively support a hierarchical structure; rather, these are simply keys like any other S3 object. Folders are simply an abstraction available in the S3 web console to make it easier to navigate a bucket. So when we're working programmatically, we want to find keys matching the characteristics of a 'folder' (ends with the delimiter '/', size = 0) because they will likely be 'folders' as presented to us by the S3 console.
Note for both examples: I'm using the AWSSDK.S3 version 3.1 NuGet package.
Example 1: All folders in a bucket
This code is modified from the basic example in the S3 documentation that lists all keys in a bucket. The example below identifies all keys that end with the delimiter character / and are also empty.
using System.Collections.Generic;
using System.Linq;
using Amazon.S3;
using Amazon.S3.Model;

using (IAmazonS3 client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1))
{
    // Build your request to list objects in the bucket
    ListObjectsRequest request = new ListObjectsRequest
    {
        BucketName = "MyBucketName"
    };
    do
    {
        // Build your call out to S3 and store the response
        ListObjectsResponse response = client.ListObjects(request);
        // Filter through the response to find keys that:
        // - end with the delimiter character '/'
        // - are empty.
        IEnumerable<S3Object> folders = response.S3Objects.Where(x =>
            x.Key.EndsWith(@"/") && x.Size == 0);
        // Do something with your output keys. For this example, we write to the console.
        folders.ToList().ForEach(x => System.Console.WriteLine(x.Key));
        // If the response is truncated, we'll make another request
        // and pull the next batch of keys
        if (response.IsTruncated)
        {
            request.Marker = response.NextMarker;
        }
        else
        {
            request = null;
        }
    } while (request != null);
}
Expected output to console:
temp/
temp/txt/
temp2/
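For readers working in Python, the same filter can be sketched over dictionaries shaped like the Contents entries that boto3 returns. This is only an illustration of the filtering step, with no call to S3: folder_keys is a made-up name and the non-zero file sizes are assumed values.

```python
def folder_keys(objects):
    """Return keys that end with '/' and are empty, i.e. likely console 'folders'."""
    return [o["Key"] for o in objects if o["Key"].endswith("/") and o["Size"] == 0]


# The sample keys from above; sizes of the .txt files are assumptions.
objects = [
    {"Key": "temp/", "Size": 0},
    {"Key": "temp/foobar.txt", "Size": 12},
    {"Key": "temp/txt/", "Size": 0},
    {"Key": "temp/txt/test1.txt", "Size": 34},
    {"Key": "temp/txt/test2.txt", "Size": 56},
    {"Key": "temp2/", "Size": 0},
]
print(folder_keys(objects))  # ['temp/', 'temp/txt/', 'temp2/']
```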
Example 2: Folders matching a specified prefix
You could further limit this to only retrieve folders matching a specified Prefix by setting the Prefix property on ListObjectsRequest.
ListObjectsRequest request = new ListObjectsRequest
{
    BucketName = "MyBucketName",
    Prefix = "temp/"
};
When applied to Example 1, we would expect the following output:
temp/
temp/txt/
Further reading:
- S3 Documentation - Working With Folders
- .NET SDK Documentation - ListObjects
How do I get the top-level directories of a bucket in S3?
I see there's a CommonPrefixes property on ListObjectsResponse.
using (var client = new AmazonS3Client())
{
    var listObjectsResponse = client.ListObjects(new ListObjectsRequest
    {
        BucketName = bucket,
        Prefix = "2",
        Delimiter = "/",
    });
    // Prints e.g. 2017/ (common prefixes include the trailing delimiter)
    Console.WriteLine(listObjectsResponse.CommonPrefixes[0]);
}