How Stable Is S3Fs to Mount an Amazon S3 Bucket as a Local Directory

There's a good article on s3fs here; after reading it, I resorted to an EBS share instead.

It highlights a few important considerations when using s3fs, mostly related to the inherent limitations of S3:

  • no file can be over 5GB
  • you can't partially update a file, so changing a single byte re-uploads the entire file
  • operations on many small files are very efficient (each is a separate S3 object, after all), but large files are very inefficient
  • though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file, you'll have to download the entire GB

Whether s3fs is a feasible option therefore depends on what you're storing. If you're storing, say, photos, where you only ever write or read whole files and never change them incrementally, then it's fine, although one might ask: if that's all you're doing, why not just use S3's API directly?
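As a hedged illustration of that direct-API alternative (bucket and file names are placeholders, not from the original answer), the AWS CLI already covers the whole-file read/write case without any mount:

# Upload and download whole objects directly, no filesystem layer involved
aws s3 cp photo.jpg s3://my-bucket/photos/photo.jpg
aws s3 cp s3://my-bucket/photos/photo.jpg photo.jpg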

If you're talking about application data (say, database files or log files) where you want to make small incremental changes, then it's a definite no: S3 just doesn't work that way, since you can't incrementally change a file.

The article mentioned above also discusses a similar application, s3backer, which implements a virtual filesystem over S3. This works around the performance issues but has a few issues of its own:

  • high risk of data corruption, due to the delayed writes
  • block sizes that are too small (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB of storage with 4K blocks)
  • block sizes that are too large can add significant data transfer and storage fees
  • memory usage can be prohibitive: by default it caches 1000 blocks; with the default 4K block size that's not an issue, but most users will probably want to increase the block size
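For a sense of how s3backer is used, here is a sketch, not from the article; the option names are assumptions based on s3backer's documentation and may vary by version. s3backer exposes the bucket as a single large file, which you then format and loop-mount:

# Present the bucket as a 50GB virtual block device with 128K blocks
# (--blockSize/--size are assumed option names; check your version's docs)
s3backer --blockSize=128k --size=50g mybucket /mnt/s3backer

# The mount point contains one file backed by the bucket; format and loop-mount it
mkfs.ext4 -F /mnt/s3backer/file
mount -o loop /mnt/s3backer/file /mnt/data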

I resorted to an EBS drive mounted on an EC2 instance and shared over NFS. Although this is the most performant option, it has one big problem: an NFS share backed by an EBS volume is a single point of failure. If the machine sharing the EBS volume goes down, you lose access on every machine that mounts the share.
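As a minimal sketch of that setup (the mount path, subnet, server address, and export options are placeholders, not from the original answer):

# On the EC2 instance with the EBS volume attached (assumed mounted at /data)
echo "/data 10.0.0.0/24(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -ra

# On each client machine (server address is a placeholder)
mount -t nfs 10.0.0.5:/data /mnt/shared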

This is a risk I was able to live with, and it was the option I chose in the end. I hope this helps.

How to mount AWS s3 using S3FS to allow full access to any user

For anyone else seeing access-denied errors, the answer in this case had nothing to do with the command line, as confirmed above. The cause was that AWS KMS (CMK) encryption had been applied to the bucket, but the s3fs user did not have permission to read the keys. Simply granting that AWS user read access to the KMS keys via an IAM policy fixed it.
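A hedged sketch of such a policy (the user name, policy name, and key ARN are placeholders; the exact set of KMS actions your setup needs may differ):

# Grant the s3fs IAM user permission to decrypt objects under the bucket's CMK
cat > s3fs-kms-read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:DescribeKey"],
    "Resource": "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
  }]
}
EOF
aws iam put-user-policy --user-name s3fs-user --policy-name s3fs-kms-read --policy-document file://s3fs-kms-read.json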

What does s3fs cache in /tmp?

From the s3fs wiki (which is a bit hard to find):

If enabled via the "use_cache" option, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on S3, it first downloads the entire file locally to the folder specified by use_cache and operates on it. When fuse release() is called, s3fs will re-upload the file to S3 if it has been changed. s3fs uses md5 checksums to minimize downloads from S3. Note: this is different from the stat cache (see below).

Local file caching works by calculating and comparing md5 checksums (ETag HTTP header).
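If you want to inspect the ETag that s3fs compares against, you can query it directly (bucket and key are placeholders; note that for single-part, unencrypted uploads the ETag is the object's md5, but that does not hold for multipart or KMS-encrypted objects):

# Show the stored ETag for an object
aws s3api head-object --bucket mybucket --key some/file.txt --query ETag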

The folder specified by use_cache is just a local cache; it can be deleted at any time, and s3fs rebuilds it on demand. Note: this directory grows unbounded and can fill up a filesystem, depending on the bucket and the reads against it.
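A minimal sketch of using the cache and keeping its growth in check (the paths and retention window are assumptions, not from the wiki):

# Mount with a local file cache
s3fs mybucket /mnt/s3 -o use_cache=/tmp/s3fs-cache

# Periodically evict cached files that haven't been read in a week,
# since s3fs itself never bounds the cache directory
find /tmp/s3fs-cache -type f -atime +7 -delete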

Amazon S3 with s3fs and fuse, transport endpoint is not connected

Well, the solution was simple: unmount and remount the directory. The "transport endpoint is not connected" error was resolved by unmounting the S3 folder and then mounting it again.

Command to unmount

fusermount -u /s3

Command to mount

/usr/bin/s3fs -o allow_other bucketname /s3

Takes 3 minutes to sync.
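If the mount drops regularly, one hedged approach (the mountpoint(1) check is an addition, not part of the original answer) is to remount automatically, e.g. from cron:

# Remount only if the FUSE endpoint is no longer a working mount point
mountpoint -q /s3 || {
    fusermount -u /s3 2>/dev/null
    /usr/bin/s3fs -o allow_other bucketname /s3
}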

Understanding storage requirement for amazon s3 using fuse-s3fs

By default (i.e., without the caching option), s3fs does nothing more than provide a mount point for your S3 bucket; no local copies of the data are retained. s3fs does, however, provide an option to use a local drive as a cache, which can improve I/O performance where needed, since direct reads and writes to the S3 bucket via the mount are not the quickest.
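To make the local-storage tradeoff concrete, a sketch (the bucket, mount path, and credentials file are placeholders; ~/.passwd-s3fs is the conventional s3fs credentials location):

# Default mount: no local data copies, so no extra storage requirement
s3fs mybucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs

# Cached mount: faster repeated I/O, at the cost of local disk space
s3fs mybucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs -o use_cache=/tmp/s3fs-cache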


