What Does "File.Sync = True" Do

What does file.sync = true do?

It sets the sync mode of the file.

This affects future operations and causes output to be written without block buffering.

If f.tty? is true, that is, if the file is connected to a console-like device, then output is not block buffered. But when output goes to a pipe or file, f.tty? will be false and the I/O library will switch to block buffering, that is, accumulating output in a buffer and writing it only when the file is closed, the program exits, or the buffer fills up. This is faster and the end result is the same.

Setting f.sync = true defeats this switch. This is useful when the other end of the pipe is actually a console or is otherwise interactive, or when the contents of the file are being actively monitored (for example, with tail -f).
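A minimal sketch of the difference, assuming a temporary file stands in for the monitored file. The helper name bytes_visible_before_close is made up for this illustration; it reports how many bytes a second reader would find in the file while it is still open.

```ruby
require "tempfile"

# Hypothetical helper: write a short string, then report the file size
# as seen through the file system *before* the file is closed.
def bytes_visible_before_close(sync)
  Tempfile.create("sync_demo") do |f|
    f.sync = sync          # true: every write goes straight to the OS
    f.write("hello")
    File.size(f.path)      # what `tail -f` or another process would see
  end
end

puts bytes_visible_before_close(true)   # 5
puts bytes_visible_before_close(false)  # typically 0: still in Ruby's buffer
```

With sync off, the five bytes only reach the file when the buffer fills, f.flush is called, or the file is closed.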

What does STDOUT.sync = true mean?

Normally puts does not write immediately to STDOUT, but buffers the strings internally and writes the output in bigger chunks. This is done because I/O operations are slow, and it usually makes more sense to avoid writing every single character immediately to the console.

This behavior leads to problems in certain situations. Imagine you want to build a progress bar (run a loop that outputs single dots between extensive calculations). With buffering the result might be that there isn't any output for a while and then suddenly multiple dots are printed at once.

To avoid this behavior and instead write immediately to STDOUT you can set STDOUT into sync mode like this:

STDOUT.sync = true

From the docs:

When sync mode is true, all output is immediately flushed to the underlying operating system and is not buffered internally.
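The progress-bar case described above could look roughly like this. The method name run_with_progress and its out: parameter are assumptions made for the sketch, and the block stands in for the real calculation.

```ruby
# Hypothetical helper: run `steps` units of work, printing one dot after
# each unit. Sync mode is switched on so each dot appears right away,
# then restored to whatever the caller had before.
def run_with_progress(steps, out: $stdout)
  previous = out.sync
  out.sync = true            # flush every write immediately
  steps.times do |i|
    yield i                  # the slow calculation goes here
    out.print "."
  end
  out.puts
ensure
  out.sync = previous        # restore the caller's buffering mode
end

run_with_progress(5) { sleep 0.1 }
```

Calling out.flush after each print would have the same visible effect without changing the stream's mode.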

Really force file sync/flush in Java

You need to tell us more about the hardware and operating system, also the specific Java version. How are you measuring this throughput?

You're correct that force/sync should force the data out to the physical media.


Here's a raw version of copy. Compiled with gcc 4.0 on an Intel Mac, should be clean.

/* rawcopy -- pure C, system calls only, copy argv[1] to argv[2] */

/* This is a test program which simply copies from file to file using
 * only system calls (section 2 of the manual).
 *
 * Compile:
 *
 *     gcc -Wall -DBUFSIZ=1024 -o rawcopy rawcopy.c
 *
 * If DIRTY is defined, then errors are interpreted with perror(3).
 * This is ifdef'd so that the CLEAN version is free of stdio. For
 * convenience I'm using BUFSIZ from stdio.h; to compile CLEAN just
 * use the value from your stdio.h in place of 1024 above.
 *
 * Compile DIRTY:
 *
 *     gcc -DDIRTY -Wall -o rawcopy rawcopy.c
 *
 */
#include <errno.h>          /* errno, EINVAL */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(DIRTY)
#  if defined(BUFSIZ)
#    error "Don't define your own BUFSIZ when DIRTY"
#  endif
#  include <stdio.h>
#  define PERROR perror(argv[0])
#else
#  define CLEAN
#  define PERROR
#  if ! defined(BUFSIZ)
#    error "You must define your own BUFSIZ with -DBUFSIZ=<number>"
#  endif
#endif

char buffer[BUFSIZ];        /* by definition stdio BUFSIZ should
                               be optimal size for read/write */

int main(int argc, char * argv[]) {
    int fdi, fdo;           /* Input/output file descriptors */
    ssize_t len;            /* number of bytes returned by read */

    if (argc != 3) {
        errno = EINVAL;     /* no system call has failed yet, so set it */
        PERROR;
        exit(errno);
    }

    /* Open the files, returning perror errno as the exit value if fails. */
    if ((fdi = open(argv[1], O_RDONLY)) == -1) {
        PERROR;
        exit(errno);
    }
    /* O_CREAT requires a mode argument; O_TRUNC discards any old contents. */
    if ((fdo = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        PERROR;
        exit(errno);
    }

    /* copy BUFSIZ bytes (or total read on last block) fast as you
       can. read returns 0 at end of file, which ends the loop. */
    while ((len = read(fdi, buffer, BUFSIZ)) > 0) {
        ssize_t done = 0;
        while (done < len) {    /* write may accept fewer bytes than asked */
            ssize_t n = write(fdo, buffer + done, (size_t)(len - done));
            if (n == -1) {
                PERROR;
                exit(errno);
            }
            done += n;
        }
    }
    if (len == -1) {            /* read failed */
        PERROR;
        exit(errno);
    }

    /* fsync the output, then close both files */
    if (fsync(fdo) == -1) {
        PERROR;
        exit(errno);
    }
    if (close(fdo) == -1) {
        PERROR;
        exit(errno);
    }
    if (close(fdi) == -1) {
        PERROR;
        exit(errno);
    }

    /* if it survived to here, all worked. */
    exit(0);
}

How does `aws s3 sync` determine if a file has been updated?

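According to the documentation (http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html):

By default, S3 sync compares file size and last-modified time. A local file is uploaded if its size differs from the size of the S3 object, if its last-modified time is newer than the object's, or if the object does not exist under the target bucket and prefix.

In your case, I suspect the build system is giving the files a newer timestamp even though their size hasn't changed. If so, the --size-only option makes size the only criterion.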
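The default rule can be modelled in a few lines. This is only a sketch of the documented comparison, not the CLI's actual code; RemoteObject and needs_upload? are names invented for the illustration.

```ruby
# Hypothetical stand-in for the metadata S3 reports about an object.
RemoteObject = Struct.new(:size, :last_modified)

# Model of the documented default: upload when the object is missing,
# the sizes differ, or the local file is newer than the object.
def needs_upload?(local_size, local_mtime, remote)
  return true if remote.nil?
  return true if local_size != remote.size
  local_mtime > remote.last_modified
end

uploaded_at = Time.at(1_700_000_000)
remote      = RemoteObject.new(1024, uploaded_at)

# Same size, but a rebuild bumped the timestamp: the file is re-uploaded.
puts needs_upload?(1024, uploaded_at + 60, remote)
```

Under this model a rebuild that rewrites identical bytes still triggers an upload, because only the timestamp is consulted once the sizes match.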

When to flush a file in Go?

You'll notice that an os.File doesn't have a .Flush() method. It doesn't need one because it isn't buffered: writes to it are direct syscalls that write to the file.

When your program exits (even if it crashes), all files it has open are closed automatically by the operating system, and the file system writes your changes to disk when it gets around to it (sometimes up to a few minutes after your program exits).

Calling os.File.Sync() invokes the fsync() syscall, which forces the file system to flush its buffers to disk. This guarantees that your data is on disk and persistent even if the system is powered down or the operating system crashes.

In most cases you don't need to call .Sync(); call it only when you need that durability guarantee.

Cloud Firestore file syncing

I think you are mixing up the limit for Cloud Firestore (one of the NoSQL dbs offered by Firebase/GCP) and Cloud Storage, the service for objects/file storage.

The maximum size limit for individual objects stored in Cloud Storage is 5 TiB. Of course, uploading such a file from a web or mobile app may not make sense, but that is a practical constraint rather than a limit of the service.

So you can very well upload large files with Cloud Storage and associate with each Cloud Storage file a Firestore document which contains, for example, a download URL. With that you have the basic building blocks of a file sharing system: the storage is based on Cloud Storage, and Firestore contains documents that point to the files and handles the collaborative file-sharing part based on these docs.


To associate the Cloud Storage file and the Firestore doc you can adopt several strategies, depending on your exact use case:

  • Use the Firestore document ID as the file name;
  • Store the Cloud Storage path and/or download URL in the Firestore document;
  • Store the file in a folder named with the Firestore document ID;
  • ...

How does unison decide which way to sync a file

Unison keeps a record of the contents of each path after each successful synchronization of that path (i.e., it remembers the contents at the last moment when they were the same in the two replicas).

We say that a path is updated (in some replica) if its current contents are different from its contents the last time it was successfully synchronized. Note that whether a path is updated has nothing to do with its last modification time—Unison considers only the contents when determining whether an update has occurred. This means that touching a file without changing its contents will not be recognized as an update. A file can even be changed several times and then changed back to its original contents; as long as Unison is only run at the end of this process, no update will be recognized.

In other words, Unison knows that you deleted file X in A because the file was present at the last successful synchronization and is no longer on disk in A. Since X is unchanged in B, Unison knows it should delete it from B.
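The decision can be sketched as a comparison of three fingerprints: the one recorded at the last successful synchronization and the current one in each replica. This is a simplified model, not Unison's implementation; fingerprint and sync_action are hypothetical names, and SHA-256 is just a convenient choice of content hash.

```ruby
require "digest"

# Content fingerprint of a path, or nil if the path does not exist.
def fingerprint(path)
  File.exist?(path) ? Digest::SHA256.file(path).hexdigest : nil
end

# archived: fingerprint recorded at the last successful sync.
# current_a / current_b: fingerprints of the path in each replica now.
def sync_action(archived, current_a, current_b)
  a_updated = current_a != archived
  b_updated = current_b != archived
  if a_updated && b_updated
    current_a == current_b ? :nothing : :conflict
  elsif a_updated
    :propagate_a_to_b
  elsif b_updated
    :propagate_b_to_a
  else
    :nothing
  end
end

# File X was deleted in A (nil) and is unchanged in B:
puts sync_action("abc123", nil, "abc123")   # propagate_a_to_b
```

Because only contents are compared, touching a file leaves its fingerprint unchanged, and a deletion shows up as a change from the archived fingerprint to nil.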


