Copy a File in a Sane, Safe and Efficient Way


Copy a file in a sane way:

#include <fstream>

int main()
{
    std::ifstream src("from.ogv", std::ios::binary);
    std::ofstream dst("to.ogv", std::ios::binary);

    dst << src.rdbuf();
}

This is so simple and intuitive to read that it is worth the extra cost. If we were doing it a lot, it would be better to fall back on OS calls to the file system; Boost's filesystem library has a copy_file function for exactly this (an example appears further down).
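
Since C++17 the standard library offers the same facility directly. A minimal sketch, assuming a C++17 compiler, using std::filesystem::copy_file with the non-throwing error_code overload:

#include <filesystem>
#include <iostream>
#include <system_error>

int main()
{
    std::error_code ec;
    std::filesystem::copy_file("from.ogv", "to.ogv",
                               std::filesystem::copy_options::overwrite_existing, ec);
    if (ec)
        std::cerr << "Copy failed: " << ec.message() << '\n';
}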

There is also a C API for this, copyfile(3), available on macOS:

#include <copyfile.h>

int
copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);
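
A hedged usage sketch, assuming macOS where this API is available: pass NULL for the state argument and COPYFILE_ALL to copy both data and metadata; copyfile() returns a negative value on failure.

#include <copyfile.h>
#include <cstdio>

int main()
{
    if (copyfile("from.ogv", "to.ogv", nullptr, COPYFILE_ALL) < 0) {
        std::perror("copyfile");   // report why the copy failed
        return 1;
    }
}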

How to copy a file from one location to another in a fast way with a C++ program?

Reading one byte at a time is going to waste a lot of time in function calls... use a bigger buffer:

// given std::ifstream in and std::ofstream out, both opened in binary mode
char ch[4096];
while (in) {
    in.read(ch, sizeof(ch));      // read up to a full buffer
    out.write(ch, in.gcount());   // write only the bytes actually read
}

(you may want to add some more error handling, e.g. out may go into a bad state, and the like)

(the most C++-ish way is the rdbuf() approach shown earlier, but it relies on streambuf functionality that a beginner rarely has reason to know, and to me it is also far less instructive)

Force file copy with very low resource requirements

Re-reading this - "Now from time to time i like the program to copy/sync the files to our server. The challenge im facing is that i don't want the copy to use many system resources so it's not slowing the application or the writing of new files."

This means that you're not actually trying to avoid working the box hard; you just don't want to fall behind on your other file work.

I'd create a background thread that does all your disk access, and give it two queues to work on: the first for new files to write, the second for files to copy. The thread can then use the disk at full speed by working on one chunk at a time, with priority given to taking a chunk from the new-file queue. This lets you start a copy, stop copying entirely while there are lots of new files to process (in case there's a burst), and then copy as fast as you can once everything has been processed.
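
A minimal sketch of such a worker, assuming C++11 threads; IoWorker, pushNewFile and pushCopy are illustrative names, and each Job is expected to perform one chunk of disk work:

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

class IoWorker {
public:
    using Job = std::function<void()>;   // one chunk of disk work

    IoWorker() : worker_([this] { run(); }) {}

    ~IoWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();                  // drains remaining jobs, then exits
    }

    void pushNewFile(Job job) { push(newFiles_, std::move(job)); }
    void pushCopy(Job job)    { push(copies_, std::move(job)); }

private:
    void push(std::deque<Job>& q, Job job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    void run() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] {
                    return stop_ || !newFiles_.empty() || !copies_.empty();
                });
                if (stop_ && newFiles_.empty() && copies_.empty())
                    return;
                // Priority: writing new files always beats copying the next chunk.
                auto& q = !newFiles_.empty() ? newFiles_ : copies_;
                job = std::move(q.front());
                q.pop_front();
            }
            job();                       // do the disk work outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Job> newFiles_;           // queue 1: new files to write
    std::deque<Job> copies_;             // queue 2: copy chunks, lower priority
    bool stop_ = false;
    std::thread worker_;                 // declared last so the queues exist before run() starts
};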

C++ most robust way to copy a file

#include <boost/filesystem.hpp>
#include <iostream>

int main()
{
    try
    {
        boost::filesystem::copy_file( "C:\\Users\\Admin\\Desktop\\example file.txt",
                                      "C:\\Users\\Admin\\Desktop\\example copy.txt" );
    }
    catch ( boost::filesystem::filesystem_error const & ex )
    {
        std::cerr << "Copy failed: " << ex.what();
    }
}

This will call arguably the most robust implementation available -- the one provided by the operating system -- and report any failure.


My point being:

The chances of your saved data ending up corrupted are astronomically small to begin with.

Any application where this might actually be an issue should be running on redundant storage, a.k.a. RAID arrays, filesystems doing checksums (like Btrfs, ZFS) etc., again reducing chance of failure significantly.

Doing complex things in home-grown I/O functions, on the other hand, increases the probability of mistakes and / or false negatives immensely.

Use sendfile() to copy file with threads or other efficient copy file method

File copying is not CPU bound; if it were, you would likely find that the limitation is at the kernel level and nothing you can do at the user level would parallelize it.

Such "improvements" done on mechanical drives will in fact degrade the throughput. You're wasting time seeking along the file instead of reading and writing it.

If the file is long and you don't expect to need the read or written data anytime soon, it might be tempting to use the O_DIRECT flag on open. That's a bad idea, since the O_DIRECT API is essentially broken by design.

Instead, you should use posix_fadvise on both source and destination files, with POSIX_FADV_SEQUENTIAL and POSIX_FADV_NOREUSE flags. After the write (or sendfile) call is finished, you need to advise that the data is not needed anymore - pass POSIX_FADV_DONTNEED. That way the page cache will only be used to the extent needed to keep the data flowing, and the pages will be recycled as soon as the data has been consumed (written to disk).

The sendfile call does not push file data through user space, so it further relieves some of the pressure on memory and the processor cache. That's about the only other sensible improvement you can make for copying files that isn't device-specific.

Choosing a sensible chunk size is also desirable. Given that modern drives push over 100 MB/s, you might want to push a megabyte at a time, and always a multiple of the 4096-byte page size; thus 4096*256 bytes (1 MiB) is a decent starting chunk size to handle in a single sendfile or read/write call.
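
A minimal Linux-only sketch tying those pieces together, assuming a kernel recent enough to allow file-to-file sendfile(): posix_fadvise() hints on both descriptors, 1 MiB chunks, and POSIX_FADV_DONTNEED after each chunk; copy_with_sendfile is an illustrative name and error handling is kept to the bare minimum.

#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

static bool copy_with_sendfile(const char* from, const char* to)
{
    int in = open(from, O_RDONLY);
    if (in < 0) return false;

    struct stat st;
    if (fstat(in, &st) < 0) { close(in); return false; }

    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) { close(in); return false; }

    // Tell the kernel we will stream through both files once, sequentially.
    posix_fadvise(in,  0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(in,  0, 0, POSIX_FADV_NOREUSE);
    posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(out, 0, 0, POSIX_FADV_NOREUSE);

    const size_t chunk = 4096 * 256;   // 1 MiB, a multiple of the page size
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out, in, &offset, chunk);   // offset is advanced by sendfile
        if (n <= 0) break;                               // error or unexpected end of file
        // Advise that the pages we just consumed/produced can leave the page cache.
        posix_fadvise(in,  offset - n, n, POSIX_FADV_DONTNEED);
        posix_fadvise(out, offset - n, n, POSIX_FADV_DONTNEED);
    }

    bool ok = (offset == st.st_size);
    close(out);
    close(in);
    return ok;
}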

Read parallelization, as you propose it, only makes sense on RAID 0 volumes, and only when both the input and output files straddle the physical disks. You can then have at most as many threads as the smaller of the number of physical disks straddled by the source and destination files. That's only necessary if you're not using asynchronous file I/O; with async I/O you wouldn't need more than one thread anyway, especially when the chunk sizes are large (a megabyte or more) and the single-thread latency penalty is negligible.

There's no point in parallelizing a single file copy on SSDs either, unless you're on some very odd system indeed.

Best way to copy members from vector<Class> to vector<Member_Type>

std::transform is often used for exactly this:

#include <algorithm>  // std::transform
#include <iterator>   // std::back_inserter
#include <utility>    // std::pair
#include <vector>

// ...

// assuming something like: std::vector<std::pair<int, int>> list;
std::vector<int> x_values;
x_values.reserve(list.size());

std::transform(list.cbegin(), list.cend(), std::back_inserter(x_values),
               [](const auto& pair) { return pair.first; });

