Difference between Directory.EnumerateFiles vs Directory.GetFiles

What is the difference between Directory.EnumerateFiles vs Directory.GetFiles?

From the docs:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.

So basically, EnumerateFiles returns an IEnumerable<string> that is evaluated lazily as you iterate it, whereas GetFiles returns a string[] that has to be fully populated before the call returns.
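To make the lazy/eager distinction concrete, here is a minimal sketch. The scratch directory and file names are made up for the illustration; only Directory.GetFiles, Directory.EnumerateFiles and LINQ's FirstOrDefault come from the framework.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical scratch directory used only for this illustration.
string dir = Path.Combine(Path.GetTempPath(), "EnumerateDemo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
for (int i = 0; i < 5; i++)
    File.WriteAllText(Path.Combine(dir, $"file{i}.txt"), "x");

// GetFiles builds the complete string[] before it returns.
string[] all = Directory.GetFiles(dir);

// EnumerateFiles yields entries as they are requested, so the caller
// can stop after the first match without building the full list.
var firstTxt = Directory.EnumerateFiles(dir, "*.txt").FirstOrDefault();

Console.WriteLine($"GetFiles returned {all.Length} paths");
Console.WriteLine($"First match from EnumerateFiles: {firstTxt}");

// Clean up the scratch directory.
Directory.Delete(dir, recursive: true);
```

With only five files the difference is not measurable; the point of the sketch is the shape of the two calls, not a benchmark.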

What happens with Directory.EnumerateFiles if directory content changes during iteration?

Thanks Michal Komorowski. However, when trying his solution myself, I saw a remarkable distinction between Directory.EnumerateFiles and Directory.GetFiles:

Directory.CreateDirectory(@"c:\MyTest");
// Create files: b, c, e
File.CreateText(@"c:\MyTest\b.txt").Dispose();
File.CreateText(@"c:\MyTest\c.txt").Dispose();
File.CreateText(@"c:\MyTest\e.txt").Dispose();

string[] files = Directory.GetFiles(@"c:\MyTest");
var fileEnumerator = Directory.EnumerateFiles(@"c:\MyTest");

// Delete file c; create files a, d, f
File.Delete(@"c:\MyTest\c.txt");
File.CreateText(@"c:\MyTest\a.txt").Dispose();
File.CreateText(@"c:\MyTest\d.txt").Dispose();
File.CreateText(@"c:\MyTest\f.txt").Dispose();

Console.WriteLine("Result from Directory.GetFiles");
foreach (var file in files) Console.WriteLine(file);
Console.WriteLine("Result from Directory.EnumerateFiles");
foreach (var file in fileEnumerator) Console.WriteLine(file);

This will give different output.

Result from Directory.GetFiles
c:\MyTest\b.txt
c:\MyTest\c.txt
c:\MyTest\e.txt
Result from Directory.EnumerateFiles
c:\MyTest\b.txt
c:\MyTest\d.txt
c:\MyTest\e.txt
c:\MyTest\f.txt

Results:

  • GetFiles still saw the old files: B C E as expected
  • EnumerateFiles saw the new files D and F. It correctly skipped the deleted file C, but it missed the new file A.

So the difference in usage between EnumerateFiles and GetFiles is more than just performance.

  • GetFiles returns the files that were in the folder at the moment you called the function. That is to be expected, because you are just iterating over an already populated string array.
  • EnumerateFiles correctly skips deleted files, but doesn't see all added files. If the folder changes while you are enumerating, the result is effectively undefined.

So if you expect the folder to change while you are enumerating, choose the function carefully:

  • Expect GetFiles to see deleted files
  • Expect EnumerateFiles to miss some of the new files.
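If you go with the GetFiles snapshot, one way to cope with stale entries is to re-check each path right before using it. The sketch below is only an illustration: StillExisting is a hypothetical helper name, and the scratch directory stands in for c:\MyTest.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keep only the snapshot entries that still exist on disk.
static List<string> StillExisting(IEnumerable<string> snapshot)
{
    var result = new List<string>();
    foreach (string path in snapshot)
    {
        if (File.Exists(path)) result.Add(path);
    }
    return result;
}

// Hypothetical scratch directory standing in for c:\MyTest.
string dir = Path.Combine(Path.GetTempPath(), "SnapshotDemo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
foreach (string name in new[] { "b.txt", "c.txt", "e.txt" })
    File.WriteAllText(Path.Combine(dir, name), "x");

// Take the snapshot, then change the folder behind its back.
string[] snapshot = Directory.GetFiles(dir);
File.Delete(Path.Combine(dir, "c.txt"));
File.WriteAllText(Path.Combine(dir, "a.txt"), "x");

// Filter out entries that were deleted after the snapshot was taken.
List<string> usable = StillExisting(snapshot);
usable.Sort(StringComparer.Ordinal);
foreach (string path in usable)
    Console.WriteLine(Path.GetFileName(path)); // b.txt, then e.txt

Directory.Delete(dir, recursive: true);
```

This only narrows the window: a file can still disappear between the File.Exists check and the moment you open it, so the code that actually uses the file should be ready for a FileNotFoundException. Files created after the snapshot (a.txt here) are never seen.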

Using Directory.EnumerateFiles()

The variable files here is an IEnumerable, lazily evaluated. If you hover over files in the debugger and click 'Results view' then the full evaluation will take place (just as if you'd called, say, ToArray()). Otherwise the files will only be fetched as you need them (i.e. one at a time by the foreach loop).

So when you say:

"I see the files (returned enumerable of strings) contains all the entries in my directory"

I think you are mistaken.

C# directory.getfiles memory help

Directory.GetFiles really sucks. If you can use .NET 4.0 you should look into using Directory.EnumerateFiles. From the docs:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
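For the memory side of this, the usual pattern is to consume each path inside the loop and let it go, instead of holding an array with every name. A rough sketch, where TotalBytes is a made-up helper and the scratch directory is only there so the example runs on its own:

```csharp
using System;
using System.IO;

// Hypothetical helper: sum file sizes without keeping a list of all paths.
static long TotalBytes(string directory)
{
    long total = 0;
    // Each path is used once and then dropped; no string[] of every name is built.
    foreach (string path in Directory.EnumerateFiles(directory))
        total += new FileInfo(path).Length;
    return total;
}

// Hypothetical scratch directory with three 10-byte files.
string dir = Path.Combine(Path.GetTempPath(), "MemoryDemo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
for (int i = 0; i < 3; i++)
    File.WriteAllBytes(Path.Combine(dir, $"file{i}.bin"), new byte[10]);

long total = TotalBytes(dir);
Console.WriteLine($"Total bytes: {total}");

Directory.Delete(dir, recursive: true);
```

The saving only matters for directories with a very large number of entries; for a few hundred files either method is fine.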

GetFiles/GetDirectories and EnumerateDirectories/EnumerateFiles Performance penalty on UNC/virtual directory

I feel that iterating over every directory and file is very costly. Try to find out why you need that. Why not get one directory and search the files inside it? I mean, do some analysis with your tech team and business owners so that everyone is realistic.

The only thing that I can think of is using a Parallel.ForEach loop rather than a normal foreach so that you can use all of your CPU cores. It will reduce the cost, but not drastically.
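A minimal sketch of the Parallel.ForEach idea, assuming the per-file work is independent. The scratch directory stands in for the real share, and summing file lengths is just a placeholder for whatever you actually do with each file.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scratch directory standing in for the real UNC path.
string dir = Path.Combine(Path.GetTempPath(), "ParallelDemo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
for (int i = 0; i < 20; i++)
    File.WriteAllBytes(Path.Combine(dir, $"file{i}.bin"), new byte[100]);

long totalBytes = 0;

// Paths are pulled lazily from EnumerateFiles and handed to worker threads.
Parallel.ForEach(Directory.EnumerateFiles(dir), path =>
{
    long length = new FileInfo(path).Length;  // placeholder per-file work
    Interlocked.Add(ref totalBytes, length);  // thread-safe accumulation
});

Console.WriteLine($"Total bytes: {totalBytes}");

Directory.Delete(dir, recursive: true);
```

On a network share the time is probably dominated by round trips rather than CPU, so measure before and after; the gain may be small.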

Directory.GetFiles or Directory.EnumerateFiles locks the files?

It was a simple error in the foreach: I was copying the files into the same folder they came from.

Fixed this way:

foreach (string file in Directory.EnumerateFiles(sourceFolder))
{
    File.Copy(file, Path.Combine(destinationFolder, Path.GetFileName(file)), true);
}

Overly Broad Directory.EnumerateFiles() vs Multiple File.Exists()?

I would go with the second method, a bunch of calls to File.Exists, and do it in a parallelized fashion. Each File.Exists call costs little in throughput but a lot in latency, so for speed you want the calls to run concurrently rather than one after another.

Basic idea to start with:

var fileExists = myFilePaths
    .AsParallel()
    .ToDictionary(path => path, path => File.Exists(path));

(Of course, trying the first method and doing a speed comparison is completely reasonable - I'm just giving my 'best guess' and comments on the second method.)
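For the comparison, the first method could look roughly like this: enumerate the directory once, put the names in a set, and answer each lookup from memory. The directory, the wanted names and the variable names are all invented for the sketch, and it assumes the file system matches names exactly (on a case-insensitive file system you would probably want StringComparer.OrdinalIgnoreCase for the set).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical scratch directory and lookup list.
string dir = Path.Combine(Path.GetTempPath(), "ExistsDemo_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
foreach (string name in new[] { "a.txt", "b.txt", "c.txt" })
    File.WriteAllText(Path.Combine(dir, name), "x");
string[] wanted = { "a.txt", "c.txt", "missing.txt" };

// Method 1: one enumeration of the directory, then in-memory lookups.
var present = new HashSet<string>(StringComparer.Ordinal);
foreach (string path in Directory.EnumerateFiles(dir))
    present.Add(Path.GetFileName(path));
Dictionary<string, bool> viaEnumerate =
    wanted.ToDictionary(name => name, name => present.Contains(name));

// Method 2: one File.Exists call per wanted name.
Dictionary<string, bool> viaExists =
    wanted.ToDictionary(name => name, name => File.Exists(Path.Combine(dir, name)));

foreach (string name in wanted)
    Console.WriteLine($"{name}: enumerate={viaEnumerate[name]}, exists={viaExists[name]}");

Directory.Delete(dir, recursive: true);
```

Wrap each method in a Stopwatch against your real directory to see which one wins; the answer depends on how many files the directory holds compared with how many names you are looking up.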


