How to Calculate the Size of a Folder

What's the best way to calculate the size of a directory in .NET?

I do not believe there is a Win32 API to calculate the space consumed by a directory, although I stand to be corrected on this. If there were, I would assume Explorer would use it. When you open the Properties of a large directory in Explorer, the time it takes to report the folder size grows with the number of files and sub-directories it contains.

Your routine seems fairly neat and simple. Bear in mind that you are calculating the sum of the file lengths, not the actual space consumed on the disk. Slack space at the end of clusters, alternate file streams, etc. are ignored.
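To make the difference between file lengths and allocated space concrete, here is a minimal Python sketch. It assumes a fixed cluster size of 4096 bytes (an illustrative default, not something stated above) and simply rounds every file up to whole clusters. Real file systems can deviate from this estimate (compression, sparse files, very small files stored inside metadata), so treat the second number as a rough approximation only. The function name is made up for this example.

```python
import os


def logical_and_allocated_size(path, cluster_size=4096):
    """Return (sum of file lengths, estimated bytes allocated on disk).

    cluster_size is an assumed value; the real one depends on the volume.
    """
    logical = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                continue  # do not follow symbolic links
            size = os.path.getsize(full_path)
            logical += size
            # Round each file up to a whole number of clusters.
            clusters = (size + cluster_size - 1) // cluster_size
            allocated += clusters * cluster_size
    return logical, allocated
```

For a folder with many small files, the second value can be far larger than the first, which is why Explorer reports "Size" and "Size on disk" separately.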

Calculating folder size / Enumerate filesystem

You must enumerate the folder on a background thread.

Suggestions to improve performance

With the DriveInfo API you can further improve performance when the folder path is a drive. In that case you can skip enumerating the complete drive, which usually takes a while.
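The same shortcut can be sketched in Python with `shutil.disk_usage`, assuming that "the path is a drive" can be approximated by `os.path.ismount`. The function name is hypothetical. Note that the volume-level figure covers everything on the volume (including files you cannot read and file system metadata), so it is not identical to a sum of file lengths.

```python
import os
import shutil


def used_space_if_mount_point(path):
    """Return the used bytes of the whole volume if path is a drive or
    mount root, otherwise None (the caller then has to enumerate)."""
    if os.path.ismount(path):
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        return usage.used
    return None
```

A caller would try this first and fall back to a recursive enumeration only when it returns `None`.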

Furthermore, your current implementation aborts the calculation as soon as the enumeration throws an UnauthorizedAccessException. You don't want that; you want the algorithm to skip forbidden filesystem paths and continue.

The following two examples show a fixed and improved version of your implementation.

The first solution targets the modern .NET Standard 2.1 compliant .NET versions.

The second solution targets the old .NET Framework.

.NET Standard 2.1 (.NET Core 3.0, .NET 5)

When using a .NET version compatible with .NET Standard 2.1, such as .NET Core 3.0 or .NET 5, you can eliminate the exception handling. Passing EnumerationOptions as an argument lets the API skip inaccessible directories, which significantly improves performance (no more UnauthorizedAccessException being thrown) and readability:

// Requires: using System; using System.IO; using System.Linq; using System.Threading.Tasks;
// Note: async methods cannot declare 'out' parameters, so the result is returned as a tuple.
internal static async Task<(bool IsSuccessful, long SpaceUsedInBytes)> TryGetDirectorySize(string directoryPath)
{
    var drives = DriveInfo.GetDrives();
    DriveInfo targetDrive = drives.FirstOrDefault(drive => drive.Name.Equals(directoryPath, StringComparison.OrdinalIgnoreCase));

    // Directory is a drive: skip the expensive enumeration of complete drive.
    if (targetDrive != null)
    {
        return (true, targetDrive.TotalSize - targetDrive.TotalFreeSpace);
    }

    if (!Directory.Exists(directoryPath))
    {
        return (false, -1);
    }

    // Consider making this local variable a private property.
    // IgnoreInaccessible defaults to true, so inaccessible directories are skipped.
    // By default AttributesToSkip also excludes hidden and system files;
    // set AttributesToSkip = 0 if those should be counted as well.
    var enumerationOptions = new EnumerationOptions { RecurseSubdirectories = true };

    var targetFolderInfo = new DirectoryInfo(directoryPath);
    long spaceUsedInBytes = await Task.Run(
        () => targetFolderInfo.EnumerateFiles("*", enumerationOptions)
            .Sum(fileInfo => fileInfo.Length));

    return (true, spaceUsedInBytes);
}


.NET Framework

A .NET Framework compliant version. It fixes the issue with your original code where the enumeration is aborted as soon as an UnauthorizedAccessException is thrown. This version continues to enumerate all remaining directories using recursion:

// Requires: using System; using System.IO; using System.Linq; using System.Threading.Tasks;
internal static async Task<long> GetDirectorySize(string directoryPath)
{
    long spaceUsedInBytes = -1;
    var drives = DriveInfo.GetDrives();
    DriveInfo targetDrive = drives.FirstOrDefault(drive => drive.Name.Equals(directoryPath, StringComparison.OrdinalIgnoreCase));

    // Directory is a drive: skip enumeration of complete drive.
    if (targetDrive != null)
    {
        spaceUsedInBytes = targetDrive.TotalSize - targetDrive.TotalFreeSpace;
        return spaceUsedInBytes;
    }

    var targetDirectoryInfo = new DirectoryInfo(directoryPath);
    spaceUsedInBytes = await Task.Run(() => SumDirectorySize(targetDirectoryInfo));
    return spaceUsedInBytes;
}

private static long SumDirectorySize(DirectoryInfo parentDirectoryInfo)
{
    long spaceUsedInBytes = 0;
    try
    {
        spaceUsedInBytes = parentDirectoryInfo.EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            .Sum(fileInfo => fileInfo.Length);
    }
    catch (UnauthorizedAccessException)
    {
        // Inaccessible directory: contribute nothing and let the caller continue.
        return 0;
    }

    foreach (var subdirectoryInfo in parentDirectoryInfo.EnumerateDirectories("*", SearchOption.TopDirectoryOnly))
    {
        spaceUsedInBytes += SumDirectorySize(subdirectoryInfo);
    }

    return spaceUsedInBytes;
}
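
For comparison, the skip-and-continue idea can be sketched in Python with `os.scandir`. This is only an illustration of the technique, not a port of the C# code: it catches `OSError` in general (which includes `PermissionError`), and it skips symbolic links, which the C# version does not do. The function name is an assumption made for this example.

```python
import os


def directory_size(path):
    """Sum file sizes below path, skipping entries that cannot be read."""
    total = 0
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_symlink():
                        continue  # avoid double counting and link cycles
                    if entry.is_dir(follow_symlinks=False):
                        total += directory_size(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    continue  # entry vanished or is inaccessible: skip it
    except OSError:
        return total  # the directory itself cannot be listed
    return total
```

An unreadable directory contributes 0 bytes here instead of aborting the whole calculation, which matches the behavior described above.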


How to instantiate a type that needs to run async operations on construction

FolderModel.cs

class FolderModel
{
    // Make the constructor private to force instantiation through the factory method
    private FolderModel(string folderPath)
    {
        // Do non-async initialization
    }

    // Async factory method: it takes the same parameters as the constructor
    public static async Task<FolderModel> CreateAsync(string folderPath)
    {
        var instance = new FolderModel(folderPath);
        await instance.InitializeAsync(folderPath);
        return instance;
    }

    // Define member as protected virtual to allow derived classes to add initialization routines
    protected virtual async Task InitializeAsync(string directoryPath)
    {
        // Consider throwing an exception here ONLY if the folder path is generated programmatically.
        // If the folder path comes from user input, use input validation
        // or, even better, a folder picker dialog
        // to ensure that the provided path is always valid!
        if (!Directory.Exists(directoryPath))
        {
            throw new DirectoryNotFoundException($"Invalid directory path '{directoryPath}'.");
        }

        long folderSize = await GetDirectorySize(directoryPath);

        // TODO: Do something with the 'folderSize' value
        // and execute other async code if necessary
    }
}

Usage

// Example: create an instance of FolderModel
private async Task SomeMethod()
{
    // Always await async methods (methods that return a Task).
    // Call static CreateAsync method instead of the constructor.
    FolderModel folderModel = await FolderModel.CreateAsync(@"C:\");
}

In a more advanced scenario you may want to defer the initialization, for example to avoid allocating expensive resources that are not needed yet (or may never be needed). You can either make the instance call InitializeAsync when a member that depends on these resources is first accessed, or make the constructor and the InitializeAsync method public so that the user of the class can call InitializeAsync explicitly.
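
The two variants (eager async factory and deferred initialization on first access) can be sketched side by side in Python with asyncio. This is a hypothetical counterpart for illustration only: the class and method names mirror the C# example but are assumptions, and the sketch does not guard against two callers triggering the first computation concurrently.

```python
import asyncio
import os


class FolderModel:
    """Hypothetical Python counterpart of the C# FolderModel."""

    def __init__(self, folder_path):
        self.folder_path = folder_path
        self._size = None  # computed on first request

    @classmethod
    async def create(cls, folder_path):
        """Eager variant: async factory returning an initialized instance."""
        instance = cls(folder_path)
        await instance.get_size()
        return instance

    async def get_size(self):
        """Lazy variant: compute the size once, off the event loop."""
        if self._size is None:
            loop = asyncio.get_running_loop()
            self._size = await loop.run_in_executor(None, self._sum_file_sizes)
        return self._size

    def _sum_file_sizes(self):
        total = 0
        for dirpath, _dirnames, filenames in os.walk(self.folder_path):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                if not os.path.islink(full_path):
                    total += os.path.getsize(full_path)
        return total
```

Constructing `FolderModel(path)` directly costs nothing until `get_size()` is awaited, while `await FolderModel.create(path)` pays the enumeration cost up front.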

Get Folder Size from Windows Command Line

You can just add up sizes recursively (the following is a batch file):

@echo off
set size=0
for /r %%x in (folder\*) do set /a size+=%%~zx
echo %size% Bytes

However, this has several problems. cmd is limited to 32-bit signed integer arithmetic, so it will get sizes above 2 GiB wrong [1]. Furthermore, it will likely count symlinks and junctions multiple times, so it's at best an upper bound, not the true size (you'll have that problem with any tool, though).
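
To see why a 32-bit signed total goes wrong, here is a small Python sketch that emulates the addition, assuming two's-complement wraparound on overflow. The helper name is made up for this illustration.

```python
def add_int32(a, b):
    """Add two integers with 32-bit signed (two's-complement) wraparound."""
    result = (a + b) & 0xFFFFFFFF
    return result - 0x100000000 if result >= 0x80000000 else result


# A running total of 2,000,000,000 bytes plus a 500,000,000 byte file
# no longer fits into 31 bits and wraps around to a negative number.
print(add_int32(2_000_000_000, 500_000_000))  # -1794967296
```

The largest representable value is 2,147,483,647, which is just under 2 GiB, so any folder above that size cannot be summed correctly this way.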

An alternative is PowerShell:

Get-ChildItem -Recurse | Measure-Object -Sum Length

or shorter:

ls -r | measure -sum Length

If you want it prettier:

switch((ls -r|measure -sum Length).Sum) {
    {$_ -gt 1GB} {
        '{0:0.0} GiB' -f ($_/1GB)
        break
    }
    {$_ -gt 1MB} {
        '{0:0.0} MiB' -f ($_/1MB)
        break
    }
    {$_ -gt 1KB} {
        '{0:0.0} KiB' -f ($_/1KB)
        break
    }
    default { "$_ bytes" }
}
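
The same unit selection can be written as a short Python function. This is a sketch that mirrors the thresholds of the PowerShell switch (strictly greater than 1 KiB, 1 MiB, 1 GiB); the function name is an assumption, and rounding details may differ slightly from PowerShell's number formatting.

```python
def format_size(num_bytes):
    """Format a byte count with binary units (KiB, MiB, GiB)."""
    for unit, factor in (("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10)):
        if num_bytes > factor:
            return f"{num_bytes / factor:.1f} {unit}"
    return f"{num_bytes} bytes"


print(format_size(1536))         # 1.5 KiB
print(format_size(5 * 2 ** 30))  # 5.0 GiB
```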

You can use this directly from cmd:

powershell -noprofile -command "ls -r|measure -sum Length"

[1] I do have a partially-finished bignum library in batch files somewhere which at least gets arbitrary-precision integer addition right. I should really release it, I guess :-)

What is the fastest way to calculate a Windows folders size?

There is no simple way to do this in .NET; you will have to loop through every file and subdirectory.
See the examples here to see how it's done.

Directory file size calculation - how to make it faster?

I fiddled with it for a while, trying to parallelize it, and surprisingly it sped up on my machine (up to 3 times on a quad-core). I don't know whether that holds in all cases, but give it a try...

.NET 4.0 code (or use 3.5 with the Task Parallel Library):

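// Requires: using System.IO; using System.Threading; using System.Threading.Tasks;
private static long DirSize(string sourceDir, bool recurse)
{
    long size = 0;
    string[] fileEntries = Directory.GetFiles(sourceDir);

    foreach (string fileName in fileEntries)
    {
        Interlocked.Add(ref size, (new FileInfo(fileName)).Length);
    }

    if (recurse)
    {
        string[] subdirEntries = Directory.GetDirectories(sourceDir);

        // The thread-local subtotal must be a long to hold sizes above 2 GiB.
        Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
        {
            // Skip reparse points (symlinks, junctions) to avoid counting targets twice.
            if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                subtotal += DirSize(subdirEntries[i], true);
            }
            // Always return the running subtotal; returning 0 here would
            // discard the sizes already accumulated on this thread.
            return subtotal;
        },
        (x) => Interlocked.Add(ref size, x));
    }
    return size;
}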
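
A comparable idea in Python is to size each top-level subdirectory on a worker thread. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; the function names and the `max_workers` default are assumptions for illustration. It skips symbolic links, which is not the same as skipping every Windows reparse point (junctions are not covered), and whether threads actually help depends on the storage and the OS, so measure before relying on it.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def tree_size(path):
    """Sequentially sum file sizes below path, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def parallel_dir_size(path, max_workers=4):
    """Sum top-level files directly, then size each subdirectory on a thread."""
    files_total = 0
    subdirs = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                files_total += entry.stat(follow_symlinks=False).st_size
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return files_total + sum(pool.map(tree_size, subdirs))
```

Each worker returns its own subtotal and the totals are combined once at the end, so no shared counter needs locking.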

How to calculate size of a folder located on different machine(Remote) in sql server 2008?

Not tested, but it looks like your path is being resolved locally rather than on your remote machine.

set @path = '\\ewp-dev18\\c$\\Attachments\\' +  CONVERT(varchar(50),@tenantId) 

should be

set @path = '\\\\ewp-dev18\\c$\\Attachments\\' +  CONVERT(varchar(50),@tenantId)  

You should also include the error message you're getting.

How to calculate a Directory size in ADLS using PySpark?

dbutils.fs.ls has no recursive option like cp, mv or rm do, so you need to iterate yourself. Here is a snippet that will do the task for you. Run the code from a Databricks notebook.

from dbutils import FileInfo
from typing import List

root_path = "/mnt/datalake/.../XYZ"

def discover_size(path: str, verbose: bool = True):
    def loop_path(paths: List[FileInfo], accum_size: float):
        if not paths:
            return accum_size
        else:
            head, tail = paths[0], paths[1:]
            # Entries with size > 0 are treated as files; everything else
            # (directories, but also empty files) is listed like a directory.
            if head.size > 0:
                if verbose:
                    print(f"{head.path}: {head.size / 1e6} MB")
                accum_size += head.size / 1e6
                return loop_path(tail, accum_size)
            else:
                extended_tail = dbutils.fs.ls(head.path) + tail
                return loop_path(extended_tail, accum_size)

    return loop_path(dbutils.fs.ls(path), 0.0)

discover_size(root_path, verbose=True)  # Total size in megabytes at the end
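
Because the snippet above recurses once per entry, very large trees can run into Python's recursion limit. A possible alternative is an explicit stack, sketched here under two assumptions that are not taken from the answer: the listing function is passed in as `ls` (in a notebook you would pass `dbutils.fs.ls`), and directory paths end with "/". The `Entry` tuple is only a stand-in so the sketch can run outside Databricks.

```python
from collections import namedtuple

# Stand-in for the entries returned by a listing call (hypothetical).
Entry = namedtuple("Entry", ["path", "size"])


def total_size_bytes(root, ls):
    """Sum sizes below root using an explicit stack instead of recursion.

    ls is any callable mapping a path to a list of entries that have
    .path and .size attributes.
    """
    total = 0
    pending = [root]
    while pending:
        current = pending.pop()
        for entry in ls(current):
            if entry.path.endswith("/"):  # assumption: directories end with "/"
                pending.append(entry.path)
            else:
                total += entry.size
    return total
```

With this shape, empty files are simply counted as 0 bytes instead of being listed again as if they were directories.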

If the location is mounted in DBFS, you could use the du -h approach (I have not tested it). In a notebook, create a new cell with:

%sh
du -h /mnt/datalake/.../XYZ

What’s the best way to calculate the size of a directory in VB .NET?

Though this answer is talking about Python, the concept applies here as well.

Windows Explorer uses system API calls FindFirstFile and FindNextFile recursively to pull file information, and then can access the file sizes very quickly through the data that's passed back via a struct, WIN32_FIND_DATA: http://msdn.microsoft.com/en-us/library/aa365740(v=VS.85).aspx.

My suggestion would be to implement these API calls using P/Invoke, and I believe you will experience significant performance gains.

Calculating a directory's size using Python?

This walks all sub-directories, summing file sizes:

import os

def get_size(start_path='.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            # skip if it is a symbolic link
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)

    return total_size

print(get_size(), 'bytes')

And a one-liner for fun using os.listdir (does not include sub-directories):

import os
sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))

Reference:

  • os.path.getsize - Gives the size in bytes
  • os.walk
  • os.path.islink

Updated to use os.path.getsize, which is clearer than using the os.stat().st_size method.

Thanks to ghostdog74 for pointing this out!

os.stat - st_size Gives the size in bytes. Can also be used to get file size and other file related information.

import os

nbytes = sum(d.stat().st_size for d in os.scandir('.') if d.is_file())

Update 2018

If you use Python 3.4 or earlier, you may consider using the more efficient walk method provided by the third-party scandir package. In Python 3.5 and later, this package has been incorporated into the standard library and os.walk has received the corresponding increase in performance.

Update 2019

Recently I've been using pathlib more and more, here's a pathlib solution:

from pathlib import Path

root_directory = Path('.')
sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file())


