Files.walk(), calculate total size
No, this exception cannot be avoided.
The exception itself occurs inside the lazy fetch of Files.walk(), which is why you do not see it early and why there is no way to circumvent it. Consider the following code:
long size = Files.walk(Paths.get("C://"))
        .peek(System.out::println)
        .mapToLong(this::count)
        .sum();
On my system this prints:
C:\
C:\$Recycle.Bin
Exception in thread "main" java.io.UncheckedIOException: java.nio.file.AccessDeniedException: C:\$Recycle.Bin\S-1-5-18
Because the exception is thrown on the (main) thread at the third entry, all further execution on that thread stops.
I believe this is a design failure: as it stands, Files.walk is absolutely unusable, because you can never guarantee that there will be no errors when walking over a directory.
One important point to notice is that the stack trace includes sum() and reduce() operations. This is because the paths are loaded lazily: at the point of reduce(), the bulk of the stream machinery gets called (visible in the stack trace), and only then is the path fetched, at which point the UncheckedIOException occurs.
It could possibly be circumvented by letting every walking operation execute on its own thread, but that is not something you would want to do anyway.
Also, checking beforehand whether a file is accessible is of limited use, because you cannot guarantee that it is still readable even 1 ms later.
Future extension
I believe it can still be fixed, though I do not know exactly how FileVisitOptions work.
Currently there is a FileVisitOption.FOLLOW_LINKS; if it operates on a per-file basis, then I suspect that a FileVisitOption.IGNORE_ON_IOEXCEPTION could also be added. However, we cannot correctly inject that functionality ourselves.
How to calculate size of immediate subfolders of a folder using os.walk()
I finally got this working:
import os
from pathlib import Path

root = '/dbfs/mnt/datalake/.../'
for f in Path(root).iterdir():
    if f.is_dir():
        size = 0  # reset for each immediate subfolder
        for path, subdirs, files in os.walk(f):
            for name in files:
                size += os.path.getsize(os.path.join(path, name))
        dirSize = size / 1048576  # bytes to MiB
        print(f, "--Size:", dirSize)
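The same loop can be wrapped in a function that returns the sizes instead of printing them, which makes it easier to reuse. This is only a minimal sketch: the name `immediate_subfolder_sizes` is hypothetical (it does not appear in the answer above), sizes are returned in bytes rather than MiB, and unreadable files are not handled.

```python
import os
from pathlib import Path


def immediate_subfolder_sizes(root):
    """Return {subfolder name: total size in bytes} for each direct child directory of root."""
    sizes = {}
    for child in Path(root).iterdir():
        if not child.is_dir():
            continue  # files directly inside root are not reported
        total = 0
        # Walk the whole tree below this immediate subfolder
        for path, _subdirs, files in os.walk(child):
            for name in files:
                total += os.path.getsize(os.path.join(path, name))
        sizes[child.name] = total
    return sizes
```

Dividing each value by 1048576 gives the same MiB figure that the loop above prints.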
Get size of folder or file
java.io.File file = new java.io.File("myfile.txt");
file.length();
This returns the length of the file in bytes, or 0 if the file does not exist. There is no built-in way to get the size of a folder. You have to walk the directory tree recursively (using the listFiles() method of a File object that represents a directory) and accumulate the directory size yourself:
public static long folderSize(File directory) {
    long length = 0;
    for (File file : directory.listFiles()) {
        if (file.isFile())
            length += file.length();
        else
            length += folderSize(file);
    }
    return length;
}
WARNING: This method is not sufficiently robust for production use. directory.listFiles() may return null and cause a NullPointerException. Also, it doesn't consider symlinks and possibly has other failure modes. Use a more robust method instead.
Calculating a directory's size using Python?
This walks all sub-directories, summing file sizes:
import os

def get_size(start_path = '.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            # skip if it is symbolic link
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)
    return total_size

print(get_size(), 'bytes')
And a one-liner for fun using os.listdir (does not include sub-directories):
import os
sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))
Reference:
- os.path.getsize - Gives the size in bytes
- os.walk
- os.path.islink
Updated to use os.path.getsize, which is clearer than using the os.stat().st_size method.
Thanks to ghostdog74 for pointing this out!
os.stat - st_size gives the size in bytes. It can also be used to get file size and other file-related information.
import os
nbytes = sum(d.stat().st_size for d in os.scandir('.') if d.is_file())
Update 2018
If you use Python 3.4 or earlier, you may consider using the more efficient walk method provided by the third-party scandir package. In Python 3.5 and later, this package has been incorporated into the standard library and os.walk has received the corresponding increase in performance.
Update 2019
Recently I've been using pathlib more and more. Here's a pathlib solution:
from pathlib import Path
root_directory = Path('.')
sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file())
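The snippets above raise as soon as one file cannot be read (for example, a file deleted mid-walk or a permission problem), while os.walk by default silently ignores directories it cannot list. A tolerant variant might look like the sketch below. The name `get_size_tolerant` and the choice to return the skipped paths alongside the total are assumptions made for illustration, not part of the original answer.

```python
import os


def get_size_tolerant(start_path="."):
    """Sum file sizes under start_path, skipping entries that cannot be read.

    Returns (total_bytes, skipped), where skipped lists the paths that raised OSError.
    """
    total_size = 0
    skipped = []

    def on_walk_error(err):
        # os.walk calls this with the OSError when it cannot list a directory
        skipped.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(start_path, onerror=on_walk_error):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue  # skip symbolic links, as in get_size above
            try:
                total_size += os.path.getsize(fp)
            except OSError:
                # the file vanished or is unreadable; record it and keep going
                skipped.append(fp)
    return total_size, skipped
```

Returning the skipped paths lets the caller decide whether a partial total is acceptable instead of losing the whole result to one failure.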
Using os.walk to find total size of FTP server
Use this function to fetch the size of a directory using an FTP client.
import ftplib

def get_size_of_directory(ftp, directory):
    size = 0
    for name in ftp.nlst(directory):
        try:
            # cwd succeeds only for directories, so recurse into them
            ftp.cwd(name)
            size += get_size_of_directory(ftp, name)
        except ftplib.error_perm:
            # not a directory: switch to binary mode before asking for the size
            ftp.voidcmd('TYPE I')
            size += ftp.size(name)
    return size
The function calls itself recursively for each directory it finds inside the given directory.
Hope this helps!
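If the server supports the MLSD command, the directory test via cwd can be avoided, because each listing entry already carries its type and size. The sketch below is an alternative technique, not the method from the answer above: the name `get_size_with_mlsd` is hypothetical, it assumes the server implements MLSD and reports the "type" and "size" facts, and it relies on `ftplib.FTP.mlsd`, which is available in Python 3.3 and later.

```python
import posixpath


def get_size_with_mlsd(ftp, directory):
    """Recursively sum file sizes under directory using MLSD listings."""
    total = 0
    for name, facts in ftp.mlsd(directory, facts=["type", "size"]):
        entry_type = facts.get("type")
        path = posixpath.join(directory, name)
        if entry_type == "dir":
            total += get_size_with_mlsd(ftp, path)
        elif entry_type == "file":
            # fact values arrive as strings
            total += int(facts.get("size", 0))
        # "cdir" and "pdir" (the directory itself and its parent) are ignored
    return total
```

Here `ftp` is expected to be a connected and logged-in `ftplib.FTP` instance; on servers without MLSD the call fails and the nlst-based function above remains the fallback.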
very quickly getting total size of folder
You are at a disadvantage.
Windows Explorer almost certainly uses FindFirstFile/FindNextFile to both traverse the directory structure and collect size information (through lpFindFileData) in one pass, making what is essentially a single system call per file.
Python is unfortunately not your friend in this case. Thus,
1. os.walk first calls os.listdir (which internally calls FindFirstFile/FindNextFile)
   - any additional system calls made from this point onward can only make you slower than Windows Explorer
2. os.walk then calls isdir for each file returned by os.listdir (which internally calls GetFileAttributesEx -- or, prior to Win2k, a GetFileAttributes+FindFirstFile combo) to redetermine whether to recurse or not
3. os.walk and os.listdir will perform additional memory allocation, string and array operations etc. to fill out their return value
4. you then call getsize for each file returned by os.walk (which again calls GetFileAttributesEx)
That is 3x more system calls per file than Windows Explorer, plus memory allocation and manipulation overhead.
You can either use Anurag's solution, or try to call FindFirstFile/FindNextFile directly and recursively (which should be comparable to the performance of a cygwin or other win32 port of du -s some_directory).
Refer to os.py for the implementation of os.walk, posixmodule.c for the implementation of listdir and win32_stat (invoked by both isdir and getsize).
Note that Python's os.walk is suboptimal on all platforms (Windows and *nices), up to and including Python 3.1. On both Windows and *nices os.walk could achieve traversal in a single pass without calling isdir, since both FindFirst/FindNext (Windows) and opendir/readdir (*nix) already return the file type via lpFindFileData->dwFileAttributes (Windows) and dirent::d_type (*nix).
Perhaps counterintuitively, on most modern configurations (e.g. Win7 and NTFS, and even some SMB implementations) GetFileAttributesEx is twice as slow as FindFirstFile of a single file (possibly even slower than iterating over a directory with FindNextFile).
Update: Python 3.5 includes the new PEP 471 os.scandir() function that solves this problem by returning file attributes along with the filename. This new function is used to speed up the built-in os.walk() (on both Windows and Linux). You can use the scandir module on PyPI to get this behavior for older Python versions, including 2.x.
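To illustrate the point of the update, a directory total can be computed directly on top of os.scandir(), so the file type comes from the directory listing rather than from a separate isdir call. This is a minimal sketch: the name `scandir_size` is hypothetical, it assumes Python 3.6 or later (for the `with` form of os.scandir), and it does no error handling. According to the Python documentation, DirEntry.stat() needs no extra system call for non-symlink entries on Windows, while on Unix it still performs one.

```python
import os


def scandir_size(path="."):
    """Recursively sum file sizes under path using os.scandir()."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir/is_file usually reuse the type returned by the directory listing
            if entry.is_dir(follow_symlinks=False):
                total += scandir_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Symbolic links are neither followed nor counted here, which avoids cycles and double counting.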
Avoid Java 8 Files.walk(..) termination cause of ( java.nio.file.AccessDeniedException )
Answer
Here is a temporary solution, which can be improved to use Java 8 streams and lambdas.
int[] count = {0};
try {
    Files.walkFileTree(
            Paths.get(dir.getPath()),
            new HashSet<FileVisitOption>(Arrays.asList(FileVisitOption.FOLLOW_LINKS)),
            Integer.MAX_VALUE, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    System.out.printf("Visiting file %s\n", file);
                    ++count[0];
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException e)
                        throws IOException {
                    System.err.printf("Visiting failed for %s\n", file);
                    return FileVisitResult.SKIP_SUBTREE;
                }

                @Override
                public FileVisitResult preVisitDirectory(Path dir,
                                                         BasicFileAttributes attrs)
                        throws IOException {
                    System.out.printf("About to visit directory %s\n", dir);
                    return FileVisitResult.CONTINUE;
                }
            });
} catch (IOException e) {
    // handle exception
}
How to get directory total size?
Using a global like that is bad practice at best.
It's also a race if DirSizeMB is called concurrently.
The simple solution is to use a closure, e.g.:
func DirSize(path string) (int64, error) {
    var size int64
    err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.IsDir() {
            size += info.Size()
        }
        return err
    })
    return size, err
}
You could assign the closure to a variable if you think that looks better.