How to Do a Recursive Sub-Folder Search and Return Files in a List

Python recursive folder read

Make sure you understand the three values that os.walk yields on each iteration:

for root, subdirs, files in os.walk(rootdir):

has the following meaning:

  • root: Current path which is "walked through"
  • subdirs: Files in root of type directory
  • files: Files in root (not in subdirs) of type other than directory

And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join with the folder currently being "walked", not the topmost folder. So that must be filePath = os.path.join(root, file). BTW, "file" is a builtin in Python 2, so you don't normally use it as a variable name.
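As a minimal sketch of the point above, the following collects every file below a starting folder into a list by joining each file name with the folder currently being walked. The function name list_files is made up for this illustration and is not part of the original question.

```python
import os


def list_files(rootdir):
    """Return the full path of every file below rootdir, recursively.

    list_files is a hypothetical helper name used only for this sketch.
    """
    paths = []
    for root, subdirs, filenames in os.walk(rootdir):
        for filename in filenames:
            # Join with root (the folder being walked), not with rootdir.
            paths.append(os.path.join(root, filename))
    return paths
```

Calling list_files('.') would return the files in the current directory and in all of its sub-folders; the order follows whatever order os.walk visits them in.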

Another problem is your loops, which should look like this, for example:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

If you didn't know, the with statement for files is a shorthand:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()

Recursively read files from sub-folders into a list and merge each sub-folder's files into one csv per sub-folder

The issue is most probably that the main directory, Folder (or /dir according to your code), does not contain any files itself, so file_list is empty and hence df_list is also empty. When you pass an empty list into pd.concat(), you get that error. Example -

In [5]: pd.concat([])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython> in <module>()
----> 1 pd.concat([])

/path/to/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
752 keys=keys, levels=levels, names=names,
753 verify_integrity=verify_integrity,
--> 754 copy=copy)
755 return op.get_result()
756

/path/to/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
797
798 if len(objs) == 0:
--> 799 raise ValueError('All objects passed were None')
800
801 # consolidate data & figure out what our result ndim is going to be

ValueError: All objects passed were None

I would suggest checking that the entries you are reading are really files, that they end with .csv, and that df_list is not empty before you pass it into pd.concat(). I would also suggest using os.path.join(), rather than concatenating strings, to create paths. Example -

import pandas as pd
import os.path
import os

working_dir = "/dir/"

for root, dirs, files in os.walk(working_dir):
    file_list = []
    for filename in files:
        if filename.endswith('.csv'):
            file_list.append(os.path.join(root, filename))
    df_list = [pd.read_table(file) for file in file_list]
    if df_list:
        final_df = pd.concat(df_list)
        final_df.to_csv(os.path.join(root, "Final.csv"))

EDIT:

As you say -

Also the output is adding another column that looks to be an id column.

The new column that comes in is most probably the index of the DataFrames.

When calling DataFrame.to_csv(), if you do not want the index of the DataFrame to be written to the csv, pass the index keyword argument as False. Example -

final_df.to_csv(os.path.join(root, "Final.csv"), index=False)

Getting a list of all subdirectories in the current directory

Do you mean immediate subdirectories, or every directory right down the tree?

Either way, you could use os.walk to do this:

os.walk(directory)

will yield a tuple for each subdirectory. The first entry in the 3-tuple is a directory name, so

[x[0] for x in os.walk(directory)]

should give you all of the subdirectories, recursively.

Note that the second entry in the tuple is the list of child directories of the entry in the first position, so you could use this instead, but it's not likely to save you much.

However, you could use it just to give you the immediate child directories:

next(os.walk('.'))[1]

Or see the other solutions already posted, using os.listdir and os.path.isdir, including those at "How to get all of the immediate subdirectories in Python".
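For reference, the os.listdir and os.path.isdir approach mentioned above can be sketched roughly as follows. The function name immediate_subdirs is an assumption made for this example, and the result is sorted only to make the output predictable.

```python
import os


def immediate_subdirs(directory="."):
    """Return the names of the directories directly inside `directory`.

    immediate_subdirs is a hypothetical helper name used for this sketch.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        # os.listdir returns bare names, so join before testing the type.
        if os.path.isdir(os.path.join(directory, name))
    )
```

Unlike [x[0] for x in os.walk(directory)], this does not descend into the tree, so it returns only the first level of subdirectories.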

Python list directory, subdirectory, and files

Use os.path.join to concatenate the directory and file name:

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

Note the usage of path and not root in the concatenation, since using root would be incorrect.


In Python 3.4, the pathlib module was added for easier path manipulations. So the equivalent to os.path.join would be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant you can also do actual OS calls through them, like changing into a directory, deleting the path, opening the file it points to and much more.
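A small sketch of how the concrete Path variant might be combined with os.walk, assuming you want Path objects instead of plain strings. The generator name walk_paths is invented for this example.

```python
import os
from pathlib import Path


def walk_paths(root):
    """Yield a pathlib.Path for every file below root.

    walk_paths is a hypothetical name; the loop is the same os.walk
    loop as above, with Path(path, name) replacing os.path.join.
    """
    for path, subdirs, files in os.walk(root):
        for name in files:
            yield Path(path, name)
```

Each yielded object then offers attributes and methods such as p.name, p.suffix, p.parent, p.stat() and p.read_text(), so the caller does not need separate os.path calls.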

How can I recursively find all files in current and subfolders based on wildcard matching?

Use find:

find . -name "foo*"

find needs a starting point, so the . (dot) points to the current directory.

How to use glob() to find files recursively?

pathlib.Path.rglob

Use pathlib.Path.rglob from the pathlib module, which was introduced in Python 3.4.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

If you don't want to use pathlib, you can use glob.glob('**/*.c', recursive=True) instead. Don't forget the recursive keyword argument, and note that it can take an inordinate amount of time on large directories.
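A minimal sketch of the glob variant, assuming Python 3.5 or later (where the recursive argument exists). The function name find_c_files and the top parameter are made up for this example.

```python
import glob
import os


def find_c_files(top):
    """Return all *.c files in top and its subdirectories.

    find_c_files is a hypothetical helper name. With recursive=True,
    the "**" component matches zero or more directory levels, so files
    directly inside top are included as well.
    """
    pattern = os.path.join(top, "**", "*.c")
    return glob.glob(pattern, recursive=True)
```

Without recursive=True, "**" behaves like a plain "*" and matches only one directory level, which is the usual cause of missing results.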

For cases where you need to match files beginning with a dot (.), such as files in the current directory or hidden files on Unix-based systems, use the os.walk solution below.

os.walk

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

How to list all files of a directory recursively in c

I can't for the life of me think why anybody would want to enumerate directories by calling main() recursively. But, since I can't resist a pointless challenge, here's a version that does. Do I get the prize for "most fruitless waste of ten minutes?" ;)

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    const char *path;
    if (argc != 2)
        path = "/etc"; /* Set starting directory, if not passed */
    else
        path = argv[1];

    DIR *dir = opendir (path);
    if (dir)
    {
        struct dirent *dp;
        while ((dp = readdir(dir)) != NULL)
        {
            /* Skip ".", ".." and any other entry starting with a dot */
            if (dp->d_name[0] != '.')
            {
                /* +2 for the "/" separator and the terminating NUL */
                char *fullpath = malloc (strlen (path) + strlen (dp->d_name) + 2);
                strcpy (fullpath, path);
                strcat (fullpath, "/");
                strcat (fullpath, dp->d_name);
                /* d_type is not filled in by every filesystem */
                if (dp->d_type == DT_DIR)
                {
                    char **new_argv = malloc (2 * sizeof (char *));
                    new_argv[0] = argv[0];
                    new_argv[1] = fullpath;
                    main (2, new_argv);
                    free (new_argv);
                }
                else
                    printf ("%s\n", fullpath);
                free (fullpath);
            }
        }
        closedir(dir);
    }
    else
        fprintf (stderr, "Can't open dir %s: %s\n", path, strerror (errno));
    return 0;
}

How to recursively list all the files in a directory in C#?

This article covers all you need. Except as opposed to searching the files and comparing names, just print out the names.

It can be modified like so:

static void DirSearch(string sDir)
{
    try
    {
        foreach (string d in Directory.GetDirectories(sDir))
        {
            foreach (string f in Directory.GetFiles(d))
            {
                Console.WriteLine(f);
            }
            DirSearch(d);
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}

Added by barlop

GONeale mentions that the above doesn't list the files in the current directory and suggests moving the file-listing part outside the loop that gets the directories. The following does that. It also includes a WriteLine line that you can uncomment to trace each call and see how the recursion works.

DirSearch_ex3("c:\\aaa");

static void DirSearch_ex3(string sDir)
{
    //Console.WriteLine("DirSearch..(" + sDir + ")");
    try
    {
        Console.WriteLine(sDir);

        foreach (string f in Directory.GetFiles(sDir))
        {
            Console.WriteLine(f);
        }

        foreach (string d in Directory.GetDirectories(sDir))
        {
            DirSearch_ex3(d);
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}

