How to Find All of the Distinct File Extensions in a Folder Hierarchy

How can I find all of the distinct file extensions in a folder hierarchy?

Try this (not sure if it's the best way, but it works):

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u

It works as follows:

  • Find all files under the current folder
  • Print the extension of each file (the text after the last '.'), if it has one
  • Make a unique sorted list
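The three steps above can also be sketched in Python 3. This is a minimal sketch, not a drop-in replacement: the function name `distinct_extensions` is hypothetical, and it reuses the Perl one-liner's regular expression on each file name, so `archive.tar.gz` yields `gz` and names without a '.' are skipped. One small difference: `os.walk` also lists symlinks to files, which `find -type f` would not.

```python
import os
import re
import sys

# Same pattern as the Perl one-liner: a '.' followed by one or more
# characters that are neither '.' nor '/', anchored at the end of the name.
EXT_RE = re.compile(r"\.([^./]+)$")


def distinct_extensions(root="."):
    """Return a sorted list of the distinct extensions of files under root."""
    exts = set()
    # Step 1: find all files under root (os.walk recurses into subdirectories)
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Step 2: extract the extension, if the name has one
            m = EXT_RE.search(name)
            if m:
                exts.add(m.group(1))
    # Step 3: return a sorted list (the set has already removed duplicates)
    return sorted(exts)


if __name__ == "__main__":
    for ext in distinct_extensions(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(ext)
```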

How can I find all unique file extensions in a folder hierarchy in Java?

A custom FilenameFilter:

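import java.io.File;
import java.io.FilenameFilter;
import java.util.HashSet;
import java.util.Set;

public class FileExtensionFilter implements FilenameFilter {
    // Extensions found so far; names ending with any of them are rejected
    private final Set<String> filteredExtensions = new HashSet<String>();

    @Override
    public boolean accept(File dir, String name) {
        // Caveat: this also rejects directories whose names end with a known
        // extension (e.g. "conf.d" once ".d" is known), so their contents are skipped
        for (String filteredExtension : filteredExtensions) {
            if (name.endsWith(filteredExtension)) {
                return false;
            }
        }
        return true;
    }

    public void addFilteredExtension(String extension) {
        filteredExtensions.add(extension);
    }
}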

Recursive method solution:

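// Assumes this field in the same class:
// private final FileExtensionFilter fileExtensionFilter = new FileExtensionFilter();
public Set<String> checkForExtensions(File file) {
    Set<String> extensions = new HashSet<String>();
    if (file.isDirectory()) {
        File[] children = file.listFiles(fileExtensionFilter);
        if (children == null) {
            return extensions; // unreadable directory or I/O error
        }
        for (File f : children) {
            extensions.addAll(checkForExtensions(f));
        }
    } else {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            // No extension: record "" but don't filter on it ("" would match every name)
            extensions.add("");
        } else {
            //NOTE: the '.' is kept so the filter only matches real extensions; strip it when printing if you don't want it
            String extension = name.substring(dot);
            extensions.add(extension);
            fileExtensionFilter.addFilteredExtension(extension);
        }
    }
    return extensions;
}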

Originally I had the same solution without the FileExtensionFilter, but noticed I could improve efficiency by adding each newly found extension to the filter as I went, so that files with an already-known extension are skipped. The savings were dramatic: the run went from 47 seconds down to 700 milliseconds.

You could also reduce memory usage a bit more by eliminating the Set altogether, since the FileExtensionFilter already holds a duplicate copy of the extensions in the Set.


How can I find all of the distinct greedy file suffixes in a folder hierarchy?

An all-Bash solution for the files in the current working directory (it does not recurse):

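shopt -s nullglob    # if nothing matches, skip the loop instead of using the literal '*.*'
declare -A a         # declare an associative array a
for f in *.*         # loop over all filenames containing a '.'
do
    ((a[${f#*.}]++)) # strip everything up to the first '.', then count that suffix
done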

Outputting the counts:

for k in "${!a[@]}"    # loop over all array keys
do
    echo "${a[$k]} $k" # output value and key
done

Example output:

1 zip
2 txt
1 txt~
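For comparison, here is a minimal Python 3 sketch of the same counting, assuming the same non-recursive scope (one directory). `name.split(".", 1)[1]` keeps everything after the first '.', like `${f#*.}`, so `backup.tar.gz` counts as `tar.gz`. The function name `count_greedy_suffixes` is hypothetical, and unlike the glob it counts only regular files, not directories.

```python
import os
from collections import Counter


def count_greedy_suffixes(directory="."):
    """Count files in one directory (no recursion) by the text after the first '.'."""
    counts = Counter()
    for name in os.listdir(directory):
        # Like the glob *.*: the name must contain a '.', and hidden names are skipped
        if "." not in name or name.startswith("."):
            continue
        if not os.path.isfile(os.path.join(directory, name)):
            continue  # unlike the glob, only regular files are counted
        counts[name.split(".", 1)[1]] += 1  # like ${f#*.}: drop up to the first '.'
    return counts


if __name__ == "__main__":
    for suffix, n in count_greedy_suffixes().items():
        print(n, suffix)
```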

How to find all file extensions recursively from a directory?

How about this:

find . -type f -name '*.*' | sed 's|.*\.||' | sort -u

How can I list the unique file types (extensions) in a directory tree, recursively, from a Python 3 script?

That other question is not about Python anyway. One way to do this is to walk the path with os.walk, which recursively enters subdirectories, and add each file's extension to a set:

import os
exts = set(f.split('.')[-1] for dir,dirs,files in os.walk('.') for f in files if '.' in f)

Use [-1] after splitting to extract the last part, in case the filename contains more than one '.'.

Use if '.' in f to make sure the file actually has an extension.

I mulled it over, and my insistence on not using splitext seems unwarranted; it's much cleaner:

import os
exts = set(os.path.splitext(f)[1] for dir, dirs, files in os.walk('.') for f in files)

This keeps the leading '.' in each extension (e.g. '.txt') and includes an empty string for files with no extension.
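To see how the two versions differ on edge cases, here is a small comparison; the helper names `ext_by_split` and `ext_by_splitext` are hypothetical. The split version drops the '.' and skips names without one, while `os.path.splitext` keeps the '.', returns '' when there is no extension, and treats a leading '.' (as in `.bashrc`) as part of the name rather than as an extension.

```python
import os


def ext_by_split(name):
    """First version: text after the last '.', or None if the name has no '.'."""
    return name.split(".")[-1] if "." in name else None


def ext_by_splitext(name):
    """Second version: keeps the leading '.', and returns '' if there is no extension."""
    return os.path.splitext(name)[1]


for name in ["report.txt", "archive.tar.gz", "Makefile", ".bashrc"]:
    print(f"{name!r:18} split: {ext_by_split(name)!r:10} splitext: {ext_by_splitext(name)!r}")
```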

How to list all distinct extensions of tracked files in a git repository?

git ls-tree -r HEAD --name-only | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u 

When you declare it as an alias, you have to escape $1:

alias gitFileExtensions="git ls-tree -r HEAD --name-only | perl -ne 'print \$1 if m/\.([^.\/]+)$/' | sort -u"

This is better than a naive find, because:

  • it excludes untracked files, including gitignored ones (only files committed in HEAD are listed)
  • it excludes the .git directory, which usually contains hundreds or thousands of files and would slow down the search

(inspired by How can I find all of the distinct file extensions in a folder hierarchy?)
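The same idea can be scripted from Python, assuming Python 3.8+, git on the PATH, and a repository with at least one commit. This minimal sketch runs the `git ls-tree -r HEAD --name-only` command above and applies the same regular expression; the function names are hypothetical. Paths containing unusual characters may be quoted by git, which this sketch does not undo.

```python
import re
import subprocess

# Same pattern as the Perl filter: text after the last '.' in the final path component
EXT_RE = re.compile(r"\.([^./]+)$")


def extensions_from_paths(paths):
    """Return the sorted distinct extensions found in an iterable of paths."""
    return sorted({m.group(1) for p in paths if (m := EXT_RE.search(p))})


def tracked_extensions(repo="."):
    """List distinct extensions of the files tracked in HEAD of the repository at repo."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "HEAD", "--name-only"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return extensions_from_paths(out.splitlines())


if __name__ == "__main__":
    print("\n".join(tracked_extensions()))
```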

Select distinct file type extensions from all hard drives?

Well, yes, there is always the brute-force approach.

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, int> extensions = new Dictionary<string, int>();

        PopulateExtensions("C:\\", extensions);

        foreach (string key in extensions.Keys)
        {
            // WriteLine, so that each extension is printed on its own line
            Console.WriteLine("{0}\t{1}", key, extensions[key]);
        }
        Console.ReadKey(true);
    }

    private static void PopulateExtensions(string path, Dictionary<string, int> extensions)
    {
        string[] files = null;
        string[] subdirs = null;

        try
        {
            files = Directory.GetFiles(path);
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to list this directory's files: skip them
        }

        try
        {
            subdirs = Directory.GetDirectories(path);
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to list this directory's subdirectories: skip them
        }

        if (files != null)
        {
            foreach (string file in files)
            {
                // FileInfo.Extension includes the leading '.', or is "" if there is none
                string ext = new FileInfo(file).Extension;

                if (extensions.ContainsKey(ext))
                {
                    extensions[ext]++;
                }
                else
                {
                    extensions[ext] = 1;
                }
            }
        }
        if (subdirs != null)
        {
            foreach (string sub in subdirs)
            {
                PopulateExtensions(sub, extensions);
            }
        }
    }
}

This counts how many files have each extension on the C: drive (only the files accessible to you); to cover other hard drives, run the same scan on each drive's root.

However, if you just want a list of file types, I would suggest checking the HKEY_CLASSES_ROOT section of your registry.

Anyway, here is my result (truncated; the line with only a count is for files with no extension):

...

.tga 1453
.inf 1491
.mum 1519
.cs 1521
.sys 1523
.gif 1615
.vdf 1615
.txt 1706
.h 1775
.DLL 1954
.bmp 2522
.xml 2540
.exe 2832
3115
.png 3128
.jpg 3385
.GPD 3629
.cat 3979
.vcd 5140
.mui 6153
.wav 8522
.dll 14669
.manifest 19344
Elapsed Time: 17561
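A rough Python 3 equivalent of the counting done by the C# program, as a minimal sketch: by default `os.walk` silently skips directories it cannot list, which plays the role of the `UnauthorizedAccessException` handlers, and `os.path.splitext` returns the extension with its leading '.' (or '' for none), similar to `FileInfo.Extension`. The function name `count_extensions` is hypothetical, and the output is sorted by count, smallest first, like the result listing.

```python
import os
from collections import Counter


def count_extensions(root):
    """Count files under root by extension (with the leading '.', or '' if none)."""
    counts = Counter()
    # os.walk ignores directories it cannot list (its onerror argument defaults to None)
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[os.path.splitext(name)[1]] += 1
    return counts


if __name__ == "__main__":
    # Pass a drive root such as "C:\\" on Windows; the current directory is used here
    for ext, n in sorted(count_extensions(".").items(), key=lambda kv: kv[1]):
        print(f"{ext}\t{n}")
```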

