Find in Files Using Ruby or Python

I know you said you don't feel like writing it yourself, but for what it's worth, it would be very easy using os.walk - you could do something like this:

import fnmatch
import os
import re

# regex_search (bool) and __searchtext__ (str) are placeholders you define yourself
results = []
if regex_search:
    p = re.compile(__searchtext__)
for dirpath, subdirs, subfiles in os.walk('c:/docs/2009'):
    for name in fnmatch.filter(subfiles, '*.txt'):
        fn = os.path.join(dirpath, name)
        with open(fn, 'r') as f:
            # lineno is 0-based because enumerate() starts at 0
            if regex_search:
                results += [(fn, lineno) for lineno, line in enumerate(f) if p.search(line)]
            else:
                results += [(fn, lineno) for lineno, line in enumerate(f) if __searchtext__ in line]

(that's Python, btw)
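
For comparison, roughly the same search can be sketched in Ruby. This is only an illustration: find_in_files and its glob: keyword are made-up names, not part of any library, and line numbers here start at 1 rather than 0. It assumes the files are readable in the default encoding.

```ruby
# Hypothetical helper: search every file under `root` that matches `glob`.
# `pattern` may be a String (plain substring match) or a Regexp.
# Returns [path, line_number] pairs, with 1-based line numbers.
def find_in_files(root, pattern, glob: '**/*.txt')
  results = []
  # base: keeps glob metacharacters in `root` from being interpreted
  Dir.glob(glob, base: root).sort.each do |relative|
    path = File.join(root, relative)
    next unless File.file?(path)

    File.foreach(path).with_index(1) do |line, lineno|
      hit = pattern.is_a?(Regexp) ? pattern.match?(line) : line.include?(pattern)
      results << [path, lineno] if hit
    end
  end
  results
end
```

Something like find_in_files('c:/docs/2009', /todo/i) would then return the matching path and line number pairs.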

Search for text in files in the path using ruby

As already pointed out, you can use Dir#glob to simplify your file-finding. You could also consider switching your loops, which would mean opening each C file once, instead of once per H file.

I'd consider going with something like the following, which ran on the Ruby source in 3 seconds:

# collect the File.basename for all h files in tree
hfile_names = Dir.glob("**/*.h").collect { |hfile| File.basename(hfile) }

h_counts = Hash.new(0) # somewhere to store the counts

Dir.glob("**/*.c").each do |cfile| # enumerate the C files
  file_text = File.read(cfile) # downcase here if necessary
  hfile_names.each do |hfile|
    h_counts[hfile] += 1 if file_text.include?(hfile)
  end
end

h_counts.each { |file, found| puts "#{file} used #{found} times" }

EDIT: That won't list H files not referenced in any C files. To be certain to catch those, the hash would have to be explicitly initialised:

h_counts = {}
hfile_names.each { |hfile| h_counts[hfile] = 0 }

How to search file text for a pattern and replace it with a given value

Disclaimer: This approach is a naive illustration of Ruby's capabilities, and not a production-grade solution for replacing strings in files. It's prone to various failure scenarios, such as data loss in case of a crash, interrupt, or disk being full. This code is not fit for anything beyond a quick one-off script where all the data is backed up. For that reason, do NOT copy this code into your programs.

Here's a quick short way to do it.

file_names = ['foo.txt', 'bar.txt']

file_names.each do |file_name|
  text = File.read(file_name)
  new_contents = text.gsub(/search_regexp/, "replacement string")

  # To merely print the contents of the file, use:
  puts new_contents

  # To write changes to the file, use:
  File.open(file_name, "w") { |file| file.puts new_contents }
end
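
One common way to reduce the data-loss risk mentioned in the disclaimer is to write the new contents to a temporary file in the same directory and then rename it over the original. The sketch below is only an illustration of that idea, not a hardened implementation: replace_in_file and the temporary file naming scheme are assumptions, and it does not preserve the original file's permissions or ownership.

```ruby
# Hypothetical helper: replace every match of `pattern` in `file_name`.
# The new contents go to a temporary file first, which is then renamed
# over the original, so the original is never left half-written.
# Returns true if the file was changed, false if nothing matched.
def replace_in_file(file_name, pattern, replacement)
  text = File.read(file_name)
  new_contents = text.gsub(pattern, replacement)
  return false if new_contents == text

  # Same directory as the target, so the rename stays on one filesystem.
  tmp_name = "#{file_name}.tmp.#{Process.pid}"
  begin
    File.open(tmp_name, File::WRONLY | File::CREAT | File::EXCL) do |tmp|
      tmp.write(new_contents)
      tmp.flush
      tmp.fsync # ask the OS to push the data to disk before the rename
    end
    File.rename(tmp_name, file_name)
  ensure
    # Only still present if something failed before the rename.
    File.delete(tmp_name) if File.exist?(tmp_name)
  end
  true
end
```

Unlike the quick version above, this writes the text back unchanged apart from the substitutions (file.puts may append a trailing newline).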

Get names of all files from a folder with Ruby

You also have the shortcut option of

Dir["/path/to/search/*"]

and if you want to find all Ruby files in any folder or sub-folder:

Dir["/path/to/search/**/*.rb"]
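
Note that these globs also return directories and skip dotfiles. If you want only regular files, hidden ones included, something along these lines may help. It is a sketch: plain_files and its recursive: keyword are made-up names, and it relies on the base: option of Dir.glob (Ruby 2.5 or later).

```ruby
# Hypothetical helper: names of regular files inside `dir`, relative to `dir`.
# Hidden files are included; directories are filtered out.
def plain_files(dir, recursive: false)
  pattern = recursive ? '**/*' : '*'
  Dir.glob(pattern, File::FNM_DOTMATCH, base: dir)
     .select { |relative| File.file?(File.join(dir, relative)) }
     .sort
end
```

For example, plain_files('/path/to/search', recursive: true) lists every file in the tree as a path relative to /path/to/search.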

How to search a folder and all of its subfolders for files of a certain type

You want the Find module. Find.find takes a string containing a path, and will pass the parent path along with the path of each file and sub-directory to an accompanying block. Some example code:

require 'find'

pdf_file_paths = []
Find.find('path/to/search') do |path|
  pdf_file_paths << path if path =~ /.*\.pdf$/
end

That will recursively search a path, and store all file names ending in .pdf in an array.
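
If some directories should be skipped entirely, Find.prune can be called inside the block to stop descending into the current directory. A minimal sketch, assuming you want to ignore a few directory names; files_with_extension and skip_dirs: are made-up names for illustration, and unlike the regex above it matches the extension case-insensitively and ignores directories.

```ruby
require 'find'

# Hypothetical helper: files under `root` whose extension equals `ext`
# (case-insensitive), skipping directories named in `skip_dirs`.
def files_with_extension(root, ext, skip_dirs: ['.git'])
  matches = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Find.prune stops Find from descending into this directory
      Find.prune if skip_dirs.include?(File.basename(path))
    elsif File.extname(path).casecmp?(ext)
      matches << path
    end
  end
  matches.sort
end
```

For example, files_with_extension('path/to/search', '.pdf') collects the PDF paths while leaving any .git directory untouched.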

Python unable to find file in folder tree

The solution was:

import os
import sys

def run(action):
    base_path = os.path.join(os.path.dirname(sys.argv[0]), 'extras/BZ/')

Ruby - Search and collect files in all directories

Check out the Find module:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/find/rdoc/Find.html

Using Dir.glob is less than ideal since globbing doesn't handle recursion nearly as well as something like find.

Also, if you're on a *nix box, try the find command. It's extremely useful for one-liners.

What's the equivalent of python's __file__ in ruby?

There's nothing exactly equivalent.

All files that have been required are listed in $LOADED_FEATURES in the order they were required. So, if you want to know where a file came from directly after it was required, you simply need to look at the end:

$LOADED_FEATURES.last if require 'yaml'
# => 'C:/Program Files/Ruby/lib/ruby/1.9.1/yaml.rb'

However, unless you record every call to require it's going to be hard to figure out which entry corresponds to which call. Also, if a file is already in $LOADED_FEATURES, it will not get loaded again:

require 'yaml'
# => true
# true means: the file was loaded

$LOADED_FEATURES.last
# => 'C:/Program Files/Ruby/lib/ruby/1.9.1/yaml.rb'

require 'json'
$LOADED_FEATURES.last
# => 'C:/Program Files/Ruby/lib/ruby/1.9.1/json.rb'

require 'yaml'
# => false
# false means: the file wasn't loaded again, because it has already been loaded

$LOADED_FEATURES.last
# => 'C:/Program Files/Ruby/lib/ruby/1.9.1/json.rb'
# Last loaded feature is still JSON, because YAML wasn't actually loaded twice

Also, many libraries aren't contained in a single file. So, the required files might themselves contain calls to require. In my case, for example, require 'yaml' not only loads yaml.rb but a whole bunch of files (15 to be exact):

  1. C:/Program Files/Ruby/lib/ruby/1.9.1/i386-mingw32/stringio.so
  2. C:/Program Files/Ruby/lib/ruby/1.9.1/i386-mingw32/syck.so
  3. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/error.rb
  4. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/basenode.rb
  5. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/syck.rb
  6. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/tag.rb
  7. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/stream.rb
  8. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/constants.rb
  9. C:/Program Files/Ruby/lib/ruby/1.9.1/date/format.rb
  10. C:/Program Files/Ruby/lib/ruby/1.9.1/date.rb
  11. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/rubytypes.rb
  12. C:/Program Files/Ruby/lib/ruby/1.9.1/syck/types.rb
  13. C:/Program Files/Ruby/lib/ruby/1.9.1/yaml/syck.rb
  14. C:/Program Files/Ruby/lib/ruby/1.9.1/syck.rb
  15. C:/Program Files/Ruby/lib/ruby/1.9.1/yaml.rb
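
One way to see which entries a particular require added is to compare $LOADED_FEATURES before and after the call. A minimal sketch; features_loaded_by is a made-up helper name, and the result depends on what the interpreter has already loaded.

```ruby
# Hypothetical helper: return the entries that one `require` call
# appended to $LOADED_FEATURES (empty if everything was already loaded).
def features_loaded_by(name)
  before = $LOADED_FEATURES.dup
  require name
  $LOADED_FEATURES - before
end
```

Calling features_loaded_by('yaml') in a fresh interpreter should produce a list like the one above. A second call returns an empty array, because nothing new is loaded.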

How do I find files in a directory that do not have extensions?

Since your question didn't explicitly specify whether you want to search for files without extensions recursively (though in the comments it sounded like you might), or whether you would like to keep files with a leading dot (i.e. hidden files in unix), I'm including options for each scenario.

Visible Files (non-recursive)

Dir['*'].reject { |file| file.include?('.') }

This returns an array of every entry whose name contains no '.', which means only entries without an extension.

Hidden Files (non-recursive)

Dir.new('.').entries.reject { |file| %w(. ..).include?(file) or file[1..-1].include?('.') }

This finds all of the files in the current directory and then removes any files with a '.' in any character except the first (i.e. any character from index 1 to the end, a.k.a index -1). Also note that since Dir.new('.').entries contains '.' and '..' those are rejected as well.

Visible Files (recursive)

require 'find'
Find.find('.').reject { |file| File.basename(file).include?('.') }.map { |file| file[2..-1] }

The map on the end of this one is just to remain consistent with the others by removing the leading './'. If you don't care about that, you can remove it.

Hidden Files (recursive)

require 'find'
# '.' itself is rejected explicitly; otherwise it would survive and map to nil
Find.find('.').reject { |file| file == '.' || File.basename(file)[1..-1].include?('.') }.map { |file| file[2..-1] }

Note: each of the above will also include directories (which Unix, at least, also treats as files). To remove them, add .select { |file| File.file?(file) } to the end of any of the above.
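
As an alternative to checking for a '.' by hand, File.extname can decide whether a name has an extension, since it already treats a leading dot as part of the name. A non-recursive sketch; files_without_extension and include_hidden: are made-up names, and Dir.children needs Ruby 2.5 or later.

```ruby
# Hypothetical helper: regular files directly inside `dir` whose names
# have no extension according to File.extname. Directories are excluded.
def files_without_extension(dir, include_hidden: false)
  names = Dir.children(dir).select do |name|
    next false if !include_hidden && name.start_with?('.')

    # File.extname('.bashrc') is "", so dotfiles count as extensionless
    File.file?(File.join(dir, name)) && File.extname(name).empty?
  end
  names.sort
end
```

With include_hidden: true, a file such as .bashrc is kept, while .config.yml is dropped because its extension is .yml.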


