How to Do a Safe Join Pathname in Ruby

How to do a safe join pathname in Ruby?

I recommend using File.join

>> File.join("path", "to", "join")
=> "path/to/join"
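File.join only tidies up the separators between components; it does not stop a component such as "../secret" from climbing out of the base directory. If "safe" also means keeping the result inside a known directory, a minimal sketch might look like the following. The helper name join_within is hypothetical (it is not part of Ruby), and the check is purely lexical, so it assumes symlinks are not a concern.

```ruby
require 'pathname'

# Hypothetical helper, not part of Ruby: joins `relative` onto `base` and
# raises if the lexically expanded result would fall outside `base`.
def join_within(base, relative)
  base_path = Pathname.new(base).expand_path
  target = base_path.join(relative).expand_path

  # A relative path from base to target that starts with ".." means the
  # target is outside base (symlinks are not resolved here).
  if target.relative_path_from(base_path).each_filename.first == '..'
    raise ArgumentError, "#{relative.inspect} escapes #{base_path}"
  end

  target
end
```

With this sketch, join_within('/srv/app', 'uploads/a.txt') returns a Pathname inside /srv/app, while '../etc/passwd' and the absolute '/etc/passwd' both raise ArgumentError, because Pathname#join lets an absolute argument replace the base.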

Implement #absolute_path for Pathname child

Are you trying to get a faux-realpath for files that don't exist? If so, you might be better served by using .join and handing in the necessary components:

Pathname.new(Dir.pwd).join("some/nonexistent/path")

If you have a file that exists, but your path string is a fragment and you need to provide another base directory to realpath, you can do that too:

path = Pathname.new('path/and/file.jpg')
path.realpath('/some/existing')
#=> #<Pathname:/some/existing/path/and/file.jpg>
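The two cases above can be folded into one helper: try realpath first and fall back to a lexical join when the target does not exist. This is only a sketch; resolve_against is a hypothetical name, and it assumes a missing file is the only failure worth rescuing.

```ruby
require 'pathname'

# Hypothetical helper: resolves `fragment` relative to `base_dir`.
# Uses realpath when the target exists, otherwise a plain lexical join.
def resolve_against(base_dir, fragment)
  Pathname.new(fragment).realpath(base_dir.to_s)
rescue Errno::ENOENT
  Pathname.new(base_dir).join(fragment)
end
```

Either way the caller gets a Pathname back, but only the realpath branch has symlinks resolved.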

An example implementation might be…

require 'pathname'

class FileResolver
  BASE_DIR = Dir.pwd.freeze

  def initialize(filepath)
    @filepath = filepath
  end

  # Returns a Pathname for @filepath resolved against BASE_DIR.
  def absolute_path
    Pathname.new(BASE_DIR).join(@filepath)
  end
end

Or, if your files aren't relative to this class but instead your Rails project, a better way is to use Rails.root.join:

Rails.root.join('your/path', 'some/filename.txt')

How to make a Ruby string safe for a filesystem?

From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:

def sanitize_filename(filename)
  # Object#tap replaces the old Rails-only `returning` helper
  filename.strip.tap do |name|
    # NOTE: File.basename doesn't work right with Windows paths on Unix,
    # so keep only the filename by stripping everything up to the last slash or backslash
    name.gsub!(/^.*(\\|\/)/, '')

    # Replace every character other than ASCII letters, digits, dots, and hyphens
    name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  end
end
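To see what the two substitutions do, here is a self-contained sketch of the same idea in plain Ruby. The name sanitize_basename is hypothetical, and the sample inputs are made up for illustration.

```ruby
# Hypothetical plain-Ruby variant of the method above.
def sanitize_basename(filename)
  name = filename.strip
  # Drop any leading directories, whether separated by "/" or "\".
  name = name.sub(/\A.*[\\\/]/m, '')
  # Replace everything except ASCII letters, digits, dots, and hyphens.
  name.gsub(/[^0-9A-Za-z.\-]/, '_')
end

sanitize_basename('/var/tmp/report 2024.txt')               # => "report_2024.txt"
sanitize_basename(" C:\\Users\\me\\R\u00e9sum\u00e9.pdf ")  # => "R_sum_.pdf"
sanitize_basename('..')                                     # => ".."
```

Note that dots are allowed through, so a name such as ".." comes back unchanged; it is probably best to treat this kind of sanitizing as one layer rather than a complete defense.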

Ruby's Dir vs File vs Pathname?

According to the Ruby docs for Dir, File, and Pathname, they definitely appear to have a lot in common.

The principal difference between Dir and File seems to be that Dir assumes the object it's working with is a directory and File assumes it's a file. Some File class methods (File.exist? and File.basename, for example) also accept directory paths, but even when the code works, manipulating directories through File and files through Dir can confuse anyone reading it.

Pathname looks to be an OS-independent way of referring to files and directories. Since Windows and *nix machines handle file paths differently, it can be a pain to refer to files or directories in an OS-specific way if you want scripts to run anywhere. From the docs:

Pathname represents a pathname which locates a file in a filesystem. The pathname depends on OS: Unix, Windows, etc. Pathname library works with pathnames of local OS. However non-Unix pathnames are supported experimentally.

It does not represent the file itself. A Pathname can be relative or absolute. It’s not until you try to reference the file that it even matters whether the file exists or not.

Pathname is immutable. It has no method for destructive update.
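As a rough illustration of how the three classes overlap, the sketch below creates one file in a scratch directory and reads it back through Dir, File, and Pathname. The helper name compare_dir_file_pathname and the file name notes.txt are made up for this example, and Dir.children needs Ruby 2.5 or later.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical demo helper: writes one file into a temporary directory and
# returns what each class reports about it.
def compare_dir_file_pathname
  Dir.mktmpdir do |dir|
    file_path = File.join(dir, 'notes.txt')
    File.write(file_path, 'hello')        # File: works on file contents
    pathname = Pathname.new(file_path)    # Pathname: wraps the path string

    {
      dir_listing: Dir.children(dir),     # Dir: works on directory entries
      file_contents: File.read(file_path),
      pathname_contents: pathname.read,
      pathname_listing: pathname.dirname.children.map { |child| child.basename.to_s },
      # A Pathname for a file that does not exist is still a valid object.
      missing_exists: Pathname.new(dir).join('missing.txt').exist?
    }
  end
end
```

The Pathname calls end up answering both the directory-style and the file-style questions, which is most of its appeal.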

Hope this helps.

How to split a directory string in Ruby?

File has no built-in counterpart to File.join that splits a path into its component directories (Pathname#each_filename comes close), but you can fake it in a cross-platform way:

directory_string.split(File::SEPARATOR)

This works with relative paths and on non-Unix platforms, but for a path that starts with "/" as the root directory, you'll get an empty string as the first element of the array, where we'd want "/" instead.

directory_string.split(File::SEPARATOR).map {|x| x=="" ? File::SEPARATOR : x}

If you want just the directories without the root directory, as you mentioned above, you can slice the result from the second element on.

directory_string.split(File::SEPARATOR).map {|x| x=="" ? File::SEPARATOR : x}[1..-1]
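For comparison, the standard library's Pathname can walk the components without manual splitting. A small sketch, using a made-up path and hypothetical helper names:

```ruby
require 'pathname'

# Hypothetical helper: split on the separator, restoring "/" for the root.
def split_with_root(directory_string)
  directory_string.split(File::SEPARATOR).map { |x| x == '' ? File::SEPARATOR : x }
end

# Hypothetical helper: Pathname#each_filename yields each component
# and never includes the root.
def split_without_root(directory_string)
  Pathname.new(directory_string).each_filename.to_a
end

split_with_root('/usr/local/bin')     # => ["/", "usr", "local", "bin"]
split_without_root('/usr/local/bin')  # => ["usr", "local", "bin"]
split_without_root('usr/local/bin')   # => ["usr", "local", "bin"]
```

Unlike slicing with [1..-1], each_filename keeps the first component of a relative path, since there is no root to drop.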

Escape spaces in a Linux pathname with Ruby gsub

Stefan is right; I just want to point out that if you have to escape strings for shell use you should check Shellwords::shellescape:

require 'shellwords'

puts Shellwords.shellescape "/mnt/drive/site/usa/1201 East/1201 East Invoice.pdf"
# prints /mnt/drive/site/usa/1201\ East/1201\ East\ Invoice.pdf

# or

puts "/mnt/drive/site/usa/1201 East/1201 East Invoice.pdf".shellescape
# prints /mnt/drive/site/usa/1201\ East/1201\ East\ Invoice.pdf

# or (as reported by @hagello)
puts Shellwords.escape "/mnt/drive/site/usa/1201 East/1201 East Invoice.pdf"
# prints /mnt/drive/site/usa/1201\ East/1201\ East\ Invoice.pdf
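Escaping is only needed when the command goes through a shell as a single string. Here is a brief sketch of the round trip, using Shellwords.join to build a whole command line; the helper name shell_command and the ls command are just for illustration.

```ruby
require 'shellwords'

# Hypothetical helper: builds a shell-safe command string from an argument list.
def shell_command(*args)
  Shellwords.join(args)
end

path = '/mnt/drive/site/usa/1201 East/1201 East Invoice.pdf'
command = shell_command('ls', '-l', path)
# => "ls -l /mnt/drive/site/usa/1201\\ East/1201\\ East\\ Invoice.pdf"

# Shellwords.split reverses the escaping.
Shellwords.split(command)  # => ["ls", "-l", path]
```

When the command is run from Ruby, passing the arguments separately, as in system('ls', '-l', path), bypasses the shell, so no escaping is needed at all.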

Extract filename with and without terminating characters

Do it in three stages.

  1. Split on ; to separate out the statements.
  2. Split the key/value pair on =.
  3. Deal with the quoting of the value.

Here's a basic example.

def get_value(line)
  # Split into statements
  statements = line.split(/\s*;\s*/)

  # Extract the value of the 2nd statement
  _, value = statements[1].split(/\s*=\s*/)

  # Strip the quotes
  value.gsub!(/^(['"]?)(.*)\1$/, '\2')

  return value
end

There are a few edge cases this doesn't handle. For example, what if the statement you're interested in isn't the second one? But that can be fixed up as needed. It's a lot easier to improve your parsing when it's done in multiple steps rather than trying to cram it into one regex.

For example, this correctly handles embedded and escaped quotes like %q[inline; filename="name's.extension"] and %q[inline; filename="name's.\\"extension\\""].
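As one way of fixing up the "not the second statement" case, the staged approach can look the statement up by key instead of by position. This is a sketch only; get_param is a hypothetical name, and it still assumes that no quoted value contains a semicolon.

```ruby
# Hypothetical variant of get_value: finds the statement by key, not position.
def get_param(line, key)
  line.split(/\s*;\s*/).each do |statement|
    # Limit the split to 2 so an "=" inside the value is left alone.
    name, value = statement.strip.split(/\s*=\s*/, 2)
    next unless name == key && value

    # Strip one pair of matching surrounding quotes, if present.
    return value.sub(/\A(['"]?)(.*)\1\z/m, '\2')
  end
  nil
end

get_param(%q[inline; filename="name's.extension"], 'filename')  # => "name's.extension"
get_param('attachment; foo=bar; filename=name.ext', 'filename')  # => "name.ext"
get_param('inline', 'filename')                                  # => nil
```

Returning nil for a missing key leaves it to the caller to decide whether that is an error.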


If you really want to do it as a single regex, ok, you asked for it.

re = /
  \bfilename
  \s*=\s*
  (?:
    (?<quote>['"])(?<value>.*)\k<quote> |
    (?<value>[^;]+)
  )
/x
return re.match(line)['value']

That splits the handling of the value into two alternatives: one with quotes and one without. Otherwise filename=name.ext; will pick up the semicolon, and I can't figure out another way to stop it that doesn't introduce a new problem.

For example, /\bfilename\s*=\s*(?<quote>['"]?)(?<value>.*?)\k<quote>;?$/ will work on the test data, but then it will fail if there's anything after the semicolon like %q[inline; filename='name.extension'; foo].
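To try the two-alternative regex against inputs like these, it can be wrapped in a small method. The name filename_from is hypothetical, and returning nil when nothing matches is an assumption about what a caller would want.

```ruby
# Hypothetical wrapper around the two-alternative regex above.
def filename_from(line)
  re = /
    \bfilename
    \s*=\s*
    (?:
      (?<quote>['"])(?<value>.*)\k<quote> |
      (?<value>[^;]+)
    )
  /x
  match = re.match(line)
  match && match['value']
end

filename_from(%q[inline; filename='name.extension'; foo])  # => "name.extension"
filename_from(%q[inline; filename="name's.extension"])     # => "name's.extension"
filename_from('inline; filename=name.ext;')                # => "name.ext"
filename_from('inline')                                    # => nil
```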

You asked for expert regex knowledge. Part of being a regex expert is to know when you shouldn't use a regex. This should probably be handled with a grammar or you'll be constantly chasing edge cases.


