Difference Between \A \Z and ^ $ in Ruby Regular Expressions

Difference between \A \z and ^ $ in Ruby regular expressions

If you're depending on the regular expression for validation, you always want to use \A and \z. ^ and $ match at the start and end of each line, not of the whole string, which means an attacker could submit an email like me@example.com\n<script>dangerous_stuff();</script> and still have it validate, since the regex only needs to match the part before the \n.
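A minimal sketch of the difference, assuming a deliberately simplistic email pattern (the pattern and the constant names are illustrative assumptions, not a recommended validator):

```ruby
# Hypothetical, deliberately simplistic patterns for illustration only.
LINE_ANCHORED   = /^[^@\s]+@[^@\s]+$/
STRING_ANCHORED = /\A[^@\s]+@[^@\s]+\z/

malicious = "me@example.com\n<script>dangerous_stuff();</script>"

# ^ and $ are satisfied by the first line alone, so the payload slips through.
line_result = LINE_ANCHORED.match?(malicious)

# \A and \z must span the entire string, so the embedded newline makes it fail.
string_result = STRING_ANCHORED.match?(malicious)

puts line_result    # true
puts string_result  # false
```

Both patterns accept a clean single-line address such as me@example.com; they only disagree once a newline is present.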

My recommendation would be to strip newlines from a username or email beforehand, since there's pretty much no legitimate reason for one. Then you can safely use either \A \z or ^ $.

Difference between ^ , $ and \A , \Z in ruby regex

You can have a multi-line string where \A and \Z become significant:

s = "this\ntest"
# => "this\ntest"

s.match(/^this$/)
# => #<MatchData "this">

s.match(/\Athis\Z/)
# => nil
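As a hedged aside on the two end-of-string anchors: \Z still matches just before a single trailing newline, while \z matches only at the true end of the string. A small sketch (the variable names are illustrative):

```ruby
s = "this\n"

# \Z allows one trailing newline before the end of the string.
upper_z = s.match?(/\Athis\Z/)

# \z requires the true end of the string, so the trailing newline is rejected.
lower_z = s.match?(/\Athis\z/)

puts upper_z  # true
puts lower_z  # false
```

For validation, \z is therefore the stricter choice.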

When validating user data, \A and \Z (or the stricter \z, which does not tolerate a trailing newline) are essential. For example:

if (site.match(%r[^http://sitename.com/$]))
# ...
end

In this case an attack could be constructed around supplying "http://sitename.com/\nhttp://evil.com/" as the site string.
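A sketch of the attack and one possible fix, assuming a hypothetical helper named trusted_site? (the helper and constant are illustrative, and Regexp.escape is used here so the dots are matched literally):

```ruby
# Build the pattern once; Regexp.escape keeps "." literal,
# and \A ... \z force the whole string to match.
TRUSTED_SITE = /\A#{Regexp.escape("http://sitename.com/")}\z/

# Hypothetical helper name, for illustration only.
def trusted_site?(site)
  site.match?(TRUSTED_SITE)
end

attack = "http://sitename.com/\nhttp://evil.com/"

# The line-anchored check is satisfied by the first line alone.
vulnerable = attack.match?(%r{^http://sitename\.com/$})

puts vulnerable                              # true
puts trusted_site?(attack)                   # false
puts trusted_site?("http://sitename.com/")   # true
```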

Why do Ruby's regular expressions use \A and \z instead of ^ and $?

This isn't specific to Ruby; \A and \Z are not the same thing as ^ and $. ^ and $ are the start and end of line anchors, whereas \A and \Z are the start and end of string anchors.

Ruby differs from other languages in that it automatically uses "multiline mode" (which enables the aforementioned behaviour of having ^ and $ match per line) for regular expressions, but in most other flavours you need to enable it yourself, which is probably why that article contains the warning.
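To illustrate, a short sketch showing that ^ and $ are line-based in Ruby without any flag. Note that Ruby's own /m flag is a different thing: it only makes "." match newlines as well.

```ruby
text = "first\nsecond"

# ^ and $ match at every line boundary, with no flag needed.
lines = text.scan(/^\w+$/)

# In Ruby, /m only changes "." so that it also matches "\n".
dot_default = text.match?(/first.second/)
dot_m       = text.match?(/first.second/m)

p lines        # ["first", "second"]
p dot_default  # false
p dot_m        # true
```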

Reference: http://www.regular-expressions.info/anchors.html

What is the difference between these three alternative ways to write Ruby regular expressions?

The first snippet is the only correct one.

The second example is... misleading. That string literal "/\A\/\z/" is, obviously, not a regex. It's a string. Strings have a #match method, which converts its argument to a regexp (if it is not one already) and matches against it. So, in this example, it's '/' that is the regular expression, and it matches a forward slash found in the other string.

The third line is completely broken: you don't need the surrounding slashes there; they are part of the regex literal syntax, which you didn't use. Also, use single-quoted strings, not double-quoted ones (which try to interpret escape sequences like \A):

Regexp.new('\A/\z').match("/") # => #<MatchData "/">

And, of course, none of the above is needed if you just want to check if a string consists of only one forward slash. Just use the equality check in this case.

s == '/'
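The points above can be checked with a short sketch (variable names are illustrative). It relies on the fact that a double-quoted Ruby string drops the backslash from escapes it does not recognize, such as \A and \z:

```ruby
# String#match converts a String argument into a Regexp, so here "/" is the
# pattern and the receiver is merely the text being searched.
found = '/\A\/\z/'.match("/")
p found[0]  # "/"

# Double quotes: \A and \z lose their backslashes, leaving the pattern A/z.
double = Regexp.new("\A/\z")
p double.source       # "A/z"
p double.match?("/")  # false

# Single quotes: the backslashes survive, so the anchors work as intended.
single = Regexp.new('\A/\z')
p single.source       # "\\A/\\z"
p single.match?("/")  # true
```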

Ruby Regexp: difference between new and union with a single regexp

Passing a string to Regexp.union is designed to match that string literally. There is no need to escape it; Regexp.escape is already called internally.

Regexp.union(".")
#=> /\./

If you want to pass regular expressions to Regexp.union, don't use strings:

Regexp.union(Regexp.new("\\."))
#=> /\./
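A small sketch contrasting the two kinds of argument (variable names are illustrative). It checks behaviour with match? rather than relying on how the combined regexp is printed:

```ruby
literal = Regexp.union(".")           # String argument: escaped, matches a literal dot
pattern = Regexp.union(/./)           # Regexp argument: used as a pattern
mixed   = Regexp.union("a.b", /\d+/)  # strings and regexps can be combined

p literal.match?("x")   # false
p literal.match?(".")   # true
p pattern.match?("x")   # true
p mixed.match?("a.b")   # true
p mixed.match?("axb")   # false
p mixed.match?("42")    # true
```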

Ruby regex to allow A-Za-z0-9

You could require at least one a-zA-Z by using character classes, and then optionally match either a hyphen or an underscore [-_] followed by 1+ characters from the character class [A-Za-z0-9], repeated as often as needed.

Use a capturing group with a backreference to get consistent use of - or _:

\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z

About the pattern

  • \A Start of string
  • [A-Za-z0-9]*[A-Za-z][A-Za-z0-9]* Match at least 1 a-zA-Z
  • (?: Non-capturing group
    • ([-_]) Capturing group 1, match either - or _
    • [A-Za-z0-9]+ Match 1+ times what is listed
    • (?: Non-capturing group
      • \1[A-Za-z0-9]+ Backreference \1 to what is captured in group 1 to get consistent delimiters (to prevent matching a-b_c) and match 1+ times what is listed
    • )* Close non-capturing group and repeat it 0+ times
  • )? Close non-capturing group and make it optional
  • \z End of string
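The breakdown above can be exercised with a few sample inputs. This is only a sketch; the constant name VALID_NAME and the sample strings are assumptions chosen for illustration:

```ruby
VALID_NAME = /\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z/

samples = ["abc", "a-b-c", "a_b_c", "1a2-3", "a-b_c", "123", "-abc", "abc-"]

# Map each sample to whether the pattern accepts it.
results = samples.to_h { |s| [s, VALID_NAME.match?(s)] }

results.each { |s, ok| puts "#{s.inspect}: #{ok}" }
# "abc", "a-b-c", "a_b_c" and "1a2-3" are accepted;
# "a-b_c" (mixed delimiters), "123" (no letter in the first part),
# "-abc" and "abc-" (leading or trailing delimiter) are rejected.
```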


Regular Expression of only spaces, letters, and numbers no special characters

You almost got it right. You could use this:

/\A[a-z0-9\s]+\Z/i

\s matches any whitespace character, including tabs and newlines. You could use a literal space within the square brackets if you need to match only spaces.

/i at the end means the match is case-insensitive.
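A quick sketch comparing \s with a literal space. The second pattern is an assumed stricter variant (literal space, \z instead of \Z), not part of the original answer:

```ruby
with_s     = /\A[a-z0-9\s]+\Z/i  # \s also accepts tabs and newlines
with_space = /\A[a-z0-9 ]+\z/i   # assumed variant: literal space only, strict end anchor

p with_s.match?("Hello World 42")      # true
p with_space.match?("Hello World 42")  # true

p with_s.match?("two\nlines")          # true, because \s matches "\n"
p with_space.match?("two\nlines")      # false

p with_s.match?("no-dash")             # false
p with_space.match?("no-dash")         # false
```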

Take a look at Rubular for testing your regexes.

EDIT: As pointed out by Jesus Castello, for some scenarios one should use \A and \Z instead of ^ and $ to denote string boundaries. See Difference between \A \Z and ^ $ in Ruby regular expressions for the explanation.

Are Ruby 1.9 regular expressions equally powerful to a context free grammar?

This is one of the awesome things about the Oniguruma regexp engine used in Ruby 1.9 – it has the power of a parser, and is not restricted to recognizing regular languages. It has positive and negative lookahead/lookbehind, which even can be used to recognize some languages which are not context-free! Take the following as an example:

regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

This regexp recognizes strings like “abc”, “aabbcc”, “aaabbbccc”, and so on – the number of “a”, “b”, and “c” must be equal, or it will not match.
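A small check of that claim, reusing the regexp exactly as given above; the sample strings are assumptions chosen for illustration:

```ruby
regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

balanced   = %w[abc aabbcc aaabbbccc]
unbalanced = %w[aabbc abcc aabcc ab]

# Every balanced string should match, and no unbalanced one should.
all_balanced_match  = balanced.all?    { |s| regexp.match?(s) }
no_unbalanced_match = unbalanced.none? { |s| regexp.match?(s) }

p all_balanced_match   # true
p no_unbalanced_match  # true
```

The lookahead checks that the counts of “a” and “b” agree (followed by a “c”), and the BC group then checks that the counts of “b” and “c” agree.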

(One limitation: you can’t use named groups in the lookahead and lookbehind.)

Although I haven’t peeked under the hood, Oniguruma seems to deal with named groups by simple recursive descent, backing up when something doesn’t match. I’ve observed that it can’t deal with left recursion. For example:

irb(main):013:0> regexp = /(?<A>\g<A>a|)/
SyntaxError: (irb):13: never ending recursion: /(?<A>\g<A>a|)/
from C:/Ruby192/bin/irb:12:in `<main>'

I don’t remember my parsing theory very clearly, but I think that a non-deterministic top-down parser like this should be able to parse any context-free language. (“language”, not “grammar”; if your grammar has left recursion, you will have to convert it to right recursion.) If that is incorrect, please edit this post.

Testing for word characters in Ruby/Rails regular expressions for all languages

Yes, you're definitely on the right track with [[:alpha:]]. Here's a Unicode-aware example from https://stackoverflow.com/a/3879835/499581:

/\A[[:alpha:]]+\Z/

Also, for certain punctuation, consider using:

/[[:punct:]]/
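A brief sketch of both bracket expressions on UTF-8 strings. It uses \z rather than \Z so that a trailing newline is rejected, and the sample strings are assumptions for illustration:

```ruby
letters = /\A[[:alpha:]]+\z/

p letters.match?("caf\u00e9")           # true: é counts as a letter
p letters.match?("\u65e5\u672c\u8a9e")  # true: CJK characters count as letters
p letters.match?("abc123")              # false: digits are not letters

# [[:punct:]] picks out punctuation characters.
p "Hello, world!".scan(/[[:punct:]]/)   # [",", "!"]
```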



