Regex to Match Hashtags in a Sentence Using Ruby

Can you try this regex:

/(?:^|\s)(?:(?:#\d+?)|(#\w+?))\s/i

UPDATE 1:
The regex above does not handle a few cases, such as #blah23blah and #23blah23,
so I modified it to take care of all cases.

Regex:

/(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i

Breakdown:

  • (?:\s|^) --Matches the preceding whitespace or the start of a line.
    Does not capture the match.
  • # --Matches the hash sign but does not capture it.
  • (?!\d+(?:\s|$)) --Negative lookahead that rejects tags made up
    entirely of digits between # and the next whitespace (or end of line).
  • (\w+) --Matches and captures the run of word characters that forms
    the tag.
  • (?=\s|$) --Positive lookahead that requires whitespace or end of line
    after the tag. Because a lookahead does not consume that whitespace,
    it is still available as the leading separator of the next tag, so
    adjacent valid hashtags all match.

Sample text modified to capture most cases:

#blah Pack my #box with #5 dozen #good2 #3good liquor.#jugs
link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3

Matches:

Match 1: blah

Match 2: box

Match 3: good2

Match 4: 3good

Match 5: mkvef214asdwq

Match 6: 3e4

Match 7: 2good
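
As a quick check, the regex and sample text above can be run through String#scan. This is only a minimal sketch; the constant names are illustrative and not part of the original answer.

```ruby
# Regex from UPDATE 1; the constant names here are illustrative only.
HASHTAG_RE = /(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i

SAMPLE_TEXT = "#blah Pack my #box with #5 dozen #good2 #3good liquor.#jugs\n" \
              "link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3"

# With a capture group, scan returns one array per match,
# so flatten turns [["blah"], ["box"], ...] into a flat list.
TAGS = SAMPLE_TEXT.scan(HASHTAG_RE).flatten

puts TAGS.inspect
# => ["blah", "box", "good2", "3good", "mkvef214asdwq", "3e4", "2good"]
```

Tags glued to other text (liquor.#jugs, #first#second) and purely numeric tags (#5, #3) are skipped, which agrees with the match list above.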

UPDATE 2:

To exclude tags that start or end with an underscore, add those patterns to the negative lookahead like this:

/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)/i
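
A small sketch of how this version behaves, assuming a made-up sample string (the constant names and the sample are not from the original answer):

```ruby
# Regex from UPDATE 2; sample string and names are made up for illustration.
NO_EDGE_UNDERSCORE_RE = /(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)/i

UNDERSCORE_SAMPLE = "#good #_lead #trail_ #mid_dle #42"

# Tags with a leading or trailing underscore, and all-digit tags, are rejected;
# an underscore in the middle of a tag is still allowed.
UNDERSCORE_TAGS = UNDERSCORE_SAMPLE.scan(NO_EDGE_UNDERSCORE_RE).flatten

puts UNDERSCORE_TAGS.inspect
# => ["good", "mid_dle"]
```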

How to write a hashtag matching regex

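Does [^#\w](#[\w]*)|^(#[\w]*) work?

The first alternative matches a # that follows a character which is neither a word character nor another #, and captures everything up to the first non-word character.

The second alternative handles the case where the # is the first character.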

Live demo: http://regexr.com/3al01
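
Since this page is about Ruby, here is a rough sketch of the same pattern used with String#scan. The sample string and constant names are assumptions made for illustration. Because the pattern has two capture groups, one per alternative, each match fills only one of them and leaves the other nil.

```ruby
# Pattern from the answer above; names and sample text are illustrative.
TWO_GROUP_RE = /[^#\w](#[\w]*)|^(#[\w]*)/

TWO_GROUP_SAMPLE = "#first word #second a#b"

# Each match yields [group1, group2]; exactly one of the two is non-nil.
RAW_MATCHES = TWO_GROUP_SAMPLE.scan(TWO_GROUP_RE)
puts RAW_MATCHES.inspect
# => [[nil, "#first"], ["#second", nil]]

# Dropping the nils leaves just the hashtags.
HASHTAGS = RAW_MATCHES.flatten.compact
puts HASHTAGS.inspect
# => ["#first", "#second"]
```

Note that [\w]* also accepts zero characters, so a bare # preceded by a space would be captured too; using + instead of * would avoid that.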

Regular expression to match a pattern either at the beginning of the line or after a space character

/(?:^|\s)#(\w+)/i

Adding the ?: prefix to the first group makes it a non-capturing group, so only the second group captures. Each match of the string therefore has a single capture group, whose contents are the hashtag without the leading #.
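
A minimal sketch of this in use, with a made-up sample string and illustrative constant names:

```ruby
# Pattern from the answer above; names and sample text are illustrative.
LEADING_SPACE_RE = /(?:^|\s)#(\w+)/i

LEADING_SAMPLE = "#ruby is fun #regex, mail a#b"

# Only the (\w+) group captures, so each match contributes one string.
LEADING_TAGS = LEADING_SAMPLE.scan(LEADING_SPACE_RE).flatten

puts LEADING_TAGS.inspect
# => ["ruby", "regex"]
```

The # inside a#b is ignored because it is not at the start of a line or preceded by whitespace.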

Regular expression to match hashtags in both English and Chinese

string = "我来自#中国#。 I'm from #China."        
string.scan(/#\w+|#\p{Han}+#/)
=> ["#中国#", "#China"]

Regex Expression for Hashtags

Put the \w from the first capturing group into a character class together with -, so that the group matches either a word character or a hyphen. The + after the character class repeats it one or more times.

(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))([-\w]+)(?=\s|$)
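
A short sketch of the hyphen-aware pattern in Ruby, assuming a made-up sample string (names are illustrative):

```ruby
# Pattern from the answer above, wrapped in a Ruby regex literal.
HYPHEN_TAG_RE = /(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))([-\w]+)(?=\s|$)/

HYPHEN_SAMPLE = "#e-commerce rocks #well-being #123"

# [-\w]+ lets a tag contain hyphens; all-digit tags are still rejected.
HYPHEN_TAGS = HYPHEN_SAMPLE.scan(HYPHEN_TAG_RE).flatten

puts HYPHEN_TAGS.inspect
# => ["e-commerce", "well-being"]
```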

Regex for finding Instagram-style hashtag in string - Ruby

To refer to a capture group in the replacement string you can use \\1, but the current pattern has no group.

You can add a capture group in the pattern:

description.gsub(/#(\w+)/, '#[\\1]')

Or use the full match, \\0, in the replacement. The full match includes the #, so here the brackets end up around it:

description.gsub(/#\w+/, '[\\0]')
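
To make the difference between the two replacements concrete, here is a small sketch with a made-up description string (the constant names are illustrative):

```ruby
# Sample text is an assumption; the two gsub calls are the ones shown above.
DESCRIPTION = "Love #ruby and #rails"

# \\1 is the first capture group, i.e. the tag without the leading #.
WITH_GROUP = DESCRIPTION.gsub(/#(\w+)/, '#[\\1]')
puts WITH_GROUP
# => Love #[ruby] and #[rails]

# \\0 is the whole match, which includes the leading #.
WITH_FULL_MATCH = DESCRIPTION.gsub(/#\w+/, '[\\0]')
puts WITH_FULL_MATCH
# => Love [#ruby] and [#rails]
```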

Regex to match all alphanumeric hashtags, no symbols

A regex such as #([A-Za-z0-9]+) should match what you need and place the tag in a capture group, which you can then access later.

The regex starts matching when it finds a # and collects any letters or digits that follow into the capture group. It stops at the first character that is not a letter or a digit, leaving you with a group that contains what you are after.
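
A minimal sketch of that behaviour, using a made-up sample string and illustrative names:

```ruby
# Pattern from the answer above; names and sample text are illustrative.
ALNUM_TAG_RE = /#([A-Za-z0-9]+)/

ALNUM_SAMPLE = "Try #Ruby3, #under_score and #x-y"

# Matching stops at the first character that is not a letter or digit,
# so "_" and "-" cut the tag short.
ALNUM_TAGS = ALNUM_SAMPLE.scan(ALNUM_TAG_RE).flatten

puts ALNUM_TAGS.inspect
# => ["Ruby3", "under", "x"]
```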

Is there a faster way to parse hashtags than using Regular Expressions?

You don't show any code (which you should have) so we're guessing how you are using your regex.

#\S+ is as good a pattern as you'll need, and scan is probably the best way to retrieve all occurrences in the string.

'This is a #hashtag, and this is #another one!'.scan(/#\S+/)
=> ["#hashtag,", "#another"]

It should be /\B#\w+/ if you don't want to pick up commas.

Yes, I agree. /\B#\w+/ makes more sense.
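
A quick side-by-side sketch of the two patterns from this exchange. The second sample string is made up for illustration, as are the constant names.

```ruby
COMMA_SAMPLE = 'This is a #hashtag, and this is #another one!'

# #\S+ grabs every non-whitespace character, including the comma.
LOOSE = COMMA_SAMPLE.scan(/#\S+/)
puts LOOSE.inspect
# => ["#hashtag,", "#another"]

# \B#\w+ stops at the first non-word character.
STRICT = COMMA_SAMPLE.scan(/\B#\w+/)
puts STRICT.inspect
# => ["#hashtag", "#another"]

# \B also rejects a # that directly follows a word character.
MID_WORD = "see issue#42 and #ok".scan(/\B#\w+/)
puts MID_WORD.inspect
# => ["#ok"]
```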

Extract Hashtags from String

str = "Visualize how you can e-enable #vertical #architectures?..."
str.scan(/#\w+/) #=> ["#vertical", "#architectures", "#convergence",...]

Ruby - trying to make hashtags within string into links

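Here is some code to get you started. It replaces each hashtag in the string with a link:

<%= string.gsub(/#\w+/) { |tag| link_to(tag, url_you_want_to_replace_hashtag_with) }.html_safe %>

The block receives the whole match, including the leading #, so tag can be used as the link text directly. html_safe is needed for the links to render as markup rather than as escaped text; it also marks the rest of the string as safe, so only use it on text that is trusted or already escaped.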

The regex doesn't account for more complex cases such as ##tag0 or #tag1#tag2: should tag0 and tag2 be considered hashtags? Also, you may want to change \w to something like [a-zA-Z0-9] if you want to limit the tags to letters and digits only.
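
To see how those edge cases play out, here is a small sketch that runs a few candidate patterns over them. The sample strings and constant names are assumptions; which behaviour is right depends on what you want to count as a hashtag.

```ruby
EDGE_SAMPLE = "##tag0 #tag1#tag2"

# Plain #\w+ accepts every # followed by word characters.
PERMISSIVE = EDGE_SAMPLE.scan(/#\w+/)
puts PERMISSIVE.inspect
# => ["#tag0", "#tag1", "#tag2"]

# Adding \B drops #tag2, because its # directly follows a word character.
WITH_BOUNDARY = EDGE_SAMPLE.scan(/\B#\w+/)
puts WITH_BOUNDARY.inspect
# => ["#tag0", "#tag1"]

# \w includes the underscore; [a-zA-Z0-9] stops at it.
LETTERS_DIGITS = "#snake_case".scan(/#[a-zA-Z0-9]+/)
puts LETTERS_DIGITS.inspect
# => ["#snake"]
```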


