Regex to match hashtags in a sentence using ruby
Can you try this regex:
/(?:^|\s)(?:(?:#\d+?)|(#\w+?))\s/i
UPDATE 1:
There are a few cases the regex above does not handle, such as #blah23blah and #23blah23.
I have therefore modified the regex to take care of these cases.
Regex:
/(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i
Breakdown:
(?:\s|^)
--Matches the preceding space or start of line. Does not capture the match.
#
--Matches the hash but does not capture it.
(?!\d+(?:\s|$))
--Negative lookahead that rejects tags made up ENTIRELY of digits between # and the next space (or end of line).
(\w+)
--Matches and captures all word characters.
(?=\s|$)
--Positive lookahead that requires a following space or end of line without consuming it. This is what allows adjacent valid hashtags to match.
Sample text modified to capture most cases:
#blah Pack my #box with #5 dozen #good2 #3good liquor.#jugs
link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3
Matches:
Match 1: blah
Match 2: box
Match 3: good2
Match 4: 3good
Match 5: mkvef214asdwq
Match 6: 3e4
Match 7: 2good
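As a quick check, the UPDATE 1 regex can be run against the sample text with String#scan. This is only a minimal sketch: the variable names are illustrative, and joining the two sample lines with a newline is an assumption about how the text was laid out.

```ruby
# Sample text from above; the two lines are joined with a newline (an assumption).
text = "#blah Pack my #box with #5 dozen #good2 #3good liquor.#jugs\n" \
       "link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3"

hashtag_re = /(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i

# With a capture group, scan returns one array per match,
# so flatten turns [["blah"], ["box"], ...] into a flat list.
tags = text.scan(hashtag_re).flatten
puts tags.inspect
# ["blah", "box", "good2", "3good", "mkvef214asdwq", "3e4", "2good"]
```

The all-digit tags (#5, #3) and the tags glued to other text (liquor.#jugs, #first#second) are skipped, which agrees with the match list above.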
UPDATE 2:
To exclude words starting or ending with underscore, just include your exclusions in the negative lookahead like this:
/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)/i
The sample, regex and matches are recorded in this Rubular link
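A small sketch of how the UPDATE 2 regex behaves around underscores; the sample string is made up for illustration and is not from the original answer.

```ruby
hashtag_re = /(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)/i

# Hypothetical sample: leading underscore, trailing underscore,
# an underscore in the middle, and an all-digit tag.
text = '#ok #_lead #trail_ #mid_dle #42'

tags = text.scan(hashtag_re).flatten
puts tags.inspect
# ["ok", "mid_dle"]
```

Only tags that start or end with an underscore are rejected; an underscore in the middle of a tag is still allowed.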
How to write a hashtag matching regex
Does [^#\w](#[\w]*)|^(#[\w]*) work?
The first alternative matches a # that does not follow a word character or another #, and captures everything up to the first non-word character.
The second alternative handles the case where the # is the first character.
Live demo: http://regexr.com/3al01
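In Ruby, a pattern with two alternative capture groups leaves nil in whichever group did not take part in a match. The sketch below, with a made-up sample string, shows one way to collect the tags anyway.

```ruby
re = /[^#\w](#[\w]*)|^(#[\w]*)/

# Hypothetical sample: a tag at the start, a # inside a word, and a normal tag.
text = '#start mid#dle #end'

# Each match yields [group1, group2] with one of them nil,
# so flatten.compact keeps only the captured hashtags.
tags = text.scan(re).flatten.compact
puts tags.inspect
# ["#start", "#end"]
```

Because the pattern uses * rather than +, a lone # surrounded by spaces would also be captured; switching to + avoids that if it matters.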
Regular expression to match a pattern either at the beginning of the line or after a space character
/(?:^|\s)#(\w+)/i
Adding the ?: prefix to the first group makes it a non-capturing group, so only the second group actually captures. Each match will therefore have a single capture group, whose contents are the hashtag without the leading #.
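A minimal sketch of the pattern in use; the sample string is an assumption made for illustration.

```ruby
# Hypothetical sample: one tag at the start, one after a space, one glued to a word.
text = '#ruby is fun, see #regex and foo#bar'

# Only the (\w+) group captures, so each match contributes just the tag name.
tags = text.scan(/(?:^|\s)#(\w+)/i).flatten
puts tags.inspect
# ["ruby", "regex"]
```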
Regular expression to match hashtags in both English and Chinese
string = "我来自#中国#。 I'm from #China."
string.scan(/#\w+|#\p{Han}+#/)
=> ["#中国#", "#China"]
Regex Expression for Hashtags
Take the pattern that was inside the first capturing group, \w, and put it in a character class together with -, so that the group matches either a word character or a hyphen. The + after the character class repeats it one or more times.
(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))([-\w]+)(?=\s|$)
(The change is in the capturing group: ([-\w]+).)
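To see the effect of the hyphen in the character class, here is a short sketch with a made-up sample string.

```ruby
hashtag_re = /(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))([-\w]+)(?=\s|$)/

# Hypothetical sample containing a hyphenated tag.
text = 'Use #ruby-on-rails and #regex today'

tags = text.scan(hashtag_re).flatten
puts tags.inspect
# ["ruby-on-rails", "regex"]
```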
Regex for finding Instagram-style hashtag in string - Ruby
If you want to refer to a capture group, you can use \\1, but the current pattern has no group.
You can add a capture group in the pattern:
description.gsub(/#(\w+)/, '#[\\1]')
Or use the full match in the replacement:
description.gsub(/#\w+/, '[\\0]')
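The two replacements do not produce the same text, which is easy to miss. The sketch below uses an assumed sample value for description to show the difference.

```ruby
# Hypothetical sample value; description is the variable name used above.
description = 'Loving the #sunset at the #beach'

# \\1 inserts only the captured name, so the # stays outside the brackets.
bracket_name = description.gsub(/#(\w+)/, '#[\\1]')

# \\0 inserts the whole match, so the # ends up inside the brackets.
bracket_all = description.gsub(/#\w+/, '[\\0]')

puts bracket_name # Loving the #[sunset] at the #[beach]
puts bracket_all  # Loving the [#sunset] at the [#beach]
```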
Regex to match all alphanumeric hashtags, no symbols
A regex such as #([A-Za-z0-9]+) should match what you need and place it in a capture group, which you can then access later. Maybe this will help shed some light on regular expressions (from a Ruby context).
The regex above starts matching when it finds a # character and puts any following letters or digits into the capture group. As soon as it meets anything that is not a letter or a digit, the match stops, leaving you with a group that contains what you are after.
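A brief sketch of that behaviour, using a made-up sample string. Note that the underscore and the exclamation mark both end a match.

```ruby
# Hypothetical sample: an underscore, an all-digit tag, and trailing punctuation.
text = 'Tags: #abc_def #123 #hi!'

# The group captures letters and digits only, so matching stops at "_" and "!".
tags = text.scan(/#([A-Za-z0-9]+)/).flatten
puts tags.inspect
# ["abc", "123", "hi"]
```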
Is there a faster way to parse hashtags than using Regular Expressions?
You don't show any code (which you should have) so we're guessing how you are using your regex.
#\S+ is as good a pattern as you'll need, but scan is probably the best way to retrieve all occurrences in the string.
'This is a #hashtag, and this is #another one!'.scan(/#\S+/)
=> ["#hashtag,", "#another"]
It should be /\B#\w+/ if you don't want to capture commas.
Yes, I agree. /\B#\w+/ makes more sense.
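The difference between the two patterns can be sketched as follows. The sample string extends the one above with an assumed extra fragment (foo#bar) to show what \B changes.

```ruby
# The sentence from above plus a hypothetical "foo#bar" fragment.
text = 'This is a #hashtag, and this is #another one! Not a tag: foo#bar'

# #\S+ grabs everything up to the next whitespace, including punctuation,
# and also matches a # in the middle of a word.
loose = text.scan(/#\S+/)

# \B requires that the # is not directly preceded by a word character,
# and \w+ stops at punctuation.
strict = text.scan(/\B#\w+/)

puts loose.inspect  # ["#hashtag,", "#another", "#bar"]
puts strict.inspect # ["#hashtag", "#another"]
```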
Extract Hashtags from String
str = "Visualize how you can e-enable #vertical #architectures?..."
# scan already returns a flat array when the pattern has no capture groups, so flatten is not needed.
str.scan(/#\w+/) #=> ["#vertical", "#architectures", "#convergence",...]
Ruby - trying to make hashtags within string into links
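Here is some code to get you started. It should replace the hashtags in the string, in place, with links:
<%# A single ERB tag keeps the block's return value (the link) as the replacement. %>
<%# tag already includes the leading "#", so it is used as the link text directly. %>
<% string.gsub!(/#\w+/) { |tag| link_to(tag, url_you_want_to_replace_hashtag_with) } %>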
You may need to call html_safe on the string to display it afterwards.
The regex doesn't account for more complex cases, such as ##tag0 or #tag1#tag2. Should tag0 and tag2 be considered hashtags? Also, you may want to change \w to something like [a-zA-Z0-9] if you want to limit the tags to letters and digits only.
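Outside a Rails view, the same idea can be sketched in plain Ruby. The linkify_hashtags helper and the /tags/ URL scheme below are hypothetical, and the sketch does no HTML escaping, so it is only suitable for trusted input.

```ruby
# Hypothetical helper: wraps each hashtag in an anchor tag.
# The /tags/<name> URL scheme is an assumption made for illustration.
def linkify_hashtags(text)
  text.gsub(/#\w+/) do |tag|
    name = tag[1..-1] # drop the leading "#"
    %(<a href="/tags/#{name}">#{tag}</a>)
  end
end

puts linkify_hashtags('I like #ruby')
# I like <a href="/tags/ruby">#ruby</a>
```

With this pattern both halves of #tag1#tag2 become links, and the second # of ##tag0 starts a tag. Whether that is the desired behaviour is the open question raised above.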