Finding @Mentions in String

Finding @mentions in string

Replace the question mark (?) quantifier ("optional") with a + ("one or more") after your character class:

@([^@ ]+)
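
As a rough illustration of the difference between the two quantifiers, here is a minimal Python sketch. The sample sentence is made up, and the assumption that the original pattern was @([^@ ]?) is mine rather than something stated in the question.

```python
import re

text = "hello @alice and @bob"  # hypothetical sample input

# With "?" the class matches at most one character after the @.
optional = re.findall(r"@([^@ ]?)", text)

# With "+" the class matches one or more characters, i.e. the whole name.
one_or_more = re.findall(r"@([^@ ]+)", text)

print(optional)     # ['a', 'b']
print(one_or_more)  # ['alice', 'bob']
```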

How to list all mentions in a String

Use this Regex. It finds groups of word characters (letters, digits, underscores) that follow an @.

val atMentions: List<String> = "(?<=@)\\w+".toRegex().findAll(editText.text).map { it.value }.toList()

If you need to define a different set of word characters, replace the \\w above with [\\w] and put the other acceptable characters right after the w. For example, [\\w.-] also accepts periods and hyphens.
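
The same lookbehind idea can be sketched in Python, which may make the effect of widening the character class easier to see. The sample text and the choice of extra characters (period and hyphen) are assumptions for illustration only.

```python
import re

text = "ping @ann_1 and @bob.smith"  # hypothetical sample input

# The lookbehind keeps the @ out of the match; \w covers letters, digits, underscore.
plain = re.findall(r"(?<=@)\w+", text)

# Widened class: word characters plus "." and "-" (an assumed extension).
widened = re.findall(r"(?<=@)[\w.-]+", text)

print(plain)    # ['ann_1', 'bob']
print(widened)  # ['ann_1', 'bob.smith']
```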

how to pull @ mentions out of strings like twitter in javascript

I have found this to be the best way to find mentions inside a string in JavaScript.

var str = "@jpotts18 what is up man? Are you hanging out with @kyle_clegg";
var pattern = /\B@[a-z0-9_-]+/gi;
str.match(pattern);
// => ["@jpotts18", "@kyle_clegg"]

I have purposefully restricted it to upper- and lowercase alphanumeric characters plus the - and _ symbols, so that a period is never treated as part of a username (for example, @j.potts matches only @j).
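
For comparison, the same pattern can be tried in Python. This is only a sketch of the behaviour described above; the second input string is a made-up example.

```python
import re

text = "@jpotts18 what is up man? Are you hanging out with @kyle_clegg"
pattern = re.compile(r"\B@[a-z0-9_-]+", re.IGNORECASE)

print(pattern.findall(text))  # ['@jpotts18', '@kyle_clegg']

# A period ends the match, so only the part before it is returned.
print(pattern.findall("ask @j.potts"))  # ['@j']
```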

This is what twitter-text.js is doing behind the scenes.

// Mention related regex collection
twttr.txt.regexen.validMentionPrecedingChars = /(?:^|[^a-zA-Z0-9_!#$%&*@＠]|RT:?)/;
twttr.txt.regexen.atSigns = /[@＠]/;
twttr.txt.regexen.validMentionOrList = regexSupplant(
'(#{validMentionPrecedingChars})' + // $1: Preceding character
'(#{atSigns})' + // $2: At mark
'([a-zA-Z0-9_]{1,20})' + // $3: Screen name
'(\/[a-zA-Z][a-zA-Z0-9_\-]{0,24})?' // $4: List (optional)
, 'g');
twttr.txt.regexen.endMentionMatch = regexSupplant(/^(?:#{atSigns}|[#{latinAccentChars}]|:\/\/)/);
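
To see how those pieces fit together, here is a simplified Python sketch that inlines the sub-patterns quoted above into one expression. It is not the library's API: the endMentionMatch check is left out, the second at-sign in each class is assumed to be the fullwidth at sign (U+FF20), and the sample text is made up.

```python
import re

# Simplified port of validMentionOrList; the endMentionMatch check is omitted.
MENTION_OR_LIST = re.compile(
    r"(^|[^a-zA-Z0-9_!#$%&*@＠]|RT:?)"   # 1: preceding character
    r"([@＠])"                            # 2: at mark (ASCII or fullwidth, assumed)
    r"([a-zA-Z0-9_]{1,20})"              # 3: screen name
    r"(/[a-zA-Z][a-zA-Z0-9_\-]{0,24})?"  # 4: list (optional)
)

text = "hi @jack and @twitter/team, mail a@b.com"  # hypothetical sample input
found = [(m.group(3), m.group(4)) for m in MENTION_OR_LIST.finditer(text)]
print(found)  # [('jack', None), ('twitter', '/team')]
```

The address a@b.com is skipped because the character before its @ is a letter, which the preceding-character class rejects.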

Please let me know if you have used anything that is more efficient, or accurate. Thanks!

How @mention works, how can I find mention during comment in .Net

Looks like a good fit for regular expressions. There are multiple ways to solve this.

Here's the simplest one:

 (?<mention>@[a-zA-Z0-9_.]+)[^a-zA-Z0-9_.]
  • It matches the mention characters followed by one non-matching character; [^ ... ] does the negation.
  • (?<mention> ... ) declares an explicit named group that captures the mention without the non-matching character that immediately follows it.
  • Note that this pattern requires a non-matching character after the mention, so a mention at the very end of the input is not matched; work around that if it matters.

A cleaner pattern would use a feature called look-ahead:

@[a-zA-Z0-9_.]+?(?![a-zA-Z0-9_.])
  • (?!) is negative lookahead. Meaning "only match if it is NOT followed by this"
  • named capture not required as lookahead does not consume the lookahead part.
  • The non-greedy quantifier +? makes each match as short as possible. Because the character class already excludes spaces and @, multiple mentions in one string are found separately, and a greedy + would return the same matches here.

Lookaheads are a tad less well known and may become a pain to read if the pattern grows too long, but they are a useful tool to know.
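
The difference between the two patterns can be checked with a short Python sketch. Python spells named groups (?P<name>...) rather than (?<name>...), so the first pattern is adapted accordingly; the greedy variant is added only for comparison and is not part of the answer above.

```python
import re

comment = "hi @fri.tara3^ @hjh not a mention @someone"

# Consuming version: needs a non-matching character after the mention,
# so the mention at the very end of the string is missed.
consuming = re.findall(r"(?P<mention>@[a-zA-Z0-9_.]+)[^a-zA-Z0-9_.]", comment)

# Lookahead version: the following character is checked but not consumed.
lookahead = re.findall(r"@[a-zA-Z0-9_.]+?(?![a-zA-Z0-9_.])", comment)

# Greedy variant of the lookahead pattern, for comparison.
greedy = re.findall(r"@[a-zA-Z0-9_.]+(?![a-zA-Z0-9_.])", comment)

print(consuming)            # ['@fri.tara3', '@hjh']
print(lookahead)            # ['@fri.tara3', '@hjh', '@someone']
print(greedy == lookahead)  # True
```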

Full example using C#:

using System;
using System.Text.RegularExpressions;

string comment = "hi @fri.tara3^ @hjh not a mention @someone";
const string pattern = "@[a-zA-Z0-9_.]+?(?![a-zA-Z0-9_.])";
var matches = Regex.Matches(comment, pattern);

// Print each mention found in the comment
for (int i = 0; i < matches.Count; i++)
{
    Console.WriteLine(matches[i].Value);
}

Find all valid user mentions in text with regex

You may consider a good-enough pattern like

r'\B@(?!(?:[a-z0-9.]*_){2})(?!(?:[a-z0-9_]*\.){2})[._a-z0-9]{3,24}\b'

The only drawback of this pattern is that if a valid mention can end with a ., the match stops just before that final . because of the trailing \b.

Details

  • \B@ - a @ not preceded with a word char
  • (?!(?:[a-z0-9.]*_){2}) - no two _ chars anywhere after @
  • (?!(?:[a-z0-9_]*\.){2}) - no two . chars anywhere after @
  • [._a-z0-9]{3,24} - three to twenty-four letters, digits, . and _
  • \b - word boundary
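
As a quick check of the breakdown above, the pattern can be run from Python on a sample string with both valid and invalid usernames. This is just a sketch; the variable names are arbitrary.

```python
import re

s = ('text @valid_username text @unvalid_username_ text @valid.username '
     'text @unvalid..username @validusername.')

pattern = re.compile(
    r'\B@(?!(?:[a-z0-9.]*_){2})(?!(?:[a-z0-9_]*\.){2})[._a-z0-9]{3,24}\b'
)

result = pattern.findall(s)
print(result)
# ['@valid_username', '@valid.username', '@validusername']
```

The last mention loses its trailing . because \b cannot match after a period at the end of the string.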

Note that you can also use some Python code to filter the results obtained with a simpler pattern such as \B@[._a-z0-9]{3,24}:

import re
s = 'text @valid_username text @unvalid_username_ text @valid.username text @unvalid..username @validusername.'
print([x for x in re.findall(r'\B@[._a-z0-9]{3,24}', s) if x.count('.') < 2 and x.count('_') < 2 ])
# => ['@valid_username', '@valid.username', '@validusername.']

PHP regex on mention (@name)

You can use this regex, (\@(?P<names>[a-zA-Z\-\_]+)), where the outer parentheses act as the pattern delimiters:

<?php
$matches = [];
$text = "I recently saw @john-doe riding a bike, did you notice that too @foo-bar?";
preg_match_all("(\@(?P<names>[a-zA-Z\-\_]+))", $text, $matches);
var_dump($matches['names']);

In this example, I used ?P<names> to name the capture group, which makes the matches easier to retrieve.

I've made a Regex101 demo and a PHP sandbox for testing:

https://regex101.com/r/ZFWvCG/1

http://sandbox.onlinephpfunctions.com/code/1d04ce64a2a290994bf0effd7cf8f0039f20277b
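
The named-group idea carries over to other regex flavours. A minimal Python sketch, assuming the same kind of sample sentence (the group name "name" is an arbitrary choice):

```python
import re

text = "I recently saw @john-doe riding a bike, did you notice that too @foo-bar?"

# (?P<name>...) names the capture group, so it can be read back by name.
names = [m.group("name") for m in re.finditer(r"@(?P<name>[a-zA-Z_-]+)", text)]
print(names)  # ['john-doe', 'foo-bar']
```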

Regex Valid Twitter Mention

Here's a regex that should work:

/^(?!.*\bRT\b)(?:.+\s)?@\w+/i

Explanation:

/^             //start of the string
(?!.*\bRT\b) //Verify that RT is not in the string.
(?:.+\s)? //Optionally match any chars followed by whitespace before the @.
//Note: (?: ) makes the group non-capturing.
@\w+ //Find @ followed by one or more word chars.
/i //Make it case insensitive.
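
One way to try this out is a small Python check. The helper name has_valid_mention and the sample tweets are hypothetical and only serve to exercise the regex above.

```python
import re

VALID = re.compile(r"^(?!.*\bRT\b)(?:.+\s)?@\w+", re.IGNORECASE)

def has_valid_mention(tweet):
    """Return True when the tweet has an @mention and no standalone 'RT'."""
    return VALID.search(tweet) is not None

print(has_valid_mention("hello @bob"))    # True
print(has_valid_mention("@bob hi"))       # True
print(has_valid_mention("RT @bob hi"))    # False
print(has_valid_mention("mail a@b.com"))  # False
```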

regex for mentions

You may use the following regex:

/\B@\w+/g

\B matches at a non-word boundary, thus, it requires a non-word (or start of string) to be right before @.

var re = /\B@\w+/g;
var str = 'The @dog went to the park.\nBut not here: The d@og went to the park.\nOr here: The@dog went to the park.';
var res = str.match(re);
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
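
To make the effect of \B concrete, the sketch below runs the pattern with and without it on the same sample text in Python. The comparison pattern without \B is added for illustration only.

```python
import re

text = ("The @dog went to the park.\n"
        "But not here: The d@og went to the park.\n"
        "Or here: The@dog went to the park.")

with_boundary = re.findall(r"\B@\w+", text)
without_boundary = re.findall(r"@\w+", text)

print(with_boundary)     # ['@dog']
print(without_boundary)  # ['@dog', '@og', '@dog']
```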

How to fix this regex for mentions and hashtags?

Try this pattern:

(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s+)

Here it is broken down:

  • (?: creates a non-capturing group
  • ^|\s+ matches the beginning of the String or Whitespace
  • (?: creates a non-capturing group
  • (?<mention>@)|(?<hash>#) matches @ or # and captures it in the group named mention or hash, respectively
  • (?<item>\w+) matches one or more word characters (letters, digits, underscores) and captures the tag text for easy usage.
  • (?=\s+) is a positive lookahead that requires whitespace after the item without consuming it

You would then need to use the underlying language to trim the leading whitespace from the returned match, or read the item group directly.
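
A minimal Python sketch of how the named groups might be read is shown here. Python spells named groups (?P<name>...) instead of (?<name>...), and the helper name extract_tags and the sample inputs are hypothetical.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax.
TAGS = re.compile(r"(?:^|\s+)(?:(?P<mention>@)|(?P<hash>#))(?P<item>\w+)(?=\s+)")

def extract_tags(text):
    """Return (kind, item) pairs; reading the groups avoids any trimming."""
    return [("mention" if m.group("mention") else "hash", m.group("item"))
            for m in TAGS.finditer(text)]

print(extract_tags("@alice likes #python today"))
# [('mention', 'alice'), ('hash', 'python')]

# The lookahead requires whitespace, so a tag at the very end is not matched.
print(extract_tags("hi @bob"))  # []
```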

Update
Since you mentioned that you are using C#, I thought I'd provide a .NET solution that does not require RegEx. I did not test the results, but I would guess that it is also faster than using RegEx.

Personally, my flavor of .NET is Visual Basic, so I'm providing you with a VB.NET solution, but you can just as easily run it through a converter since I never use anything that can't be used in C#:

Private Function FindTags(ByVal lead As Char, ByVal source As String) As String()
    Dim matches As List(Of String) = New List(Of String)
    Dim current_index As Integer = 0

    'Loop through all but the last character in the source
    For index As Integer = 0 To source.Length - 2
        'Reset the current index
        current_index = index

        'Check if the current character is a "@" or "#" and either we're starting at the beginning of the String
        'or the last character was whitespace and then if the next character is a letter, digit, or end of the String
        If source(index) = lead AndAlso (index = 0 OrElse Char.IsWhiteSpace(source, index - 1)) AndAlso (Char.IsLetterOrDigit(source, index + 1) OrElse index + 1 = source.Length - 1) Then
            'Loop until the next character is no longer a letter or digit
            Do
                current_index += 1
            Loop While current_index + 1 < source.Length AndAlso Char.IsLetterOrDigit(source, current_index + 1)

            'Check if we're at the end of the line or the next character is whitespace
            If current_index = source.Length - 1 OrElse Char.IsWhiteSpace(source, current_index + 1) Then
                'Add the match to the collection
                matches.Add(source.Substring(index, current_index + 1 - index))
            End If
        End If
    Next

    Return matches.ToArray()
End Function
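
For readers who do not use .NET, the same scanning idea can be sketched in Python. This is a loose port rather than a line-by-line translation: it uses str.isalnum and str.isspace as approximate stand-ins for Char.IsLetterOrDigit and Char.IsWhiteSpace, it skips a lead character with no name after it, and the sample inputs are made up.

```python
def find_tags(lead, source):
    """Collect lead-prefixed tokens that sit between whitespace boundaries."""
    matches = []
    n = len(source)
    for i, ch in enumerate(source):
        # A tag starts at the lead character, at the start or after whitespace.
        if ch != lead or (i > 0 and not source[i - 1].isspace()):
            continue
        # Advance past the run of letters and digits that forms the name.
        j = i + 1
        while j < n and source[j].isalnum():
            j += 1
        # Keep non-empty names that end at whitespace or at the end of the text.
        if j > i + 1 and (j == n or source[j].isspace()):
            matches.append(source[i:j])
    return matches

print(find_tags("@", "hi @bob and @carol, mail a@b.com @eve"))  # ['@bob', '@eve']
print(find_tags("#", "#one two #three"))                        # ['#one', '#three']
```

Like the function above, this rejects @carol, because the name is followed by a comma rather than whitespace.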


