Check If String Is Repetition of an Unknown Substring

Check if string is repetition of an unknown substring

I knew it couldn't be that complicated, so I thought it over and found a solution:

def unrepeat(str)
  n = str.size

  newstr = str
  n.times do |i|
    # Rotate right by one character: move the last character to the front.
    newstr = newstr[-1] + newstr[0..-2]
    if newstr == str
      return i + 1
    end
  end
end

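This returns the length of the repeated pattern, which it finds by generating successive rotations of the string: the first rotation that equals the original gives the pattern length. If the string is not a repetition of anything shorter, only the full rotation matches and the method returns the length of the whole string.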
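For comparison, here is a minimal Python sketch of the same rotation idea. The function name repeat_unit_length is made up for this illustration, and returning 0 for an empty string is an assumption, not something the original answer specifies.

```python
def repeat_unit_length(s: str) -> int:
    """Length of the shortest block whose repetition yields s.

    Returns len(s) when s is not a repetition of a shorter block.
    """
    rotated = s
    for shift in range(1, len(s) + 1):
        # Rotate right by one character, as in the Ruby version.
        rotated = rotated[-1] + rotated[:-1]
        if rotated == s:
            return shift
    return 0  # assumption: an empty string has no repeating unit


print(repeat_unit_length("abcabcabc"))  # 3
print(repeat_unit_length("abcd"))       # 4
```

A string is a true repetition exactly when the returned value is smaller than len(s).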

Check if string is repetition of an unknown substring in javascript

I currently have no explanation for why it returns e, but . matches any character and .{2,} basically just means "match any two or more characters".

What you want is to match whatever you captured in the capture group, by using backreferences:

/^(.+)\1+$/m

I just noticed that this is also what the answer you linked to suggests: /(.+)\1+/. Apart from the ^ and $ anchors, the expression is exactly the same; there is nothing you have to change for JavaScript.
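The backreference works the same way in other regex engines. As a hedged sketch in Python (the helper names is_repetition and smallest_unit are invented for this example), re.fullmatch plays the role of the ^ and $ anchors:

```python
import re


def is_repetition(s: str) -> bool:
    """True if s consists of some substring repeated two or more times."""
    return re.fullmatch(r"(.+)\1+", s) is not None


def smallest_unit(s: str):
    """Shortest repeating unit of s, or None if s is not a repetition."""
    # The lazy quantifier makes the engine try the shortest candidate first.
    m = re.fullmatch(r"(.+?)\1+", s)
    return m.group(1) if m else None


print(is_repetition("abcabc"))    # True
print(is_repetition("abcab"))     # False
print(smallest_unit("abababab"))  # ab
```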

Regex to match repeated (unknown) substrings

This regex will do it:

\b[a-z]*?([a-z]{2,}?)\1+[a-z]*?\b

Usage:

self.regex_pattern = re.compile(r'\b[a-z]*?([a-z]{2,}?)\1+[a-z]*?\b', re.IGNORECASE)

The gist is similar to what you were doing, but the "core" is different. The heart of the regex is this piece:

([a-z]{2,}?)\1+

The logic is to find a group consisting of 2 or more letters, then match the same group (\1) one or more additional times.
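A small self-contained check of that pattern, as a sketch: the helper words_with_repeats and the sample sentence are made up for illustration. Each result pairs the matched word with the repeated group it contains.

```python
import re

# Same pattern as above: a word containing a group of 2+ letters
# that is immediately repeated at least once.
pattern = re.compile(r'\b[a-z]*?([a-z]{2,}?)\1+[a-z]*?\b', re.IGNORECASE)


def words_with_repeats(text):
    """Return (word, repeated_group) for every matching word in text."""
    return [(m.group(0), m.group(1)) for m in pattern.finditer(text)]


print(words_with_repeats("banana is a bonbon"))
# [('banana', 'an'), ('bonbon', 'bon')]
```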

How to replace a partially unknown string

Take a look at regular expressions.

The following will meet this case:

var result = Regex.Replace(originalString, @"O\(.*?\)", "");

What it means:

  • @ - makes this a verbatim string, so C# stops interpreting \ as an escape character. Otherwise the compiler would see our \( and try to replace it with another character, like it does for \n becoming a newline (and there is no \( escape, so it's a compiler error). Regex also uses \ as its escape character, so without the @ every backslash meant for the regex engine has to be written as a double C# backslash, and that makes regex patterns more confusing
  • " start of C# string
  • O\( literal character O followed by literal character ( (parentheses have special meaning in regex, so the backslash disables that special meaning)
  • .*? match zero or more of any character (lazy/pessimistic)
  • \) literal )
  • " end of string

.*? is a complex thing that warrants a bit more explanation:

In regex . means "match any single character", and * means "zero or more of the previous character". In this way .* means "zero or more of any character".

So what's the ? for?

By default regex * is "greedy" - a .* will eat the entire input string and then start working backwards, spitting characters back out, and checking for a match. Suppose you had two in succession, as in your example:

K(hello);O(mystring);O(otherstring);L(byebye)

And you match it greedily, then O\(.*\) will match the initial O(, then consume all the input, then spit one trailing ) back out and declare it's found a match, so the .* matches mystring);O(otherstring);L(byebye

We don't want this. Instead we want it to work forwards a character at a time, looking for a matching ). Putting the ? after the * changes from greedy mode to pessimistic (lazy) mode, and the input is scanned forwards rather than zipping to the end and scanning backwards. This means that O\(.*?\) matches mystring and then later otherstring, leaving a result of K(hello);;;L(byebye), rather than K(hello);
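The difference is easy to reproduce. This is a sketch in Python rather than C#, using re.sub in place of Regex.Replace; the quantifier behaviour shown is the same in both engines.

```python
import re

original = "K(hello);O(mystring);O(otherstring);L(byebye)"

# Greedy: .* runs to the end of the input and backs up to the LAST ")".
greedy = re.sub(r"O\(.*\)", "", original)

# Lazy: .*? stops at the FIRST ")" after each "O(".
lazy = re.sub(r"O\(.*?\)", "", original)

print(greedy)  # K(hello);
print(lazy)    # K(hello);;;L(byebye)
```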

Extract the repetitive parts of a String by Regex pattern matching in Scala

You can't use a repeated capturing group like that: it only keeps the last captured value as the group's value.

You can still get the matches you need with a \b[a-zA-Z]+(?::[a-zA-Z]+)*\b regex and then split each match on the colon:

val text = "it:is:very:great just:because:it is"
val regex = """\b[a-zA-Z]+(?::[a-zA-Z]+)*\b""".r
val results = regex.findAllIn(text).map(_ split ':').toList
results.foreach { x => println(x.mkString(", ")) }
// => it, is, very, great
// just, because, it
// is

Regex details:

  • \b - word boundary
  • [a-zA-Z]+ - one or more ASCII letters
  • (?::[a-zA-Z]+)* - zero or more repetitions of
    • : - a colon
    • [a-zA-Z]+ - one or more ASCII letters
  • \b - word boundary
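The same two-step approach, sketched in Python for comparison (the variable names are arbitrary). The last lines also illustrate the point about repeated capturing groups: after the match, the repeated group holds only its final iteration.

```python
import re

text = "it:is:very:great just:because:it is"

# Step 1: match each colon-separated run as a whole.
runs = re.findall(r"\b[a-zA-Z]+(?::[a-zA-Z]+)*\b", text)

# Step 2: split each run on the colon.
results = [run.split(":") for run in runs]
print(results)
# [['it', 'is', 'very', 'great'], ['just', 'because', 'it'], ['is']]

# A repeated capturing group keeps only the last value it captured.
m = re.fullmatch(r"([a-zA-Z]+)(?::([a-zA-Z]+))*", "it:is:very:great")
print(m.group(2))  # great
```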

Find unknown permutation from string pairs

You need to know the basic concepts of graph theory and matching.

Say each position of before is a left node and each position of after is a right node.
For left position i and right position j, add an edge from left node i to right node j if and only if x[i] equals y[j] in all pairs x -> y.
Then the problem becomes finding a perfect matching of this bipartite graph, which is a solved problem.
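One way to sketch this, assuming all strings have the same length and using a simple augmenting-path matching (often called Kuhn's algorithm) as a stand-in for whichever matching algorithm you prefer. The function name find_permutation and the (before, after) tuple layout are choices made for this example.

```python
def find_permutation(pairs):
    """Return perm with after[perm[i]] == before[i] for every pair, or None."""
    n = len(pairs[0][0])
    # Edge i -> j exists only if before[i] == after[j] in every pair.
    adj = [
        [j for j in range(n) if all(x[i] == y[j] for x, y in pairs)]
        for i in range(n)
    ]
    match_right = [-1] * n  # match_right[j] = left node matched to j

    def try_augment(i, seen):
        """Try to match left node i, re-routing earlier matches if needed."""
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    for i in range(n):
        if not try_augment(i, set()):
            return None  # no perfect matching, so no consistent permutation

    perm = [0] * n
    for j, i in enumerate(match_right):
        perm[i] = j
    return perm


print(find_permutation([("abc", "cab"), ("xyz", "zxy")]))  # [1, 2, 0]
print(find_permutation([("ab", "ba"), ("ab", "ab")]))      # None
```

When several characters repeat, more than one permutation can be consistent with the pairs; this sketch returns just one of them.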

Finding the "most likely" permutation would be much harder, and it requires an exact definition of "most likely". Would you like to satisfy as many pairs as possible? Or is it more matched characters that you prefer?

Groovy Error while trying to split a String

You should be using Groovy's native regexp syntax:

def res = '{RANDOM:4{LETTER:5}}'.split( /[\{\}]/ )
assert ['', 'RANDOM:4', 'LETTER:5'] == res

Also, I don't think that split() is what you really need. Based on your data you'd rather want:

String txt = '{RANDOM:4{LETTER:5}}'
def res = [:]
txt.eachMatch( /[\{\}]?([A-Z]+):(\d+)[\{\}]?/ ){ res[ it[ 1 ] ] = it[ 2 ].toInteger() }
assert [RANDOM:4, LETTER:5] == res
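For readers more familiar with Python, a rough equivalent of the second snippet could look like this. The helper name parse_pairs is invented here, and the pattern is simplified to the NAME:number core, which is an assumption about what the data looks like.

```python
import re


def parse_pairs(text):
    """Collect NAME:number pairs from text into a dict of ints."""
    return {name: int(num) for name, num in re.findall(r"([A-Z]+):(\d+)", text)}


print(parse_pairs("{RANDOM:4{LETTER:5}}"))  # {'RANDOM': 4, 'LETTER': 5}
```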

Regex to replace a repeating string pattern

I think you want this (works for any length of the repeated string):

String result = source.replaceAll("(.+)\\1+", "$1")

Or alternatively, to prioritize shorter matches:

String result = source.replaceAll("(.+?)\\1+", "$1")

It first matches a group of one or more characters, and then the same text again one or more times (using a back-reference within the match pattern itself). I tried it and it seems to do the trick.


Example

String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";

System.out.println(source.replaceAll("(.+?)\\1+", "$1"));

// HEY dude what's up? Trolo ye .0
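The same replacement can be sketched in Python with re.sub, where the back-reference in the replacement is written \1 instead of $1. The short "aaaa" case is added here only to show how the greedy and lazy variants differ.

```python
import re

source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0"

# Lazy group: collapse each run down to its shortest repeating unit.
collapsed = re.sub(r"(.+?)\1+", r"\1", source)
print(collapsed)  # HEY dude what's up? Trolo ye .0

# Greedy picks the longest unit that repeats; lazy picks the shortest.
print(re.sub(r"(.+)\1+", r"\1", "aaaa"))   # aa
print(re.sub(r"(.+?)\1+", r"\1", "aaaa"))  # a
```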

