How Does This Giant Regex Work

How does this giant regex work?

This is not entirely a regular expression. The regex is /.*/, which basically means "match everything". The /e modifier, however, eval()s the code in the next parameter (the replacement). In fact, this is a way for someone to hide code. The following is proof that this is a backdoor, and you must remove it immediately. Your system may be compromised further.

This is what the backdoor looks like when it is accessed:

Server security information

The hex part of the code:

\x65\x76\x61\x6C\x28\x67\x7A\x69\x6E\x66\x6C\x61\x74\x65\x28\x62\x61\x73\x65\x36\x34\x5F\x64\x65\x63\x6F\x64\x65\x28

is actually:
eval(gzinflate(base64_decode(
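
The hex decoding above can be checked without running any PHP. The following is a minimal Python sketch; the helper name decode_hex_escapes is made up for illustration, and it only turns \xNN escapes into characters, it never evaluates anything.

```python
import re

# The obfuscated fragment quoted above, kept as a raw string so the
# backslashes stay literal.
obfuscated = (
    r"\x65\x76\x61\x6C\x28\x67\x7A\x69\x6E\x66\x6C\x61\x74\x65\x28"
    r"\x62\x61\x73\x65\x36\x34\x5F\x64\x65\x63\x6F\x64\x65\x28"
)


def decode_hex_escapes(text: str) -> str:
    """Replace every \\xNN escape with the character it encodes."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


decoded = decode_hex_escapes(obfuscated)
print(decoded)  # eval(gzinflate(base64_decode(
```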

This code will print out the source code of the backdoor. However, I would not execute the resulting PHP code unless it is on a disposable virtual machine.

<?php
print gzinflate(base64_decode(""));
?>
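
If you would rather not run PHP at all, the same unpacking can be sketched in Python. PHP's gzinflate() reads a raw DEFLATE stream, which corresponds to zlib with a negative window size. The real payload is not reproduced here, so the sample below is a harmless stand-in built by a hypothetical helper (make_sample_payload) just to show the round trip.

```python
import base64
import zlib


def inflate_payload(b64_text: str) -> str:
    """Decode base64, then inflate raw DEFLATE data. Nothing is executed."""
    raw = base64.b64decode(b64_text)
    # Negative wbits = raw DEFLATE stream with no zlib/gzip header,
    # which is the format PHP's gzinflate() expects.
    return zlib.decompress(raw, -zlib.MAX_WBITS).decode("utf-8", errors="replace")


def make_sample_payload(source: str) -> str:
    """Hypothetical helper: pack a string the way such a backdoor would."""
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    data = compressor.compress(source.encode("utf-8")) + compressor.flush()
    return base64.b64encode(data).decode("ascii")


# Stand-in payload; paste the real base64 string here to inspect it.
sample = make_sample_payload("<?php echo 'harmless demo'; ?>")
recovered = inflate_payload(sample)
print(recovered)
```

Printing the text only shows it; treat whatever comes out of a real payload as hostile.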

How to lock down your server:

There are a number of ways this could have gotten onto your site. Most likely you were hacked with exploit code because one of your web applications is out of date. Try updating everything, including libraries. Change the passwords for everything, especially FTP, although you should be using SFTP or FTPS instead.

If you control your MySQL server, make sure your web application's MySQL user account is not root, and make sure you remove the MySQL FILE privilege from the account. You should also go a step further and run chmod 500 -R /path/to/web/root and chown www-data -R /path/to/web/root. www-data is a common user for Apache, but it might be different on your system; try running <?php system("whoami")?> to figure out the user account.

Next, run phpsecinfo. Modify your php.ini or .htaccess to remove all RED items, and try to remove as many yellow ones as possible.

Is a single big regex more efficient than a bunch of smaller ones?

It depends on what kind of big regex you write. If you end up with a pathological pattern, it's better to test smaller patterns. Example:

UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7}

This pattern is very inefficient because it starts with an alternation: in the worst case (no match), each alternative needs to be tested at every position in the string.

(* Note that a regex engine like PCRE is able to quickly find potentially matching positions when each branch of an alternation starts with literal characters.)

But if you write your pattern like this:

(?=[UDP][KEL])(?:UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7})

or the variation:

[UDP][KEL](?:(?<=UK)[A-Za-z]{10}|(?<=DE)[A-Za-z]{20}|(?<=PL)[A-Za-z]{7})

then most of the positions where a match isn't possible are quickly discarded before the alternation is tried.

Also, when you write a single pattern, obviously, the string is parsed only once.
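
As a quick sanity check, the three patterns above can be compared in Python, whose re module supports the same lookahead and fixed-width lookbehind syntax. This sketch only confirms that they find the same matches; the sample string is made up, and any speed difference depends on the engine and the input, so measure it (for example with timeit) rather than assume it.

```python
import re

plain = re.compile(r"UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7}")
lookahead = re.compile(
    r"(?=[UDP][KEL])(?:UK[A-Za-z]{10}|DE[A-Za-z]{20}|PL[A-Za-z]{7})"
)
lookbehind = re.compile(
    r"[UDP][KEL](?:(?<=UK)[A-Za-z]{10}|(?<=DE)[A-Za-z]{20}|(?<=PL)[A-Za-z]{7})"
)

# Made-up sample: three valid codes plus "UE12345", which passes the
# [UDP][KEL] pre-check but fails every branch of the alternation.
sample = "ids: UKabcdefghij, DEabcdefghijklmnopqrst; PLabcdefg and UE12345"

results = [p.findall(sample) for p in (plain, lookahead, lookbehind)]
for found in results:
    print(found)
```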

Regex for huge string

You could use the following regex.

-\s*(\S+)\s*\(\s*=\s*(\S+)\s*\)

The key will be in capturing group #1 and the value in capturing group #2.
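
Here is a small Python sketch of how the two groups come out. The input format from the question is not shown here, so the sample string is an assumption: dash-prefixed keys, each followed by a parenthesised "= value".

```python
import re

pattern = re.compile(r"-\s*(\S+)\s*\(\s*=\s*(\S+)\s*\)")

# Assumed input shape, for illustration only.
sample = "- alpha ( = 1 ) - beta (= two)"

# findall returns one (group 1, group 2) tuple per match.
pairs = pattern.findall(sample)
print(pairs)  # [('alpha', '1'), ('beta', 'two')]
```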

Rubular

what does this regex mean [^>]*

Match 0 or more characters that are not >. A ^ as the first item in a character class [] makes it a negated character class: any characters listed after the ^ are excluded.
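
A short Python illustration of the negated class, using made-up input strings:

```python
import re

# [^>]* consumes everything up to, but not including, the first ">".
prefix = re.match(r"[^>]*", "abc>def").group()
print(prefix)  # abc

# A typical use: match a whole tag without running past its closing ">".
tags = re.findall(r"<[^>]*>", "<b>bold</b> and <i>italic</i>")
print(tags)  # ['<b>', '</b>', '<i>', '</i>']
```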

this regex is true but does not work in localhost

You need to use a single quoted literal to make \1 be treated as a backreference (otherwise, you need to double escape it).

Also, to match either 1+ whitespaces or a ~ you need to use grouping with parentheses, not a character class. Note that [\s+|\~] matches 1 character: a whitespace, a +, | or a ~, and I doubt you actually want that behavior.

Use

$s = "{block content\nddggggggggggggggg\n/endcontent}";
$pt='~\{\s*block\s*-?\s*(\w+)(\s+|\~)(.*)\/end\1}~s';
preg_match($pt, $s, $match1);
print_r($match1);

See the IDEONE demo
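
The same two points can be checked in Python. This is only a sketch of the PHP pattern above ported to Python's re module: a raw string plays the role of PHP's single quotes, so \1 reaches the regex engine as a backreference, and re.S stands in for the ~s flag.

```python
import re

# A character class matches exactly ONE character from its set,
# so "+" and "|" inside it are literals.
char_class = re.compile(r"[\s+|~]")
group = re.compile(r"(?:\s+|~)")

class_matches_pipe = char_class.fullmatch("|") is not None     # True
class_matches_spaces = char_class.fullmatch("  ") is not None  # False
group_matches_spaces = group.fullmatch("  ") is not None       # True

# Port of the corrected pattern; \1 must repeat what (\w+) captured.
block = re.compile(r"\{\s*block\s*-?\s*(\w+)(\s+|~)(.*)/end\1}", re.S)
s = "{block content\nddggggggggggggggg\n/endcontent}"
m = block.search(s)
print(m.groups())
```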

How can convert this regex to R regex and regex R usage

You can use

x <- ">XP_002499978.1 predicted protein [Micromonas commoda]"
x <- gsub("(?:\\G(?!^)|\\[)[^][\\s]*\\K\\s+", "_", x, perl=TRUE) # Replace spaces in [ ]
## [1] ">XP_002499978.1 predicted protein [Micromonas_commoda]"
sub("^>(\\S+).*\\[([^][]*)].*", ">\\2_\\1", x) # Reformat the string
## => [1] ">Micromonas_commoda_XP_002499978.1"

There are two actions here:

  • gsub("(?:\\G(?!^)|\\[)[^][\\s]*\\K\\s+", "_", x, perl=TRUE) replaces all whitespace chunks inside square brackets with a single _
  • sub("^>(\\S+).*\\[([^][]*)].*", ">\\2_\\1", x) reformats the string.

See the regex #1 demo and the regex #2 demo.
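
For comparison, here is a rough Python sketch of the same two steps. Python's re module has no \G or \K, so step 1 swaps in a different technique: it matches each whole [...] chunk and rewrites the whitespace inside it with a replacement callback. Unlike the R pattern, this only touches brackets that are actually closed.

```python
import re

x = ">XP_002499978.1 predicted protein [Micromonas commoda]"

# Step 1: inside every [...] chunk, collapse whitespace runs to "_".
step1 = re.sub(
    r"\[[^\[\]]*\]",
    lambda m: re.sub(r"\s+", "_", m.group(0)),
    x,
)
print(step1)  # >XP_002499978.1 predicted protein [Micromonas_commoda]

# Step 2: move the bracketed name in front of the accession ID.
step2 = re.sub(r"^>(\S+).*\[([^\[\]]*)\].*", r">\2_\1", step1)
print(step2)  # >Micromonas_commoda_XP_002499978.1
```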

Regex for this pattern: @ or # + 1 or 2 words + : + 1 words or more + link does not work

Give this a try

[@#]((?:\w+\s?){1,2}):\s?((?:\w+\s?){1,})((?:http|https):\/\/.+)

Test

@hello sss: xxx https://t.co/3WHshzDG7m
#hello sss: xxx another word https://t.co/3WHshzDG7m
#hello sss third: xxx another word https://t.co/3WHshzDG7m

Result

  1. MATCH 1

    1. [1-10] hello sss
    2. [12-16] xxx
    3. [16-39] https://t.co/3WHshzDG7m
  2. MATCH 2

    1. [41-50] hello sss
    2. [52-69] xxx another word
    3. [69-92] https://t.co/3WHshzDG7m

Online demo https://regex101.com/r/zU7lP2/1

Another version if you do not want to fix the link protocol

[@#]((?:\w+\s?){1,2}):\s?((?:\w+\s?){1,})((?<=\s)\w+:\/\/.+)

Online demo https://regex101.com/r/zU7lP2/2
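
To reproduce the result locally, the first pattern can be run against the three test lines in Python. This sketch writes the leading class as [@#], since a | inside a character class is a literal pipe rather than an alternation. The third line has three words before the colon, so it should not match.

```python
import re

pattern = re.compile(r"[@#]((?:\w+\s?){1,2}):\s?((?:\w+\s?){1,})((?:http|https)://.+)")

text = (
    "@hello sss: xxx https://t.co/3WHshzDG7m\n"
    "#hello sss: xxx another word https://t.co/3WHshzDG7m\n"
    "#hello sss third: xxx another word https://t.co/3WHshzDG7m"
)

# Each tuple is (words before the colon, words after it, link).
# Group 2 keeps its trailing space because \s? is inside the group.
matches = [m.groups() for m in pattern.finditer(text)]
for groups in matches:
    print(groups)
```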

Match multiple instances of mysteriously ordered complex regular expression non-capturing groups

You shouldn't give the regex engine the option to match nothing.

It will roam around finding a lot of 'nothing' before it finds an optional something.
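
A tiny Python illustration of that effect, with a toy pattern rather than the one from the question: once the whole pattern is optional, the engine reports an empty match at every position where the "something" is absent.

```python
import re

# Entirely optional pattern: empty matches everywhere, plus the real one.
optional_hits = re.findall(r"(?:a)?", "xxa")
print(optional_hits)  # ['', '', 'a', '']

# Required pattern: only the real occurrence is reported.
required_hits = re.findall(r"a", "xxa")
print(required_hits)  # ['a']
```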

edit

If you just want a block match (any order, but sequential) something like this will work.

Your way now, with modifications:

(?:
(?: Section ... (?<sec_7> 7)
)
| (?: Section ... (?<sec_C> C)
)?
| (?: Section ... (?<sec_Z> Z)
)
)
(?: Section ... (?!\k<sec_7>) (?<sec_7> 7) )?
(?: Section ... (?!\k<sec_C>) (?<sec_C> C) )?
(?: Section ... (?!\k<sec_Z>) (?<sec_Z> Z) )?

If it can be factored, then this way:

(?: Section ... (?<sec_a>(?:7|C|Z)) )
(?: Section ... (?<sec_b>(?!\k<sec_a>)(?:7|C|Z)) )?
(?: Section ... (?<sec_c>(?!\k<sec_a>|\k<sec_b>)(?:7|C|Z)) )?
#
# Then after match check <sec_a/b/c> for its value
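
The factored idea can be tried out in Python, where named groups are written (?P<name>...) and named backreferences (?P=name). This is a simplified sketch: the elided "Section ..." part is assumed to be just the literal word "Section" and a space, and the group names a, b and c are shortened for illustration.

```python
import re

# Each later section may only be one that has not been seen yet:
# the negative lookahead rejects a repeat of an earlier capture.
factored = re.compile(
    r"Section (?P<a>[7CZ])"
    r"(?: Section (?P<b>(?!(?P=a))[7CZ]))?"
    r"(?: Section (?P<c>(?!(?P=a)|(?P=b))[7CZ]))?"
)

all_distinct = factored.match("Section 7 Section C Section Z")
print(all_distinct.groupdict())  # {'a': '7', 'b': 'C', 'c': 'Z'}

# The repeated "7" is refused, so the block match stops after "C".
with_repeat = factored.match("Section 7 Section C Section 7")
print(with_repeat.group(0))      # Section 7 Section C
print(with_repeat.groupdict())   # {'a': '7', 'b': 'C', 'c': None}
```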

If you don't care about a block match:
Your case revolves around just an OR condition.
So, it could be as easy as this:

# base 10 statistic sections
(?: ..)
|
(?: ..)
|
(?: ..)

Each match of the 'base 10' sections then has to be checked in a while loop, like this:

Match m = Regex.Match(input, regex, RegexOptions.IgnorePatternWhitespace);
while (m.Success)
{
    if (m.Groups["base10_Section7"].Success) { }
    else if (m.Groups["base10_SectionZ"].Success) { }
    else if (m.Groups["base10_SectionC"].Success) { }
    m = m.NextMatch();
}
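
The same match-then-check loop can be sketched in Python. The real section patterns are elided above, so the three alternatives here are placeholders (the word "Section", whitespace, then the section ID); only the group names are taken from the C# snippet. Because exactly one named group participates in each match, m.lastgroup tells you which alternative fired.

```python
import re

# Placeholder alternatives; the real "(?: ..)" bodies are not shown above.
SECTION_RE = re.compile(
    r"""
      Section \s+ (?P<base10_Section7> 7 )
    | Section \s+ (?P<base10_SectionZ> Z )
    | Section \s+ (?P<base10_SectionC> C )
    """,
    re.VERBOSE,
)


def classify(text: str) -> list:
    """Return, in order, the name of the group that matched each time."""
    found = []
    for m in SECTION_RE.finditer(text):
        # lastgroup is the name of the capturing group that matched.
        found.append(m.lastgroup)
    return found


order = classify("Section Z, Section 7, Section C, Section Q")
print(order)
```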

Even this can be reduced. For instance, 7, Z and C can be combined in a single chunk.

This will leave the OR (|) for other distinct items to match, like, say, 'base 2' or any other form. One form will match; it has to be checked anyway.


