Regex to Split BBCode into Pieces

Regex to split BBCode into pieces

irb(main):001:0> str = "some html code [img]......[/img] some html code [img]......[/img]"
=> "some html code [img]......[/img] some html code [img]......[/img]"
irb(main):002:0> str.scan(/\[img\].*?\[\/img\]/)
=> ["[img]......[/img]", "[img]......[/img]"]

Keep in mind that this answer is tailored to your exact question. Change str by, say, nesting an image tag inside another image tag, and all hell will break loose: the non-greedy .*? stops at the first [/img] it reaches, so the pieces no longer line up with the tags.
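For comparison, here is a rough Python sketch of the same idea (not part of the original answer). It wraps the same non-greedy pattern in a capture group so that re.split returns the tags together with the surrounding HTML, which is closer to "splitting into pieces" than scan alone. The re.DOTALL flag is an added assumption so that a tag whose body spans several lines still matches, and split_img_tags is a hypothetical name. The last call shows the nesting problem.

```python
import re

# Same non-greedy pattern as the Ruby scan above, wrapped in a capture group
# so that re.split keeps the matched tags in its result.
# re.DOTALL is an assumption: it lets a tag's body span several lines.
IMG_SPLIT = re.compile(r"(\[img\].*?\[/img\])", re.DOTALL)


def split_img_tags(text):
    """Split text into alternating HTML and [img]...[/img] pieces."""
    # re.split yields '' when a tag sits at the very start or end; drop those.
    return [piece for piece in IMG_SPLIT.split(text) if piece]


print(split_img_tags("some html code [img]......[/img] some html code [img]......[/img]"))
# ['some html code ', '[img]......[/img]', ' some html code ', '[img]......[/img]']

# Nested tags: .*? stops at the first [/img], so the pieces no longer match the tags.
print(split_img_tags("[img]a[img]b[/img]c[/img]"))
# ['[img]a[img]b[/img]', 'c[/img]']
```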

Parse BBCode in array

You can do this with regular expressions:

$code = '[date format="j M, Y" type="jalali"]';

// Capture the contents of every [...] tag
preg_match_all("/\[([^\]]*)\]/", $code, $matches);

$codes = [];

foreach ($matches[1] as $match) {
    // Normalize single quotes into double quotes
    $match = str_replace("'", '"', $match);
    // Split by whitespace, but keep double-quoted strings together
    preg_match_all('/(?:[^\s"]+|"[^"]*")+/', $match, $tokens);
    $parsed = [];
    $prevToken = '';
    foreach ($tokens[0] as $token) {
        if (strpos($token, '=') !== false) {
            // An attribute: attach it to the most recent tag name
            if ($prevToken !== '') {
                // Limit to 2 parts so values containing '=' stay intact
                $parts = explode('=', $token, 2);
                $parsed[$prevToken][$parts[0]] = trim($parts[1], '"\'');
            }
        } else {
            // A bare word: treat it as the tag name
            $parsed[$token] = [];
            $prevToken = $token;
        }
    }

    $codes[] = $parsed;
}

var_dump($codes);

Result:

array(1) {
  [0]=>
  array(1) {
    ["date"]=>
    array(2) {
      ["format"]=>
      string(6) "j M, Y"
      ["type"]=>
      string(6) "jalali"
    }
  }
}
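The same two-stage approach (find each tag, then split it into a name and attributes) can be sketched in Python. This is an illustration, not a port of the answer line for line: it swaps the tokenizing regex for the standard library's shlex.split, which keeps quoted text together and strips both single and double quotes, so no normalization step is needed. Be aware that shlex also treats backslashes as escapes and raises ValueError on an unbalanced quote. The name parse_bbcode is hypothetical.

```python
import re
import shlex

# First regex: the contents of every [...] tag, as in the PHP preg_match_all.
TAG_RE = re.compile(r"\[([^\]]*)\]")


def parse_bbcode(code):
    """Return one {tag_name: {attribute: value}} dict per [...] tag."""
    codes = []
    for inner in TAG_RE.findall(code):
        parsed = {}
        prev = None
        # shlex.split splits on whitespace but keeps quoted text together and
        # removes the quotes, so '...' and "..." need no normalization.
        for token in shlex.split(inner):
            name, sep, value = token.partition("=")
            if sep:
                # An attribute: attach it to the most recent tag name.
                if prev is not None:
                    parsed[prev][name] = value
            else:
                # A bare word: treat it as the tag name.
                parsed[token] = {}
                prev = token
        codes.append(parsed)
    return codes


print(parse_bbcode('[date format="j M, Y" type="jalali"]'))
# [{'date': {'format': 'j M, Y', 'type': 'jalali'}}]
```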

How to parse pseudocode similar to BBCode in PHP?

Use regular expressions. While you could write a full parser for a scheme like this, it's overkill and gives you no resilience against garbled tokens.

The trick is to use two regular expressions: one to find the [field] tokens and a second to split out their attributes.

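// $text is a placeholder for the string being parsed.
// The outer group captures the whole attribute string; a bare repeated group
// (...)* would keep only the last attribute.
$text = preg_replace_callback('/\[(\w+)((?:\s+\w+=\pP[^"\']*\pP)*)\]/', "block", $text);

function block($match) {
    // Tag name, e.g. "field"
    $field = $match[1];

    // Split the attribute string into name => value pairs
    preg_match_all('/(\w+)=\pP([^"\']+)\pP/', $match[2], $attr);
    $attr = array_combine($attr[1], $attr[2]);

    // ...
    return $html;
}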
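The two-regex technique translates directly to Python's re.sub with a callback. A sketch under stated assumptions: Python's re module has no \pP property class, so the quotes are matched explicitly as " or ', and the HTML produced by render_tag is a hypothetical placeholder for whatever each field should really render to (the original leaves that part as "...").

```python
import re

# Outer pattern: tag name plus the whole attribute string in ONE group,
# so every attribute is kept. Quotes are matched explicitly (no \pP in re).
TAG_RE = re.compile(r"""\[(\w+)((?:\s+\w+=["'][^"']*["'])*)\]""")
# Inner pattern: one name="value" (or name='value') pair.
ATTR_RE = re.compile(r"""(\w+)=["']([^"']*)["']""")


def render_tag(match):
    field = match.group(1)
    attrs = dict(ATTR_RE.findall(match.group(2)))
    # Hypothetical rendering: replace with the HTML each field actually needs.
    attr_text = "".join(f' data-{k}="{v}"' for k, v in attrs.items())
    return f'<span class="{field}"{attr_text}></span>'


def render(text):
    return TAG_RE.sub(render_tag, text)


print(render('Hello [field name="title" size="20"]!'))
# Hello <span class="field" data-name="title" data-size="20"></span>!
```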

How do you get multiple args in a PEAR BBCODE Parser?

BBCode does not have the concept of multiple attributes; you cannot do what you want.

Only single, unnamed attributes are supported:

[url=http://example.org]name[/url]
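To make the supported form concrete, a single unnamed attribute can be matched with one regular expression. This Python sketch only illustrates the [tag=value]content[/tag] format; it is not the PEAR parser's API, and parse_single_attr is a hypothetical name. The \1 backreference requires the closing tag to match the opening one.

```python
import re

# [tag=value]content[/tag]: one unnamed attribute after "=".
# \1 forces the closing tag name to equal the opening tag name.
SINGLE_ATTR_RE = re.compile(r"\[(\w+)=([^\]]*)\](.*?)\[/\1\]", re.DOTALL)


def parse_single_attr(text):
    """Return (tag, attribute, content) tuples for every match."""
    return SINGLE_ATTR_RE.findall(text)


print(parse_single_attr("[url=http://example.org]name[/url]"))
# [('url', 'http://example.org', 'name')]
```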

Remove nested BBCode quotes in Python?

Not sure if you just want the quotes, or the whole input with nested quotes removed. This pyparsing sample does both:

stuff = """
Other stuff
[quote user2]
[quote user1]Hello[/quote]
World
[/quote]
Other stuff after the stuff
"""

from pyparsing import (Word, printables, originalTextFor, Literal, OneOrMore,
                       ZeroOrMore, Forward, Suppress)

# prototype username
username = Word(printables, excludeChars=']')

# BBCODE quote tags
openQuote = originalTextFor(Literal("[") + "quote" + username + "]")
closeQuote = Literal("[/quote]")

# use negative lookahead to exclude BBCODE quote tags from the body of the quote
contentWord = ~(openQuote | closeQuote) + (Word(printables,excludeChars='[') | '[')
content = originalTextFor(OneOrMore(contentWord))

# define recursive definition of quote, suppressing any nested quotes
quotes = Forward()
quotes << ( openQuote + ZeroOrMore( Suppress(quotes) | content ) + closeQuote )

# put separate tokens back together
quotes.setParseAction(lambda t : '\n'.join(t))

# quote extractor
for q in quotes.searchString(stuff):
print q[0]

# nested quote stripper
print quotes.transformString(stuff)
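If you would rather avoid the pyparsing dependency, a plain depth counter over the quote tags handles the stripping half of the job. This is a swapped-in technique, not the pyparsing grammar above: strip_nested_quotes is a hypothetical helper built on the standard re module, it assumes tags of the form [quote], [quote user] or [quote=user], and unlike the pyparsing version it keeps the original whitespace instead of re-joining tokens with newlines.

```python
import re

# Assumed tag shapes: [quote], [quote user] or [quote=user], closed by [/quote].
QUOTE_TAG = re.compile(r"\[quote(?:[ =][^\]]*)?\]|\[/quote\]")


def strip_nested_quotes(text):
    """Keep top-level [quote] blocks but drop any quote nested inside them."""
    out = []
    depth = 0  # how many [quote] tags are currently open
    pos = 0    # end of the last tag processed
    for m in QUOTE_TAG.finditer(text):
        # Text before this tag is outside quotes (depth 0) or inside a
        # top-level quote (depth 1); anything deeper is nested and dropped.
        if depth <= 1:
            out.append(text[pos:m.start()])
        if m.group().startswith("[/"):
            if depth == 1:
                out.append(m.group())  # close of a top-level quote
            depth = max(depth - 1, 0)  # stray closing tags are ignored
        else:
            if depth == 0:
                out.append(m.group())  # open of a top-level quote
            depth += 1
        pos = m.end()
    if depth <= 1:
        out.append(text[pos:])
    return "".join(out)


print(strip_nested_quotes("[quote user2][quote user1]Hello[/quote]World[/quote]"))
# [quote user2]World[/quote]
```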

Prints:

[quote user2]
World
[/quote]

Other stuff
[quote user2]
World
[/quote]
Other stuff after the stuff

Match 123. some string into two variables

Like this?

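# MatchData#to_a returns [whole match, group 1, group 2];
# with no match, nil.to_a returns [] and all three variables are nil
full, pos, title = your_string.match(/(\d+)\.\s*(.*)/).to_a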
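A Python counterpart, for comparison (a sketch; split_numbered is a hypothetical name). It uses re.search rather than re.match because Ruby's String#match is not anchored to the start of the string, and it returns (None, None) on a miss to mirror the nil values in the Ruby version.

```python
import re


def split_numbered(s):
    """Split '123. some string' into ('123', 'some string')."""
    m = re.search(r"(\d+)\.\s*(.*)", s)
    # groups() holds only the captured parts, not the whole match.
    return m.groups() if m else (None, None)


pos, title = split_numbered("123. some string")
print(pos, title)  # 123 some string
```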

