Regexp to Add Attribute in Any Xml Tags

Don't use regular expressions to work with XML: XML is not a regular language. Use PHP's XML extensions instead:

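// $xmlFile holds the path to your XML document
$xml = new SimpleXMLElement(file_get_contents($xmlFile));

// Add attr="myAttr" to this element, then recurse into every child element.
function process_recursive($xmlNode) {
    $xmlNode->addAttribute('attr', 'myAttr');
    foreach ($xmlNode->children() as $childNode) {
        process_recursive($childNode);
    }
}

process_recursive($xml);
echo $xml->asXML();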
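The recursion itself does not depend on PHP. As a hedged illustration, here is the same depth-first walk in JavaScript over a plain object tree. The `{ name, attributes, children }` node shape and the `processRecursive` name are hypothetical stand-ins for whatever element objects a real XML parser gives you.

```javascript
// Hypothetical node shape: { name, attributes, children }, standing in for
// the element objects a real XML parser would return.
function processRecursive(node, attrName, attrValue) {
  // Set the attribute on this element (overwrites an existing one of the same name)...
  node.attributes[attrName] = attrValue;
  // ...then descend into every child element.
  for (const child of node.children) {
    processRecursive(child, attrName, attrValue);
  }
  return node;
}

const tree = {
  name: 'html',
  attributes: {},
  children: [
    {
      name: 'head',
      attributes: {},
      children: [{ name: 'title', attributes: {}, children: [] }],
    },
    { name: 'body', attributes: { onload: 'load()' }, children: [] },
  ],
};

processRecursive(tree, 'attr', 'myAttr');
console.log(JSON.stringify(tree));
```

Because the walk only ever visits element nodes handed to it by the parser, text inside comments or CDATA can never be mistaken for a tag.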

Any answer based on regular expressions will break on perfectly valid XML such as this:

<?xml version="1.0" encoding='UTF-8'?>
<html>
<head>
<!-- <meta> ... </meta> -->
<script>//<![CDATA[
function load() {document.write('<tt>Test</tt>');}
//]]></script>
<title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
</head>
<body onload="load()">
<input
type="submit"
value="multiline
button
text"
/>
</body>
</html>
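To make the failure concrete, here is a deliberately naive JavaScript replacement (a hypothetical pattern, not taken from any particular answer) that inserts `attr="myAttr"` after every `<name`. Run on part of the document above, it also rewrites text inside the comment and the CDATA sections, none of which are tags.

```javascript
// Naive "add an attribute to every tag": matches "<" followed by a word
// anywhere in the text, with no idea of comments or CDATA sections.
const naive = (xml) => xml.replace(/<(\w+)/g, '<$1 attr="myAttr"');

const sample = `<head>
<!-- <meta> ... </meta> -->
<script>//<![CDATA[
function load() {document.write('<tt>Test</tt>');}
//]]></script>
<title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
</head>`;

console.log(naive(sample));
```

The output contains `<!-- <meta attr="myAttr"> ... </meta> -->`, a modified `<tt>` inside the script string, and `<<SiteName attr="myAttr">>` inside the title text.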

regex match all xml tags that contain a certain attribute value

The appropriate Perl solution is not regex. With Mojo::DOM (one of many options):

use strict;
use warnings;
use Mojo::DOM;
use File::Slurper 'read_text';

my $xml = read_text 'test.xml';
my $dom = Mojo::DOM->new->xml(1)->parse($xml);
my $tags = $dom->find('item[attr*=".htm"]');
print "$_\n" for @$tags;

Regex to find attribute in xml tag and replace its value (node/Javascript)

You can try the following regex and test it on regex101:

/(<widget [\S\s]*?version=")[^"]+("[\S\s]*?>)/gmi

In short, here is what it does:

  1. Capture everything from <widget up to and including version=" (group 1).
  2. Match the old version value, i.e. everything except a ".
  3. Capture from the closing " up to the next > (group 2).

You can then replace it with $1(new version)$2.

Here is a simple demo:

JavaScript:

const versionRegex = /(<widget [\S\s]*?version=")[^"]+("[\S\s]*?>)/gmi;

const content = `<?xml version="1.0" encoding="utf-8"?>
<widget android-packageName="com.demo.android" id="com.demo.ios" ios-CFBundleIdentifier="com.demo.ios.dev" version="1.16.6" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0">
<name short="DEMO DEV">DEMO</name>`;

const newVersion = 'testVersion';

// $1 and $2 are the captured groups; only the version value between them changes.
const replaced = content.replace(versionRegex, `$1${newVersion}$2`);

document.getElementById('result').innerText = replaced;

CSS:

pre {
  white-space: pre-line;
}

HTML:

<pre id="result"></pre>
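One caveat with this pattern: [\S\s]*? can cross a >, so if the <widget> tag has no version attribute, the match runs on into later elements and rewrites their version instead. A hedged sketch comparing it with a hypothetical tighter variant that uses [^>] to stay inside the opening tag (the `bump` helper and sample strings are illustrative only):

```javascript
// The regex from the answer: [\S\s]*? may run past the end of the <widget> tag.
const loose = /(<widget [\S\s]*?version=")[^"]+("[\S\s]*?>)/gmi;
// Hypothetical tighter variant: [^>] cannot leave the opening tag, and the \s
// before "version" avoids matching the tail of a name like "data-version".
const tight = /(<widget\s[^>]*?\sversion=")[^"]+("[^>]*>)/gi;

const withVersion = '<widget id="com.demo.ios" version="1.16.6" xmlns="http://www.w3.org/ns/widgets">';
const withoutVersion = '<widget id="com.demo.ios">\n<plugin name="demo" version="2.0.0"/>';

const bump = (re, xml) => xml.replace(re, '$1testVersion$2');

console.log(bump(tight, withVersion));
// The loose pattern rewrites the <plugin> element's version instead:
console.log(bump(loose, withoutVersion));
// The tight pattern finds no match and leaves the text unchanged:
console.log(bump(tight, withoutVersion));
```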

Insert elements (strings) within existing XML tags using a RegEX?


  • I changed the regex delimiter from / to ! to reduce confusion, since the pattern itself contains a /.
  • To escape characters, use a backslash, not a forward slash.

Try:

s!(<pagenum page="normal" id=")([a-z0-9_-]+)(">)([0-9]+)(<pagenum/>)!\1\2\4\3\4\5!i

e.g.:

echo '<pagenum page="normal" id="page">1<pagenum/>' | \
sed -r 's!(<pagenum page="normal" id=")([a-z0-9_-]+)(">)([0-9]+)(<pagenum/>)!\1\2\4\3\4\5!i'

Note: isn't the closing tag usually </pagenum> rather than <pagenum/>? (<pagenum/> is an empty, self-closing element, not a closing tag.)
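For readers doing this from Node rather than sed, here is a sketch of the same substitution with JavaScript's String.prototype.replace. The function name `appendPageToId` is hypothetical. Note that a JavaScript regex literal is always delimited by /, so the slash in <pagenum/> has to be escaped with a backslash.

```javascript
// Same pattern and group order as the sed command; JavaScript replacement
// strings use $1..$5 where sed uses \1..\5.
const pagenum = /(<pagenum page="normal" id=")([a-z0-9_-]+)(">)([0-9]+)(<pagenum\/>)/gi;

function appendPageToId(xml) {
  // $1$2 = opening text + original id, $4 = page number appended to the id,
  // $3 = the closing ">, $4 again = element text, $5 = the trailing <pagenum/>.
  return xml.replace(pagenum, '$1$2$4$3$4$5');
}

console.log(appendPageToId('<pagenum page="normal" id="page">1<pagenum/>'));
// <pagenum page="normal" id="page1">1<pagenum/>
```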

How can I match XML tags and attributes with a regular expression in Perl?

You'll have much more success using an XML parser, such as XML::Parser. Parsing XML with regular expressions is very difficult in general (nested elements, comments and CDATA sections are beyond what a regex can reliably track), so unless your use case is trivial, a proper XML parser is the reliable solution.

Add attribute to element which has an attribute with a particular value


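Find:

(<\s*meta\s+name\s*=\s*"stack")(\s*/>)

Replace with:

$1 value="overflow" $2

This only matches a self-closing meta tag whose sole attribute is name="stack"; any other attribute before or after it prevents the match.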
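A quick way to check the pair, sketched with JavaScript's String.prototype.replace, which accepts the same $1/$2 replacement syntax. `addValue` is a hypothetical helper name.

```javascript
// The find/replace pair as a JavaScript regex
// (the "/" in "/>" must be escaped inside a regex literal).
const metaStack = /(<\s*meta\s+name\s*=\s*"stack")(\s*\/>)/g;

const addValue = (html) => html.replace(metaStack, '$1 value="overflow" $2');

console.log(addValue('<meta name="stack"/>'));
// <meta name="stack" value="overflow" />

// Not matched: another attribute sits between name="stack" and "/>".
console.log(addValue('<meta name="stack" content="x"/>'));
```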

RegEx for replacing and adding attributes to an HTML tag

I think the best approach is to use preg_replace_callback.

Also, I would recommend a slightly more stringent regexp than those suggested so far: what if your page contains an <img /> tag that does not have an id attribute?

$page = '
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="hello" alt="nothing" />
<img src="source.jpg" id ="world"/>
</p>
</body>';

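// Replace each captured id value with img_0, img_1, ... in document order.
function my_callback($matches)
{
    static $i = 0; // keeps its value between calls, so each match gets the next number
    return $matches[1] . "img_" . $i++;
}

// Group 1: everything from <img up to the opening quote of id (kept);
// group 2: the old id value (replaced by the callback's return value).
print preg_replace_callback('/(<img[^>]*id\s*=\s*")([^"]*)/', "my_callback", $page);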

This produces the following output:

<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="img_0" alt="nothing" />
<img src="source.jpg" id ="img_1"/>
</p>
</body>

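The regexp has two capturing groups: the first is preserved, the second is replaced. The negated character classes ([^>]* = anything up to the closing >, [^"]* = anything up to the closing quote) keep each match inside a single tag, so <img /> tags without an id attribute are simply skipped instead of being matched across into the next tag.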
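JavaScript's String.prototype.replace also accepts a callback, so the same technique can be sketched there for comparison. `renumberImgIds` is a hypothetical name, and the counter lives in a closure, so unlike PHP's static $i it restarts at 0 on each call.

```javascript
// Same pattern as the preg_replace_callback call: group 1 (everything up to
// the opening quote of id) is kept, group 2 (the old id value) is replaced.
function renumberImgIds(page) {
  let i = 0; // closure counter, reset on every call to renumberImgIds
  return page.replace(/(<img[^>]*id\s*=\s*")([^"]*)/g, (match, prefix) => prefix + 'img_' + i++);
}

const page = `
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="hello" alt="nothing" />
<img src="source.jpg" id ="world"/>
</p>
</body>`;

console.log(renumberImgIds(page));
```

As with the PHP version, [^>]*id also matches the end of attribute names such as data-id, so real pages may need a stricter pattern.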

RegEx Notepad++: add quotes to XML attribute values using find replace

This may be beyond regex, but as long as you definitely don't have any equals signs in your values, the following should work:

Search: \b(\w+)=((?:\s*[^=>]+\b(?!=))+)?(\s+|\/?>)

Replace: $1="$2"$3

  • \b matches a word boundary (see http://www.regular-expressions.info/wordboundaries.html)
  • (\w+) matches one or more word characters and captures as 'group 1' - referenced in the replace as $1
  • ( start 'group 2' - referenced in the replace as $2

    • (?: start a group, but do not capture - we do this so we can use the + char to repeat at the end

      • \s* matches zero or more whitespace characters
      • [^=>]+ matches one or more characters that are not = or >
      • \b matches another word boundary - without this it will continue matching part of the next property
      • (?!=) makes sure that the next character is not =. This is known as a negative lookahead; be careful with these, as they can easily make a regex inefficient (see http://www.regular-expressions.info/lookaround.html)
    • )+ closes the non-capturing group and matches it one or more times
  • )? closes group 2 and makes it optional using the ? character
  • (\s+|/?>) makes sure the match ends with whitespace or the end of a tag, captured as 'group 3' and used in the replace as $3

    • \s+ whitespace or
    • /? optional forward slash for self closing tags
    • > end of tag

See it in action here: https://regex101.com/r/zYdzQB/2
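The same search and replace can be tried outside Notepad++. A minimal JavaScript sketch, assuming (as the answer warns) that values contain no = and are not already quoted; `quoteAttributes` is a hypothetical helper name.

```javascript
// The Notepad++ search pattern as a JavaScript regex with the g flag,
// paired with the same $1="$2"$3 replacement.
const unquoted = /\b(\w+)=((?:\s*[^=>]+\b(?!=))+)?(\s+|\/?>)/g;

const quoteAttributes = (xml) => xml.replace(unquoted, '$1="$2"$3');

console.log(quoteAttributes('<a href=index.html>'));
// <a href="index.html">
console.log(quoteAttributes('<item name=foo bar size=10/>'));
// <item name="foo bar" size="10"/>
```

In the second line, the word boundary plus the (?!=) lookahead is what stops "foo bar" from swallowing "size", because "size" is followed by =.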

Some caveats:

  • You will need to carefully check the results
  • You should not automate this; it is not an efficient way of solving the problem, but if you have a broken file to fix, it may be suitable.
  • If you have any chance of reviewing how the data was generated and fixing this you would be much better off doing that.

Regular expression for XML element with arbitrary attribute value

Your (.) will capture only a single character; add a quantifier like + (“one or more”):

/<data name=\\"(.+)\\" xml:space=\\"preserve\\">/

Depending on what exactly your input is (a single element or an entire document) and what you want to achieve (removing, replacing, testing or capturing), you may need to make the regex global by adding the g flag, so that it applies to every match rather than only the first. You should also make the + quantifier lazy by appending a ?, so that capturing stops at the first closing quote of the attribute instead of the last one in the input (similar in spirit to matching everything except a quotation mark, [^"]). Then it looks like this:

/<data name=\\"(.+?)\\" xml:space=\\"preserve\\">/g
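To see why the lazy quantifier matters, here is a hedged JavaScript sketch with a hypothetical input line containing two <data> elements and escaped quotes (String.raw keeps the backslashes as literal characters). `names` is an illustrative helper.

```javascript
// The two patterns from the answer. Inside a JS regex literal, \\" matches a
// literal backslash followed by a quote, as in the escaped input.
const greedy = /<data name=\\"(.+)\\" xml:space=\\"preserve\\">/g;
const lazy = /<data name=\\"(.+?)\\" xml:space=\\"preserve\\">/g;

// Hypothetical input: two <data> elements on one line.
const line = String.raw`<data name=\"first\" xml:space=\"preserve\"><value>1</value></data><data name=\"second\" xml:space=\"preserve\"><value>2</value></data>`;

// Collect capture group 1 from every match.
const names = (re) => [...line.matchAll(re)].map((m) => m[1]);

console.log(names(greedy)); // one match; the capture runs into the second element
console.log(names(lazy));   // [ 'first', 'second' ]
```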

