How to Check If String Is a Valid XML Element Name

How to check if string is a valid XML element name?

How about

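/\A(?!XML)[a-z_][\w.-]*\z/i

Usage:

if (preg_match('/\A(?!XML)[a-z_][\w.-]*\z/i', $subject)) {
    # valid name
} else {
    # invalid name
}

Explanation:

\A  Beginning of the string
(?!XML) Negative lookahead (assert that the name does not start with the reserved prefix "XML")
[a-z_] Match an ASCII letter or an underscore as the first character
[\w.-]* Match any number of letters, digits, underscores, dots or hyphens
\z  End of the string, so trailing invalid characters are rejected
/i make the whole thing case-insensitive

Note that this only covers the common ASCII subset of the names XML allows.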

java - How to check if string is a valid XML element name?

If you are using Xerces XML parser, you can use the XMLChar (or XML11Char) class isValidName() method, like this:

org.apache.xerces.util.XMLChar.isValidName(String name)

There is also sample code available here for isValidName.
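
If Xerces is not on the classpath, one JDK-only alternative is to let the DOM API do the check: Document.createElement is specified to raise a DOMException (INVALID_CHARACTER_ERR) for an illegal name. The following is a minimal sketch under that assumption. The class and method names (DomNameCheck, isValidElementName) are made up for illustration, and the exact character tables depend on the parser implementation the JDK ships.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of Xerces or the JDK.
final class DomNameCheck {

    // Returns true if the DOM implementation accepts the string as an element name.
    static boolean isValidElementName(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            // Expected to throw DOMException for an illegal name.
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            // No usable parser configuration; this is not a statement about the name.
            throw new IllegalStateException(e);
        }
    }
}
```

Building a new document on every call is wasteful; if this runs in a loop, it is probably worth creating the Document once and reusing it.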

How to check if string is valid xml name

This is not really doable without parsing or, in a limited form, without a regular expression. XML names permit one set of characters for the first character and a larger set for the second and subsequent characters; see the Name production.

Should you implement IsValidXmlChar without a context, i.e. just checking if the given character is a NameChar, as per the XML specification, the output of your example would be GridAttributeStuff.

So you should at least tokenize the input text to retrieve valid names, and parse the input to retrieve element names, i.e. output Grid in your example.

To check if a string is an XML name, the XmlReader class offers the IsName static method. To categorize characters in an XML text, there is the XmlCharType struct in .NET Framework as well as in .NET Core, but it's internal.

Efficient way to determine if a string is a legal XML element name

XmlConvert has a static string VerifyName(string name) method, but it throws an exception for invalid names.

I would still prefer to use this:

try
{
    XmlConvert.VerifyName(name);
    return true;
}
catch
{
    return false;
}

Why is it invalid to have ( or ) characters in an XML Element Name?

The answer you found lists the characters reserved in the text of an XML document, i.e. the contents of elements and the values of attributes. However, your example uses punctuation within the name of an element, which is subject to stricter limits.

The full list of allowed characters can be found in the XML specification; note that the first character of the name is even further restricted. (XML 1.1 expands the list of allowed characters slightly to reflect evolution of the Unicode standard.) The main thing to notice is that most of the common ASCII punctuation (which would have Unicode code points below #x7f) is excluded.

It is common practice to use only names which begin with a letter, and proceed with letters, digits, underscores and hyphens, but a well-written XML parser should handle a wider range of Unicode characters should you wish to use them.

Names beginning with "xml" (in any combination of upper and lower case) are specially reserved, and names containing colons will be interpreted as using namespaces, so those should also be avoided.

Note that there is no escape mechanism for these restricted characters; you just have to design your format not to need them.
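
The conservative convention described in this answer (a letter first, then letters, digits, underscores and hyphens, no "xml" prefix, no colon) can be sketched in Java roughly as follows. ConservativeXmlName and isSafeName are hypothetical names, and the pattern is deliberately ASCII-only, so it rejects many names that the XML specification would accept.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "common practice" subset of XML names.
final class ConservativeXmlName {

    // Letter first, then letters, digits, underscores or hyphens (ASCII only).
    // A colon is not in the class, so namespace-style names are rejected too.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z][A-Za-z0-9_-]*");

    static boolean isSafeName(String name) {
        if (name == null || !SAFE.matcher(name).matches()) {
            return false;
        }
        // Names starting with "xml" in any letter case are reserved.
        return !name.regionMatches(true, 0, "xml", 0, 3);
    }
}
```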

What would be a regex for valid xml names?

Do you mean XML element names? If so, no, that's too exclusive; there are lots of valid characters that it doesn't cover. The spec defines names as follows:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                  [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] |
                  [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                  [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] |
                  [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
                  [#x0300-#x036F] | [#x203F-#x2040]

Name          ::= NameStartChar (NameChar)*
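
As an illustration, the productions above can be transcribed more or less directly into a Java regular expression, since java.util.regex accepts code points written as \x{...}. This is only a sketch: XmlNameCheck and isXmlName are made-up names, and the pattern checks the Name production only, so it neither rejects colons nor the reserved "xml" prefix.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a direct transcription of the Name production.
final class XmlNameCheck {

    // Character ranges of NameStartChar, written for use inside [...].
    private static final String NAME_START_CHAR =
            ":A-Z_a-z\\x{C0}-\\x{D6}\\x{D8}-\\x{F6}\\x{F8}-\\x{2FF}"
            + "\\x{370}-\\x{37D}\\x{37F}-\\x{1FFF}\\x{200C}-\\x{200D}"
            + "\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}"
            + "\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}";

    // NameChar adds "-", ".", digits, #xB7 and two more ranges.
    private static final String NAME_CHAR =
            NAME_START_CHAR + "\\-.0-9\\x{B7}\\x{300}-\\x{36F}\\x{203F}-\\x{2040}";

    // Name ::= NameStartChar (NameChar)*
    private static final Pattern NAME =
            Pattern.compile("[" + NAME_START_CHAR + "][" + NAME_CHAR + "]*");

    static boolean isXmlName(String s) {
        // matches() anchors at both ends, so the whole string must be a Name.
        return s != null && NAME.matcher(s).matches();
    }
}
```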

How to write a regex expression to check a valid XML element NCName in javascript?

Your ^([_]|[a-zA-Z]+[\w\W])$ pattern matches a string that is either equal to _ ([_]) or (|) is formed of 1+ letters ([a-zA-Z]+) followed by any single char ([\w\W]). So, it cannot validate the strings of the type you mention.

You may use

/^[a-zA-Z_][\w.-]*$/

Details

  • ^ - start of string
  • [a-zA-Z_] - a letter or _
  • [\w.-]* - 0 or more letters, digits, underscores, dots or hyphens
  • $ - end of string

How to check if a long string is a valid XML?

I've seen this before; the problem is that XmlDocument tries to download the DTD for the document. In your sample this is http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, which lets you open a connection but never returns anything. So a simple solution (without any error checking, mind you) is to remove everything before the <html> tag, like this:

WebClient wc = new WebClient();
wc.Encoding = Encoding.UTF8;
string data = wc.DownloadString("http://1pezeshk.com/");
data = data.Remove(0, data.IndexOf("<html"));
XmlDocument xml = new XmlDocument();
xml.LoadXml(data);

Edit

Browsing to http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd actually returns the DTD, but it took well over a minute to respond. Since you won't be doing DTD validation anyway, you should really just strip this from your HTML and then try to validate it as HTML.

How to generate valid XML element name from String value in Java

Here is one possible algorithm.

  1. Initialize the result buffer to "_"

  2. For every (Java 16-bit) character in the input:

(2a) If the character is a valid name character other than underscore, append it to the buffer

(2b) Otherwise, append _HHHH to the buffer where HHHH is the character code in hexadecimal.

This algorithm generates a unique name for every input string and is reversible so you can reconstruct the input string from the generated name.
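
A minimal Java sketch of the steps above might look like this. XmlNameEncoder, encode, decode and isPlainNameChar are hypothetical names. Two choices here are assumptions rather than part of the algorithm as stated: the "valid name character" test follows the NameChar ranges of XML 1.0 (Fifth Edition) for 16-bit characters, and the colon is escaped as well, so the result does not look like a namespace-prefixed name.

```java
// Hypothetical helper implementing the escape scheme described above.
final class XmlNameEncoder {

    // True for 16-bit NameChars that are copied through unchanged.
    // Underscore (the escape marker) and colon (an added assumption) are excluded.
    static boolean isPlainNameChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == 0xB7
                || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
                || (c >= 0xF8 && c <= 0x37D)   // #xF8-#x2FF, #x300-#x36F, #x370-#x37D
                || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
                || (c >= 0x203F && c <= 0x2040) || (c >= 0x2070 && c <= 0x218F)
                || (c >= 0x2C00 && c <= 0x2FEF) || (c >= 0x3001 && c <= 0xD7FF)
                || (c >= 0xF900 && c <= 0xFDCF) || (c >= 0xFDF0 && c <= 0xFFFD);
    }

    // Step 1: start with "_". Step 2: copy plain characters, escape the rest as _HHHH.
    static String encode(String input) {
        StringBuilder sb = new StringBuilder("_");
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isPlainNameChar(c)) {
                sb.append(c);
            } else {
                sb.append('_').append(String.format("%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses encode(); assumes the argument was produced by encode().
    static String decode(String name) {
        StringBuilder sb = new StringBuilder();
        int i = 1; // skip the leading "_"
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == '_') {
                sb.append((char) Integer.parseInt(name.substring(i + 1, i + 5), 16));
                i += 5;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

For example, encode("first name") yields "_first_0020name". Because the underscore itself is always escaped, every underscore after the first one in the output introduces a four-digit code, which is what makes decoding unambiguous.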


