How to Validate That a String Doesn't Contain HTML Using C#

I just tried my XElement.Parse solution. I created an extension method on the string class so I can reuse the code easily:

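using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StringExtensions
{
    public static bool ContainsXHTML(this string input)
    {
        try
        {
            // Wrap the input so that plain text and fragments parse as one element.
            XElement x = XElement.Parse("<wrapper>" + input + "</wrapper>");
            // Any node other than plain text (element, comment, CDATA, ...) counts as markup.
            // Using Any also means an empty input is not reported as HTML.
            return x.DescendantNodes().Any(n => n.NodeType != XmlNodeType.Text);
        }
        catch (XmlException)
        {
            // Malformed markup, or a bare & or <, fails to parse.
            return true;
        }
    }
}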

One problem I found is that a plain-text ampersand or less-than character causes an XmlException, so the method reports that the field contains HTML, which is wrong. To fix this, the ampersands and less-than characters in the input string first need to be converted to their equivalent XHTML entities. I wrote another extension method to do that:

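// Goes in a static class, alongside ContainsXHTML.
public static string ConvertXHTMLEntities(this string input)
{
    // Convert all ampersands to the ampersand entity. Existing &amp; entities
    // are swapped for a placeholder first so they are not escaped twice.
    string output = input;
    output = output.Replace("&amp;", "amp_token");
    output = output.Replace("&", "&amp;");
    output = output.Replace("amp_token", "&amp;");

    // Convert less than to the less than entity (without messing up tags).
    // Only a "<" followed by a space is treated as plain text.
    output = output.Replace("< ", "&lt; ");
    return output;
}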

Now I can take a user submitted string and check that it doesn't contain HTML using the following code:

bool ContainsHTML = UserEnteredString.ConvertXHTMLEntities().ContainsXHTML();

I'm not sure if this is bulletproof, but I think it's good enough for my situation.

How to check if a string contains HTML code in ASP.NET MVC model validation?

This is the regex for that:

<(\s*[(\/?)\w+]*)

It matches as soon as the string contains even a single opening tag or a single closing tag.
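
As a rough sketch of how that pattern could be called from C#, something like the following should work. The class and method names (HtmlTagCheck, LooksLikeHtml) are made up for illustration and are not part of ASP.NET MVC; only the pattern comes from the answer.

```csharp
using System.Text.RegularExpressions;

public static class HtmlTagCheck
{
    // The pattern from the answer above, unchanged.
    private static readonly Regex TagLike = new Regex(@"<(\s*[(\/?)\w+]*)");

    // Hypothetical helper: true when the input contains anything
    // the pattern treats as the start of a tag.
    public static bool LooksLikeHtml(string input)
    {
        return !string.IsNullOrEmpty(input) && TagLike.IsMatch(input);
    }
}
```

With this sketch, "<p>hi</p>" and "</p>" are flagged and "plain text" is not. Everything after the < is optional in the pattern, so a lone < as in "1 < 2" is flagged as well, which may be stricter than you want.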

Fail validation if text contains HTML

You have a few options.

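  1. Use When or Unless.
  2. Change your regex so that it matches only text that contains no HTML.
  3. Pass in a lambda to Must, as in the following rule.

    // The null check avoids an ArgumentNullException from Regex.IsMatch.
    RuleFor(x => x.CodeDescription)
        .Must(x => x == null || !Regex.IsMatch(x, ValidatorUtility.Contains_Html_Regex));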

How to check if string has a correct html syntax

You can use Html Agility Pack: http://html-agility-pack.net/?z=codeplex

using System.Linq;
using HtmlAgilityPack;

string html = "<span>Hello world</sspan>";

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// ParseErrors lists the problems the parser found while loading.
if (doc.ParseErrors.Any())
{
    // Invalid HTML
}

Check if a string is html or not

A better regex to use to check if a string is HTML is:

/^/

For example:

/^/.test('') // true
/^/.test('foo bar baz') //true
/^/.test('<p>fizz buzz</p>') //true

In fact, it's so good, that it'll return true for every string passed to it, which is because every string is HTML. Seriously, even if it's poorly formatted or invalid, it's still HTML.

If what you're looking for is the presence of HTML elements, rather than simply any text content, you could use something along the lines of:

/<\/?[a-z][\s\S]*>/i.test()

It won't help you parse the HTML in any way, but it will certainly flag the string as containing HTML elements.
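
Since most of this page is about C#, here is a minimal sketch of the same check ported to .NET. HtmlElementCheck and ContainsElement are hypothetical names, and RegexOptions.IgnoreCase stands in for the /i flag.

```csharp
using System.Text.RegularExpressions;

public static class HtmlElementCheck
{
    // Same pattern as the JavaScript version; IgnoreCase replaces the /i flag.
    private static readonly Regex ElementLike =
        new Regex(@"</?[a-z][\s\S]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the input contains something
    // shaped like an opening or closing element.
    public static bool ContainsElement(string input)
    {
        return input != null && ElementLike.IsMatch(input);
    }
}
```

Here "foo bar baz" and "a < b" are not flagged, while "<p>fizz buzz</p>" is. Text such as "x<y and z>w" is flagged too, because "<y and z>" has the shape of a tag, so treat the result as a hint that markup may be present.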

Validate that a string contains some exact words

Since you need to find the words in any order, you can use the following pattern:

string pattern = @"^(?=.*\bMaster\b)(?=.*Language=""C#"").+$";

This uses two positive lookaheads to check for the existence of Master and Language="C#". Notice the use of the word-boundary metacharacter, \b, which ensures that "Master" is matched only as a whole word, so a partial match in "MasterPage" won't occur.

Example:

string[] inputs =
{
    "Master Language=\"C#\" MasterPageFile=\"~/masterpages/Libraries.master\"", // true
    "Language=\"C#\" MasterPageFile=\"~/masterpages/Libraries.master\" Master", // true
    "Language=\"C#\" MasterPageFile=\"~/masterpages/Libraries.master\"" // false
};

string pattern = @"^(?=.*\bMaster\b)(?=.*Language=""C#"").+$";

foreach (var input in inputs)
{
    Console.WriteLine(Regex.IsMatch(input, pattern));
}

Regex that does not match any html tag

Regular expressions are built for "positive" matches; on their own they are a poor fit for matching "everything that is not X".

But you can match the tags positively and then throw out of the string everything the expression has found.

Edit - after the question was updated, things became a little clearer. Try this:

using System.ComponentModel.DataAnnotations;

public class MessageViewModel
{
    [Required]
    [RegularExpression(@"^(?!.*<[^>]+>).*", ErrorMessage = "No html tags allowed")]
    public string UserName { get; set; }
}

Explanation:

^            # start of string
(?!          # negative look-ahead (a position not followed by...)
  .*         #   anything
  <[^>]+>    #   something that looks like an HTML tag
)            # end look-ahead
.*           # match the remainder of the string
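
To see what the pattern accepts and rejects outside of model binding, it can be run through Regex directly. This is only a sketch: NoHtmlTagCheck and IsTagFree are hypothetical names, and the attribute itself may treat some inputs (multi-line text, for instance) differently from a bare Regex.IsMatch call.

```csharp
using System.Text.RegularExpressions;

public static class NoHtmlTagCheck
{
    // Same pattern as in the RegularExpression attribute above.
    private const string Pattern = @"^(?!.*<[^>]+>).*";

    // Hypothetical helper: true when nothing tag-shaped is found
    // on the first line of the input.
    public static bool IsTagFree(string input)
    {
        return input != null && Regex.IsMatch(input, Pattern);
    }
}
```

"hello" and "a < b" pass, and "hi <b>there</b>" is rejected. "1 < 2 and 3 > 2" is rejected as well, because "< 2 and 3 >" looks like a tag to the look-ahead.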

Regular Expression to check if a string has HTML code

It should be something like this (note that this pattern only looks for <a> elements):

var pattern:RegExp = /<a\s.*?<\/a>/;
var index = str.search(pattern);
if (index != -1)
{
    // we have a match
}

How to check if string contents have any HTML in it?

If you want to test whether a string contains a "<something>" (which is lazy, but may be enough for you), you can try something like this:

function is_html($string)
{
    // preg_match returns 1 when the pattern matches, 0 when it does not.
    return preg_match("/<[^<]+>/", $string) === 1;
}

How to check whether a string is a valid HTTP URL?

Try this to validate HTTP URLs (uriName is the URI you want to test):

Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult)
    && uriResult.Scheme == Uri.UriSchemeHttp;

Or, if you want to accept both HTTP and HTTPS URLs as valid (per J0e3gan's comment):

Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult)
    && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);
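
If you need the check in several places, it can be wrapped in a small helper. This is a sketch only; UrlCheck and IsHttpUrl are names invented for this example.

```csharp
using System;

public static class UrlCheck
{
    // Hypothetical helper: true only for absolute URIs whose scheme is http or https.
    public static bool IsHttpUrl(string candidate)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

With this helper, "http://example.com" and "https://example.com/a?b=1" are accepted, while "ftp://example.com" (wrong scheme) and "example.com" (not an absolute URI) are rejected.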

