How to Get All Input Elements in a Form with HtmlAgilityPack Without Getting a Null Reference Error

You can do the following:

// Parse <form> as a normal container element; call this before loading.
HtmlNode.ElementsFlags.Remove("form");

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");

HtmlNode secondForm = doc.GetElementbyId("form2");

// Print the value of each input that is a direct child of the form.
foreach (HtmlNode node in secondForm.Elements("input"))
{
    HtmlAttribute valueAttribute = node.Attributes["value"];

    if (valueAttribute != null)
    {
        Console.WriteLine(valueAttribute.Value);
    }
}

By default, HTML Agility Pack parses a form as an empty node, because form tags are allowed to overlap other HTML elements. The first line, HtmlNode.ElementsFlags.Remove("form");, disables this behavior so that you can get the input elements inside the second form.

Update:
Example of overlapping form elements:

<table>
<form>
<!-- Other elements -->
</table>
</form>

The form element begins inside the table but is closed outside it. Markup like this appears in real pages and browsers tolerate it, so HTML Agility Pack has to deal with it.
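Since the question is about avoiding the null reference error, here is a minimal sketch of a more defensive version. It assumes the HTML is already available as a string; the class FormInputReader and the method GetInputValues are hypothetical names used only for illustration. It checks the result of GetElementbyId for null, and it uses Descendants instead of Elements so that inputs nested inside other elements of the form are also found.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FormInputReader
{
    // Hypothetical helper: returns the value of every <input> under the form
    // with the given id, or an empty list when the form is not found.
    public static List<string> GetInputValues(string html, string formId)
    {
        // Must run before the document is parsed; the setting is static (global).
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        HtmlNode form = doc.GetElementbyId(formId);
        if (form == null)
        {
            return values; // no such form: nothing to dereference
        }

        // Descendants() also finds inputs nested in other elements;
        // Elements() would only return direct children of the form.
        foreach (HtmlNode input in form.Descendants("input"))
        {
            // GetAttributeValue returns the fallback when the attribute is missing.
            values.Add(input.GetAttributeValue("value", string.Empty));
        }
        return values;
    }

    public static void Main()
    {
        const string html =
            "<html><body><form id='form2'>" +
            "<div><input name='a' value='1'></div>" +
            "<input name='b'>" +
            "</form></body></html>";

        foreach (string value in GetInputValues(html, "form2"))
        {
            Console.WriteLine("value = '" + value + "'");
        }
    }
}
```

Returning an empty list for a missing form is only one possible design; throwing a descriptive exception may suit you better if a missing form means the page layout has changed.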

How to get form input with HtmlAgilityPack without getting a null reference error?

Call HtmlNode.ElementsFlags.Remove("form") before you load the document. The following works fine:

public static void Main()
{
    HtmlNode.ElementsFlags.Remove("form");

    var doc = new HtmlDocument();
    doc.Load("HtmlPage1.html");

    HtmlNode formNode = doc.DocumentNode.SelectNodes("//form")[0];
    foreach (HtmlNode innode in formNode.Elements("input"))
    {
        Console.WriteLine(innode.OuterHtml);
    }

    Console.WriteLine("Press Enter to exit...");
    Console.ReadLine();
}

Get entire form element as string using Html Agility Pack

Seems you're looking for HtmlNode.OuterHtml:

//
// Summary:
// Gets or Sets the object and its content in HTML.
public virtual string OuterHtml { get; }

So you just have to select your form node and get its OuterHtml property:

HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;

UPDATE

It seems there's a very old bug with how HAP treats form tags. Or maybe it's a feature!

In any case, here's a workaround:

HtmlNode.ElementsFlags.Remove("form");

So this should work:

HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;
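Note that SelectSingleNode returns null when the XPath expression matches nothing, so reading OuterHtml directly can still throw. A rough sketch of a guarded version follows; FormHtmlReader and GetFormHtml are hypothetical names, and the inline HTML string is made up for the example.

```csharp
using System;
using HtmlAgilityPack;

public static class FormHtmlReader
{
    // Hypothetical helper: returns the form's markup, or null if no form matches.
    public static string GetFormHtml(string html, string formId)
    {
        // Apply the workaround before parsing.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode formNode =
            doc.DocumentNode.SelectSingleNode("//form[@id='" + formId + "']");

        // The null-conditional operator yields null instead of throwing.
        return formNode?.OuterHtml;
    }

    public static void Main()
    {
        const string html =
            "<html><body><form id='aspnetForm'><input name='q'></form></body></html>";

        string formHtml = GetFormHtml(html, "aspnetForm");
        Console.WriteLine(formHtml ?? "form not found");
    }
}
```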

HtmlAgilityPack: selecting form input subnodes not working

Solved!

        HtmlNode.ElementsFlags.Remove("form");

should be called before loading the document.

My bad :D

HTML Agility Pack parsing error

Try adding the following statement before loading the document:

HtmlNode.ElementsFlags.Remove("form");

By default, HtmlAgilityPack adds all of the form's inner elements as siblings of the form instead of as its children. The statement above changes that behaviour so that they (meaning the input tags) appear as child nodes.

Your code would look like this:

HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(@"C:\sample.html"));
HtmlNode nd = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
etc...

references:

  1. bug issue & fix: http://htmlagilitypack.codeplex.com/workitem/23074
  2. codeplex forum post: http://htmlagilitypack.codeplex.com/discussions/247206

PowerShell 2.0 - Using HtmlAgilityPack to get children of FORM elements

I'm not a PowerShell user, but according to this blog post, you may want to try something like this. ElementsFlags is a static member of HtmlNode, so it is accessed with :: on the type:

[HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")

