How to Download HTML Source in C#

How can I download HTML source in C#?

You can download files with the WebClient class:

using System.Net;

using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
    // Save the page to a local file
    client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

    // Or get the content as a string without saving it
    string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

Quick Download HTML Source in C#

I know that this is dated, but I think I found the cause; I have run into it on other sites. If you look at the response cookies, you will find one named ak_bmsc. That cookie shows that the site is running the Akamai Bot Manager, which offers bot protection and therefore blocks requests that 'look' suspicious.

In order to get a quick response from the host, you need the right request settings. In this case:

  • Headers:
    • Host: (their host data) www.faa.gov
    • Accept: (something like:) */*
  • Cookies:
    • AkamaiEdge = true

Example:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    // Shared for the lifetime of the application
    private static readonly HttpClient _client = new HttpClient();
    private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

    static async Task Main(string[] args)
    {
        var sw = Stopwatch.StartNew();
        using (var request = new HttpRequestMessage(HttpMethod.Get, _url))
        {
            request.Headers.Add("Host", "www.faa.gov");
            request.Headers.Add("Accept", "*/*");
            request.Headers.Add("Cookie", "AkamaiEdge=true");
            // Prints the response status and headers, not the body
            Console.WriteLine(await _client.SendAsync(request));
        }
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
    }
}

Takes 896 ms for me.

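By the way, you shouldn't put HttpClient in a using block. It is disposable, but it is designed to be created once and reused for the lifetime of the application; creating and disposing one per request can exhaust the available sockets under load.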
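The advice about reusing HttpClient can be sketched as follows. This is a minimal illustration, not the original poster's code: the class name HtmlDownloader, the method GetHtmlAsync, and the 30-second timeout are assumptions chosen for the example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: the names and the timeout value are illustrative only.
public static class HtmlDownloader
{
    // One shared instance, created once and never disposed per request.
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Downloads the page at the given URL and returns its HTML as a string.
    public static async Task<string> GetHtmlAsync(string url)
    {
        // The response is disposed after each call; the client is not.
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From an async method it could be called as: string html = await HtmlDownloader.GetHtmlAsync("https://www.faa.gov/");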

How to Get HTML Page Source in C#

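Here is one way:

string url = "https://www.digikala.com/";
string result; // declared outside the using blocks so it can be used afterwards

using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = client.GetAsync(url).Result)
using (HttpContent content = response.Content)
{
    // .Result blocks the calling thread; prefer await inside an async method
    result = content.ReadAsStringAsync().Result;
}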

The result variable will contain the page as HTML. You can then save it to a file like this:

System.IO.File.WriteAllText("path/filename.html", result);

Note: you have to import the namespace

using System.Net.Http;

Update: if you are using a legacy version of Visual Studio, you can see this answer for using WebClient and WebRequest for the same purpose, but updating Visual Studio is the better solution.

Download and save image from HTML source

Found the answer. The request has to carry the cookie container obtained from the website.

// Requires: using System.IO; using System.Net;
public static Stream DownloadImageData(CookieContainer cookies, string siteURL)
{
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(siteURL);

    // Send the cookies collected from the site along with the request
    httpRequest.CookieContainer = cookies;
    httpRequest.AllowAutoRedirect = true;

    try
    {
        using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
        {
            if (httpResponse.StatusCode != HttpStatusCode.OK)
            {
                return null;
            }

            // Copy the body into memory: closing the response also closes its
            // stream, so the response stream itself cannot be returned.
            var buffer = new MemoryStream();
            using (Stream responseStream = httpResponse.GetResponseStream())
            {
                responseStream.CopyTo(buffer);
            }
            buffer.Position = 0;
            return buffer;
        }
    }
    catch (WebException)
    {
        return null;
    }
}
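The question also asks about saving the image. Once the image bytes are available as a readable Stream, writing them to disk might look like the sketch below. The class name ImageSaver and the method SaveToFile are hypothetical names introduced for illustration, and the stream passed in is assumed to still be open and positioned at the start of the data.

```csharp
using System;
using System.IO;

// Hypothetical helper: not part of the original answer.
public static class ImageSaver
{
    // Writes every remaining byte of imageData to the file at path,
    // creating the file or overwriting an existing one.
    public static void SaveToFile(Stream imageData, string path)
    {
        if (imageData == null)
        {
            throw new ArgumentNullException(nameof(imageData));
        }

        using (FileStream file = File.Create(path))
        {
            imageData.CopyTo(file);
        }
    }
}
```

The caller remains responsible for disposing the source stream after the copy.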

Downloading Entire Html of a website UWP C#

Use a WebView to get the HTML of the site (as I mentioned in this answer) with the code below. This returns the full document markup, including the JavaScript.

WebView webView = new WebView();
string siteHtML = null;

public void LoadURI()
{
    // Subscribe before navigating so the completion event cannot be missed
    webView.NavigationCompleted += webView_NavigationCompletedAsync;
    webView.Navigate(new Uri("https://www.bing.com/"));
}

private async void webView_NavigationCompletedAsync(WebView sender, WebViewNavigationCompletedEventArgs args)
{
    // Evaluate a script inside the page and return the rendered markup
    siteHtML = await webView.InvokeScriptAsync("eval", new string[] { "document.documentElement.outerHTML;" });
}

If this does not return the complete HTML, wait for some time after navigation completes and then read the HTML again.
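One way to wait and retry is a small polling loop. The sketch below is an assumption about how that could be structured, not code from the original answer: HtmlPolling, WaitForHtmlAsync, and all of their parameters are hypothetical names, and the readiness check is supplied by the caller.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: names and parameters are illustrative only.
public static class HtmlPolling
{
    // Calls readHtml up to maxAttempts times, pausing for delay between
    // attempts, and returns the first snapshot that isReady accepts.
    // If none is accepted, the last snapshot is returned (it may be incomplete).
    public static async Task<string> WaitForHtmlAsync(
        Func<Task<string>> readHtml,
        Func<string, bool> isReady,
        int maxAttempts,
        TimeSpan delay)
    {
        string html = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            html = await readHtml();
            if (html != null && isReady(html))
            {
                return html;
            }

            // No need to wait after the final attempt.
            if (attempt < maxAttempts - 1)
            {
                await Task.Delay(delay);
            }
        }
        return html;
    }
}
```

In the WebView handler above, readHtml could be an async lambda that awaits the same InvokeScriptAsync call, and isReady could check for an element that the page's scripts are expected to add.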

Fastest C# Code to Download a Web Page

public static void DownloadFile(string remoteFilename, string localFilename)
{
    // Dispose the client when the download finishes
    using (WebClient client = new WebClient())
    {
        client.DownloadFile(remoteFilename, localFilename);
    }
}

Save HTML Source Code to string in WinForms Application

You want the dynamic HTML content, and webBrowser.Document, webBrowser.DocumentText and webBrowser.DocumentStream do not give you what you need.

Here's the trick: you can always run your own JavaScript code from C#. This is how you can get the current HTML from your WebBrowser control:

webBrowser.Document.InvokeScript("eval", new string[]{"document.body.outerHTML"});

Refer to the question "How to inject Javascript in WebBrowser control?".

Update

For iframe inside your document, you can try the following:

webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"iframe\").contentWindow.document.documentElement.outerHTML"});

Another update

As your site contains a frame instead of an iframe, here is how you can get the HTML content of that frame:

webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"frame[name='mainframe']\").contentWindow.document.documentElement.outerHTML"});

Final tested and working update

querySelector does not work in the WebBrowser control. The workaround is to assign an id to your <frame> and fetch that <frame> element by its id. Here is how you can achieve that:

// Requires: using System.Linq;
HtmlElement frame = webBrowser1.Document.GetElementsByTagName("frame").Cast<HtmlElement>().FirstOrDefault(m => m.GetAttribute("name") == "mainframe");
if (frame != null)
{
    // Give the frame a unique id so the script can look it up
    frame.Id = "RandID_" + DateTime.Now.Ticks;
    string html = webBrowser1.Document.InvokeScript("eval", new string[] { "document.getElementById('" + frame.Id + "').contentWindow.document.documentElement.outerHTML" }).ToString();
    Console.WriteLine(html);
}
else
{
    MessageBox.Show("Frame not found");
}

