How can I download HTML source in C#
You can download files with the WebClient class:
    using System.Net;

    using (WebClient client = new WebClient()) // WebClient implements IDisposable
    {
        client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

        // Or you can get the file content without saving it
        string htmlCode = client.DownloadString("http://yoursite.com/page.html");
    }
Quick Download HTML Source in C#
I know that this is dated, but I think I found the cause; I've encountered it on other sites. If you look at the response cookies, you will find one named ak_bmsc. That cookie shows that the site is running Akamai Bot Manager, which offers bot protection and therefore blocks requests that 'look' suspicious.
To get a quick response from the host, you need the right request settings. In this case:
- Headers:
  - Host: www.faa.gov (their host data)
  - Accept: */* (or something similar)
- Cookies:
  - AkamaiEdge = true
Example:
    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        // One shared HttpClient for the lifetime of the program
        private static readonly HttpClient _client = new HttpClient();
        private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

        static async Task Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            using (var request = new HttpRequestMessage(HttpMethod.Get, _url))
            {
                request.Headers.Add("Host", "www.faa.gov");
                request.Headers.Add("Accept", "*/*");
                request.Headers.Add("Cookie", "AkamaiEdge=true");
                // Prints the status line and headers of the response
                Console.WriteLine(await _client.SendAsync(request));
            }
            Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        }
    }
Takes 896 ms for me.
By the way, you shouldn't put HttpClient in a using block. I know it's disposable, but it's designed to be created once and reused, not disposed after every request.
How to get HTML page source in C#
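Here is one way:
    string url = "https://www.digikala.com/";
    string result; // declared outside the using blocks so it is still in scope afterwards
    using (HttpClient client = new HttpClient())
    using (HttpResponseMessage response = client.GetAsync(url).Result)
    using (HttpContent content = response.Content)
    {
        // .Result blocks the calling thread until the download has finished
        result = content.ReadAsStringAsync().Result;
    }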
The result variable will then contain the page as HTML. You can save it to a file like this:
    System.IO.File.WriteAllText("path/filename.html", result);
Note: you need to import this namespace:
    using System.Net.Http;
Update: if you are using a legacy version of Visual Studio, you can see this answer for using WebClient and WebRequest for the same purpose, but updating Visual Studio is the better solution.
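The blocking .Result calls in the snippet above can deadlock when they run on a UI thread. The following is a minimal sketch of the same download written with async/await and one shared HttpClient. PageDownloader, DownloadHtmlAsync and SaveHtmlAsync are hypothetical names chosen for illustration, not part of the original answer.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageDownloader
{
    // One shared instance for the whole application
    // (assumption: the default HttpClient settings are sufficient).
    private static readonly HttpClient SharedClient = new HttpClient();

    // Downloads the page at 'url' and returns its body as a string.
    // A caller may pass its own client; otherwise the shared one is used.
    public static async Task<string> DownloadHtmlAsync(string url, HttpClient client = null)
    {
        client = client ?? SharedClient;
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Downloads the page and writes it to the file at 'path'.
    public static async Task SaveHtmlAsync(string url, string path, HttpClient client = null)
    {
        string html = await DownloadHtmlAsync(url, client);
        File.WriteAllText(path, html);
    }
}
```

From an async method this would be called as `string html = await PageDownloader.DownloadHtmlAsync("https://www.digikala.com/");`.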
Download and save image from HTML source
Found the answer: you have to attach the website's cookie container to your request.
    public static Stream DownloadImageData(CookieContainer cookies, string siteURL)
    {
        var httpRequest = (HttpWebRequest)WebRequest.Create(siteURL);
        httpRequest.CookieContainer = cookies;
        httpRequest.AllowAutoRedirect = true;
        try
        {
            using (var httpResponse = (HttpWebResponse)httpRequest.GetResponse())
            {
                if (httpResponse.StatusCode != HttpStatusCode.OK)
                {
                    return null;
                }

                // Closing the response also closes its stream, so copy the
                // data into memory before the response is disposed.
                var buffer = new MemoryStream();
                using (var responseStream = httpResponse.GetResponseStream())
                {
                    responseStream.CopyTo(buffer);
                }
                buffer.Position = 0;
                return buffer;
            }
        }
        catch (WebException)
        {
            return null;
        }
    }
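The method above only returns the image data; to finish the "download and save" task, the stream still has to be written to disk. A minimal sketch of that step follows, where ImageSaver and SaveStreamToFile are hypothetical names that do not appear in the original answer.

```csharp
using System;
using System.IO;

public static class ImageSaver
{
    // Copies everything readable from 'data' into a new file at 'path'
    // and returns the number of bytes written.
    public static long SaveStreamToFile(Stream data, string path)
    {
        if (data == null)
        {
            throw new ArgumentNullException(nameof(data));
        }

        using (FileStream file = File.Create(path))
        {
            data.CopyTo(file);
            // The file was newly created, so the position equals the bytes written
            return file.Position;
        }
    }
}
```

The stream has to be open and readable when it is passed in, so the response that owns it must not have been closed yet. A call might look like `ImageSaver.SaveStreamToFile(imageStream, @"C:\image.jpg");`, where the variable and the path are placeholders.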
Downloading Entire Html of a website UWP C#
Use a WebView to get the HTML of the site (as I mentioned in this answer) with the code below. This returns the full rendered markup, including any JavaScript in the page.
    WebView webView = new WebView();
    string siteHtML = null;

    public void LoadURI()
    {
        // Subscribe before navigating so the completion event cannot be missed
        webView.NavigationCompleted += webView_NavigationCompletedAsync;
        webView.Navigate(new Uri("https://www.bing.com/"));
    }

    private async void webView_NavigationCompletedAsync(WebView sender, WebViewNavigationCompletedEventArgs args)
    {
        // Evaluate a script inside the page and read back the whole document
        siteHtML = await webView.InvokeScriptAsync("eval", new string[] { "document.documentElement.outerHTML;" });
    }
If the HTML is not available yet, wait for a short time and then try to read it again.
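One way to wait and then try again is a small retry loop around the call that reads the HTML. The sketch below is generic rather than WebView-specific: HtmlRetry, RetryUntilNonEmptyAsync and its parameters are assumptions made for illustration, and the caller supplies the delegate that actually reads the markup.

```csharp
using System;
using System.Threading.Tasks;

public static class HtmlRetry
{
    // Calls 'readHtml' up to 'maxAttempts' times, waiting 'delay' between
    // attempts, and returns the first non-empty result.
    // Returns null if every attempt came back empty.
    public static async Task<string> RetryUntilNonEmptyAsync(
        Func<Task<string>> readHtml, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string html = await readHtml();
            if (!string.IsNullOrEmpty(html))
            {
                return html;
            }

            if (attempt < maxAttempts)
            {
                // Give the page some time before reading it again
                await Task.Delay(delay);
            }
        }
        return null;
    }
}
```

Inside the NavigationCompleted handler the delegate could wrap the script call, for example `async () => await webView.InvokeScriptAsync("eval", new string[] { "document.documentElement.outerHTML;" })`, with an attempt count and delay picked to suit the page.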
Fastest C# Code to Download a Web Page
    public static void DownloadFile(string remoteFilename, string localFilename)
    {
        // Dispose the client once the download has finished
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(remoteFilename, localFilename);
        }
    }
Save HTML Source Code to string in WinForms Application
You want the dynamic HTML content, and webBrowser.Document, webBrowser.DocumentText and webBrowser.DocumentStream are not giving you what you need.
Here's the trick: you can always run your own JavaScript code from C#. This is how you can get the current HTML in your WebBrowser control:
    webBrowser.Document.InvokeScript("eval", new string[]{"document.body.outerHTML"});
Refer to How to inject Javascript in WebBrowser control?.
Update
For an iframe inside your document, you can try the following:
    webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"iframe\").contentWindow.document.documentElement.outerHTML"});
Another update
As your site contains a frame instead of an iframe, here is how you can get the HTML content of that frame:
    webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"frame[name='mainframe']\").contentWindow.document.documentElement.outerHTML"});
Final tested and working update
querySelector is not working in the WebBrowser control, so the workaround is to assign an id to your <frame> and fetch that <frame> element by its id. Here is how you can achieve this:
    // Requires "using System.Linq;" for Cast and FirstOrDefault
    HtmlElement frame = webBrowser1.Document.GetElementsByTagName("frame").Cast<HtmlElement>().FirstOrDefault(m => m.GetAttribute("name") == "mainframe");
    if (frame != null)
    {
        // Give the frame a unique id so that the script can look it up
        frame.Id = "RandID_" + DateTime.Now.Ticks;
        string html = webBrowser1.Document.InvokeScript("eval", new string[] { "document.getElementById('" + frame.Id + "').contentWindow.document.documentElement.outerHTML" }).ToString();
        Console.WriteLine(html);
    }
    else
    {
        MessageBox.Show("Frame not found");
    }