How can I use WebClient.DownloadString from a secure URL (https)?
If you look at the headers in Fiddler, the response is GZip-encoded (compressed). See this answer for how to deal with this, since there's no "quick and easy" way with the WebClient class.
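One common workaround, sketched here as an assumption rather than as the linked answer's exact code, is to subclass WebClient and switch on automatic decompression for the underlying HttpWebRequest. The class name GZipWebClient and the helper EnableDecompression are made up for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper class: a WebClient that transparently decompresses
// GZip/Deflate responses instead of returning the raw compressed bytes.
public class GZipWebClient : WebClient
{
    // Turns on automatic decompression when the request is an HTTP(S) request.
    public static WebRequest EnableDecompression(WebRequest request)
    {
        if (request is HttpWebRequest http)
        {
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }

    // WebClient calls this for every request it creates.
    protected override WebRequest GetWebRequest(Uri address)
    {
        return EnableDecompression(base.GetWebRequest(address));
    }
}
```

With that in place, `new GZipWebClient().DownloadString(url)` should return readable text for a compressed response, where `url` is whatever https address you are fetching.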
Can't download webpage via C# webclient and via request/respond
Some pages load in stages: first they load the core of the page, and only then do they run the JavaScript that loads further content via AJAX. To scrape such pages you will need a more advanced scraping library than a simple HTTP request sender.
EDIT:
Here is a question on SO about the same problem you are having now:
Jquery Ajax Web page scraping using c#
Get html that is generated via AJAX in webclient
The general approach is this:
1. Using a tool like Fiddler, find out which HTTP requests the browser makes in order to fetch the data you're looking for.
2. Use WebClient to issue the same HTTP request(s) yourself.
Take a look at my answer to this question for more details about HTML screen scraping and how to work around various issues you may run across.
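As a rough sketch of the second step, the code below replays an AJAX call with WebClient. The AjaxFetcher class, the header values, and any URL you pass in are assumptions for illustration; the real values come from what you see in Fiddler.

```csharp
using System;
using System.Net;

// Hypothetical helper for replaying a browser's AJAX request.
public static class AjaxFetcher
{
    // Builds a WebClient with headers that many AJAX endpoints expect.
    // Which headers are really needed depends on the site (assumption).
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        client.Headers["X-Requested-With"] = "XMLHttpRequest";
        client.Headers[HttpRequestHeader.Accept] = "application/json, text/javascript, */*";
        return client;
    }

    // Downloads the response body of the data request found in Fiddler.
    public static string Fetch(string dataUrl)
    {
        using (WebClient client = CreateClient())
        {
            return client.DownloadString(dataUrl);
        }
    }
}
```

Calling `AjaxFetcher.Fetch(dataUrl)` with the URL of the data request should then return the same payload the browser received, provided the server does not require cookies or other headers.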
For #1 above, here's how to use Fiddler to understand how a specific request is being made:
First, find the request you care about (the request whose response contains the data you want). You can do this by double-clicking each request in the left pane in Fiddler and looking inside the "text view" tab in the lower-right pane. You can also use CTRL+F to find content across multiple requests, but some responses are compressed, so make sure the "autodecode" button is selected in the toolbar before making your requests if you want to be able to text-search across all of them.
Once you've found the request you want, double-click it in Fiddler and select the "headers" tab in the upper-right pane. Those are the headers being sent. If your client sends exactly these headers to the server, you should get back the same data. Usually not all of the headers are needed, though, so you'll want to figure out which ones are. You do this using Fiddler's Request Builder tab in the upper-right pane. Select that tab and drag your data request over from the left pane onto the request builder. Then submit the request to validate that it returns the correct results. Next, delete headers one at a time and resubmit. When the request stops working, you know the header you just removed was required, so put it back. Repeat this for each header until only the required ones remain.
Then, you'll need to write code to generate the right headers. Don't worry about the Host: header; that's generated automatically for you. For the Cookie: header, you'll need to generate it using the CookieContainer class. For the other headers (e.g. User-Agent:, Accept:, etc.) you can generally copy them and add them to your request as-is.
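A minimal sketch of that header setup follows. WebClient has no built-in CookieContainer property, so this example assumes a small subclass that attaches one to each request; the names CookieAwareWebClient and ScrapeSetup, the cookie name and value, and the header strings are all placeholders, not values from any real site.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass that sends cookies from a CookieContainer.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest http)
        {
            // The Cookie: header is generated from this container.
            http.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class ScrapeSetup
{
    // Copies the headers found in Fiddler and registers one cookie for the site.
    public static CookieAwareWebClient CreateClient(Uri site, string cookieName, string cookieValue)
    {
        var client = new CookieAwareWebClient();
        // Placeholder values: paste the ones you found to be required.
        client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (compatible; ExampleScraper/1.0)";
        client.Headers[HttpRequestHeader.Accept] = "text/html,application/json";
        client.Cookies.Add(site, new Cookie(cookieName, cookieValue));
        return client;
    }
}
```

Note that Host: is never set here, since the framework fills it in from the request URL.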