DownloadString and Special Characters
DownloadString doesn't look at HTTP response headers. It uses the previously set WebClient.Encoding property. If you have to use it, get the headers first:
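using System.Net;
using System.Text;
using System.Text.RegularExpressions;

using var webClient = new WebClient();
// First call: only needed to read the response headers
// (or just send a HEAD request, see http://stackoverflow.com/questions/3268926/head-with-webclient)
webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
var contentType = webClient.ResponseHeaders["Content-Type"];
// e.g. "text/html; charset=UTF-8" -> "UTF-8"
var charset = Regex.Match(contentType, "charset=([^;]+)").Groups[1].Value;
webClient.Encoding = Encoding.GetEncoding(charset);
// Second call: decoded with the charset the server declared
var s = webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");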
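The regex above assumes the Content-Type header is present and that the charset value is unquoted. A slightly more defensive sketch of the same idea follows. CharsetHelper and FromContentType are hypothetical names, and falling back to a caller-supplied encoding is an assumption of this sketch, not something WebClient does on its own.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: picks an Encoding from a Content-Type header value.
public static class CharsetHelper
{
    // Extracts the charset parameter from values such as
    // "text/html; charset=UTF-8" or "text/html; charset=\"utf-8\"".
    // Returns the fallback when the header is missing, has no charset,
    // or names an encoding this runtime doesn't support.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
            return fallback;

        // Optional whitespace around '=', optional opening quote,
        // then capture everything up to a quote, ';' or whitespace.
        var match = Regex.Match(contentType, @"charset\s*=\s*""?([^"";\s]+)", RegexOptions.IgnoreCase);
        if (!match.Success)
            return fallback;

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }
}
```

After the first request you could then write `webClient.Encoding = CharsetHelper.FromContentType(webClient.ResponseHeaders?["Content-Type"], Encoding.UTF8);`. Defaulting to UTF-8 is only a guess; pick whatever the service you are calling is most likely to send.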
By the way, Unicode doesn't define "foreign" characters. From Maurício's perspective, "Mauricio" would be the foreign spelling of his name.
WebClient.DownloadString() returns string with peculiar characters

ï»¿ is the windows-1252 representation of the octets EF BB BF. That's the UTF-8 byte-order mark, which implies that your remote web page is encoded in UTF-8 but you're reading it as if it were windows-1252. According to the docs, WebClient.DownloadString uses WebClient.Encoding as its encoding when it converts the remote resource into a string. Set it to System.Text.Encoding.UTF8 and things should theoretically work.
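To see where those three characters come from, here is a small offline sketch (assuming C# 8 / .NET Core 3.0 or later for `using var`). It uses ISO-8859-1 as a stand-in for windows-1252: the bytes involved here (EF BB BF and C3 AD) decode to the same characters in both, and windows-1252 needs the extra CodePagesEncodingProvider on .NET Core. BomDemo and its members are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: the same UTF-8 bytes decoded two different ways.
public static class BomDemo
{
    // What a UTF-8 server with a BOM might send for "Maurício".
    public static readonly byte[] Payload =
    {
        0xEF, 0xBB, 0xBF,                                     // UTF-8 byte-order mark
        0x4D, 0x61, 0x75, 0x72, 0xC3, 0xAD, 0x63, 0x69, 0x6F  // "Maurício" ('í' is C3 AD)
    };

    // Wrong: treats every byte as one character, so the BOM shows up as "ï»¿"
    // and the two-byte 'í' turns into "Ã" plus a soft hyphen.
    public static string DecodeAsLatin1(byte[] data) =>
        Encoding.GetEncoding("iso-8859-1").GetString(data);

    // Right: decode as UTF-8; StreamReader recognises and skips the BOM.
    public static string DecodeAsUtf8(byte[] data)
    {
        using var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
        return reader.ReadToEnd();
    }
}
```

Decoding the same bytes with Encoding.UTF8.GetString also yields "Maurício", but with a leading U+FEFF character still attached; StreamReader is used here because it consumes the BOM.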
WebClient DownloadString response with strange characters
To get the response in JSON instead of the PHP serialize format, use api=json instead of api=serialize in the URL.
WebClient.DownloadString results in mangled characters due to encoding issues, but the browser is OK
It's not lying. Set the WebClient's Encoding property before calling DownloadString:
using (WebClient webClient = new WebClient())
{
    webClient.Encoding = Encoding.UTF8;
    string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}
As for why your alternative isn't working, the usage is incorrect. It should be:
System.Text.Encoding.UTF8.GetString(bytes)
where bytes is the byte array to decode, for example the result of webClient.DownloadData(url).
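A minimal sketch of that alternative, assuming you fetch the raw bytes with DownloadData and decode them yourself instead of setting webClient.Encoding. Utf8Download, DecodeUtf8 and DownloadUtf8 are hypothetical names, and trimming a leading BOM is an extra step the answer above doesn't mention.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: download bytes, then decode them as UTF-8.
public static class Utf8Download
{
    // GetString is an instance method on the UTF8 encoding object.
    // It keeps a leading byte-order mark as U+FEFF, so trim it here.
    public static string DecodeUtf8(byte[] data) =>
        Encoding.UTF8.GetString(data).TrimStart('\uFEFF');

    // Equivalent to setting webClient.Encoding = Encoding.UTF8 and calling
    // DownloadString, except that the decoding step is explicit.
    public static string DownloadUtf8(WebClient client, string url) =>
        DecodeUtf8(client.DownloadData(url));
}
```

Keeping the decoding in a separate method also lets you check it against known bytes without touching the network.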
Using WebClient.DownloadString() return odd characters sometimes
The server is most likely delivering the content compressed with GZip. WebClient doesn't decompress responses by default, so the compressed bytes get decoded as if they were text.
The answer to "Characters in string changed after downloading HTML from the internet" gives a replacement downloader that handles both compressed and uncompressed responses.
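This is not the code from that answer, just a sketch of the idea, assuming you have the raw body as bytes (for example from DownloadData). It recognises GZip data by its magic bytes 1F 8B and decompresses it with GZipStream before decoding. Sniffing the magic bytes is a heuristic chosen for this sketch; checking the Content-Encoding response header, or setting HttpWebRequest.AutomaticDecompression, are the more usual approaches. GzipText and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: turn a possibly GZip-compressed body into text.
public static class GzipText
{
    // Every GZip stream starts with the magic bytes 1F 8B.
    public static bool LooksGzipped(byte[] data) =>
        data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;

    // Decompresses first if the body is GZip data, then decodes it.
    public static string DecodeBody(byte[] data, Encoding encoding)
    {
        if (!LooksGzipped(data))
            return encoding.GetString(data);

        using var input = new MemoryStream(data);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, encoding);
        return reader.ReadToEnd();
    }

    // For trying this out offline: GZip-compresses a string.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            var bytes = encoding.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        } // disposing the GZipStream flushes the compressed data into output
        return output.ToArray();
    }
}
```

Passing compressed bytes straight to Encoding.UTF8.GetString is exactly what produces the "odd characters"; decompressing first gives back the original text.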
WebClient.DownloadString uses wrong encoding
Most likely the encoding WebClient is currently using is not the one the service actually returns.
You can set the encoding you expect before you make the request:
webClient.Encoding = Encoding.UTF8;
string previouslyWrongXml = webClient.DownloadString(url);
Downloading JSON with WebClient results in weird unicode-like characters?
Following @Progman's comment, the response is compressed, so enable automatic decompression on the underlying request:
using System;
using System.Net;

// WebClient subclass that asks for compressed content and
// transparently decompresses GZip/Deflate responses.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        // Only HTTP(S) requests support automatic decompression.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}

class Program
{
    static void Main()
    {
        using var webClient = new MyWebClient();
        webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");
        webClient.Headers.Add("Host", "search.snapchat.com");
        var str = webClient.DownloadString("https://search.snapchat.com/lookupStory?id=itsmaxwyatt");
        Console.WriteLine(str);
    }
}