WebClient.DownloadString() Returns String with Peculiar Characters

DownloadString and Special Characters

DownloadString doesn't look at HTTP response headers. It uses the previously set WebClient.Encoding property. If you have to use it, get the headers first:

// Call DownloadString twice: the first call is only there to fetch the response headers
// (or to just do a HEAD, see http://stackoverflow.com/questions/3268926/head-with-webclient)
webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
var contentType = webClient.ResponseHeaders["Content-Type"] ?? "";
var charset = Regex.Match(contentType, "charset=([^;]+)").Groups[1].Value.Trim().Trim('"');

// Second call: decode with the charset the server declared, if it declared one
if (charset.Length > 0)
    webClient.Encoding = Encoding.GetEncoding(charset);
var s = webClient.DownloadString("http://en.wikipedia.org/wiki/Maurício");
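If the regex feels fragile, the charset lookup can be pulled into a small helper. This is only a sketch: CharsetHelper and FromContentType are hypothetical names, and it assumes System.Net.Mime.ContentType is an acceptable parser for the header value. It falls back to a caller-supplied encoding when the header is missing, malformed, or names a charset .NET doesn't know.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class CharsetHelper
{
    // Returns the encoding named in a Content-Type header value, or the
    // fallback when the header is missing, malformed, or names an unknown charset.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentType))
            return fallback;
        try
        {
            // ContentType.CharSet is null when the header has no charset parameter.
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset) ? fallback : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return fallback; }   // header could not be parsed
        catch (ArgumentException) { return fallback; } // charset name not recognized
    }

    static void Main()
    {
        Console.WriteLine(FromContentType("text/html; charset=UTF-8", Encoding.ASCII).WebName);
        Console.WriteLine(FromContentType("text/html", Encoding.UTF8).WebName);
    }
}
```

With a WebClient you would then write something like webClient.Encoding = CharsetHelper.FromContentType(webClient.ResponseHeaders["Content-Type"], Encoding.UTF8); between the two downloads.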

BTW, Unicode doesn't define "foreign" characters. From Maurício's perspective, "Mauricio" would be the foreign spelling of his name.

WebClient.DownloadString() returns string with peculiar characters

ï»¿ is the windows-1252 representation of the octets EF BB BF. That's the UTF-8 byte-order mark, which implies that your remote web page is encoded in UTF-8 but you're reading it as if it were windows-1252. According to the docs, WebClient.DownloadString uses WebClient.Encoding as its encoding when it converts the remote resource into a string. Set it to System.Text.Encoding.UTF8 and things should theoretically work.
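You can check the byte-level claim yourself without touching the network. A rough sketch: BomDemo and CodePoints are made-up names, and it uses ISO-8859-1 as a stand-in for windows-1252 because it is available on every .NET runtime without registering a code page provider. The two encodings agree on the bytes EF, BB and BF.

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Decodes the bytes with the given encoding and lists the resulting code points.
    public static string CodePoints(byte[] bytes, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (char c in encoding.GetString(bytes))
            sb.AppendFormat("U+{0:X4} ", (int)c);
        return sb.ToString().TrimEnd();
    }

    static void Main()
    {
        byte[] bom = { 0xEF, 0xBB, 0xBF };

        // Single-byte decoding: three separate characters (the "peculiar" prefix).
        Console.WriteLine(CodePoints(bom, Encoding.GetEncoding("iso-8859-1")));

        // UTF-8 decoding: one character, the byte-order mark itself.
        Console.WriteLine(CodePoints(bom, Encoding.UTF8));
    }
}
```

Note that GetString leaves the byte-order mark in the string as U+FEFF, so if you decode raw bytes yourself you may still want to trim it.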

WebClient DownloadString response with strange characters

To get the response in JSON, instead of the PHP serialize format, use api=json instead of api=serialize in the URL.

WebClient.DownloadString results in mangled characters due to encoding issues, but the browser is OK

It's not lying. Set the WebClient's encoding before calling DownloadString.

using (WebClient webClient = new WebClient())
{
    webClient.Encoding = Encoding.UTF8;
    string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
}

As for why your alternative isn't working, it's because the usage is incorrect. It should be:

System.Text.Encoding.UTF8.GetString()
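For reference, GetString is an instance method that you call on the encoding and hand a byte array, such as the one WebClient.DownloadData returns. A minimal sketch with a hard-coded byte array standing in for the download (DecodeDemo and DecodeUtf8 are hypothetical names):

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Decodes raw response bytes (e.g. from WebClient.DownloadData) as UTF-8.
    public static string DecodeUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    static void Main()
    {
        // Stand-in for downloaded bytes: "Maurício" in UTF-8, where í is the two bytes C3 AD.
        byte[] raw = { 0x4D, 0x61, 0x75, 0x72, 0xC3, 0xAD, 0x63, 0x69, 0x6F };

        string text = DecodeUtf8(raw);
        Console.WriteLine(text.Length); // 8 characters from 9 bytes

        // Decoding the same bytes one byte per character yields 9 characters instead.
        Console.WriteLine(Encoding.GetEncoding("iso-8859-1").GetString(raw).Length);
    }
}
```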

Using WebClient.DownloadString() return odd characters sometimes

Basically the server is delivering content in a compressed (GZip) format.

The answer to "Characters in string changed after downloading HTML from the internet" gives you a replacement downloader that handles both compressed and uncompressed responses.
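To illustrate the idea (this is not the downloader from that answer, just a sketch with made-up names GzipDemo, LooksGzipped, Compress and DecodeBody): a gzip body starts with the bytes 1F 8B, so you can sniff for that and decompress before decoding. It assumes the payload is UTF-8 text, and the "download" is simulated by compressing a string in memory.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipDemo
{
    // True when the buffer starts with the gzip magic number 1F 8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Gzips a string in memory; stands in for a compressed server response.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Decompresses when the body looks gzipped, then decodes as UTF-8 (an assumption).
    public static string DecodeBody(byte[] body)
    {
        if (!LooksGzipped(body))
            return Encoding.UTF8.GetString(body);

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        byte[] body = Compress("{\"ok\":true}");
        Console.WriteLine(LooksGzipped(body));
        Console.WriteLine(DecodeBody(body));
    }
}
```

In real code you would feed DecodeBody the bytes from WebClient.DownloadData rather than a locally compressed string.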

WebClient.DownloadString uses wrong encoding

The encoding WebClient is currently using is probably not the one the service returns.

You can set the encoding you expect before you make the request:

webClient.Encoding = Encoding.UTF8;
string previouslyWrongXml = webClient.DownloadString(url);

Downloading JSON with WebClient results in weird unicode-like characters?

Following @Progman's comment, all you need to do is the following:

// Subclass WebClient so every request it creates decompresses gzip/deflate responses
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        // A direct cast fails loudly for a non-HTTP request instead of leaving request null
        var request = (HttpWebRequest)base.GetWebRequest(address);
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}

void Main()
{
    using var webClient = new MyWebClient();
    webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");
    webClient.Headers.Add("Host", "search.snapchat.com");
    var str = webClient.DownloadString("https://search.snapchat.com/lookupStory?id=itsmaxwyatt");
    Debug.WriteLine(str);
}

