CURL import character encoding problem

As Jon Skeet pointed out, it's difficult to understand your situation. However, if you only have access to the final text, you can try using iconv to change its encoding.

For example:

$text = iconv("Windows-1252","UTF-8",$text);

I had a similar issue a while ago (with Italian text and special characters) and solved it this way.

Try different combinations (UTF-8, ISO-8859-1, Windows-1252).
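
Here is a minimal sketch combining curl and iconv. The URL and the Windows-1252 source encoding are assumptions; use whatever encoding the remote page actually declares:

<?php
// Fetch the page body with curl (placeholder URL).
$ch = curl_init('http://example.com/legacy-page.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$text = curl_exec($ch);
curl_close($ch);

// Convert to UTF-8; //IGNORE silently drops bytes that cannot be converted.
$text = iconv('Windows-1252', 'UTF-8//IGNORE', $text);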

PHP Curl return character

I found these two similar SO posts that may be helpful:

PHP Curl UTF-8 Charset

CURL import character encoding problem

Curl encoding issues with command line

This is due to how a DOS prompt handles Unicode characters; see Unicode characters in Windows command line - how?. You should be able to change this behavior with a command like chcp 65001, which switches the terminal to UTF-8.

How to include an '&' character in a bash curl statement

Putting single quotes around the & symbol works: an unquoted & is the bash background operator, so it truncates the URL at that point. That is, using a URL like http://www.example.com/page.asp?arg1=${i}'&'arg2=${j} with curl returns the requested webpage. Quoting the entire URL in double quotes works as well.

How to encode foreign characters for importing to shopify via API using PHP and CURL

If you are using a mysqli database connection to fetch the data from Magento, you may need to set the charset of the connection to utf8 so that PHP gets the data correctly from the database:

$mysqli->set_charset("utf8");
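
A minimal sketch, assuming a direct mysqli connection to the Magento database. The host, credentials, and the table/column names below are placeholders:

<?php
$mysqli = new mysqli('localhost', 'user', 'password', 'magento');
$mysqli->set_charset('utf8'); // fetch rows as UTF-8

// Hypothetical query; substitute the real Magento table and columns.
$result = $mysqli->query('SELECT name FROM products LIMIT 5');
while ($row = $result->fetch_assoc()) {
    echo $row['name'], "\n"; // values now arrive correctly encoded for the Shopify API call
}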

Scraping meta data on Japanese websites with some character encoding problems

It turned out that even though all the pages declared UTF-8, some ISO-8859-1 content was hidden in places. Using iconv solved the issue.

I've edited the question with all the details; case closed!
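
In case it helps others, a hedged sketch of that kind of fix-up (the helper name is made up, and it assumes the mbstring extension is available): strings that fail a UTF-8 validity check get converted from ISO-8859-1 with iconv.

<?php
// Hypothetical helper: convert a string to UTF-8 only if it is not already valid UTF-8.
function ensureUtf8(string $s): string
{
    if (!mb_check_encoding($s, 'UTF-8')) { // false for invalid UTF-8 byte sequences
        $s = iconv('ISO-8859-1', 'UTF-8', $s);
    }
    return $s;
}

echo ensureUtf8("caf\xE9"), "\n"; // prints "café"; a lone 0xE9 byte is ISO-8859-1, not UTF-8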

php: file_get_contents encoding problem

First off, is your browser set to UTF-8? In Firefox you can set your text encoding in View->Character Encoding. Make sure you have "Unicode (UTF-8)" selected. I would also set View->Character Encoding->Auto-Detect to "Universal."

Secondly, you could try passing the FILE_TEXT flag, like so:

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);
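
If you do not already have a $context, one guess at what it might look like is a stream context that asks the server for UTF-8. The specific headers here are assumptions, not part of the original answer:

<?php
$context = stream_context_create([
    'http' => [
        // Ask the server to respond in UTF-8 and send a browser-like user agent.
        'header' => "Accept-Charset: utf-8\r\nUser-Agent: Mozilla/5.0\r\n",
    ],
]);

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);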

How to urlencode data for curl command?

Use curl --data-urlencode; from man curl:

This posts data, similar to the other --data options with the exception that this performs URL-encoding. To be CGI-compliant, the <data> part should begin with a name followed by a separator and a content specification.

Example usage:

curl \
--data-urlencode "paramName=value" \
--data-urlencode "secondParam=value" \
http://example.com

See the man page for more info.

This requires curl 7.18.0 or newer (released January 2008). Use curl -V to check which version you have.

You can also use it to encode the query string of a GET request:

curl --get \
--data-urlencode "p1=value 1" \
--data-urlencode "p2=value 2" \
http://example.com
# http://example.com?p1=value%201&p2=value%202
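
For comparison, since most of the snippets on this page are PHP: http_build_query produces the same percent-encoded query string (PHP_QUERY_RFC3986 encodes spaces as %20 rather than +):

<?php
$query = http_build_query(['p1' => 'value 1', 'p2' => 'value 2'], '', '&', PHP_QUERY_RFC3986);
echo "http://example.com?$query\n";
// http://example.com?p1=value%201&p2=value%202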

Why can Haskell not handle characters from a specific website?

Since you said you are interested in just the links, there is no need to convert the GBK encoding to Unicode.

Here is a version which prints out all links like "123456.html" in the document:

#!/usr/bin/env stack
{- stack
--resolver lts-6.0 --install-ghc runghc
--package wreq --package lens
--package tagsoup
-}

{-# LANGUAGE OverloadedStrings #-}

import Network.Wreq
import qualified Data.ByteString.Lazy.Char8 as LBS
import Control.Lens
import Text.HTML.TagSoup
import Data.Char
import Control.Monad

-- match \d+\.html
isNumberHtml lbs = (LBS.dropWhile isDigit lbs) == ".html"

wanted t = isTagOpenName "a" t && isNumberHtml (fromAttrib "href" t)

main = do
  r <- get "http://www.piaotian.net/html/7/7430/"
  let body  = r ^. responseBody :: LBS.ByteString
      tags  = parseTags body
      links = filter wanted tags
      hrefs = map (fromAttrib "href") links
  forM_ hrefs LBS.putStrLn

