Encode/Decode URL in C++
You can check out this article and this
Encode:
std::string UriEncode(const std::string & sSrc)
{
    // SAFE is a 256-entry lookup table, defined separately, that is nonzero
    // for every byte value that may appear unescaped.
    const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
    const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
    const std::size_t SRC_LEN = sSrc.length();
    // Worst case: every input byte expands to "%XX" (3 bytes).
    unsigned char * const pStart = new unsigned char[SRC_LEN * 3];
    unsigned char * pEnd = pStart;
    const unsigned char * const SRC_END = pSrc + SRC_LEN;
    for (; pSrc < SRC_END; ++pSrc)
    {
        if (SAFE[*pSrc])
            *pEnd++ = *pSrc;
        else
        {
            // escape this char
            *pEnd++ = '%';
            *pEnd++ = DEC2HEX[*pSrc >> 4];
            *pEnd++ = DEC2HEX[*pSrc & 0x0F];
        }
    }
    std::string sResult((char *)pStart, (char *)pEnd);
    delete [] pStart;
    return sResult;
}
Decode:
std::string UriDecode(const std::string & sSrc)
{
    // Note from RFC1630: "Sequences which start with a percent
    // sign but are not followed by two hexadecimal characters
    // (0-9, A-F) are reserved for future extension"
    // HEX2DEC is a 256-entry lookup table, defined separately, that maps a
    // hex digit to its value and every other byte to -1.
    const unsigned char * pSrc = (const unsigned char *)sSrc.c_str();
    const std::size_t SRC_LEN = sSrc.length();
    const unsigned char * const SRC_END = pSrc + SRC_LEN;
    // last decodable '%' (guarded so inputs shorter than 2 bytes
    // never form a pointer before the start of the buffer)
    const unsigned char * const SRC_LAST_DEC = SRC_LEN >= 2 ? SRC_END - 2 : pSrc;
    // Decoding never grows the string, so SRC_LEN bytes are enough.
    char * const pStart = new char[SRC_LEN];
    char * pEnd = pStart;
    while (pSrc < SRC_LAST_DEC)
    {
        if (*pSrc == '%')
        {
            // signed char: plain char may be unsigned, which would break
            // the comparison against -1
            signed char dec1, dec2;
            if (-1 != (dec1 = HEX2DEC[*(pSrc + 1)])
                && -1 != (dec2 = HEX2DEC[*(pSrc + 2)]))
            {
                *pEnd++ = (char)((dec1 << 4) + dec2);
                pSrc += 3;
                continue;
            }
        }
        *pEnd++ = *pSrc++;
    }
    // the last 2- chars
    while (pSrc < SRC_END)
        *pEnd++ = *pSrc++;
    std::string sResult(pStart, pEnd);
    delete [] pStart;
    return sResult;
}
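UriEncode and UriDecode refer to two lookup tables, SAFE and HEX2DEC, that are not defined in the code above. The following is a minimal sketch of how they could be built. It is an assumption, not the original author's tables: the builder names MakeSafeTable and MakeHex2DecTable are hypothetical, the safe set is taken to be the same unreserved characters used by encode_url further down (letters, digits, and "-._~"), lowercase hex digits are accepted when decoding, and an ASCII-compatible character set is assumed.

```cpp
#include <array>

// Hypothetical builder: true for bytes that may appear unescaped.
// Assumes ASCII, where '0'-'9', 'A'-'Z', and 'a'-'z' are contiguous.
std::array<bool, 256> MakeSafeTable()
{
    std::array<bool, 256> safe{};              // value-initialized: all false
    for (int c = '0'; c <= '9'; ++c) safe[c] = true;
    for (int c = 'A'; c <= 'Z'; ++c) safe[c] = true;
    for (int c = 'a'; c <= 'z'; ++c) safe[c] = true;
    const char extra[] = "-._~";               // assumed extra safe characters
    for (const char * p = extra; *p; ++p)
        safe[static_cast<unsigned char>(*p)] = true;
    return safe;
}

// Hypothetical builder: hex digit -> value (0..15), anything else -> -1.
// signed char keeps -1 representable even where plain char is unsigned.
std::array<signed char, 256> MakeHex2DecTable()
{
    std::array<signed char, 256> table;
    table.fill(-1);
    for (int i = 0; i < 10; ++i)
        table['0' + i] = static_cast<signed char>(i);
    for (int i = 0; i < 6; ++i)
    {
        table['A' + i] = static_cast<signed char>(10 + i);
        table['a' + i] = static_cast<signed char>(10 + i);  // assumption: accept lowercase
    }
    return table;
}

const std::array<bool, 256> SAFE = MakeSafeTable();
const std::array<signed char, 256> HEX2DEC = MakeHex2DecTable();
```

Both tables are indexed by unsigned char, which is why the functions above cast the input to const unsigned char * before looking anything up.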
Encoding decoded urls in c++
On POSIX systems you can print a UTF-8 string directly:
std::string utf8 = "\xc3\xb6"; // or just u8"ö" (before C++20, where u8 literals are still char)
printf("%s", utf8.c_str());
On Windows, you have to convert to UTF-16. Use wchar_t instead of char16_t, even though char16_t is supposed to be the right one. They are both 2 bytes per character on Windows.
You want convert.from_bytes to convert from UTF-8, instead of convert.to_bytes, which converts to UTF-8.
Printing Unicode in the Windows console is another headache. See relevant topics.
Note that std::wstring_convert is deprecated and has no replacement as of now.
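#include <iostream>
#include <string>
#include <codecvt>
#include <locale>   // std::wstring_convert
#include <windows.h>
int main()
{
    std::string utf8 = "\xc3\xb6";
    // codecvt_utf8_utf16 produces UTF-16, including surrogate pairs;
    // codecvt_utf8<wchar_t> is limited to UCS-2 when wchar_t is 16 bits.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    std::wstring utf16 = convert.from_bytes(utf8);
    // Call the wide (W) variants explicitly so this compiles whether or
    // not UNICODE is defined.
    MessageBoxW(0, utf16.c_str(), 0, 0);
    DWORD count;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), utf16.c_str(),
                  static_cast<DWORD>(utf16.size()), &count, 0);
    return 0;
}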
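Since std::wstring_convert is deprecated, one option is to do the UTF-8 to UTF-16 conversion by hand. The following is a portable sketch, not a drop-in replacement: the function name utf8_to_utf16 is hypothetical, it returns std::u16string rather than std::wstring, and it assumes the input is well-formed UTF-8 (it rejects bad lead and continuation bytes but does not check for overlong forms or code points above U+10FFFF).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string & in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;            // code point being assembled
        std::size_t extra;      // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Outside the BMP: split into a high and a low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the resulting code units could be copied into a std::wstring before being passed to the wide-character API calls.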
Encoding/Decoding URL
"URL safe characters" don't need encoding. All other characters, including non-ASCII characters, should be encoded. Example:
#include <iomanip>   // std::setw, std::setfill
#include <sstream>   // std::ostringstream
#include <string>

std::string encode_url(const std::string& s)
{
    const std::string safe_characters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    std::ostringstream oss;
    for(auto c : s) {
        if (safe_characters.find(c) != std::string::npos)
            oss << c;
        else
            // mask with 0xff so a negative char does not sign-extend
            oss << '%' << std::setfill('0') << std::setw(2) <<
                std::uppercase << std::hex << (0xff & c);
    }
    return oss.str();
}

std::string decode_url(const std::string& s)
{
    std::string result;
    for(std::size_t i = 0; i < s.size(); i++) {
        if(s[i] == '%') {
            try {
                // parse the two characters after '%' as a hex number
                auto v = std::stoi(s.substr(i + 1, 2), nullptr, 16);
                result.push_back(0xff & v);
            } catch(...) { } // malformed escape is silently dropped; handle the error as needed
            i += 2;
        }
        else {
            result.push_back(s[i]);
        }
    }
    return result;
}
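One caveat with decode_url above: std::stoi skips leading whitespace and accepts a sign, so an input such as "%+1" is decoded instead of being treated as malformed, and a malformed escape is dropped from the output. If stricter handling is wanted, the hex digits can be checked by hand. The following is a sketch under that assumption; the names hex_value and decode_url_strict are hypothetical, and copying malformed escapes through unchanged is just one possible policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical strict decoder: only "%XX" with two hex digits is decoded;
// anything else, including a trailing '%', is copied through unchanged.
std::string decode_url_strict(const std::string& s)
{
    std::string result;
    result.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            const int hi = hex_value(s[i + 1]);
            const int lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                result.push_back(static_cast<char>((hi << 4) | lo));
                i += 2;   // skip the two hex digits just consumed
                continue;
            }
        }
        result.push_back(s[i]);
    }
    return result;
}
```

With this version, "a%20b" decodes to "a b", while "%zz", "%+1", and a lone trailing "%" come back unchanged.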
How to properly decode url with unicode in C
%81 and %8A are perfectly valid %-escapes, but the result is not a UTF-8 string. URLs are not required to be UTF-8 strings, but these days they usually are.
It looks to me like some very strange double encoding has happened. There is no convention I know of which uses three-digit %-encodings, but that's what it looks like you have in that URL. On the assumption that the intention was to encode the Spanish word "cariño" (care, affection, fondness), it should have been cari%C3%B1o in UTF-8, or cari%F1o in ISO-8859-1/Windows-1252 (which usually show up in URLs by accident).
The rules for valid UTF-8 sequences are simple enough that you can check for a valid sequence using a regular expression. Not all valid sequences are mapped to characters, and 66 of them are mapped explicitly as "not characters", but all valid sequences should be accepted by a conforming decoder even if it later rejects the decoded character as semantically incorrect.
A UTF-8 sequence is a one-to-four byte sequence corresponding to one of the following patterns (taken from the Unicode standard, Table 3-7):
Byte 1    Byte 2    Byte 3    Byte 4
------    ------    ------    ------
00..7F    --        --        --
C2..DF    80..BF    --        --
E0        A0..BF    80..BF    --
E1..EC    80..BF    80..BF    --
ED        80..9F    80..BF    --
EE..EF    80..BF    80..BF    --
F0        90..BF    80..BF    80..BF
F1..F3    80..BF    80..BF    80..BF
F4        80..8F    80..BF    80..BF
Anything else is illegal. (So codes C0, C1 and F5 through FF cannot appear at all.) In particular, the hex codes 81 and 8A can never start a UTF-8 sequence.
Since there is no good way to know what might be meant by an invalid sequence, the simplest thing is just to strip them out.
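The table and the strip-out approach can be sketched in code. This is a minimal illustration, not a reference implementation: the function names utf8_sequence_length and strip_invalid_utf8 are hypothetical, and dropping one byte at a time before resynchronizing is just one possible policy for invalid input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence that
// starts at s[i], or 0 if the bytes there match no row of the table.
// Precondition: i < s.size().
std::size_t utf8_sequence_length(const std::string& s, std::size_t i)
{
    // True if byte k exists and lies in [lo, hi].
    auto in_range = [&s](std::size_t k, unsigned lo, unsigned hi) {
        if (k >= s.size()) return false;
        const unsigned b = static_cast<unsigned char>(s[k]);
        return b >= lo && b <= hi;
    };
    const unsigned b0 = static_cast<unsigned char>(s[i]);
    if (b0 <= 0x7F)
        return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return in_range(i + 1, 0x80, 0xBF) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        const unsigned lo = (b0 == 0xE0) ? 0xA0 : 0x80;  // E0 row: no overlong forms
        const unsigned hi = (b0 == 0xED) ? 0x9F : 0xBF;  // ED row: no surrogates
        return (in_range(i + 1, lo, hi) && in_range(i + 2, 0x80, 0xBF)) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        const unsigned lo = (b0 == 0xF0) ? 0x90 : 0x80;  // F0 row: no overlong forms
        const unsigned hi = (b0 == 0xF4) ? 0x8F : 0xBF;  // F4 row: stop at U+10FFFF
        return (in_range(i + 1, lo, hi) && in_range(i + 2, 0x80, 0xBF)
                && in_range(i + 3, 0x80, 0xBF)) ? 4 : 0;
    }
    return 0;   // 80..C1 and F5..FF can never start a sequence
}

// Hypothetical helper: copy s, dropping every byte that is not part of a
// well-formed sequence.
std::string strip_invalid_utf8(const std::string& s)
{
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = utf8_sequence_length(s, i);
        if (n == 0) { ++i; continue; }   // drop one offending byte, then resync
        out.append(s, i, n);
        i += n;
    }
    return out;
}
```

Applied to the decoded bytes of cari%F1o, this drops the stray F1 byte and leaves "cario", while the UTF-8 form cari%C3%B1o passes through intact.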
C - URL encoding
libcurl provides curl_escape, which apparently has been superseded by curl_easy_escape.
How to encode or decode URL in objective-c
It's natural that Chinese and Japanese characters don't work with an ASCII string encoding. If you escape the string with Apple's methods, which you definitely should do to avoid code duplication, store the result as a Unicode string. Use one of the following encodings:
NSUTF8StringEncoding
NSUTF16StringEncoding
NSShiftJISStringEncoding (not Unicode, Japanese-specific)