How to Store the Data in Unicode in Hindi Language

Choose utf8 character set and utf8_general_ci collation.

The collation of the field in which you want to store Hindi text should be utf8_general_ci.

To alter your table field, run:

ALTER TABLE `<table_name>` CHANGE `<field_name>` `<field_name>` VARCHAR(100) 
CHARSET utf8 COLLATE utf8_general_ci DEFAULT '' NOT NULL;

Once you've connected to the database, run the following statement first:

mysql_set_charset('utf8');

For example:

//setting character set
mysql_set_charset('utf8');

//insert Hindi text
mysql_query("INSERT INTO ....");

To retrieve data

//setting character set
mysql_set_charset('utf8');

//select Hindi text
mysql_query("SELECT * FROM ....");
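
To see why the connection character set matters, here is a minimal Python sketch (an illustration only, not part of the PHP setup above) of what happens when UTF-8 bytes are read as if they belonged to a single-byte charset such as latin1. The sample word and variable names are assumptions chosen for the demonstration.

```python
# Illustration only: simulates a charset mismatch between client and server.
text = "सूर्योदय"

# The client sends the text as UTF-8 bytes ...
utf8_bytes = text.encode("utf-8")

# ... but a side that assumes latin1 treats every byte as one character.
garbled = utf8_bytes.decode("latin-1")

# The damage is reversible only while the original bytes are kept intact.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(text), len(utf8_bytes), len(garbled))
print(recovered == text)
```

Each Devanagari character occupies several bytes in UTF-8, so a single-byte interpretation produces more (and different) characters than were stored. Setting the charset on the connection avoids this mismatch in the first place.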

Before printing any Unicode text (say, Hindi text) in the browser, you should set the content type of the page by adding a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

For example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Example Unicode</title>
</head>

<body>
<?php echo $hindiText; ?>
</body>
</html>

Update:

mysql_query("SET CHARACTER SET utf8") has been changed to mysql_set_charset('utf8');
This is the preferred way to change the charset. Using mysql_query() to set it (such as SET NAMES utf8) is not recommended. See http://php.net/manual/en/function.mysql-set-charset.php

Storing and displaying unicode string (हिन्दी) using PHP and MySQL

Did you set proper charset in the HTML Head section?

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

Or you can set the content type in your PHP script using:

   header( 'Content-Type: text/html; charset=utf-8' ); 
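
The same idea, sending the charset in the HTTP header and encoding the body to match, can be sketched in Python as a small WSGI application. This is only an illustration under assumptions: the names `app`, `PAGE`, and `fake_start_response` are hypothetical, and the page content is a placeholder.

```python
# A minimal WSGI sketch; `app` and PAGE are hypothetical names.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Example Unicode</title>
</head>
<body>सूर्योदय</body>
</html>"""


def app(environ, start_response):
    # Encode the page as UTF-8 so the bytes match the declared charset.
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the application directly to inspect what it would send.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({}, fake_start_response)
print(captured["headers"]["Content-Type"])
print(b"".join(chunks).decode("utf-8"))
```

The header and the meta tag should name the same charset, and the body bytes must actually be encoded in it. To serve the page for real, `app` could be passed to `wsgiref.simple_server.make_server` from the standard library.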

There are already some discussions here on StackOverflow; please have a look:

How to make MySQL handle UTF-8 properly
setting utf8 with mysql through php

PHP/MySQL with encoding problems

So what I want to know is how can I directly store सूर्योदय into my database and fetch it and display in my webpage using PHP.

I am not sure what you mean by "directly storing in the database". Did you mean entering data using phpMyAdmin or a similar tool? If so, I have used phpMyAdmin to input Unicode data and it worked fine for me. You could try entering the data with phpMyAdmin and retrieving it with a PHP script to confirm. If you need to submit data via a PHP script, set the NAMES and CHARACTER SET when you create the MySQL connection, before you execute insert queries and when you select data. Have a look at the posts above for the syntax. Hope it helps.

SQL data not retrieved in Unicode Hindi

Change the string to start with N to signify it is a Unicode string:

SELECT uid, family_head, member_name, house_no, address, f_h_name, gender, caste, dob, occupation, literacy, end_date
FROM family
WHERE (member_name = N'समर्थ अग्रवाल')

Otherwise, the string will not be a Unicode string and the query will return no results.

See Constants (Transact-SQL) on MSDN:

Unicode strings

Unicode strings have a format similar to character strings but are preceded by an N identifier (N stands for National Language in the SQL-92 standard). The N prefix must be uppercase. For example, 'Michél' is a character constant while N'Michél' is a Unicode constant. Unicode constants are interpreted as Unicode data, and are not evaluated by using a code page.
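
As a rough analogy in Python (not SQL Server itself), the sketch below shows what forcing Hindi text through a single-byte code page does to it. It assumes Windows-1252 as the code page purely as an example; the code page actually used depends on the database collation.

```python
name = "समर्थ अग्रवाल"

# Without the N prefix, the literal is converted through a code page.
# Code page 1252 has no Devanagari letters, so each one becomes "?".
lossy = name.encode("cp1252", errors="replace")

# A Unicode encoding such as UTF-16 keeps every character.
lossless = name.encode("utf-16-le")

print(lossy)
print(lossless.decode("utf-16-le") == name)
```

A literal reduced to question marks cannot match the stored Unicode value, which is consistent with the query returning no rows until the N prefix is added.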

C++ - How to read Unicode characters( Hindi Script for e.g. ) using C++ or is there a better Way through some other programming language?

I would seriously suggest that you use Python for an application like this.
It will lift the burden of decoding the strings (not to mention allocating memory for them and the like). You will be free to concentrate on your problem, instead of problems of the language.

For example, suppose the sentence above is contained in a UTF-8 file and you are using Python 2.x.
If you use Python 3.x it is even more readable, as you don't have to prefix the unicode strings with u as in this example (though some third-party libraries may not be available):

# -*- coding: utf-8 -*-
separators = [u"।", u",", u"."]
text = open("indiantext.txt").read()
# This converts the encoded text to an internal unicode object, where
# all characters are properly recognized as an entity:
text = text.decode("utf-8")

# This breaks the text on the white spaces, yielding a list of words:
words = text.split()

counter = 1

output = u""
for word in words:
    # If the last char is a separator, and is joined to the word:
    if word[-1] in separators and len(word) > 1:
        # word up to the second to last char:
        output += word[:-1] + u"(%d) " % counter
        counter += 1
        # last char
        output += word[-1] + u"(%d) " % counter
    else:
        output += word + u"(%d) " % counter
    counter += 1

print output

This is an "unfolded" example. As you get more used to Python, there are shorter ways to express this. You can learn the basics of the language in just a couple of hours by following a tutorial (for example, the one at http://python.org itself).
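
For comparison, here is a sketch of the same numbering logic in Python 3, where string literals are Unicode by default. The function name `number_words` and the sample sentence are hypothetical, chosen only for illustration. Unlike the example above, this version joins the pieces with single spaces and leaves no trailing space.

```python
SEPARATORS = {"।", ",", "."}


def number_words(text):
    """Return text with a running number after each word and separator."""
    parts = []
    counter = 1
    for word in text.split():
        if len(word) > 1 and word[-1] in SEPARATORS:
            # Number the word and its trailing separator separately.
            parts.append(f"{word[:-1]}({counter})")
            counter += 1
            parts.append(f"{word[-1]}({counter})")
        else:
            parts.append(f"{word}({counter})")
        counter += 1
    return " ".join(parts)


sample = "राम घर आया।"
print(number_words(sample))
```

To process a file instead of a literal, the text can be read with `open("indiantext.txt", encoding="utf-8")`, which decodes the bytes while reading, so no separate `decode` step is needed.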


