Validate Japanese Character in Active Record Callback
The following code should meet the exact requirement you've specified so far with the least possible effort. It uses the Moji gem (its documentation is in Japanese), which provides many convenience methods for determining the content of a Japanese-language string.
It validates a maximum of 14 characters for a name that consists only of half-width characters, and a maximum of 7 characters for any other name. This includes names that mix half- and full-width characters: the presence of even one full-width character makes the whole string count as "full-width".
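class Customer < ActiveRecord::Base
  # Names made only of half-width characters may be up to 14 characters long
  validates_length_of :name, :maximum => 14,
    :if => Proc.new { |customer| customer.half_width?(customer.name) }
  # Any other name (at least one full-width character) may be up to 7
  validates_length_of :name, :maximum => 7,
    :unless => Proc.new { |customer| customer.half_width?(customer.name) }

  # Delegates the character-type check to the Moji gem
  def half_width?(string)
    Moji.type?(string, Moji::HAN_KATA)
  end
end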
Assumptions made:
- Data within the system is encoded as UTF-8 and stored as such in the database; any further re-encoding (for example, to pass the data to a legacy system) is done in another module.
- No automatic half-to-full-width conversion is done before data is saved to the database, i.e. half-width characters are allowed in the database, perhaps for legacy-system integration, faithful preservation of actual user input(!), and/or the aesthetic value of half-width characters(!).
- Diacritics in half-width text are treated as separate characters (i.e. カ and ゙ are not parsed as one character when determining string length).
- There is only one name field, as you specify, and not, say, four (surname, surname furigana, given name, given name furigana), which is quite common nowadays.
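If you would rather not depend on Moji, the same 14/7 rule can be sketched in plain Ruby with a regular expression. This is a swapped-in technique, not the Moji API: it assumes "half-width" means printable ASCII (U+0020 to U+007E) plus the half-width katakana block (U+FF61 to U+FF9F), and the names HALF_WIDTH_ONLY, half_width_only? and name_length_valid? are hypothetical. Because String#length counts code points, a half-width カ followed by a separate ゙ counts as two characters, which matches the third assumption above.

```ruby
# Hypothetical sketch: these ranges are assumptions, not Moji's definitions.
# Half-width = printable ASCII (U+0020..U+007E) or half-width katakana
# and punctuation (U+FF61..U+FF9F).
HALF_WIDTH_ONLY = /\A[\u0020-\u007E\uFF61-\uFF9F]*\z/

def half_width_only?(name)
  HALF_WIDTH_ONLY.match?(name)
end

# Half-width-only names may have up to 14 characters; any name containing
# a character outside those ranges is limited to 7.
def name_length_valid?(name)
  max = half_width_only?(name) ? 14 : 7
  # String#length counts code points, so a half-width kana plus its
  # separate voiced sound mark (U+FF9E) counts as two characters.
  name.length <= max
end

puts name_length_valid?("abcdefghijklmn")   # prints true (14 half-width chars)
puts name_length_valid?("山田太郎山田太郎")  # prints false (8 full-width chars)
```

In a Rails model the same predicate could back the :if/:unless conditions instead of the Moji call.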
Ruby: Checking for East Asian Width (Unicode)
Late to the party, but hopefully still helpful: in Ruby, you can use the unicode-display_width gem to check a string's East Asian Width:
require 'unicode/display_width'

# Returns the number of terminal columns the string occupies
Unicode::DisplayWidth.of("⚀") #=> 1
Unicode::DisplayWidth.of("一") #=> 2
Using JavaScript to check whether a string contains Japanese characters (including kanji)
Check whether this works for you. I found a website that appears to list all the Unicode characters that might be used in Japanese text.
The corresponding regex (matching a single character) would be:
/[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]/
(From left to right, the ranges in the regex are: punctuation; hiragana; katakana; full-width Roman and half-width katakana; CJK ideographs, common and uncommon; and CJK Extension A, rare.)
The ranges are (as quoted from the site):
- 3000 - 303f: Japanese-style punctuation
- 3040 - 309f: Hiragana
- 30a0 - 30ff: Katakana
- ff00 - ff9f: Full-width Roman characters and half-width Katakana
- 4e00 - 9faf: CJK unified ideographs - Common and uncommon Kanji
- 3400 - 4dbf: CJK unified ideographs Extension A - Rare Kanji
I have changed the ranges a bit:
- For full-width Roman characters and half-width katakana, I changed the range from ff00 - ffef to ff00 - ff9f. The code points ffa0 - ffdc contain half-width Hangul characters, which are not what you want. You may want to re-add the code points ffe0 - ffef, but they are mostly half-width punctuation or full-width currency symbols.
You can check the site and remove any range you don't want or are sure will not appear in your input.
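The same character class can be reused in Ruby, for example when the check must be repeated server-side in a Rails validation. This is only a sketch that ports the ranges above (including the ff00 - ff9f change) into a Ruby regular expression; the constant JAPANESE_CHAR and the method contains_japanese? are hypothetical names, and the ranges carry the same caveats as the JavaScript version.

```ruby
# Port of the ranges listed above (ff00-ff9f, so half-width Hangul is excluded).
JAPANESE_CHAR = /[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]/

# True if at least one character of the string falls in those ranges.
def contains_japanese?(str)
  JAPANESE_CHAR.match?(str)
end

puts contains_japanese?("hello")       # prints false
puts contains_japanese?("hello 世界")  # prints true
puts contains_japanese?("한국어")      # prints false (Hangul is not in the ranges)
```

Note that the kanji ranges are shared with Chinese, so a purely Chinese string will also match.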
PostgreSQL convert Japanese Full-Width to Half-Width
How about using the translate() function?
-- prepare test data
CREATE TABLE address (
  id integer,
  name text
);
INSERT INTO address VALUES (1, 'ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ&Ｇ, 篠路７-１');
-- show test data
SELECT * from address;
-- convert full-width digits and upper-case letters to half-width
UPDATE address SET name = translate(name,
  '０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ',
  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
);
-- see the converted data
SELECT * from address;
This code changes the name column to "SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1".
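If the conversion happens in Ruby rather than in the database, String#tr accepts the same kind of character mapping, and its ranges work with multibyte characters. A minimal sketch, assuming the same scope as the SQL above (digits and upper-case letters only; full-width lower-case letters and symbols are left alone); the method name to_half_width is hypothetical.

```ruby
# Map full-width digits and upper-case letters to their ASCII equivalents,
# mirroring the translate() call above.
def to_half_width(str)
  str.tr("０-９Ａ-Ｚ", "0-9A-Z")
end

puts to_half_width("ＳＹＳＫＥＮ, 松井ケ丘３, コメリＨ&Ｇ, 篠路７-１")
# prints SYSKEN, 松井ケ丘3, コメリH&G, 篠路7-1
```

An alternative is str.unicode_normalize(:nfkc), which converts all full-width ASCII, but it also turns half-width katakana into full-width katakana, which may not be what you want.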
Why do I get this undefined method error in a Rails model callback, passing the method as a symbol?
This may not be your primary issue (since your error message doesn't seem to relate to it), but you're not using changed? correctly. changed? needs to be called on your model object, optionally prefixed with your attribute name. So your condition method should look like:
def markdown_changed_or_html_nil?
  # based on your method name, shouldn't this be:
  # content_markdown_changed? || content_html.nil?
  content_markdown_changed? || content_markdown.nil?
end
Find more information about Dirty methods at http://api.rubyonrails.org/classes/ActiveModel/Dirty.html.
ALSO
I'm pretty sure Rails 4 hasn't moved Dirty out of ActiveRecord::Base, so you don't need to manually include ActiveModel::Dirty in your model.
ALSO
This line:
validates :user, :title, :content_markdown, { presence: true, on: create }
Should be (on: expects the symbol :create; a bare create is evaluated as a method call in the class body):
validates :user, :title, :content_markdown, { presence: true, on: :create }
Determine whether a Unicode character is full-width or half-width in C++
You should use ICU's u_getIntPropertyValue with the UCHAR_EAST_ASIAN_WIDTH property.
For example:
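#include <unicode/uchar.h>

// True for characters whose East Asian Width is Fullwidth (F) or Wide (W)
bool is_fullwidth(UChar32 c) {
    int width = u_getIntPropertyValue(c, UCHAR_EAST_ASIAN_WIDTH);
    return width == U_EA_FULLWIDTH || width == U_EA_WIDE;
}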
Note that if your graphics library supports combining characters, then you'll have to consider those as well when determining how many cells a sequence uses; for example, e followed by U+0301 COMBINING ACUTE ACCENT will only take up 1 cell.
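In Ruby (2.5 or later), the same combining-character caveat shows up when comparing String#length, which counts code points, with String#grapheme_clusters, which groups a base character with its combining marks. This is only a sketch of counting user-perceived characters; it does not compute display width, so East Asian Width still has to be checked separately.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"

puts decomposed.length                          # prints 2 (code points)
puts decomposed.grapheme_clusters.length        # prints 1 (one perceived character)
# NFC normalization composes the pair into the single code point U+00E9
puts decomposed.unicode_normalize(:nfc).length  # prints 1
```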