Validating a User's UTF-8 Name in JavaScript

The XRegExp library Unicode plugin adds Unicode character class support (like "\p{L}") to JavaScript regular expressions.

Validating UTF-8 names in JavaScript (Node.js) with XRegExp

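// XRegExp 2.x exposes the constructor as a property of the module.
// Newer releases (3.0+) export it directly: var XRegExp = require('xregexp');
var XRegExp = require('xregexp').XRegExp;
// \p{L} matches any Unicode letter; the backslash is doubled inside a string literal.
var re = new XRegExp('^\\p{L}+$');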

console.log(re.test('Ciesiołkiewicz'));
console.log(re.test('1Ciesiołkiewicz2'));
console.log(re.test('привет'));
console.log(re.test('пр1вет'));

> true
> false
> true
> false

This works as expected: names made up only of letters pass, and names containing digits fail.
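For comparison, engines that implement ES2018 support Unicode property escapes natively, so the same check can probably be written without a library. This is a minimal sketch assuming such an engine; the variable name `lettersOnly` is only illustrative.

```javascript
// Native alternative, assuming an ES2018+ engine.
// The `u` flag is required for \p{...} to be recognised.
const lettersOnly = /^\p{L}+$/u;

console.log(lettersOnly.test('Ciesiołkiewicz'));   // true
console.log(lettersOnly.test('1Ciesiołkiewicz2')); // false
console.log(lettersOnly.test('привет'));           // true
console.log(lettersOnly.test('пр1вет'));           // false
```

Note that \p{L} alone does not match combining marks, so a name typed in decomposed form (a base letter followed by a separate accent) would be rejected by this pattern.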

JavaScript RegEx UTF-8

Again: I don't know if this is the answer you are looking for. This also re-capitalizes the first letter of each word in the name. So if I write "My name is Salvador Dalí", the answer is: "Hello, Salvador Dalí! Nice to meet you!"

var myInput = document.getElementById("myInput");

function myFunction() {
  // Lowercase first so the "my name is " prefix matches regardless of case.
  var answer = myInput.value.toLowerCase().replace("my name is ", "");
  var text = answer === ""
    ? "Please type something."
    : "Hello, " + capitalizeName(answer) + "! Nice to meet you!";
  // textContent avoids interpreting the user's input as HTML.
  document.getElementById("reply").textContent = text;
}

// Uppercase the first character of each space-separated word.
function capitalizeName(name) {
  return name
    .split(" ")
    .map(w => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}
<p>What is your name?</p>
<input id="myInput" type="text">
<button onclick="myFunction()">Go</button>
<p id="reply"></p>

How to validate non-English (UTF-8) encoded email addresses in JavaScript and PHP?

Attempting to validate email addresses may not be a good idea. The specifications (RFC 5321, RFC 5322) allow for so much flexibility that validating them with a regular expression is effectively impossible, and validating with a function is a great deal of work. As a result, most email validation schemes end up rejecting a large number of valid email addresses, much to the inconvenience of the users. (By far the most common example of this is not allowing the + character.)

It is more likely that the user will (accidentally or deliberately) enter an incorrect email address than an invalid one, so strict validation is a great deal of work for very little benefit, with possible costs if you do it incorrectly.

I would recommend that you just check for the presence of an @ character on the client and then send a confirmation email to verify it; it's the most practical way to validate and it confirms that the address is correct as well.
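A minimal sketch of that client-side check might look like the following. The function name `looksLikeEmail` is hypothetical, and requiring at least one character on each side of the @ is a small assumption that goes slightly beyond a bare presence test. Sending the confirmation email remains a server-side step and is not shown.

```javascript
// Hypothetical helper: a deliberately loose client-side check.
// It does not try to validate the address against RFC 5321/5322.
function looksLikeEmail(input) {
  const value = input.trim();
  const at = value.lastIndexOf('@');
  // Assumption: require at least one character on each side of the last "@".
  return at > 0 && at < value.length - 1;
}

console.log(looksLikeEmail('user+tag@example.com'));   // true
console.log(looksLikeEmail('użytkownik@przykład.pl')); // true
console.log(looksLikeEmail('no-at-sign'));             // false
```

Because the check never inspects which characters appear around the @, non-ASCII local parts and domains pass through untouched.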

Multi-language input validation with UTF-8 encoding

You can approximate the Unicode derived property \p{Alphabetic} pretty succinctly with [\pL\pM\p{Nl}] if your language doesn’t support a proper Alphabetic property directly.

Don’t use Java’s \p{Alpha}, because that’s ASCII-only.

But then you’ll notice that you’ve failed to account for dashes (\p{Pd} or DashPunctuation works, but that does not include most of the hyphens!), apostrophes (usually but not always one of U+0027, U+02BC, U+2019, or U+FF07), comma, or full stop/period.

You probably had better include \p{Pc} ConnectorPunctuation, just in case.

If you have the Unicode derived property \p{Diacritic}, you should use that, too, because it includes things like the mid-dot needed for geminated L’s in Catalan and the non-combining forms of diacritic marks which people sometimes use.

But then you’ll find people who use ordinal numbers in their names in ways that \p{Nl} (LetterNumber) doesn’t accommodate, so you throw \p{Nd} (DecimalNumber) or even all of \pN (Number) into the mix.

Then you realize that Asian names often require the use of ZWJ or ZWNJ to be written correctly in their scripts, so then you have to add U+200D and U+200C to the mix, which are both \p{Cf} (Format) characters and indeed also JoinControl ones.
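To make the growth of the pattern concrete, here is a rough sketch of the character class accumulated so far, written for a JavaScript engine with ES2018 property escapes. The names `nameChars` and `namePattern` are illustrative, and allowing a plain space between name parts is an assumption that the text above does not mention.

```javascript
// Sketch of the class built up step by step above; not a recommended validator.
const nameChars = [
  '\\p{L}\\p{M}\\p{Nl}',          // approximation of \p{Alphabetic}
  '\\p{Pd}\\p{Pc}',               // dash and connector punctuation
  '\\u0027\\u02BC\\u2019\\uFF07', // common apostrophe code points
  ',.',                           // comma and full stop
  '\\p{Diacritic}',               // mid-dot and spacing diacritics
  '\\p{Nd}',                      // decimal digits for ordinals
  '\\u200C\\u200D',               // ZWNJ and ZWJ
  ' ',                            // assumption: plain space between name parts
].join('');

const namePattern = new RegExp(`^[${nameChars}]+$`, 'u');

console.log(namePattern.test('Jean-Luc O’Neil')); // true
console.log(namePattern.test('Henry Ⅷ'));         // true
console.log(namePattern.test('<script>'));        // false
```

Even at this size the class still leaves out characters that real names use, which is the point the next paragraph makes.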

By the time you’re done looking up the various Unicode properties for the various and many exotic characters that keep cropping up — or when you think you’re done, rather — you’re almost certain to conclude that you would do a much better job at this if you simply allowed them to use whatever Unicode characters for their name that they wish, as the link Tim cites advises. Yes, you’ll get a few jokers putting in things like “əɯɐuʇƨɐ⅂ əɯɐuʇƨɹᴉℲ”, but that just goes with the territory, and you can’t preclude silly names in any reasonable way.
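If you go the permissive route, the remaining checks can be very small. The sketch below is one possible shape, not a definitive implementation: the helper name `cleanName`, the NFC normalization, the 200 code point limit, and the rejection of control characters are all assumptions added for illustration.

```javascript
// Hypothetical helper illustrating the "accept almost anything" approach.
// Returns the cleaned name, or null if the input is unusable.
function cleanName(raw, maxLength = 200) {
  // Assumption: store names in NFC so equivalent spellings compare equal.
  const name = raw.normalize('NFC').trim();
  if (name.length === 0) return null;
  // Count code points rather than UTF-16 code units.
  if ([...name].length > maxLength) return null;
  // \p{Cc} covers control characters such as NUL, tab and newline.
  // ZWJ and ZWNJ are \p{Cf}, so they remain allowed.
  if (/\p{Cc}/u.test(name)) return null;
  return name;
}

console.log(cleanName('  Salvador Dalí '));        // "Salvador Dalí"
console.log(cleanName('   '));                     // null
console.log(cleanName('əɯɐuʇƨɐ⅂ əɯɐuʇƨɹᴉℲ') !== null); // true
```

Everything else, including upside-down joke names, is accepted as typed.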


