How to Implement a Good Profanity Filter

How do you implement a good profanity filter?

Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?

Also, one can't forget The Untold History of Toontown's SpeedChat, where even using a "safe-word whitelist" resulted in a 14-year-old quickly circumventing it with:
"I want to stick my long-necked Giraffe up your fluffy white bunny."

Bottom line: for any system that you implement, there is no substitute for human review (whether peer or otherwise). Feel free to implement a rudimentary tool to get rid of the drive-bys, but for the determined troll, you must have a non-algorithm-based approach.

A system that removes anonymity and introduces accountability (something that Stack Overflow does well) also helps, particularly in combating John Gabriel's G.I.F.T.

You also asked where you can get profanity lists to get you started -- one open-source project to look at is DansGuardian; see the source code for its default profanity lists. There is also an additional third-party Phrase List that you can download for the proxy, which may be a useful source to glean from.

Edit in response to the question edit: Thanks for the clarification on what you're trying to do. In that case, if you're just trying to do a simple word filter, there are two ways you can do it. One is to create a single long regexp with all of the banned phrases that you want to censor, and merely do a regex find/replace with it. A regex like:

$filterRegex = "/(boogers|snot|poop|shucks|argh)/";

and run it on your input string using preg_match() to test wholesale for a hit, or preg_replace() to blank them out.
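The same test-or-replace idea can be sketched in JavaScript. This is a minimal sketch, not the answer's own code: the function names `containsBanned` and `blankOut` are hypothetical, the word list is the placeholder list from above, and case-insensitive matching is an added assumption.

```javascript
// Placeholder alternation, mirroring the example words used above.
const BANNED = "boogers|snot|poop|shucks|argh";

// Two regexes on purpose: a non-global one for yes/no testing and a
// global one for replacing every occurrence.
const testRegex = new RegExp(`(${BANNED})`, "i");
const replaceRegex = new RegExp(`(${BANNED})`, "gi");

// Rough analogue of preg_match(): is there at least one hit?
function containsBanned(text) {
  return testRegex.test(text);
}

// Rough analogue of preg_replace(): blank out every hit.
function blankOut(text) {
  return text.replace(replaceRegex, "");
}

console.log(containsBanned("Oh SHUCKS, that hurt")); // true
console.log(blankOut("argh, that is gross"));
```

Like the PHP pattern, this matches substrings, so it will also fire inside longer, innocent words.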

You can also pass arrays to those functions rather than a single long regex; for long word lists, that may be more manageable. See the preg_replace() documentation for some good examples of how arrays can be used flexibly.
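JavaScript's `replace()` does not take an array of patterns the way preg_replace() does, so a comparable approach is to keep the words in an array and build one pattern from it. The sketch below assumes a hypothetical word list and hypothetical helper names (`escapeRegExp`, `censor`); escaping each word keeps characters such as `+` from being read as regex syntax.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every listed word (case-insensitively) with a fixed mask.
function censor(text, words, mask = "***") {
  // An empty alternation would match everywhere, so bail out early.
  if (words.length === 0) return text;
  const pattern = new RegExp(words.map(escapeRegExp).join("|"), "gi");
  return text.replace(pattern, mask);
}

// Hypothetical list; in practice it might be loaded from a file or database.
const bannedWords = ["snot", "c++"];
console.log(censor("I hate C++ and snot", bannedWords));
```

Keeping the list as data rather than as a hand-written regex makes it easier to edit without touching the matching code.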

For additional PHP programming examples, see this page for a somewhat advanced generic class for word filtering that stars out the center letters of censored words, and this previous Stack Overflow question that also has a PHP example (the most valuable part there is the SQL-based filtered-word approach -- the leet-speak compensator can be dropped if you find it unnecessary).
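The linked class is not reproduced here, but the "star out the center letters" idea can be sketched as follows. Everything in this block is an assumption for illustration: the function names, the rule for very short words, and the choice to match whole alphabetic tokens only.

```javascript
// Keep the first and last letter, replace everything between with "*".
// Words of one or two letters are fully starred (an arbitrary choice).
function starOutCenter(word) {
  if (word.length <= 2) return "*".repeat(word.length);
  return word[0] + "*".repeat(word.length - 2) + word[word.length - 1];
}

// Star out only tokens that exactly equal a banned word, ignoring case.
function censorCenters(text, bannedWords) {
  const banned = new Set(bannedWords.map((w) => w.toLowerCase()));
  return text.replace(/[A-Za-z]+/g, (token) =>
    banned.has(token.toLowerCase()) ? starOutCenter(token) : token
  );
}

console.log(censorCenters("Well shucks, Heckler", ["shucks", "heck"]));
```

Because whole tokens are compared against the list, a longer word that merely contains a banned word is left alone.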

You also added: "Getting the list of words in the first place is the real question." -- in addition to the DansGuardian links above, you may find this handy .zip of 458 words helpful.

How to add a profanity filter in C#

After string return_str = myStringBuilder.ToString(); add:

ProfanityFilter filter = new ProfanityFilter();

if (filter.ContainsProfanity(return_str))
{
    // Do action when message contains profanity
}
else
{
    // Do action when message does not contain profanity
}

Also keep in mind that your code does not account for line breaks (I don't know whether comments can contain line breaks, but just in case).

JavaScript simple profanity filter

Use some(): it stops iterating as soon as one element satisfies the condition, so it does less work than a loop that always walks the whole list.

let isInclude = badWords.some(word => description.includes(word));
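One thing the one-liner does not handle is letter case, since `includes()` is case-sensitive. A minimal sketch of a case-insensitive variant, assuming a hypothetical helper name `includesBadWord` and a hypothetical word list:

```javascript
// Returns true if any listed word occurs in the description, ignoring case.
function includesBadWord(description, words) {
  const lowered = description.toLowerCase();
  // some() stops at the first word that is found.
  return words.some((word) => lowered.includes(word.toLowerCase()));
}

const badWords = ["darn", "heck"]; // hypothetical list
console.log(includesBadWord("What the HECK", badWords)); // true
console.log(includesBadWord("All good here", badWords)); // false
```

This is still substring matching, so a word that merely contains a listed word will also be flagged.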

Implementation of PHP profanity filter

$swears = array("shoot", "darn", "heck");

foreach ($swears as $bad_word) {
    $body = str_replace($bad_word, " ", $body);
}

This will work as a quick-and-dirty solution, although it does suffer from the Scunthorpe Problem (e.g. "heckler" will get transformed to " ler" if "heck" is on your profanity list).
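One common way to reduce (not eliminate) the Scunthorpe Problem is to match whole words only, using word boundaries. The sketch below is in JavaScript rather than PHP, uses a hypothetical function name, and assumes the listed words contain only plain letters, so no regex escaping is done. Note that `\b` follows the regex engine's ASCII notion of a word character.

```javascript
// Replace listed words only when they appear as whole words.
// Assumes every entry in `swears` consists of plain letters.
function replaceWholeWords(body, swears, replacement = " ") {
  if (swears.length === 0) return body;
  const pattern = new RegExp(`\\b(?:${swears.join("|")})\\b`, "gi");
  return body.replace(pattern, replacement);
}

const swears = ["shoot", "darn", "heck"];
console.log(replaceWholeWords("The heckler said heck", swears, "****"));
```

Here "heckler" is left intact because there is no word boundary after "heck" inside it, while the standalone "heck" is replaced.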

How to best implement swear words handler (.NET preferred)?

Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea? ^_^

Also see "How do you implement a good profanity filter?"

Adding a Profanity Filter to a simple Socket.IO app

Comparing a string with a RegExp object using a comparison operator does not check whether the string matches the pattern. Use the test method of RegExp instead. For example:

new RegExp("bad words").test(msg); // "bad words" stands in for your actual pattern

Also, there exists a shorthand syntax for constructing a regex.

/myregex/.test(msg);
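A minimal sketch of how `test()` might be wired into a message handler. The word list, the `isClean` and `handleChatMessage` names, and the `broadcast` callback are all hypothetical; no Socket.IO API is used here, so the callback stands in for whatever emit call the app already has. The pattern deliberately omits the `g` flag, because a global regex keeps a `lastIndex` between `test()` calls and can give alternating results on repeated input.

```javascript
// Hypothetical pattern; no "g" flag, so test() is stateless between calls.
const badWordPattern = /\b(?:darn|heck)\b/i;

// True when the message contains none of the listed words.
function isClean(msg) {
  return !badWordPattern.test(msg);
}

// `broadcast` is a placeholder for the app's own send/emit function.
function handleChatMessage(msg, broadcast) {
  broadcast(isClean(msg) ? msg : "[message removed]");
}

handleChatMessage("hello everyone", console.log);
handleChatMessage("what the heck", console.log);
```

Passing the send function in as a parameter keeps the filtering logic testable without a running socket server.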

