Turn a String into a Valid Filename

Turn a string into a valid filename?

You can look at the Django framework to see how it creates a "slug" from arbitrary text. A slug is URL- and filename-friendly.

Django's text utilities (django.utils.text) define a function, slugify(), that's probably the gold standard for this kind of thing. Essentially, their code is the following.

import unicodedata
import re

def slugify(value, allow_unicode=False):
    """
    Taken from https://github.com/django/django/blob/master/django/utils/text.py
    Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
    dashes to single dashes. Remove characters that aren't alphanumerics,
    underscores, or hyphens. Convert to lowercase. Also strip leading and
    trailing whitespace, dashes, and underscores.
    """
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value.lower())
    return re.sub(r'[-\s]+', '-', value).strip('-_')

And the older version (Python 2):

import re
import unicodedata

def slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(re.sub(r'[^\w\s-]', '', value).strip().lower())
    value = unicode(re.sub(r'[-\s]+', '-', value))
    # ...
    return value

There's more, but I left it out, since it deals with escaping rather than slugification.
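A quick check of the modern slugify() in Python 3, with the definition repeated so the snippet runs on its own. The inputs are illustrative; note that the result is lowercased and that accents are dropped unless allow_unicode=True.

```python
import re
import unicodedata


def slugify(value, allow_unicode=False):
    # Same logic as the Django-derived function above.
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize('NFKC', value)
    else:
        # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII drops the accent.
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value.lower())
    # Collapse runs of whitespace/hyphens into one hyphen, then trim the ends.
    return re.sub(r'[-\s]+', '-', value).strip('-_')


print(slugify("Hello, World!"))                             # hello-world
print(slugify("Un éléphant à l'orée du bois"))              # un-elephant-a-loree-du-bois
print(slugify("Un éléphant à l'orée", allow_unicode=True))  # un-éléphant-à-lorée
```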

How to make a valid filename from an arbitrary string in JavaScript?

Huge thanks to Kelvin's answer!

I quickly compiled it into a function. The final code I used is:

function convertToValidFilename(string) {
    return string.replace(/[\/|\\:*?"<>]/g, " ");
}

var string = 'Un éléphant à l\'orée du bois/An elephant at the edge of the woods".txt';

console.log("Before = ", string);
console.log("After = ", convertToValidFilename(string));

This results in the output:

Before =  Un éléphant à l'orée du bois/An elephant at the edge of the woods".txt
After =  Un éléphant à l'orée du bois An elephant at the edge of the woods .txt
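For comparison, a Python port of convertToValidFilename(), as a sketch (the name convert_to_valid_filename is ours, not from the answer). Like the JavaScript version, it only replaces the nine characters in the regex with spaces, so apostrophes and accented letters pass through unchanged.

```python
import re

# Same character class as the JavaScript regex: / | \ : * ? " < >
_INVALID = re.compile(r'[/|\\:*?"<>]')


def convert_to_valid_filename(name: str) -> str:
    """Replace each invalid character with a space (mirrors convertToValidFilename)."""
    return _INVALID.sub(" ", name)


s = 'Un éléphant à l\'orée du bois/An elephant at the edge of the woods".txt'
print("Before =", s)
print("After =", convert_to_valid_filename(s))
```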

Elegant way in Python to make sure a string is suitable as a filename?

pathvalidate is a Python library for sanitizing and validating strings such as filenames and file paths.

It provides utilities for validating paths:

import sys
from pathvalidate import ValidationError, validate_filename

try:
    validate_filename("fi:l*e/p\"a?t>h|.t<xt")
except ValidationError as e:
    print("{}\n".format(e), file=sys.stderr)

And utilities for sanitizing paths:

from pathvalidate import sanitize_filename

fname = "fi:l*e/p\"a?t>h|.t<xt"
print("{} -> {}".format(fname, sanitize_filename(fname)))
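If you can't add a dependency, here is a rough stdlib-only approximation of the same validate/sanitize split. It is not pathvalidate's implementation: it only checks the characters Windows rejects plus ASCII control characters, whereas the library applies more platform-specific rules (such as length limits and reserved names). FilenameError, check_filename and clean_filename are hypothetical names.

```python
import sys

# Assumed rule set: characters Windows rejects in filenames, plus ASCII control characters.
_INVALID_CHARS = set('/\\:*?"<>|') | {chr(i) for i in range(32)}


class FilenameError(ValueError):
    """Hypothetical stand-in for pathvalidate.ValidationError."""


def check_filename(name: str) -> None:
    """Raise FilenameError if name is empty or contains an invalid character."""
    if not name:
        raise FilenameError("empty filename")
    bad = sorted({c for c in name if c in _INVALID_CHARS})
    if bad:
        raise FilenameError("invalid characters: {}".format(bad))


def clean_filename(name: str) -> str:
    """Drop every invalid character (sanitizing instead of rejecting)."""
    return "".join(c for c in name if c not in _INVALID_CHARS)


fname = "fi:l*e/p\"a?t>h|.t<xt"
try:
    check_filename(fname)
except FilenameError as e:
    print(e, file=sys.stderr)
print("{} -> {}".format(fname, clean_filename(fname)))  # fi:l*e/p"a?t>h|.t<xt -> filepath.txt
```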

How to make a valid Windows filename from an arbitrary string?

Try something like this:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
    fileName = fileName.Replace(c, '_');
}

Edit:

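Since GetInvalidFileNameChars() returns roughly 40 characters on Windows (the ASCII control characters plus " < > | : * ? \ /), and each String.Replace call can create a new string, it's better to build the result with a StringBuilder; the simple version above takes longer and uses more memory.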
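In Python, the single-pass idea behind the StringBuilder advice maps onto str.translate, which builds the result in one pass instead of one Replace call per invalid character. Python has no GetInvalidFileNameChars(), so the character set below is an assumption modeled on the Windows list.

```python
# Assumed invalid set, modeled on Windows: control characters 0-31 plus " < > | : * ? \ /
_INVALID = [chr(i) for i in range(32)] + list('"<>|:*?\\/')

# Translation table mapping every invalid character to "_", built once and reused.
_TABLE = str.maketrans({c: "_" for c in _INVALID})


def to_windows_filename(name: str) -> str:
    """Replace each invalid character with '_' in a single pass."""
    return name.translate(_TABLE)


print(to_windows_filename('report: Q1/Q2 "final"?.txt'))  # report_ Q1_Q2 _final__.txt
```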

Converting user input string into a valid file name

The primary concern should be the user experience: the user can type anything to identify the file, and when she comes back to the data she expects to see exactly the string she typed.

The best way to handle this is to store the actual input somewhere else and use a mapping to get to the actual file.

You could simply use a dictionary saved in a plist file. The dictionary would contain the user input as the key and a UUID as the value, and the file is saved using the UUID as its filename. This way the filename is always valid, and the user can type whatever she wants without fear of invalid filenames.

Another advantage over just stripping invalid characters is that the user can, for instance, use "/" and "//" as distinct identifiers if she likes; stripping would reduce both to an empty name.
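A minimal sketch of the mapping approach, assuming Python's standard plistlib module for the plist file. The index file name index.plist and the helper names save_document/load_document are hypothetical.

```python
import plistlib
import tempfile
import uuid
from pathlib import Path

INDEX_NAME = "index.plist"  # hypothetical name for the title -> UUID mapping file


def _load_index(directory: Path) -> dict:
    index_path = directory / INDEX_NAME
    if index_path.exists():
        return plistlib.loads(index_path.read_bytes())
    return {}


def save_document(directory: Path, title: str, data: bytes) -> Path:
    """Store data under a UUID filename and remember which title it belongs to."""
    index = _load_index(directory)
    # Reuse the existing UUID if this title was saved before, otherwise mint a new one.
    file_id = index.setdefault(title, str(uuid.uuid4()))
    target = directory / file_id
    target.write_bytes(data)
    (directory / INDEX_NAME).write_bytes(plistlib.dumps(index))
    return target


def load_document(directory: Path, title: str) -> bytes:
    """Look up the UUID for a user-visible title and read the file it points to."""
    return (directory / _load_index(directory)[title]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        save_document(folder, "/", b"slash")
        save_document(folder, "//", b"double slash")
        print(load_document(folder, "//"))  # b'double slash'
```

Because the title never becomes part of a path, "/" and "//" end up as two separate files.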

Create (sane/safe) filename from any (unsafe) string

Python:

"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()

This keeps Unicode letters and digits but removes line breaks, punctuation, etc.

Example:

filename = u"ad\nbla'{-+)(ç?"

gives: adblaç

Edit: str.isalnum() checks for letters and digits in one step (comment from queueoverflow below), and danodonovan suggested keeping the dot as well.

keepcharacters = (' ', '.', '_')
"".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()

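The same whitelist wrapped in a reusable function, as a sketch; safe_filename and its keep_characters parameter are hypothetical names. Because str.isalnum() is Unicode-aware, letters such as "ç" survive.

```python
def safe_filename(filename: str, keep_characters: tuple = (' ', '.', '_')) -> str:
    """Keep Unicode letters/digits and a few safe punctuation marks; drop everything else."""
    cleaned = "".join(c for c in filename if c.isalnum() or c in keep_characters)
    # rstrip() removes trailing whitespace left behind after other characters are dropped.
    return cleaned.rstrip()


print(safe_filename("ad\nbla'{-+)(ç?"))    # adblaç
print(safe_filename("my report: v2.txt"))  # my report v2.txt
```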
Turn a string into a valid filename in PHP

You could try something like this:

setlocale(LC_ALL, 'en_US.utf8');
$brand_name = iconv('utf-8', 'us-ascii//TRANSLIT', $_GET['brand-name']);

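Adapted from the question "Convert Accented Characters to Non-Accented in PHP". Note that this only transliterates non-ASCII characters: ASCII characters such as / and : pass through unchanged, and characters iconv cannot transliterate may come out as "?", which is itself invalid in Windows filenames.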
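Python has no built-in equivalent of iconv's //TRANSLIT. A common stdlib substitute, and the same trick Django's slugify() uses above, is NFKD decomposition followed by dropping non-ASCII code points. It is weaker than real transliteration: characters with no decomposition, such as "ß" or "ø", are dropped rather than converted. to_ascii is a hypothetical helper name.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip accents by decomposing characters (NFKD) and discarding non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Crème Brûlée"))  # Creme Brulee
print(to_ascii("Straße"))        # Strae  (ß has no decomposition, so it is dropped)
```

Like the PHP version, this does not touch / or :, so combine it with one of the character filters on this page.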

Python function to make arbitrary strings valid filenames

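import os
import re

arbitrary_string = "File!name?.txt"
# Remove the characters Windows forbids in filenames: / \ : * ? " < > |
cleaned_up_filename = re.sub(r'[/\\:*?"<>|]', '', arbitrary_string)
filepath = os.path.join("/tmp", cleaned_up_filename)

with open(filepath, 'wb') as f:
    ...  # write your data to f here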

Taken from user gx.

Obviously adapt to your situation.

How to convert strings in any language and character set to valid filenames in Java?

A regex such as [^a-zA-Z0-9] treats every non-ASCII character as invalid, so it also strips all Unicode characters above U+007F (code point 127).

Assuming that you want to keep Unicode text and only filter out characters that are invalid in filenames, such as ? \ / : | < > *, you can replace each of them with a space and then collapse runs of whitespace into a single underscore (_):

public class ReplaceI18N {

    public static void main(String[] args) {
        String[] names = {
            "John Smith",
            "高岡和子",
            "محمد سعيد بن عبد العزيز الفلسطيني",
            "|J:o<h>n?Sm\\it/h*",
            "高?岡和\\子*",
            "محمد /سعيد بن عبد ?العزيز :الفلسطيني\\"
        };

        for (String s : names) {
            // Java strings are already Unicode, so no getBytes()/new String() round trip is needed;
            // decoding default-charset bytes as UTF-8 can corrupt non-ASCII text on some platforms.
            String u = s.replaceAll("[\\?\\\\/:|<>\\*]", " "); // filter ? \ / : | < > *
            u = u.replaceAll("\\s+", "_");                     // collapse whitespace runs to "_"
            System.out.println(s + " = " + u);
        }
    }
}

The output:

John Smith = John_Smith
高岡和子 = 高岡和子
محمد سعيد بن عبد العزيز الفلسطيني = محمد_سعيد_بن_عبد_العزيز_الفلسطيني
|J:o<h>n?Sm\it/h* = _J_o_h_n_Sm_it_h_
高?岡和\子* = 高_岡和_子_
محمد /سعيد بن عبد ?العزيز :الفلسطيني\ = محمد_سعيد_بن_عبد_العزيز_الفلسطيني_

These filenames, Unicode characters included, display correctly on any web page served as UTF-8, as long as a font with the needed glyphs is available.

Each one is also a valid filename on any file system that supports Unicode names (tested on Windows XP and Windows 7).

[Image: i18n filenames]
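The same two-step replacement in Python, as a sketch for comparison: map each of ? \ / : | < > * to a space, then collapse whitespace runs into one underscore. Python 3 str values are Unicode, so the CJK and Arabic names pass through unchanged apart from the replaced characters. to_i18n_filename is a hypothetical name.

```python
import re

_INVALID = re.compile(r'[?\\/:|<>*]')  # ? \ / : | < > *
_WHITESPACE = re.compile(r'\s+')


def to_i18n_filename(name: str) -> str:
    """Replace invalid characters with spaces, then turn whitespace runs into '_'."""
    return _WHITESPACE.sub("_", _INVALID.sub(" ", name))


for s in ["John Smith", "高岡和子", "|J:o<h>n?Sm\\it/h*", "高?岡和\\子*"]:
    print(s, "=", to_i18n_filename(s))
```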

If you want to pass a filename as part of a URL, encode it with URLEncoder and decode it again with URLDecoder.
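In Python, the counterparts are urllib.parse.quote()/unquote() for a path segment and quote_plus()/unquote_plus() for form-style encoding, which is what java.net.URLEncoder produces. A round-trip sketch with an illustrative filename:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

filename = "高岡和子 report.txt"  # illustrative name with CJK characters and a space

# Path-segment style: spaces become %20; safe="" also encodes "/" (as %2F).
path_part = quote(filename, safe="")
# Form style, like java.net.URLEncoder: spaces become "+".
form_part = quote_plus(filename)

print(path_part)
print(form_part)

# Decoding with the matching function restores the original Unicode string.
assert unquote(path_part) == filename
assert unquote_plus(form_part) == filename
```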


