How to Split a String into a List of Characters

Split a list of strings into their individual characters Python

Strings are iterable in Python, so you can just loop through them like this:

words = ['foo', 'bar', 'bak']  # avoid naming this "list", which shadows the built-in

for item in words:
    for character in item:
        print(character)
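
If you want the characters collected rather than printed, a nested comprehension does the same iteration. This is only a minimal sketch; the names `words`, `chars`, and `per_word` are illustrative and not part of the original question.

```python
words = ['foo', 'bar', 'bak']

# One flat list with every character of every string, in order.
chars = [character for item in words for character in item]

# One list of characters per string.
per_word = [list(item) for item in words]

print(chars)
print(per_word)
```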

What is the simplest way to split a string into a list of characters?

In Reason, this is perhaps the simplest approach, though certainly not the most efficient:

let split = s =>
  s |> Js.String.split("")
    |> Array.to_list
    |> List.map(s => s.[0])

This is more efficient, and cross-platform:

let split = s => {
  let rec aux = (acc, i) =>
    if (i >= 0) {
      aux([s.[i], ...acc], i - 1)
    } else {
      acc
    }

  aux([], String.length(s) - 1)
}

I don't think it usually makes much sense to convert a string to a list, though: the conversion has significant overhead regardless of method, and it is usually better to iterate over the string directly. If it does make sense, it's probably when the strings are small enough that the difference between the first and second method matters little.

Split string into array of character strings

"cat".split("(?!^)")

This will produce

array ["c", "a", "t"]

Split user input string into a list with every character

You can use the built-in list() function:

>>> list("A string") 
['A', ' ', 's', 't', 'r', 'i', 'n', 'g']

In your case, you can call list(getMessage()) to convert the contents of the file into a list of characters.
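
A minimal sketch of that call, assuming getMessage() returns the file contents as a single string. The function body below is a hypothetical stand-in, since the original function is not shown here.

```python
def getMessage():
    # Hypothetical stand-in: the real function is assumed to return
    # the file's contents as one string.
    return "Hi there\n"

chars = list(getMessage())
print(chars)  # spaces and the trailing newline are kept as elements
```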

How to split the characters of a string by spaces and then resultant elements of list by special characters and numbers and then again join them?

You can build your regular expression from the keys of your dictionary, ensuring they're not enclosed in another word (i.e. neither directly preceded nor directly followed by a letter):

import re

def standarisationn(addr):
    # Replace commas and runs of whitespace with a single space
    addr = re.sub(r'(,|\s+)', " ", addr)
    lookp_dict = {"allee":"ale","alley":"ale","ally":"ale","aly":"ale",
                  "arcade":"arc",
                  "apartment":"apt","aprtmnt":"apt","aptmnt":"apt",
                  "av":"ave","aven":"ave","avenu":"ave","avenue":"ave","avn":"ave","avnue":"ave",
                  "beach":"bch",
                  "bend":"bnd",
                  "blfs":"blf","bluf":"blf","bluff":"blf","bluffs":"blf",
                  "boul":"blvd","boulevard":"blvd","boulv":"blvd",
                  "bottm":"bot","bottom":"bot",
                  "branch":"br","brnch":"br",
                  "brdge":"brg","bridge":"brg",
                  "bypa":"byp","bypas":"byp","bypass":"byp","byps":"byp",
                  "camp":"cmp",
                  "canyn":"cny","canyon":"cny","cnyn":"cny",
                  "southwest":"sw" ,"northwest":"nw"}

    # Replace each key only where it is not part of a longer word
    for wrd in lookp_dict:
        addr = re.sub(rf'(?:^|(?<=[^a-zA-Z])){wrd}(?=[^a-zA-Z]|$)', lookp_dict[wrd], addr)
    return addr

print(standarisationn("well-2-34 2 @$%23beach bend com"))

The expression is built in four parts:

  • ^ matches the beginning of the string; inside the non-capturing group (?:...) it is the alternative to the lookbehind
  • (?<=[^a-zA-Z]) is a lookbehind (i.e. a zero-width assertion that consumes nothing), checking that the preceding character is not a letter
  • {wrd} is the key of your dictionary
  • (?=[^a-zA-Z]|$) is a lookahead (also zero-width), checking that the following character is not a letter, or that the end of the string has been reached

Output:

well-2-34 2 @$%23bch bnd com
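
To see what the lookbehind and lookahead do in isolation, here is a small sketch with a single hard-coded key. The sample strings and the names `pattern`, `samples`, and `results` are only illustrative.

```python
import re

# Same pattern shape as in the loop, with the key "beach" written out.
pattern = re.compile(r'(?:^|(?<=[^a-zA-Z]))beach(?=[^a-zA-Z]|$)')

samples = ["beach", "23beach bend", "beaches", "seabeach"]
results = {s: pattern.sub("bch", s) for s in samples}

for original, replaced in results.items():
    print(f"{original!r} -> {replaced!r}")
```

The first two samples are rewritten because "beach" is bounded by the string edges, a digit, or a space. The last two are left alone because a letter directly follows or precedes the key.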

Edit: you can compile a whole expression and use re.sub only once if you replace the loop with:

repl_pattern = re.compile(rf"(?:^|(?<=[^a-zA-Z]))({'|'.join(lookp_dict.keys())})(?=([^a-zA-Z]|$))")
addr = re.sub(repl_pattern, lambda x: lookp_dict[x.group(1)], addr)

This should be much faster if your dictionary grows because we build a single expression with all your dictionary keys:

  • ({'|'.join(lookp_dict.keys())}) is interpreted as (allee|alley|...)
  • a lambda function in re.sub replaces the matching element with the corresponding value in lookp_dict; re.sub calls it with the match object, and x.group(1) is the key that matched
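
Put together, the single-pass version might look like the sketch below. It uses a shortened dictionary for brevity, and the function name `standardise` is hypothetical. Passing the keys through re.escape is an added precaution that is not in the original answer; it only matters if a key ever contains a regex metacharacter.

```python
import re

# Shortened dictionary for illustration; the full one works the same way.
lookp_dict = {"av": "ave", "avenue": "ave", "beach": "bch", "bend": "bnd"}

# re.escape is an assumption added here, not part of the original answer.
alternatives = '|'.join(re.escape(key) for key in lookp_dict)
repl_pattern = re.compile(
    rf'(?:^|(?<=[^a-zA-Z]))({alternatives})(?=[^a-zA-Z]|$)'
)

def standardise(addr):
    # Replace commas and runs of whitespace with a single space
    addr = re.sub(r'(,|\s+)', ' ', addr)
    # One pass over the string: look up whichever key matched
    return repl_pattern.sub(lambda m: lookp_dict[m.group(1)], addr)

result = standardise("well-2-34 2 @$%23beach bend com")
print(result)
```

The pattern is compiled once at module level, so repeated calls do not rebuild it.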

