Splitting a String at Every N-Th Character

split-string-every-nth-character with nth+1 separator '0'

You can use a regular expression with capture groups.

import re

instr = '01110100001101001001110100'
outlist = list(sum(re.findall(r'(\d{8})(\d)', instr), ()))

print(outlist)

re.findall() returns a list of tuples, list(sum(..., ()) flattens it into a single list.

Split string by Nth occurrence of a character

You can do it by splitting on , and joining in chunks:

seq = my_string.split(',')
size = 5
[','.join(seq[pos:pos + size]) for pos in range(0, len(seq), size)]

Output:

['abc,kjj,hg,kj,ls', 'jsh,ku,lo,sasad,hh', 'da']

Split string every nth character from the right?

You can adapt the answer you linked, and use the beauty of mod to create a nice little one-liner:

>>> s = '1234567890'
>>> '/'.join([s[0:len(s)%3]] + [s[i:i+3] for i in range(len(s)%3, len(s), 3)])
'1/234/567/890'

and if you want this to auto-add the dot for the cases like your first example of:

s = '100243'

then you can just add a mini ternary use or as suggested by @MosesKoledoye:

>>> '/'.join(([s[0:len(s)%3] or '.']) + [s[i:i+3] for i in range(len(s)%3, len(s), 3)])
'./100/243'

This method will also be faster than reversing the string before hand or reversing a list.

Python split string every n character

Those are called n-grams.

This should work :)

text = "BANANA"
n = 2
chars = [c for c in text]
ngrams = []
for i in range(len(chars)-n + 1):
ngram = "".join(chars[i:i+n])
ngrams.append(ngram)
print(ngrams)

output: ['BA', 'AN', 'NA, 'AN', 'NA']

How to split a string every N words

You can create a regex that describes this pattern.
e.g. "((?:\w+\s*){4})"

Sample Image

Or in simple words:

  • The \w+\s* part means that there are 1 or multiple word-characters (e.g. text, digits) followed by 0, 1 or multiple whitespace characters.

  • It is surrounded in braces and followed by {4} to indicate that we want this to occur 4 times.

  • Finally that again is wrapped in braces, because we want to capture that result.

  • By contrast the braces which were used to specify {4} are preceded by a (?: ...) prefix, which makes it a "non-capturing-group". We don't want to capture the individual matches just yet.

You can use that pattern in java to extract each chunk of 4 occurrences.

Sample Image

And than next, you can simply split each individual result with a second regex, \s+ ( = whitespace)

Edit

One more thing, you may notice that the first matched group also contains whitespace at the end. You can get rid of that with a more advanced regex: ((?:\w+\s+){3}(?:\w+))\s*

Sample Image

Splitting a string at every n-th character

You could do it like this:

String s = "1234567890";
System.out.println(java.util.Arrays.toString(s.split("(?<=\\G...)")));

which produces:

[123, 456, 789, 0]

The regex (?<=\G...) matches an empty string that has the last match (\G) followed by three characters (...) before it ((?<= ))



Related Topics



Leave a reply



Submit