Split String into Strings by Length

Split string into strings by length?

>>> x = "qwertyui"
>>> chunks, chunk_size = len(x), len(x)//4
>>> [ x[i:i+chunk_size] for i in range(0, chunks, chunk_size) ]
['qw', 'er', 'ty', 'ui']

Javascript: split string into strings of specific length

You can use a global regular expression to match repeated instances of 2 digits, and then replace the array items as needed:

const a = '18122122';const dateArray = a.match(/\d{2}/g);dateArray[0] = '20' + dateArray[0];dateArray[1] -= 1;const date = new Date(...dateArray);console.log(date);

How to split a string into substrings of a given length?

Here is one way

substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"

or more generally

text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"

Edit: This is much, much faster

sst <- strsplit(text, "")[[1]]
out <- paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])

It first splits the string into characters. Then, it pastes together the even elements and the odd elements.

Timings

text <- paste(rep(paste0(letters, letters), 1000), collapse="")
g1 <- function(text) {
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
}
g2 <- function(text) {
sst <- strsplit(text, "")[[1]]
paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
}
identical(g1(text), g2(text))
#[1] TRUE
library(rbenchmark)
benchmark(g1=g1(text), g2=g2(text))
# test replications elapsed relative user.self sys.self user.child sys.child
#1 g1 100 95.451 79.87531 95.438 0 0 0
#2 g2 100 1.195 1.00000 1.196 0 0 0

What's the best way to split a string into fixed length chunks and work with them in Python?

One solution would be to use this function:

def chunkstring(string, length):
return (string[0+i:length+i] for i in range(0, len(string), length))

This function returns a generator, using a generator comprehension. The generator returns the string sliced, from 0 + a multiple of the length of the chunks, to the length of the chunks + a multiple of the length of the chunks.

You can iterate over the generator like a list, tuple or string - for i in chunkstring(s,n):
, or convert it into a list (for instance) with list(generator). Generators are more memory efficient than lists because they generator their elements as they are needed, not all at once, however they lack certain features like indexing.

This generator also contains any smaller chunk at the end:

>>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']

Example usage:

text = """This is the first line.
This is the second line.
The line below is true.
The line above is false.
A short line.
A very very very very very very very very very long line.
A self-referential line.
The last line.
"""

lines = (i.strip() for i in text.splitlines())

for line in lines:
for chunk in chunkstring(line, 16):
print(chunk)

Split string to equal length substrings in Java

Here's the regex one-liner version:

System.out.println(Arrays.toString(
"Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.

Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.

How to split string to substrings with given length but not breaking sentences?

The steps I'd take:

  • Initiate a list to store the lines and a current line variable to store the string of the current line.
  • Split the paragraph into sentences - this requires you to .split on '.', remove the trailing empty sentence (""), strip leading and trailing whitespace (.strip) and then add the fullstops back.
  • Loop through these sentences and:

    • if the sentence can be added onto the current line, add it
    • otherwise add the current working line string to the list of lines and set the current line string to be the current sentence

So, in Python, something like:

para = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam. Nam sit amet iaculis lacus, non sagittis nulla. Nam blandit quam eget velit maximus, eu consectetur sapien sodales. Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
lines = []
line = ''
for sentence in (s.strip()+'.' for s in para.split('.')[:-1]):
if len(line) + len(sentence) + 1 >= 80: #can't fit on that line => start new one
lines.append(line)
line = sentence
else: #can fit on => add a space then this sentence
line += ' ' + sentence

giving lines as:

[
"Lorem ipsum dolor sit amet, consectetur adipiscing elit.Integer in tellus quam.",
"Nam sit amet iaculis lacus, non sagittis nulla.",
"Nam blandit quam eget velit maximus, eu consectetur sapien sodales."
]

In dart, split string into two parts using length of first string

I'd use the solution you published shortening up the definition:

List<String> splitStringByLength(String str, int length) =>
[str.substring(0, length), str.substring(length)];

or using an extension method to call the function:

extension on String {
List<String> splitByLength(int length) =>
[substring(0, length), substring(length)];
}

'helloWorld'.splitByLength(5); // Returns [hello, World].


Related Topics



Leave a reply



Submit