Split string into strings by length?
>>> x = "qwertyui"
>>> chunks, chunk_size = len(x), len(x)//4
>>> [ x[i:i+chunk_size] for i in range(0, chunks, chunk_size) ]
['qw', 'er', 'ty', 'ui']
JavaScript: split string into strings of specific length
You can use a global regular expression to match repeated instances of 2 digits, and then replace the array items as needed:
const a = '18122122';
const dateArray = a.match(/\d{2}/g);
dateArray[0] = '20' + dateArray[0]; // expand the two-digit year, e.g. '18' -> '2018'
dateArray[1] -= 1; // JavaScript months are zero-based
const date = new Date(...dateArray);
console.log(date);
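For comparison, the same match-then-adjust idea can be sketched in Python. This is only an illustration: it assumes the input is laid out as YYMMDDHH (inferred from the JavaScript snippet, not stated in the question), and the variable names are my own.

```python
import re
from datetime import datetime

a = '18122122'

# re.findall returns every non-overlapping two-digit match, like /\d{2}/g
parts = re.findall(r'\d{2}', a)

# Assumption: the fields are year, month, day, hour (YYMMDDHH)
year, month, day, hour = (int(p) for p in parts)

# Python months are 1-based, so no "- 1" adjustment is needed here
date = datetime(2000 + year, month, day, hour)
print(date)
```

Unlike the JavaScript version, the year is expanded numerically (2000 + year) rather than by string concatenation; both assume a 21st-century date.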
How to split a string into substrings of a given length?
Here is one way
substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"
or more generally
text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"
Edit: This is much, much faster
sst <- strsplit(text, "")[[1]]
out <- paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
It first splits the string into characters. Then it pastes each character at an odd position together with the character at the even position that follows it.
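The same split-then-pair idea can be sketched in Python with slicing and zip. This is a rough equivalent for illustration, not a translation of the R internals, and the variable names are made up for the example.

```python
text = "aabbccccdd"

# Characters at positions 0, 2, 4, ... and 1, 3, 5, ... (Python is zero-based)
first_chars = text[0::2]
second_chars = text[1::2]

# Pair them up and join each pair back into a two-character string
pairs = [a + b for a, b in zip(first_chars, second_chars)]
print(pairs)
```

Note that zip stops at the shorter input, so a trailing unpaired character in an odd-length string would be dropped by this sketch.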
Timings
text <- paste(rep(paste0(letters, letters), 1000), collapse="")
g1 <- function(text) {
    substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
}
g2 <- function(text) {
    sst <- strsplit(text, "")[[1]]
    paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
}
identical(g1(text), g2(text))
#[1] TRUE
library(rbenchmark)
benchmark(g1=g1(text), g2=g2(text))
#  test replications elapsed relative user.self sys.self user.child sys.child
#1   g1          100  95.451 79.87531    95.438        0          0         0
#2   g2          100   1.195  1.00000     1.196        0          0         0
What's the best way to split a string into fixed length chunks and work with them in Python?
One solution would be to use this function:
def chunkstring(string, length):
    return (string[0+i:length+i] for i in range(0, len(string), length))
This function returns a generator, using a generator expression. Each chunk is the slice of the string from 0 plus a multiple of the chunk length to the chunk length plus that same multiple.
You can iterate over the generator like a list, tuple or string (for i in chunkstring(s, n):), or convert it into a list (for instance) with list(generator). Generators are more memory efficient than lists because they generate their elements as they are needed, not all at once; however, they lack certain features like indexing.
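To make the laziness and the lack of indexing concrete, here is a small sketch. It repeats the chunkstring function so that it runs on its own; the use of itertools.islice is my choice for the illustration, not part of the original answer.

```python
from itertools import islice

def chunkstring(string, length):
    return (string[i:i + length] for i in range(0, len(string), length))

chunks = chunkstring("abcdefghijklmnopqrstuvwxyz", 5)

# A generator cannot be indexed: chunks[0] would raise TypeError.
# islice pulls only the first two chunks without building the full list.
first_two = list(islice(chunks, 2))
print(first_two)

# The generator remembers its position, so the rest follows on from there.
rest = list(chunks)
print(rest)
```

If random access to the chunks is needed, converting to a list once is probably simpler than slicing the generator repeatedly.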
This generator also contains any smaller chunk at the end:
>>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
Example usage:
text = """This is the first line.
This is the second line.
The line below is true.
The line above is false.
A short line.
A very very very very very very very very very long line.
A self-referential line.
The last line.
"""
lines = (i.strip() for i in text.splitlines())
for line in lines:
    for chunk in chunkstring(line, 16):
        print(chunk)
Split string to equal length substrings in Java
Here's the regex one-liner version:
System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));
\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.
Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.
I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)
Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.
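For flavors without \G, a different regex technique gives the same chunks: match the pieces instead of splitting between them. The sketch below uses Python, whose standard re module does not, as far as I know, support \G. It is a swapped-in approach, not the Java one-liner.

```python
import re

text = "Thequickbrownfoxjumps"

# Match runs of up to four characters; re.DOTALL lets "." match newlines too
chunks = re.findall(r".{1,4}", text, flags=re.DOTALL)
print(chunks)
```

The {1,4} quantifier is greedy, so every chunk is four characters long except possibly the last one.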
How to split string to substrings with given length but not breaking sentences?
The steps I'd take:
- Initiate a list to store the lines and a current line variable to store the string of the current line.
- Split the paragraph into sentences - this requires you to .split on '.', remove the trailing empty sentence (""), strip leading and trailing whitespace (.strip) and then add the full stops back.
- Loop through these sentences and:
  - if the sentence can be added onto the current line, add it
  - otherwise add the current working line string to the list of lines and set the current line string to be the current sentence
So, in Python, something like:
para = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam. Nam sit amet iaculis lacus, non sagittis nulla. Nam blandit quam eget velit maximus, eu consectetur sapien sodales. Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
lines = []
line = ''
for sentence in (s.strip() + '.' for s in para.split('.')[:-1]):
    if line and len(line) + len(sentence) + 1 > 80:  # can't fit on that line => start new one
        lines.append(line)
        line = sentence
    else:  # can fit on => add a space (unless the line is still empty) then this sentence
        line = line + ' ' + sentence if line else sentence
if line:  # append the last line, which the loop itself never adds
    lines.append(line)
giving lines as:
[
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam.",
    "Nam sit amet iaculis lacus, non sagittis nulla.",
    "Nam blandit quam eget velit maximus, eu consectetur sapien sodales.",
    "Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
]
In dart, split string into two parts using length of first string
I'd use the solution you published, shortening the definition:
List<String> splitStringByLength(String str, int length) =>
[str.substring(0, length), str.substring(length)];
or using an extension method to call the function:
extension on String {
List<String> splitByLength(int length) =>
[substring(0, length), substring(length)];
}
'helloWorld'.splitByLength(5); // Returns [hello, World].