Split String by New Line Characters

Split Java String by New Line

This should cover you:

String[] lines = string.split("\\r?\\n");

There are really only two newline conventions that you need to worry about: LF (UNIX) and CRLF (Windows).
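To see what the pattern does with both conventions, here is a minimal sketch in Python rather than Java; the regex itself is the same. The helper name split_lines and the sample text are made up for illustration.

```python
import re

def split_lines(text):
    # Split on LF, optionally preceded by CR (the same regex as above).
    return re.split(r"\r?\n", text)

mixed = "one\r\ntwo\nthree"   # one CRLF break, one LF break
print(split_lines(mixed))     # ['one', 'two', 'three']
```

Empty lines are preserved as empty strings, so "a\n\nb" yields three elements.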

How to split a string containing newlines

To treat a CRLF sequence as a whole as the separator, it's simpler to use the -split operator, which is regex-based:

PS> "This is `r`n`r`n a string." -split '\r?\n'
This is 

 a string.

Note:

  • \r?\n matches both CRLF (Windows-style) and LF-only (Unix-style) newlines; use \r\n if you really only want to match CRLF sequences.

    • Note the use of a single-quoted string ('...'), so as to pass the string containing the regex as-is through to the .NET regex engine; the regex engine uses \ as the escape character; hence the use of \r and \n.
  • PowerShell's -split operator is a generally superior alternative to the [string] .NET type's .Split() method - see this answer.


As for what you tried:

The separator argument, [Environment]::NewLine, on Windows is the string "`r`n", i.e. a CRLF sequence.

  • In PowerShell [Core] v6+, your command does work, because this string as a whole is considered the separator.

  • In Windows PowerShell, as Steven points out in his helpful answer, the individual characters, CR and LF separately, are considered separators, resulting in an extra, empty element (the empty string between the CR and the LF) in the result array.

This change in behavior happened outside of PowerShell's control: .NET Core introduced a new .Split() method overload with a [string]-typed separator parameter, which PowerShell's overload-resolution algorithm now selects over the older overload with the [char[]]-typed parameter.

Avoiding such (albeit rare) inadvertent behavioral changes, which PowerShell itself cannot prevent, is another good reason to prefer the PowerShell-native -split operator over the .NET [string] type's .Split() method.
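The difference between the two overloads can be imitated in Python. This is only an analogy using Python's re module, not how PowerShell or .NET implement .Split(); the sample text is made up.

```python
import re

text = "first\r\nsecond"

# CRLF treated as one separator, like the [string]-typed overload.
as_whole = re.split(r"\r\n", text)

# CR and LF treated as two separate one-character separators,
# like the [char[]]-typed overload.
as_chars = re.split(r"[\r\n]", text)

print(as_whole)  # ['first', 'second']
print(as_chars)  # ['first', '', 'second']
```

The extra '' in the second result is the empty string between the CR and the LF.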

Split a string by a newline in C#

var result = mystring.Split(new string[] {"\\n"}, StringSplitOptions.None);

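Since the newline in your case is the literal two-character text \n glued to the words (a backslash followed by n, not an actual line-feed character), you have to escape the backslash with an additional backslash.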
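The distinction between a literal backslash-n and a real line feed is the same in most languages. Here is a small Python sketch of it, with made-up sample strings:

```python
escaped = "line one\\nline two"  # backslash + 'n' as text, no real line feed
actual = "line one\nline two"    # a real line-feed character

print(escaped.split("\\n"))  # ['line one', 'line two']
print(escaped.split("\n"))   # ['line one\\nline two'] (separator not found)
print(actual.split("\n"))    # ['line one', 'line two']
```

If the split returns the whole string as a single element, check which of the two forms your data actually contains.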

Easiest way to split a string on newlines in .NET?

To split on a string you need to use the overload that takes an array of strings:

string[] lines = theText.Split(
    new string[] { Environment.NewLine },
    StringSplitOptions.None
);

Edit:

If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:

string[] lines = theText.Split(
    new string[] { "\r\n", "\r", "\n" },
    StringSplitOptions.None
);
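For comparison, a rough Python counterpart of the multi-separator split, assuming a regex alternation is an acceptable stand-in for the array of separators. The helper name split_any_newline is hypothetical.

```python
import re

def split_any_newline(text):
    # Try CRLF first so it is consumed as one break, not as CR then LF.
    return re.split(r"\r\n|\r|\n", text)

sample = "a\r\nb\rc\n\nd"
print(split_any_newline(sample))  # ['a', 'b', 'c', '', 'd']
```

The empty string in the result is the blank line between c and d, so empty lines survive the split.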

Split string using a newline delimiter with Python

The str.splitlines method should give you exactly that.

>>> data = """a,b,c
... d,e,f
... g,h,i
... j,k,l"""
>>> data.splitlines()
['a,b,c', 'd,e,f', 'g,h,i', 'j,k,l']
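It may help to see how splitlines differs from a plain split("\n") when the text mixes CRLF and LF and ends with a newline. The sample text here is made up:

```python
text = "a,b,c\r\nd,e,f\ng,h,i\n"

print(text.splitlines())               # ['a,b,c', 'd,e,f', 'g,h,i']
print(text.split("\n"))                # ['a,b,c\r', 'd,e,f', 'g,h,i', '']
print(text.splitlines(keepends=True))  # ['a,b,c\r\n', 'd,e,f\n', 'g,h,i\n']
```

splitlines removes the stray \r and does not produce a trailing empty element; keepends=True retains the line breaks if you need them.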

Separate string by whitespace, but keep newlines in split array

You could first replace every "\n" with "\n " (a newline followed by a space) and then do a simple split on the space character.

String input = "Hello \n\n\nworld!";
String replacement = input.replace("\n", "\n ");
String[] result = replacement.split(" ");

  • input: "Hello \n\n\nworld!"
  • replacement: "Hello \n \n \n world!"
  • result: ["Hello", "\n", "\n", "\n", "world!"]

Note: my example does not handle the final exclamation mark - but it seems you already know how to handle that.
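The same replace-then-split idea, sketched in Python so it can be tried quickly in a REPL. The function name split_keep_newlines is hypothetical.

```python
def split_keep_newlines(text):
    # Pad every line feed with a trailing space so the split on " "
    # leaves it as its own token.
    padded = text.replace("\n", "\n ")
    return padded.split(" ")

print(split_keep_newlines("Hello \n\n\nworld!"))
# ['Hello', '\n', '\n', '\n', 'world!']
```

One caveat of this approach: it relies on a space before the first newline. With "Hello\nworld" the newline stays attached to the preceding word, giving ['Hello\n', 'world'].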

Split string in Python while keeping the line break inside the generated list

Split String using Regex findall()

import re

my_string = "This is a test.\nAlso\tthis"
my_list = re.findall(r"\S+|\n", my_string)

print(my_list)

How it Works:

  • "\S+": "\S" = any non-whitespace character. "+" is a greedy quantifier, so it finds each run of non-whitespace characters, i.e. each word
  • "|": OR logic
  • "\n": Find "\n" so it's returned as well in your list

Output:

['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
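If the input may contain Windows-style line breaks, one possible variation is to make the newline branch CRLF-aware. This is a sketch under that assumption, and the function name tokenize is made up:

```python
import re

def tokenize(text):
    # \S+ grabs each run of non-whitespace; \r?\n keeps line breaks as tokens.
    return re.findall(r"\S+|\r?\n", text)

print(tokenize("This is a test.\r\nAlso\tthis"))
# ['This', 'is', 'a', 'test.', '\r\n', 'Also', 'this']
```

With LF-only input the result is the same as with the original pattern.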

