How to Split String into Paragraphs Using First Comma

String#split has a second argument, the maximum number of fields returned in the result array:
http://ruby-doc.org/core/classes/String.html#M001165

@address.split(",", 2) returns an array of at most two strings, split at the first occurrence of ",".

The rest is simply a matter of building the string, either with interpolation or, if you want something more generic, with a combination of Array#map and Array#join, for example:

@address.split(",", 2).map {|split| "<p>#{split}</p>" }.join("\n")
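For comparison, here is a minimal Python sketch of the same idea. Note that Python's str.split takes the maximum number of splits rather than the maximum number of fields, so the counterpart of Ruby's limit of 2 is a maxsplit of 1. The function name and the sample address are made up for illustration, and stripping the whitespace around each part is an added assumption, not something the Ruby one-liner does.

```python
def paragraphs_at_first_comma(text):
    """Wrap the parts before and after the first comma in <p> tags."""
    # A maxsplit of 1 performs at most one split, so at most two parts come back.
    parts = text.split(",", 1)
    # strip() drops the space that usually follows the comma (an assumption).
    return "\n".join(f"<p>{part.strip()}</p>" for part in parts)


# Hypothetical sample address, not taken from the original question.
print(paragraphs_at_first_comma("221B Baker Street, London, NW1 6XE"))
```

If the text contains no comma, the whole string comes back as a single paragraph.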

I am trying to split text into paragraphs, but it splits on commas instead of new lines or paragraphs (JavaScript)

You probably need to use replace, not split:

subject = subjects.replace(/&/g, '<br>');

How to split text into paragraphs?

Something like this should work for you:

var paragraphMarker = Environment.NewLine + Environment.NewLine;
var paragraphs = fileText.Split(new[] {paragraphMarker},
                                StringSplitOptions.RemoveEmptyEntries);
foreach (var paragraph in paragraphs)
{
    var words = paragraph.Split(new[] {' '},
                                StringSplitOptions.RemoveEmptyEntries)
                         .Select(w => w.Trim());
    //do something
}

You may need to change the paragraph delimiter, because files can use different line endings, such as "\n", "\r", or "\r\n".

You can also pass specific characters to the Trim function to remove symbols such as '.', ',', '!', '"' and others.
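The two points above (differing line endings and trimming punctuation from words) can be sketched in Python as follows. This is an illustrative sketch rather than a translation of the C# answer: the helper names normalize_newlines and clean_words are hypothetical, and using string.punctuation as the set of characters to trim is an assumption.

```python
import string


def normalize_newlines(text):
    """Convert Windows and bare carriage-return line endings to a newline."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def clean_words(paragraph):
    """Split on whitespace and strip punctuation from both ends of each word."""
    # string.punctuation covers ASCII symbols such as . , ! and " (an assumption
    # about which characters should be trimmed).
    words = (word.strip(string.punctuation) for word in paragraph.split())
    # Drop words that consisted only of punctuation.
    return [word for word in words if word]


paragraphs = normalize_newlines("Hello, world!\r\n\r\nSecond paragraph.").split("\n\n")
for paragraph in paragraphs:
    print(clean_words(paragraph))
```

Normalizing first means the paragraph marker can always be "\n\n", whichever line ending the file used.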

Edit: For more flexibility, you can use a regular expression to split the paragraphs:

// (?:...) is non-capturing, so Regex.Split does not return the separators
var paragraphs = Regex.Split(fileText, @"(?:\r\n?|\n){2}")
                      .Where(p => p.Any(char.IsLetterOrDigit));
foreach (var paragraph in paragraphs)
{
    var words = paragraph.Split(new[] {' '},
                                StringSplitOptions.RemoveEmptyEntries)
                         .Select(w => w.Trim());
    //do something
}
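A rough Python counterpart of the regex-based split might look like this. It is a sketch under a few assumptions: the function name split_paragraphs is hypothetical, the quantifier is {2,} rather than {2} so that three or more consecutive line breaks also count as a single paragraph break, and str.isalnum stands in for char.IsLetterOrDigit.

```python
import re


def split_paragraphs(text):
    """Split text on runs of two or more line breaks of any style."""
    # (?:...) is non-capturing, so re.split does not return the separators.
    chunks = re.split(r"(?:\r\n?|\n){2,}", text)
    # Keep only chunks containing at least one letter or digit.
    return [chunk for chunk in chunks if any(ch.isalnum() for ch in chunk)]


sample = "First one.\r\n\r\nSecond one.\n\n\n---\n\nThird."
print(split_paragraphs(sample))
```

The final filter discards chunks such as a "---" separator line that contain no letters or digits.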

Splitting first sentence of string

Strings are immutable. No method or operation you apply to the string in data will modify it; functions like .split() return new data and do not mutate the original.

.split() returns an array with each part. You are already using the second parameter to limit the result to one element, so the sentences after the first are not returned. Instead, you could get all the parts and re-join the ones after the first.

const data = "The amount spent on gasoline continues to be a large expense to many individuals. The average amount spent on gasoline per year is $2000. This number can easily change as the price of gasoline is rapidly changing.";

let sentences = data.split('.'); // split on every period
let firstSentence = sentences.shift(); // remove and return the first element
let rest = sentences.join('.').trim(); // undo the split and trim the leading space

console.log(firstSentence)
console.log(rest)
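The same split-once idea can be sketched in Python, where strings are also immutable. This version uses str.partition instead of splitting everything and re-joining; that is a substitute technique, not what the JavaScript above does, and the function name split_first_sentence is hypothetical.

```python
def split_first_sentence(text):
    """Return (first_sentence, rest), splitting at the first period."""
    # partition() splits once and returns (before, separator, after);
    # the original string is left unchanged.
    first, _separator, rest = text.partition(".")
    return first, rest.strip()


data = (
    "The amount spent on gasoline continues to be a large expense to many "
    "individuals. The average amount spent on gasoline per year is $2000. "
    "This number can easily change as the price of gasoline is rapidly changing."
)
first_sentence, rest = split_first_sentence(data)
print(first_sentence)
print(rest)
```

As in the JavaScript version, the first sentence comes back without its trailing period.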

Split string into sentences in JavaScript

str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

Output:

[ 'This is a long string with some numbers [125.000,55 and 140.000] and an end.',
'This is another sentence.' ]

Breakdown:

([.?!]) = Capture either . or ? or !

\s* = Match 0 or more whitespace characters following the previous token ([.?!]). This accounts for the spaces that follow a punctuation mark in written English. These characters are matched but not captured, so they are dropped by the replacement.

(?=[A-Z]) = The previous tokens only match if the next character is within the range A-Z (capital A to capital Z). Most English language sentences start with a capital letter. None of the previous regexes take this into account.


The replace operation uses:

"$1|"

We used one capturing group, ([.?!]), which captures the punctuation character. The whole match (the punctuation plus any whitespace after it) is replaced with $1 (the captured character) plus |. So if we captured ? then the replacement would be ?|.

Finally, we split on the pipes | and get our result.


So, essentially, what we are saying is this:

1) Find punctuation marks (one of . or ? or !) and capture them.

2) Punctuation marks can optionally be followed by whitespace.

3) After a punctuation mark, I expect a capital letter.

Unlike the previous regular expressions provided, this follows the usual conventions of written English, although an abbreviation followed by a capitalized word, such as "Mr. Smith", is still split.

From there:

4) We replace each match with the captured punctuation mark followed by a pipe |.

5) We split on the pipes to create an array of sentences.
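The five steps above can be sketched in Python with the same pattern. The function name split_sentences is hypothetical, and the sample input is reconstructed from the output shown earlier, so treat it as an assumption. As with the JavaScript version, the approach assumes the text itself contains no | characters.

```python
import re


def split_sentences(text):
    """Mark sentence boundaries with a pipe and then split on that marker."""
    # Keep the punctuation (group 1), drop the whitespace that follows it, and
    # only match when the next character is an uppercase ASCII letter.
    marked = re.sub(r"([.?!])\s*(?=[A-Z])", r"\1|", text)
    return marked.split("|")


# Sample text reconstructed from the output shown above (an assumption).
sample = (
    "This is a long string with some numbers [125.000,55 and 140.000] "
    "and an end. This is another sentence."
)
print(split_sentences(sample))
```

The periods inside the numbers are left alone because they are followed by digits rather than capital letters.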

How to cut off string after the first line in the paragraph

var firstLine = theString.split('\n')[0];
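A Python sketch of the same operation is shown here for comparison. The function name first_line is hypothetical. It uses str.splitlines, which also recognizes "\r\n" and "\r" line endings, and it returns an empty string for empty input; both are design choices of this sketch, not behavior of the JavaScript line above.

```python
def first_line(text):
    """Return the first line of text, or an empty string if there is none."""
    # splitlines() handles \n, \r\n and \r and drops the line terminators.
    lines = text.splitlines()
    return lines[0] if lines else ""


print(first_line("First line\nSecond line"))
```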

Split block of text when starts with numbering

I think that this problem can be easily solved using regex. Here's the code:

import re

lines = re.split("\n(?=[0-9])",text)

First, the re.split function splits a string at every match of the pattern. The matched text itself is therefore not included in the resulting list.

The pattern starts with \n, a newline character. Then comes (?=, the start of a lookahead group. A lookahead is a part of the pattern that must follow the match but is not included in it. We don't want the digit to be part of the match, because it would then be removed from the resulting lines.

Inside the lookahead, we have [0-9], which matches any character from zero to nine, that is, any digit. Finally, a closing parenthesis ends the lookahead.
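To see the lookahead at work, here is a small worked example. The sample text is made up for illustration; the pattern is the one from the answer, written as a raw string.

```python
import re

# Hypothetical input: numbered items, one of which continues on a second line.
text = "1. First item\ncontinues here\n2. Second item\n3. Third item"

# Split only at newlines that are immediately followed by a digit.
blocks = re.split(r"\n(?=[0-9])", text)

for block in blocks:
    print(repr(block))
```

The newline before "continues here" is not followed by a digit, so that line stays attached to the first item, and each block keeps its leading number because the lookahead does not consume it.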


