How to split string into paragraphs using first comma?
String#split has a second argument, the maximum number of fields returned in the result array:
http://ruby-doc.org/core/classes/String.html#M001165
@address.split(",", 2)
will return an array with two strings, split at the first occurrence of ",".
The rest is simply building the string using interpolation or, if you want something more generic, a combination of Array#map and #join. For example:
@address.split(",", 2).map {|split| "<p>#{split}</p>" }.join("\n")
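For comparison, the same idea can be sketched in Python. This is only an illustration: the address value is a made-up sample, and note that Python's second argument (maxsplit) counts the number of splits, so 1 here corresponds to Ruby's limit of 2 fields.

```python
# Hypothetical sample value; the original @address is not shown in the question.
address = "221B Baker Street, London, NW1 6XE"

# maxsplit=1 performs at most one split, i.e. at the first comma only.
parts = address.split(",", 1)

# Wrap each piece in <p> tags, trimming the space left after the comma.
html = "\n".join(f"<p>{part.strip()}</p>" for part in parts)

print(html)
```

Everything after the first comma stays in one piece, including any later commas.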
I am trying to split text into paragraphs using JavaScript, but it is splitting on commas instead of new lines or paragraphs
You probably need replace rather than split:
subject = subjects.replace(/&/g, '<br>');
How to split text into paragraphs?
Something like this should work for you:
var paragraphMarker = Environment.NewLine + Environment.NewLine;
var paragraphs = fileText.Split(new[] {paragraphMarker},
StringSplitOptions.RemoveEmptyEntries);
foreach (var paragraph in paragraphs)
{
var words = paragraph.Split(new[] {' '},
StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Trim());
//do something
}
You may need to change the line delimiter: files can use different variants such as "\n", "\r", or "\r\n".
You can also pass specific characters to the Trim function to remove symbols like '.', ',', '!', '"' and others.
Edit: To add more flexibility you can use regexp for splitting paragraphs:
var paragraphs = Regex.Split(fileText, @"(\r\n?|\n){2}")
.Where(p => p.Any(char.IsLetterOrDigit));
foreach (var paragraph in paragraphs)
{
var words = paragraph.Split(new[] {' '},
StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Trim());
//do something
}
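As a rough cross-check of the approach above, here is a minimal Python sketch of the same two steps: split on blank lines regardless of line-ending style, then split each paragraph into words. The function names and the sample text are hypothetical, and the pattern uses {2,} (two or more line breaks) rather than exactly two, which is a small variation on the regex shown above.

```python
import re
import string


def split_paragraphs(text):
    """Split on runs of two or more line breaks (\\r\\n, \\r or \\n)."""
    chunks = re.split(r"(?:\r\n?|\n){2,}", text)
    # Keep only chunks that contain at least one letter or digit.
    return [c for c in chunks if any(ch.isalnum() for ch in c)]


def words_of(paragraph):
    """Split on whitespace and strip surrounding punctuation from each word."""
    return [w.strip(string.punctuation) for w in paragraph.split()]


# Hypothetical sample mixing Windows and Unix line endings.
sample = "First one, here.\r\n\r\nSecond paragraph!\n\n\n---\n\nThird."
paragraphs = split_paragraphs(sample)

for paragraph in paragraphs:
    print(words_of(paragraph))
```

The filter on letters and digits plays the same role as the Where clause in the C# version: it drops empty or punctuation-only chunks.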
Splitting first sentence of string
Strings are immutable. No method or operation you apply to the string in data will modify it; functions like .split() just return new data, they do not mutate the original.
.split() returns an array with each part. You are already using the second parameter to limit the result to one piece, so the sentences after the first are not returned. Instead, you could get all the splits and re-join the ones after the first.
const data = "The amount spent on gasoline continues to be a large expense to many individuals. The average amount spent on gasoline per year is $2000. This number can easily change as the price of gasoline is rapidly changing."
let sentences = data.split('.'); //split on every period
let firstSentence = sentences.shift(); //remove and return first element
let rest = sentences.join('.').trim(); //undo split and trim the space
console.log(firstSentence)
console.log(rest)
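The same "first sentence, then the rest" idea can be sketched in Python without splitting the whole string. This is an assumption-laden illustration: the sample text is shortened, and str.partition is used in place of split-and-rejoin.

```python
# Shortened, hypothetical sample text.
data = "Gasoline is a large expense. The average is $2000 per year. Prices change quickly."

# partition splits at the first "." only and returns (before, separator, after).
first, sep, remainder = data.partition(".")

first_sentence = first + sep   # put the period back on the first sentence
rest = remainder.strip()       # drop the space that followed the period

print(first_sentence)
print(rest)
```

As in the JavaScript version, the original string is never modified; new strings are returned.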
Split string into sentences in javascript
str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")
Output:
[ 'This is a long string with some numbers [125.000,55 and 140.000] and an end.',
'This is another sentence.' ]
Breakdown:
([.?!]) = Capture either . or ? or !
\s* = Match 0 or more whitespace characters following the previous token ([.?!]). This accounts for the spaces that follow a punctuation mark in English writing.
(?=[A-Z]) = The previous tokens only match if the next character is within the range A-Z (capital A to capital Z). Most English language sentences start with a capital letter. None of the previous regexes take this into account.
The replace operation uses "$1|".
We used one "capturing group", ([.?!]), which captures one of those characters. We replace the match with $1 (the captured character) plus |. So if we captured ? then the replacement would be ?|.
Finally, we split on the pipes | and get our result.
So, essentially, what we are saying is this:
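1) Find punctuation marks (one of . or ? or !) and capture them
2) Punctuation marks can optionally include spaces after them.
3) After a punctuation mark, I expect a capital letter.
Unlike the previous regular expressions provided, this matches the usual English convention that a sentence ends with punctuation and the next one starts with a capital letter. It will still split after abbreviations such as "Mr." when they are followed by a capitalized word.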
From there:
4) We replace the captured punctuation marks by appending a pipe |
5) We split on the pipes to create an array of sentences.
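For illustration, the same rules can be sketched in Python. Two assumptions here: the input string is reconstructed from the output shown above, and the pipe placeholder is swapped for a lookbehind, so the text is split directly (this relies on Python 3.7+, where re.split accepts patterns that can match an empty string).

```python
import re

# Input reconstructed from the output shown above (an assumption).
text = ("This is a long string with some numbers [125.000,55 and 140.000] "
        "and an end. This is another sentence.")

# (?<=[.?!])  the split point must follow ., ? or !
# \s*         consume any whitespace after the punctuation
# (?=[A-Z])   the next character must be a capital letter
sentences = re.split(r"(?<=[.?!])\s*(?=[A-Z])", text)

for sentence in sentences:
    print(sentence)
```

Because no placeholder character is inserted, this variant also behaves sensibly when the text itself contains a | character.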
How to cut off string after the first line in the paragraph
var firstLine = theString.split('\n')[0];
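A Python counterpart, as a minimal sketch with a made-up sample string: splitlines() recognizes "\n", "\r" and "\r\n" alike, and the guard covers the empty-string case, where there is no first line to index.

```python
# Hypothetical sample with mixed line endings.
the_string = "first line\r\nsecond line\nthird line"

lines = the_string.splitlines()
# An empty string yields an empty list, so fall back to "".
first_line = lines[0] if lines else ""

print(first_line)
```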
Split block of text when starts with numbering
I think that this problem can be easily solved using regex. Here's the code:
import re
lines = re.split(r"\n(?=[0-9])", text)
First, the re.split function splits a string on all matches of the pattern. The matches themselves are not included in the resulting strings.
The pattern starts with \n, a newline character. Then we have (?=, the start of a lookahead group. A lookahead is a part of the pattern that must come right after the match but is not included in the match. We don't want the number to be included in the match, as that would remove the numbers themselves from the resulting lines.
Inside the lookahead, we have [0-9]. This means any character from zero to nine, thus any digit. Finally, there is a closing parenthesis to end the lookahead.
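A small usage example may make the effect of the lookahead clearer. The sample text is made up for illustration: newlines inside an item are left alone, and only newlines followed by a digit act as split points.

```python
import re

# Hypothetical numbered text; some items wrap onto a second line.
text = (
    "1. Preheat the oven\nto a moderate heat.\n"
    "2. Mix the flour\nand sugar.\n"
    "3. Bake."
)

# Split on a newline only when the next character is a digit.
lines = re.split(r"\n(?=[0-9])", text)

for line in lines:
    print(repr(line))
```

Each resulting item keeps its leading number, because the digit is checked by the lookahead but not consumed by the match.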