Pandas: How to return rows where a column has a line break/new line ( \n ) with one of several case-sensitive words coming directly after?
Try the code below:
>>> testdf[testdf['A'].str.contains('\nRESULTS|\nMETHODS|\nBACKGROUND')]
A
0 generates the final summary. \nRESULTS We eva...
1 the cat and bat \n\n\nRESULTS\n teamed up to f...
4 the cat and bat \n\n\nMETHODS\n teamed up to f...
6 generates the final summary. \nBACKGROUND We ...
>>>
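As a self-contained sketch of the same idea, the three alternatives can be folded into one non-capturing group. The sample rows here are hypothetical; only the column name A comes from the answer above. str.contains is case-sensitive by default, so the lowercase row is not matched.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'A' follows the answer above.
testdf = pd.DataFrame({'A': [
    'final summary. \nRESULTS We evaluate',
    'the cat \n\n\nMETHODS\n teamed up',
    'lowercase \nresults should not match',
    'RESULTS without a preceding line break',
]})

# A newline immediately followed by one of the keywords.
# (?:...) is a non-capturing group, which avoids the pandas warning
# about match groups in str.contains.
pattern = r'\n(?:RESULTS|METHODS|BACKGROUND)'

# case=True is the default; na=False treats missing cells as non-matches.
mask = testdf['A'].str.contains(pattern, regex=True, na=False)
matches = testdf[mask]
print(matches.index.tolist())
```

Passing na=False is a defensive choice: without it, missing values in the column produce NaN in the mask, which cannot be used for boolean indexing.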
Pandas: How to return rows where a column has a line break/new line ( \n ) in its cell?
You can try the following:
import re
df1 = testdf[testdf['B'].str.contains('\nRESULTS', flags=re.IGNORECASE)]
df1
#output
A B
0 test1 generates the final summary. \nRESULTS We eva...
1 test2 the cat and bat \n\n\nRESULTS\n teamed up to f...
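If the goal is to find any line break at all, rather than one followed by a particular word, a plain substring test may be enough. The following is a minimal sketch with hypothetical sample data; regex=False makes the search a literal match on the newline character.

```python
import pandas as pd

# Hypothetical sample data with columns named as in the output above.
testdf = pd.DataFrame({
    'A': ['test1', 'test2', 'test3'],
    'B': ['summary. \nRESULTS We', 'no line break here', 'ends with newline\n'],
})

# Literal search for the newline character; na=False skips missing cells.
has_newline = testdf['B'].str.contains('\n', regex=False, na=False)
rows_with_newline = testdf[has_newline]
print(rows_with_newline)
```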
How to put linebreaks between list elements within the same Pandas dataframe cell?
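Both of these work:
df['col_b'].apply(lambda x: '<br>'.join(x)).to_frame().style
df['col_b'].str.join('<br>').to_frame().style
I'm using the latter as it is shorter. (.style is a DataFrame attribute, so the resulting Series is converted with .to_frame() first.)
Note that I'm working inside a Jupyter notebook, which needs <br> instead of \n, followed by a call to .style, to display the cell the way I want it to.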
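Outside a notebook, the same join can be checked by rendering the frame to an HTML string. This is only a sketch with a hypothetical two-row frame; DataFrame.to_html(escape=False) is used here instead of .style so that the <br> tags are left unescaped in the output.

```python
import pandas as pd

# Hypothetical frame: col_b holds a list of strings per cell.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [['x', 'y'], ['z']]})

# Join each list into one string separated by HTML line breaks.
joined = df.assign(col_b=df['col_b'].str.join('<br>'))

# escape=False keeps '<br>' as a tag instead of converting it to '&lt;br&gt;'.
html = joined.to_html(escape=False)
print(joined['col_b'].tolist())
```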
regaining original line breaks pandas \n
We regex split Text and Alt_Text using capturing parentheses in the pattern:
"If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list."
Then we zip both lists, taking separators containing line breaks from Text and anything else from Alt_Text, and join the resulting list into New_Text:
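import re

def insert_line_breaks(text, alt_text):
    # Capturing group: the matched tokens are kept in the split result
    regex = re.compile(r'([^ \n\[\]]+)')
    text = regex.split(text)
    alt_text = regex.split(alt_text)
    # Separators containing a line break come from Text, everything else from Alt_Text
    return ''.join([t if '\n' in t else a for t, a in zip(text, alt_text)])

df['New_Text'] = df.apply(lambda r: insert_line_breaks(r.Text, r.Alt_Text), axis=1)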
I guess there should be a space between the second *B* and So in the last row of Alt_Text, and the J before the first *B* in the desired output is just a typo. In this case we get:
>>> df.New_Text
0 \n[STUFF]\nBut the here is \n\nCase ID : *A* Date is Here \nfollow\n
1 \n[OTHER]\n\n\nFound *B* *B* \nhere\n BATH # : *A* MR # *C*
2 \n[ANY]\n*B* *B* So so \nCase ID : *A* Date\n\n\n hey the \n\n \n \n\n\n
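To see why the zip lines up, it can help to look at what a split with a capturing group returns on a tiny input. The strings below are hypothetical and much shorter than the question's data; the function is the same technique, restated so the block runs on its own.

```python
import re

# Tokens are runs of characters that are not spaces, newlines or brackets.
TOKEN = re.compile(r'([^ \n\[\]]+)')

# With a capturing group, re.split returns separators and tokens alternately.
parts = TOKEN.split('a \nb')
print(parts)  # ['', 'a', ' \n', 'b', '']

def insert_line_breaks(text, alt_text):
    text_parts = TOKEN.split(text)
    alt_parts = TOKEN.split(alt_text)
    # Keep a piece from text only if it carries a line break.
    return ''.join(t if '\n' in t else a for t, a in zip(text_parts, alt_parts))

# Hypothetical inputs with the same number of tokens in each string.
merged = insert_line_breaks('\n[STUFF]\nBut here', ' [OTHER] And there')
print(repr(merged))  # '\n[OTHER]\nAnd there'
```

The approach relies on both strings having the same number of tokens in the same order; zip silently stops at the shorter list, so mismatched inputs would be truncated rather than raise an error.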
Explode pandas dataframe into separate rows by splitting column by newlines
Sample:
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1'],
                   'B': ['B0', 'split this\n\n into \r\n separate \n rows \n'],
                   'index_col': [0, 1]})
print (df)
A B index_col
0 A0 B0 0
1 A1 split this\n\n into \r\n separate \n rows \n 1
Your solution can be changed to use DataFrame.set_index and Series.str.replace, with expand=True added to Series.str.split to get a DataFrame, and finally filter out the empty strings from B with DataFrame.query:
df1 = (df.set_index('index_col')['B']
.str.replace('\r', ' ')
.str.split('\n', expand=True)
.stack()
.rename('B')
.reset_index(level=1, drop=True)
.reset_index()[['B', 'index_col']]
.query("B != ''"))
print (df1)
B index_col
0 B0 0
1 split this 1
3 into 1
4 separate 1
5 rows 1
For pandas 0.25+ it is possible to use DataFrame.explode:
df['B'] = df['B'].str.replace('\r', ' ').str.split('\n')
df1 = df[['B', 'index_col']].explode('B').query("B != ''")
print (df1)
B index_col
0 B0 0
1 split this 1
1 into 1
1 separate 1
1 rows 1
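The pieces produced above still carry their surrounding spaces. If those are not wanted, one possible variation (a sketch, not part of the original answer) is to strip each piece after exploding and only then drop the empty ones; str.strip also removes a leftover \r, so the separate replace step is not needed in this version.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1'],
                   'B': ['B0', 'split this\n\n into \r\n separate \n rows \n'],
                   'index_col': [0, 1]})

# Split each cell into a list of lines, then give every line its own row.
exploded = (df[['B', 'index_col']]
            .assign(B=df['B'].str.split('\n'))
            .explode('B')
            .reset_index(drop=True))

# Trim whitespace (including '\r') and drop the pieces that end up empty.
exploded['B'] = exploded['B'].str.strip()
exploded = exploded[exploded['B'] != ''].reset_index(drop=True)
print(exploded)
```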
How to add a line break in a string inside a DataFrame?
The problem is the replacement of r'\s+', which also matches line breaks and replaces them with spaces.
If you comment out that line, the following will retain the newline characters in the strings.
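import pandas as pd
import spintax

rows = []
for i in range(50):
    data = spintax.spin("{option1|option2}" + "\n" + " blablabla ")
    rows.append({'A': data})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df = pd.DataFrame(rows)
# df['A'] = df['A'].str.replace(r'\s+', " ")  # this would also replace "\n"
print(df)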
Is that what you wanted to achieve?
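If repeated spaces still need to be collapsed, one option is a character class that leaves line breaks alone. The sketch below uses a hypothetical one-row Series and compares \s+ with [ \t]+; the second pattern is an assumption about what counts as collapsible whitespace, not something taken from the question.

```python
import pandas as pd

# Hypothetical sample string with a line break and runs of spaces.
s = pd.Series(['option1\n   blablabla   end'])

# \s matches spaces, tabs and line breaks, so the newline is lost.
all_whitespace = s.str.replace(r'\s+', ' ', regex=True)

# [ \t] matches only spaces and tabs, so the newline survives.
spaces_only = s.str.replace(r'[ \t]+', ' ', regex=True)

print(repr(all_whitespace[0]))
print(repr(spaces_only[0]))
```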