Pandas: How to Return Rows Where a Column Has a Line Break/New Line ( \n ) in Its Cell

Pandas: How to return rows where a column has a line break/new line ( \n ) with one of several case-sensitive words coming directly after?

Try the code below:

>>> testdf[testdf['A'].str.contains('\nRESULTS|\nMETHODS|\nBACKGROUND')]
A
0 generates the final summary. \nRESULTS We eva...
1 the cat and bat \n\n\nRESULTS\n teamed up to f...
4 the cat and bat \n\n\nMETHODS\n teamed up to f...
6 generates the final summary. \nBACKGROUND We ...
>>>
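
For readers without the original data, here is a self-contained sketch of the same filter. The sample frame is made up for illustration. Two things are additions rather than part of the answer above: na=False, which treats missing values as non-matches, and the non-capturing group, which only factors out the shared \n prefix.

```python
import pandas as pd

# Hypothetical sample data; the original testdf is not shown in full.
testdf = pd.DataFrame({'A': [
    'generates the final summary. \nRESULTS We evaluate',
    'no section heading here',
    'lowercase \nresults should not match',
    'the cat and bat \n\n\nMETHODS\n teamed up',
    None,
]})

# str.contains treats the pattern as a regex and is case-sensitive by default.
pattern = r'\n(?:RESULTS|METHODS|BACKGROUND)'
mask = testdf['A'].str.contains(pattern, na=False)

matches = testdf[mask]
print(matches.index.tolist())
```

Only the rows where an uppercase keyword directly follows a newline are kept; the lowercase variant and the missing value are dropped.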

Pandas: How to return rows where a column has a line break/new line ( \n ) in its cell?

Try the following (note that flags=re.IGNORECASE makes the match case-insensitive):

import re
df1 = testdf[testdf['B'].str.contains('\nRESULTS', flags = re.IGNORECASE)]
df1
#output
A B
0 test1 generates the final summary. \nRESULTS We eva...
1 test2 the cat and bat \n\n\nRESULTS\n teamed up to f...
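
If the goal is only to find cells that contain any line break at all, a plain substring search may be enough. This is a minimal sketch on made-up data, not the asker's frame:

```python
import pandas as pd

# Hypothetical sample data for illustration.
testdf = pd.DataFrame({
    'A': ['test1', 'test2', 'test3'],
    'B': ['first line\nsecond line', 'single line', 'windows\r\nline break'],
})

# regex=False does a plain substring search for the newline character.
has_newline = testdf['B'].str.contains('\n', regex=False)

rows_with_breaks = testdf[has_newline]
print(rows_with_breaks['A'].tolist())
```

A Windows-style \r\n break is also caught, because it still contains \n.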

How to put linebreaks between list elements within the same Pandas dataframe cell?

Both of these work:

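# .style is a DataFrame accessor, so convert the Series with to_frame() first
df['col_b'].apply(lambda x: '<br>'.join(x)).to_frame().style

df['col_b'].str.join('<br>').to_frame().style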

I'm using the latter as it is shorter.

Note that I'm working in a Jupyter notebook, which needs <br> instead of \n, followed by a call to .style, to display the cell the way I want.
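
A small sketch of the join step on made-up data (the column names and values are assumptions). The styling call is left as a comment because it only has a visible effect inside a notebook:

```python
import pandas as pd

# Hypothetical frame where each cell of col_b holds a list of strings.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': [['x', 'y'], ['z']]})

# Series.str.join concatenates the elements of each list with the separator.
joined = df['col_b'].str.join('<br>')
print(joined.tolist())

# In a notebook cell, evaluate the following to render <br> as a line break:
# joined.to_frame().style
```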

Regaining original line breaks ( \n ) in pandas

We regex split Text and Alt_Text using capturing parentheses in the pattern:

If capturing parentheses are used in pattern, then the text of all
groups in the pattern are also returned as part of the resulting list.

Then we zip both lists taking separators containing line breaks from Text and anything else from Alt_Text and join the resulting list into New_Text:

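import re

def insert_line_breaks(text, alt_text):
    # Capturing parentheses make re.split return the tokens as well as the separators.
    regex = re.compile(r'([^ \n\[\]]+)')
    text = regex.split(text)
    alt_text = regex.split(alt_text)
    # Take separators containing line breaks from Text, everything else from Alt_Text.
    return ''.join([t if '\n' in t else a for t, a in zip(text, alt_text)])

df['New_Text'] = df.apply(lambda r: insert_line_breaks(r.Text, r.Alt_Text), axis=1)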

I assume there should be a space between the second *B* and So in the last row of Alt_Text, and that the J before the first *B* in the desired output is just a typo. In that case we get:

>>> df.New_Text
0 \n[STUFF]\nBut the here is \n\nCase ID : *A* Date is Here \nfollow\n
1 \n[OTHER]\n\n\nFound *B* *B* \nhere\n BATH # : *A* MR # *C*
2 \n[ANY]\n*B* *B* So so \nCase ID : *A* Date\n\n\n hey the \n\n \n \n\n\n
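
To see why the zip works, it can help to look at what the split returns for a single pair of strings. The inputs below are invented for illustration; the pattern is the one from the answer:

```python
import re

# A token is a run of characters other than space, newline, or square brackets.
regex = re.compile(r'([^ \n\[\]]+)')

# Hypothetical inputs: text keeps the line break, alt_text has a masked token.
text = 'Case\nID : 123'
alt_text = 'Case ID : *A*'

# Because of the capturing group, tokens and separators alternate in the result.
text_parts = regex.split(text)
alt_parts = regex.split(alt_text)
print(text_parts)
print(alt_parts)

# Keep a piece from text only when it contains a line break.
merged = ''.join(t if '\n' in t else a for t, a in zip(text_parts, alt_parts))
print(repr(merged))
```

Both lists have the same length as long as the two strings contain the same number of tokens, which is what the approach relies on.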

Explode pandas dataframe into separate rows by splitting column by newlines

Sample:

import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1'],
                   'B': ['B0', 'split this\n\n into \r\n separate \n rows \n'],
                   'index_col': [0, 1]})
print(df)
A B index_col
0 A0 B0 0
1 A1 split this\n\n into \r\n separate \n rows \n 1

Your solution needs a few changes: use DataFrame.set_index, add Series.str.replace, pass expand=True to Series.str.split to get a DataFrame, and finally filter out empty strings from B with DataFrame.query:

df1 = (df.set_index('index_col')['B']
         .str.replace('\r', ' ')
         .str.split('\n', expand=True)
         .stack()
         .rename('B')
         .reset_index(level=1, drop=True)
         .reset_index()[['B', 'index_col']]
         .query("B != ''"))
print (df1)
B index_col
0 B0 0
1 split this 1
3 into 1
4 separate 1
5 rows 1

In pandas 0.25+ you can use DataFrame.explode:

df['B'] = df['B'].str.replace('\r', ' ').str.split('\n')
df1 = df[['B', 'index_col']].explode('B').query("B != ''")
print (df1)
B index_col
0 B0 0
1 split this 1
1 into 1
1 separate 1
1 rows 1
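
The pieces above still carry their surrounding spaces, and the exploded rows share an index label. A possible variant, sketched here on the same sample, strips each piece and renumbers the rows. The stripping and index reset are additions, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1'],
                   'B': ['B0', 'split this\n\n into \r\n separate \n rows \n'],
                   'index_col': [0, 1]})

out = df[['B', 'index_col']].copy()
# Drop carriage returns, then split each cell into a list of lines.
out['B'] = out['B'].str.replace('\r', '', regex=False).str.split('\n')
# One row per list element; renumber so the index is unique.
out = out.explode('B').reset_index(drop=True)
# Trim whitespace around each piece and discard the empty ones.
out['B'] = out['B'].str.strip()
out = out[out['B'] != ''].reset_index(drop=True)
print(out)
```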

How to add a line break in a string inside a DataFrame?

The problem is the replacement of r'\s+': that pattern also matches line breaks, so they are replaced with spaces along with the other whitespace.

If you comment out that line, the following code retains the newline characters in the strings:

import pandas as pd
import spintax

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first.
rows = []
for i in range(50):
    data = spintax.spin("{option1|option2}" + "\n" + " blablabla ")
    rows.append({'A': data})
df = pd.DataFrame(rows)

# df['A'] = df['A'].str.replace(r'\s+', " ")

print(df)

Is that what you wanted to achieve?
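
If the whitespace cleanup is still wanted, one option is to collapse only the whitespace that is not a newline. This is a sketch on made-up strings rather than spintax output, and the pattern is a suggestion, not something from the original question:

```python
import pandas as pd

# Hypothetical strings with runs of spaces and tabs around a line break.
df = pd.DataFrame({'A': ['option1\n   blablabla  ', 'option2 \t x\ny']})

# [^\S\n] matches any whitespace character except a newline,
# so runs of spaces and tabs collapse while line breaks survive.
df['A'] = df['A'].str.replace(r'[^\S\n]+', ' ', regex=True)
print(df['A'].tolist())
```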


