Remove All Rows Where Length of String Is More Than N

Remove all rows where length of string is more than n

To reword your question slightly, you want to retain rows where entries in f_name have length of 3 or less. So how about:

subset(m, nchar(as.character(f_name)) <= 3)
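
For readers working in pandas rather than R, a rough analogue of the subset call could look like the sketch below. The frame m and its contents are made up for illustration; only the column name f_name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name f_name comes from the question.
m = pd.DataFrame({"f_name": ["abc", "abcd", "xy", "abcdef"], "n": [1, 2, 3, 4]})

# Keep rows whose f_name has at most 3 characters (mirrors nchar(...) <= 3).
kept = m[m["f_name"].astype(str).str.len() <= 3]
print(kept)
```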

Remove the row from dataframe, that has string length greater than a certain number, after a certain character( , ) till end

You can use Series.str.match and pass the regex:

>>> df[df['name'].str.match(r'.*?,\w{0,2}$')]
   id    name
0   1   xy,ab
2   3  piy,bs

Or you can split the values on the comma, take the last piece, and check that its length is less than or equal to 2:

>>> df[df['name'].str.split(',').str[-1].str.len().le(2)]
   id    name
0   1   xy,ab
2   3  piy,bs
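
The question's input frame is not shown, so here is a self-contained sketch of both approaches on assumed data. Rows 0 and 2 match the output above; the middle row is invented to give the filter something to drop.

```python
import pandas as pd

# Assumed sample data: the original frame is not shown, so row 1 is invented.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["xy,ab", "ab,xyz", "piy,bs"]})

# Approach 1: regex anchored at the end of the string, allowing at most
# two word characters between a comma and the end.
by_regex = df[df["name"].str.match(r".*?,\w{0,2}$")]

# Approach 2: split on commas and measure the final piece.
by_split = df[df["name"].str.split(",").str[-1].str.len().le(2)]

print(by_regex)
print(by_split)
```

The two approaches are not identical in general: \w only matches letters, digits and underscores, so a short suffix containing other characters would pass the split-based check but fail the regex.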

Delete rows with pandas an excessive length of a string in a field

How to limit the email length to 50 characters:

df[df['email'].str.len()<51]

How to limit any string field to 50 characters:

df[df.applymap(lambda x: len(x) if isinstance(x, str) else 0).lt(51).all(axis=1)]
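
DataFrame.applymap is deprecated in recent pandas releases in favour of DataFrame.map, so the sketch below maps column by column with Series.map instead, which should behave the same across versions. The sample data and the MAX_LEN name are assumptions for illustration.

```python
import pandas as pd

MAX_LEN = 50  # limit taken from the answer above; the name is hypothetical

# Hypothetical data: one over-long email, one over-long note, one clean row.
df = pd.DataFrame({
    "email": ["a@example.com", "x" * 60 + "@example.com", "b@example.com"],
    "note": ["ok", "ok", "y" * 51],
    "age": [30, 41, 25],
})

# Length of each cell if it is a string, 0 otherwise (numbers, NaN, None).
lengths = df.apply(lambda col: col.map(lambda x: len(x) if isinstance(x, str) else 0))

# Keep rows in which every cell is within the limit.
filtered = df[lengths.le(MAX_LEN).all(axis=1)]
print(filtered)
```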

Remove the rows from pandas dataframe, that has sentences longer than certain word length

First split the values on whitespace, count the words in each row with Series.str.len, then keep the rows below the limit by inverting the condition from >= to < with Series.lt and using the result for boolean indexing:

df = df[df['Y'].str.split().str.len().lt(4)]
# alternative: invert the mask with ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)
   X               Y
1  1        An apple
2  2  glass of water
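
A self-contained version of the same idea, as a sketch. Rows 1 and 2 are taken from the output above; row 0 is invented, since the original input is not shown.

```python
import pandas as pd

# Assumed input: rows 1 and 2 match the output above; row 0 is invented.
df = pd.DataFrame({
    "X": [0, 1, 2],
    "Y": ["This sentence has five words", "An apple", "glass of water"],
})

# str.split() with no argument splits on runs of whitespace,
# so str.len() on the result counts words rather than characters.
word_counts = df["Y"].str.split().str.len()

# Keep only the rows with fewer than 4 words.
short = df[word_counts.lt(4)]
print(short)
```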

How to delete rows with less than a certain amount of items or strings with Pandas?

Count the items in each list and keep only the rows that have more than 3 items, which is what the condition below does (use >= 3 if lists of exactly three items should be kept as well):

dr0['length'] = dr0['PLATSBESKRIVNING'].apply(lambda x: len(x))
cond = dr0['length'] > 3
dr0 = dr0[cond]
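
The helper column is optional. As a sketch on made-up data (only the column name PLATSBESKRIVNING comes from the question), Series.str.len can count the items of list-valued cells directly:

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
dr0 = pd.DataFrame({
    "PLATSBESKRIVNING": [
        ["a", "b"],
        ["a", "b", "c", "d"],
        ["a", "b", "c"],
        ["a", "b", "c", "d", "e"],
    ],
})

# .str.len() also works on list-valued cells, returning the number of items.
item_counts = dr0["PLATSBESKRIVNING"].str.len()

# Keep rows with more than 3 items, without adding a helper column.
dr0 = dr0[item_counts > 3]
print(dr0)
```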

remove String row in pandas data frame when number of words is less than N

Using Pandas dataframe:

import pandas
text = {"header":["The quick fox","The quick fox brown jumps hight","The quick"]}
df = pandas.DataFrame(text)
df = df[df['header'].str.split().str.len().gt(2)]
print(df)

The above snippet keeps only the rows whose 'header' column contains more than 2 words.

For more on the pandas DataFrame, refer to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Drop rows in dataframe if length of the name columns =1

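One way to apply several length conditions at once is to combine them into a single boolean mask. This answer wraps the mask in numpy.where (which requires import numpy as np), although the combined conditions can also be used for indexing directly.

For example, for string length: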

data = data[np.where((data['cust_last_nm'].str.len()>1) & 
(data['cust_frst_nm'].str.len()>1), True, False)]

Note: you can add a postal code condition in the same way. By default the postal codes in your data will be read in as floats, so cast them to strings first and then apply the length limits:

## string length & postal code conditions together
data = data[np.where((data['cust_last_nm'].str.len()>1) &
(data['cust_frst_nm'].str.len()>1) &
(data['cust_postl_cd'].astype('str').str.len()>4) &
(data['cust_postl_cd'].astype('str').str.len()<8)
, True, False)]
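
As a sketch of the same filter without numpy.where, the combined conditions can be passed to the indexer as they are. The sample rows are invented. One assumption to flag: astype('str') on a float such as 12345.0 yields '12345.0' (7 characters), so this version converts through the nullable Int64 type first to measure only the digits. Whether that matches the intent of the original limits is a guess.

```python
import pandas as pd

# Hypothetical customer data; column names follow the answer above.
data = pd.DataFrame({
    "cust_last_nm": ["Smith", "X", "Jones", "Lee"],
    "cust_frst_nm": ["Al", "Bo", "C", "Di"],
    "cust_postl_cd": [12345.0, 54321.0, 99999.0, 123.0],  # read in as floats
})

# Assumption: measure digits only, so 12345.0 becomes "12345" rather than "12345.0".
postal = data["cust_postl_cd"].astype("Int64").astype(str)

# A boolean Series can index the frame directly; np.where is not required.
mask = (
    (data["cust_last_nm"].str.len() > 1)
    & (data["cust_frst_nm"].str.len() > 1)
    & postal.str.len().between(5, 7)  # same as > 4 and < 8
)
filtered = data[mask]
print(filtered)
```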

EDIT:

Since you are working in chunks, rename data to chunk and put this inside your loop. Also, since you do not import headers (headers=0), refer to the columns by their index positions instead of their names. Convert all values to strings before comparing, since otherwise columns containing NaN will be treated as floats, e.g.:

chunk = chunk[np.where((chunk[0].astype('str').str.len()>1) & 
(chunk[1].astype('str').str.len()>1) &
(chunk[5].astype('str').str.len()>4) &
(chunk[5].astype('str').str.len()<8), True, False)]
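
To make the chunked version concrete, here is a minimal sketch that reads a headerless CSV in chunks and filters each one. The in-memory sample, the column layout (0 = last name, 1 = first name, 5 = postal code) and the chunk size are assumptions. It reads everything as text with dtype=str instead of casting afterwards.

```python
import io
import pandas as pd

# Stand-in for a headerless CSV file; the layout is assumed for illustration.
raw = io.StringIO(
    "Smith,Al,x,x,x,12345\n"
    "X,Bo,x,x,x,54321\n"
    "Jones,Cy,x,x,x,\n"
    "Lee,Di,x,x,x,1234567\n"
)

kept = []
# header=None labels the columns 0..5; dtype=str keeps postal codes as text.
for chunk in pd.read_csv(raw, header=None, dtype=str, chunksize=2):
    # Give missing values a length of 0 so they fail the length checks.
    chunk = chunk.fillna("")
    mask = (
        (chunk[0].str.len() > 1)
        & (chunk[1].str.len() > 1)
        & chunk[5].str.len().between(5, 7)
    )
    kept.append(chunk[mask])

result = pd.concat(kept, ignore_index=True)
print(result)
```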

Filter string data based on its string length

import pandas as pd

df = pd.read_csv('filex.csv')
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
df = df.loc[mask]
print(df)

Applied to filex.csv:

A,B
123,abc
1234,abcd
1234567890,abcdefghij

The code above prints:

            A           B
2  1234567890  abcdefghij

