Remove all rows where length of string is more than n
To reword your question slightly, you want to retain rows where entries in f_name have length of 3 or less. So how about:
subset(m, nchar(as.character(f_name)) <= 3)
Remove the row from dataframe, that has string length greater than a certain number, after a certain character( , ) till end
You can use Series.str.match and pass the regex:
>>> df[df['name'].str.match(r'.*?,\w{0,2}$')]
id name
0 1 xy,ab
2 3 piy,bs
Or you can just split the values on the comma, take the last value, and check whether its length is less than or equal to 2:
>>> df[df['name'].str.split(',').str[-1].str.len().le(2)]
id name
0 1 xy,ab
2 3 piy,bs
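To try both approaches side by side, here is a minimal self-contained sketch. Only rows 0 and 2 appear in the output above; the middle row (`pq,abcd`) is made up here so that there is something to drop.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 match the output shown above,
# row 1 is invented for illustration.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["xy,ab", "pq,abcd", "piy,bs"]})

# Approach 1: regex anchored at the end of the string
# (a comma followed by at most two word characters).
by_regex = df[df["name"].str.match(r".*?,\w{0,2}$")]

# Approach 2: split on the comma and measure the last piece.
by_split = df[df["name"].str.split(",").str[-1].str.len().le(2)]

print(by_regex)
print(by_split)
```

The two approaches agree on this data but are not identical in general: a value with no comma at all never matches the regex, while the split version keeps it if the whole string is 2 characters or shorter.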
Delete rows in pandas with an excessive string length in a field
How to limit the email length to 50 characters:
df[df['email'].str.len()<51]
How to limit any string field to 50 characters:
df[df.applymap(lambda x: len(x) if isinstance(x, str) else 0).lt(51).all(axis=1)]
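A small runnable sketch of the "any string field" check, using made-up data. Recent pandas versions deprecate `applymap` in favour of `DataFrame.map`, so this version sidesteps both by mapping each column separately; the helper name `cell_len` is hypothetical.

```python
import pandas as pd

# Made-up example data: the second email is longer than 50 characters.
df = pd.DataFrame({
    "email": ["a@example.com", "x" * 60 + "@example.com"],
    "name": ["Ann", "Bob"],
    "age": [30, 40],
})

def cell_len(value):
    """Length of a string cell; non-string cells count as 0."""
    return len(value) if isinstance(value, str) else 0

# Apply the helper to every cell, one column at a time.
lengths = df.apply(lambda col: col.map(cell_len))

# Keep rows where every string field has at most 50 characters.
short_rows = df[lengths.lt(51).all(axis=1)]
print(short_rows)
```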
Remove the rows from pandas dataframe, that has sentences longer than certain word length
First split the values on whitespace and get the number of words in each row with Series.str.len. Then invert the condition >= to < with Series.lt and use the result for boolean indexing:
df = df[df['Y'].str.split().str.len().lt(4)]
#alternative with inverted mask by ~
#df = df[~df['Y'].str.split().str.len().ge(4)]
print (df)
X Y
1 1 An apple
2 2 glass of water
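The original input frame is not shown, so here is a sketch with an invented first row (rows 1 and 2 are taken from the output above) that runs both variants:

```python
import pandas as pd

# Row 0 is made up for illustration; rows 1 and 2 match the output above.
df = pd.DataFrame({
    "X": [0, 1, 2],
    "Y": ["This is a long sentence", "An apple", "glass of water"],
})

# Number of whitespace-separated words in each row.
words = df["Y"].str.split().str.len()

short = df[words.lt(4)]        # keep rows with fewer than 4 words
also_short = df[~words.ge(4)]  # same result via an inverted mask

print(short)
```

One difference worth knowing: if `Y` contains missing values, the word count for those rows is NaN, so `lt(4)` drops them while the inverted `~ge(4)` mask keeps them.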
How to delete rows with less than a certain amount of items or strings with Pandas?
Just measure the number of items in each list and keep only the rows with more than 3 items (that is, drop the rows with 3 items or fewer):
dr0['length'] = dr0['PLATSBESKRIVNING'].apply(lambda x: len(x))
cond = dr0['length'] > 3
dr0 = dr0[cond]
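A self-contained sketch of the same filter without the helper column, assuming PLATSBESKRIVNING holds Python lists (the sample lists are made up). The threshold follows the code above, which keeps rows with more than 3 items.

```python
import pandas as pd

# Made-up data: each cell is a list of items.
dr0 = pd.DataFrame({
    "PLATSBESKRIVNING": [["a", "b"], ["a", "b", "c", "d"], ["a", "b", "c"]],
})

# Count the items in each list, then keep rows with more than 3 items.
n_items = dr0["PLATSBESKRIVNING"].map(len)
filtered = dr0[n_items > 3]

print(filtered)
```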
remove String row in pandas data frame when number of words is less than N
Using Pandas dataframe:
import pandas
text = {"header":["The quick fox","The quick fox brown jumps hight","The quick"]}
df = pandas.DataFrame(text)
df = df[df['header'].str.split().str.len().gt(2)]
print(df)
The snippet above keeps only the rows whose 'header' column contains more than 2 words.
For more on pandas DataFrames, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
Drop rows in dataframe if length of the name columns =1
One way to do operations like this in pandas is through numpy.where, which turns the combined conditions into a boolean array used for row selection.
For example, for string length:
data = data[np.where((data['cust_last_nm'].str.len()>1) &
(data['cust_frst_nm'].str.len()>1), True, False)]
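For comparison, here is a sketch on made-up data showing that `np.where(mask, True, False)` selects the same rows as indexing with the boolean mask directly, so the wrapper is optional. The column names come from the snippet above; the values are invented.

```python
import numpy as np
import pandas as pd

# Invented sample data using the column names from the snippet above.
data = pd.DataFrame({
    "cust_last_nm": ["Smith", "X", "Lee"],
    "cust_frst_nm": ["Al", "Bo", "C"],
})

# Both names must be longer than one character.
mask = (data["cust_last_nm"].str.len() > 1) & (data["cust_frst_nm"].str.len() > 1)

via_where = data[np.where(mask, True, False)]  # as in the answer
via_mask = data[mask]                          # plain boolean indexing

print(via_mask)
```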
Note: you can add a postal code condition in the same way. By default, the postal codes in your data will be read in as floats, so cast them to strings first and then set the length limits:
## string length & postal code conditions together
data = data[np.where((data['cust_last_nm'].str.len()>1) &
(data['cust_frst_nm'].str.len()>1) &
(data['cust_postl_cd'].astype('str').str.len()>4) &
(data['cust_postl_cd'].astype('str').str.len()<8)
, True, False)]
EDIT:
Since you are working in chunks, change data to chunk and put this inside your loop. Also, since you don't import headers (headers=0), change the column names to their index values. And convert all values to strings before comparison, since otherwise NaN columns will be treated as floats, e.g.:
chunk = chunk[np.where((chunk[0].astype('str').str.len()>1) &
(chunk[1].astype('str').str.len()>1) &
(chunk[5].astype('str').str.len()>4) &
(chunk[5].astype('str').str.len()<8), True, False)]
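The loop itself is not shown, so the following is only a sketch of how the chunked version might fit together. Several things here are assumptions: the CSV content is made up, `header=None` is used to mean "the file has no header row", and `dtype=str` at read time is swapped in for the per-column `astype('str')` calls so that missing values stay missing instead of becoming the string 'nan'.

```python
import io
import pandas as pd

# Made-up headerless CSV: first name, last name, three filler columns, postal code.
csv_text = (
    "Al,Smith,x,x,x,12345\n"
    "B,Jones,x,x,x,54321\n"
    "Cy,Lee,x,x,x,123\n"
)

kept = []
# header=None gives integer column labels 0..5; dtype=str reads every column as text.
for chunk in pd.read_csv(io.StringIO(csv_text), header=None, chunksize=2, dtype=str):
    mask = (
        (chunk[0].str.len() > 1)
        & (chunk[1].str.len() > 1)
        & (chunk[5].str.len() > 4)
        & (chunk[5].str.len() < 8)
    )
    kept.append(chunk[mask])

# Combine the filtered chunks into one frame.
result = pd.concat(kept, ignore_index=True)
print(result)
```

Collecting the filtered chunks in a list and concatenating once at the end avoids repeatedly growing a DataFrame inside the loop.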
Filter string data based on its string length
import pandas as pd
df = pd.read_csv('filex.csv')
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
df = df.loc[mask]
print(df)
Applied to filex.csv:
A,B
123,abc
1234,abcd
1234567890,abcdefghij
the code above prints
A B
2 1234567890 abcdefghij