Finding Rows Containing a Value (Or Values) in Any Column

Finding rows containing a value (or values) in any column

How about

apply(df, 1, function(r) any(r %in% c("M017", "M018")))

The ith element will be TRUE if the ith row contains one of the values, and FALSE otherwise. Or, if you want just the row numbers, enclose the above statement in which(...).

Pandas find rows with value in any column

You can use for match first ID/id/Id substring in all columns by filter output rows before and then convert first row to columns:

mask = (df.select_dtypes(object)
.apply(lambda x: x.str.contains('id', case=False))
.any(axis=1)
.cumsum()
.gt(0))

df = df[mask].copy()
df.columns = df.iloc[0].rename(None)
df = df.iloc[1:].reset_index(drop=True)

Another idea for test not subtrings:

mask = df.isin(['id','ID','Id']).any(axis=1).cumsum().gt(0)

df = df[mask].copy()
df.columns = df.iloc[0].rename()
df = df.iloc[1:].reset_index(drop=True)

Select rows containing certain values from pandas dataframe

Introduction

At the heart of selecting rows, we would need a 1D mask or a pandas-series of boolean elements of length same as length of df, let's call it mask. So, finally with df[mask], we would get the selected rows off df following boolean-indexing.

Here's our starting df :

In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear

I. Match one string

Now, if we need to match just one string, it's straight-foward with elementwise equality :

In [42]: df == 'banana'
Out[42]:
A B C
1 False True False
2 False False False
3 True False False
4 False False False

If we need to look ANY one match in each row, use .any method :

In [43]: (df == 'banana').any(axis=1)
Out[43]:
1 True
2 False
3 True
4 False
dtype: bool

To select corresponding rows :

In [44]: df[(df == 'banana').any(axis=1)]
Out[44]:
A B C
1 apple banana pear
3 banana pear pear


II. Match multiple strings

1. Search for ANY match

Here's our starting df :

In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear

NumPy's np.isin would work here (or use pandas.isin as listed in other posts) to get all matches from the list of search strings in df. So, say we are looking for 'pear' or 'apple' in df :

In [51]: np.isin(df, ['pear','apple'])
Out[51]:
array([[ True, False, True],
[ True, True, True],
[False, True, True],
[ True, True, True]])

# ANY match along each row
In [52]: np.isin(df, ['pear','apple']).any(axis=1)
Out[52]: array([ True, True, True, True])

# Select corresponding rows with masking
In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
Out[56]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear

2. Search for ALL match

Here's our starting df again :

In [42]: df
Out[42]:
A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear

So, now we are looking for rows that have BOTH say ['pear','apple']. We will make use of NumPy-broadcasting :

In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
Out[66]:
array([[ True, True],
[ True, True],
[ True, False],
[ True, True]])

So, we have a search list of 2 items and hence we have a 2D mask with number of rows = len(df) and number of cols = number of search items. Thus, in the above result, we have the first col for 'pear' and second one for 'apple'.

To make things concrete, let's get a mask for three items ['apple','banana', 'pear'] :

In [62]: np.equal.outer(df.to_numpy(copy=False),  ['apple','banana', 'pear']).any(axis=1)
Out[62]:
array([[ True, True, True],
[ True, False, True],
[False, True, True],
[ True, False, True]])

The columns of this mask are for 'apple','banana', 'pear' respectively.

Back to 2 search items case, we had earlier :

In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
Out[66]:
array([[ True, True],
[ True, True],
[ True, False],
[ True, True]])

Since, we are looking for ALL matches in each row :

In [67]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)
Out[67]: array([ True, True, False, True])

Finally, select rows :

In [70]: df[np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)]
Out[70]:
A B C
1 apple banana pear
2 pear pear apple
4 apple apple pear

How do I select rows from a DataFrame based on column values?

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. Thus, the parentheses in the last example are necessary. Without the parentheses

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which results in a Truth value of a Series is ambiguous error.


To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
'B': 'one one two three two two one three'.split(),
'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
# A B C D
# 0 foo one 0 0
# 1 bar one 1 2
# 2 foo two 2 4
# 3 bar three 3 6
# 4 foo two 4 8
# 5 bar two 5 10
# 6 foo one 6 12
# 7 foo three 7 14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0 foo one 0 0
2 foo two 2 4
4 foo two 4 8
6 foo one 6 12
7 foo three 7 14

If you have multiple values you want to include, put them in a
list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0 foo one 0 0
1 bar one 1 2
3 bar three 3 6
6 foo one 6 12
7 foo three 7 14

Note, however, that if you wish to do this many times, it is more efficient to
make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B
one foo 0 0
one bar 1 2
one foo 6 12

or, to include multiple values from the index use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B
one foo 0 0
one bar 1 2
two foo 2 4
two foo 4 8
two bar 5 10
one foo 6 12

SQL Server SELECT where any column contains 'x'

First Method(Tested)
First get list of columns in string variable separated by commas and then you can search 'foo' using that variable by use of IN

Check stored procedure below which first gets columns and then searches for string:

DECLARE @TABLE_NAME VARCHAR(128)
DECLARE @SCHEMA_NAME VARCHAR(128)

-----------------------------------------------------------------------

-- Set up the name of the table here :
SET @TABLE_NAME = 'testing'
-- Set up the name of the schema here, or just leave set to 'dbo' :
SET @SCHEMA_NAME = 'dbo'

-----------------------------------------------------------------------

DECLARE @vvc_ColumnName VARCHAR(128)
DECLARE @vvc_ColumnList VARCHAR(MAX)

IF @SCHEMA_NAME =''
BEGIN
PRINT 'Error : No schema defined!'
RETURN
END

IF NOT EXISTS (SELECT * FROM sys.tables T JOIN sys.schemas S
ON T.schema_id=S.schema_id
WHERE T.Name=@TABLE_NAME AND S.name=@SCHEMA_NAME)
BEGIN
PRINT 'Error : The table '''+@TABLE_NAME+''' in schema '''+
@SCHEMA_NAME+''' does not exist in this database!'
RETURN
END

DECLARE TableCursor CURSOR FAST_FORWARD FOR
SELECT CASE WHEN PATINDEX('% %',C.name) > 0
THEN '['+ C.name +']'
ELSE C.name
END
FROM sys.columns C
JOIN sys.tables T
ON C.object_id = T.object_id
JOIN sys.schemas S
ON S.schema_id = T.schema_id
WHERE T.name = @TABLE_NAME
AND S.name = @SCHEMA_NAME
ORDER BY column_id


SET @vvc_ColumnList=''

OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @vvc_ColumnName

WHILE @@FETCH_STATUS=0
BEGIN
SET @vvc_ColumnList = @vvc_ColumnList + @vvc_ColumnName

-- get the details of the next column
FETCH NEXT FROM TableCursor INTO @vvc_ColumnName

-- add a comma if we are not at the end of the row
IF @@FETCH_STATUS=0
SET @vvc_ColumnList = @vvc_ColumnList + ','
END

CLOSE TableCursor
DEALLOCATE TableCursor

-- Now search for `foo`


SELECT *
FROM testing
WHERE 'foo' in (@vvc_ColumnList );

2nd Method
In sql server you can get object id of table then using that object id you can fetch columns. In that case it will be as below:

Step 1: First get Object Id of table

select * from sys.tables order by name    

Step 2: Now get columns of your table and search in it:

 select * from testing where 'foo' in (select name from sys.columns  where  object_id =1977058079)

Note: object_id is what you get fetch in first step for you relevant table



Related Topics



Leave a reply



Submit