Selecting Column Names That Have a Specified Value

How to extract column names which have a specific value

You can add a new column to an existing DataFrame that lists, for each row, the names of all columns whose value in that row is 1.

Inside the column parameter of withColumn you can map over the columns and check each one for the wanted value. Note that when without an otherwise yields null for columns that do not match, so the resulting array keeps an empty slot for them:

import org.apache.spark.sql.functions.{array, col, when}
// toDF also needs spark.implicits._ (already imported in spark-shell)

val df = Seq((1, 2, 3), (4, 5, 6), (3, 1, 1)).toDF("col1", "col2", "col3")
df.show()

// change this array if you want to exclude columns from the check
val cols = df.schema.fieldNames

df.withColumn("result", array(
  cols.map { c: String => when(col(c).equalTo(1), c) }: _*
)).show()

prints:

//input data
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   2|   3|
|   4|   5|   6|
|   3|   1|   1|
+----+----+----+

//result
+----+----+----+--------------+
|col1|col2|col3|        result|
+----+----+----+--------------+
|   1|   2|   3|      [col1,,]|
|   4|   5|   6|          [,,]|
|   3|   1|   1|[, col2, col3]|
+----+----+----+--------------+
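
To see where the empty slots in the result come from, here is a plain-Python sketch (not Spark code) of what the expression computes for each row. The names matching_columns and rows are made up for illustration, and None stands in for Spark's null. Dropping the None entries afterwards leaves only the matching column names.

```python
# Plain-Python model of array(when(col(c) === 1, c), ...) evaluated per row.
# matching_columns is a hypothetical helper; None stands in for Spark's null.
def matching_columns(row, wanted=1):
    """Return one slot per column: the column name on a match, else None."""
    return [name if value == wanted else None for name, value in row.items()]

rows = [
    {"col1": 1, "col2": 2, "col3": 3},
    {"col1": 4, "col2": 5, "col3": 6},
    {"col1": 3, "col2": 1, "col3": 1},
]

for row in rows:
    slots = matching_columns(row)
    compact = [name for name in slots if name is not None]  # drop the None slots
    print(slots, "->", compact)
```

The slots list corresponds to the result column shown above, and compact is what you would get after removing the nulls from that array.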

Selecting all column names where value is greater than 0

You can compare the values with 0 using DataFrame.gt to get a boolean DataFrame, then use DataFrame.dot to matrix-multiply it with the column names (each with a separator appended), and finally remove the trailing separator by slicing with str:

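import pandas as pd

# input reconstructed from the printed output below
df = pd.DataFrame({'a': [12, 0, 23], 'b': [21, 23, 0], 'c': [0, 22, 33], 'd': [0, 22, 0]})
df['e'] = df.gt(0).dot(df.columns + ',').str[:-1]
print(df)
    a   b   c   d      e
0  12  21   0   0    a,b
1   0  23  22  22  b,c,d
2  23   0  33   0    a,c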
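
If the dot trick looks like magic, the following plain-Python sketch (no pandas) models what it does for each row. Multiplying a string by True keeps it and multiplying by False gives an empty string, so summing the products concatenates the names of the matching columns. The helper names_where_positive is a made-up name for illustration only.

```python
# Plain-Python model of df.gt(0).dot(df.columns + ',').str[:-1] (no pandas needed).
columns = ["a", "b", "c", "d"]
rows = [
    [12, 21, 0, 0],
    [0, 23, 22, 22],
    [23, 0, 33, 0],
]

def names_where_positive(row, columns, sep=","):
    """Hypothetical helper: join the names of the columns whose value is > 0."""
    # bool * str repeats the string 0 or 1 times:
    # True * "a," == "a," and False * "a," == ""
    joined = "".join((value > 0) * (name + sep) for value, name in zip(row, columns))
    return joined[:-1]  # same role as .str[:-1]: drop the trailing separator

result = [names_where_positive(row, columns) for row in rows]
print(result)  # ['a,b', 'b,c,d', 'a,c']
```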

SELECT rows that have specified values in one of the columns

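One way is to use XML: FOR XML RAW serializes each row with every column as an attribute, so you can search the resulting text for the quoted value: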

SELECT t.*
FROM tab t
CROSS APPLY (SELECT * FROM tab t2 WHERE t.id = t2.id FOR XML RAW('a')) sub(c)
WHERE sub.c LIKE '%"I"%';

Output:

┌────┬──────┬────────┬─────────┬───────┐
│ ID │ Name │ Status │ Address │ Phone │
├────┼──────┼────────┼─────────┼───────┤
│  1 │ Tom  │ I      │ U       │ D     │
│  3 │ Pam  │ D      │ I       │ U     │
└────┴──────┴────────┴─────────┴───────┘
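
To illustrate why the pattern includes the double quotes, here is a rough Python sketch of the same idea using the standard library's ElementTree. It is only a model of the technique, not what SQL Server does internally. The sample rows are made up (row 2 is hypothetical and only there as a non-match), and row_as_xml is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample rows; row 2 is hypothetical and only there as a non-match.
rows = [
    {"ID": 1, "Name": "Tom", "Status": "I", "Address": "U", "Phone": "D"},
    {"ID": 2, "Name": "Ian", "Status": "U", "Address": "D", "Phone": "U"},
    {"ID": 3, "Name": "Pam", "Status": "D", "Address": "I", "Phone": "U"},
]

def row_as_xml(row):
    """Serialize a row as <a ID="1" Name="Tom" ... />, similar to FOR XML RAW('a')."""
    element = ET.Element("a", {key: str(value) for key, value in row.items()})
    return ET.tostring(element, encoding="unicode")

# '"I"' only matches an attribute whose whole value is I, so Name="Ian" is not a hit.
matches = [row["ID"] for row in rows if '"I"' in row_as_xml(row)]
print(matches)  # [1, 3]
```

Searching for the quoted value matches whole attribute values only, which is why a name that merely starts with I is not returned.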

EDIT:

A slightly more advanced option that excludes some columns from the search, in effect simulating SELECT * EXCEPT id, name:

SELECT DISTINCT t.*
FROM tab t
CROSS APPLY (VALUES(CAST((SELECT t.* FOR XML RAW) AS xml))) B(XMLData)
CROSS APPLY (SELECT 1 c
             FROM B.XMLData.nodes('/row') AS C1(n)
             CROSS APPLY C1.n.nodes('./@*') AS C2(a)
             WHERE a.value('local-name(.)','varchar(100)') NOT IN ('id','name')
               AND a.value('.','varchar(max)') = 'I') C;


Select specific columns, where the column names are in another df in R

The problem is that Y.variable.names is a data.frame, which cannot be used to select columns from another data.frame.

You can check by typing class(Y.variable.names).

So the solution to your problem is to subset Y.variable.names so that its first column is passed as a vector of names:

Y.Data = data %>% select(Y.variable.names[,1])

Bring a row for each specific column that is not empty, with the column name

You can use CROSS APPLY together with VALUES to unpivot your data:

Select A.ID
      ,B.Data
      ,A.RandomInformation
 From  YourTable A
 Cross Apply ( values ('Data1',Data1)
                     ,('Data2',Data2)
                     ,('Data3',Data3)
                     ,('Data4',Data4)
                     ,('Data5',Data5)
                     ,('Data6',Data6)
             ) B(Data,Value)
 Where B.Value is not null
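
For readers less familiar with CROSS APPLY, the following plain-Python sketch models what the query does. Every source row is paired with one (column name, value) entry per listed column, and only the pairs with a non-null value are kept. The sample data is made up, None plays the role of NULL, and unpivot is an illustrative helper name, not part of any library.

```python
# Made-up sample data; None plays the role of SQL NULL.
your_table = [
    {"ID": 1, "Data1": "x", "Data2": None, "Data3": "y", "RandomInformation": "foo"},
    {"ID": 2, "Data1": None, "Data2": None, "Data3": "z", "RandomInformation": "bar"},
]
data_columns = ["Data1", "Data2", "Data3"]

def unpivot(rows, columns):
    """Yield (ID, column name, RandomInformation) for every non-null data column."""
    for row in rows:
        for name in columns:            # one candidate per entry in the VALUES list
            if row[name] is not None:   # Where B.Value is not null
                yield (row["ID"], name, row["RandomInformation"])

result = list(unpivot(your_table, data_columns))
print(result)
```

Each tuple corresponds to one output row of the query: the ID, the name of a non-empty column, and the extra information carried along from the source row.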

