Create a New Dataframe Based on Rows With a Certain Value

You do not need to write loops. You can do it easily with boolean indexing in pandas.

Assuming your dataframe looks like this:

import pandas as pd  

mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]

To create separate dataframes for the S and P types, you can just do this:

cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']

cust_sell output:

  Type  Dummy
0    S      1
1    S      2
2    S      3
5    S      6
7    S      8

cust_buy output:

  Type  Dummy
3    P      4
4    P      5
6    P      7
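If you need one dataframe per unique value in Type rather than hard-coding 'S' and 'P', a dictionary comprehension over groupby is a compact alternative (a sketch based on the mainDf above, not part of the original answer):

# one dataframe per unique Type value, keyed by that value
frames = {t: g.copy() for t, g in mainDf.groupby('Type')}

frames['S']   # same rows as cust_sell
frames['P']   # same rows as cust_buy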

Creating a new Dataframe based on rows with certain values and removing the rows from the original Dataframe

Your code is almost correct. Use any(axis=1) to reduce the boolean mask to a single value per row, instead of using dropna(how='all').

The same approach with a reproducible example:

import pandas as pd
import numpy as np

np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))

m = df.isin([-1]).any(axis=1)  # True where a row contains at least one -1 (or: df.eq(-1).any(axis=1))
df1, df2 = df[m], df[~m]

Output:

>>> df.assign(M=m)
   A  B  C  D      M
0 -1  0 -1 -1   True
1  1  0  1  1  False
2  1  1  1  1  False
3  1  1  0  0  False
4  0  1  1 -1   True
5  1  0  0  1  False
6 -1  0  1  0   True
7  0  0  0  0  False
8  1 -1  1  0   True
9  1  1  0  1  False

>>> df1
   A  B  C  D
0 -1  0 -1 -1
4  0  1  1 -1
6 -1  0  1  0
8  1 -1  1  0

>>> df2
   A  B  C  D
1  1  0  1  1
2  1  1  1  1
3  1  1  0  0
5  1  0  0  1
7  0  0  0  0
9  1  1  0  1
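If the goal, as in the question title, is also to remove the matched rows from the original dataframe rather than keep a second copy, one option sketched from the example above is to drop the matched index in place:

df1 = df[m].copy()
df.drop(index=df1.index, inplace=True)  # df now holds only the rows without -1 (same content as df2)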

Creating a new dataframe based on whether a particular value matches a value in a list

As you haven't posted any data or code, I will demonstrate how the following should work for you. You can pass a list to isin, which returns a boolean index that you can use to filter your df; there is no need to loop over and append the rows of interest. Your approach is probably failing (I'm guessing, as I don't have your data) because you've either gone past the end of the dataframe or your index doesn't contain that specific label value.

In [147]:

import numpy as np
import pandas as pd

customer_list = ['Microsoft', 'Google', 'Facebook']
df = pd.DataFrame({'Customer': ['Microsoft', 'Microsoft', 'Google', 'Facebook',
                                'Google', 'Facebook', 'Apple', 'Apple'],
                   'data': np.random.randn(8)})
df
Out[147]:
    Customer      data
0  Microsoft  0.669051
1  Microsoft  0.392646
2     Google  1.534285
3   Facebook -1.204585
4     Google  1.050301
5   Facebook  0.492487
6      Apple  1.471614
7      Apple  0.762598
In [148]:

df['Customer'].isin(customer_list)
Out[148]:
0     True
1     True
2     True
3     True
4     True
5     True
6    False
7    False
Name: Customer, dtype: bool
In [149]:

df[df['Customer'].isin(customer_list)]
Out[149]:
    Customer      data
0  Microsoft  0.669051
1  Microsoft  0.392646
2     Google  1.534285
3   Facebook -1.204585
4     Google  1.050301
5   Facebook  0.492487
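If you also need the rows whose Customer is not in the list, the same boolean index can be inverted with ~ (a small addition based on the df above, not in the original answer):

df[~df['Customer'].isin(customer_list)]

This would return just the Apple rows in this example.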

Pandas create new dataframe based on unique value in a column of existing dataframe efficiently

The easiest way would be to use groupby and keep the first occurrence of the column values for each unique Main value.

Group By

>>> import pandas as pd
>>>
>>> d = {
... 'Main':['v1','v2','v1','v2','v5','v2']
... ,'Col1':[1,0,1,1,1,1]
... ,'Col2':[0,1,1,0,0,0]
... ,'Col3':[0,1,0,1,0,0]
... }
>>>
>>> df = pd.DataFrame(d)
>>>
>>> df.groupby('Main').agg('first')
      Col1  Col2  Col3
Main
v1       1     0     0
v2       0     1     1
v5       1     0     0
>>> df.groupby('Main').agg('first').reset_index()
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
2   v5     1     0     0

Drop Duplicates

>>> df.drop_duplicates(subset='Main')
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
4   v5     1     0     0
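If you want this variant to have the same 0..n index as the groupby version, it can be followed by reset_index (a minor addition, not in the original answer):

>>> df.drop_duplicates(subset='Main').reset_index(drop=True)
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
2   v5     1     0     0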

Creating new rows in dataframe based on string values in multiple columns

A bit tricky, but it should work: melt to flatten your dataframe, explode the comma-separated values, then pivot_table to reshape it:
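Since the question does not include its data, here is a hypothetical input that is consistent with the output shown below (an assumption for illustration, not the asker's actual frame):

import pandas as pd

# guessed input: comma-separated test names spread across Col3-Col5
df = pd.DataFrame({
    'ID':   ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
    'Col4': ['Test4, Test5', 'Test8, Test9'],
    'Col5': ['', 'Test10, Test11, Test12, Test13'],
})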

out = (df.reset_index()
         # flatten: one row per (original row, value column) pair
         .melt(['ID', 'Name', 'index'], var_name='col', value_name='val')
         # split the comma-separated strings and expand each item onto its own row
         .assign(val=lambda x: x['val'].str.split(', ')).explode('val')
         # number the items within each original row/column
         .assign(row=lambda x: x.groupby(['index', 'col']).cumcount())
         # reshape back to one column per original column
         .pivot_table('val', ['index', 'row', 'ID', 'Name'], 'col', aggfunc='first')
         .droplevel(['index', 'row']).reset_index().rename_axis(columns=None).fillna(''))

Output:

    ID   Name   Col3   Col4    Col5
0  P39   Pipe  Test1  Test4
1  P39   Pipe  Test2  Test5
2  P39   Pipe  Test3
3  S32  Screw  Test6  Test8  Test10
4  S32  Screw  Test7  Test9  Test11
5  S32  Screw                Test12
6  S32  Screw                Test13

Create a new dataframe that contains the average value from some of the columns in the old dataframe

You can group the dataframe with the grouper np.arange(len(df)) // 6, which puts every six consecutive rows into the same group, then aggregate the columns with the desired aggregation functions. Optionally, reindex along axis=1 to restore the original column order.

import numpy as np

d = {
    'A': 'mean', 'B': 'mean', 'C': 'mean',
    'TIME': 'first', 'D': 'first', 'E': 'first'
}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)

Alternatively, define the aggregation functions using the column positions:

d = {
    **dict.fromkeys(df.columns[[0, 4, 5]], 'first'),
    **dict.fromkeys(df.columns[[1, 2, 3]], 'mean')
}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)

Result

        TIME           A         B           C  D  E
0   2021/3/4  149.666667  0.000000  146.000000  0  1
1  2021/4/30  197.500000  4.166667  186.666667  0  1
2   2021/5/6  202.500000  5.000000  205.000000  1  1
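The question's dataframe isn't shown, so here is a minimal self-contained sketch of the same grouper pattern with made-up data (every value below is hypothetical, chosen only to make the example runnable):

import numpy as np
import pandas as pd

# hypothetical 12-row frame: each block of six rows shares one TIME/D/E value
df = pd.DataFrame({
    'TIME': ['2021/3/4'] * 6 + ['2021/4/30'] * 6,
    'A': range(12), 'B': range(12), 'C': range(12),
    'D': [0] * 6 + [1] * 6,
    'E': [1] * 12,
})

d = {**dict.fromkeys(['TIME', 'D', 'E'], 'first'),
     **dict.fromkeys(['A', 'B', 'C'], 'mean')}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)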

Create a new dataframe from an existing dataframe based on a condition

You could do the following:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# apply runs column-wise by default: mark each column whose sum exceeds 2
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))

In [6]: df_res
Out[6]:
   0
0  1
1  0
2  1
3  0
4  0
5  1

Instead of np.sum(c) you can also use c.sum().

And if you want it transposed just do the following instead:

df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
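As an aside (not part of the original answer), the same result can be computed without apply, since df.sum() already works column-wise:

df_res = (df.sum() > 2).astype(int).to_frame()  # same values as the apply version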

