Create a new dataframe based on rows with a certain value
You don't need to write loops; boolean indexing in pandas does this directly.
Assuming your dataframe looks like this:
import pandas as pd
mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]
To create a separate DataFrame for the S and P types, you can just do this:
cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']
cust_sell output:
Type Dummy
0 S 1
1 S 2
2 S 3
5 S 6
7 S 8
cust_buy output:
Type Dummy
3 P 4
4 P 5
6 P 7
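If there are more than two types, writing one filter per value gets repetitive. Below is a minimal sketch of one alternative that recreates the same mainDf as above. It iterates over groupby('Type') and collects one DataFrame per value in a dict. The name frames is only illustrative.

```python
import pandas as pd

mainDf = pd.DataFrame({
    'Type': ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S'],
    'Dummy': [1, 2, 3, 4, 5, 6, 7, 8],
})

# One DataFrame per distinct value of 'Type', keyed by that value
frames = {t: group for t, group in mainDf.groupby('Type')}

cust_sell = frames['S']  # same rows as mainDf[mainDf.Type == 'S']
cust_buy = frames['P']   # same rows as mainDf[mainDf.Type == 'P']
```

Each group keeps its original index labels, just like the boolean-indexing version.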
Creating a new Dataframe based on rows with certain values and removing the rows from the original Dataframe
Your code is almost correct: instead of dropna(how='all'), use any(axis=1) so that the mask holds a single boolean value per row.
The same with a reproducible example:
import pandas as pd
import numpy as np
np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))
m = df.isin([-1]).any(axis=1) # or df.eq(-1).any(axis=1)
df1, df2 = df[m], df[~m]
Output:
>>> df.assign(M=m)
A B C D M
0 -1 0 -1 -1 True
1 1 0 1 1 False
2 1 1 1 1 False
3 1 1 0 0 False
4 0 1 1 -1 True
5 1 0 0 1 False
6 -1 0 1 0 True
7 0 0 0 0 False
8 1 -1 1 0 True
9 1 1 0 1 False
>>> df1
A B C D
0 -1 0 -1 -1
4 0 1 1 -1
6 -1 0 1 0
8 1 -1 1 0
>>> df2
A B C D
1 1 0 1 1
2 1 1 1 1
3 1 1 0 0
5 1 0 0 1
7 0 0 0 0
9 1 1 0 1
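The "removing from the original" part of the question works the same way. Here is a sketch on small hypothetical data so the result is deterministic. It copies the matching rows into a new frame, then rebinds df to the non-matching rows. This assumes rebinding df is acceptable, rather than mutating it in place.

```python
import pandas as pd

# Hypothetical, deterministic data instead of the random values above
df = pd.DataFrame({'A': [-1, 1, 0, 1], 'B': [0, 1, -1, 1]})
original = df.copy()

m = df.eq(-1).any(axis=1)  # True for rows that contain at least one -1
df1 = df[m].copy()         # new DataFrame holding the matching rows
df = df[~m]                # "remove" them by keeping only the other rows
```

Because m and ~m are complementary, the two frames together contain every original row exactly once.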
creating a new dataframe based off if a particular value matches a value in a list
As you haven't posted any data or code, I'll demonstrate how this should work. You can pass a list to isin, which returns a boolean index that you can use to filter your df. There is no need to loop over the rows and append the ones of interest. Your loop is probably failing (I'm guessing, as I don't have your data) because you've either gone past the end of the index or your index doesn't contain that specific label value.
In [147]:
import numpy as np
import pandas as pd
customer_list=['Microsoft', 'Google', 'Facebook']
df = pd.DataFrame({'Customer':['Microsoft', 'Microsoft', 'Google', 'Facebook','Google', 'Facebook', 'Apple','Apple'], 'data':np.random.randn(8)})
df
Out[147]:
Customer data
0 Microsoft 0.669051
1 Microsoft 0.392646
2 Google 1.534285
3 Facebook -1.204585
4 Google 1.050301
5 Facebook 0.492487
6 Apple 1.471614
7 Apple 0.762598
In [148]:
df['Customer'].isin(customer_list)
Out[148]:
0 True
1 True
2 True
3 True
4 True
5 True
6 False
7 False
Name: Customer, dtype: bool
In [149]:
df[df['Customer'].isin(customer_list)]
Out[149]:
Customer data
0 Microsoft 0.669051
1 Microsoft 0.392646
2 Google 1.534285
3 Facebook -1.204585
4 Google 1.050301
5 Facebook 0.492487
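The same mask also gives you the rows that are not in the list: negate it with ~. Below is a self-contained sketch. It uses range(8) as a deterministic stand-in for the random data column, which is an assumption made only for reproducibility.

```python
import pandas as pd

customer_list = ['Microsoft', 'Google', 'Facebook']
df = pd.DataFrame({
    'Customer': ['Microsoft', 'Microsoft', 'Google', 'Facebook',
                 'Google', 'Facebook', 'Apple', 'Apple'],
    'data': range(8),  # deterministic stand-in for np.random.randn(8)
})

mask = df['Customer'].isin(customer_list)
in_list = df[mask]       # rows whose Customer appears in customer_list
not_in_list = df[~mask]  # the complement: here, the two 'Apple' rows
```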
Pandas create new dataframe based on unique value in a column of existing dataframe efficiently
The easiest way is to use groupby and take the first occurrence of the column values in each group:
Group By
>>> import pandas as pd
>>>
>>> d = {
... 'Main':['v1','v2','v1','v2','v5','v2']
... ,'Col1':[1,0,1,1,1,1]
... ,'Col2':[0,1,1,0,0,0]
... ,'Col3':[0,1,0,1,0,0]
... }
>>>
>>> df = pd.DataFrame(d)
>>>
>>> df.groupby('Main').agg('first')
Col1 Col2 Col3
Main
v1 1 0 0
v2 0 1 1
v5 1 0 0
>>> df.groupby('Main').agg('first').reset_index()
Main Col1 Col2 Col3
0 v1 1 0 0
1 v2 0 1 1
2 v5 1 0 0
Drop Duplicates
>>> df.drop_duplicates(subset='Main')
Main Col1 Col2 Col3
0 v1 1 0 0
1 v2 0 1 1
4 v5 1 0 0
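The two approaches are not always identical. groupby(...).agg('first') takes the first non-null value of each column separately, while drop_duplicates keeps the first whole row. A sketch with hypothetical data containing a missing value shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data where the first 'v1' row has a missing value
df = pd.DataFrame({'Main': ['v1', 'v1', 'v2'], 'Col1': [np.nan, 5.0, 3.0]})

by_group = df.groupby('Main').agg('first').reset_index()  # first non-null value per column
by_row = df.drop_duplicates(subset='Main')               # first complete row per value
```

With no missing values, as in the example above, both give the same rows (apart from the index labels).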
Creating new rows in dataframe based on string values in multiple columns
It's a bit tricky, but it should work with melt to flatten your dataframe, then pivot_table to reshape it:
out = (df.reset_index().melt(['ID', 'Name', 'index'], var_name='col', value_name='val')
.assign(val=lambda x: x['val'].str.split(', ')).explode('val')
.assign(row=lambda x: x.groupby(['index', 'col']).cumcount())
.pivot_table('val', ['index', 'row', 'ID', 'Name'], 'col', aggfunc='first')
.droplevel(['index', 'row']).reset_index().rename_axis(columns=None).fillna(''))
Output:
    ID   Name    Col3   Col4    Col5
0  P39   Pipe   Test1  Test4
1  P39   Pipe   Test2  Test5
2  P39   Pipe   Test3
3  S32  Screw   Test6  Test8  Test10
4  S32  Screw   Test7  Test9  Test11
5  S32  Screw  Test12
6  S32  Screw  Test13
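Since the question's data isn't shown, here is a reproducible sketch. It runs the same melt/explode/pivot_table pipeline on a hypothetical input that is consistent with the output above. The input values, and the empty string in Col5 for P39, are assumptions.

```python
import pandas as pd

# Hypothetical input: comma-separated strings in Col3..Col5
df = pd.DataFrame({
    'ID': ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7, Test12, Test13'],
    'Col4': ['Test4, Test5', 'Test8, Test9'],
    'Col5': ['', 'Test10, Test11'],
})

out = (df.reset_index().melt(['ID', 'Name', 'index'], var_name='col', value_name='val')
         # split each cell into a list, then one row per list item
         .assign(val=lambda x: x['val'].str.split(', ')).explode('val')
         # position of each item within its original cell
         .assign(row=lambda x: x.groupby(['index', 'col']).cumcount())
         # back to wide format: one column per original column
         .pivot_table('val', ['index', 'row', 'ID', 'Name'], 'col', aggfunc='first')
         .droplevel(['index', 'row']).reset_index().rename_axis(columns=None).fillna(''))
print(out)
```

The cumcount step is what lines up the n-th item of each column on the same output row.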
Create new dataframe that contain the average value from some of the columns in the old dataframe
You can group the dataframe by the grouper np.arange(len(df)) // 6, which assigns the same label to every block of six consecutive rows. Then aggregate each column with the desired function and, optionally, reindex along axis=1 to restore the original column order:
import numpy as np
d = {
'A': 'mean', 'B': 'mean', 'C': 'mean',
'TIME': 'first', 'D': 'first', 'E': 'first'
}
df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)
Alternatively, define the aggregation functions using column positions:
d = {
**dict.fromkeys(df.columns[[0, 4, 5]], 'first'),
**dict.fromkeys(df.columns[[1, 2, 3]], 'mean')
}
df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)
Result
TIME A B C D E
0 2021/3/4 149.666667 0.000000 146.000000 0 1
1 2021/4/30 197.500000 4.166667 186.666667 0 1
2 2021/5/6 202.500000 5.000000 205.000000 1 1
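The original data isn't shown, so here is a self-contained sketch with 12 hypothetical rows in the same column layout (TIME, A to E). It makes the grouping easy to check by hand: rows 0 to 5 form group 0 and rows 6 to 11 form group 1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same columns as the question
df = pd.DataFrame({
    'TIME': [f'2021/1/{day}' for day in range(1, 13)],
    'A': range(12),
    'B': [1] * 12,
    'C': range(0, 24, 2),
    'D': [0] * 6 + [1] * 6,
    'E': [1] * 12,
})

# Average the measurements, keep the first value of the other columns
d = {
    **dict.fromkeys(['A', 'B', 'C'], 'mean'),
    **dict.fromkeys(['TIME', 'D', 'E'], 'first'),
}

# np.arange(len(df)) // 6 -> [0, 0, 0, 0, 0, 0, 1, 1, ...]: one label per block of six rows
out = df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)
print(out)
```

If len(df) is not a multiple of 6, the leftover rows simply form a smaller final group.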
create a new data frame from existing data frame based on condition
You could do the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1],
[0,0,1,0,0,1]]))
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))
In [6]: df_res
Out[6]:
0
0 1
1 0
2 1
3 0
4 0
5 1
Instead of np.sum(c), you can also use c.sum().
And if you want the result transposed, do the following instead:
df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
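The same result can also be computed without a per-column lambda. Sum the columns, compare with 2, and convert the booleans to integers. This is a swapped-in vectorized variant, not the answer's original code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1], [0, 0, 1, 0, 0, 1]]))

# Column sums compared against the threshold in one vectorized step
df_res = (df.sum() > 2).astype(int).to_frame()
df_res_t = df_res.T  # one row, one column per original column
```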