Create a New Dataframe Based on Rows With a Certain Value

You do not need to write loops. You can do it easily with boolean indexing in pandas.

Assuming your dataframe looks like this:

import pandas as pd  

mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]

To create separate dataframes for the S and P types, you can just do this:

cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']

cust_sell output:

  Type  Dummy
0    S      1
1    S      2
2    S      3
5    S      6
7    S      8

cust_buy output:

  Type  Dummy
3    P      4
4    P      5
6    P      7
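If you need one dataframe per unique value in Type rather than hard-coding 'S' and 'P', a dictionary comprehension over groupby is a compact alternative (a sketch based on the mainDf above, not part of the original answer):

# one dataframe per unique Type value, keyed by that value
frames = {t: g.copy() for t, g in mainDf.groupby('Type')}

frames['S']   # same rows as cust_sell
frames['P']   # same rows as cust_buy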

Creating a new Dataframe based on rows with certain values and removing the rows from the original Dataframe

Your code is almost correct. Use any(axis=1) to reduce the boolean mask to a single value per row, instead of using dropna(how='all').

The same approach with a reproducible example:

import pandas as pd
import numpy as np

np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))

m = df.isin([-1]).any(axis=1)  # True where a row contains at least one -1 (or: df.eq(-1).any(axis=1))
df1, df2 = df[m], df[~m]

Output:

>>> df.assign(M=m)
   A  B  C  D      M
0 -1  0 -1 -1   True
1  1  0  1  1  False
2  1  1  1  1  False
3  1  1  0  0  False
4  0  1  1 -1   True
5  1  0  0  1  False
6 -1  0  1  0   True
7  0  0  0  0  False
8  1 -1  1  0   True
9  1  1  0  1  False

>>> df1
   A  B  C  D
0 -1  0 -1 -1
4  0  1  1 -1
6 -1  0  1  0
8  1 -1  1  0

>>> df2
   A  B  C  D
1  1  0  1  1
2  1  1  1  1
3  1  1  0  0
5  1  0  0  1
7  0  0  0  0
9  1  1  0  1
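If the goal, as in the question title, is also to remove the matched rows from the original dataframe rather than keep a second copy, one option sketched from the example above is to drop the matched index in place:

df1 = df[m].copy()
df.drop(index=df1.index, inplace=True)  # df now holds only the rows without -1 (same content as df2)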

Creating a new dataframe based on whether a particular value matches a value in a list

As you haven't posted any data or code, I will demonstrate how the following should work for you. You can pass a list to isin, which returns a boolean index that you can use to filter your df; there is no need to loop over and append the rows of interest. Your approach is probably failing (I'm guessing, as I don't have your data) because you've either gone past the end of the dataframe or your index doesn't contain that specific label value.

In [147]:

import numpy as np
import pandas as pd

customer_list = ['Microsoft', 'Google', 'Facebook']
df = pd.DataFrame({'Customer': ['Microsoft', 'Microsoft', 'Google', 'Facebook',
                                'Google', 'Facebook', 'Apple', 'Apple'],
                   'data': np.random.randn(8)})
df
Out[147]:
    Customer      data
0  Microsoft  0.669051
1  Microsoft  0.392646
2     Google  1.534285
3   Facebook -1.204585
4     Google  1.050301
5   Facebook  0.492487
6      Apple  1.471614
7      Apple  0.762598
In [148]:

df['Customer'].isin(customer_list)
Out[148]:
0     True
1     True
2     True
3     True
4     True
5     True
6    False
7    False
Name: Customer, dtype: bool
In [149]:

df[df['Customer'].isin(customer_list)]
Out[149]:
    Customer      data
0  Microsoft  0.669051
1  Microsoft  0.392646
2     Google  1.534285
3   Facebook -1.204585
4     Google  1.050301
5   Facebook  0.492487
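If you also need the rows whose Customer is not in the list, the same boolean index can be inverted with ~ (a small addition based on the df above, not in the original answer):

df[~df['Customer'].isin(customer_list)]

This would return just the Apple rows in this example.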

Pandas create new dataframe based on unique value in a column of existing dataframe efficiently

The easiest way would be to use groupby and keep the first occurrence of the column values for each unique Main value.

Group By

>>> import pandas as pd
>>>
>>> d = {
... 'Main':['v1','v2','v1','v2','v5','v2']
... ,'Col1':[1,0,1,1,1,1]
... ,'Col2':[0,1,1,0,0,0]
... ,'Col3':[0,1,0,1,0,0]
... }
>>>
>>> df = pd.DataFrame(d)
>>>
>>> df.groupby('Main').agg('first')
      Col1  Col2  Col3
Main
v1       1     0     0
v2       0     1     1
v5       1     0     0
>>> df.groupby('Main').agg('first').reset_index()
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
2   v5     1     0     0

Drop Duplicates

>>> df.drop_duplicates(subset='Main')
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
4   v5     1     0     0
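If you want this variant to have the same 0..n index as the groupby version, it can be followed by reset_index (a minor addition, not in the original answer):

>>> df.drop_duplicates(subset='Main').reset_index(drop=True)
  Main  Col1  Col2  Col3
0   v1     1     0     0
1   v2     0     1     1
2   v5     1     0     0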

Creating new rows in dataframe based on string values in multiple columns

A bit tricky, but it should work: melt to flatten your dataframe, explode the comma-separated values, then pivot_table to reshape it:
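Since the question does not include its data, here is a hypothetical input that is consistent with the output shown below (an assumption for illustration, not the asker's actual frame):

import pandas as pd

# guessed input: comma-separated test names spread across Col3-Col5
df = pd.DataFrame({
    'ID':   ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
    'Col4': ['Test4, Test5', 'Test8, Test9'],
    'Col5': ['', 'Test10, Test11, Test12, Test13'],
})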

out = (df.reset_index()
         # flatten: one row per (original row, value column) pair
         .melt(['ID', 'Name', 'index'], var_name='col', value_name='val')
         # split the comma-separated strings and expand each item onto its own row
         .assign(val=lambda x: x['val'].str.split(', ')).explode('val')
         # number the items within each original row/column
         .assign(row=lambda x: x.groupby(['index', 'col']).cumcount())
         # reshape back to one column per original column
         .pivot_table('val', ['index', 'row', 'ID', 'Name'], 'col', aggfunc='first')
         .droplevel(['index', 'row']).reset_index().rename_axis(columns=None).fillna(''))

Output:

    ID   Name   Col3   Col4    Col5
0  P39   Pipe  Test1  Test4
1  P39   Pipe  Test2  Test5
2  P39   Pipe  Test3
3  S32  Screw  Test6  Test8  Test10
4  S32  Screw  Test7  Test9  Test11
5  S32  Screw                Test12
6  S32  Screw                Test13

Create a new dataframe that contains the average value from some of the columns in the old dataframe

You can group the dataframe with the grouper np.arange(len(df)) // 6, which puts every six consecutive rows into the same group, then aggregate the columns with the desired aggregation functions. Optionally, reindex along axis=1 to restore the original column order.

import numpy as np

d = {
    'A': 'mean', 'B': 'mean', 'C': 'mean',
    'TIME': 'first', 'D': 'first', 'E': 'first'
}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)

Alternatively, define the aggregation functions using the column positions:

d = {
    **dict.fromkeys(df.columns[[0, 4, 5]], 'first'),
    **dict.fromkeys(df.columns[[1, 2, 3]], 'mean')
}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)

Result

        TIME           A         B           C  D  E
0   2021/3/4  149.666667  0.000000  146.000000  0  1
1  2021/4/30  197.500000  4.166667  186.666667  0  1
2   2021/5/6  202.500000  5.000000  205.000000  1  1
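The question's dataframe isn't shown, so here is a minimal self-contained sketch of the same grouper pattern with made-up data (every value below is hypothetical, chosen only to make the example runnable):

import numpy as np
import pandas as pd

# hypothetical 12-row frame: each block of six rows shares one TIME/D/E value
df = pd.DataFrame({
    'TIME': ['2021/3/4'] * 6 + ['2021/4/30'] * 6,
    'A': range(12), 'B': range(12), 'C': range(12),
    'D': [0] * 6 + [1] * 6,
    'E': [1] * 12,
})

d = {**dict.fromkeys(['TIME', 'D', 'E'], 'first'),
     **dict.fromkeys(['A', 'B', 'C'], 'mean')}

df.groupby(np.arange(len(df)) // 6).agg(d).reindex(df.columns, axis=1)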

Create a new dataframe from an existing dataframe based on a condition

You could do the following:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# apply runs column-wise by default: mark each column whose sum exceeds 2
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))

In [6]: df_res
Out[6]:
   0
0  1
1  0
2  1
3  0
4  0
5  1

Instead of np.sum(c) you can also use c.sum().

And if you want it transposed just do the following instead:

df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
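As an aside (not part of the original answer), the same result can be computed without apply, since df.sum() already works column-wise:

df_res = (df.sum() > 2).astype(int).to_frame()  # same values as the apply version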

