Remove Very First Row in Pandas

Delete the first three rows of a dataframe in pandas

Use iloc:

df = df.iloc[3:]

This will give you a new df without the first three rows.
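A minimal sketch with made-up data, to show what remains:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})   # hypothetical data

df = df.iloc[3:]   # drop the first three rows by position
print(df)
#    a
# 3  4
# 4  5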

Remove very first row in pandas

Use DataFrame.droplevel because there is a MultiIndex in the columns:

df = df.sort_index(axis=1, level=1).droplevel(0, axis=1)

Or, for older versions of pandas, MultiIndex.droplevel:

df.columns = df.columns.droplevel(0)
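A small sketch with invented two-level column labels, showing the effect of dropping the outer level:

import pandas as pd

# hypothetical frame with a MultiIndex in the columns
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([('top', 'x'), ('top', 'y')]))

df = df.sort_index(axis=1, level=1).droplevel(0, axis=1)
print(df.columns.tolist())   # ['x', 'y']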

Python/Pandas - Remove the first row with Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7

You need to use the skiprows argument of the pd.read_excel function so that the column names are correctly read from the 5th row.

UPDATE: including the forward filling

import pandas as pd

xl = pd.ExcelFile('Sample_File.xlsm')

for sheet in xl.sheet_names:
    df = pd.read_excel(xl, sheet_name=sheet, skiprows=4)  # no more iloc here: skiprows makes the 5th row the header
    df['Comment'] = df['Comment'].ffill()                 # forward-fill the blank Comment cells
    df.to_csv(f'{sheet}.csv', index=False)

How to remove/delete first row/column from Data Frame using python?

.read_html() returns a list of dataframes. You select a specific dataframe from that list by its index position (i.e. as you did with df[1]). So you need to call .iloc on the dataframe at index position 1 of your list of dataframes:

df = df[1].iloc[:, 1:]
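For context, a hedged sketch of the whole flow (the URL is a placeholder; pd.read_html needs an HTML parser such as lxml installed):

import pandas as pd

tables = pd.read_html('https://example.com/page-with-tables')   # hypothetical URL; returns a list of DataFrames

df = tables[1].iloc[:, 1:]    # second table on the page, without its first column
# df = tables[1].iloc[1:, :]  # or: without its first row instead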

Remove top row from a dataframe

You can try using slicing.

df = df[1:]

This will remove the first row of your dataframe.
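If you also want the remaining rows renumbered from 0, a common follow-up (reset_index with drop=True is standard pandas) is:

df = df[1:].reset_index(drop=True)   # discard the old index labels and renumber from 0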

How to delete first row in a csv file using python

FILENAME = 'test.csv'
DELETE_LINE_NUMBER = 1

with open(FILENAME) as f:
    data = f.read().splitlines()        # read the csv file into a list of lines
with open(FILENAME, 'w') as g:
    # keep every line except the one at DELETE_LINE_NUMBER, then write back
    g.write('\n'.join(data[:DELETE_LINE_NUMBER] + data[DELETE_LINE_NUMBER + 1:]))

Original test.csv:

ID, Name
0, ABC
1, DEF
2, GHI
3, JKL
4, MNO

After run:

ID, Name
1, DEF
2, GHI
3, JKL
4, MNO

(deleted 0, ABC)
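If the file is too large to read into memory at once, a streaming sketch works line by line; note it writes to a second file whose name ('test_out.csv') is just a placeholder:

FILENAME = 'test.csv'
DELETE_LINE_NUMBER = 1

with open(FILENAME) as src, open('test_out.csv', 'w') as dst:
    for i, line in enumerate(src):
        if i != DELETE_LINE_NUMBER:   # copy every line except the one to delete
            dst.write(line)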

Python: Pandas - Delete the first row by group

You could use groupby/transform to prepare a boolean mask which is True for the rows you want and False for the rows you don't want. Once you have such a boolean mask, you can select the sub-DataFrame using df.loc[mask]:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'ID': [10001, 10001, 10001, 10002, 10002, 10002, 10003, 10003, 10003],
     'PRICE': [14.5, 14.5, 14.5, 15.125, 14.5, 14.5, 14.5, 14.5, 15.0],
     'date': [19920103, 19920106, 19920107, 19920108, 19920109, 19920110,
              19920113, 19920114, 19920115]},
    index=range(1, 10))

def mask_first(x):
    result = np.ones_like(x)   # start by keeping every row of the group
    result[0] = 0              # mark the first row of the group for removal
    return result

mask = df.groupby(['ID'])['ID'].transform(mask_first).astype(bool)
print(df.loc[mask])

yields

      ID  PRICE      date
2  10001   14.5  19920106
3  10001   14.5  19920107
5  10002   14.5  19920109
6  10002   14.5  19920110
8  10003   14.5  19920114
9  10003   15.0  19920115

Since you're interested in efficiency, here is a benchmark:

import timeit
import operator
import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame(
    {'ID': np.random.randint(100, size=(N,)),
     'PRICE': np.random.random(N),
     'date': np.random.random(N)})

def using_mask(df):
    def mask_first(x):
        result = np.ones_like(x)
        result[0] = 0
        return result

    mask = df.groupby(['ID'])['ID'].transform(mask_first).astype(bool)
    return df.loc[mask]

def using_apply(df):
    return df.groupby('ID').apply(lambda group: group.iloc[1:, 1:])

def using_apply_alt(df):
    return df.groupby('ID', group_keys=False).apply(lambda x: x[1:])

timing = dict()
for func in (using_mask, using_apply, using_apply_alt):
    timing[func] = timeit.timeit(
        '{}(df)'.format(func.__name__),
        'from __main__ import df, {}'.format(func.__name__), number=100)

for func, t in sorted(timing.items(), key=operator.itemgetter(1)):
    print('{:16}: {:.2f}'.format(func.__name__, t))

reports

using_mask      : 0.85
using_apply_alt : 2.04
using_apply     : 3.70
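For completeness, a terse alternative that is not part of the original benchmark: GroupBy.cumcount (standard pandas) numbers rows within each group starting at 0, so keeping only positive counts drops the first row per group.

result = df[df.groupby('ID').cumcount() > 0]   # keep rows that are not first in their ID group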

pandas data frame removing the first row of every number

Use duplicated with boolean indexing, then remove the # either by position with str[1:] or with str.strip:

print (df)
    a
0  #1
1  #2
2  #2
3  #3
4  #3
5  #3
6  #3
7  #4
8  #4
9  #5

df = df.loc[df['a'].duplicated(), 'a'].str[1:]
print (df)
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object

Or:

df = df.loc[df['a'].duplicated(), 'a'].str.strip('#')
print (df)
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object
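As a usage note: str[1:] drops exactly one leading character regardless of what it is, while str.strip('#') removes every leading and trailing '#', so the two only coincide when each value starts with a single '#'.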

Detail:

print (df['a'].duplicated())
0    False
1    False
2     True
3    False
4     True
5     True
6     True
7    False
8     True
9    False
Name: a, dtype: bool

EDIT:

df = df[df['a'].duplicated()]
df['a'] = df['a'].str.strip('#')

