Selecting a Row of Pandas Series/Dataframe by Integer Index

Echoing @HYRY, see the new indexing docs in pandas 0.11:

http://pandas.pydata.org/pandas-docs/stable/indexing.html

Here we have new operators: .iloc, which explicitly supports only integer (positional) indexing, and .loc, which explicitly supports only label indexing.

For example, imagine this scenario:

In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))

In [2]: df
Out[2]:
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654

In [5]: df.iloc[[2]]
Out[5]:
          A         B
4 -0.284561  0.756029

In [6]: df.loc[[2]]
Out[6]:
          A         B
2 -0.470056  1.192211

Plain [] with a slice selects rows only (for example df[2:3]); use .iloc or .loc to make it explicit whether you mean position or label.
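To make the difference concrete, here is a minimal sketch using a frame with the same index as the one above (labels 0, 2, 4, 6, 8). The values are deterministic integers chosen for illustration, not the random data from the session.

```python
import numpy as np
import pandas as pd

# Same index as the frame above: five rows labelled 0, 2, 4, 6, 8.
# The values are illustrative integers rather than random numbers.
df = pd.DataFrame(np.arange(10).reshape(5, 2),
                  index=range(0, 10, 2), columns=list('AB'))

by_position = df.iloc[2]  # third row, regardless of its label (label 4 here)
by_label = df.loc[2]      # the row whose label is 2 (the second row here)

print(by_position.tolist())  # [4, 5]
print(by_label.tolist())     # [2, 3]
```

A scalar key such as df.iloc[2] returns a Series, while a list key such as df.iloc[[2]] returns a one-row DataFrame, which is why the session above uses double brackets.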

Getting the integer index of a Pandas DataFrame row fulfilling a condition?

You could use np.where like this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(1,7).reshape(2,3),
                  columns = list('abc'),
                  index=pd.Series([2,5], name='b'))
print(df)
#    a  b  c
# b
# 2  1  2  3
# 5  4  5  6
print(np.where(df.index==5)[0])
# [1]
print(np.where(df['c']==6)[0])
# [1]

The value returned is an array since there could be more than one row with a particular index or value in a column.
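The multiple-match case can be sketched as follows. The frame below is hypothetical: it repeats the index label 5 and the value 6 so that each lookup finds two rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: label 5 and value 6 each occur twice.
df = pd.DataFrame({'c': [3, 6, 6]}, index=[2, 5, 5])

label_positions = np.where(df.index == 5)[0]
value_positions = np.where(df['c'] == 6)[0]
print(label_positions)  # [1 2]
print(value_positions)  # [1 2]

# When exactly one match is expected, take the first element as a plain int.
first = int(label_positions[0])
```

If no row matches, the array is empty and indexing it with [0] raises an IndexError, so check its length first when a miss is possible.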

Selecting first row of each index in multi indexing pandas DataFrame

The following code can help:

gsgb = grouped_sort.copy()
gsgb = gsgb.groupby(level=0)
print(type(gsgb))
gsgb.head()

# display() is the Jupyter/IPython helper; use print() in a plain script.
for cat, df in gsgb:
    display(df.sort_values(by=["price"], ascending=False).reset_index().iloc[0])

How it works:

It loops over every category in the grouped dataframe, sorts each group's rows by price in descending order, resets the index, and picks the first row, which is the one with the highest price.
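The same result can also be obtained without an explicit loop. The sketch below is one possible variant, not the answer's own method: it sorts once and then keeps the first row of each level-0 group. The grouped_sort frame, its level names, and its prices are hypothetical stand-ins for the data in the question.

```python
import pandas as pd

# Hypothetical data: a two-level index (category, item) and a price column.
grouped_sort = pd.DataFrame(
    {'price': [3.0, 9.0, 5.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [('fruit', 'apple'), ('fruit', 'mango'),
         ('veg', 'leek'), ('veg', 'kale')],
        names=['category', 'item']),
)

# Sort by price (highest first), then keep the first row of each
# level-0 group; head(1) preserves the order of the sorted frame.
top = (grouped_sort.sort_values('price', ascending=False)
                   .groupby(level=0)
                   .head(1))
print(top)
```

Unlike the loop, this keeps the original MultiIndex on the result, so the selected rows can still be looked up by (category, item).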

Get index of a row of a pandas dataframe as an integer

The easiest way is to add [0], which selects the first value of a one-element index. Either of the following forms works:

dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]

dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])

But if a value might not match any row, this raises an error, because there is no first element to select.

A solution is to use next with iter, which returns a default value when nothing matches:

dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4

dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match

Then it seems you need to subtract 1, because .loc slices include the end label:

print (df.loc[dfb:dfbb-1,'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64

Another solution is boolean indexing or query:

print (df[(df['A'] >= 5) & (df['A'] < 8)])
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64

print (df.query('A >= 5 and A < 8'))
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449
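For a version that runs end to end, here is a minimal sketch. The frame is hypothetical (column A holds 1 to 10 on a default index, so A == 5 sits at label 4, as in the output above), and first_index is an illustrative helper name, not a pandas function.

```python
import pandas as pd

# Hypothetical frame: default RangeIndex, so A == 5 sits at index label 4.
df = pd.DataFrame({'A': range(1, 11), 'B': [x / 10 for x in range(10)]})

def first_index(frame, value, default=None):
    """Hypothetical helper: first index label where A == value, else default."""
    return next(iter(frame.index[frame['A'] == value]), default)

start = first_index(df, 5)     # label 4
stop = first_index(df, 8)      # label 7
missing = first_index(df, 50)  # no match, so the default (None) is returned

# .loc slices include both endpoints, hence "- 1" to drop the A == 8 row.
by_slice = df.loc[start:stop - 1, 'B']
by_mask = df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B']
```

Both selections return the rows labelled 4, 5, and 6. The boolean mask does not depend on the index being consecutive integers, which makes it the safer choice when the index has gaps.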

Get integer index values from filtered pandas dataframe

Assuming df is sorted by index:

np.searchsorted(df.index, df_selection.index)

Output:

array([2, 3])

More generally, without requiring a sorted index, you can do:

np.where(df.index.isin(df_selection.index))

Output:

(array([2, 3]),)
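A minimal sketch comparing the two approaches, plus Index.get_indexer as a third option that is not part of the answer above. The frames df and df_selection are hypothetical, built so that the selected rows sit at positions 2 and 3.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df has sorted, unique labels; df_selection is a filtered subset.
df = pd.DataFrame({'x': [1, 2, 30, 40, 5]}, index=[10, 20, 30, 40, 50])
df_selection = df[df['x'] > 10]

# Binary search: requires df.index to be sorted.
pos_sorted = np.searchsorted(df.index, df_selection.index)

# Membership test: works for any index order.
pos_general = np.where(df.index.isin(df_selection.index))[0]

# Label lookup: requires df.index to contain unique labels.
pos_indexer = df.index.get_indexer(df_selection.index)

print(pos_sorted, pos_general, pos_indexer)
```

Note that np.where returns a tuple of arrays, so the [0] is needed to get the positions themselves.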

Find integer index of rows with NaN in pandas dataframe

For DataFrame df:

import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]

This gives you back the MultiIndex, which you can use to index back into df, e.g. (using .loc, since .ix was removed in pandas 1.0):

df['a'].loc[index[0]]
>>> 1.452354

For the integer index:

df_index = df.index.values.tolist()
[df_index.index(i) for i in index]
>>> [3, 6]
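As an alternative sketch, the integer positions can be computed directly from the boolean mask, which avoids the list lookup. The frame below is hypothetical, with NaN placed in column b at positions 1 and 3.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN in column 'b' at positions 1 and 3.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                   'b': [0.5, np.nan, 1.5, np.nan]},
                  index=list('wxyz'))

mask = df['b'].isna()

nan_labels = df.index[mask]                      # index labels of the NaN rows
nan_positions = np.flatnonzero(mask.to_numpy())  # integer positions of the NaN rows

print(nan_labels.tolist())     # ['x', 'z']
print(nan_positions.tolist())  # [1, 3]
```

The list-based lookup above calls list.index once per label, which scans the whole list each time; working from the mask needs a single pass.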

How can I choose rows and columns if the index/header contains certain integer in Pandas dataframe?

Let's define the list of indexes of interest:

idx = [1, 3, 5]

Sum across the specified columns, row by row:

df[['US' + str(i) for i in idx]].sum(axis = 1)

Alternatively, if you want to add the sum to the dataframe as a column, assign the result to a variable and give it a name:

s1 = df[['US' + str(i) for i in idx]].sum(axis = 1)
s1.name = 'NEW_US_IND_' + ''.join("{0}".format(i) for i in idx)

Then add it as a new column (join returns a new dataframe, so assign the result if you want to keep it):

df.join(s1)
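Put together, the steps look like this. The frame is hypothetical: it assumes columns named US1 to US5, matching the 'US' + str(i) naming used above.

```python
import pandas as pd

# Hypothetical frame with columns US1..US5 and two rows.
df = pd.DataFrame({f'US{i}': [i, i * 10] for i in range(1, 6)})

idx = [1, 3, 5]
cols = ['US' + str(i) for i in idx]

# Row-wise sum over the selected columns, named after the chosen indexes.
s1 = df[cols].sum(axis=1)
s1.name = 'NEW_US_IND_' + ''.join(str(i) for i in idx)

# join returns a new frame; df itself is left unchanged.
result = df.join(s1)
print(result['NEW_US_IND_135'].tolist())  # [9, 90]
```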

I'd like to get a unique of a series of a dataframe while preserving its index

I think you are looking for this:

df[~df['col'].duplicated(keep='first')]

Sample Input:

   col
0    1
1    2
2    3
3    1
4    2

Sample Output:

   col
0    1
1    2
2    3
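A runnable version of the sample above, as a minimal sketch. It also shows drop_duplicates, which is not part of the answer but gives the same rows for this input.

```python
import pandas as pd

# The sample input from above.
df = pd.DataFrame({'col': [1, 2, 3, 1, 2]})

# Keep the first occurrence of each value; original index labels are preserved.
unique_rows = df[~df['col'].duplicated(keep='first')]

# Equivalent result using drop_duplicates on the same column.
same = df.drop_duplicates(subset='col', keep='first')

print(unique_rows.index.tolist())  # [0, 1, 2]
```

By contrast, df['col'].unique() returns a plain array of values and discards the index, which is why the mask is used here.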

