Selecting a row of pandas series/dataframe by integer index
Echoing @HYRY, see the new docs in 0.11:
http://pandas.pydata.org/pandas-docs/stable/indexing.html
Here we have new operators: .iloc, to explicitly support only integer indexing, and .loc, to explicitly support only label indexing.
For example, imagine this scenario:
In [1]: df = pd.DataFrame(np.random.randn(5,2),index=range(0,10,2),columns=list('AB'))
In [2]: df
Out[2]:
A B
0 1.068932 -0.794307
2 -0.470056 1.192211
4 -0.284561 0.756029
6 1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]
Out[5]:
A B
4 -0.284561 0.756029
In [6]: df.loc[[2]]
Out[6]:
A B
2 -0.470056 1.192211
[] slices the rows (by label location) only
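To make the difference between the two indexers concrete, here is a minimal sketch with deterministic values instead of random ones. The frame and the variable names by_position and by_label are made up for illustration; the index deliberately skips odd numbers so that position and label disagree.

```python
import numpy as np
import pandas as pd

# Illustrative frame: labels 0, 2, 4, 6, 8 sit at positions 0..4.
df = pd.DataFrame(np.arange(10).reshape(5, 2),
                  index=range(0, 10, 2), columns=list("AB"))

by_position = df.iloc[2]  # third row, whose label happens to be 4
by_label = df.loc[2]      # row whose label is 2, i.e. the second row

print(by_position.name, by_label.name)
```

The .name of each returned Series is the row label, which is a quick way to check which row an indexer actually picked.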
Getting the integer index of a Pandas DataFrame row fulfilling a condition?
You could use np.where like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(1,7).reshape(2,3),
columns = list('abc'),
index=pd.Series([2,5], name='b'))
print(df)
# a b c
# b
# 2 1 2 3
# 5 4 5 6
print(np.where(df.index==5)[0])
# [1]
print(np.where(df['c']==6)[0])
# [1]
The value returned is an array since there could be more than one row with a particular index or value in a column.
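A small sketch of the multiple-match case mentioned above, using a made-up frame with a duplicated index label and a duplicated column value (the names pos_by_label and pos_by_value are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative frame: label 5 and value 6 both occur twice.
df = pd.DataFrame({"c": [3, 6, 6]}, index=[2, 5, 5])

# Integer positions of rows whose index label equals 5.
pos_by_label = np.where(df.index == 5)[0]
# Integer positions of rows whose column 'c' equals 6.
pos_by_value = np.where((df["c"] == 6).to_numpy())[0]

print(pos_by_label, pos_by_value)
```

Both results contain two positions, which is why np.where returns an array rather than a single integer.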
Selecting first row of each index in multi indexing pandas DataFrame
The following code can help:
gsgb = grouped_sort.copy()
gsgb = gsgb.groupby(level=0)
print(type(gsgb))
gsgb.head()
for cat, df in gsgb:
    # display() is available in IPython/Jupyter; use print() elsewhere
    display(df.sort_values(by=["price"], ascending=False).reset_index().iloc[0])
How it works:
It loops over all categories in the grouped dataframe, sorts the values of each group by price in descending order, resets the index, and finally picks the first row, which is the one with the maximum price.
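The same result can probably be obtained without an explicit loop. The sketch below is one possible alternative, not the answer's original method: it sorts once and then takes the first row of every first-level group. The contents of grouped_sort are invented here, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for grouped_sort: a two-level index and a price column.
grouped_sort = pd.DataFrame(
    {"price": [10, 30, 20, 5]},
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")],
        names=["cat", "item"],
    ),
)

# Sort by price (highest first), then keep the first row of each level-0 group.
top = (
    grouped_sort.sort_values("price", ascending=False)
    .groupby(level=0)
    .head(1)
    .sort_index()
)
print(top)
```

Unlike the loop, this keeps the original MultiIndex on the selected rows.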
Get index of a row of a pandas dataframe as an integer
The easiest way is to add [0], which selects the first value of a one-element list:
dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]
dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])
But if it is possible that no value matches, an error is raised, because the first value does not exist.
The solution is to use next with iter to get a default value when nothing matches:
dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4
dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match
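The next/iter pattern can be wrapped in a small helper. This is only a sketch: the function name first_match and the sample frame are hypothetical, not part of the original answer.

```python
import pandas as pd

# Illustrative frame with a non-default integer index.
df = pd.DataFrame({"A": [1, 5, 8]}, index=[10, 11, 12])

def first_match(frame, column, value, default=None):
    """Return the index label of the first row where column == value, else default."""
    return next(iter(frame[frame[column] == value].index), default)

found = first_match(df, "A", 5)
missing = first_match(df, "A", 50)
print(found, missing)
```

Returning None (or any other sentinel) instead of raising lets the caller decide how to handle a missing value.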
Then it seems you need to subtract 1, because slicing with .loc includes the end label:
print (df.loc[dfb:dfbb-1,'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64
Another solution is boolean indexing or query:
print (df[(df['A'] >= 5) & (df['A'] < 8)])
A B
4 5 0.894525
5 6 0.978174
6 7 0.859449
print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64
print (df.query('A >= 5 and A < 8'))
A B
4 5 0.894525
5 6 0.978174
6 7 0.859449
Get integer index values from filtered pandas dataframe
Assuming df is sorted by index:
np.searchsorted(df.index, df_selection.index)
Output:
array([2, 3])
In general, you can do:
np.where(df.index.isin(df_selection.index))
output:
(array([2, 3]),)
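For comparison, here is a sketch that runs both approaches on a small made-up frame and adds a third option, Index.get_indexer, which maps labels to integer positions directly. The data and variable names are assumptions for illustration, and get_indexer requires a unique index.

```python
import numpy as np
import pandas as pd

# Illustrative data: a sorted, unique index and a filtered selection.
df = pd.DataFrame({"v": [1, 2, 30, 40]}, index=[10, 20, 30, 40])
df_selection = df[df["v"] > 5]

pos_sorted = np.searchsorted(df.index, df_selection.index)      # needs sorted index
pos_general = np.where(df.index.isin(df_selection.index))[0]    # works in general
pos_indexer = df.index.get_indexer(df_selection.index)          # needs unique index

print(pos_sorted, pos_general, pos_indexer)
```

On this data all three agree; they differ only in the assumptions they place on the index.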
Find integer index of rows with NaN in pandas dataframe
For DataFrame df:
import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]
will give you back the MultiIndex that you can use to index back into df, e.g.:
df['a'].loc[index[0]]  # .ix was removed in pandas 1.0; .loc does the label lookup
>>> 1.452354
For the integer index:
df_index = df.index.values.tolist()
[df_index.index(i) for i in index]
>>> [3, 6]
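As a possible shortcut, the integer positions can also be read straight off the boolean NaN mask, which avoids the list lookup per label. The sketch below uses an invented frame with a plain string index rather than the question's MultiIndex.

```python
import numpy as np
import pandas as pd

# Illustrative frame: column 'b' has NaN at positions 1 and 3.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [0.5, np.nan, 1.5, np.nan]},
    index=list("wxyz"),
)

mask = df["b"].isna().to_numpy()       # boolean array, True where 'b' is NaN
nan_positions = np.flatnonzero(mask)   # integer positions of the NaN rows
nan_labels = df.index[mask]            # corresponding index labels

print(nan_positions, list(nan_labels))
```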
How can I choose rows and columns if the index/header contains certain integer in Pandas dataframe?
Let's define list of indexes of interest:
idx = [1, 3, 5]
Do the summation using specified columns:
df[['US' + str(i) for i in idx]].sum(axis = 1)
Alternatively, if you want to join the summation column to the dataframe, assign the result to a variable and give it a name:
s1 = df[['US' + str(i) for i in idx]].sum(axis = 1)
s1.name = 'NEW_US_IND_' + ''.join("{0}".format(i) for i in idx)
And add new column:
df.join(s1)
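Put together, the steps look roughly like this. The column names US1, US2, US3 and US5 and their values are made up for the sketch; note that df.join returns a new frame, so the result has to be assigned to keep it.

```python
import pandas as pd

# Illustrative frame with hypothetical 'US<n>' columns.
df = pd.DataFrame({"US1": [1, 2], "US2": [10, 20],
                   "US3": [100, 200], "US5": [1000, 2000]})

idx = [1, 3, 5]
cols = ["US" + str(i) for i in idx]

# Row-wise sum over the selected columns, named after the chosen indexes.
s1 = df[cols].sum(axis=1)
s1.name = "NEW_US_IND_" + "".join(str(i) for i in idx)

result = df.join(s1)  # df itself is left unchanged
print(result)
```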
I'd like to get a unique of a series of a dataframe while preserving its index
I guess you are looking for this,
df[~df['col'].duplicated(keep='first')]
Sample Input:
col
0 1
1 2
2 3
3 1
4 2
Sample Output:
col
0 1
1 2
2 3
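A runnable version of the sample above, as a sketch. It also shows drop_duplicates with subset, which should give the same rows while keeping the original index labels.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3, 1, 2]})

# Keep only the first occurrence of each value; the original index survives.
unique_rows = df[~df["col"].duplicated(keep="first")]

# Equivalent selection using drop_duplicates.
same = df.drop_duplicates(subset="col", keep="first")

print(unique_rows)
```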