Pandas Selecting by Label Sometimes Returns Series, Sometimes Returns DataFrame

Pandas selecting by label sometimes returns Series, sometimes returns DataFrame

Granted, the behavior is inconsistent, but I think it's easy to imagine cases where it is convenient. Anyway, to get a DataFrame every time, pass a list to loc. There are other ways, but in my opinion this is the cleanest.

In [2]: type(df.loc[[3]])
Out[2]: pandas.core.frame.DataFrame

In [3]: type(df.loc[[1]])
Out[3]: pandas.core.frame.DataFrame
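The df used above is not shown, so here is a minimal sketch with a hypothetical frame in which the label 3 appears several times in the index and the label 1 appears once. Under that assumption, a scalar label gives a Series or a DataFrame depending on whether the label is unique, while a list of labels gives a DataFrame in both cases.

```python
import pandas as pd

# Hypothetical data (not from the original question):
# the label 3 is duplicated in the index, the label 1 is unique.
df = pd.DataFrame({"value": range(5)}, index=[1, 2, 3, 3, 3])

unique_scalar = df.loc[1]      # unique label, scalar key -> Series
dup_scalar = df.loc[3]         # duplicated label, scalar key -> DataFrame
unique_list = df.loc[[1]]      # list key -> DataFrame with one row
dup_list = df.loc[[3]]         # list key -> DataFrame with three rows

print(type(unique_scalar).__name__, type(dup_scalar).__name__)
print(unique_list.shape, dup_list.shape)
```

Passing a list keeps the return type predictable, which matters when later code expects two-dimensional input.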

Different brackets in pandas DataFrame.loc

Although the first two forms are equivalent in output, the second, df.loc['Second']['count'], is called chained indexing:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In the second form, the intermediate result is a Series:

In[48]:
type(df.loc['Second'])

Out[48]: pandas.core.series.Series

You then index that Series by one of its index labels, which returns the scalar value:

In[47]:
df.loc['Second']

Out[47]:
price    2
count    3
Name: Second, dtype: int32

In[49]:
df.loc['Second']['count']

Out[49]: 3

Regarding the last form, the additional brackets return a DataFrame, which is why you see the index value rather than a bare scalar value:

In[44]:
type(df.loc[['Second']])

Out[44]: pandas.core.frame.DataFrame

Passing the column label then indexes this DataFrame and returns the matching column as a Series:

In[46]:
type(df.loc[['Second'],'count'])

Out[46]: pandas.core.series.Series

So it depends on what you want to achieve, but avoid the second form, as it can lead to unexpected behaviour when you attempt to assign to the column or to the DataFrame.
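To compare the three forms side by side, here is a small sketch. The frame is reconstructed from the output shown above (row 'Second' with price 2 and count 3); the 'First' row and its values are assumptions made only for illustration.

```python
import pandas as pd

# The 'Second' row matches the output above; the 'First' row is hypothetical.
df = pd.DataFrame(
    {"price": [1, 2], "count": [2, 3]},
    index=["First", "Second"],
)

single_step = df.loc["Second", "count"]      # one lookup -> scalar
chained = df.loc["Second"]["count"]          # two lookups (chained) -> scalar
list_rows = df.loc[["Second"], "count"]      # list of rows -> one-element Series

# For assignment, use the single-step form so the write targets df itself.
df.loc["Second", "count"] = 10
updated = df.loc["Second", "count"]
```

Reading through the chained form works, but writing through it may act on a temporary object, which is the problem the linked documentation section describes.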

Series [] and .loc[] sometimes return a single value, and sometimes unexpectedly a single-element Series containing the same value

In my opinion the problem is duplicated index values: if idxmax returns a tuple that occurs more than once in the index, the selection returns not a scalar but all rows with that duplicated label.

A simple way to avoid this is to use the default index. Here, change:

df = pd.read_clipboard(sep='\t', index_col=[0, 1, 2, 3, 4], na_values='')

to:

df = pd.read_clipboard(sep='\t', na_values='')

so that you get the default RangeIndex instead of a MultiIndex.

Check that the index is a RangeIndex:

print(df.index)

If you need the MultiIndex, remove the duplicated index values:

df = pd.read_clipboard(sep='\t', index_col=[0, 1, 2, 3, 4], na_values='')
df = df[~df.index.duplicated()]
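The effect of duplicated labels can be reproduced without the clipboard data. The sketch below uses a hypothetical Series with a flat index instead of the MultiIndex from the question, which is a simplification; the mechanism (a duplicated label returned by idxmax) is the same.

```python
import pandas as pd

# Hypothetical data: the label "b" appears twice in the index.
s = pd.Series([10, 20, 30], index=["a", "b", "b"])

label = s.idxmax()            # label of the maximum value, here "b"
with_dups = s.loc[label]      # duplicated label -> Series with both "b" rows

# duplicated() marks every repeat after the first occurrence,
# so the mask keeps only the first row for each label.
deduped = s[~s.index.duplicated()]
without_dups = deduped.loc[deduped.idxmax()]   # unique label -> scalar

print(type(with_dups).__name__, without_dups)
```

Note that dropping duplicates this way keeps the first row per label, so the surviving value is not necessarily the largest one for that label.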

Stop pandas dataframe from converting to vector

Actually, df.iloc[1, :] is not a pd.DataFrame; it is a pd.Series, which you can check with type(df.iloc[1, :]). A Series is one-dimensional, so the distinction between row and column does not make sense in this case.

To keep it as a pd.DataFrame, you could select a range of rows of length 1: df.iloc[1:2, :] or df.iloc[[1], :].
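A quick way to see the difference is to compare shapes. The frame below is hypothetical, chosen only to have more than one row.

```python
import pandas as pd

# Hypothetical 3x2 frame for illustration.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])

as_series = df.iloc[1, :]       # scalar row position -> Series, shape (2,)
as_frame_slice = df.iloc[1:2, :]   # slice of length 1 -> DataFrame, shape (1, 2)
as_frame_list = df.iloc[[1], :]    # list with one position -> DataFrame, shape (1, 2)

print(as_series.shape, as_frame_slice.shape, as_frame_list.shape)
```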

Keep selected column as DataFrame instead of Series

As @Jeff mentions there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and raise errors early if you're trying something ambiguous):

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

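In [12]: df[['A']]

In [13]: df[[0]]  # positional lookup in older pandas only; recent versions treat 0 as a label and raise KeyError here

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]: # they all return the same thing:
   A
0  1
1  3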

The latter two choices remove ambiguity in the case of integer column names (precisely why loc/iloc were created). For example:

In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

In [18]: df[[0]] # ambiguous: positional here (column A), but recent pandas versions treat 0 as a label and return column 0
Out[18]:
   A
0  1
1  3
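With the same frame, loc and iloc state the intent explicitly, so the result does not depend on how plain brackets interpret an integer. This is a short sketch of that comparison.

```python
import pandas as pd

# Same frame as above: one column is named 'A', the other is named 0.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", 0])

by_label = df.loc[:, [0]]       # label-based: the column named 0
by_position = df.iloc[:, [0]]   # position-based: the first column, 'A'

print(by_label)
print(by_position)
```

Both results are DataFrames because the column selector is a list.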

Pandas dataframe row extraction is changing dimensions

To get the row as a DataFrame, pass a list of labels to loc:

csv_row1 = csv.loc[[0]]

Pandas series: Only keep the first entry that contains a given character (comma)

If you really want to work with Series methods, the approach would be:

series[series.str.contains(',')].iloc[0]

However, this requires checking all elements, just to keep one.

A much more efficient approach (depending on the exact data, there might be edge cases where this isn't true) is to use filter and next to get the first matching element. This is more than 100 times faster on the provided example.

next(filter(lambda x: ',' in x, series))

Output: '3,360,003|'
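Both approaches can be sketched on a small hypothetical Series (the original data is not shown here). One design point worth considering: next raises StopIteration when nothing matches, so the sketch passes a default of None, which is an addition to the one-liner above.

```python
import pandas as pd

# Hypothetical data for illustration only.
series = pd.Series(["abc", "1,2", "x,y", "def"])

# Vectorized: builds a boolean mask over every element, then takes the first hit.
first_vectorized = series[series.str.contains(",")].iloc[0]

# Lazy: stops iterating at the first element that contains a comma.
first_lazy = next((x for x in series if "," in x), None)

# With no match, the default is returned instead of raising StopIteration.
no_match = next((x for x in pd.Series(["abc"]) if "," in x), None)

print(first_vectorized, first_lazy, no_match)
```

The lazy version assumes every element is a string; missing values would need to be handled before the membership test.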


