Pandas: Difference Between Pivot and Pivot_Table. Why Is Only Pivot_Table Working

Pandas: Difference between pivot and pivot_table. Why is only pivot_table working?

For anyone who is still interested in the difference between pivot and pivot_table, there are mainly two differences:

  • pivot_table is a generalization of pivot that can handle duplicate values for one pivoted index/column pair. Specifically, you can give pivot_table a list of aggregation functions using keyword argument aggfunc. The default aggfunc of pivot_table is numpy.mean.
  • pivot_table also supports using multiple columns for the index and column of the pivoted table. A hierarchical index will be automatically generated for you.

REF: pivot and pivot_table

Python pivot_table - Add difference column

Here's a way to do that:

df = pd.DataFrame({
"Name": ["A", "A", "A", "B", "B", "B"],
"Date": "2020-07-01",
"Value": [0.1, 0.2, 3, 2, -.5, 4],
"Category": ["Z", "Y", "X", "Z", "Y", "X"]
})

piv = pd.pivot_table(df, index="Category", columns="Name", aggfunc=sum)
piv.columns = [c[1] for c in piv.columns]
piv["diff"] = piv.B - piv.A

The output (piv) is:

            A    B  diff
Category
X 3.0 4.0 1.0
Y 0.2 -0.5 -0.7
Z 0.1 2.0 1.9

To add 'total' for A and B, do

piv.loc["total"] = piv.sum()

Remove the total from the 'diff' column:

piv.loc["total", "diff"] = "" # or np.NaN, if you'd like to be more 
# 'pandas' style.

The output now is:

            A    B  diff
Category
X 3.0 4.0 1.0
Y 0.2 -0.5 -0.7
Z 0.1 2.0 1.9
total 3.3 5.5

If, at this point, you'd like to add the title 'Name' on top of the categories, do:

piv.columns = pd.MultiIndex.from_product([["Name"], piv.columns])

piv is now:

         Name          
A B diff
Category
X 3.0 4.0 1.0
Y 0.2 -0.5 -0.7
Z 0.1 2.0 1.9
total 3.3 5.5

To add the date to each column:

date = df.Date.max()
piv.columns = pd.MultiIndex.from_tuples([c+(date,) for c in piv.columns])

==>
Name
A B diff
2020-07-01 2020-07-01 2020-07-01
Category
X 3.0 4.0 1
Y 0.2 -0.5 -0.7
Z 0.1 2.0 1.9
total 3.3 5.5

Finally, to color a column (e.g. if you're using Jupyter), do:

second_col = piv.columns[2]
piv.style.background_gradient("PiYG", subset = [second_col]).highlight_null('white').set_na_rep("")

Sample Image

Pandas `pivot_table` working with `decimal.Decimal` type

Your code works for me, I can't reproduce your issue.

My setup:

import pandas as pd
from pandas import Timestamp
from decimal import Decimal


data = {'date': [Timestamp('2021-01-02 00:00:00'),
Timestamp('2021-01-02 00:00:00'),
Timestamp('2021-01-02 00:00:00'),
Timestamp('2021-01-02 00:00:00'),
Timestamp('2021-01-02 00:00:00'),
Timestamp('2021-01-02 00:00:00')],
'id': [123123, 123123, 456456, 456456, 567567, 891891],
'value': [Decimal('0.299999999999999988897769753748434595763683319091796875'),
Decimal('2.04999999999999982236431605997495353221893310546875'),
Decimal('2.018190000000000150492951433989219367504119873046875'),
Decimal('606800000'),
Decimal('2.20000000000000017763568394002504646778106689453125'),
Decimal('2475368')],
'type': ['apple', 'apple', 'apple', 'banana', 'apple', 'banana']}

df = pd.DataFrame(data)

Pivot:

pivot_df = pd.pivot_table(df,
values='value',
index=['date', 'id'],
columns='type').reset_index().rename_axis(None, axis=1)

Output:

>>> df
date id apple banana
0 2021-01-02 123123 1.17500 NaN
1 2021-01-02 456456 2.01819 606800000.0
2 2021-01-02 567567 2.20000 NaN
3 2021-01-02 891891 NaN 2475368.0

Why does Pivot table return Int64 Type Error?

First of all, let's create a df similar to the one OP has

import pandas as pd

df = pd.DataFrame( {'price': [10, 12, 18, 10, 12], 'type': ['A', 'A', 'A', 'B', 'B'], 'size': [10, 12, 18, 10, 12], 'location': ['A', 'A', 'A', 'B', 'B'], 'neighborhood': ['A', 'A', 'A', 'B', 'B']})

If one prints the df one will see that this one has int64 and not Int64 (as opposed to OP's). Note: On my answer here one finds the difference between the two dtypes.

print(df.info(verbose=True))

[Out]:

# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 5 non-null int64
1 type 5 non-null object
2 size 5 non-null int64
3 location 5 non-null object
4 neighborhood 5 non-null object

And, with an int64 one will be able to create the pivot table with index "neighborhood", columns "type", and values "price", with the following

df_pivot = df.pivot_table(index='neighborhood', columns='type', values='price')

This is the output

type                  A     B
neighborhood
A 13.333333 NaN
B NaN 11.0

However, with Int64 the Pivot Table can generate an error.

In order to handle that, one will need convert the type to int64

df[['price', 'size']] = df[['price', 'size']].astype('int64')  

or

import numpy as np

df[['price', 'size']] = df[['price', 'size']].astype(np.int64)

Also, most likely, OP has missing values. The fastest way to handle that is to remove the rows with missing values. In order to find and remove the missing values, my answer here may be of help.

For the reference, this is a direct link to the module maybe_downcast_to_dtype that is raising the error that OP is having.

Struggling in pandas pivot tables and flattening them

Use aggfunc="size" instead of len:

pivot = pd.pivot_table(
df,
index="customer_IDprovince",
columns="category",
aggfunc="size",
fill_value=0,
)

print(pivot.shape)

Prints:

(48, 68)

Python Pandas pivot_table : unexpected results

For me working your solution without [] for avoid MutliIndex:

pivot1 = pd.pivot_table(df,index=['date','city'],columns='timeOfTheDay',values='count')
print (pivot1)
timeOfTheDay day night
date city
13 NY 2 3
SF 4 5


Related Topics



Leave a reply



Submit