How to deal with SettingWithCopyWarning in Pandas
The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. (See GH5390 and GH5597 for background discussion.)
df[df['A'] > 2]['B'] = new_val # new_val not set in df
The warning offers a suggestion to rewrite as follows:
df.loc[df['A'] > 2, 'B'] = new_val
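To make the difference concrete, here is a minimal sketch of both forms side by side. The frame contents and the value 99 are made up for illustration, and the warning is silenced only so the demo runs quietly:

```python
import warnings

import pandas as pd

# Hypothetical data, just for illustration.
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [0, 0, 0, 0]})
new_val = 99

# Chained form: the boolean mask produces a new object, so the
# assignment lands on a temporary and never reaches df.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the warning for this demo only
    df[df['A'] > 2]['B'] = new_val
after_chained = df['B'].tolist()

# Single .loc call: row and column selection happen in one step,
# so the write goes straight into df.
df.loc[df['A'] > 2, 'B'] = new_val
after_loc = df['B'].tolist()

print(after_chained)  # [0, 0, 0, 0]
print(after_loc)      # [0, 0, 99, 99]
```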
However, this doesn't fit your usage, which is equivalent to:
df = df[df['A'] > 2]
df['B'] = new_val
While it's clear that you don't care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be differentiated from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'
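If you would rather not change a global option, one alternative for the filter-then-overwrite pattern is to take an explicit copy when filtering. The sketch below uses invented data and a hypothetical name, original, for the frame being filtered:

```python
import pandas as pd

# Hypothetical data; `original` stands in for the frame you start from.
original = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})

# .copy() states the intent: df is an independent frame, so later
# writes are unambiguous and do not reach back into `original`.
df = original[original['A'] > 2].copy()
df['B'] = 0

print(df['B'].tolist())        # [0, 0]
print(original['B'].tolist())  # [10, 20, 30, 40]
```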
Other Resources
- pandas User Guide: Indexing and selecting data
- Python Data Science Handbook: Data Indexing and Selection
- Real Python: SettingWithCopyWarning in Pandas: Views vs Copies
- Dataquest: SettingwithCopyWarning: How to Fix This Warning in Pandas
- Towards Data Science: Explaining the SettingWithCopyWarning in pandas
Warning: Try using .loc[row_indexer,col_indexer] = value instead
If you want to make a new dataframe while keeping titles, then either slice with .loc[]:
description_category = titles.loc[:, ['listed_in', 'description']]
or create a .copy():
description_category = titles[['listed_in', 'description']].copy()
Also, it's faster to use .str.split() instead of apply():
description_category['listed_in'] = description_category['listed_in'].str.split(', ')
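Here is a small runnable sketch of the .copy() route followed by .str.split(). The contents of titles are invented for the example; only the column names come from the snippets above:

```python
import pandas as pd

# Hypothetical stand-in for the real `titles` frame.
titles = pd.DataFrame({
    'listed_in': ['Dramas, International Movies', 'Comedies'],
    'description': ['first description', 'second description'],
    'type': ['Movie', 'Movie'],
})

# Independent two-column frame; `titles` itself stays untouched.
description_category = titles[['listed_in', 'description']].copy()

# Vectorized split: each cell becomes a list of category names.
description_category['listed_in'] = (
    description_category['listed_in'].str.split(', ')
)

print(description_category['listed_in'].iloc[0])
print(titles['listed_in'].iloc[0])
```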
Getting SettingWithCopyWarning warning even after using .loc in pandas
You can get this SettingWithCopyWarning if df_masked is a sub-DataFrame of some other DataFrame.
In particular, if data has been copied from the original DataFrame to df_masked, pandas emits the warning to alert you that modifying df_masked will not affect the original DataFrame.
If you do not intend to modify the original DataFrame, then you are free to ignore the warning.
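There are ways to shut off the warning on a per-statement basis. In particular, in older pandas versions you could use df_masked.is_copy = False. The is_copy attribute has since been deprecated and removed, so on current versions take an explicit copy instead, for example df_masked = df.iloc[1:].copy().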
If you run into this warning a lot, then instead of silencing the warnings one by one, I think it is better to leave them be while you are developing your code. Be aware of what the warning means, and if the modifying-the-child-does-not-affect-the-parent issue does not affect you, then ignore it. When your code is ready for production, or if you are experienced enough to not need the warnings, shut them off entirely with
pd.options.mode.chained_assignment = None
near the top of your code.
Here is a simple example which demonstrates the problem and (a) solution:
import pandas as pd
df = pd.DataFrame({'swallow':['African','European'], 'cheese':['gouda', 'cheddar']})
df_masked = df.iloc[1:]
df_masked.is_copy = False  # older pandas only; comment out this line to see the warning
df_masked.loc[:, 'swallow'] = 'forest'
The warning exists to help alert new users to the fact that chained indexing such as
df.iloc[1:].loc[:, 'swallow'] = 'forest'
will not affect df when the result of the first indexer (e.g. df.iloc[1:]) returns a copy.
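If the goal is to modify df itself, the two chained steps can be collapsed into a single indexing call, so there is no intermediate object that might be a copy. This is a sketch of one way to do that using the same example frame. The helper variable col is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'swallow': ['African', 'European'],
                   'cheese': ['gouda', 'cheddar']})

# Translate the column label to its integer position so rows and
# column can both be addressed positionally in one .iloc call.
col = df.columns.get_loc('swallow')
df.iloc[1:, col] = 'forest'

print(df['swallow'].tolist())  # ['African', 'forest']
```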
SettingWithCopyWarning, even when using loc (?)
In case 1, df['A'] may return a copy of the data rather than a view on df. As explained by the pandas documentation, this can lead to unexpected results when chaining, thus a warning is raised. Case 2 looks correct, but false positives are possible:
Warning: The chained assignment warnings / exceptions are aiming to
inform the user of a possibly invalid assignment. There may be false
positives; situations where a chained assignment is inadvertently
reported.
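To turn off SettingWithCopyWarning for a single dataframe in older pandas versions, use
df.is_copy = False
(The is_copy attribute has since been removed from pandas; on current versions, work on an explicit df.copy() instead.)
To turn off chained assignment warnings altogether, use
pd.options.mode.chained_assignment = None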
Pandas: SettingWithCopyWarning even when using .loc[]
If the warning is a problem, use assign, which returns a new DataFrame instead of writing into the existing one:
df = df.assign(prob_win=prob_win)
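As a rough illustration, the sketch below builds a filtered frame, adds the column with assign, and records any warnings raised along the way. The games frame, its columns, and the probability values are all hypothetical:

```python
import warnings

import pandas as pd

# Hypothetical source frame and filter.
games = pd.DataFrame({'team': ['a', 'b', 'c'],
                      'rating': [1500, 1600, 1400]})
df = games[games['rating'] > 1450]
prob_win = [0.55, 0.64]  # hypothetical values, one per remaining row

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # assign builds and returns a new frame, so nothing is written
    # into the filtered object in place.
    df = df.assign(prob_win=prob_win)

# Keep only warnings whose category name mentions SettingWithCopy.
copy_warnings = [w for w in caught
                 if 'SettingWithCopy' in w.category.__name__]
print(len(copy_warnings))  # 0
```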
How to remove setting with copy warning?
Most likely your source DataFrame (spotify_df) has been created as
a view of another DataFrame.
The side effect is that spotify_df does not have its own data buffer.
Instead it shares the data buffer with the DataFrame it has been
created from.
To get rid of this warning: When you create spotify_df, add .copy()
to the code.
This way spotify_df will be an "independent" DataFrame, with its
own data buffer, so you can do with it anything you want.
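One way to see the shared-buffer point is to compare the underlying NumPy arrays with np.shares_memory. The sketch below is illustrative only: the source frame and its tempo column are invented, and whether a plain slice shares memory can depend on the pandas version and the column dtypes, so no output is claimed for that case:

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame.
source = pd.DataFrame({'tempo': [120.0, 98.5, 140.2]})

maybe_view = source.iloc[1:]         # may share the parent's buffer
spotify_df = source.iloc[1:].copy()  # always gets its own buffer

shares_with_slice = np.shares_memory(
    source['tempo'].to_numpy(), maybe_view['tempo'].to_numpy())
shares_with_copy = np.shares_memory(
    source['tempo'].to_numpy(), spotify_df['tempo'].to_numpy())

# Writing to the independent copy leaves the parent untouched.
spotify_df['tempo'] = 0.0

print(shares_with_slice)  # version-dependent
print(shares_with_copy)   # False
```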
Trying to create a new column delivers A value is trying to be set on a copy of a slice from a DataFrame
You need to copy when slicing, not when assigning:
df_new = df_old.loc[df_old['code'] == df_old['name_code']].copy()
df_new['adress'] = df_new['name'] + df_new['street'] + df_new['adresscode']
Output (without SettingWithCopyWarning):
customerId code name_code name street adresscode \
0 1 1 1 Mike Long Street 458
1 2 1 1 Jucie Short Street 856
adress
0 MikeLong Street458
1 JucieShort Street856
For your other slicing, you need to use a list of columns:
df_new = df_old[['name','street','adresscode']].copy()
# OR
df_new = df_old.loc[:, ['name','street','adresscode']].copy()
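For reference, here is a self-contained version of the first snippet that can be run as-is. The first two rows of df_old are taken from the output shown above; the third row is a made-up non-matching record so the filter has something to exclude:

```python
import pandas as pd

df_old = pd.DataFrame({
    'customerId': [1, 2, 3],
    'code': [1, 1, 2],
    'name_code': [1, 1, 3],
    'name': ['Mike', 'Jucie', 'Ann'],             # 'Ann' row is hypothetical
    'street': ['Long Street', 'Short Street', 'Main Street'],
    'adresscode': ['458', '856', '123'],
})

# Copy at slicing time, so df_new owns its data.
df_new = df_old.loc[df_old['code'] == df_old['name_code']].copy()

# Adding a column now touches only df_new.
df_new['adress'] = df_new['name'] + df_new['street'] + df_new['adresscode']

print(df_new['adress'].tolist())
# ['MikeLong Street458', 'JucieShort Street856']
```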