AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval
Your data is fine, and pandas.eval is buggy, but not in the way you think. There is a hint in the relevant GitHub issue page that urged me to take a closer look at the documentation.
pandas.eval(expr, parser='pandas', engine=None, truediv=True, local_dict=None,
global_dict=None, resolvers=(), level=0, target=None, inplace=False)
Evaluate a Python expression as a string using various backends.
Parameters:
expr: str or unicode
The expression to evaluate. This string cannot contain any Python
statements, only Python expressions.
[...]
As you can see, the documented behaviour is to pass strings to pd.eval, in line with the general (and expected) behaviour of the eval/exec class of functions. You pass a string, and end up with an arbitrary object.
As I see it, pandas.eval is buggy because it doesn't reject the Series input expr up front, leading it to guess in the face of ambiguity. The fact that the default shortening of the Series' __repr__, designed for pretty printing, can drastically affect your result is the best proof of this situation.
The solution is then to step back from the XY problem, use the right tool to convert your data, and preferably stop using pandas.eval for this purpose entirely. Even in the working cases where the Series is small, you can't really be sure that future pandas versions won't break this "feature" completely.
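A minimal sketch of the likely mechanism behind the error, assuming the shortened text form of a long Series contains a literal `...`. The sample strings below are hypothetical stand-ins, not real pandas output. Python's own parser turns `...` into an Ellipsis node, which an expression visitor with no Ellipsis handler would have to reject:

```python
import ast


def contains_ellipsis(expression: str) -> bool:
    """Return True if the parsed expression contains an Ellipsis literal."""
    tree = ast.parse(expression, mode="eval")
    return any(
        isinstance(node, ast.Constant) and node.value is Ellipsis
        for node in ast.walk(tree)
    )


# Hypothetical stand-ins for the text of a short and a shortened sequence.
short_text = "[1, 2, 3]"
shortened_text = "[1, 2, 3, ...]"

print(contains_ellipsis(short_text))      # False
print(contains_ellipsis(shortened_text))  # True
```

This uses only the standard library `ast` module (Python 3.8 or later) and does not call pandas at all; it only illustrates why truncated text is not a safe thing to evaluate.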
How to convert values like '2+3' in a Python Pandas column to its aggregated value
Use pandas.eval, which behaves differently from pure Python eval:
data['fatalities'] = pd.eval(data['fatalities'])
print (data)
fatalities
0 1
1 4
2 10
3 9
4 5
5 11
6 16
7 9
However, because of a bug this only works up to about 100 rows. Beyond that it raises:
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'
The solution is then to evaluate each value separately:
data['fatalities'] = data['fatalities'].apply(pd.eval)
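As a rough check of the per-value approach, here is a small sketch with made-up data that is deliberately longer than 100 rows. The column contents are an assumption for illustration only:

```python
import pandas as pd

# Hypothetical data: 150 rows, more than the roughly 100-row limit mentioned above.
data = pd.DataFrame({"fatalities": ["2+3", "1", "4+6"] * 50})

# Evaluate each cell on its own, so pd.eval always receives a plain string
# and never the text form of the whole Series.
data["fatalities"] = data["fatalities"].apply(pd.eval)

print(data.head())
```

Each call receives a single short string, so the size of the DataFrame no longer matters.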
How to read xlsx file without losing data type
You can use:
pd.read_excel(..., dtype={'Column1': np.float64, 'Column2': np.int32})
specifying the column types wisely, of course.
pandas column to list for a json file
You can use:
df = pd.read_excel('data_threated.xlsx').reset_index(drop=True)
df['categories'] = df['categories'].apply(lambda x: [int(i) for i in x.split(',')] if isinstance(x, str) else '')
df.to_json('output.json', orient='records', indent=4)
Content of output.json
[
{
"model":"xx",
"id":1,
"name":"xyz",
"categories":[
1,
2
]
}
]
Note you can also use:
df['categories'] = pd.eval(df['categories'])
Pandas evaluate a string ratio into a float
Don't use pd.eval on a whole Series, because with more than about 100 rows it raises an ugly error. Convert each value separately instead:
df['Ratios'] = 1/df['Ratios'].str.replace(':','/').apply(pd.eval)
Your error also suggests that some values contain non-numeric characters in addition to the : separator.
Error for 100+ rows:
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'
If it still raises an error, you can test whether the data are valid with a custom function:
print (df)
Date Ratios
0 2009-08-23 2:1r
1 2018-08-22 2:1
2 2019-10-24 2:1
3 2020-10-28 3:2
# Return True for values that pd.eval cannot parse, so the filter keeps only bad rows
def f(x):
    try:
        pd.eval(x)
        return False
    except Exception:
        return True
df = df[df['Ratios'].str.replace(':','/').apply(f)]
print (df)
Date Ratios
0 2009-08-23 2:1r
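An alternative that avoids pd.eval entirely is to split the ratio by hand. This is a sketch using a different technique from the answer above, and `inverse_ratio` is a hypothetical helper name. It mirrors the `1 / (a/b)` computation and returns NaN for malformed values such as `2:1r`:

```python
import math


def inverse_ratio(text: str) -> float:
    """Convert 'a:b' to b / a, or NaN when the text is not a valid ratio."""
    try:
        left, right = text.split(":")
        return float(right) / float(left)
    except (ValueError, ZeroDivisionError, AttributeError):
        # ValueError: wrong number of parts or non-numeric text.
        # AttributeError: the value is not a string (for example a missing value).
        return math.nan


print(inverse_ratio("2:1"))               # 0.5
print(math.isnan(inverse_ratio("2:1r")))  # True
```

It could be applied with `df['Ratios'].apply(inverse_ratio)`; the rows that come back as NaN are the ones worth inspecting.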
ValueError: unknown type object pandas eval for n rows = 100
This is a bug; you need to process each value separately with apply or map:
test["result"] = (test["number1"].astype(str)+test["sign1"]+test["number2"].astype(str)+\
test["sign2"]+test["number3"].astype(str)).apply(pd.eval)
test["result"] = (test["number1"].astype(str)+test["sign1"]+test["number2"].astype(str)+\
test["sign2"]+test["number3"].astype(str)).map(pd.eval)
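If you would rather not hand arbitrary strings to an evaluator, one option is a small whitelist-based parser built on the standard library `ast` module. This is a sketch of a swapped-in technique, not what pd.eval does internally, and `safe_arith` is a hypothetical helper that accepts only numbers and the four basic operators:

```python
import ast
import operator

# Only these arithmetic operators are accepted.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(expression: str):
    """Evaluate a purely arithmetic expression such as '1+2*3'."""

    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    # ast.parse applies normal Python operator precedence.
    return _eval(ast.parse(expression.strip(), mode="eval").body)


print(safe_arith("1+2*3"))   # 7
print(safe_arith("10-4/2"))  # 8.0
```

The concatenated column could then be passed through `.apply(safe_arith)` or `.map(safe_arith)` in place of `pd.eval`. Anything other than plain arithmetic raises `ValueError`, and text that is not valid Python syntax raises `SyntaxError`.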
Summing and subtracting 2 numbers in 1 column in Pandas
I believe you need pandas.eval:
df['new'] = pd.eval(df['bedrooms'])
print (df)
bedrooms new
0 1 + 1 2
1 2 - 1 1
EDIT: The problem in the data is a value like 6 +. One possible way to parse it as 6 is to use Series.str.rstrip:
df = pd.DataFrame({'bedrooms': "4 ,4 +,5 +1, 5+, 6+ ".split(',') * 200})
df['bedrooms'] = pd.eval(df['bedrooms'].str.rstrip('+- '))
Or:
df['bedrooms'] = df['bedrooms'].str.rstrip('+- ').apply(pd.eval)
print (df)
bedrooms
0 4
1 4
2 6
3 5
4 6
.. ...
995 4
996 4
997 6
998 5
999 6
[1000 rows x 1 columns]
EDIT1:
You can find problematic values:
# Return the evaluated value, or NaN for values that pd.eval cannot parse
def f(x):
    try:
        return pd.eval(x)
    except Exception:
        return np.nan
df['bedrooms1'] = df['bedrooms'].apply(f)
a = df.loc[df['bedrooms1'].isna(), 'bedrooms']
print (a)
74 6 +
Name: bedrooms, dtype: object
Evaluate string in column of dataframe with a variable
Pandas has a safer version of eval that supports a limited number of operations. Luckily, > and < work, and you can use this along with string concatenation:
i = '3'
idx = pd.eval(i + df.Expression)
df.loc[idx]
Name Factor Expression Year
0 Hydro 0.075 <10 2010
2 Hydro 0.075 <10 2011
4 Hydro 0.075 <10 2012
As @coldspeed noted, the above approach only works on DataFrames with fewer than 100 rows*, which isn't ideal. He also proposed the following solution:
df[[pd.eval(f"{i}{j}") for j in df['Expression']]]
*The above limitation is discussed more in depth in the following question: AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval
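For conditions this simple, the comparison can also be done without any evaluator. The sketch below is an alternative technique, not part of either answer above: `satisfies` is a hypothetical helper, and support for `<=` and `>=` is an added assumption beyond the `<` and `>` mentioned here.

```python
import operator
import re

_COMPARE = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}
# Longer symbols come first in the alternation so "<=" is not read as "<".
_PATTERN = re.compile(r"\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*")


def satisfies(value: float, condition: str) -> bool:
    """Return True if `value` meets a condition such as '<10' or '>= 2.5'."""
    match = _PATTERN.fullmatch(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    symbol, threshold = match.groups()
    return _COMPARE[symbol](value, float(threshold))


print(satisfies(3, "<10"))  # True
print(satisfies(3, ">10"))  # False
```

A boolean mask for the DataFrame could then be built with `df[[satisfies(3, c) for c in df['Expression']]]`, which works regardless of the number of rows.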
Saved data using pandas is changing
This occurs because read_csv doesn't recognize complex types like list and reads them as strings:
type(interactionsCSV.at[0, 'Sequence1'])
# <class 'str'>
One possible workaround is to use the pandas.eval function:
interactionsCSV['Sequence1'] = pd.eval(interactionsCSV['Sequence1'])
type(interactionsCSV.at[0, 'Sequence1'])
# <class 'list'>
max([len(s) for s in interactionsCSV.get('Sequence1')])
# 847
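Another option, sketched here with made-up CSV content, is to parse each cell with the standard library's `ast.literal_eval`, which accepts only Python literals and works per value, so the number of rows does not matter. The column name `Sequence1` follows the example above; the data and the `label` column are hypothetical:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for the saved file.
csv_text = 'Sequence1,label\n"[1, 2, 3]",a\n"[4, 5]",b\n'

interactions = pd.read_csv(io.StringIO(csv_text))
print(type(interactions.at[0, "Sequence1"]))  # <class 'str'>

# Parse each cell separately; literal_eval rejects anything that is not a literal.
interactions["Sequence1"] = interactions["Sequence1"].apply(ast.literal_eval)
print(type(interactions.at[0, "Sequence1"]))  # <class 'list'>
```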
Turn str fractions to floats in pandas df
Try the below code:
df[['identifier']].join(df.filter(like='per').apply(pd.eval))
identifier per_1 per_2 per_3 per_4 per_5
0 'something' 0.976378 1 0.615385 0.7 1
1 'camel' 0.991803 0.728155 0.977199 0.916667 0
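If the columns hold nothing but fractions and plain numbers, the standard library's `fractions.Fraction` can do the conversion without evaluating any expression. This is a sketch of an alternative to the code above; `fraction_to_float` is a hypothetical helper and the inputs are made up:

```python
from fractions import Fraction


def fraction_to_float(value) -> float:
    """Convert a value such as '3/4', '1' or 0 to a float."""
    # str() lets the helper accept cells that were already read as numbers.
    return float(Fraction(str(value).strip()))


print(fraction_to_float("3/4"))  # 0.75
print(fraction_to_float("1"))    # 1.0
print(fraction_to_float(0))      # 0.0
```

It could be applied column by column with something like `df.filter(like='per').apply(lambda col: col.map(fraction_to_float))`. A denominator of zero raises `ZeroDivisionError`, and text that is not a number or fraction raises `ValueError`.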