Accessing every 1st element of Pandas DataFrame column containing lists
You can use map and a lambda function:
df.loc[:, 'new_col'] = df.A.map(lambda x: x[0])
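Here is a minimal runnable sketch of the map approach. The frame and its column A are hypothetical, and every list is assumed to be non-empty, because x[0] raises IndexError on an empty list.

```python
import pandas as pd

# Hypothetical frame: every cell of column A is a non-empty list
df = pd.DataFrame({'A': [[1, 2, 3], [4, 5], [6]]})

# map calls the lambda once per cell, i.e. once per list
df.loc[:, 'new_col'] = df.A.map(lambda x: x[0])
print(df['new_col'].tolist())  # [1, 4, 6]
```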
Getting the first element from each list in a column of lists
The easiest way is to use str.get; indexing with str[0] is equivalent:
# df['new_column'] = df['codes'].str.get(0)
df['new_column'] = df['codes'].str[0]
However, I would suggest a list comprehension for speed, if there are no NaNs:
df['new_column'] = [l[0] for l in df['codes']]
If lists can be empty, you can do something like:
df['new_column'] = [l[0] if len(l) > 0 else np.nan for l in df['codes']]
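To make the difference concrete, here is a small sketch on a hypothetical codes column that contains an empty list but no NaNs. str[0] quietly yields NaN for the empty list, while the list comprehension needs the explicit length check.

```python
import numpy as np
import pandas as pd

# Hypothetical column of lists, one of them empty (no NaNs)
df = pd.DataFrame({'codes': [[10, 20], [], [30]]})

# str indexing returns NaN for the empty list instead of raising
via_str = df['codes'].str[0]

# The list comprehension needs the length check;
# a bare l[0] would raise IndexError on the empty list
df['new_column'] = [l[0] if len(l) > 0 else np.nan for l in df['codes']]
print(df)
```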
To handle NaNs with the list comprehension, you can use loc to subset and assign back:
m = df['codes'].notna()
df.loc[m, 'new_column'] = [
l[0] if len(l) > 0 else np.nan for l in df.loc[m, 'codes']]
Obligatory why-is-list-comp-worth-it link: For loops with pandas - When should I care?
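A self-contained sketch of the loc-and-mask pattern, using hypothetical data that mixes a normal list, an empty list and a NaN. Rows outside the mask are never passed to the list comprehension and end up as NaN in the new column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a list, an empty list, a missing value, another list
df = pd.DataFrame({'codes': [[1, 2], [], np.nan, [7]]})

# True for rows whose value is not missing (lists count as present)
m = df['codes'].notna()

# Only masked rows are processed; the new column is created on the fly
# and the unmasked row is left as NaN
df.loc[m, 'new_column'] = [
    l[0] if len(l) > 0 else np.nan for l in df.loc[m, 'codes']]
print(df)
```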
Accessing the first item of list values that occur irregularly in a Pandas column
Remove the quotes so that you compare against the type list rather than the string 'list':
df1['Fruits'] = df1['Fruits'].apply(lambda x : x[0] if type(x) == list else x)
print (df1)
Fruits
0 Apple
1 Banana
2 Kiwi
3 Cheese
A similar solution uses isinstance:
df1['Fruits'] = df1['Fruits'].apply(lambda x: x[0] if isinstance(x, list) else x)
print (df1)
Fruits
0 Apple
1 Banana
2 Kiwi
3 Cheese
Or it is possible to use a list comprehension:
df1['Fruits'] = [x[0] if type(x) == list else x for x in df1['Fruits']]
print (df1)
Fruits
0 Apple
1 Banana
2 Kiwi
3 Cheese
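A sketch of the idea on hypothetical input (the original df1 is not shown, so the list and string values below are assumptions). It also shows why comparing with the string 'list' silently did nothing: a type object never equals a string.

```python
import pandas as pd

# Hypothetical input: some cells hold lists, others plain strings
df1 = pd.DataFrame({'Fruits': [['Apple', 'Pear'], 'Banana', ['Kiwi'], 'Cheese']})

# Buggy version: type(x) == 'list' is always False, so nothing changes
unchanged = df1['Fruits'].apply(lambda x: x[0] if type(x) == 'list' else x)

# Fixed version: take the first element only when the cell is a list
fixed = df1['Fruits'].apply(lambda x: x[0] if isinstance(x, list) else x)
print(fixed.tolist())  # ['Apple', 'Banana', 'Kiwi', 'Cheese']
```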
Extracting an element of a list in a pandas column
You can use the str accessor for lists, e.g.:
df_params['Gamma'].str[0]
This should work for all columns:
df_params.apply(lambda col: col.str[0])
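A minimal sketch, assuming a hypothetical df_params in which every cell of every column is a list:

```python
import pandas as pd

# Hypothetical parameter table where every cell is a list
df_params = pd.DataFrame({
    'Gamma': [[0.1, 0.2], [0.3, 0.4]],
    'C': [[1, 10], [100, 1000]],
})

# One column at a time
gamma_first = df_params['Gamma'].str[0]

# All columns at once: apply passes each column (a Series) to the lambda
all_first = df_params.apply(lambda col: col.str[0])
print(all_first)
```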
Get first list element in apply function Pandas
Do you want:
df['runs'] = df['name'].apply(preprocess_names).str[0]
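Here preprocess_names is the asker's function, which is not shown. The version below is a hypothetical stand-in that returns a list of tokens, just enough to make the chain runnable: apply produces a Series of lists, and .str[0] then takes the first element of each.

```python
import pandas as pd

def preprocess_names(name):
    # Hypothetical stand-in: split a full name into lowercase tokens
    return name.lower().split()

df = pd.DataFrame({'name': ['Ada Lovelace', 'Alan Turing']})

# apply returns a Series of lists; .str[0] picks the first token of each
df['runs'] = df['name'].apply(preprocess_names).str[0]
print(df)
```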
Python Pandas: selecting 1st element in array in all cells
You can use apply, and select the first value of each list by indexing with str:
print (zz.apply(lambda x: x.str[0]))
0 1 2
0 1 2 3
1 4 1 4
2 1 2 3
3 4 1 4
Another solution uses stack and unstack:
print (zz.stack().str[0].unstack())
0 1 2
0 1 2 3
1 4 1 4
2 1 2 3
3 4 1 4
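A sketch comparing both approaches on a hypothetical zz. The original cells are not shown, so the two-element lists below are assumptions chosen so that their first elements match the output above.

```python
import pandas as pd

# Hypothetical 4x3 frame whose cells are lists (columns 0, 1, 2)
zz = pd.DataFrame({
    0: [[1, 9], [4, 9], [1, 9], [4, 9]],
    1: [[2, 9], [1, 9], [2, 9], [1, 9]],
    2: [[3, 9], [4, 9], [3, 9], [4, 9]],
})

# Column-wise: apply the str indexer to each column
via_apply = zz.apply(lambda x: x.str[0])

# Reshape to one long Series, index it once, then reshape back
via_stack = zz.stack().str[0].unstack()
print(via_apply)
```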
How to replace a list with first element of list in pandas dataframe column?
As you have strings, you could use a regex here:
df['Country'] = df['Country'].str.extract(r'((?<=\[["\'])[^"\']*|^[^"\']+$)', expand=False)
output (as a new column for clarity):
Name Country Country2
0 Harry USA USA
1 Sam ['USA', 'UK', 'India'] USA
2 Raj ['India', 'USA'] India
3 Jamie Russia Russia
4 Rupert China China
regex:
( # start capturing
(?<=\[["\']) # if preceded by [" or ['
[^"\']* # get all text until " or '
| # OR
^[^"\']+$ # get whole string if it doesn't contain " or '
) # stop capturing
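A quick check of the pattern on a reconstruction of the sample data, assuming the Country values are plain strings as the answer says. It uses a raw string for the pattern (avoiding invalid-escape warnings for \[) and expand=False so that extract returns a Series.

```python
import pandas as pd

# Reconstructed sample: list-like values are stored as strings
df = pd.DataFrame({
    'Name': ['Harry', 'Sam', 'Raj', 'Jamie', 'Rupert'],
    'Country': ['USA', "['USA', 'UK', 'India']", "['India', 'USA']",
                'Russia', 'China'],
})

# First alternative: text right after [' or ["; second: a whole quote-free string
pattern = r'((?<=\[["\'])[^"\']*|^[^"\']+$)'
df['Country2'] = df['Country'].str.extract(pattern, expand=False)
print(df)
```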
Get first row value of a given column
To select the ith row, use iloc:
In [31]: df_test.iloc[0]
Out[31]:
ATime 1.2
X 2.0
Y 15.0
Z 2.0
Btime 1.2
C 12.0
D 25.0
E 12.0
Name: 0, dtype: float64
To select the ith value in the Btime column, you could use:
In [30]: df_test['Btime'].iloc[0]
Out[30]: 1.2
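A runnable sketch with a hypothetical df_test: the first row reuses the values printed above and the second row is made up. Because the frame mixes integer and float columns, the row Series is upcast to float64, matching the dtype shown above.

```python
import pandas as pd

# Hypothetical frame; only the first row comes from the output above
df_test = pd.DataFrame({
    'ATime': [1.2, 1.4], 'X': [2, 3], 'Y': [15, 16], 'Z': [2, 4],
    'Btime': [1.2, 1.3], 'C': [12, 13], 'D': [25, 26], 'E': [12, 14],
})

first_row = df_test.iloc[0]              # whole first row as a Series
first_btime = df_test['Btime'].iloc[0]   # column first, then position 0
print(first_btime)  # 1.2
```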
There is a difference between df_test['Btime'].iloc[0] (recommended) and df_test.iloc[0]['Btime']:
DataFrames store data in column-based blocks (where each block has a single dtype). If you select by column first, a view can be returned (which is quicker than returning a copy) and the original dtype is preserved. In contrast, if you select by row first, and if the DataFrame has columns of different dtypes, then Pandas copies the data into a new Series of object dtype. So selecting columns is a bit faster than selecting rows. Thus, although df_test.iloc[0]['Btime'] works, df_test['Btime'].iloc[0] is a little bit more efficient.
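A small sketch of the dtype point, on a hypothetical frame with one string column and one float column:

```python
import pandas as pd

# Hypothetical frame with columns of different dtypes
df = pd.DataFrame({'name': ['a', 'b'], 'Btime': [1.2, 3.4]})

row = df.iloc[0]     # mixes a string and a float, so the row is object dtype
col = df['Btime']    # a single column keeps its float64 dtype

print(row.dtype)  # object
print(col.dtype)  # float64

# Both orders retrieve the same value; only the intermediate differs
print(df.iloc[0]['Btime'] == df['Btime'].iloc[0])  # True
```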
There is a big difference between the two when it comes to assignment. df_test['Btime'].iloc[0] = x affects df_test, but df_test.iloc[0]['Btime'] may not. See below for an explanation of why. Because a subtle difference in the order of indexing makes a big difference in behavior, it is better to use a single indexing assignment:
df.iloc[0, df.columns.get_loc('Btime')] = x
df.iloc[0, df.columns.get_loc('Btime')] = x (recommended):
The recommended way to assign new values to a DataFrame is to avoid chained indexing, and instead use the method shown by andrew:
df.loc[df.index[n], 'Btime'] = x
or
df.iloc[n, df.columns.get_loc('Btime')] = x
The latter method is a bit faster, because df.loc has to convert the row and column labels to positional indices, so there is a little less conversion necessary if you use df.iloc instead.
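Both single-step forms, sketched on a hypothetical frame with an unsorted integer index so that labels and positions differ; n and x are illustrative values:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions
df = pd.DataFrame({'Btime': [1.2, 3.4, 5.6]}, index=[10, 30, 20])

n, x = 0, 9.9

# Label-based: translate position n into its row label first
df.loc[df.index[n], 'Btime'] = x

# Position-based: translate the column label into its position
df.iloc[n + 1, df.columns.get_loc('Btime')] = x

print(df['Btime'].tolist())  # [9.9, 9.9, 5.6]
```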
df['Btime'].iloc[0] = x works, but is not recommended:
Although this works, it takes advantage of the way DataFrames are currently implemented, and there is no guarantee that Pandas has to work this way in the future. In particular, it relies on the fact that (currently) df['Btime'] always returns a view (not a copy), so df['Btime'].iloc[n] = x can be used to assign a new value at the nth location of the Btime column of df. (This has in fact changed: under Copy-on-Write, opt-in via pd.options.mode.copy_on_write = True since pandas 2.0 and the default in pandas 3.0, this chained assignment no longer modifies df.)
Since Pandas makes no explicit guarantees about when indexers return a view versus a copy, assignments that use chained indexing generally raise a SettingWithCopyWarning, even though in this case the assignment succeeds in modifying df:
In [22]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [24]: df['bar'] = 100
In [25]: df['bar'].iloc[0] = 99
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
In [26]: df
Out[26]:
foo bar
0 A 99 <-- assignment succeeded
2 B 100
1 C 100
df.iloc[0]['Btime'] = x does not work:
In contrast, assignment with df.iloc[0]['bar'] = 123 does not work because df.iloc[0] is returning a copy:
In [66]: df.iloc[0]['bar'] = 123
/home/unutbu/data/binky/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [67]: df
Out[67]:
foo bar
0 A 99 <-- assignment failed
2 B 100
1 C 100
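A sketch reproducing this contrast on the same small frame. The exact warning differs across pandas versions (SettingWithCopyWarning in older releases, a chained-assignment warning under Copy-on-Write), so it is silenced here; in either case df is left unchanged by the chained form, while the single indexing call does update it.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'foo': list('ABC')}, index=[0, 2, 1])
df['bar'] = 100

with warnings.catch_warnings():
    # Silence the version-dependent warning for this demo only
    warnings.simplefilter('ignore')
    df.iloc[0]['bar'] = 123   # writes into a temporary copy of row 0

after_chained = df['bar'].tolist()
print(after_chained)  # [100, 100, 100]: df is unchanged

# A single indexing call targets df itself
df.iloc[0, df.columns.get_loc('bar')] = 123
print(df['bar'].tolist())  # [123, 100, 100]
```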
Warning: I had previously suggested df_test.ix[i, 'Btime']. But this is not guaranteed to give you the ith value, since ix tries to index by label before trying to index by position. So if the DataFrame has an integer index which is not in sorted order starting at 0, then using ix[i] will return the row labeled i rather than the ith row. (ix was later deprecated and was removed in pandas 1.0, so the session below only runs on older versions.) For example,
In [1]: df = pd.DataFrame({'foo':list('ABC')}, index=[0,2,1])
In [2]: df
Out[2]:
foo
0 A
2 B
1 C
In [4]: df.ix[1, 'foo']
Out[4]: 'C'
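Since ix no longer exists in current pandas, here is a sketch of the same label-versus-position distinction using loc and iloc on the frame from the session above:

```python
import pandas as pd

df = pd.DataFrame({'foo': list('ABC')}, index=[0, 2, 1])

# loc is purely label-based: the row labelled 1 is the third row
by_label = df.loc[1, 'foo']

# iloc is purely position-based: position 1 is the second row
by_position = df.iloc[1, df.columns.get_loc('foo')]

print(by_label, by_position)  # C B
```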