How to Use `assign()` or `get()` on a Specific Named Column of a Dataframe

How to use `assign()` or `get()` on a specific named column of a dataframe?

Let's assume that we have a valid data.frame with 50 rows in each column:

dat2 <- data.frame(c1 = 1:50, VAR1 = 51:100)

1. Don't use `assign` and `get` if you can avoid them.

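"dat2[,"VAR1"]" is not a valid object name in R: `assign` and `get` work on variable names, so they treat that whole string as the name of a variable rather than as an indexing expression on a column.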

The help page for `assign` also notes this:

assign does not dispatch assignment methods, so it cannot be used to
set elements of vectors, names, attributes, etc.

Note that assignment to an attached list or data frame changes the
attached copy and not the original object: see attach and with.

A column of a data.frame is an element of a list, so what you are looking for is `[[<-`.

# extract the values of column `VAR1` (a named element of the list) into j
j <- dat2[['VAR1']]

If you want to assign new values to VAR1 within dat2:

dat2[['VAR1']] <- 1:50

The answer to your question: to manipulate the column entirely through character strings, using get and assign:

assign('dat2', `[[<-`(get('dat2'), 'VAR1', value = 2:51))

Other approaches

data.table::set

If you want to assign by reference within a data.frame or data.table (replacing an existing column only), `set` from the data.table package works, even with data.frames:

library(data.table)
set(dat2, j = 'VAR1', value = 5:54)

eval and bquote

dat1 <- data.frame(x=1:5)
dat2 <- data.frame(x=2:6)

for (x in sapply(c('dat1', 'dat2'), as.name)) {
  eval(bquote(.(x)[['VAR1']] <- 2:6))
}

eapply

Or, if you use a separate environment:

ee <- new.env()
ee$dat1 <- dat1
ee$dat2 <- dat2

# eapply returns a list, so use list2env to assign back to ee
list2env(eapply(ee, `[[<-`, 'y', value = 1:5), envir = ee)

Changing a specific column name in pandas DataFrame

A one-liner does exist:

In [27]: df=df.rename(columns = {'two':'new_name'})

In [28]: df
Out[28]:
one three new_name
0 1 a 9
1 2 b 8
2 3 c 7
3 4 d 6
4 5 e 5

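The following is the docstring for the rename method, from an older pandas version (current versions accept additional parameters such as `mapper`, `axis` and `errors`).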


Definition: df.rename(self, index=None, columns=None, copy=True, inplace=False)
Docstring:
Alter index and / or columns using input function or
functions. Function / dict values must be unique (1-to-1). Labels not
contained in a dict / Series will be left as-is.

Parameters
----------
index : dict-like or function, optional
Transformation to apply to index values
columns : dict-like or function, optional
Transformation to apply to column values
copy : boolean, default True
Also copy underlying data
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is
ignored.

See also
--------
Series.rename

Returns
-------
renamed : DataFrame (new object)

Convert Select Columns in Pandas Dataframe to Numpy Array

The columns parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:

>>> [df[1:]]
[ viz a1_count a1_mean a1_std
1 n 0 NaN NaN
2 n 2 51 50]
>>> df.as_matrix(columns=[df[1:]])
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])

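Instead, pass the column names you want (note that `as_matrix` was deprecated in pandas 0.23 and removed in 1.0; on current versions select the columns and call `.to_numpy()`):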

>>> df.columns[1:]
Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
>>> df.as_matrix(columns=df.columns[1:])
array([[ 3. , 2. , 0.816497],
[ 0. , nan, nan],
[ 2. , 51. , 50. ]])
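On pandas versions without `as_matrix`, the same selection can be sketched as below. The frame is hypothetical: its numbers loosely follow the output above, and the `viz` values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column names from the question
df = pd.DataFrame({
    'viz': ['n', 'n', 'n'],
    'a1_count': [3, 0, 2],
    'a1_mean': [2.0, np.nan, 51.0],
    'a1_std': [0.816497, np.nan, 50.0],
})

# Select columns by label (an Index of names), then convert to a NumPy array
arr = df[df.columns[1:]].to_numpy()

# An explicit list of names works the same way
pair = df[['a1_count', 'a1_mean']].to_numpy()
```

Mixing the integer and float columns gives a float64 array, just as in the `as_matrix` output.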

Set value for particular cell in pandas DataFrame using index

RukTech's answer, df.set_value('C', 'x', 10), is far and away faster than the options I've suggested below. However, set_value was deprecated in pandas 0.21 and removed in pandas 1.0.

Going forward, the recommended method is .iat/.at.


Why df.xs('C')['x']=10 does not work:

df.xs('C') returns row C as a new Series, which is usually a copy of the data, so

df.xs('C')['x']=10

modifies only this new Series.

df['x'] returns a view of the df dataframe, so

df['x']['C'] = 10

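modifies df itself in older pandas versions. Under Copy-on-Write (the default from pandas 3.0), chained assignment like this no longer updates df.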

Warning: It is sometimes difficult to predict if an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with "chained indexing".


So the recommended alternative is

df.at['C', 'x'] = 10

which does modify df.


These timings were measured on an older pandas version that still provided set_value:

In [18]: %timeit df.set_value('C', 'x', 10)
100000 loops, best of 3: 2.9 µs per loop

In [20]: %timeit df['x']['C'] = 10
100000 loops, best of 3: 6.31 µs per loop

In [81]: %timeit df.at['C', 'x'] = 10
100000 loops, best of 3: 9.2 µs per loop
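A small sketch of the `.at` (label-based) and `.iat` (position-based) scalar setters on a hypothetical frame with row labels A to D; `Index.get_loc` converts a label to a position when you need to mix the two styles:

```python
import pandas as pd

# Hypothetical frame with string row labels and columns 'x' and 'y'
df = pd.DataFrame({'x': [0, 0, 0, 0], 'y': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])

# Label-based: row 'C', column 'x'
df.at['C', 'x'] = 10

# Position-based: fourth row ('D'), second column ('y')
df.iat[3, 1] = 99

# Translate labels into positions, e.g. to read back with .iat
row_pos = df.index.get_loc('C')
col_pos = df.columns.get_loc('x')
```

Both setters write into df directly, so there is no copy-versus-view ambiguity as with chained indexing.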

Check which columns in DataFrame are Categorical

You could use df._get_numeric_data() (a private method, so it may change without notice) to get the numeric columns, and then treat the remaining columns as categorical:

In [66]: cols = df.columns

In [67]: num_cols = df._get_numeric_data().columns

In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')

In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
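A sketch of the same idea with the public `select_dtypes` method instead of the private helper, on a made-up frame. It also shows that "non-numeric" and "pandas `category` dtype" are different selections:

```python
import pandas as pd

# Made-up frame: two numeric columns, one string column, one category column
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [0.5, 1.5, 2.5],
    'c': ['x', 'y', 'z'],
    'd': pd.Categorical(['lo', 'hi', 'lo']),
})

# Numeric columns via the public API
num_cols = df.select_dtypes(include='number').columns

# Everything else, keeping the original column order (unlike the set difference)
non_num_cols = [c for c in df.columns if c not in num_cols]

# Only the columns whose dtype is pandas' 'category'
cat_cols = df.select_dtypes(include='category').columns
```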

Index must be called with a collection of some kind: assign column name to dataframe

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

columns : Index or array-like

Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

Example:

df3 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

A single string such as 't' is not a collection, so wrap the column name in a list:

pd.DataFrame(reweightTarget, columns=['t'])
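A sketch reproducing the error and the fix. `reweight_target` is a hypothetical stand-in for the question's `reweightTarget`, whose contents are not shown:

```python
import pandas as pd

# Hypothetical stand-in for reweightTarget
reweight_target = [0.5, 1.5, 2.5]

# A bare string is a scalar, so pandas cannot build the column Index from it
try:
    pd.DataFrame(reweight_target, columns='t')
    raised = False
except TypeError:
    raised = True

# A one-element list is a valid collection of column labels
fixed = pd.DataFrame(reweight_target, columns=['t'])
```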

Renaming column names in Pandas

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
$a $b
0 1 10
1 2 20

>>> df.columns = ['a', 'b']
>>> df
a b
0 1 10
1 2 20
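When there are many columns, the new labels can be derived from the old ones instead of typed out. A sketch, assuming the goal is only to strip the leading `$`; note the assigned list must have one label per column:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})

# Derive new labels from the existing ones
df.columns = [c.lstrip('$') for c in df.columns]

# A list of the wrong length is rejected and the labels stay unchanged
try:
    df.columns = ['a']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```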

Pandas - add value at specific iloc into new dataframe column

There are two steps to create and populate a new column using only a row number (this approach does not use iloc).

First, get the row index value by using the row number

rowIndex = df.index[someRowNumber]

Then, use row index with the loc function to reference the specific row and add the new column / value

df.loc[rowIndex, 'New Column Title'] = "some value"

These two steps can be combined into one line as follows:

df.loc[df.index[someRowNumber], 'New Column Title'] = "some value"
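A runnable sketch of the one-liner on a hypothetical frame with a string index; `some_row_number` is a made-up position. `.loc` is used because it can add the new column, while `.iloc` cannot enlarge the frame:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
df = pd.DataFrame({'val': [10, 20, 30]}, index=['r0', 'r1', 'r2'])
some_row_number = 1  # hypothetical row position

# Position -> index label, then label-based assignment that creates the column
df.loc[df.index[some_row_number], 'New Column Title'] = "some value"
```

The rows that were not assigned are left missing (NaN) in the new column.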

