How to use `assign()` or `get()` on specific named column of a dataframe?
Let's assume that we have a valid data.frame with 50 rows in each column:
dat2 <- data.frame(c1 = 1:50, VAR1 = 51:100)
Don't use `assign` and `get` if you can avoid it.
`"dat2[,"VAR1"]"` is not valid in R.
You can also note this from the help page for `assign`:
assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc.
Note that assignment to an attached list or data frame changes the attached copy and not the original object: see attach and with.
A column of a data.frame is an element of a list.
What you are looking for is `[[<-`.
# extract the values of column `VAR1` (a named element of the list) into j
j <- dat2[['VAR1']]
If you want to assign new values to `VAR1` within `dat2`:
dat2[['VAR1']] <- 1:50
The answer to your question: to manipulate the column entirely through character strings using `get` and `assign`:
assign('dat2', `[[<-`(get('dat2'), 'VAR1', value = 2:51))
Other approaches
data.table::set
If you want to assign by reference within a data.frame or data.table (replacing an existing column only), then `set` from the data.table package works (even with data.frames):
library(data.table)
set(dat2, j = 'VAR1', value = 5:54)
eval and bquote
dat1 <- data.frame(x=1:5)
dat2 <- data.frame(x=2:6)
for (x in sapply(c('dat1', 'dat2'), as.name)) {
  eval(bquote(.(x)[['VAR1']] <- 2:6))
}
eapply
Or, if you use a separate environment:
ee <- new.env()
ee$dat1 <- dat1
ee$dat2 <- dat2
# eapply returns a list, so use list2env to assign back to ee
list2env(eapply(ee, `[[<-`, 'y', value =1:5), envir = ee)
Changing a specific column name in pandas DataFrame
A one liner does exist:
In [27]: df = df.rename(columns={'two': 'new_name'})
In [28]: df
Out[28]:
   one three  new_name
0    1     a         9
1    2     b         8
2    3     c         7
3    4     d         6
4    5     e         5
Following is the docstring for the `rename` method.
Definition: df.rename(self, index=None, columns=None, copy=True, inplace=False)
Docstring:
Alter index and / or columns using input function or
functions. Function / dict values must be unique (1-to-1). Labels not
contained in a dict / Series will be left as-is.

Parameters
----------
index : dict-like or function, optional
    Transformation to apply to index values
columns : dict-like or function, optional
    Transformation to apply to column values
copy : boolean, default True
    Also copy underlying data
inplace : boolean, default False
    Whether to return a new DataFrame. If True then value of copy is
    ignored.

See also
--------
Series.rename

Returns
-------
renamed : DataFrame (new object)
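As a minimal sketch of the behaviour the docstring describes, the snippet below renames one column through a dict and all columns through a function. The frame and its values are made up for illustration; only `rename` and its `columns` parameter come from the docstring above.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the call
df = pd.DataFrame({"one": [1, 2, 3], "two": [9, 8, 7], "three": ["a", "b", "c"]})

# dict-like mapping: labels not in the dict are left as-is
renamed = df.rename(columns={"two": "new_name"})

# function mapping: applied to every column label
upper = df.rename(columns=str.upper)

print(list(df.columns))       # the original frame is unchanged
print(list(renamed.columns))
print(list(upper.columns))
```

Because `rename` returns a new object by default, the result has to be assigned back (as in `df = df.rename(...)`) for the change to stick.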
Convert Select Columns in Pandas Dataframe to Numpy Array
The `columns` parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:
>>> [df[1:]]
[  viz  a1_count  a1_mean  a1_std
1   n         0      NaN     NaN
2   n         2       51      50]
>>> df.as_matrix(columns=[df[1:]])
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])
Instead, pass the column names you want:
>>> df.columns[1:]
Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
>>> df.as_matrix(columns=df.columns[1:])
array([[ 3. , 2. , 0.816497],
[ 0. , nan, nan],
[ 2. , 51. , 50. ]])
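Note that `as_matrix` was deprecated and is no longer available from pandas 1.0 onward. A rough equivalent on current versions, assuming a frame shaped like the one in the question (the values below are copied from the output above), is to select the columns first and then call `to_numpy()`:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the output shown above
df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2.0, np.nan, 51.0],
    "a1_std": [0.816497, np.nan, 50.0],
})

# Select by column names (everything after the first column), then convert
arr = df[df.columns[1:]].to_numpy()

print(arr.shape)
print(arr.dtype)
```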
Set value for particular cell in pandas DataFrame using index
RukTech's answer, `df.set_value('C', 'x', 10)`, is far and away faster than the options I've suggested below. However, `set_value` was deprecated and has been removed as of pandas 1.0.
Going forward, the recommended method is `.iat`/`.at`.
Why `df.xs('C')['x'] = 10` does not work:
`df.xs('C')`, by default, returns a new object with a copy of the data, so `df.xs('C')['x'] = 10` modifies this new object only.
`df['x']` returns a view of the `df` dataframe, so `df['x']['C'] = 10` modifies `df` itself.
Warning: It is sometimes difficult to predict if an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with "chained indexing".
So the recommended alternative is
df.at['C', 'x'] = 10
which does modify `df`.
In [18]: %timeit df.set_value('C', 'x', 10)
100000 loops, best of 3: 2.9 µs per loop
In [20]: %timeit df['x']['C'] = 10
100000 loops, best of 3: 6.31 µs per loop
In [81]: %timeit df.at['C', 'x'] = 10
100000 loops, best of 3: 9.2 µs per loop
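A small sketch of the two scalar accessors, using a made-up frame with row labels `A`, `B`, `C` (the labels and values are assumptions for illustration): `.at` addresses a cell by label, `.iat` by integer position.

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["A", "B", "C"])

# Label-based: row 'C', column 'x'
df.at["C", "x"] = 10

# Position-based: first row, second column (i.e. 'A', 'y')
df.iat[0, 1] = 40

print(df)
```

Both write directly into `df` in a single indexing step, which sidesteps the copy-versus-view question raised by chained indexing.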
Check which columns in DataFrame are Categorical
You could use `df._get_numeric_data()` to get the numeric columns and then work out the categorical columns from what remains:
In [66]: cols = df.columns
In [67]: num_cols = df._get_numeric_data().columns
In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')
In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
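Since `_get_numeric_data` is a private method, a variation built on the public `select_dtypes` may be preferable. The sketch below assumes a frame with three numeric and two non-numeric columns, similar in shape to the one above; the values are invented.

```python
import pandas as pd

# Hypothetical frame: columns '0'-'2' numeric, '3' strings, '4' categorical dtype
df = pd.DataFrame({
    "0": [1, 2],
    "1": [0.5, 1.5],
    "2": [10, 20],
    "3": ["a", "b"],
    "4": pd.Categorical(["u", "v"]),
})

num_cols = df.select_dtypes(include="number").columns

# Everything that is not numeric (Index.difference returns a sorted Index)
non_num_cols = df.columns.difference(num_cols)

# Only columns whose dtype is actually pandas' 'category'
cat_cols = df.select_dtypes(include="category").columns

print(list(non_num_cols))
print(list(cat_cols))
```

The distinction matters when "categorical" means the pandas `category` dtype specifically rather than "anything non-numeric".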
Index must be called with a collection of some kind: assign column name to dataframe
Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
columns : Index or array-like
Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided
Example:
df3 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
Try to use:
pd.DataFrame(reweightTarget, columns=['t'])
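To make the difference concrete, here is a sketch that passes a bare string and then a list for `columns`. The array `reweight_target` is a hypothetical stand-in for the `reweightTarget` data in the question, whose real contents are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data
reweight_target = np.array([0.1, 0.2, 0.3])

# A bare string is a scalar, not a collection, so building the column Index fails
try:
    pd.DataFrame(reweight_target, columns="t")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the name in a list gives a one-column frame
df = pd.DataFrame(reweight_target, columns=["t"])

print(error_message)
print(df)
```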
Renaming column names in Pandas
Just assign it to the `.columns` attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
   $a  $b
0   1  10
1   2  20
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
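When there are too many columns to type out, the new labels can be derived from the old ones instead. This is only a sketch of one way to do it, assuming the goal is to drop a leading `$` as in the example above:

```python
import pandas as pd

df = pd.DataFrame({"$a": [1, 2], "$b": [10, 20]})

# Build the new labels from the existing ones, then assign them in one go
df.columns = [name.lstrip("$") for name in df.columns]

print(list(df.columns))
```

The list assigned to `.columns` must have exactly as many entries as the frame has columns.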
Pandas - add value at specific iloc into new dataframe column
There are two steps to create and populate a new column using only a row number
(iloc is not used in this approach).
First, get the row index value by using the row number:
rowIndex = df.index[someRowNumber]
Then, use row index with the loc function to reference the specific row and add the new column / value
df.loc[rowIndex, 'New Column Title'] = "some value"
These two steps can be combined into one line as follows:
df.loc[df.index[someRowNumber], 'New Column Title'] = "some value"
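A runnable sketch of the combined form, with a made-up frame whose index labels differ from the row numbers so the translation from position to label is visible (the frame, labels and `some_row_number` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with non-integer index labels
df = pd.DataFrame({"value": [10, 20, 30]}, index=["r0", "r1", "r2"])

some_row_number = 1  # position of the row, not its label

# df.index[...] turns the position into a label that .loc understands;
# assigning to a column that does not exist yet creates it
df.loc[df.index[some_row_number], "New Column Title"] = "some value"

print(df)
```

Rows that were not assigned are left as missing values in the new column.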