How to Add New Calculated Variables to a Data Frame

Adding calculated column in Pandas

Try this:

df2['age_bmi'] = df.age * df.bmi

You're trying to call the dataframe as a function, when what you need is the values of its columns. You can access a column by key, like a dictionary, or as an attribute if its name is a valid Python identifier (no spaces) that doesn't clash with an existing DataFrame attribute or method.
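A minimal sketch of the two access styles, assuming a small made-up frame with age and bmi columns (the data here is illustrative, since the original question's frame is not shown):

```python
import pandas as pd

# Hypothetical data; the original question's frame is not shown.
df = pd.DataFrame({"age": [30, 45, 60], "bmi": [22.5, 27.0, 31.2]})

# Key access works for any column name.
df["age_bmi"] = df["age"] * df["bmi"]

# Attribute access works only when the name is a valid identifier that
# does not clash with an existing DataFrame attribute or method.
same = df.age * df.bmi

# Calling the frame like a function is the mistake described above.
try:
    df("age")
except TypeError as exc:
    error_message = str(exc)
```

Key access is the safer default, because it keeps working if a column is later renamed to something like "count" or "body mass".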

Someone linked this in a comment the other day and it's pretty awesome. I recommend giving it a watch, even if you don't do the exercises: https://www.youtube.com/watch?v=5JnMutdy6Fw

How to add new calculated variables to a data frame

In a one liner with setNames:

setNames(as.data.frame(cbind(dat, dat^2)), c(names(dat), paste0(names(dat),'_2')))

#  birds wolfs snakes birds_2 wolfs_2 snakes_2
#1     3     9      7       9      81       49
#2     3     8      4       9      64       16

Creating a new column to a data frame using a formula from another variable

If you want to evaluate an expression in the context of a data frame, you can use with and within.

aa$z <- with(aa, x + y - 2)

or

aa <- within(aa, z <- x + y - 2)

Or, if your expression is in the form of a text string (you should see if there are other ways to write your code; evaluating arbitrary text strings can lead to lots of problems):

aa$z <- eval(parse(text="x + y - 2"), aa)

How can I add a new computed column in a dataframe?

Data:

In [5]: df
Out[5]:
    YOB
0  1955
1  1965
2  1975
3  1985

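You don't need an extra TodaysDate column; you can compute the current year on the fly. (pd.datetime was removed in pandas 2.0, so pd.Timestamp.now() is used here. The ages shown were computed in 2017.)

In [6]: df['Age'] = pd.Timestamp.now().year - df.YOB

In [7]: df
Out[7]:
    YOB  Age
0  1955   62
1  1965   52
2  1975   42
3  1985   32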

Alternatively, you can use the DataFrame.eval() method:

In [16]: df
Out[16]:
    YOB
0  1955
1  1965
2  1975
3  1985

In [17]: df.eval("Age = @pd.Timestamp.now().year - YOB", inplace=True)

In [18]: df
Out[18]:
    YOB  Age
0  1955   62
1  1965   52
2  1975   42
3  1985   32
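The two approaches above can be sketched side by side as a script. This version takes the year from the standard-library datetime module, since the pd.datetime alias was removed in pandas 2.0; the names current_year and Age_eval are illustrative choices, not part of the original answer.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({"YOB": [1955, 1965, 1975, 1985]})

# Standard-library datetime avoids the removed pd.datetime alias.
current_year = datetime.now().year

# Plain column arithmetic.
df["Age"] = current_year - df["YOB"]

# The same calculation through DataFrame.eval; @current_year refers to
# the local Python variable of that name. Without inplace=True, eval
# returns a new frame that includes the assigned column.
df = df.eval("Age_eval = @current_year - YOB")
```

Both columns hold the same values; the eval form is mainly useful when the expression arrives as a string.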

Creating a new column in dataframe using row values as variables in a function in R

df %>%
  rowwise() %>%
  mutate(IMPLIEDVOLATILITY =
           RQuantLib::AmericanOptionImpliedVolatility(type = "call",
                                                      value = LAST,
                                                      underlying = CURRENTPRICE,
                                                      strike = STRIKEPRICE,
                                                      dividendYield = 0.00,
                                                      riskFreeRate = .03,
                                                      maturity = YEARSTOEXPIRATION,
                                                      volatility = 0.2))

    LAST CURRENTPRICE STRIKEPRICE YEARSTOEXPIRATI~ IMPLIEDVOLATILI~
   <dbl>        <dbl>       <dbl>            <dbl> <AmrcnOIV>
 1   3.4         464.         461          0.00274 0.12058321
 2  2.52         464.         462          0.00274 0.11218994
 3  1.82         464.         463          0.00274 0.11577334
 4  1.16         464.         464          0.00274 0.10918985
 5  0.69         464.         465          0.00274 0.10744424
 6  0.36         464.         466          0.00274 0.10472401
 7     4         464.         461           0.0110 0.09853768
 8  3.21         464.         462           0.0110 0.09443249
 9  2.54         464.         463           0.0110 0.09343687
10  1.93         464.         464           0.0110 0.09130677

With base R, you could do:

transform(df, IMPLIEDVOLATILITY =
            Vectorize(RQuantLib::AmericanOptionImpliedVolatility)(type = "call",
                                                                  value = LAST,
                                                                  underlying = CURRENTPRICE,
                                                                  strike = STRIKEPRICE,
                                                                  dividendYield = 0.00,
                                                                  riskFreeRate = .03,
                                                                  maturity = YEARSTOEXPIRATION,
                                                                  volatility = 0.2))

Python Pandas : How to add new calculated column to the next of specific column in the existing dataframe?

Use DataFrame.insert for this. It takes an integer position, so look up the position of column B with get_loc and add 1 to place the new column right after B.

#df = pd.DataFrame({'A': [10, 0, 200, 50, 500],
#                   'B': [50, 200, 70, 90, 800],
#                   'C': [100, 500, 60, 300, 70]})

df.insert(loc=df.columns.get_loc('B')+1,            # After column 'B'
          column='new',                             # named `'new'`
          value=(df['C'] - df['B'])/df['B']*100)

print(df)

     A    B         new    C
0   10   50  100.000000  100
1    0  200  150.000000  500
2  200   70  -14.285714   60
3   50   90  233.333333  300
4  500  800  -91.250000   70
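Note that DataFrame.insert modifies the frame in place and raises a ValueError if the column name already exists. If you would rather leave the original untouched, one option is a small wrapper that works on a copy. The helper name insert_after below is hypothetical, not a pandas function:

```python
import pandas as pd


def insert_after(df, existing, name, values):
    """Return a copy of df with column `name` placed right after `existing`.

    Hypothetical convenience wrapper around DataFrame.insert.
    """
    out = df.copy()
    out.insert(out.columns.get_loc(existing) + 1, name, values)
    return out


df = pd.DataFrame({"A": [10, 0, 200, 50, 500],
                   "B": [50, 200, 70, 90, 800],
                   "C": [100, 500, 60, 300, 70]})

# Percentage change from B to C, placed between B and C.
result = insert_after(df, "B", "new", (df["C"] - df["B"]) / df["B"] * 100)
```

Here df keeps its three original columns, and result has the column order A, B, new, C.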

Adding calculated column to dataframe causes error using lambda function

You can try to do it this way:

df['new column'] = df.apply(lambda x: some_function(x['c1'], var1, var2,... x['c2']), axis=1)

As mentioned in the comments, you cannot pass a whole pandas Series or DataFrame to a dictionary lookup; it has to be done element by element. Dictionary methods and most custom functions are not vectorized the way NumPy and pandas functions are.

With .apply() as above, each row's values are passed to the custom function some_function(), rather than the whole DataFrame or Series being passed as an argument.

In particular, you want to pass the values of df['c2'] to some_dict.get(), and a Python dict cannot operate on a whole pandas Series (that is, a column). Calling .apply() with axis=1 bridges this gap by handing the function one row at a time.

You can define some_function() like any ordinary function that accepts only scalar values (not vector objects such as a pandas DataFrame or Series). For example:

def some_function(c1_val, var1, var2,... c2_val):
    ...
    # Exact match if present, otherwise the value for the numerically closest key
    value = some_dict.get(c2_val) or some_dict[min(some_dict.keys(),
                                                   key=lambda key: abs(key - c2_val))]
    ....
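For a version you can run end to end, here is a minimal sketch of the same idea. The dictionary contents, the column values, the prefix argument and the nearest_value helper are all made up for illustration; the original question's some_function signature is elided, so this one is simplified to three arguments.

```python
import pandas as pd

# Hypothetical lookup table with numeric keys.
some_dict = {10: "low", 50: "mid", 100: "high"}


def nearest_value(key):
    """Return the value for key, or for the numerically closest key."""
    if key in some_dict:
        return some_dict[key]
    closest = min(some_dict, key=lambda k: abs(k - key))
    return some_dict[closest]


def some_function(c1_val, prefix, c2_val):
    # Works on scalars only; apply() feeds it one row at a time.
    return f"{prefix}{c1_val}:{nearest_value(c2_val)}"


df = pd.DataFrame({"c1": [1, 2, 3], "c2": [10, 42, 90]})
df["new column"] = df.apply(
    lambda row: some_function(row["c1"], "id", row["c2"]), axis=1
)
```

The membership test is used instead of some_dict.get(...) or ... so that a stored value that happens to be falsy (0, an empty string) is still returned on an exact match.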

Apply calculation over data.frame values,store in new data.frame - R

You can simply divide, since you already have the row sums. The code below should suffice:

cbind(l[1:2]/l[,3],l[3])
         var1         var2 sum
a 1.000000000 0.0000000000   1
b 0.903225806 0.0967741935  62
c 0.555555556 0.4444444444   9

If you did not have the sum column, you could do:

cbind(l[1:2]/(sm <- rowSums(l[1:2])), sum = sm)
         var1         var2 sum
a 1.000000000 0.0000000000   1
b 0.903225806 0.0967741935  62
c 0.555555556 0.4444444444   9

Lastly, if you are only interested in the proportions, you could use prop.table:

prop.table(as.matrix(l[1:2]),1)
         var1         var2
a 1.000000000 0.0000000000
b 0.903225806 0.0967741935
c 0.555555556 0.4444444444
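For readers working in pandas rather than R, a rough counterpart of the row-proportion calculation might look like the sketch below. The counts are reconstructed from the proportions and sums printed above (for example 56/62 and 6/62 for row b), and the variable names are illustrative.

```python
import pandas as pd

# Counts reconstructed from the proportions and sums printed above.
counts = pd.DataFrame({"var1": [1, 56, 5], "var2": [0, 6, 4]},
                      index=["a", "b", "c"])

# Row totals, then divide each row by its own total (axis=0 aligns the
# divisor with the row index).
row_sums = counts.sum(axis=1)
props = counts.div(row_sums, axis=0)

# Keep the totals alongside the proportions, as in the cbind() examples.
result = props.assign(sum=row_sums)
```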

Calculation of a new variable in R

We can group by 'team' and then do the calculation to create a new column:

library(dplyr)
df1 <- df1 %>%
  group_by(team) %>%
  mutate(new = (stat1/sum(stat1) + (stat2/sum(stat2)))) %>%
  ungroup()

Output:

df1
# A tibble: 4 × 5
  name  team  stat1 stat2   new
  <chr> <chr> <int> <int> <dbl>
1 a     aa        1     4 0.905
2 b     aa        2     3 1.10
3 c     bb        3     2 1.10
4 d     bb        4     1 0.905

Data:

df1 <- structure(list(name = c("a", "b", "c", "d"),
                      team = c("aa", "aa", "bb", "bb"),
                      stat1 = 1:4, stat2 = 4:1),
                 class = "data.frame", row.names = c(NA, -4L))
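A pandas counterpart of the grouped calculation, offered as a sketch rather than a translation of the original answer: groupby(...).transform("sum") returns the per-team totals aligned to the original rows, which plays the role of sum() inside a grouped mutate(). The variable name totals is an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "team": ["aa", "aa", "bb", "bb"],
    "stat1": [1, 2, 3, 4],
    "stat2": [4, 3, 2, 1],
})

# Per-team totals, repeated on every row of the team.
totals = df1.groupby("team")[["stat1", "stat2"]].transform("sum")

# Share of the team's stat1 plus share of the team's stat2.
df1["new"] = df1["stat1"] / totals["stat1"] + df1["stat2"] / totals["stat2"]
```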

