How to Extract a Value (I Want an Int, Not a Row) from a DataFrame and Do Simple Calculations on It

How to extract a single (column/row) value from a dataframe using PySpark?

Here is one straightforward way to do it:

df.first()['column name']

This gives you the column's value itself (not a Row), and you can store it in a variable.
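As a minimal sketch (the SparkSession setup and the id/price columns below are purely illustrative, not part of the original question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# illustrative two-column DataFrame
df = spark.createDataFrame([(1, 10.0), (2, 12.5)], ["id", "price"])

# first() returns a Row; indexing it by column name gives a plain Python value
first_price = df.first()["price"]   # 10.0 (a float, not a Row)

# the extracted value works in ordinary arithmetic
doubled = first_price * 2           # 20.0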

Return the int value for a specific column in a pandas data frame

In general, a condition of the form

DF.someCondition == condition

may be True for more than one row. That is why

DF[DF.someCondition == condition].A

returns a Series (of shape (1,) when there is a single match) rather than a scalar value.
If you are certain that the condition is True for exactly one row, you can extract the scalar value using .item():

DF[DF.someCondition == condition].A.item()

However, as MaxU suggested, it is better to use .loc to avoid chained indexing:

DF.loc[DF.someCondition == condition, 'A'].item()

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3,2), columns=list('AB'))
df[df['B'] == 3].A
# 1    2
# Name: A, dtype: int64

df.loc[df['B'] == 3, 'A'].item()
# 2
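To tie this back to the question title: the extracted scalar behaves like an ordinary Python int, and .item() fails if the condition matches more than one row. A short sketch with the same df (the A >= 2 condition is just an illustrative multi-match case):

val = df.loc[df['B'] == 3, 'A'].item()
val + 10
# 12

df.loc[df['A'] >= 2, 'B']
# 1    3
# 2    5
# Name: B, dtype: int64

# calling .item() on that two-element Series raises a ValueError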

How to calculate and extract prices from timestamps in R

Up front, it may be a better path to convert your Date and Time fields into a single POSIXt-class object. This is useful if you ever need Date+Time to behave like a numeric field (e.g., plotting something over time). It's not required, but in my experience I almost always end up treating time numerically (and the date usually needs to be in there too).
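For reference, a minimal sketch of that conversion, assuming the Date/Time columns from the Data block below (the dt column name is just illustrative):

# combine Date (yyyymmdd integer) and Time (hh:mm:ss string) into one POSIXct value
dat$dt <- as.POSIXct(paste(dat$Date, dat$Time), format = "%Y%m%d %H:%M:%S")
# dt can then be sorted, plotted, or floored to the minute like any numeric-like value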

If you don't want/need to change to a POSIXt or time class, you can do the following. (I added a couple of data rows in order to show multiple summary rows.)

base R
dat$min <- substr(dat$Time, 1, 5)
aggregate(dat$Price, dat[, c("Date", "min")],
          function(Price) c(Start = Price[1], End = Price[length(Price)],
                            Low = min(Price), High = max(Price)))
#       Date   min x.Start x.End x.Low x.High
# 1 19990104 14:11     220   215   200    221
# 2 19990104 14:12     229   209   209    229
dplyr
library(dplyr)
dat %>%
  arrange(Date, Time) %>%
  group_by(Date, min = substr(Time, 1, 5)) %>%
  summarize(Time = min(Time), Start = first(Price), End = last(Price),
            Low = min(Price), High = max(Price)) %>%
  ungroup() %>%
  select(-min)
# # A tibble: 2 x 6
#       Date Time     Start   End   Low  High
#      <int> <chr>    <int> <int> <int> <int>
# 1 19990104 14:11:14   220   215   200   221
# 2 19990104 14:12:14   229   209   209   229

Data

dat <- structure(list(
  Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L),
  Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"),
  Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)
), class = "data.frame", row.names = c(NA, -7L))

Apply Math calculation to all rows of DF by Column Values

I would just gather all the Occ_* and Totl_* columns together and perform the required arithmetic:

occ_cols <- grep("^Occ", names(df))
tot_cols <- grep("^Totl", names(df))

df[paste0("Probability_", seq_along(occ_cols))] <-
  (df[occ_cols] + 1) / (df[tot_cols] + df$Unique_words)

df
#       word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words Probability_1
# 1      car     0     1     0     11      9      7           17    0.03571429
# 2   saturn     2     0     2     11      9      7           17    0.10714286
# 3 survival     1     2     0     11      9      7           17    0.07142857
# 4 baseball     1     1     0     11      9      7           17    0.07142857
# 5    color     0     0     1     11      9      7           17    0.03571429
# 6   muscle     0     1     0     11      9      7           17    0.03571429

#   Probability_2 Probability_3
# 1    0.07692308    0.04166667
# 2    0.03846154    0.12500000
# 3    0.11538462    0.04166667
# 4    0.07692308    0.04166667
# 5    0.03846154    0.08333333
# 6    0.07692308    0.04166667

However, make sure all your Occ_* and Totl_* columns are in the same order, so that the i-th Occ column pairs with the i-th Totl column. For this example, we have Occ_1, Occ_2, Occ_3 followed by Totl_1, Totl_2 and Totl_3.
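If you want to guard against a mismatch, one small sanity check (relying only on the Occ_/Totl_ suffix convention used in this example) is to compare the suffixes before doing the division:

# stop early if the Occ_* and Totl_* suffixes don't pair up one-to-one
stopifnot(identical(sub("^Occ_",  "", names(df)[occ_cols]),
                    sub("^Totl_", "", names(df)[tot_cols])))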


