How to Extract a Value (I Want an Int, Not a Row) from a DataFrame and Do Simple Calculations on It

How to extract a single (column/row) value from a dataframe using PySpark?

Here is one straightforward way to do it:

df.first()['column name']

This gives you the column's value itself (not a Row), and you can store it in a variable.
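As a minimal sketch (the SparkSession setup and the id/price columns below are purely illustrative, not part of the original question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# illustrative two-column DataFrame
df = spark.createDataFrame([(1, 10.0), (2, 12.5)], ["id", "price"])

# first() returns a Row; indexing it by column name gives a plain Python value
first_price = df.first()["price"]   # 10.0 (a float, not a Row)

# the extracted value works in ordinary arithmetic
doubled = first_price * 2           # 20.0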

Return the int value for a specific column in a pandas data frame

In general, a condition of the form

DF.someCondition == condition

may be True for more than one row. That is why

DF[DF.someCondition == condition].A

returns a Series (of shape (1,) when there is a single match) rather than a scalar value.
If you are certain that the condition is True for exactly one row, you can extract the scalar value using .item():

DF[DF.someCondition == condition].A.item()

However, as MaxU suggested, it is better to use .loc to avoid chained indexing:

DF.loc[DF.someCondition == condition, 'A'].item()

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3,2), columns=list('AB'))
df[df['B'] == 3].A
# 1    2
# Name: A, dtype: int64

df.loc[df['B'] == 3, 'A'].item()
# 2
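To tie this back to the question title: the extracted scalar behaves like an ordinary Python int, and .item() fails if the condition matches more than one row. A short sketch with the same df (the A >= 2 condition is just an illustrative multi-match case):

val = df.loc[df['B'] == 3, 'A'].item()
val + 10
# 12

df.loc[df['A'] >= 2, 'B']
# 1    3
# 2    5
# Name: B, dtype: int64

# calling .item() on that two-element Series raises a ValueError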

How to calculate and extract prices from timestamps in R

Up front, it may be a better path to convert your Date and Time fields into a single POSIXt-class object. This is useful if you ever need Date+Time to behave like a numeric field (e.g., plotting something over time). It's not required, but in my experience I almost always end up treating time numerically (and the date usually needs to be in there too).
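For reference, a minimal sketch of that conversion, assuming the Date/Time columns from the Data block below (the dt column name is just illustrative):

# combine Date (yyyymmdd integer) and Time (hh:mm:ss string) into one POSIXct value
dat$dt <- as.POSIXct(paste(dat$Date, dat$Time), format = "%Y%m%d %H:%M:%S")
# dt can then be sorted, plotted, or floored to the minute like any numeric-like value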

If you don't want/need to change to a POSIXt or time class, you can do the following. (I added a couple of data rows in order to show multiple summary rows.)

base R
dat$min <- substr(dat$Time, 1, 5)
aggregate(dat$Price, dat[, c("Date", "min")],
          function(Price) c(Start = Price[1], End = Price[length(Price)],
                            Low = min(Price), High = max(Price)))
#       Date   min x.Start x.End x.Low x.High
# 1 19990104 14:11     220   215   200    221
# 2 19990104 14:12     229   209   209    229
dplyr
library(dplyr)
dat %>%
  arrange(Date, Time) %>%
  group_by(Date, min = substr(Time, 1, 5)) %>%
  summarize(Time = min(Time), Start = first(Price), End = last(Price),
            Low = min(Price), High = max(Price)) %>%
  ungroup() %>%
  select(-min)
# # A tibble: 2 x 6
#       Date Time     Start   End   Low  High
#      <int> <chr>    <int> <int> <int> <int>
# 1 19990104 14:11:14   220   215   200   221
# 2 19990104 14:12:14   229   209   209   229

Data

dat <- structure(list(
  Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L),
  Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"),
  Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)
), class = "data.frame", row.names = c(NA, -7L))

Apply Math calculation to all rows of DF by Column Values

I would just gather all the Occ_* and Totl_* columns together and perform the required arithmetic:

occ_cols <- grep("^Occ", names(df))
tot_cols <- grep("^Totl", names(df))

df[paste0("Probability_", seq_along(occ_cols))] <-
  (df[occ_cols] + 1) / (df[tot_cols] + df$Unique_words)

df
#       word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words Probability_1
# 1      car     0     1     0     11      9      7           17    0.03571429
# 2   saturn     2     0     2     11      9      7           17    0.10714286
# 3 survival     1     2     0     11      9      7           17    0.07142857
# 4 baseball     1     1     0     11      9      7           17    0.07142857
# 5    color     0     0     1     11      9      7           17    0.03571429
# 6   muscle     0     1     0     11      9      7           17    0.03571429

#   Probability_2 Probability_3
# 1    0.07692308    0.04166667
# 2    0.03846154    0.12500000
# 3    0.11538462    0.04166667
# 4    0.07692308    0.04166667
# 5    0.03846154    0.08333333
# 6    0.07692308    0.04166667

However, make sure all your Occ_* and Totl_* columns are in the same order, so that the i-th Occ column pairs with the i-th Totl column. For this example, we have Occ_1, Occ_2, Occ_3 followed by Totl_1, Totl_2 and Totl_3.
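If you want to guard against a mismatch, one small sanity check (relying only on the Occ_/Totl_ suffix convention used in this example) is to compare the suffixes before doing the division:

# stop early if the Occ_* and Totl_* suffixes don't pair up one-to-one
stopifnot(identical(sub("^Occ_",  "", names(df)[occ_cols]),
                    sub("^Totl_", "", names(df)[tot_cols])))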


