How to extract a single (column/row) value from a dataframe using PySpark?
Here is an alternative:
df.first()['column name']
This returns the value of that column from the first row, and you can store it in a variable.
Return the int value for a specific column in a pandas data frame
In general, a condition of the form
DF.someCondition == condition
may be True for more than one row. That is why
DF[DF.someCondition == condition].A
returns a Series (here of shape (1,)) rather than a scalar value.
If you are certain that the condition is True exactly once, you can extract the scalar value using item():
DF[DF.someCondition == condition].A.item()
However, as MaxU suggested, it is better to use .loc to avoid chained indexing:
DF.loc[DF.someCondition == condition, 'A'].item()
For example,
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(6).reshape(3,2), columns=list('AB'))
df[df['B']==3].A
# 1 2
# Name: A, dtype: int64
df.loc[df['B']==3, 'A'].item()
# 2
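As a minimal sketch of what happens when the "exactly once" assumption does not hold: item() raises a ValueError if the selection contains zero rows or more than one row. The helper name scalar_or_none below is hypothetical and only illustrates one way to guard the empty case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=list("AB"))


def scalar_or_none(frame, mask, column):
    """Return the single matching value, or None when no row matches.

    Hypothetical helper: item() still raises ValueError if several rows match.
    """
    matches = frame.loc[mask, column]
    if matches.empty:
        return None
    return matches.item()


one = scalar_or_none(df, df["B"] == 3, "A")      # exactly one matching row
missing = scalar_or_none(df, df["B"] == 99, "A")  # no matching row

# Two rows satisfy B >= 3, so item() refuses to pick one of them.
try:
    df.loc[df["B"] >= 3, "A"].item()
    multi_error = None
except ValueError as exc:
    multi_error = exc

print(one, missing, type(multi_error).__name__)
```

Whether to return None, raise, or take the first match for the ambiguous cases is a design choice that depends on your data.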
How to calculate and extract prices from timestamps in R
Up front, it might be better to convert your Date and Time fields into a single POSIXt-class object. This is a good approach if you need Date+Time to be a numeric-like field at some point (e.g., plotting something over time). It's not required, but in my experience I almost always need to treat time numerically (and the date usually needs to be there too).
If you don't want or need to change to a POSIXt or Time class, you can do the below. (I added a couple of data rows in order to show multiple summary rows.)
dat$min <- substr(dat$Time, 1, 5)
aggregate(dat$Price, dat[,c("Date","min")], function(Price) c(Start=Price[1], End=Price[length(Price)], Low=min(Price), High=max(Price)))
# Date min x.Start x.End x.Low x.High
# 1 19990104 14:11 220 215 200 221
# 2 19990104 14:12 229 209 209 229
dplyr
library(dplyr)
dat %>%
arrange(Date, Time) %>%
group_by(Date, min = substr(Time, 1, 5)) %>%
summarize(Time = min(Time), Start = first(Price), End = last(Price), Low = min(Price), High = max(Price)) %>%
ungroup() %>%
select(-min)
# # A tibble: 2 x 6
# Date Time Start End Low High
# <int> <chr> <int> <int> <int> <int>
# 1 19990104 14:11:14 220 215 200 221
# 2 19990104 14:12:14 229 209 209 229
Data
dat <- structure(list(Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L), Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"), Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)), class = "data.frame", row.names = c(NA, -7L))
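For readers who need the same per-minute summary outside R, here is a rough Python sketch using only the standard library. It combines Date and Time into one timestamp (the analogue of the POSIXt suggestion above) and groups on the timestamp truncated to the minute. The function names to_timestamp and summarize_by_minute are made up for illustration; the rows are the same sample data as dat.

```python
from datetime import datetime
from itertools import groupby

# Same sample rows as the R `dat` object: (Date, Time, Price).
rows = [
    (19990104, "14:11:14", 220),
    (19990104, "14:11:21", 200),
    (19990104, "14:11:36", 221),
    (19990104, "14:11:45", 202),
    (19990104, "14:11:56", 215),
    (19990104, "14:12:14", 229),
    (19990104, "14:12:21", 209),
]


def to_timestamp(date, time):
    """Combine the integer date and the time string into one datetime."""
    return datetime.strptime(f"{date} {time}", "%Y%m%d %H:%M:%S")


def summarize_by_minute(rows):
    """Return one dict per minute with the start, end, low and high price."""
    stamped = [(to_timestamp(d, t), price) for d, t, price in rows]
    # Stable sort on the timestamp only, so ties keep their input order.
    stamped.sort(key=lambda r: r[0])
    summary = []
    # groupby needs sorted input; the key truncates each timestamp to the minute.
    for minute, group in groupby(stamped, key=lambda r: r[0].replace(second=0)):
        prices = [price for _, price in group]
        summary.append({
            "minute": minute,
            "Start": prices[0],
            "End": prices[-1],
            "Low": min(prices),
            "High": max(prices),
        })
    return summary


result = summarize_by_minute(rows)
for row in result:
    print(row["minute"].strftime("%Y-%m-%d %H:%M"),
          row["Start"], row["End"], row["Low"], row["High"])
```

The Start/End/Low/High values agree with the two summary rows shown in the R output above.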
Apply Math calculation to all rows of DF by Column Values
I would just gather all the Occ.. and Tot.. columns together and perform the required arithmetic:
occ_cols <- grep("^Occ", names(df))
tot_cols <- grep("^Totl", names(df))
df[paste0("Probability_", seq_along(occ_cols))] <-
(df[occ_cols] + 1)/(df[tot_cols] + df$Unique_words)
df
# word Occ_1 Occ_2 Occ_3 Totl_1 Totl_2 Totl_3 Unique_words Probability_1
#1 car 0 1 0 11 9 7 17 0.03571429
#2 saturn 2 0 2 11 9 7 17 0.10714286
#3 survival 1 2 0 11 9 7 17 0.07142857
#4 baseball 1 1 0 11 9 7 17 0.07142857
#5 color 0 0 1 11 9 7 17 0.03571429
#6 muscle 0 1 0 11 9 7 17 0.03571429
# Probability_2 Probability_3
#1 0.07692308 0.04166667
#2 0.03846154 0.12500000
#3 0.11538462 0.04166667
#4 0.07692308 0.04166667
#5 0.03846154 0.08333333
#6 0.07692308 0.04166667
However, make sure all your Occ.. and Tot.. columns are in the same order. For this example, we have Occ_1, Occ_2, Occ_3 followed by Totl_1, Totl_2 and Totl_3.
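To check the arithmetic by hand, here is a small Python sketch of the same element-wise formula, (Occ + 1) / (Totl + Unique_words), applied to two rows of the example table. The function name smoothed_probabilities is hypothetical; the length check mirrors the ordering caveat above, since the counts and totals are paired purely by position.

```python
# Two rows from the example table: word -> [Occ_1, Occ_2, Occ_3].
occ = {"car": [0, 1, 0], "saturn": [2, 0, 2]}
totl = [11, 9, 7]   # Totl_1, Totl_2, Totl_3 (identical for every row here)
unique_words = 17   # Unique_words


def smoothed_probabilities(counts, totals, vocab_size):
    """Pair counts with totals by position and apply (c + 1) / (t + vocab_size)."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length and order")
    return [(c + 1) / (t + vocab_size) for c, t in zip(counts, totals)]


probs = {word: smoothed_probabilities(c, totl, unique_words)
         for word, c in occ.items()}

for word, values in probs.items():
    print(word, [round(v, 8) for v in values])
```

For car this gives 1/28, 2/26 and 1/24, which agree with Probability_1 to Probability_3 in the first row of the output.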