Convert data frame common rows to columns
a <- c(rep(1:10, 3))
b <- c(rep("aa", 10), rep("bb", 10), rep("cc", 10))
set.seed(123)
c <- sample(seq(from = 20, to = 50, by = 5), size = 30, replace = TRUE)
d <- data.frame(a, b, c)
# Reshape from long to wide: one row per value of 'a', one 'c' column per level of 'b'
e <- reshape(d, idvar = "a", timevar = "b", direction = "wide")
e
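As a minimal sketch of what the reshape call produces, the smaller example below (hypothetical values, not the sampled data above) shows how base R names the wide columns: the value column name, a dot, then the level of the timevar column. The final renaming step is optional and only an illustration.

```r
# Small hypothetical long-format data: 3 ids, 2 groups
d <- data.frame(
  a = rep(1:3, times = 2),
  b = rep(c("aa", "bb"), each = 3),
  c = c(20, 25, 30, 35, 40, 45)
)

# One row per 'a'; the 'c' values are spread into c.aa and c.bb
e <- reshape(d, idvar = "a", timevar = "b", direction = "wide")

# Optionally drop the "c." prefix so the columns are named after the groups
names(e) <- sub("^c\\.", "", names(e))
e
```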
Convert rows into columns according to the date that they have in common in R
Here is an option in tidyverse, where we create a grouping column based on the presence of the 'Station Name:' string in 'Column1', create a new column by extracting the first value of 'Column2' ('A', 'B', 'C'), then remove the first two rows of each group as they are headers (slice), rename the column, and reshape to 'wide' format with pivot_wider. If needed, arrange the rows based on the 'Date' in ascending order.
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
df %>%
  group_by(grp = cumsum(str_detect(Column1, 'Station Name:'))) %>%
  mutate(nm1 = first(Column2)) %>%
  slice(-(1:2)) %>%
  ungroup() %>%
  rename(Date = Column1) %>%
  type.convert(as.is = TRUE) %>%
  select(-grp) %>%
  pivot_wider(names_from = nm1, values_from = Column2) %>%
  arrange(dmy(Date))
Output:
# A tibble: 7 x 4
# Date A B C
# <chr> <dbl> <dbl> <dbl>
#1 01/01/1999 NA NA 12.5
#2 02/01/1999 NA NA 8.39
#3 01/01/2000 2.9 1.19 NA
#4 02/01/2000 2.42 1.16 NA
#5 01/10/2009 NA NA 6.48
#6 07/03/2010 2.06 1.13 NA
#7 31/12/2020 1.92 1.08 9.87
Or in base R with split/Reduce/merge:
out <- type.convert(
  Reduce(function(...) merge(..., by = 'Date', all = TRUE),
         lapply(split(df, cumsum(grepl('Station Name:', df$Column1))),
                function(x) setNames(x, c("Date", x$Column2[1]))[-(1:2), ])),
  as.is = TRUE)
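The input df is not shown in the answer, so here is a minimal sketch of the base R approach on a hypothetical input. It assumes each station block starts with a 'Station Name:' row (station name in 'Column2'), followed by one header row and then the date/value rows. The dates and values are made up for illustration.

```r
# Hypothetical input: two stations stacked vertically, two header rows each
df <- data.frame(
  Column1 = c("Station Name:", "Date", "01/01/2000", "02/01/2000",
              "Station Name:", "Date", "01/01/2000", "03/01/2000"),
  Column2 = c("A", "Value", "2.9", "2.42",
              "B", "Value", "1.19", "1.16"),
  stringsAsFactors = FALSE
)

# A new block starts at every 'Station Name:' row
grp <- cumsum(grepl("Station Name:", df$Column1))

blocks <- lapply(split(df, grp), function(x) {
  # Name the value column after the station, then drop the two header rows
  setNames(x, c("Date", x$Column2[1]))[-(1:2), ]
})

# Full outer join of all blocks on Date, then convert the strings to numbers
out <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), blocks)
out <- type.convert(out, as.is = TRUE)
out
```

Dates that appear for only one station get NA in the other station's column, which is what all = TRUE is for.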
How to convert pandas data frame rows into columns
What you want is called a pivot:
df.pivot(index='ChannelPartnerID', columns='Brand', values='Sales').fillna(0).add_suffix('_Sales')
output:
Brand B1_Sales B2_Sales B3_Sales B4_Sales B5_Sales
ChannelPartnerID
10000 29630 38573 1530 21793 7155
10001 26477 42158 0 0 14612
10002 6649 0 0 6468 0
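NB. df.pivot(index='ChannelPartnerID', columns='Brand', values='Sales') is the explicit form. On pandas versions before 2.0 it could be shortened to df.pivot(*df), because unpacking the DataFrame yields its column names in order; since pandas 2.0 the arguments of pivot are keyword-only, so the shortcut raises a TypeError.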
Converting rows to columns for a dataframe in R
Try
library(reshape2)
df
Date Time Object_Name Object_Value
1 7/28/2017 8:00 A1 58.56
2 7/28/2017 8:00 A2 51.66
3 7/28/2017 8:30 A1 60.20
4 7/28/2017 8:30 A2 65.20
dcast(df, Date + Time ~ Object_Name, value.var = "Object_Value")
Date Time A1 A2
1 7/28/2017 8:00 58.56 51.66
2 7/28/2017 8:30 60.20 65.20
Alternatively,
library(tidyr)
spread(df, Object_Name, Object_Value)
Date Time A1 A2
1 7/28/2017 8:00 58.56 51.66
2 7/28/2017 8:30 60.20 65.20
To address the comment: the above works well if each combination of Date, Time and Object_Name occurs only once. Consider for instance the following:
df
Date Time Object_Name Object_Value
1 7/28/2017 8:00 A1 58.56
2 7/28/2017 8:00 A1 50.00
3 7/28/2017 8:00 A2 51.66
4 7/28/2017 8:30 A1 60.20
5 7/28/2017 8:30 A2 65.20
Look at the first two rows, and you can see that for the same Date, Time and Object_Name we have two values. dcast then does not know which value to use, falls back to counting the rows in each cell, and prints the following message: Aggregation function missing: defaulting to length. We can handle this by specifying the aggregation function. For instance, let's take the mean of these values:
dcast(df, Date + Time ~ Object_Name, value.var = "Object_Value", fun.aggregate = mean)
Date Time A1 A2
1 7/28/2017 8:00 54.28 51.66
2 7/28/2017 8:30 60.20 65.20
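To check for duplicate cases before casting, and to cross-check the aggregated values, something like the following base R sketch can be used. It swaps in anyDuplicated and tapply instead of reshape2; the names key_cols, has_duplicates and cell_means are only illustrative.

```r
# The data with a duplicated (Date, Time, Object_Name) combination
df <- data.frame(
  Date = rep("7/28/2017", 5),
  Time = c("8:00", "8:00", "8:00", "8:30", "8:30"),
  Object_Name = c("A1", "A1", "A2", "A1", "A2"),
  Object_Value = c(58.56, 50.00, 51.66, 60.20, 65.20),
  stringsAsFactors = FALSE
)

# TRUE if any key combination occurs more than once
key_cols <- c("Date", "Time", "Object_Name")
has_duplicates <- anyDuplicated(df[key_cols]) > 0

# Mean per (Date + Time, Object_Name) cell, returned as a matrix
cell_means <- tapply(
  df$Object_Value,
  list(paste(df$Date, df$Time), df$Object_Name),
  mean
)
cell_means
```

If has_duplicates is FALSE, no aggregation function is needed and each cell holds a single original value.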
R - Convert and transpose data to columns by group
We can use tidyr::spread
library(tidyverse)
df %>% group_by(a) %>% mutate(n = 1:n()) %>% spread(a, b) %>% select(-n)
## A tibble: 5 x 3
# Group1 Group2 Group3
# <fct> <fct> <fct>
#1 Item1 Item4 Item9
#2 Item2 Item5 NA
#3 Item3 Item6 NA
#4 NA Item7 NA
#5 NA Item8 NA
Or if you prefer "--" instead of NA you can do (thanks @AntoniosK)
df %>%
group_by(a) %>%
mutate(n = 1:n()) %>%
spread(a, b) %>%
select(-n) %>%
mutate_all(~ifelse(is.na(.), "--", as.character(.)))
## A tibble: 5 x 3
# Group1 Group2 Group3
# <chr> <chr> <chr>
#1 Item1 Item4 Item9
#2 Item2 Item5 --
#3 Item3 Item6 --
#4 -- Item7 --
#5 -- Item8 --
Or using tidyr::spread's fill argument:
df %>%
mutate_if(is.factor, as.character) %>%
group_by(a) %>%
mutate(n = 1:n()) %>%
spread(a, b, fill = "--") %>%
select(-n)
giving the same result.
Sample data
a <- c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group3")
b <- c("Item1", "Item2", "Item3", "Item4", "Item5", "Item6", "Item7", "Item8", "Item9")
df <- data.frame(a = a, b = b)
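For comparison, a rough base R sketch of the same idea on the sample data, without tidyr: split the items by group, pad every group to the length of the longest one, and bind the padded vectors as columns. The names items, n and wide are only illustrative.

```r
a <- c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group3")
b <- c("Item1", "Item2", "Item3", "Item4", "Item5", "Item6", "Item7", "Item8", "Item9")
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)

# One character vector of items per group
items <- split(df$b, df$a)

# Pad each vector with NA up to the size of the largest group
n <- max(lengths(items))
wide <- as.data.frame(
  lapply(items, function(x) { length(x) <- n; x }),
  stringsAsFactors = FALSE
)

# Replace the padding with "--" if preferred
wide[is.na(wide)] <- "--"
wide
```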
How to find common rows between two data frames?
You can use the following code:
c <- data.frame(A = c(4, 6, 7), B = c(5, 9, 8), C = c("T", "T", "F"))
d <- data.frame(A = c(6, 7, 3), B = c(9, 8, 3), C = c("T", "F", "F"))
merge(c, d, by = c("A", "B", "C"))
Output:
A B C
1 6 9 T
2 7 8 F
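When the two data frames have exactly the same columns, the by argument can be left out, because merge joins on all shared column names by default. A minimal sketch, using x and y as illustrative names to avoid masking the base function c:

```r
x <- data.frame(A = c(4, 6, 7), B = c(5, 9, 8), C = c("T", "T", "F"))
y <- data.frame(A = c(6, 7, 3), B = c(9, 8, 3), C = c("T", "F", "F"))

# Default 'by' is intersect(names(x), names(y)), here A, B and C
common <- merge(x, y)
common
```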