tidyr spread function generates sparse matrix when compact vector expected
The key here is that spread doesn't aggregate the data. Hence, if you hadn't already used xtabs to aggregate first, you would be doing this:
library(dplyr)
library(tidyr)
a <- data.frame(P = c(F, T, F, T, F), A = c(F, F, T, T, T), Freq = 1) %>%
  unite(S, A, P)
a
## S Freq
## 1 FALSE_FALSE 1
## 2 FALSE_TRUE 1
## 3 TRUE_FALSE 1
## 4 TRUE_TRUE 1
## 5 TRUE_FALSE 1
a %>% spread(S, Freq)
## FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
## 1 1 NA NA NA
## 2 NA 1 NA NA
## 3 NA NA 1 NA
## 4 NA NA NA 1
## 5 NA NA 1 NA
Without aggregation, there is no other sensible way to lay these rows out.
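A minimal sketch of aggregating before spreading, assuming dplyr and tidyr are installed: collapsing a to one row per key with group_by() and summarize() leaves spread() nothing to disambiguate, so it returns the single compact row that was probably expected. With recent tidyr releases, spreading a directly may instead stop with a duplicate-identifier error, because with no other columns every observation maps to the same output row.

```r
library(dplyr)
library(tidyr)

# Same toy data as above; the key column S is built as "A_P"
a <- data.frame(P = c(FALSE, TRUE, FALSE, TRUE, FALSE),
                A = c(FALSE, FALSE, TRUE, TRUE, TRUE),
                Freq = 1) %>%
  unite(S, A, P)

# Aggregate to one row per key first, then spread into a single wide row
compact <- a %>%
  group_by(S) %>%
  summarize(Freq = sum(Freq)) %>%
  spread(S, Freq)

compact
```

The TRUE_FALSE cell holds 2 because rows 3 and 5 are summed before spread() sees them.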
This is predictable from the help file's description of the fill parameter:
"If there isn't a value for every combination of the other variables and the key column, this value will be substituted."
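To make the help text concrete, here is a small sketch on a hypothetical data frame (df, id, key and val are illustrative names, not from the question) where one id/key combination is simply missing; fill decides what goes in that empty cell.

```r
library(tidyr)

# Hypothetical data: id 2 has no "y" observation
df <- data.frame(id  = c(1, 1, 2),
                 key = c("x", "y", "x"),
                 val = c(10, 20, 30))

with_na   <- spread(df, key, val)            # missing id/key combination becomes NA
with_zero <- spread(df, key, val, fill = 0)  # the same cell is filled with 0 instead
```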
In your case, there aren't any other variables to combine with the key column. Had there been, then...
b <- data.frame(P = c(F, T, F, T, F), A = c(F, F, T, T, T), Freq = 1,
                h = rep(c("foo", "bar"), length.out = 5)) %>%
  unite(S, A, P)
b
## S Freq h
## 1 FALSE_FALSE 1 foo
## 2 FALSE_TRUE 1 bar
## 3 TRUE_FALSE 1 foo
## 4 TRUE_TRUE 1 bar
## 5 TRUE_FALSE 1 foo
b %>% spread(S, Freq)
## Error: Duplicate identifiers for rows (3, 5)
...it would fail, because rows 3 and 5 share the same identifiers (h = "foo", S = "TRUE_FALSE") and spread isn't designed to aggregate them.
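Before spreading, a quick way to find the rows that would collide is to count each combination of the identifier columns and the key column. This is a sketch under the assumption that h is the only identifier column, as in b above.

```r
library(dplyr)
library(tidyr)

b <- data.frame(P = c(FALSE, TRUE, FALSE, TRUE, FALSE),
                A = c(FALSE, FALSE, TRUE, TRUE, TRUE),
                Freq = 1,
                h = rep(c("foo", "bar"), length.out = 5)) %>%
  unite(S, A, P)

# Count rows per (identifier, key) pair; any n > 1 would collide in spread()
dups <- b %>%
  count(h, S) %>%
  filter(n > 1)

dups
```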
The tidyr/dplyr way to do it would be group_by and summarize instead of xtabs, because summarize preserves the grouping column, hence spread can tell which observations belong in the same row:
b %>% group_by(h, S) %>%
summarize(Freq = sum(Freq))
## Source: local data frame [4 x 3]
## Groups: h
##
## h S Freq
## 1 bar FALSE_TRUE 1
## 2 bar TRUE_TRUE 1
## 3 foo FALSE_FALSE 1
## 4 foo TRUE_FALSE 2
b %>% group_by(h, S) %>%
summarize(Freq = sum(Freq)) %>%
spread(S, Freq)
## Source: local data frame [2 x 5]
##
## h FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
## 1 bar NA 1 NA 1
## 2 foo 1 NA 2 NA
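In newer tidyr, pivot_wider() (the successor to spread()) can do the aggregation and the NA replacement in one call. This sketch assumes tidyr 1.1.0 or later, where values_fn accepts a bare function and values_fill accepts a scalar; it is an alternative to, not a restatement of, the group_by/summarize approach above.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.1.0

b <- data.frame(P = c(FALSE, TRUE, FALSE, TRUE, FALSE),
                A = c(FALSE, FALSE, TRUE, TRUE, TRUE),
                Freq = 1,
                h = rep(c("foo", "bar"), length.out = 5)) %>%
  unite(S, A, P)

wide <- b %>%
  pivot_wider(id_cols = h,
              names_from = S,
              values_from = Freq,
              values_fn = sum,    # sums duplicate h/S pairs instead of failing
              values_fill = 0)    # uses 0 instead of NA for missing pairs

wide
```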
Why am I getting repeat rows with NAs using tidyr's spread function?
You could do price and cost separately and then merge (join) them (or cbind them, depending on the specifics of your data):
x <- read.table(text = "Date State Price.Name Cost.Name Price Cost
Jan AZ firm1.price firm1.cost 100 50
Jan AZ firm2.price firm2.cost 200 100", header = TRUE, sep = "")
x %>% select(-Cost, -Cost.Name) %>% spread(Price.Name, Price)
##   Date State firm1.price firm2.price
## 1  Jan    AZ         100         200
x %>% select(-Price, -Price.Name) %>% spread(Cost.Name, Cost)
##   Date State firm1.cost firm2.cost
## 1  Jan    AZ         50        100
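The join step mentioned above could look like this: a sketch assuming Date and State together identify a row, so the two wide tables can be matched on those columns with dplyr's left_join().

```r
library(dplyr)
library(tidyr)

x <- read.table(text = "Date State Price.Name Cost.Name Price Cost
Jan AZ firm1.price firm1.cost 100 50
Jan AZ firm2.price firm2.cost 200 100",
                header = TRUE, stringsAsFactors = FALSE)

# Spread each measurement on its own so the other pair cannot split rows
prices <- x %>% select(-Cost, -Cost.Name) %>% spread(Price.Name, Price)
costs  <- x %>% select(-Price, -Price.Name) %>% spread(Cost.Name, Cost)

# Join the two wide tables on the shared identifier columns
wide <- left_join(prices, costs, by = c("Date", "State"))
wide
```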
tidyr::spread() without creating separate rows?
I think you want something like this:
library(dplyr)
library(tidyr)
library(babynames)
answer =
  babynames %>%
  filter(name == "Kerry") %>%
  group_by(year, sex) %>%
  summarize(n = sum(n)) %>%
  spread(sex, n, fill = 0)
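Why the summarize step matters: spread() treats every column other than the key and value as a row identifier, and babynames also carries a prop column whose value differs between the F and M rows. The sketch below uses a small hypothetical stand-in with the same column layout (year, sex, name, n, prop) rather than the real package data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in with babynames-like columns
toy <- data.frame(year = c(2000, 2000, 2001),
                  sex  = c("F", "M", "F"),
                  name = "Kerry",
                  n    = c(5, 3, 7),
                  prop = c(0.001, 0.002, 0.003))

# prop differs across rows, so every input row keeps its own output row
naive <- toy %>% spread(sex, n)

# Summarizing first drops prop and name, leaving one row per year
fixed <- toy %>%
  group_by(year, sex) %>%
  summarize(n = sum(n)) %>%
  spread(sex, n, fill = 0)
```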
Replace the first few observations of a sparse matrix
Here is a tidy solution.
library(dplyr)
library(tidyr)
library(tibble)
library(Matrix)
dat_sparse <- dat %>%
  as_tibble() %>%
  count(col1, col2) %>%
  spread(col2, n, fill = 0) %>%
  column_to_rownames("col1") %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)
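Since dat itself isn't shown, here is a sketch on a hypothetical stand-in with the assumed columns col1 (group) and col2 (item). It runs the tidy pipeline and, for comparison, base R's xtabs() with sparse = TRUE, which builds the same count matrix directly as a sparse Matrix without the dense intermediate.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(Matrix)

# Hypothetical stand-in for dat: group labels in col1, items in col2
dat <- data.frame(col1 = c("group 1", "group 1", "group 2", "group 3", "group 3"),
                  col2 = c("a", "b", "a", "c", "c"),
                  stringsAsFactors = FALSE)

# Tidy pipeline from the answer: count, widen, then convert to sparse
m_tidy <- dat %>%
  as_tibble() %>%
  count(col1, col2) %>%
  spread(col2, n, fill = 0) %>%
  column_to_rownames("col1") %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)

# Alternative: xtabs() counts the pairs and returns a sparse Matrix directly
m_xtabs <- xtabs(~ col1 + col2, data = dat, sparse = TRUE)
```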
dat_sparse
Output:
group 1 . . . 1 . 1 . . 1 . . . . . . 1 1 . . 1 . . . . . . . . .
group 2 . 1 . . . . . . . 1 1 . . . 1 . . 1 1 . . . . 1 . . . 1 .
group 3 1 . 1 . . . 1 1 . . . 1 1 1 . . . . . . . 1 . . 1 1 . . 1
group 4 . . . . 1 . . . . . . . . . . . . . . . 1 . . . . . 1 . .
group 5 . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . .
Loop through each column and row, do stuff
How about this:
df.new = as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, 1)))
lapply applies a function to each column of the data frame df. In this case, the function does the 0/1 replacement. lapply returns a list. Wrapping it in as.data.frame converts the list to a data frame (which is a special type of list).
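To see the two steps in action, here is the same one-liner on a small hypothetical df (columns a and b are illustrative): lapply() returns a named list of 0/1 vectors, and as.data.frame() turns that list back into a data frame.

```r
# Hypothetical data frame with some missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, "x", "y"))

# lapply() returns a list with one 0/1 vector per column
as_list <- lapply(df, function(x) ifelse(is.na(x), 0, 1))

# as.data.frame() turns that list back into a data frame
df.new <- as.data.frame(as_list)
df.new
```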
In R you can often replace a loop with one of the *apply family of functions. In this case, lapply "loops" over the columns of the data frame. Also, many R functions are "vectorized", meaning the function operates on every value in a vector at once. In this case, ifelse does the replacement on an entire column of the data frame.
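For comparison, the explicit row-by-row loop that the lapply/ifelse version replaces might look like the sketch below (df is again a hypothetical numeric example). Both produce the same 0/1 data frame; the vectorized form just does one ifelse() call per column instead of one test per cell.

```r
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# Explicit loop: visit every column j and every row i
looped <- df
for (j in seq_along(df)) {
  for (i in seq_len(nrow(df))) {
    looped[i, j] <- if (is.na(df[i, j])) 0 else 1
  }
}

# Vectorized: ifelse() handles a whole column per call
vectorized <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, 1)))

all.equal(looped, vectorized)
```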