Reshape data to different columns based on value from single column
The easiest way to do this would be with dcast from "data.table" or even reshape from base R.
Assuming your vectors are collected in a data.frame named "d", try the following:
library(data.table)
setDT(d)
x <- dcast(d, time ~ code, value.var = paste0("var", 1:3))
head(x)
# time var1_1 var1_2 var1_3 var2_1 var2_2 var2_3 var3_1 var3_2 var3_3
# 1: 8/10/2017 18:17 NA 66 NA NA 66 NA NA 132 NA
# 2: 8/10/2017 20:10 NA 38 NA NA 115 NA NA 71 NA
# 3: 8/10/2017 22:34 NA NA 11 NA NA 66 NA NA 44
# 4: 8/11/2017 10:21 5 NA NA 115 NA NA 38 NA NA
# 5: 8/11/2017 2:16 NA 60 NA NA 104 NA NA 77 NA
# 6: 8/11/2017 4:09 49 NA NA 126 NA NA 66 NA NA
OR
reshape(d, direction = "wide", idvar = "time", timevar = "code")
If you wanted to use the tidyverse, you would need to first gather, then create a new "times" variable, and then reshape to the wide format:
library(tidyverse)
d %>%
  gather(variable, value, starts_with("var")) %>%
  unite(key, code, variable) %>%
  spread(key, value)
Reshape data for values in one column
dcast from the reshape2 package does this:
library(reshape2)
# the argument is value.var (with a dot), not value_var
dcast(data, test ~ ID, value.var = "test_result")
# test 1 2 3 4 5
#1 A NA 9 11 NA NA
#2 B 10 NA NA NA NA
#3 C NA NA NA 7 NA
#4 F NA NA NA NA 5
Reshape table using column values as column names?
We need pivot_wider after arranging the rows by 'Year'. pivot_wider creates the new columns in the order in which the values first occur in the data, so sorting first gives the columns in year order:
library(dplyr)
library(tidyr)
df %>%
  arrange(Year) %>%
  pivot_wider(names_from = Year, values_from = N, values_fill = 0)
-output
# A tibble: 3 x 6
Organization `1999` `2008` `2009` `2010` `2011`
<chr> <int> <int> <int> <int> <int>
1 X 3 0 0 3 0
2 Z 0 5 0 0 5
3 Y 0 0 4 5 5
data
df <- structure(list(Organization = c("X", "X", "Y", "Y", "Y", "Z",
"Z"), Year = c(1999L, 2010L, 2009L, 2010L, 2011L, 2008L, 2011L
), N = c(3L, 3L, 4L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-7L))
Reshape one column to multiple columns in R
Another idea is:
df %>%
  group_by(Groups) %>%
  mutate(index = row_number()) %>%
  pivot_wider(names_from = "Groups", values_from = "Col1")
# A tibble: 2 x 4
index G1 G2 G3
<int> <int> <int> <int>
1 1 1 3 5
2 2 7 9 11
You can drop the index column at the end.
How can I reshape one column dataframe into 4 columns in R?
This is easier with matrix, provided the total number of rows in 'df' is a multiple of the number of columns needed:
as.data.frame( matrix(df$Value, 4, 4, byrow = TRUE))
Reshape data frame and convert values into columns
The functions spread and gather are superseded by pivot_wider and pivot_longer. Start by normalizing the data into a long (3NF / tidy) form using pivot_longer. Then you have just name (key) and value pairs, which pivot_wider can put into multiple new columns:
library(tidyverse)
data <- structure(list(Segment.Number = c(
"Start Event", "Start Time",
"End Event", "End Time", "Segment Duration", "Total SCRs", "ER-SCRs",
"NS-SCRs", "Tonic SCL", "Mean SC", "Tonic Period"
), X1 = c(
"time.txt:start pressed (F1):BL1",
"60", "time.txt:start pressed (F1):BL2", "200", "140", "27",
"0", "27", "16.877020827457564", "17.325101513639225", "80.45693848354793"
), X2 = c(
"time.txt:start pressed (F1):F1", "215", "time.txt:start pressed (F1):F2",
"515", "300", "68", "1", "67", "18.507943774797333", "19.012163892375462",
"165.33022014676453"
), X3 = c(
"time.txt:start pressed (F1):W1",
"2040", "time.txt:start pressed (F1):W2", "2940", "900", "155",
"0", "155", "22.1224503921822", "22.600699854723032", "546.20937986219167"
), Path = c(
"Code1", "Code1", "Code1", "Code1", "Code1", "Code1",
"Code1", "Code1", "Code1", "Code1", "Code1"
)), row.names = c(
NA,
-11L
), class = "data.frame")
data %>%
  as_tibble() %>%
  # always normalize the data first
  pivot_longer(c(X1, X2, X3), names_to = "Time") %>%
  # format to desired shape
  pivot_wider(names_from = Segment.Number)
#> # A tibble: 3 × 13
#> Path Time `Start Event` `Start Time` `End Event` `End Time` `Segment Durat…`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Code1 X1 time.txt:sta… 60 time.txt:s… 200 140
#> 2 Code1 X2 time.txt:sta… 215 time.txt:s… 515 300
#> 3 Code1 X3 time.txt:sta… 2040 time.txt:s… 2940 900
#> # … with 6 more variables: `Total SCRs` <chr>, `ER-SCRs` <chr>,
#> # `NS-SCRs` <chr>, `Tonic SCL` <chr>, `Mean SC` <chr>, `Tonic Period` <chr>
Created on 2022-06-28 by the reprex package (v2.0.0)
Reshape data frame, so the index column values become the columns
You can transpose the dataframe and then split and set the new index:
Transpose
dft = df1.T
print(dft)
Cat V W X Y Z
Gender_Male 5 15 11 22 8
Gender_Female 4 12 15 18 7
Location_london 4 12 16 21 7
Location_North 2 7 4 9 4
Location_South 3 8 6 9 4
Split and set the new index
dft.index = dft.index.str.split('_', expand=True)
dft.columns.name = None
print(dft)
V W X Y Z
Gender Male 5 15 11 22 8
Female 4 12 15 18 7
Location london 4 12 16 21 7
North 2 7 4 9 4
South 3 8 6 9 4
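For reference, the two steps above can be put together in one self-contained sketch. The input frame df1 here is an assumption, rebuilt from the printed output rather than taken from the original question, and the final selection by outer index level is only an illustration of what the MultiIndex allows.

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed output above.
df1 = pd.DataFrame(
    {
        "Gender_Male": [5, 15, 11, 22, 8],
        "Gender_Female": [4, 12, 15, 18, 7],
        "Location_london": [4, 12, 16, 21, 7],
        "Location_North": [2, 7, 4, 9, 4],
        "Location_South": [3, 8, 6, 9, 4],
    },
    index=pd.Index(["V", "W", "X", "Y", "Z"], name="Cat"),
)

# Transpose so the old column names become the row index.
dft = df1.T

# Split "Gender_Male" into ("Gender", "Male"); expand=True yields a MultiIndex.
dft.index = dft.index.str.split("_", expand=True)
dft.columns.name = None

# Rows can now be selected by the outer level of the MultiIndex.
gender = dft.loc["Gender"]
```

Selecting with the outer level drops that level, so gender is indexed by Male and Female only.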
Reshape multiple value columns to wide format
Your best option is to reshape your data to long format, using melt, and then to dcast:
library(reshape2)
meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
The first few lines of output:
expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000
Reshape and transform a dataframe/array from 32 * 32 columns by 16 rows to (32 * 16) by 32
Here are three different functions. The first uses pandas methods (stacking), the second uses regular Python lists and builds the result row by row, and the third uses NumPy reshaping.
The NumPy reshaping method is about twice as fast as the others, and almost all of its computation time is spent converting the DataFrame to a NumPy array and then back to pandas.
import numpy as np
import pandas as pd


def stack_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 19 ms per loop
    """
    # create a MultiIndex indicating Row and Column information for each image
    row_col_index = pd.MultiIndex.from_tuples(
        [(i // 32, i % 32) for i in range(0, 1024)], name=["row", "col"]
    )
    # note: this relabels the columns and index of the frame passed in
    image_df.columns = row_col_index
    image_df.index = range(1, 17)
    image_df.index.name = "Image"
    # Use MultiIndex to reshape data
    return image_df.stack(level=1).T


def build_image_df(image_df):
    """
    Performance: 10 loops, best of 5: 19.2 ms per loop
    """
    image_data = image_df.values.tolist()
    reshaped = []
    for r_num in range(0, 32):
        row = []
        for image_num in range(0, 16):
            # for each image
            for c_num in range(0, 32):
                # get the corresponding index in the raw data
                # and add the pixel data to the row we're building
                raw_index = r_num * 32 + c_num
                pixel = image_data[image_num][raw_index]
                row.append(pixel)
        reshaped.append(row)
    reshaped_df = pd.DataFrame(reshaped)
    return reshaped_df


def reshape_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 9.56 ms per loop
    Note: numpy methods only account for 0.82 ms of this
    """
    # use the function argument (the original referred to an undefined raw_df);
    # rot90 applied after fliplr is equivalent to a plain transpose
    return pd.DataFrame(
        np.rot90(np.fliplr(image_df.to_numpy().reshape(512, 32)))
    )
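As a smaller illustration of the reshaping idea, here is a minimal sketch on toy data. It assumes each input row holds one square image stored in row-major order and that the images should end up side by side, which is the layout build_image_df produces. The function name tile_images_side_by_side and the 2 by 2 toy images are hypothetical, and this is not necessarily the exact operation used in reshape_image_df.

```python
import numpy as np


def tile_images_side_by_side(flat, side):
    """Turn (n_images, side*side) into (side, n_images*side).

    Assumes each row of `flat` is one image in row-major order.
    """
    n_images = flat.shape[0]
    # View the data as (image, row, col).
    cube = flat.reshape(n_images, side, side)
    # Reorder to (row, image, col), then merge image and col into one axis.
    return cube.transpose(1, 0, 2).reshape(side, n_images * side)


# Three hypothetical 2x2 images, one per row.
flat = np.arange(12).reshape(3, 4)
tiled = tile_images_side_by_side(flat, 2)

# If the images should instead be stacked vertically, a single reshape is enough.
stacked = flat.reshape(-1, 2)
```

With the sizes from the question this would be called as tile_images_side_by_side(image_df.to_numpy(), 32), giving a 32 by 512 array.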