Reshape Data for Values in One Column

Reshape data to different columns based on value from single column

The easiest way to do this would be with dcast from "data.table" or even reshape from base R.

Assuming your vectors are collected in a data.frame named "d", try the following:

library(data.table)
setDT(d)
x <- dcast(d, time ~ code, value.var = paste0("var", 1:3))
head(x)
# time var1_1 var1_2 var1_3 var2_1 var2_2 var2_3 var3_1 var3_2 var3_3
# 1: 8/10/2017 18:17 NA 66 NA NA 66 NA NA 132 NA
# 2: 8/10/2017 20:10 NA 38 NA NA 115 NA NA 71 NA
# 3: 8/10/2017 22:34 NA NA 11 NA NA 66 NA NA 44
# 4: 8/11/2017 10:21 5 NA NA 115 NA NA 38 NA NA
# 5: 8/11/2017 2:16 NA 60 NA NA 104 NA NA 77 NA
# 6: 8/11/2017 4:09 49 NA NA 126 NA NA 66 NA NA

Or, using reshape from base R:

reshape(d, direction = "wide", idvar = "time", timevar = "code")

If you want to use the tidyverse, you first need to gather the "var" columns, then combine "code" and the variable name into a new key column with unite, and then spread to the wide format:

library(tidyverse)
d %>%
gather(variable, value, starts_with("var")) %>%
unite(key, code, variable) %>%
spread(key, value)

Reshape data for values in one column

dcast from the reshape2 package does this:

library(reshape2)
# the argument is value.var (with a dot), not value_var
dcast(data, test ~ ID, value.var = "test_result")

# test 1 2 3 4 5
#1 A NA 9 11 NA NA
#2 B 10 NA NA NA NA
#3 C NA NA NA 7 NA
#4 F NA NA NA NA 5

Reshape table using column values as column names?

We need to arrange the rows by 'Year' before calling pivot_wider, because pivot_wider creates the new columns in the order in which the values first occur in the data.

library(dplyr)
library(tidyr)
df %>%
arrange(Year) %>%
pivot_wider(names_from = Year, values_from = N, values_fill = 0)

Output:

# A tibble: 3 x 6
Organization `1999` `2008` `2009` `2010` `2011`
<chr> <int> <int> <int> <int> <int>
1 X 3 0 0 3 0
2 Z 0 5 0 0 5
3 Y 0 0 4 5 5

Data:

df <- structure(list(Organization = c("X", "X", "Y", "Y", "Y", "Z", 
"Z"), Year = c(1999L, 2010L, 2009L, 2010L, 2011L, 2008L, 2011L
), N = c(3L, 3L, 4L, 5L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-7L))

Reshape one column to multiple columns in R

Another idea is:

df %>%
group_by(Groups) %>%
mutate(index = row_number()) %>%
pivot_wider(names_from = "Groups", values_from = "Col1")

# A tibble: 2 x 4
index G1 G2 G3
<int> <int> <int> <int>
1 1 1 3 5
2 2 7 9 11

You can drop the index column at the end, for example with select(-index).

How can I reshape one column dataframe into 4 columns in R?

This is easier with matrix, provided the total number of rows in 'df' is a multiple of the number of columns needed:

as.data.frame( matrix(df$Value, 4, 4, byrow = TRUE))

Reshape data frame and convert values into columns

The functions spread and gather are superseded by pivot_wider and pivot_longer. Start by normalizing the data into a long (3NF / tidy) form using pivot_longer. You then have just name (key) and value pairs, which pivot_wider can put into multiple new columns:

library(tidyverse)

data <- structure(list(Segment.Number = c(
"Start Event", "Start Time",
"End Event", "End Time", "Segment Duration", "Total SCRs", "ER-SCRs",
"NS-SCRs", "Tonic SCL", "Mean SC", "Tonic Period"
), X1 = c(
"time.txt:start pressed (F1):BL1",
"60", "time.txt:start pressed (F1):BL2", "200", "140", "27",
"0", "27", "16.877020827457564", "17.325101513639225", "80.45693848354793"
), X2 = c(
"time.txt:start pressed (F1):F1", "215", "time.txt:start pressed (F1):F2",
"515", "300", "68", "1", "67", "18.507943774797333", "19.012163892375462",
"165.33022014676453"
), X3 = c(
"time.txt:start pressed (F1):W1",
"2040", "time.txt:start pressed (F1):W2", "2940", "900", "155",
"0", "155", "22.1224503921822", "22.600699854723032", "546.20937986219167"
), Path = c(
"Code1", "Code1", "Code1", "Code1", "Code1", "Code1",
"Code1", "Code1", "Code1", "Code1", "Code1"
)), row.names = c(
NA,
-11L
), class = "data.frame")

data %>%
as_tibble() %>%
# always normalize the data first
pivot_longer(c(X1, X2, X3), names_to = "Time") %>%
# format to desired shape
pivot_wider(names_from = Segment.Number)
#> # A tibble: 3 × 13
#> Path Time `Start Event` `Start Time` `End Event` `End Time` `Segment Durat…`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Code1 X1 time.txt:sta… 60 time.txt:s… 200 140
#> 2 Code1 X2 time.txt:sta… 215 time.txt:s… 515 300
#> 3 Code1 X3 time.txt:sta… 2040 time.txt:s… 2940 900
#> # … with 6 more variables: `Total SCRs` <chr>, `ER-SCRs` <chr>,
#> # `NS-SCRs` <chr>, `Tonic SCL` <chr>, `Mean SC` <chr>, `Tonic Period` <chr>

Created on 2022-06-28 by the reprex package (v2.0.0)

Reshape data frame, so the index column values become the columns

You can transpose the dataframe and then split and set the new index:

Transpose

dft = df1.T
print(dft)

Cat V W X Y Z
Gender_Male 5 15 11 22 8
Gender_Female 4 12 15 18 7
Location_london 4 12 16 21 7
Location_North 2 7 4 9 4
Location_South 3 8 6 9 4

Split and set the new index

dft.index = dft.index.str.split('_', expand=True)
dft.columns.name = None
print(dft)

V W X Y Z
Gender Male 5 15 11 22 8
Female 4 12 15 18 7
Location london 4 12 16 21 7
North 2 7 4 9 4
South 3 8 6 9 4
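
For a self-contained run of the two steps above, here is a minimal sketch. The construction of df1 is an assumption: the original frame is not shown, so it is rebuilt here from the printed output.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output above.
df1 = pd.DataFrame(
    {
        "Gender_Male": [5, 15, 11, 22, 8],
        "Gender_Female": [4, 12, 15, 18, 7],
        "Location_london": [4, 12, 16, 21, 7],
        "Location_North": [2, 7, 4, 9, 4],
        "Location_South": [3, 8, 6, 9, 4],
    },
    index=pd.Index(["V", "W", "X", "Y", "Z"], name="Cat"),
)

# Step 1: transpose, so the old column labels become the row index.
dft = df1.T

# Step 2: split each label on "_" into a two-level MultiIndex.
dft.index = dft.index.str.split("_", expand=True)
dft.columns.name = None

# Rows can now be selected by the outer level of the index.
gender_rows = dft.loc["Gender"]
print(dft)
```

With the two-level index in place, dft.loc["Gender"] returns only the Male and Female rows.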

Reshape multiple value columns to wide format

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

The first few lines of output:

             expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000

Reshape and transform a dataframe/array from 32 * 32 columns by 16 rows to (32 * 16) by 32

Here are three different functions. The first uses pandas methods (stacking), the second uses regular Python lists to build the result row by row, and the third uses NumPy reshaping.

The NumPy reshaping method is about twice as fast as the others (9.56 ms versus roughly 19 ms per loop), and almost all of its computation time is actually spent converting the DataFrame to a NumPy array and then back to pandas.

import numpy as np
import pandas as pd


def stack_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 19 ms per loop
    """
    # create a MultiIndex indicating Row and Column information for each image
    # (the keyword is `names`, not `name`)
    row_col_index = pd.MultiIndex.from_tuples(
        [(i // 32, i % 32) for i in range(0, 1024)], names=["row", "col"]
    )
    # note: this relabels the columns and index of image_df in place
    image_df.columns = row_col_index

    image_df.index = range(1, 17)
    image_df.index.name = "Image"

    # Use MultiIndex to reshape data
    return image_df.stack(level=1).T


def build_image_df(image_df):
    """
    Performance: 10 loops, best of 5: 19.2 ms per loop
    """
    image_data = image_df.values.tolist()
    reshaped = []
    for r_num in range(0, 32):
        row = []
        for image_num in range(0, 16):
            # for each image
            for c_num in range(0, 32):
                # get the corresponding index in the raw data
                # and add the pixel data to the row we're building
                raw_index = r_num * 32 + c_num
                pixel = image_data[image_num][raw_index]
                row.append(pixel)
        reshaped.append(row)
    reshaped_df = pd.DataFrame(reshaped)
    return reshaped_df


def reshape_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 9.56 ms per loop
    Note: numpy methods only account for 0.82 ms of this
    """
    # use the function argument (the original referenced an undefined raw_df);
    # rot90(fliplr(a)) is equivalent to the transpose a.T
    return pd.DataFrame(
        np.rot90(np.fliplr(image_df.to_numpy().reshape(512, 32)))
    )
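
As a cross-check on the layout, here is a minimal sketch of the same kind of reshape using NumPy's reshape and transpose, compared against the index arithmetic of the loop-based build_image_df. It is not taken from the original answer. It assumes each of the 16 rows holds one 32 x 32 image flattened row by row, and that the target is 32 rows by 512 columns with the images side by side. The function name tile_images and the random test data are hypothetical.

```python
import numpy as np
import pandas as pd


def tile_images(image_df, n_images=16, size=32):
    """Place n_images flattened size x size images side by side."""
    # View the data as (image, pixel row, pixel column).
    arr = image_df.to_numpy().reshape(n_images, size, size)
    # Move the pixel-row axis first, then merge (image, column) into one axis.
    tiled = arr.transpose(1, 0, 2).reshape(size, n_images * size)
    return pd.DataFrame(tiled)


# Hypothetical test data: 16 random "images" of 1024 pixels each.
rng = np.random.default_rng(0)
raw_df = pd.DataFrame(rng.integers(0, 256, size=(16, 1024)))
tiled_df = tile_images(raw_df)

# Check one pixel against the loop-style index arithmetic.
image_num, r_num, c_num = 3, 5, 7
expected = raw_df.iat[image_num, r_num * 32 + c_num]
assert tiled_df.iat[r_num, image_num * 32 + c_num] == expected
```

Spot-checking a few pixels like this against the loop-based version is a cheap way to confirm that a vectorized reshape puts every pixel where you expect.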


