Subset and split a dataframe into multiple dataframes based on two columns
Try using groupby. Here grCols is the list of the two column names you want to split on:
d = {i: y for i, (x, y) in enumerate(df.groupby(grCols))}
d[0]
  unique_id  target  value  response  scan  plan  filter  flag
4     CTA15    21.0   22.4      32.4    T3  TROY       1    be
5     AC007     1.8    2.0      28.9    E1  TROY       0    be
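A self-contained sketch of the same idea follows. The sample rows and the choice of grCols = ['plan', 'flag'] are assumptions made up for illustration, since the original question's data is not shown here. Keying the dictionary by the group key rather than by a running integer is a small variation on the snippet above, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names echo the output shown above.
df = pd.DataFrame({
    "unique_id": ["AB001", "CTA15", "AC007", "ZX900"],
    "plan": ["ROME", "TROY", "TROY", "ROME"],
    "flag": ["be", "be", "be", "af"],
    "value": [3.1, 22.4, 2.0, 7.5],
})

# Assumed grouping columns; replace with the two columns you split on.
grCols = ["plan", "flag"]

# One sub-dataframe per unique (plan, flag) pair, keyed by that pair.
by_key = {key: group for key, group in df.groupby(grCols)}

# The integer-keyed form used in the answer above.
d = {i: group for i, (_, group) in enumerate(df.groupby(grCols))}

print(sorted(by_key))
print(by_key[("TROY", "be")])
```

Keying by the (plan, flag) tuple makes each piece addressable by what it contains, whereas the integer keys depend on the sort order of the groups.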
Splitting dataframe into multiple dataframes
Firstly, your approach is inefficient because appending to the list row by row is slow: every iteration pays for a method call, and the list has to be regrown periodically when there is insufficient space for the new entry. A list comprehension is generally faster because it builds the list in a single pass without the explicit append calls.
However, I think your approach is fundamentally a little wasteful: you already have a dataframe, so why create a new one for each of these users?
I would sort the dataframe by column 'name', set the index to be this and, if required, not drop the column.
Then generate a list of all the unique entries and perform a lookup using these entries. Crucially, if you are only querying the data, use the selection criteria to select the rows you need on demand instead of keeping a separate copy of the data for every user.
Use pandas.DataFrame.sort_values and pandas.DataFrame.set_index:
# sort the rows of the dataframe by name (axis=0 sorts rows; axis=1 would try to sort columns)
df.sort_values(by='name', axis=0, inplace=True)
# set the index to be this and don't drop
df.set_index(keys=['name'], drop=False, inplace=True)
# get a list of names
names = df['name'].unique().tolist()
# now we can perform a lookup on a 'view' of the dataframe
joe = df.loc[df.name == 'joe']
# now you can query all 'joes'
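If a separate dataframe per name really is needed, a dictionary built with groupby gets there in one pass without row-by-row appending. The sketch below uses made-up sample data and an illustrative 'score' column. It leaves the index alone, because grouping by a label that is both an index level and a column is ambiguous in pandas.

```python
import pandas as pd

# Hypothetical sample data with a 'name' column, as in the answer above.
df = pd.DataFrame({
    "name": ["joe", "amy", "joe", "bob", "amy"],
    "score": [1, 2, 3, 4, 5],
})

# Sort rows by name (axis=0 is the default and sorts rows, not columns).
df = df.sort_values(by="name")

# Look up one name on demand with a boolean mask.
joe = df.loc[df["name"] == "joe"]

# Or build every per-name dataframe in a single pass.
frames = {name: group for name, group in df.groupby("name")}

print(joe)
print(list(frames))
```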
Split a dataframe into multiple dataframes based on specific row value in R
You are probably looking for the split function. I made a small example where I split every time the b column is equal to 'a':
(d<-data.frame(a=1:10, b=sample(letters[1:3], replace = T, size = 10)))
#> a b
#> 1 1 a
#> 2 2 a
#> 3 3 c
#> 4 4 b
#> 5 5 c
#> 6 6 b
#> 7 7 c
#> 8 8 b
#> 9 9 c
#> 10 10 a
d$f<-cumsum(d$b=='a')
lst<-split(d, d$f)
lst
#> $`1`
#> a b f
#> 1 1 a 1
#>
#> $`2`
#> a b f
#> 2 2 a 2
#> 3 3 c 2
#> 4 4 b 2
#> 5 5 c 2
#> 6 6 b 2
#> 7 7 c 2
#> 8 8 b 2
#> 9 9 c 2
#>
#> $`3`
#> a b f
#> 10 10 a 3
Created on 2021-10-05 by the reprex package (v2.0.1)
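For comparison with the pandas answers earlier on this page, the same cumulative-sum trick can be sketched in Python. The data is fixed to match the printed example rather than sampled randomly, and the dictionary name lst simply mirrors the R variable.

```python
import pandas as pd

# Same values as the printed R example, fixed rather than randomly sampled.
d = pd.DataFrame({
    "a": range(1, 11),
    "b": list("aacbcbcbca"),
})

# Each 'a' in column b starts a new block: the running count of 'a' rows
# gives every row the number of the block it belongs to.
d["f"] = (d["b"] == "a").cumsum()

# One dataframe per block, keyed by block number.
lst = {key: group for key, group in d.groupby("f")}

for key, group in lst.items():
    print(key, group["a"].tolist())
```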
Split dataframe into several data frames within a list, each column separately
Try this tidyverse approach. You can reshape your data to long format to turn columns into rows. Then, with split(), you can create a list based on the column name. Finally, you can apply a function that reshapes each dataframe in the list back to wide format to reach the desired output. Here is the code:
library(tidyverse)
#Data
df <- data.frame(my_names=sample(LETTERS,4,replace=F),
                 column2=sample(1.3:100.3,4,replace=T),
                 column3=sample(1.3:100.3,4,replace=T),
                 column4=sample(1.3:100.3,4,replace=T),
                 column5=sample(1.3:100.3,4,replace=T))
#Reshape to long
df2 <- df %>% pivot_longer(cols = -1)
#Split into a list
List <- split(df2,df2$name)
#Now reshape function for wide format
List2 <- lapply(List,function(x){x<-pivot_wider(x,names_from = name,values_from = value);return(x)})
names(List2) <- paste0('df',1:length(List2))
Output:
List2
$df1
# A tibble: 4 x 2
my_names column2
<fct> <dbl>
1 N 21.3
2 H 35.3
3 X 42.3
4 U 89.3
$df2
# A tibble: 4 x 2
my_names column3
<fct> <dbl>
1 N 94.3
2 H 54.3
3 X 2.3
4 U 38.3
$df3
# A tibble: 4 x 2
my_names column4
<fct> <dbl>
1 N 75.3
2 H 94.3
3 X 87.3
4 U 100.
$df4
# A tibble: 4 x 2
my_names column5
<fct> <dbl>
1 N 60.3
2 H 88.3
3 X 14.3
4 U 99.3
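In pandas the same result can be reached without the long/wide round trip, by pairing the name column with each value column in turn. This is a different technique from the pivot_longer/split/pivot_wider pipeline above, shown only as a sketch. The values are illustrative, loosely following the printed output, and the df1..df4 keys mirror the names assigned in the R code.

```python
import pandas as pd

# Illustrative data shaped like the R example: one name column, four value columns.
df = pd.DataFrame({
    "my_names": ["N", "H", "X", "U"],
    "column2": [21.3, 35.3, 42.3, 89.3],
    "column3": [94.3, 54.3, 2.3, 38.3],
    "column4": [75.3, 94.3, 87.3, 100.3],
    "column5": [60.3, 88.3, 14.3, 99.3],
})

# Every column except the name column becomes its own two-column dataframe.
value_cols = [c for c in df.columns if c != "my_names"]
frames = {
    f"df{i}": df[["my_names", col]].copy()
    for i, col in enumerate(value_cols, start=1)
}

print(list(frames))
print(frames["df1"])
```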
Group dataframe by ID and then split it into multiple dataframes for each group
Creating the data frame:
ID = c("A", "B", "C", "A", "B", "C", "A", "B", "C")
Date = c("01/01/2022", "01/02/2022", "01/03/2022", "01/01/2022", "01/02/2022", "01/03/2022", "01/01/2022", "01/02/2022", "01/03/2022")
Value = c("45", "24", "33", "65", "24", "87", "51", "32", "72")
df <- data.frame(ID,Date,Value)
Splitting the data:
library(dplyr)  # provides %>% and filter()
df_a <- df %>%
  filter(ID == "A")
df_b <- df %>%
  filter(ID == "B")
df_c <- df %>%
  filter(ID == "C")
Printing the data:
Now just run the names of the split data frames to print them:
df_a
df_b
df_c
This will give you the following output:
ID Date Value
1 A 01/01/2022 45
2 A 01/01/2022 65
3 A 01/01/2022 51
ID Date Value
1 B 01/02/2022 24
2 B 01/02/2022 24
3 B 01/02/2022 32
ID Date Value
1 C 01/03/2022 33
2 C 01/03/2022 87
3 C 01/03/2022 72