split column in data.table to multiple rows
The splitstackshape package has a function, cSplit, that is well suited to this task. Pass ";" as the separator and "long" as the direction to get what we need.
> library(splitstackshape)
> dat <- data.frame(V1 = c("x", "y", "z"), V2 = c("b;c;d", "d;ef", "d;ef"), V3 = 1:3, stringsAsFactors = FALSE)
> cSplit(dat, "V2", sep = ";", direction = "long")
# V1 V2 V3
# 1: x b 1
# 2: x c 1
# 3: x d 1
# 4: y d 2
# 5: y ef 2
# 6: z d 3
# 7: z ef 3
R data.table split a row into multiple rows based on string values
Here's one approach with tstrsplit that should work for you:
library(data.table)
dt[, lapply(.SD, function(x) unlist(tstrsplit(x, "; ?"))),
.SDcols = "sha",by = c("title","date")]
title date sha
1: First Title 1/1/2020 12345
2: Second Title 1/2/2020 2345
3: Second Title 1/2/2020 66543
4: Second Title 1/2/2020 33423
5: Third Title 1/3/2020 22222
6: Third Title 1/3/2020 12345678
7: Fourth Title 1/4/2020 666662345
8: Fourth Title 1/4/2020 444
Data
dt <- data.table("title"=c("First Title", "Second Title", "Third Title", "Fourth Title"),
"sha"=c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"),
"date" = c("1/1/2020","1/2/2020","1/3/2020","1/4/2020"))
Fast data.table column split to multiple rows based on delimiter
We can use tstrsplit on the third column to split it into multiple columns and assign (:=) the output to the column names of interest:
data[, paste0("V", 1:3) := tstrsplit(`Peptide IDs`, ";", type.convert = TRUE)]
If we need the 'long' format instead:
library(splitstackshape)
cSplit(data, "Peptide IDs", ";", "long")
Splitting a column into multiple rows
You can first split the Code column on the comma, then explode it to get the desired output.
df['Code']=df['Code'].str.split(',')
df=df.explode('Code')
OUTPUT:
ID A B C D Code
0 1 a z s m AB
0 1 a z s m BC
0 1 a z s m A
1 2 b x d j AD
1 2 b x d j KL
2 3 c y w j AD
2 3 c y w j KL
3 4 a x h AB
3 4 a x h BC
4 5 b y s m A
5 6 b z s h A
6 7 c x s h B
If needed, you can replace the empty strings with NaN.
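To make the split-then-explode steps concrete without pandas, here is a minimal pure-Python sketch of the same idea on a list of dicts. The helper name explode_rows and the two sample rows are illustrative assumptions, not part of the original answer; empty pieces are turned into None, which plays the role of NaN here.

```python
def explode_rows(rows, column, sep=","):
    """Split row[column] on sep and emit one copy of the row per piece."""
    out = []
    for row in rows:
        for piece in row[column].split(sep):
            new_row = dict(row)  # copy so the other columns are repeated
            # An empty piece becomes None (the stand-in for NaN in this sketch)
            new_row[column] = piece if piece != "" else None
            out.append(new_row)
    return out


# Hypothetical sample rows, shaped like the first two IDs in the output above
rows = [
    {"ID": 1, "Code": "AB,BC,A"},
    {"ID": 2, "Code": "AD,KL"},
]
exploded = explode_rows(rows, "Code")
for r in exploded:
    print(r["ID"], r["Code"])
```

As with explode, the non-split columns are simply repeated for every piece.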
Split values from many columns accordingly over multiple rows
You may transform the values in the L_VALUE, H_VALUE and UNIT columns into JSON arrays (-10;25 into ["-10", "25"]) and parse them with additional OPENJSON() calls. The result of each additional OPENJSON() call is a table with columns key, value and type; for an array, the key column contains the index of each item, so you need appropriate JOINs on key:
Table and JSON:
DECLARE @JsonData NVARCHAR(MAX);
SET @JsonData = N'[
{"id": 1, "lval": "-10;15", "hval": "-20;45", "unit": "kg;m"},
{"id": 2, "lval": "-10;15;13", "hval": "-20;45;55", "unit": "kg;m;cm"},
{"id": 3, "lval": "-10", "hval": "-20", "unit": "kg"}
]';
DECLARE @ExampleTable TABLE (
EQ BIGINT,
L_VALUE NVARCHAR(100),
H_VALUE NVARCHAR(100),
UNIT NVARCHAR (30)
)
Statement:
INSERT INTO @ExampleTable
SELECT j.[EQ], a.[L_VALUE], a.[H_VALUE], a.[UNIT]
FROM OPENJSON(@JsonData) WITH (
[EQ] BIGINT 'strict $.id',
[L_VALUE] NVARCHAR(100) '$.lval',
[H_VALUE] NVARCHAR(100) '$.hval',
[UNIT] NVARCHAR(20) '$.unit'
) j
CROSS APPLY (
SELECT l.[value], h.[value], u.[value]
FROM OPENJSON(CONCAT('["', REPLACE(j.L_VALUE, ';', '","'), '"]')) l
JOIN OPENJSON(CONCAT('["', REPLACE(j.H_VALUE, ';', '","'), '"]')) h ON l.[key] = h.[key]
JOIN OPENJSON(CONCAT('["', REPLACE(j.UNIT, ';', '","'), '"]')) u ON l.[key] = u.[key]
) a (L_VALUE, H_VALUE, UNIT)
Result:
EQ L_VALUE H_VALUE UNIT
----------------------
1 -10 -20 kg
1 15 45 m
2 -10 -20 kg
2 15 45 m
2 13 55 cm
3 -10 -20 kg
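The same idea can be sketched outside SQL Server. The following Python snippet is only an illustration under stated assumptions: the helper names to_json_array and split_aligned are hypothetical, and the string rewrite (like the CONCAT/REPLACE expression above) assumes the values contain no quotes or backslashes. It builds the JSON array text, parses it, and pairs items by their array index, keeping only indexes present in every column, as an inner join on key would.

```python
import json


def to_json_array(delimited, sep=";"):
    # Same rewrite as CONCAT('["', REPLACE(x, ';', '","'), '"]'):
    # -10;15 becomes ["-10","15"]
    return '["' + delimited.replace(sep, '","') + '"]'


def split_aligned(row, columns, sep=";"):
    """Emit one row per array index shared by all listed columns."""
    parsed = {c: json.loads(to_json_array(row[c], sep)) for c in columns}
    # Inner-join semantics: only indexes that exist in every array survive
    n = min(len(items) for items in parsed.values())
    out = []
    for key in range(n):
        new_row = dict(row)
        for c in columns:
            new_row[c] = parsed[c][key]
        out.append(new_row)
    return out


example = {"EQ": 2, "L_VALUE": "-10;15;13", "H_VALUE": "-20;45;55", "UNIT": "kg;m;cm"}
result = split_aligned(example, ["L_VALUE", "H_VALUE", "UNIT"])
for r in result:
    print(r["EQ"], r["L_VALUE"], r["H_VALUE"], r["UNIT"])
```

Joining on the index rather than splitting each column independently is what keeps the first L_VALUE with the first H_VALUE and the first UNIT.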
How to split single row values into multiple rows based on columns c#
I assume your test CSV file looks like this:
USA;UK;Australia;Michael;Mitchell;David;222;333;444
Colombia;Paraguay;Bolivia;;John;Chris;;555;7777
Brazil;Germany;Japan;Silvia;Ana;;888;999;;
You will get what you want in the modifiedData variable:
using System.Collections.Generic;
using System.IO;
namespace CsvMod
{
    public class OriginalData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name1 { get; set; }
        public string Name2 { get; set; }
        public string Name3 { get; set; }
        public string Phone1 { get; set; }
        public string Phone2 { get; set; }
        public string Phone3 { get; set; }
    }
    public class ModifiedData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name { get; set; }
        public string Phone { get; set; }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var csvLines = File.ReadAllLines("test.csv");
            var originalData = new List<OriginalData>();
            foreach (var line in csvLines)
            {
                var items = line.Split(';');
                originalData.Add(new OriginalData
                {
                    Country1 = items[0],
                    Country2 = items[1],
                    Country3 = items[2],
                    Name1 = items[3],
                    Name2 = items[4],
                    Name3 = items[5],
                    Phone1 = items[6],
                    Phone2 = items[7],
                    Phone3 = items[8],
                });
            }
            var modifiedData = new List<ModifiedData>();
            foreach (var item in originalData)
            {
                modifiedData.AddRange(new List<ModifiedData>
                {
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name1,
                        Phone = item.Phone1,
                    },
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name2,
                        Phone = item.Phone2,
                    },
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name3,
                        Phone = item.Phone3,
                    },
                });
            }
        }
    }
}
Or, if you really trust your data, a single LINQ statement does the same, and result contains the same rows:
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace CsvMod
{
    public class ModifiedData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name { get; set; }
        public string Phone { get; set; }
    }
    class Program
    {
        static void Main(string[] args)
        {
            var csvLines = File.ReadAllLines("test.csv");
            var result = csvLines.Aggregate(new List<ModifiedData>(), (acc, x) =>
            {
                var items = x.Split(';');
                acc.AddRange(new List<ModifiedData>
                {
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[3],
                        Phone = items[6],
                    },
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[4],
                        Phone = items[7],
                    },
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[5],
                        Phone = items[8],
                    },
                });
                return acc;
            });
        }
    }
}
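Both versions index items[0] through items[8] directly, so a line with fewer than nine fields would fail. For the case where the data is less trustworthy, here is a minimal sketch of the same row-splitting logic in Python that pads short lines first. The function name unpivot_line and the padding behaviour are assumptions for illustration, not part of the answer above.

```python
def unpivot_line(line, sep=";", groups=3):
    """Turn one 'c1;c2;c3;n1;n2;n3;p1;p2;p3' line into one row per name/phone pair."""
    items = line.split(sep)
    # Pad with empty strings so short lines do not raise IndexError
    items += [""] * (3 * groups - len(items))
    countries = items[:groups]
    names = items[groups:2 * groups]
    phones = items[2 * groups:3 * groups]
    return [
        {
            "Country1": countries[0],
            "Country2": countries[1],
            "Country3": countries[2],
            "Name": name,
            "Phone": phone,
        }
        for name, phone in zip(names, phones)
    ]


rows = unpivot_line("USA;UK;Australia;Michael;Mitchell;David;222;333;444")
for r in rows:
    print(r["Name"], r["Phone"])
```

Fields beyond the ninth are ignored, which matches how the C# code treats a trailing extra separator.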
Split delimited strings in multiple columns and separate them into rows
We can do this more easily if we make the delimiter the same:
library(dplyr)
library(tidyr)
library(stringr)
to_expand %>%
mutate(first = str_replace(first, "~", "|")) %>%
separate_rows(first, second, sep = "\\|")
# A tibble: 2 x 2
first second
<chr> <chr>
1 a 1~2~3
2 b 4~5~6
Splitting and creating 2 rows out of one in R data table
You may cbind each split to get a one-column matrix, then combine it with val in a data.frame, which recycles val across the split rows.
res <- do.call(rbind, Map(data.frame, id=lapply(strsplit(dat$id, "&&"), cbind),
val=dat$val))
res <- cbind(n=1:nrow(res), res)
res
# n id val
# 1 1 1 10
# 2 2 2 10
# 3 3 3 20
# 4 4 4 30
# 5 5 5 30