Split Column in Data.Table to Multiple Rows

split column in data.table to multiple rows

There's a function in the package splitstackshape called cSplit which is perfectly suited for this task. Simply pass ";" as the separator and "long" as the direction to get what we need.

> library(splitstackshape)
> dat <- data.frame(V1 = c("x", "y", "z"), V2 = c("b;c;d", "d;ef", "d;ef"), V3 = 1:3, stringsAsFactors = FALSE)
> cSplit(dat, "V2", sep = ";", direction = "long")
# V1 V2 V3
# 1: x b 1
# 2: x c 1
# 3: x d 1
# 4: y d 2
# 5: y ef 2
# 6: z d 3
# 7: z ef 3
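
For readers who want to see the reshaping itself rather than the package call, here is a minimal sketch of the same "long" split in plain Python. The helper name `split_to_long` is made up for illustration, and this is not how cSplit is implemented; it only shows that every delimited piece becomes its own row while the other columns repeat.

```python
def split_to_long(rows, column, sep=";"):
    """Return one output row per delimited piece found in `column`."""
    long_rows = []
    for row in rows:
        for piece in row[column].split(sep):
            new_row = dict(row)        # copy so the other columns repeat
            new_row[column] = piece    # overwrite with the single piece
            long_rows.append(new_row)
    return long_rows


# Same sample data as the R example above.
dat = [
    {"V1": "x", "V2": "b;c;d", "V3": 1},
    {"V1": "y", "V2": "d;ef", "V3": 2},
    {"V1": "z", "V2": "d;ef", "V3": 3},
]

long_dat = split_to_long(dat, "V2")
for row in long_dat:
    print(row["V1"], row["V2"], row["V3"])
```

The three input rows expand to seven, in the same order as the cSplit output.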

R data.table split a row into multiple rows based on string values

Here's one approach with tstrsplit that should work for you:

library(data.table)
dt[, lapply(.SD, function(x) unlist(tstrsplit(x, "; ?"))),
   .SDcols = "sha", by = c("title", "date")]
#           title     date       sha
# 1:  First Title 1/1/2020     12345
# 2: Second Title 1/2/2020      2345
# 3: Second Title 1/2/2020     66543
# 4: Second Title 1/2/2020     33423
# 5:  Third Title 1/3/2020     22222
# 6:  Third Title 1/3/2020  12345678
# 7: Fourth Title 1/4/2020 666662345
# 8: Fourth Title 1/4/2020       444
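
The pattern "; ?" matches a semicolon followed by an optional space, and the trailing ";" in the third row produces no extra row because R's strsplit drops a trailing empty piece. As a rough illustration in Python (a sketch only; `split_sha` is a hypothetical helper), note that `re.split` keeps that trailing empty string, so it has to be filtered out explicitly:

```python
import re


def split_sha(value):
    """Split on ';' plus an optional space, dropping empty pieces."""
    return [piece for piece in re.split(r"; ?", value) if piece]


# Same rows as the data.table in this answer: (title, date, sha).
rows = [
    ("First Title", "1/1/2020", "12345"),
    ("Second Title", "1/2/2020", "2345; 66543; 33423"),
    ("Third Title", "1/3/2020", "22222; 12345678;"),
    ("Fourth Title", "1/4/2020", "666662345; 444"),
]

# One output row per sha; title and date repeat for each piece.
long_rows = [
    (title, date, sha)
    for title, date, shas in rows
    for sha in split_sha(shas)
]

for row in long_rows:
    print(*row)
```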

Data

dt <- data.table("title"=c("First Title", "Second Title", "Third Title", "Fourth Title"), 
"sha"=c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"),
"date" = c("1/1/2020","1/2/2020","1/3/2020","1/4/2020"))

Fast data.table column split to multiple rows based on delimiter

We can use tstrsplit on the third column to split into multiple columns and assign (:=) the output to column names of interest

data[, paste0("V", 1:3) := tstrsplit(`Peptide IDs`, ";", type.convert = TRUE)] 

If we need the 'long' format instead, cSplit from splitstackshape does it directly:

library(splitstackshape)
cSplit(data, "Peptide IDs", ";", "long")
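
To make the difference between the two shapes concrete, here is a small Python sketch of the "wide" variant: each value is split into a fixed number of cells, and short rows are padded with a missing value (tstrsplit fills with NA by default). The helper `split_to_wide`, the limit of three columns, and the sample values are assumptions for illustration; the actual `Peptide IDs` data is not shown in the answer.

```python
def split_to_wide(value, sep=";", width=3, convert=int):
    """Split `value` into exactly `width` cells, padding with None."""
    pieces = [convert(piece) for piece in value.split(sep)]
    if len(pieces) > width:
        raise ValueError(f"expected at most {width} pieces, got {len(pieces)}")
    return pieces + [None] * (width - len(pieces))


# Hypothetical rows; the real `Peptide IDs` values are not given above.
data = [{"Peptide IDs": "101;102;103"}, {"Peptide IDs": "7;8"}]

for row in data:
    v1, v2, v3 = split_to_wide(row["Peptide IDs"])
    row.update({"V1": v1, "V2": v2, "V3": v3})  # new columns V1..V3

for row in data:
    print(row)
```

The row count stays the same here; only the long format adds rows.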

Splitting a column into multiple rows

You can first split the Code column on commas, then explode it to get the desired output.

df['Code']=df['Code'].str.split(',')
df=df.explode('Code')

OUTPUT:

  ID  A  B  C  D Code
0 1 a z s m AB
0 1 a z s m BC
0 1 a z s m A
1 2 b x d j AD
1 2 b x d j KL
2 3 c y w j AD
2 3 c y w j KL
3 4 a x h AB
3 4 a x h BC
4 5 b y s m A
5 6 b z s h A
6 7 c x s h B

If needed, you can replace the empty strings with NaN.

Split values from many columns accordingly over multiple rows

You can transform the values in the L_VALUE, H_VALUE and UNIT columns into JSON arrays (-10;25 becomes ["-10", "25"]) and parse them with an additional OPENJSON() call. The result of that second OPENJSON() is a table with the columns key, value and type. For an array, the key column contains the index of each item, so the three parsed arrays can be lined up with JOINs on key:

Table and JSON:

DECLARE @JsonData NVARCHAR(MAX);
SET @JsonData = N'[
{"id": 1, "lval": "-10;15", "hval": "-20;45", "unit": "kg;m"},
{"id": 2, "lval": "-10;15;13", "hval": "-20;45;55", "unit": "kg;m;cm"},
{"id": 3, "lval": "-10", "hval": "-20", "unit": "kg"}
]';
DECLARE @ExampleTable TABLE (
EQ BIGINT,
L_VALUE NVARCHAR(100),
H_VALUE NVARCHAR(100),
UNIT NVARCHAR (30)
)

Statement:

INSERT INTO @ExampleTable
SELECT j.[EQ], a.[L_VALUE], a.[H_VALUE], a.[UNIT]
FROM OPENJSON(@JsonData) WITH (
[EQ] BIGINT 'strict $.id',
[L_VALUE] NVARCHAR(100) '$.lval',
[H_VALUE] NVARCHAR(100) '$.hval',
[UNIT] NVARCHAR(20) '$.unit'
) j
CROSS APPLY (
SELECT l.[value], h.[value], u.[value]
FROM OPENJSON(CONCAT('["', REPLACE(j.L_VALUE, ';', '","'), '"]')) l
JOIN OPENJSON(CONCAT('["', REPLACE(j.H_VALUE, ';', '","'), '"]')) h ON l.[key] = h.[key]
JOIN OPENJSON(CONCAT('["', REPLACE(j.UNIT, ';', '","'), '"]')) u ON l.[key] = u.[key]
) a (L_VALUE, H_VALUE, UNIT)

Result:

EQ L_VALUE H_VALUE UNIT
----------------------
1 -10 -20 kg
1 15 45 m
2 -10 -20 kg
2 15 45 m
2 13 55 cm
3 -10 -20 kg
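
The core idea, matching the pieces of several delimited columns by their position, can be sketched outside SQL as well. The Python below is only an illustration using the same JSON payload; `split_aligned` is a hypothetical helper. One deliberate difference: the inner JOINs above keep only positions present in all three lists, while this sketch raises an error when the lists have different lengths.

```python
import json

JSON_DATA = """[
  {"id": 1, "lval": "-10;15", "hval": "-20;45", "unit": "kg;m"},
  {"id": 2, "lval": "-10;15;13", "hval": "-20;45;55", "unit": "kg;m;cm"},
  {"id": 3, "lval": "-10", "hval": "-20", "unit": "kg"}
]"""


def split_aligned(record, sep=";"):
    """Yield one (EQ, L_VALUE, H_VALUE, UNIT) tuple per list position."""
    lvals = record["lval"].split(sep)
    hvals = record["hval"].split(sep)
    units = record["unit"].split(sep)
    if not (len(lvals) == len(hvals) == len(units)):
        raise ValueError(f"unequal list lengths for id {record['id']}")
    # zip pairs the pieces by index, like joining on OPENJSON's key column.
    for lval, hval, unit in zip(lvals, hvals, units):
        yield (record["id"], lval, hval, unit)


rows = [row for record in json.loads(JSON_DATA) for row in split_aligned(record)]
for row in rows:
    print(*row)
```

The values stay strings here, just as they come back from OPENJSON as text.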

How to split single row values into multiple rows based on columns c#

I assume your test CSV file looks like this:

USA;UK;Australia;Michael;Mitchell;David;222;333;444
Colombia;Paraguay;Bolivia;;John;Chris;;555;7777
Brazil;Germany;Japan;Silvia;Ana;;888;999;

The rows you want end up in the modifiedData variable:

using System.Collections.Generic;
using System.IO;

namespace CsvMod
{
    public class OriginalData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name1 { get; set; }
        public string Name2 { get; set; }
        public string Name3 { get; set; }
        public string Phone1 { get; set; }
        public string Phone2 { get; set; }
        public string Phone3 { get; set; }
    }

    public class ModifiedData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name { get; set; }
        public string Phone { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var csvLines = File.ReadAllLines("test.csv");

            var originalData = new List<OriginalData>();

            foreach (var line in csvLines)
            {
                var items = line.Split(';');

                originalData.Add(new OriginalData
                {
                    Country1 = items[0],
                    Country2 = items[1],
                    Country3 = items[2],
                    Name1 = items[3],
                    Name2 = items[4],
                    Name3 = items[5],
                    Phone1 = items[6],
                    Phone2 = items[7],
                    Phone3 = items[8],
                });
            }

            var modifiedData = new List<ModifiedData>();

            foreach (var item in originalData)
            {
                modifiedData.AddRange(new List<ModifiedData>
                {
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name1,
                        Phone = item.Phone1,
                    },
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name2,
                        Phone = item.Phone2,
                    },
                    new ModifiedData
                    {
                        Country1 = item.Country1,
                        Country2 = item.Country2,
                        Country3 = item.Country3,
                        Name = item.Name3,
                        Phone = item.Phone3,
                    },
                });
            }
        }
    }
}

Or, if you really trust your data, a single LINQ statement produces the same rows in result:

using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace CsvMod
{
    public class ModifiedData
    {
        public string Country1 { get; set; }
        public string Country2 { get; set; }
        public string Country3 { get; set; }
        public string Name { get; set; }
        public string Phone { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var csvLines = File.ReadAllLines("test.csv");

            var result = csvLines.Aggregate(new List<ModifiedData>(), (acc, x) =>
            {
                var items = x.Split(';');

                acc.AddRange(new List<ModifiedData>
                {
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[3],
                        Phone = items[6],
                    },
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[4],
                        Phone = items[7],
                    },
                    new ModifiedData
                    {
                        Country1 = items[0],
                        Country2 = items[1],
                        Country3 = items[2],
                        Name = items[5],
                        Phone = items[8],
                    },
                });

                return acc;
            });
        }
    }
}
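
Both C# versions do the same thing: every input line carries three Name/Phone pairs, and each pair becomes its own output row with the three countries repeated. As a compact illustration of that pairing, here is a sketch in Python rather than C#. The `unpivot` helper and the inlined sample text are assumptions, and like the originals it keeps rows whose name or phone is empty.

```python
import csv
import io

# Sample lines modeled on the ones above, inlined so no test.csv is needed.
CSV_TEXT = """USA;UK;Australia;Michael;Mitchell;David;222;333;444
Colombia;Paraguay;Bolivia;;John;Chris;;555;7777
Brazil;Germany;Japan;Silvia;Ana;;888;999;
"""


def unpivot(items):
    """Turn one 9-field line into three records, one per Name/Phone pair."""
    countries, names, phones = items[0:3], items[3:6], items[6:9]
    return [
        {
            "Country1": countries[0],
            "Country2": countries[1],
            "Country3": countries[2],
            "Name": name,
            "Phone": phone,
        }
        for name, phone in zip(names, phones)  # pair Name1/Phone1, etc.
    ]


reader = csv.reader(io.StringIO(CSV_TEXT), delimiter=";")
modified = [record for items in reader for record in unpivot(items)]

for record in modified:
    print(record)
```

Three input lines give nine records. Filtering out records with an empty Name would be a one-line addition if the blanks are not wanted.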

Split delimited strings in multiple columns and separate them into rows

This is easier if we first make the delimiters the same:

library(dplyr)
library(tidyr)
library(stringr)
to_expand %>%
mutate(first = str_replace(first, "~", "|")) %>%
separate_rows(first, second, sep = "\\|")
# A tibble: 2 x 2
first second
<chr> <chr>
1 a 1~2~3
2 b 4~5~6

Splitting and creating 2 rows out of one in R data table

You can cbind each split into a one-column matrix, pair it with the matching val in a data.frame (val is recycled to the length of the split), and rbind the pieces together.

res <- do.call(rbind, Map(data.frame, id=lapply(strsplit(dat$id, "&&"), cbind), 
val=dat$val))
res <- cbind(n=1:nrow(res), res)
res
# n id val
# 1 1 1 10
# 2 2 2 10
# 3 3 3 20
# 4 4 4 30
# 5 5 5 30
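
The same steps (split each id, repeat its val, then number the resulting rows) can be sketched in Python. The input below is an assumption reconstructed from the printed result, since `dat` itself is not shown in the answer.

```python
# Assumed input, inferred from the output above: (id, val) pairs.
dat = [("1&&2", 10), ("3", 20), ("4&&5", 30)]

# Split each id on "&&"; val repeats for every piece (the "recycling").
pairs = [(piece, val) for ids, val in dat for piece in ids.split("&&")]

# Add the running row number n, starting at 1 like 1:nrow(res).
res = [(n, piece, val) for n, (piece, val) in enumerate(pairs, start=1)]

for row in res:
    print(*row)
```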

