Importing "Csv" File with Multiple-Character Separator to R

Multiple Separators for the same file input R

Try this:

# dummy data
df <- read.table(text="
Name Name1 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
Name Name2 *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
", as.is = TRUE)

# replace "_" with "-" so that V3 uses a single separator
df_V3 <- gsub(pattern="_", replacement="-", df$V3, fixed = TRUE)

# split on "-" and bind the pieces into a data frame
df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))

# output: the original columns, with V3 replaced by its pieces
output <- cbind(df[, c(1:2)],
                df_V3,
                df[, c(4:ncol(df))])

Building on the comments below, here is another related option, but one which uses read.table instead of strsplit.

splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
cbind(df[setdiff(names(df), splitCol)], temp)
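
For comparison, the same idea can be sketched in Python with only the standard library: split the field on either separator in one pass by using a character class. The sample value is copied from the dummy data above; the variable names are illustrative only.

```python
import re

# One value from the V3 column of the dummy data above.
value = "*XYZ_Name3_KB_MobApp_M-18-25_AU_PI"

# The character class [_-] matches either separator, so no
# intermediate replace step is needed.
parts = re.split(r"[_-]", value)
print(parts)
```

Each element of `parts` corresponds to one of the new columns produced by the R code.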

Is there a work around for using multi-character delimiters for `csv.reader`?

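Here's the first part of my answer to the question "CSV writing strings of text that need a unique delimiter", adapted to work in Python 3.7. Note that it does not use a multi-character delimiter; instead it uses a single character, chr(255), that is unlikely to occur in the data: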

import csv

DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=DELIMITER)
    writer.writerow(data)

with open('data.csv', 'r', newline='') as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
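
If the file already uses a multi-character delimiter, one possible workaround is to swap that delimiter for a single placeholder character on the fly and let csv.reader do the rest. The sketch below is only an illustration: the helper name `read_multichar`, the `*|*` delimiter, and the in-memory sample are assumptions, and it only works if the placeholder character never appears in the data.

```python
import csv
import io


def read_multichar(lines, delimiter, placeholder="\x1f"):
    """Yield rows from an iterable of lines split on a multi-character delimiter.

    Hypothetical helper: assumes `placeholder` never occurs in the data.
    """
    # csv.reader accepts any iterable of strings, so the delimiter can be
    # swapped for a single character lazily, one line at a time.
    replaced = (line.replace(delimiter, placeholder) for line in lines)
    yield from csv.reader(replaced, delimiter=placeholder)


# In-memory stand-in for a file opened with open(..., newline='').
sample = io.StringIO("a*|*b*|*c\n1*|*2*|*3\n")
rows = list(read_multichar(sample, "*|*"))
print(rows)
```

Because the replacement happens in a generator, the whole file is never held in memory at once.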

Use Multiple Character Delimiter in Python Pandas read_csv

One solution is to use read_table with a regular-expression separator. Suppose the file looks like this:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

So, we could read this with:

import pandas as pd
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')

How can I parse a .txt with a delimiter that has multiple characters into a pandas df?

That's because a separator longer than one character is interpreted as a regular expression, as stated in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. In a regular expression, + means "one or more repetitions of the preceding character"; here there is no preceding character, so there is "nothing to repeat".

I think escaping the symbols might work.
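
The error and the effect of escaping can be reproduced with Python's own re module, without pandas. In this minimal sketch the separator `+|+` and the sample line are hypothetical, since the original file is not shown.

```python
import re

raw_sep = "+|+"  # hypothetical separator that starts with a regex metacharacter

# Compiled as-is, the leading "+" has nothing before it to repeat.
try:
    re.compile(raw_sep)
    message = None
except re.error as exc:
    message = str(exc)
print(message)

# re.escape() backslash-escapes the metacharacters so that the
# separator is matched literally.
escaped = re.escape(raw_sep)
fields = re.split(escaped, "a+|+b+|+c")
print(escaped, fields)
```

The escaped string is the kind of value that could then be passed as the sep argument.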

How to read data with different separators?

I'd probably do this.

read.table(text = gsub(",", "\t", readLines("file.txt")))
  V1 V2 V3 V4 V5
1  a  1  2  3  5
2  b  4  5  6  7
3  c  5  6  7  8

Unpacking that just a bit:

  • readLines() reads the file into R as a character vector with one element for each line.
  • gsub(",", "\t", ...) replaces every comma with a tab, so that now we've got lines with just one kind of separating character.
  • The text = argument to read.table() lets it know you are passing it a character vector to be read directly (rather than the name of a file containing your text data).
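
The three steps above can also be sketched in Python with the standard library. The contents of `raw` are an assumption (the original file.txt is not shown); they are chosen so that the parsed values match the output printed above.

```python
import csv
import io

# Assumed stand-in for file.txt: fields separated by a mix of tabs and commas.
raw = "a\t1,2\t3,5\nb\t4,5\t6,7\nc\t5,6\t7,8\n"

# Step 1: read the text as a list with one string per line (like readLines()).
lines = io.StringIO(raw).read().splitlines()

# Step 2: replace every comma with a tab so only one separator remains.
unified = [line.replace(",", "\t") for line in lines]

# Step 3: parse the in-memory lines directly instead of a file on disk.
rows = list(csv.reader(unified, delimiter="\t"))
for row in rows:
    print(row)
```

Note that csv.reader returns every field as a string, so numeric columns would still need converting.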

