Multiple Separators for the Same File Input in R
Try this:
# dummy data
df <- read.table(text="
Name Name1 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
Name Name2 *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
", as.is = TRUE)
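# replace "_" with "-" so there is only one kind of separator
df_V3 <- gsub(pattern="_", replacement="-", df$V3, fixed = TRUE)
# split on "-" and bind the pieces into a data frame
df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))
# output: first two columns, then the split pieces, then the remaining columns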
output <- cbind(df[, c(1:2)],
df_V3,
df[, c(4:ncol(df))])
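For comparison, the same normalize-then-split idea can be sketched in plain Python with no libraries. This is only an illustration, assuming (as the dummy data does) that every value has the same number of "_"/"-" separated pieces; the two strings are copied from the V3 column of the dummy data.

```python
# Values of the third column (V3) from the dummy data.
values = [
    "*XYZ_Name3_KB_MobApp_M-18-25_AU_PI",
    "*CCC_Name3_KB_MobApp_M-18-25_AU_PI",
]

# Step 1: make "_" and "-" equivalent by replacing one with the other,
# as gsub(..., fixed = TRUE) does in the R code.
normalized = [v.replace("_", "-") for v in values]

# Step 2: split every string on the single remaining separator,
# giving one list of pieces per row (like strsplit + rbind.data.frame).
rows = [v.split("-") for v in normalized]

for row in rows:
    print(row)
```

Replacing first and then splitting on one character keeps the split itself trivial; a regex split on the character class [_-] would be an alternative technique with the same result here.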
Building on the comments below, here is another related option, one that uses read.table instead of strsplit.
splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
cbind(df[setdiff(names(df), splitCol)], temp)
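A rough pandas counterpart to the read.table variant, sketched in Python under the assumption that the data already sits in a DataFrame whose columns are named V1, V2, ... the way read.table names them. Only V1 to V4 of the dummy data are reproduced. Series.str.replace with regex=False and Series.str.split with expand=True do the replace-then-split, and the generated names V3_1, V3_2, ... mirror the paste(splitCol, seq_along(temp), sep = "_") call.

```python
import pandas as pd

# A cut-down version of the dummy data: only V1, V2, V3 and V4 are kept.
df = pd.DataFrame({
    "V1": ["Name", "Name"],
    "V2": ["Name1", "Name2"],
    "V3": ["*XYZ_Name3_KB_MobApp_M-18-25_AU_PI",
           "*CCC_Name3_KB_MobApp_M-18-25_AU_PI"],
    "V4": ["ANDROID", "ANDROID"],
})

split_col = "V3"

# Normalize "-" to "_" (literal replacement, not a regex), then split
# on "_" into one new column per piece.
temp = (df[split_col]
        .str.replace("-", "_", regex=False)
        .str.split("_", expand=True))

# Name the new columns V3_1, V3_2, ... like the R paste() call does.
temp.columns = [f"{split_col}_{i}" for i in range(1, temp.shape[1] + 1)]

# Keep every other column, then append the split pieces at the end.
result = pd.concat([df.drop(columns=split_col), temp], axis=1)
print(result)
```

Unlike read.table, str.split leaves every piece as a string, so a column such as V3_6 ("18") would need pd.to_numeric if numbers are wanted.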
Is there a work around for using multi-character delimiters for `csv.reader`?
Here's the first part of my answer to the question "CSV writing strings of text that need a unique delimiter", adapted to work in Python 3.7:
import csv

# csv only accepts a single-character delimiter, so pick one that is
# very unlikely to appear in the data.
DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

with open('data.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=DELIMITER)
    writer.writerow(data)

with open('data.csv', 'r', newline='') as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
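The code above sidesteps the question by choosing a rare single character, because csv.reader only accepts a one-character delimiter. If a file already uses a multi-character delimiter, one possible workaround is to rewrite it to a single rare character on the fly, since csv.reader accepts any iterable of strings. This is a minimal sketch, assuming the hypothetical delimiter "*|*" and that chr(255) never occurs in the data. The replacement runs before CSV parsing, so a "*|*" inside a quoted field would also be replaced.

```python
import csv
import io

MULTI_DELIMITER = "*|*"      # hypothetical multi-character delimiter
SINGLE_DELIMITER = chr(255)  # assumed never to appear in the data


def read_multi_delimited(lines):
    """Yield parsed rows from lines that use MULTI_DELIMITER."""
    # Rewrite each line so csv.reader only ever sees a one-character delimiter.
    converted = (line.replace(MULTI_DELIMITER, SINGLE_DELIMITER) for line in lines)
    yield from csv.reader(converted, delimiter=SINGLE_DELIMITER)


# Stand-in for open('data.txt', newline='') so the example is self-contained.
sample = io.StringIO('itemA*|*itemB*|*"a, b: c"\n1*|*2*|*3\n')

rows = list(read_multi_delimited(sample))
for row in rows:
    print(row)
```

Because the conversion is a generator, the file is still read line by line and never loaded into memory as a whole.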
Use Multiple Character Delimiter in Python Pandas read_csv
The solution would be to use read_table instead of read_csv, passing the multi-character separator as an escaped regular expression. For a file.csv like this:
1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25
So, we could read this with:
import pandas as pd
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
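For a self-contained check, here is the same sample read from an in-memory string instead of file.csv. This sketch uses read_csv rather than read_table (both accept a regex sep); passing engine="python" explicitly avoids the warning pandas emits when it falls back from the C engine for regex separators. The raw string r"\*\|\*" escapes "*" and "|", which are both regex metacharacters.

```python
import io

import pandas as pd

# The sample file from above, held in memory instead of file.csv.
raw = "1*|*2*|*3*|*4*|*5\n12*|*12*|*13*|*14*|*15\n21*|*22*|*23*|*24*|*25\n"

# "*" and "|" are regex metacharacters, so both are escaped; the raw string
# keeps Python from interpreting the backslashes itself.
df = pd.read_csv(io.StringIO(raw), header=None, sep=r"\*\|\*", engine="python")
print(df)
```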
How can I parse a .txt with a delimiter that has multiple characters into a pandas df?
That's because a separator longer than one character is interpreted as a regular expression, as stated in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. In a regex, + means "one or more repetitions of the preceding character", and here there is no preceding character, hence the error "nothing to repeat".
Escaping the symbols should fix it.
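A small sketch of both the error and the fix, assuming a hypothetical separator "++" (the actual separator from the question is not shown here). re.escape backslash-escapes the regex metacharacters, so the escaped pattern matches the separator literally and can be passed as sep.

```python
import io
import re

import pandas as pd

# Hypothetical sample: fields separated by the multi-character token "++".
raw = "a++1++2\nb++3++4\n"

# A leading "+" is a quantifier with nothing before it, which is what
# produces the "nothing to repeat" error.
err = None
try:
    re.compile("++")
except re.error as exc:
    err = exc
    print("unescaped separator fails:", exc)

# re.escape turns "++" into r"\+\+", so each "+" is matched literally.
sep = re.escape("++")
df = pd.read_csv(io.StringIO(raw), sep=sep, header=None, engine="python")
print(df)
```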
How to read data with different separators?
I'd probably do this.
read.table(text = gsub(",", "\t", readLines("file.txt")))
V1 V2 V3 V4 V5
1 a 1 2 3 5
2 b 4 5 6 7
3 c 5 6 7 8
Unpacking that just a bit:
- readLines() reads the file into R as a character vector with one element for each line.
- gsub(",", "\t", ...) replaces every comma with a tab, so that now we've got lines with just one kind of separating character.
- The text = argument to read.table() lets it know you are passing it a character vector to be read directly (rather than the name of a file containing your text data).
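The same three steps translate almost line for line to Python. This is a sketch with hypothetical input: the contents of file.txt are not shown in the answer, so the lines below simply mix commas and spaces in a way that reproduces the table printed above.

```python
# Hypothetical file contents (stand-in for readLines("file.txt")).
lines = ["a 1,2,3 5", "b 4,5,6 7", "c 5,6,7 8"]

# Like gsub(",", "\t", ...): turn every comma into a tab.
normalized = [line.replace(",", "\t") for line in lines]

# Like read.table's default whitespace separator: split on any run of
# spaces or tabs, so every field is now separated the same way.
rows = [line.split() for line in normalized]

for row in rows:
    print(row)
```

With a real file, lines could come from open("file.txt").read().splitlines() instead of the hard-coded list.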