How to Read Data with Different Separators

How to read data with different separators?

I'd probably do this.

read.table(text = gsub(",", "\t", readLines("file.txt")))
#   V1 V2 V3 V4 V5
# 1  a  1  2  3  5
# 2  b  4  5  6  7
# 3  c  5  6  7  8

Unpacking that just a bit:

  • readLines() reads the file into R as a character vector with one element for each line.
  • gsub(",", "\t", ...) replaces every comma with a tab, so that now we've got lines with just one kind of separating character.
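  • The text = argument to read.table() lets it know you are passing it a character vector to be read directly (rather than the name of a file containing your text data).
  • With its default sep = "", read.table() treats any run of white space (spaces or tabs) as a single separator, so the new tabs and any existing spaces all split fields.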
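For comparison, the same normalize-then-split idea can be sketched in plain Python with the standard library. This is a minimal sketch, not a port of read.table(): the function name read_mixed_separators and the sample lines are hypothetical (the contents of file.txt are not shown), and every field stays a string.

```python
def read_mixed_separators(lines):
    """Split lines whose fields are separated by commas and/or whitespace."""
    rows = []
    for line in lines:
        # Step 1: rewrite every comma as a tab, leaving one kind of separator.
        normalized = line.replace(",", "\t")
        # Step 2: split on runs of whitespace (tabs and spaces alike).
        fields = normalized.split()
        if fields:  # skip blank lines
            rows.append(fields)
    return rows


# Hypothetical sample input mixing commas, tabs and spaces.
sample = ["a,1\t2,3 5", "b 4,5\t6,7", "c,5,6,7,8"]
for row in read_mixed_separators(sample):
    print(row)
```

Like whitespace splitting in read.table(), str.split() with no argument collapses runs of separators, so an empty field between two adjacent commas is dropped rather than kept as a blank value.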

How to read a CSV file into R which uses two types of separators in the file?

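A double-tap: read the file with read.csv(), which splits on commas, then parse the semicolon-separated first column (and its header) a second time with read.csv2(), whose default separator is ";":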

x1 <- read.csv("quux.csv", check.names = FALSE)
x2 <- read.csv2(text = x1[[1]], header = FALSE)
names(x2) <- unlist(read.csv2(text = names(x1)[1], header = FALSE))
cbind(x2, x1[,-1,drop=FALSE])
#    car_brand car_model total
# 1     Toyota      9289 29781
# 2       Seat     20981  1610
# 3 Volkswagen     11140   904
# 4     Suzuki     11640   658
# 5    Renault     13075   647
# 6       Ford     15855   553

check.names = FALSE is required because otherwise names(x1)[1] comes out looking like "car_brand..car_model". That name could still be split, but I thought it better to parse the original text.
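The same double-tap can be sketched with Python's standard csv module. The layout of quux.csv is not shown, so this sketch assumes, from the R code, that commas separate the outer columns and that the first column packs car_brand and car_model together separated by a semicolon. The sample text and the function names parse_two_separators and split_semicolons are hypothetical, and unlike read.csv() every value is left as a string.

```python
import csv
import io


def split_semicolons(field):
    """Second pass: parse one field as a ';'-separated record (like read.csv2)."""
    return next(csv.reader([field], delimiter=";"))


def parse_two_separators(text):
    """Two passes: split on ',' first, then split the first column on ';'."""
    records = list(csv.reader(io.StringIO(text)))  # first pass: comma-separated
    header, body = records[0], records[1:]
    names = split_semicolons(header[0]) + header[1:]
    rows = [split_semicolons(rec[0]) + rec[1:] for rec in body]
    return names, rows


# Hypothetical file contents in the assumed layout.
text = "car_brand;car_model,total\nToyota;9289,29781\nSeat;20981,1610\n"
names, rows = parse_two_separators(text)
print(names)
print(rows)
```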

How do I read a .txt file into R with different separators, and run on lines?

You can solve this in a few different ways. One approach is to import the data into a single column and then split that column at the appropriate places with tidyr::separate() or data.table::tstrsplit(). Here's an example with tidyr:

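# Use a separator symbol that is unlikely to appear in the file,
# so that each line is read into a single column (V1). quote = ""
# and comment.char = "" stop apostrophes and "#" in the text from
# breaking the lines apart.
data <- read.table("filename.txt", sep = "^", quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

# First split the column at the first " @", then at the first ": ".
# extra = "merge" keeps any later " @" or ": " inside the review text.
library(tidyr)
data %>%
  separate(V1, into = c("Date", "User"), sep = " @", extra = "merge") %>%
  separate(User, into = c("User", "Review"), sep = ": ", extra = "merge") -> data

# If you want to add back the @-sign to the usernames:
data$User <- paste0("@", data$User)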
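To see the splitting rule on its own, here is a minimal Python sketch of the same two-step split. It assumes, as the R code does, that each line looks like "<date> @<user>: <review>"; the sample lines and the function name split_review_line are hypothetical. str.partition() splits only at the first occurrence, so any later " @" or ": " inside the review text stays in the review.

```python
def split_review_line(line):
    """Split '<date> @<user>: <review>' into (date, user, review)."""
    # First split at the first " @", which separates the date from the rest.
    date, _, rest = line.partition(" @")
    # Then split the rest at the first ": ", which ends the username.
    user, _, review = rest.partition(": ")
    # Add the "@" back to the username, as the last R line does.
    return date, "@" + user, review


line = "2019-05-01 @alice: Great product: works as advertised, thanks @bob"
print(split_review_line(line))
```

A line without " @" comes back with just "@" as the username and an empty review, so you may want to filter those out.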

Python - Reading a data text file with different delimiters

Solution using pandas:

import pandas as pd

# sep is a regular expression here: split on ";", ":" or ","
data = pd.read_csv('data.txt',
                   sep=";|:|,",
                   header=None,
                   engine='python')

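Because sep is treated as a regular expression, every ;, : or , acts as a delimiter, so each value ends up in its own column. Separators longer than one character are interpreted as regular expressions and are handled by the Python parsing engine, which is why engine='python' is passed explicitly.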
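To see what that regular expression does to a single line, here is a standard-library sketch using re.split() with the same pattern. The sample lines are hypothetical, since data.txt is not shown.

```python
import re

# The same alternation passed as sep above: ";" or ":" or ",".
DELIMITERS = re.compile(r";|:|,")


def split_fields(line):
    """Split one line at every ';', ':' or ','."""
    return DELIMITERS.split(line.rstrip("\n"))


print(split_fields("1;2:3,4\n"))
print(split_fields("a,,b"))
```

Two adjacent delimiters produce an empty string here, which read_csv() would normally read as a missing value (NaN).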

Read txt file with multiple separators

Replace each [ with a newline and each ] and comma with a space and then read it in:

txt <- '["201801",111],["201802",222],["201803",333]'
read.table(text = chartr("[],", "\n  ", txt))

giving:

      V1  V2
1 201801 111
2 201802 222
3 201803 333
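Python's str.translate() with str.maketrans() does the same character-for-character substitution as chartr(), and, as with chartr(), both mapping strings must have the same length. Here is a standard-library sketch; shlex.split() stands in for read.table() by splitting on whitespace and stripping the double quotes. The function name parse_bracketed is hypothetical, and the int() conversion assumes every field is numeric.

```python
import shlex


def parse_bracketed(txt):
    """Parse '["201801",111],["201802",222],...' into rows of integers."""
    # "[" -> newline, "]" -> space, "," -> space (same mapping as chartr()).
    table = str.maketrans("[],", "\n  ")
    rows = []
    for line in txt.translate(table).splitlines():
        fields = shlex.split(line)  # whitespace split that also removes quotes
        if fields:  # skip the empty line created by the leading "["
            rows.append([int(f) for f in fields])
    return rows


txt = '["201801",111],["201802",222],["201803",333]'
print(parse_bracketed(txt))
```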

Multiple Separators for the same file input R

Try this:

# dummy data
df <- read.table(text="
Name Name1 *XYZ_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
Name Name2 *CCC_Name3_KB_MobApp_M-18-25_AU_PI ANDROID 2013-09-32 14:39:55.0 2013-10-16 13:58:00.0 0 218 4 93 1377907200000
", as.is = TRUE)

# replace "_" with "-" so that only one separator remains
df_V3 <- gsub(pattern = "_", replacement = "-", df$V3, fixed = TRUE)

# split on "-" and bind the pieces into a data frame, one row per line
df_V3 <- do.call(rbind.data.frame, strsplit(df_V3, split = "-"))

# output: put the split columns back in place of V3
output <- cbind(df[, 1:2],
                df_V3,
                df[, 4:ncol(df)])

Building on the comments on this answer, here is another related option, which uses read.table() instead of strsplit():

splitCol <- "V3"
temp <- read.table(text = gsub("-", "_", df[, splitCol]), sep = "_")
names(temp) <- paste(splitCol, seq_along(temp), sep = "_")
cbind(df[setdiff(names(df), splitCol)], temp)
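The same "normalize, then split" idea for a single column can be sketched in Python on rows held as lists of strings. The function name split_column is hypothetical, and the sample row is just the first four fields of the first dummy row above, shortened for brevity.

```python
def split_column(rows, col):
    """Replace field `col` of each row with its pieces split on '-' or '_'."""
    out = []
    for row in rows:
        # Turn every "-" into "_" so that only one separator remains, then split.
        pieces = row[col].replace("-", "_").split("_")
        out.append(row[:col] + pieces + row[col + 1:])
    return out


rows = [["Name", "Name1", "*XYZ_Name3_KB_MobApp_M-18-25_AU_PI", "ANDROID"]]
print(split_column(rows, 2))
```

Like the R versions, this assumes every value in the column splits into the same number of pieces; otherwise the rows end up with different lengths.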

Read csv in pandas with different separator (commas)

Use the regex separator [,]+, which matches one or more commas, so a run of commas counts as a single separator:

import pandas as pd
from io import StringIO

temp = u"""iBG,6141.6,6141.6,,3.0,,,ic"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="[,]+", header=None, engine='python')
print(df)
     0       1       2    3   4
0  iBG  6141.6  6141.6  3.0  ic
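The separator can be checked without pandas: re.split() with the same pattern treats each run of commas as one separator, which is why the empty fields disappear. A minimal standard-library sketch:

```python
import re

temp = "iBG,6141.6,6141.6,,3.0,,,ic"

# "[,]+" matches one or more consecutive commas, so ",," and ",,," act as one separator.
fields = re.split(r"[,]+", temp)
print(fields)
```

A leading or trailing comma still produces an empty string at that end (re.split(r"[,]+", ",a,") gives ['', 'a', '']).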

