How do you read multiple .txt files into R?
Thanks for all the answers!
In the meantime, I also hacked together a method of my own. Let me know if it is of any use:
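setwd("/path/to/directory")
# List only the .txt files in the working directory
files <- list.files(pattern = "\\.txt$")
# Start from an empty character vector so no stray element is prepended
data <- character(0)
for (f in files) {
  # Read each file as whitespace-separated character tokens
  tempData <- scan(f, what = "character")
  data <- c(data, tempData)
}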
Read multiple .txt files in R with scan()
As @MrFlick mentioned, you can use list.files to get all the text files in the working directory, and then use lapply to read them into a list.
filenames <- list.files(pattern = '\\.txt$')
result <- lapply(filenames, scan, what = "character", sep=NULL)
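Here is a small self-contained sketch of the same pattern. It writes two made-up files (a.txt and b.txt are hypothetical names) into a temporary directory so it doesn't depend on your working directory. Naming the list elements after the files is my addition, not part of the answer above, but it makes the result easier to index.

```r
# Create a scratch directory with two small text files (hypothetical names).
tmp <- file.path(tempdir(), "scan_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines("alpha beta", file.path(tmp, "a.txt"))
writeLines("gamma", file.path(tmp, "b.txt"))

# full.names = TRUE returns paths that work regardless of the working directory.
filenames <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Read each file as whitespace-separated character tokens.
result <- lapply(filenames, scan, what = "character", quiet = TRUE)

# Label each element with its file name so it can be looked up by name.
names(result) <- basename(filenames)
```

After this, result[["a.txt"]] holds the tokens read from the first file.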
Read multiple text files into R for text mining purposes
I often have this same problem. The textreadr package that I maintain is designed to make reading .csv, .pdf, .doc, and .docx documents and directories of these documents easy. It would reduce what you're doing to:
textreadr::read_dir("../data/InauguralSpeeches/")
Your example is not reproducible so I do it below (please make your example reproducible in the future).
library(textreadr)
## Minimal working example
dir.create('delete_me')
file.copy(dir(system.file("docs/Maas2011/pos", package = "textreadr"), full.names=TRUE), 'delete_me', recursive=TRUE)
write.csv(mtcars, 'delete_me/mtcars.csv')
write.csv(CO2, 'delete_me/CO2.csv')
cat('test\n\ntesting\n\ntester', file='delete_me/00_00.txt')
## the read in of a directory
read_dir('delete_me')
Output
The output below is a tibble in which the document column identifies each source file. Each line of a document becomes one row. Depending on what's in the csv files, this may not be fine-grained enough.
##     document content
## 1   0_9      Bromwell High is a cartoon comedy. It ra
## 2   00_00    test
## 3   00_00
## 4   00_00    testing
## 5   00_00
## 6   00_00    tester
## 7   1_7      If you like adult comedy cartoons, like
## 8   10_9     I'm a male, not given to women's movies,
## 9   11_9     Liked Stanley & Iris very much. Acting w
## 10  12_9     Liked Stanley & Iris very much. Acting w
## ..  ...      ...
## 141 mtcars   "Ferrari Dino",19.7,6,145,175,3.62,2.77,
## 142 mtcars   "Maserati Bora",15,8,301,335,3.54,3.57,1
## 143 mtcars   "Volvo 142E",21.4,4,121,109,4.11,2.78,18
Read multiple .txt files and add new column identifying file name in R
This should work if your read.table command is correct:
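myData_list <- lapply(files, function(x) {
  # Return NULL for files that cannot be read instead of stopping
  out <- tryCatch(read.table(x, header = FALSE, sep = ','), error = function(e) NULL)
  if (!is.null(out)) {
    # Record which file each row came from
    out$source_file <- x
  }
  return(out)
})
# rbindlist skips the NULL entries and stacks the rest
myData <- data.table::rbindlist(myData_list)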
In the past I found that you can spare yourself a lot of headaches by using data.table::fread instead of read.table. So you could consider this:
myData_list <- lapply(files, function(x) {
  out <- data.table::fread(x, header = FALSE)
  out$source_file <- x
  return(out)
})
myData <- data.table::rbindlist(myData_list)
You can add the tryCatch part back if necessary. Depending on how the files vector looks, basename() might be interesting to use on the column source_file.
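To illustrate the basename() suggestion, here is a minimal sketch. Note that it swaps in base R read.table() and do.call(rbind, ...) for fread() and rbindlist() so that it runs without extra packages. The directory and the file names first.txt and second.txt are made up for the example.

```r
# Scratch directory with two small comma-separated files (hypothetical names).
tmp <- file.path(tempdir(), "basename_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("1,2", "3,4"), file.path(tmp, "first.txt"))
writeLines("5,6", file.path(tmp, "second.txt"))

# Full paths, as you would typically get when reading from another directory.
files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

myData_list <- lapply(files, function(x) {
  out <- read.table(x, header = FALSE, sep = ",")
  # Store only the file name, not the whole path.
  out$source_file <- basename(x)
  out
})

# Stack the per-file data frames row-wise.
myData <- do.call(rbind, myData_list)
```

The source_file column then contains first.txt and second.txt rather than the full temporary paths.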
Reading multiple txt files into data frames and merging them into one
The following should work well. However, without sample data or a clearer description of what you want, it's hard to know for certain if this is what you are looking to accomplish.
# set working directory
setwd("C:/Users/path/to/my/files")
# read in all .txt files but skip the first 8 rows
Data.in <- lapply(list.files(pattern = "\\.txt$"), read.csv, header = TRUE, skip = 8)
# combine all of the tables by row into one
Data.in <- do.call(rbind, Data.in)
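If you want to try this without your own data, here is a rough sketch under some assumptions: each file has 8 lines of preamble, then a comma-separated header row, then data. The file names run1.txt and run2.txt and the columns id and value are invented for the demo.

```r
# Scratch directory so the demo does not touch the working directory.
tmp <- file.path(tempdir(), "skip_demo")
dir.create(tmp, showWarnings = FALSE)

# Each file: 8 lines of preamble, then a header row, then data.
preamble <- paste("metadata line", 1:8)
writeLines(c(preamble, "id,value", "1,10", "2,20"), file.path(tmp, "run1.txt"))
writeLines(c(preamble, "id,value", "3,30"), file.path(tmp, "run2.txt"))

paths <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# skip = 8 drops the preamble; the next line is then used as the header.
Data.in <- lapply(paths, read.csv, header = TRUE, skip = 8)

# rbind stacks rows, so every table needs the same column names.
Data.in <- do.call(rbind, Data.in)
```

If the files do not share the same columns, rbind will fail, which is usually the first thing to check when this pattern errors out.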
Read, subset and bind many .txt files in R
assign should usually be avoided, and in this case we don't need to create these objects in the global environment. Try using lapply.
#List all text files in the working directory
filenames <- list.files(pattern = '\\.txt$')
#Read every text file with header, skipping the 1st row.
#Keep only the 5th column after reading the data.
result <- lapply(filenames, function(x) read.table(x,skip = 1,header = TRUE)[,5])
result
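Since the question title also mentions binding, here is one possible way to combine the per-file vectors afterwards. This is only a sketch and a guess at what is wanted: the files x.txt and y.txt and their five columns a to e are invented, and the long-format data frame at the end is my assumption about a useful shape.

```r
# Scratch directory with two whitespace-separated files (hypothetical names).
tmp <- file.path(tempdir(), "subset_demo")
dir.create(tmp, showWarnings = FALSE)

# Each file: one line to skip, a header row, then the data.
writeLines(c("junk", "a b c d e", "1 2 3 4 5", "6 7 8 9 10"), file.path(tmp, "x.txt"))
writeLines(c("junk", "a b c d e", "11 12 13 14 15"), file.path(tmp, "y.txt"))

paths <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# Keep only the 5th column of each file; [, 5] returns a plain vector.
result <- lapply(paths, function(x) read.table(x, skip = 1, header = TRUE)[, 5])
names(result) <- basename(paths)

# Bind the vectors into one long data frame, keeping the file of origin.
combined <- data.frame(
  source = rep(names(result), lengths(result)),
  value = unlist(result, use.names = FALSE)
)
```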