unable to get column names when using skip along with read.csv
If you skip lines in a file, you skip each line in full, so if your header is in the first line and you skip 100 lines, the header line is skipped too. If you want to skip part of the file and still keep the headers, you'll need to read them separately:
headers <- names(read.csv("mycsvfile.csv",nrows=1))
mydf <- read.csv("mycsvfile.csv", header=F, col.names=headers, skip=100)
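For comparison, the same two-step idea (read the header first, then skip) can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the in-memory sample, the column names, and the variable n_skip are made up for illustration, and n_skip counts data rows after the header, not file lines.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for mycsvfile.csv: a header plus five data rows
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
n_skip = 3  # number of data rows to skip after the header (assumed value)

reader = csv.reader(sample)
headers = next(reader)                      # read the header line first
rows = list(islice(reader, n_skip, None))   # skip n_skip data rows, keep the rest
records = [dict(zip(headers, row)) for row in rows]

print(headers)
print(records)
```

Because the header is consumed before anything is skipped, the column names survive no matter how many rows are dropped.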
read.csv from list to get unique colnames
How do I change function(x) so that the second column has a colname similar to the file name?
datalist = lapply(file_list, function(x){
dat = read.csv(file=x, header=F, sep = "\t")
names(dat)[2] = x
return(dat)
})
This will put the name of the file as the name of the second column. If you want to edit the name, use gsub or substr (or similar) on x to modify the string.
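As a rough illustration of that kind of string clean-up, here is how it might look in Python, using pathlib to drop the directory and extension and re.sub in the role of gsub. The file paths and the "run_" prefix are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical file paths, standing in for file_list
file_list = ["data/run_01.tsv", "data/run_02.tsv"]

# Path(...).stem drops the directory and extension;
# re.sub then removes the assumed "run_" prefix
labels = [re.sub(r"^run_", "", Path(p).stem) for p in file_list]

print(labels)  # ['01', '02']
```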
Use csv.reader to read a file into a list but skip a specific column (Python)
In general, I would use pandas instead. Say you have a CSV file called test.csv:
a,b,c,d
1,2,3,4
5,6,7,8
We can read it using pandas:
import itertools
import pandas as pd
df = pd.read_csv('test.csv', skiprows=[0], usecols=[0,1,3], header=None)
print(df)
0 1 3
0 1 2 4
1 5 6 8
Then, you can generate the lists from rows as:
lists = df.values.tolist()
And finally into a single list:
merged = list(itertools.chain(*lists))
print(merged)
[1, 2, 4, 5, 6, 8]
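If you would rather stay with csv.reader, as the question asks, a minimal sketch could look like the following. It assumes the same contents as test.csv (held in memory here so the snippet is self-contained), and skip_col is an illustrative name for the zero-based index of the column to leave out.

```python
import csv
import io

# Stand-in for test.csv from the example above
sample = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")

skip_col = 2  # zero-based index of the column to leave out (column "c")

reader = csv.reader(sample)
next(reader)  # discard the header row

merged = []
for row in reader:
    # keep every field except the skipped column, converting each to int
    merged.extend(int(v) for i, v in enumerate(row) if i != skip_col)

print(merged)  # [1, 2, 4, 5, 6, 8]
```

With a real file, replace the StringIO object with open('test.csv', newline='').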
Skip specific rows using read.csv in R
One way to do this is with two read.csv commands. The first reads the headers and the second reads the data:
headers = read.csv(file, skip = 1, header = F, nrows = 1, as.is = T)
df = read.csv(file, skip = 3, header = F)
colnames(df)= headers
I've created the following text file to test this:
do not read
a,b,c
previous line are headers
1,2,3
4,5,6
The result is:
> df
a b c
1 1 2 3
2 4 5 6
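The same test file can be handled in Python by reading all lines once and picking out the header and data rows by position. This is only a sketch: the line positions (header on line 2, data from line 4) are taken from the test file above, and the values are assumed to be integers.

```python
import csv
import io

# Same test file as above, held in memory
sample = io.StringIO(
    "do not read\n"
    "a,b,c\n"
    "previous line are headers\n"
    "1,2,3\n"
    "4,5,6\n"
)

lines = list(csv.reader(sample))
headers = lines[1]   # the second line holds the column names
data = lines[3:]     # the data starts on the fourth line

# Pair each value with its column name, converting to int
records = [dict(zip(headers, map(int, row))) for row in data]

print(records)  # [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]
```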
R using fread colClasses or skip arguments to read csv with no column headers
I think the argument you're looking for is drop. Try:
require(data.table) # 1.9.2+
pp <- fread("AUDUSD-2013-05.csv", drop = 1)
Note that you can drop by name or position.
fread("AUDUSD-2013-05.csv", drop = c("columThree","anotherColumnName"))
fread("AUDUSD-2013-05.csv", drop = 10:15) # read all columns other than 10:15
And you can select by name or position, too.
fread("AUDUSD-2013-05.csv", select = 10:15) # read only columns 10:15
fread("AUDUSD-2013-05.csv", select = c("columnA","columnName2"))
These arguments were added in v1.9.2 (released to CRAN in Feb 2014) and are documented in ?fread. You'll need to upgrade to use them.
Reading in multiple CSVs with different numbers of lines to skip at start of file
The function fread from the package data.table automatically detects the number of rows to skip. The function is currently in the development stage.
Here is some example code:
require(data.table)
cat("blah\nblah\nblah\nVARIABLE,X1,X2\nA,1,2\n", file="myfile1.csv")
cat("blah\nVARIABLE,A1,A2\nA,1,2\n", file="myfile2.csv")
cat("blah\nblah\nVARIABLE,Z1,Z2\nA,1,2\n", file="myfile3.csv")
lapply(list.files(pattern = "myfile.*.csv"), fread)
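To give a feel for what automatic detection can mean, here is a deliberately simple Python heuristic. It is not fread's algorithm: the helper read_after_preamble is hypothetical, and it assumes the table runs to the end of the file and that no preamble line happens to have the same number of fields as the last row.

```python
import csv
import io


def read_after_preamble(text, delimiter=","):
    """Hypothetical helper: skip leading lines until the first line whose
    field count matches the final line, then treat that line as the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows = [r for r in rows if r]   # ignore blank lines
    width = len(rows[-1])           # field count of the last data row
    start = next(i for i, r in enumerate(rows) if len(r) == width)
    return rows[start], rows[start + 1:]


# Same file contents as in the R example, kept in memory here
files = {
    "myfile1.csv": "blah\nblah\nblah\nVARIABLE,X1,X2\nA,1,2\n",
    "myfile2.csv": "blah\nVARIABLE,A1,A2\nA,1,2\n",
    "myfile3.csv": "blah\nblah\nVARIABLE,Z1,Z2\nA,1,2\n",
}

parsed = {name: read_after_preamble(text) for name, text in files.items()}
for name, (header, data) in parsed.items():
    print(name, header, data)
```

Each file gets its own skip count, which is the point of the exercise: nothing about the number of preamble lines is hard-coded.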
Trailing space causing column names to not match in read_csv with usecols
I think you could combine your 2nd and 3rd options: manually set the column names by reading the first row and working out what the headers should be called on the fly.
Read the first line to get the current list of column names
headers_df = pd.read_csv("mydata.csv", nrows=1, header = None)
Convert the headers to a list
headers = headers_df.values.tolist()[0]
Fix the column names to remove the spaces
fixed_headers = [x.strip(' ') for x in headers]
Manually replace the file headers with the fixed ones, selecting the two that you need
d = pd.read_csv('mydata.csv', header=0, names=fixed_headers, usecols=['apple','banana'])
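If pandas is not a hard requirement, the same strip-then-select approach can be sketched with the standard csv module. The file contents below are hypothetical (a third column, cherry, is invented to show the selection), and the values stay as strings.

```python
import csv
import io

# Hypothetical file contents: note the trailing spaces in the header names
sample = io.StringIO("apple ,banana ,cherry\n1,2,3\n4,5,6\n")
wanted = ["apple", "banana"]

reader = csv.reader(sample)
fixed_headers = [h.strip() for h in next(reader)]      # strip whitespace from each name
idx = [fixed_headers.index(name) for name in wanted]   # positions of the wanted columns
rows = [[row[i] for i in idx] for row in reader]       # keep only those positions

print(fixed_headers)  # ['apple', 'banana', 'cherry']
print(rows)           # [['1', '2'], ['4', '5']]
```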