How to Get Column Names When Using Skip Along with Read.Csv

unable to get column names when using skip along with read.csv

If you skip lines in a file, you skip each line completely, so if your header is on the first line and you skip 100 lines, the header line is skipped too. If you want to skip part of the file and still keep the headers, you'll need to read them separately:

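# Read only the header line to capture the column names
headers <- names(read.csv("mycsvfile.csv", nrows=1))
# Skip the header plus the first 99 data rows, then reapply the names
mydf <- read.csv("mycsvfile.csv", header=FALSE, col.names=headers, skip=100)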
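
If you are doing the same thing in pandas, the two-step idea carries over. The following is only a sketch under assumptions: the in-memory sample stands in for mycsvfile.csv, and `n_skip` is an illustrative name for the number of data rows to drop after the header.

```python
import io
import pandas as pd

# Hypothetical sample standing in for mycsvfile.csv: a header plus five data rows.
sample = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

n_skip = 3  # number of data rows to skip after the header (illustrative)

# Step 1: read only the header line to capture the column names.
headers = pd.read_csv(io.StringIO(sample), nrows=0).columns.tolist()

# Step 2: skip the header line plus n_skip data rows, then reattach the names.
df = pd.read_csv(io.StringIO(sample), header=None, names=headers, skiprows=n_skip + 1)
print(df)
```

With a real file, pass the path in place of `io.StringIO(sample)` in both calls.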

read.csv from list to get unique colnames

How do I change function(x) so that the second column gets a column name based on the file name?

datalist = lapply(file_list, function(x){
    dat = read.csv(file=x, header=F, sep = "\t")
    names(dat)[2] = x
    return(dat)
})

This will put the name of the file as the name of the second column. If you want to edit the name, use gsub or substr (or similar) on x to modify the string.

Use csv.reader to read a file into a list but skip a specific column (Python)

In general, I would use pandas instead. Say you have the CSV file called test.csv:

a,b,c,d
1,2,3,4
5,6,7,8

We can read it using pandas:

import itertools
import pandas as pd

df = pd.read_csv('test.csv', skiprows=[0], usecols=[0,1,3], header=None)
print(df)
   0  1  3
0  1  2  4
1  5  6  8

Then, you can generate the lists from rows as:

lists = df.values.tolist()

And finally into a single list:

merged = list(itertools.chain(*lists))
print(merged)
[1, 2, 4, 5, 6, 8]
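
If you would rather stay with csv.reader, as the question asks, something like the following should also work. It is a minimal sketch: the data is the same as test.csv but held in memory, and `skip_index` is an illustrative name for the zero-based position of the column to drop.

```python
import csv
import io

# Same contents as test.csv above, held in memory so the sketch is self-contained.
sample = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

skip_index = 2  # zero-based position of the column to drop (column "c")

merged = []
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
for row in reader:
    # Keep every field except the one at skip_index, converting each to int.
    merged.extend(int(v) for i, v in enumerate(row) if i != skip_index)

print(merged)
```

For a file on disk, replace `io.StringIO(sample)` with a handle from `open('test.csv', newline='')`.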

Skip specific rows using read.csv in R

One way to do this is to use two read.csv calls: the first reads the headers and the second reads the data:

# Line 2 of the file holds the column names
headers = read.csv(file, skip = 1, header = FALSE, nrows = 1, as.is = TRUE)
# The data starts on line 4
df = read.csv(file, skip = 3, header = FALSE)
colnames(df) = headers

I've created the following text file to test this:

do not read
a,b,c
previous line are headers
1,2,3
4,5,6

The result is:

> df
  a b c
1 1 2 3
2 4 5 6
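
For comparison, pandas can do this in a single call, because `skiprows` accepts a list of zero-based line indices. This is a sketch using the same test file contents, held in memory here:

```python
import io
import pandas as pd

# Same test file as above, held in memory.
sample = "do not read\na,b,c\nprevious line are headers\n1,2,3\n4,5,6\n"

# Drop line 0 and line 2; the first remaining line ("a,b,c") becomes the header.
df = pd.read_csv(io.StringIO(sample), skiprows=[0, 2])
print(df)
```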

R using fread colClasses or skip arguments to read csv with no column headers

I think the argument you're looking for is drop. Try:

require(data.table)  # 1.9.2+
pp <- fread("AUDUSD-2013-05.csv", drop = 1)

Note that you can drop by name or position.

fread("AUDUSD-2013-05.csv", drop = c("columThree","anotherColumnName"))

fread("AUDUSD-2013-05.csv", drop = 10:15)  # read all columns other than 10:15

And you can select by name or position, too.

fread("AUDUSD-2013-05.csv", select = 10:15)  # read only columns 10:15

fread("AUDUSD-2013-05.csv", select = c("columnA","columnName2"))

These arguments were added in v1.9.2 (released to CRAN in Feb 2014) and are documented in ?fread. If you are on an older version, you'll need to upgrade to use them.

Reading in multiple CSVs with different numbers of lines to skip at start of file

The function fread from the data.table package automatically detects how many rows to skip at the start of a file. The function was still under development when this answer was written.

Here is an example:

require(data.table)

cat("blah\nblah\nblah\nVARIABLE,X1,X2\nA,1,2\n", file="myfile1.csv")
cat("blah\nVARIABLE,A1,A2\nA,1,2\n", file="myfile2.csv")
cat("blah\nblah\nVARIABLE,Z1,Z2\nA,1,2\n", file="myfile3.csv")

# Escape the dot and anchor the pattern so that only myfile*.csv files match
lapply(list.files(pattern = "^myfile.*\\.csv$"), fread)
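
pandas does not detect the preamble for you, so a rough Python counterpart has to locate the header line itself. The sketch below is not what fread does internally; it simply scans for a known header prefix. `read_after_preamble` and `header_prefix` are hypothetical names, and the file contents are the same three examples held in memory.

```python
import io
import pandas as pd

def read_after_preamble(text, header_prefix="VARIABLE"):
    """Hypothetical helper: find the header line by its prefix and parse from there."""
    lines = text.splitlines()
    # Index of the first line that starts with the known header prefix.
    # Raises StopIteration if no such line exists.
    start = next(i for i, line in enumerate(lines) if line.startswith(header_prefix))
    return pd.read_csv(io.StringIO(text), skiprows=start)

# Same contents as the three example files above, held in memory.
files = {
    "myfile1.csv": "blah\nblah\nblah\nVARIABLE,X1,X2\nA,1,2\n",
    "myfile2.csv": "blah\nVARIABLE,A1,A2\nA,1,2\n",
    "myfile3.csv": "blah\nblah\nVARIABLE,Z1,Z2\nA,1,2\n",
}

frames = {name: read_after_preamble(text) for name, text in files.items()}
for name, frame in frames.items():
    print(name, list(frame.columns))
```

This only works when you know something about the header in advance, here that it begins with VARIABLE.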

Trailing space causing column names to not match in read_csv with usecols

I think you could combine your 2nd and 3rd options: set the column names manually by reading the first row and working out on the fly what the headers should be called.

Read the first line to get the current column names

import pandas as pd

headers_df = pd.read_csv("mydata.csv", nrows=1, header=None)

Convert the headers to a list

headers = headers_df.values.tolist()[0]

Fix the column names to remove the spaces

fixed_headers = [x.strip(' ') for x in headers]

Manually replace the file headers with the fixed ones, selecting the two that you need

d = pd.read_csv("mydata.csv", header=0, names=fixed_headers, usecols=['apple','banana'])
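
Put together, the steps above can be tried end to end with something like the following. It is a self-contained sketch: the sample data, including the third column name `cherry`, is made up for illustration and read from memory rather than from a file.

```python
import io
import pandas as pd

# Hypothetical data: every header name carries a trailing space.
sample = "apple ,banana ,cherry \n1,2,3\n4,5,6\n"

# Read only the header row as data and strip whitespace from each name.
raw_headers = pd.read_csv(io.StringIO(sample), nrows=1, header=None).values.tolist()[0]
fixed_headers = [h.strip() for h in raw_headers]

# Re-read with header=0 so the original header row is replaced by the clean names.
d = pd.read_csv(
    io.StringIO(sample),
    header=0,
    names=fixed_headers,
    usecols=["apple", "banana"],
)
print(d)
```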