How to Read Column Names 'As Is' from CSV File

Reading column names alone in a csv file

You can read the header with the reader's next() method, which returns the next row of the reader's iterable object as a list. You can then collect the remaining rows of the file into a list.

import csv
with open("C:/path/to/.filecsv", "rb") as f:
    reader = csv.reader(f)
    i = reader.next()    # header row (Python 2 only)
    rest = list(reader)  # remaining rows

Now i holds the column names as a list.

print i
>>>['id', 'name', 'age', 'sex']

Also note that reader.next() does not work in Python 3. Instead, use the built-in next() to get the first line of the csv immediately after creating the reader, like so:

import csv
# Python 3: open in text mode, not "rb"; the csv docs recommend newline=""
with open("C:/path/to/.filecsv", newline="") as f:
    reader = csv.reader(f)
    i = next(reader)

print(i)
>>>['id', 'name', 'age', 'sex']
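The same pattern can be tried without a file on disk. This is a minimal sketch for Python 3, assuming a small made-up CSV held in an io.StringIO object (the sample rows and the variable names header, rows and empty_header are illustrative, not from the original answer). Passing a default to next() avoids a StopIteration error when the file is empty.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real file.
sample = io.StringIO("id,name,age,sex\n1,Ann,30,F\n2,Bob,25,M\n")

reader = csv.reader(sample)
header = next(reader, None)  # first row, or None if there is no row at all
rows = list(reader)          # everything after the header

print(header)
print(rows)

# An empty input has no header row, so the default (None) is returned.
empty_header = next(csv.reader(io.StringIO("")), None)
print(empty_header)
```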

Read CSV items with column name

You are looking for DictReader

import csv

with open('info.csv') as f:
    reader = csv.DictReader(f, delimiter=';')
    for row in reader:
        name = row['name']
        blah = row['blah']

To quote from the csv module documentation:

Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter.
...
If the fieldnames parameter is omitted, the values in the first row of
the csvfile will be used as the fieldnames.
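Both behaviours described in the quote can be checked with a short sketch. It assumes made-up semicolon-separated data in memory (the sample values are illustrative); reader.fieldnames is the DictReader attribute that exposes the column names.

```python
import csv
import io

# Hypothetical data; the first row supplies the field names.
sample = io.StringIO("name;blah\nAnn;x\nBob;y\n")
reader = csv.DictReader(sample, delimiter=";")

columns = reader.fieldnames               # taken from the first row
names = [row["name"] for row in reader]   # the remaining rows are data

print(columns)
print(names)

# With an explicit fieldnames argument, the first row is treated as data.
headerless = csv.DictReader(io.StringIO("Ann;x\n"),
                            fieldnames=["name", "blah"], delimiter=";")
first = next(headerless)
print(first["name"], first["blah"])
```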

Read column names from a csv file and subset a dataframe

I have managed to fix the problem as follows:

  1. Create a csv file, named column_names.csv, that contains the desired column names (see content of the csv file below)
    RHO_1 RHO_2 RHO_3
  2. Use the following code:
df1 <-  read.table(file = "/Users/kotsios/Desktop/RCODE_CLUSTERING/auxilliary_codes/column_names.csv")
names(df1) <- as.matrix(df1[1, ])
df1 <- df1[-1, ]

#create a dataframe:
RHO_1 <- c("Tom", "Dick", "Harry", "RHO_1" ,"John","RHO_2", "Paul", "George","RHO_3", "Ringo")
RHO_2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
RHO_3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
RHO_4 <- c(11, 21, 31, 41, 51, 61, 71, 81, 91, 101)
df2 <- data.frame(RHO_1, RHO_2,RHO_3,RHO_4)

#keep the desired column names
df5 <- df2[, (colnames(df2) %in% colnames(df1)) ]

How to call values by column name after reading in CSV

The variables col1, col2, and col3 are reassigned on every pass through the for loop, so once the loop finishes they only hold the values from the last line (Python does not give a for loop its own scope). Here are two suggestions I have:

  1. The quickest way to perform a manipulation of your column data might be to insert a statement before you do data.append(). In other words, if you wanted to add 5 to column 2, you could do something like this:
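data = []

with open('sample.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline so it does not end up in col3
        col1, col2, col3 = line.rstrip('\n').split('\t')

        # Values read from a file are strings; convert before doing arithmetic
        # (this assumes column 2 holds integers)
        col2 = int(col2) + 5  # Modify column before appending

        data.append([col1, col2, col3])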

  2. If you need all of the data to be collected first, and then you'd like to modify it afterward in a different step, you can start another for loop. Keep in mind that this roughly doubles the looping work, since you go over the data twice instead of once. You can use a Python feature often called "list unpacking" (sequence unpacking) to get your column variables back, like so:
data = []

with open('sample.txt', 'r') as file:
    for line in file:
        col1, col2, col3 = line.rstrip('\n').split('\t')
        data.append([col1, col2, col3])

modified_data = []

for row in data:
    col1, col2, col3 = row  # This is list unpacking

    ...  # (do something with columns here)

    modified_data.append([col1, col2, col3])
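As a concrete version of the two-pass approach, here is a minimal sketch. It assumes tab-separated input whose second column holds integers, and uses made-up in-memory data in place of sample.txt (the values and the name modified are illustrative).

```python
import io

# Hypothetical stand-in for sample.txt: three tab-separated columns.
sample = io.StringIO("a\t1\tx\nb\t2\ty\n")

# First pass: collect the rows, converting column 2 from text to int.
data = []
for line in sample:
    col1, col2, col3 = line.rstrip("\n").split("\t")
    data.append([col1, int(col2), col3])

# Second pass: unpack each row again and add 5 to column 2.
modified = [[c1, c2 + 5, c3] for c1, c2, c3 in data]

print(modified)
```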

Reading Column Names and Column Values for Extremely Large File in R

Here is a way.

1. Column names

The column names are read with readLines, setting n = 1 in order to read just the column header line. Then scan with sep = "," breaks the line into column names.

library(sqldf)

col_names <- readLines(tmpfile, n = 1)
col_names
#[1] "mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb"

tc <- textConnection(col_names)
col_names <- scan(tc, sep = ",", what = character())
close(tc)

2. Data

The 4th column is "hp". Read only that one with read.csv.sql. The SQL statement is put together with sprintf.

col_names[4]
#[1] "hp"

SQL <- sprintf("select %s from file", col_names[4])
SQL
#[1] "select hp from file"

hp <- read.csv.sql(tmpfile, sql = SQL)
str(hp)
#'data.frame': 6718464 obs. of 1 variable:
# $ hp: int 110 110 93 110 175 105 245 62 95 123 ...
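For comparison, the same two-step idea (read only the header line, then pull out a single column) can be sketched in Python with the standard csv module. This is not the R answer's method, only an analogue; it assumes a small made-up sample in memory rather than the large file, and streams the rows so that only the requested column is kept.

```python
import csv
import io

# Hypothetical sample with the first four columns of the header shown above.
sample = io.StringIO("mpg,cyl,disp,hp\n21.0,6,160.0,110\n22.8,4,108.0,93\n")

reader = csv.reader(sample)
col_names = next(reader)        # step 1: header line only
wanted = col_names[3]           # the 4th column
idx = col_names.index(wanted)

# step 2: stream the remaining rows, keeping just that column
hp = [int(row[idx]) for row in reader]

print(wanted)
print(hp)
```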

Read csv file including the column names as values

Just set the header parameter to None. Its default value is 'infer', which means the column names are inferred from the first line of the file.

x = pd.read_csv(csv_filename, header=None)
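The difference is easy to see on a tiny example. This sketch assumes pandas is installed and uses made-up in-memory data (the values and the names with_header and no_header are illustrative): with header=None the columns get integer labels and the original header line becomes the first data row.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for csv_filename.
csv_text = "id,name,age\n1,Ann,30\n2,Bob,25\n"

with_header = pd.read_csv(io.StringIO(csv_text))             # default: header='infer'
no_header = pd.read_csv(io.StringIO(csv_text), header=None)  # first line kept as data

print(list(with_header.columns))
print(list(no_header.columns))
print(no_header.iloc[0].tolist())
print(len(with_header), len(no_header))
```

Because the header strings now share columns with the data, every column in no_header is read as text rather than as numbers.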

Pandas read csv using column names included in a list

Use a callable for usecols, i.e. df = pd.read_csv(filename, header=0, usecols=lambda c: c in columns_to_use). From the docs of the usecols parameter:

If callable, the callable function will be evaluated against the
column names, returning names where the callable function evaluates to
True.

Working example that will only read col1 and not throw an error on missing col3:

import pandas as pd
import io

s = """col1,col2
1,2"""

df = pd.read_csv(io.StringIO(s), usecols=lambda c: c in ['col1', 'col3'])

