Reading column names alone in a csv file
You can read the header with the next() function, which returns the next row of the reader's iterable object as a list. You can then add the rest of the file's contents to a list.
import csv
with open("C:/path/to/.filecsv", "rb") as f:
    reader = csv.reader(f)
    i = reader.next()
    rest = list(reader)
Now i holds the column names as a list.
print i
>>>['id', 'name', 'age', 'sex']
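Also note that reader.next() does not work in Python 3. Instead, use the built-in next() to get the first line of the csv immediately after creating the reader. In Python 3 the file must also be opened in text mode (with newline="") rather than "rb", like so: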
import csv
with open("C:/path/to/.filecsv", newline="") as f:
    reader = csv.reader(f)
    i = next(reader)
    print(i)
>>>['id', 'name', 'age', 'sex']
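The Python 3 approach above can be tried without a file on disk. The following is a minimal sketch, assuming made-up sample data and a hypothetical helper named read_header_and_rows; it passes a default to next() so that an empty file yields None instead of raising StopIteration.

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
sample = "id,name,age,sex\n1,Ann,34,F\n2,Bob,41,M\n"

def read_header_and_rows(f):
    """Return (header, rows) from an open text-mode CSV file object."""
    reader = csv.reader(f)
    header = next(reader, None)  # None instead of StopIteration on an empty file
    rows = list(reader)          # the reader continues after the header row
    return header, rows

header, rows = read_header_and_rows(io.StringIO(sample))
print(header)  # ['id', 'name', 'age', 'sex']
print(rows)    # [['1', 'Ann', '34', 'F'], ['2', 'Bob', '41', 'M']]
```

With a real file, the same helper would be called inside a with open(path, newline="") as f: block.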
Read CSV items with column name
You are looking for DictReader
import csv
with open('info.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter=';')
    for row in reader:
        name = row['name']
        blah = row['blah']
To quote from the documentation:
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter.
...
If the fieldnames parameter is omitted, the values in the first row of
the csvfile will be used as the fieldnames.
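Both behaviours described in the quote can be seen in a small sketch. The sample data and column names below are made up for illustration; only csv.DictReader and its fieldnames and delimiter parameters come from the standard library.

```python
import csv
import io

# Hypothetical semicolon-delimited data; the column names are made up.
sample = "name;blah\nAnn;x\nBob;y\n"

# No fieldnames given: the first row supplies the keys.
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
print(reader.fieldnames)  # ['name', 'blah']

names = [row["name"] for row in reader]
print(names)  # ['Ann', 'Bob']

# Explicit fieldnames: the first row is treated as data, not as a header.
headerless = "Ann;x\nBob;y\n"
reader2 = csv.DictReader(
    io.StringIO(headerless), fieldnames=["name", "blah"], delimiter=";"
)
first = next(reader2)
print(first["blah"])  # x
```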
Read column names from a csv file and subset a dataframe
I managed to fix the problem as follows:
- Create a csv file, named column_names.csv, that contains the desired column names (see the content of the csv file below):
RHO_1 RHO_2 RHO_3
- Use the following code:
df1 <- read.table(file = "/Users/kotsios/Desktop/RCODE_CLUSTERING/auxilliary_codes/column_names.csv")
names(df1) <- as.matrix(df1[1, ])
df1 <- df1[-1, ]
#create a dataframe:
RHO_1 <- c("Tom", "Dick", "Harry", "RHO_1" ,"John","RHO_2", "Paul", "George","RHO_3", "Ringo")
RHO_2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
RHO_3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
RHO_4 <- c(11, 21, 31, 41, 51, 61, 71, 81, 91, 101)
df2 <- data.frame(RHO_1, RHO_2,RHO_3,RHO_4)
#keep the desired column names
df5 <- df2[, (colnames(df2) %in% colnames(df1)) ]
How to call values by column name after reading in CSV
You've defined the variables col1, col2, and col3 inside the for loop, where they are reassigned on every iteration. Once the loop finishes, they hold only the values from the last line, so you cannot use them to reach the rest of the data. Here are two suggestions I have:
- The quickest way to manipulate your column data might be to insert a statement before you call data.append(). In other words, if you wanted to add 5 to column 2, you could do something like this:
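data = []
with open('sample.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline so it does not end up in col3
        col1, col2, col3 = line.rstrip('\n').split('\t')
        # Values are read as strings; this assumes column 2 holds integers
        col2 = int(col2) + 5  # Modify column before appending
        data.append([col1, col2, col3])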
- If you need to collect all of the data first and then modify it in a separate step, you can start another for loop. Keep in mind that the script now loops over the data twice instead of once, which adds to its running time. You can use a Python feature called "list unpacking" to get your column variables back, like so:
data = []
with open('sample.txt', 'r') as file:
    for line in file.readlines():
        col1, col2, col3 = line.split('\t')
        data.append([col1, col2, col3])
modified_data = []
for row in data:
    col1, col2, col3 = row  # This is list unpacking
    ...  # (do something with columns here)
    modified_data.append([col1, col2, col3])
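If the goal is to refer to a whole column by name after reading, one option is to collect the values into a dictionary keyed by column name. The sketch below is one possible arrangement, not the approach from the question: the helper read_columns, the sample data, and the names col1, col2, col3 are all assumptions for illustration.

```python
import csv
import io

# Hypothetical tab-separated input with three columns and no header row.
sample = "a\t1\tx\nb\t2\ty\nc\t3\tz\n"

def read_columns(f, names):
    """Read tab-separated rows into a dict mapping column name to a list of values."""
    columns = {name: [] for name in names}
    for row in csv.reader(f, delimiter="\t"):
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

columns = read_columns(io.StringIO(sample), ["col1", "col2", "col3"])

# Values are read as strings, so convert before doing arithmetic.
columns["col2"] = [int(v) + 5 for v in columns["col2"]]
print(columns["col2"])  # [6, 7, 8]
```

Every column stays available by name after the loop, at the cost of holding the whole file in memory.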
Reading Column Names and Column Values for Extremely Large File R
Here is a way.
1. Column names
The column names are read with readLines, setting n = 1, in order to read just the column headers line. Then scan with sep = "," will break the line into column names.
library(sqldf)
col_names <- readLines(tmpfile, n = 1)
col_names
#[1] "mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb"
tc <- textConnection(col_names)
col_names <- scan(tc, sep = ",", what = character())
close(tc)
2. Data
The 4th column is "hp". Read only that one with read.csv.sql. The SQL statement is put together with sprintf.
col_names[4]
#[1] "hp"
SQL <- sprintf("select %s from file", col_names[4])
SQL
#[1] "select hp from file"
hp <- read.csv.sql(tmpfile, sql = SQL)
str(hp)
#'data.frame': 6718464 obs. of 1 variable:
# $ hp: int 110 110 93 110 175 105 245 62 95 123 ...
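For readers working in Python rather than R, a rough analogue of the same idea (read the header first, then pull a single column without loading the whole file) can be sketched with the standard csv module. This is not the sqldf approach shown above; the helper read_one_column and the sample data are assumptions for illustration.

```python
import csv
import io

# Hypothetical sample with the first four column names from the R output above.
sample = "mpg,cyl,disp,hp\n21,6,160,110\n22.8,4,108,93\n"

def read_one_column(f, index):
    """Return (column_name, generator of values) for the zero-based column index."""
    reader = csv.reader(f)
    header = next(reader)                    # first line holds the column names
    values = (row[index] for row in reader)  # lazy: rows are read one at a time
    return header[index], values

name, values = read_one_column(io.StringIO(sample), 3)  # 4th column
hp = list(values)
print(name)  # hp
print(hp)    # ['110', '93']
```

Because the values come from a generator, they must be consumed while the file is still open; for a very large file they can be processed in a loop instead of being collected into a list.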
Read csv file including the column names as values
Just set the header parameter to None, because the default value is 'infer' (column names are inferred from the first line of the file).
x = pd.read_csv(csv_filename, header=None)
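The difference between the default and header=None can be checked on a small in-memory example. The sample data below is made up; the sketch assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for csv_filename.
sample = "id,name\n1,Ann\n2,Bob\n"

# Default (header='infer'): the first line becomes the column names.
inferred = pd.read_csv(io.StringIO(sample))
print(list(inferred.columns))  # ['id', 'name']
print(len(inferred))           # 2

# header=None: columns are numbered and the first line is kept as a data row.
raw = pd.read_csv(io.StringIO(sample), header=None)
print(list(raw.columns))       # [0, 1]
print(raw.iloc[0].tolist())    # ['id', 'name']
print(len(raw))                # 3
```

Note that with header=None every value in a column shares a column with the former header text, so numeric columns are read as strings.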
Pandas read csv using column names included in a list
Use a callable for usecols, i.e. df = pd.read_csv(filename, header=0, usecols=lambda c: c in columns_to_use). From the docs of the usecols parameter:
If callable, the callable function will be evaluated against the
column names, returning names where the callable function evaluates to
True.
Working example that will only read col1 and not throw an error on the missing col3:
import pandas as pd
import io
s = """col1,col2
1,2"""
df = pd.read_csv(io.StringIO(s), usecols=lambda c: c in ['col1', 'col3'])