How to Parse a CSV File to Grab the Column Names First Then the Rows That Relate to It

How do I parse a CSV file to grab the column names first, then the rows that relate to them?

For reading it all at once you can use:

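// FILE_SKIP_EMPTY_LINES only takes effect together with FILE_IGNORE_NEW_LINES
$csv = array_map("str_getcsv", file("file1.csv", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);  // the first row holds the column names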

To turn each row into an associative array keyed by the column names, you can then apply:

foreach ($csv as $i => $row) {
    // Map each value onto its column name; fails if a row's field count differs from the header
    $csv[$i] = array_combine($keys, $row);
}
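For comparison, here is a minimal Python sketch of the same two steps (header first, then each row keyed by the header), assuming a comma-separated file with a single header row. It uses `csv.reader` rather than splitting lines by hand, so quoted fields that contain commas survive intact; the function name `rows_as_dicts` is hypothetical, and `io.StringIO` stands in for a real file.

```python
import csv
import io


def rows_as_dicts(f):
    """Read the header row, then map every remaining row onto it.

    Mirrors the PHP array_shift + array_combine approach above.
    """
    reader = csv.reader(f)
    keys = next(reader)  # first row: the column names
    # dict(zip(...)) plays the role of array_combine; blank lines come back
    # from csv.reader as empty lists and are skipped
    return [dict(zip(keys, row)) for row in reader if row]


sample = io.StringIO('id,name\n1,"Smith, Ann"\n\n2,Bob\n')
print(rows_as_dicts(sample))
```

Unlike `array_combine`, which fails when the counts differ, `zip` silently truncates to the shorter sequence, so check row lengths yourself if the data may be ragged.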

How to call values by column name after reading in CSV

You assign the variables col1, col2, and col3 inside the for loop, so every iteration overwrites them. Python loops do not create their own scope, so the names still exist after the loop ends, but they hold only the values from the last line; the earlier rows are lost unless you store them. Here are two suggestions:

  1. The quickest way to manipulate your column data is to add a statement before data.append(). For example, to add 5 to column 2 (converting it from a string to a number first), you could do something like this:
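data = []

with open('sample.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline so col3 does not end with '\n'
        col1, col2, col3 = line.rstrip('\n').split('\t')

        col2 = int(col2) + 5  # split() returns strings, so convert before doing arithmetic

        data.append([col1, col2, col3])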

  2. If you need all of the data to be collected first and then want to modify it in a separate step, you can start another for loop. Keep in mind that this loops over the data twice instead of once, so the script does extra work. You can use a Python feature called sequence unpacking (often called list unpacking) to get your column variables back, like so:
data = []

with open('sample.txt', 'r') as file:
    for line in file:
        col1, col2, col3 = line.rstrip('\n').split('\t')
        data.append([col1, col2, col3])

modified_data = []

for row in data:
    col1, col2, col3 = row  # This is sequence (list) unpacking

    col2 = int(col2) + 5  # (do something with columns here)

    modified_data.append([col1, col2, col3])
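If what you really want is to refer to whole columns by name after the loop, one option (not part of the original answer) is to transpose the rows into a dict of column lists. This sketch assumes a header-less, tab-separated file with exactly three fields per line; the names `col1`, `col2`, `col3` and the function `read_columns` are hypothetical, and `io.StringIO` stands in for `sample.txt`.

```python
import csv
import io


def read_columns(f, names=("col1", "col2", "col3")):
    """Return a dict mapping each (hypothetical) column name to a list of its values.

    Assumes a header-less, tab-separated file with len(names) fields per line.
    """
    rows = [row for row in csv.reader(f, delimiter="\t") if row]  # skip blank lines
    # zip(*rows) transposes rows into columns: (all col1 values), (all col2 values), ...
    return {name: list(values) for name, values in zip(names, zip(*rows))}


sample = io.StringIO("a\t1\tx\nb\t2\ty\n")
cols = read_columns(sample)
cols["col2"] = [int(v) + 5 for v in cols["col2"]]  # modify a whole column at once
print(cols)
```

This still reads the file only once; the transpose happens in memory, so it suits files that fit comfortably in RAM.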

Reading column names alone in a csv file

You can read the header with the next() function, which returns the next row of the reader's iterable object as a list. Then you can read the rest of the file into a list.

import csv
with open("C:/path/to/file.csv", "rb") as f:  # Python 2: csv files are opened in binary mode
    reader = csv.reader(f)
    i = reader.next()    # first row: the column names
    rest = list(reader)  # all remaining rows

Now i holds the column names as a list.

print i
>>>['id', 'name', 'age', 'sex']

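Also note that reader.next() does not work in Python 3. Instead, use the built-in next() to get the first line of the CSV immediately after creating the reader. In Python 3, also open the file in text mode (with newline='') rather than "rb", because the csv module expects strings, not bytes: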

import csv
with open("C:/path/to/file.csv", newline="") as f:  # text mode, as the Python 3 csv module requires
    reader = csv.reader(f)
    i = next(reader)  # built-in next() replaces reader.next()

print(i)
>>>['id', 'name', 'age', 'sex']
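Putting both halves together for Python 3, here is a minimal sketch that returns the header and the remaining rows, the equivalent of the Python 2 `i`/`rest` example. The function name `read_header_and_rows` is hypothetical, and `io.StringIO` stands in for a real file opened with `newline=""`.

```python
import csv
import io


def read_header_and_rows(f):
    """Split a CSV stream into (header, rows) the Python 3 way.

    next(reader, None) returns None instead of raising StopIteration
    when the file is empty.
    """
    reader = csv.reader(f)
    header = next(reader, None)  # built-in next(), not reader.next()
    rows = list(reader)          # everything after the header
    return header, rows


sample = io.StringIO("id,name,age,sex\n1,Ann,34,F\n2,Bob,29,M\n")
header, rows = read_header_and_rows(sample)
print(header)  # ['id', 'name', 'age', 'sex']
```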

Read CSV items with column name

You are looking for csv.DictReader:

import csv

with open('info.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter=';')
    for row in reader:
        # each row is a dict keyed by the names in the header row
        name = row['name']
        blah = row['blah']

To quote from the Python documentation for csv.DictReader:

Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter.
...
If the fieldnames parameter is omitted, the values in the first row of
the csvfile will be used as the fieldnames.
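To see the two behaviours described in that quote side by side, here is a small sketch using an in-memory, semicolon-separated sample; the data and the column names are illustrative only.

```python
import csv
import io

data = "name;blah\nAnn;1\nBob;2\n"

# fieldnames omitted: the first row supplies the keys
reader = csv.DictReader(io.StringIO(data), delimiter=";")
print(reader.fieldnames)  # ['name', 'blah']
names = [row["name"] for row in reader]

# fieldnames given: every row, including the first, is treated as data
raw = csv.DictReader(io.StringIO(data), delimiter=";", fieldnames=["a", "b"])
first = dict(next(raw))  # dict() so the result looks the same on every Python 3 version
print(first)  # {'a': 'name', 'b': 'blah'}
```

Accessing `reader.fieldnames` reads the header row on demand, so the subsequent loop starts at the first data row.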

Reading Column Names and Column Values for Extremely Large File R

Here is a way.

1. Column names

The column names are read with readLines, setting n = 1 so that only the header line is read. Then scan with sep = "," breaks that line into the individual column names.

library(sqldf)

col_names <- readLines(tmpfile, n = 1)
col_names
#[1] "mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb"

tc <- textConnection(col_names)
col_names <- scan(tc, sep = ",", what = character())
close(tc)

2. Data

The 4th column is "hp". Read only that column with read.csv.sql; the SQL statement is built with sprintf.

col_names[4]
#[1] "hp"

SQL <- sprintf("select %s from file", col_names[4])
SQL
#[1] "select hp from file"

hp <- read.csv.sql(tmpfile, sql = SQL)
str(hp)
#'data.frame': 6718464 obs. of 1 variable:
# $ hp: int 110 110 93 110 175 105 245 62 95 123 ...
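The answer above uses R with sqldf. For readers working in Python, a comparable idea (a swapped-in technique, not what the R answer does) is to read the header first and then stream the rows, keeping only the wanted column, so the full table is never held in memory. This is a sketch assuming a plain comma-separated file with a header row; `read_one_column` is a hypothetical helper, and `io.StringIO` stands in for the large file.

```python
import csv
import io


def read_one_column(f, column):
    """Stream a CSV file and collect a single column, looked up by header name.

    Only the current row plus the collected values are kept in memory.
    """
    reader = csv.reader(f)
    header = next(reader)       # column names only, like readLines(n = 1)
    idx = header.index(column)  # raises ValueError if the column is absent
    return [row[idx] for row in reader if row]


sample = io.StringIO("mpg,cyl,disp,hp\n21,6,160,110\n22.8,4,108,93\n")
hp = [int(v) for v in read_one_column(sample, "hp")]
print(hp)
```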

