Reading csv file and want to skip first two columns
You should use the csv module in the standard library. You might need to pass additional keyword arguments depending on the format of your CSV file.
import csv

with open('my_csv_file', 'r') as fin:
    reader = csv.reader(fin)
    for line in reader:
        print(line[2:])
        # do something with the rest of the columns...
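As a self-contained sketch (the file name and contents here are invented for illustration), the same slicing works on any row a csv.reader yields, including the header:

```python
import csv
import io

# In-memory stand-in for 'my_csv_file': two columns to skip,
# followed by the columns we actually want.
raw = "id,name,score,grade\n1,alice,91,A\n2,bob,78,B\n"

rows = []
reader = csv.reader(io.StringIO(raw))
for line in reader:
    rows.append(line[2:])  # drop the first two columns of every row

print(rows)  # [['score', 'grade'], ['91', 'A'], ['78', 'B']]
```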
Read multiple csv files (and skip 2 columns in each csv file) into one dataframe in R?
Using the data.table package functions fread() and rbindlist() will provide the result you're after faster than any of the base or tidyverse alternatives.
library(data.table)

## Create a list of the files
FileList <- list.files(pattern = ".csv")

## Pre-allocate a list to store all of the results of reading
## so that we aren't re-copying the list for each iteration
DTList <- vector(mode = "list", length = length(FileList))

## Read in all the files, excluding the first two columns
for (i in seq_along(DTList)) {
  DTList[[i]] <- data.table::fread(FileList[[i]], drop = c(1, 2))
}

## Combine the results into a single data.table
DT <- data.table::rbindlist(DTList)

## Optionally, convert the data.table to a data.frame to match the requested result,
## though I would recommend looking into using data.table instead!
data.table::setDF(DT)
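For readers coming from Python, the same read-many-then-bind pattern can be sketched with pandas. The file names and contents below are invented so the example is self-contained; in practice you would glob your own directory:

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in files (assumed names/contents) so the sketch runs on its own
tmp = tempfile.mkdtemp()
for name, text in [("a.csv", "x,y,z\n1,2,3\n"), ("b.csv", "x,y,z\n4,5,6\n")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

# Read each file, dropping the first two columns by position, then
# stack the frames -- analogous to fread(..., drop = c(1, 2))
# followed by rbindlist()
file_list = sorted(glob.glob(os.path.join(tmp, "*.csv")))
frames = [pd.read_csv(path).iloc[:, 2:] for path in file_list]
df = pd.concat(frames, ignore_index=True)
print(df)
```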
Skip the first column when reading a csv file Python
Removing the data keyword (or whichever the keyword is) can be done with a regular expression, which is not really the scope of the question, but still.
About the regular expression: let's imagine your keyword is data. You can use this pattern: (?:data)*\W*(?P<juicy_data>\w+)\W*(?:data)*
If your keyword is something else, just change the two data strings in that regular expression to whatever value the keyword contains. You can test regular expressions online at www.pythonregex.com or www.debuggex.com.
The regular expression is basically saying: look for zero or more data strings but, if you find any, don't do anything with them; don't add them to the list of matched groups, just match them and discard them. After that, match zero or more non-word characters (anything that is not a letter or a number), in case there's a :, a space, or a --> after data; that \W* removes all the non-alphanumeric characters that come after the keyword. Then you get to your juicy_data: one or more characters of the kind found in "regular" words (any alphanumeric character). Finally, in case there's another data behind it, do the same as with the first data group: match it and discard it.
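A quick sanity check of the pattern, with the keyword hard-coded as data:

```python
import re

# The pattern described above, with the keyword 'data' filled in
kwd_remover = re.compile(r'(?:data)*\W*(?P<juicy_data>\w+)\W*(?:data)*')

# Each cell keeps only its 'juicy' word once the keyword and
# surrounding punctuation are matched and discarded.
print(kwd_remover.findall('data: abc'))  # ['abc']
print(kwd_remover.findall('a data'))     # ['a']
print(kwd_remover.findall('vf data'))    # ['vf']
```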
Now, to remove the first column: you can use the fact that a csv.reader is itself an iterator. When you iterate over it (as the code below does), it gives you a list containing all the columns found in one row. That is very useful for your case: you can simply ignore row[0], since that's the column you don't care about, and pick out the columns you do want from the rest of the list.
So here it goes:
import csv
import re

def get_values_flexibly(csv_path, keyword):
    # Build a regex that strips the keyword and surrounding punctuation
    kwd_remover = re.compile(
        r'(?:{kw})*\W*(?P<juicy_data>\w+)\W*(?:{kw})*'.format(kw=keyword)
    )
    result = []
    with open(csv_path, 'r') as f:
        reader = csv.reader(f)
        first_row = [kwd_remover.findall(cell)[0] for cell in next(reader)]
        print("Cleaned first_row: %s" % first_row)
        for index, row in enumerate(reader):
            print("Before cleaning: %s" % row)
            cleaned_row = [kwd_remover.findall(cell)[0] for cell in row]
            result.append(cleaned_row[1])
            print("After cleaning: %s" % cleaned_row)
    return result

print("Result: %s" % get_values_flexibly("sample.csv", 'data'))
Outputs:
Cleaned first_row: ['h1', 'h2', 'h3']
Before cleaning: ['a data', 'data: abc', 'tr']
After cleaning: ['a', 'abc', 'tr']
Before cleaning: ['b data', 'vf data', ' gh']
After cleaning: ['b', 'vf', 'gh']
Before cleaning: ['k data', 'grt data', ' ph']
After cleaning: ['k', 'grt', 'ph']
Result: ['abc', 'vf', 'grt']
Ignore the first space in CSV
Ideally you should be parsing the first two fields as a single datetime. Using a space as the delimiter implies the header has three columns, but the space after the date in each data row is being seen as an extra column.
A workaround is to skip the header entirely and supply your own column names. The parse_dates parameter can then be used to tell Pandas to parse the first two columns as a single combined datetime object.
For example:
import pandas as pd

points = pd.read_csv("test.csv", delimiter=" ",
                     skipinitialspace=True, skiprows=1, index_col=None,
                     parse_dates=[[0, 1]],
                     names=["Date", "Time", "Latitude", "Longitude"])
print(points)
Should give you the following dataframe:
Date_Time Latitude Longitude
0 2021-09-12 23:13:00 44.63 -63.56
1 2021-09-14 23:13:00 43.78 -62.00
2 2021-09-16 23:14:00 44.83 -54.60
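Note that passing a nested list to parse_dates to merge columns is deprecated in recent pandas releases (an assumption about the version you're running); an equivalent approach reads the columns separately and combines them afterwards. The in-memory data below stands in for test.csv:

```python
import io

import pandas as pd

# In-memory stand-in for test.csv, matching the layout described above
raw = ("Date Time Latitude Longitude\n"
       "2021-09-12 23:13:00 44.63 -63.56\n"
       "2021-09-14 23:13:00 43.78 -62.00\n")

points = pd.read_csv(io.StringIO(raw), delimiter=" ", skipinitialspace=True,
                     skiprows=1, names=["Date", "Time", "Latitude", "Longitude"])

# Combine the two text columns into one datetime column, then drop them
points["Date_Time"] = pd.to_datetime(points["Date"] + " " + points["Time"])
points = points.drop(columns=["Date", "Time"])
print(points)
```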
How to read a CSV without the first column
You can specify a converter for any column.
converters = {0: lambda s: float(s.strip('"'))}
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, converters=converters)
Or, you can specify which columns to use, something like:
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1,15))
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
One way you can skip the first column, without knowing the number of columns, is to read the number of columns from the csv manually. It's easy enough, although you may need to tweak this on occasion to account for formatting inconsistencies*.
with open("Data/sim.csv") as f:
    ncols = len(f.readline().split(','))

data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1, ncols))
*If there are blank lines at the top, you'll need to skip them. If there may be commas in the field headers, you should count columns using the first data line instead. So, if you have specific problems, I can add some details to make the code more robust.
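Putting the column-count trick together on a small generated file (the file name and contents are made up for illustration; column indices run from 0 to ncols - 1, so skipping the first column means usecols=range(1, ncols)):

```python
import os
import tempfile

import numpy as np

# Write a tiny CSV: a header row plus two data rows with 4 columns
path = os.path.join(tempfile.mkdtemp(), "sim.csv")
with open(path, "w") as f:
    f.write("t,a,b,c\n0.0,1.0,2.0,3.0\n1.0,4.0,5.0,6.0\n")

# Count the columns from the header line, then skip the first column
with open(path) as f:
    ncols = len(f.readline().split(','))

data = np.loadtxt(path, delimiter=',', skiprows=1, usecols=range(1, ncols))
print(data)  # two rows, three columns: the first column is gone
```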