Reading Columns of a Txt File in Python

Reading columns of a txt file in Python

x.split(' ') is not useful here, because the columns of your text file are separated by more than one space. Use x.split() with no argument, which splits on any run of whitespace:

tokens_column_number = 1
resulttoken = []
# The with block closes the file automatically.
with open('token_data.txt', 'r') as token:
    for x in token:
        resulttoken.append(x.split()[tokens_column_number])
print(resulttoken)
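
To see the difference concretely, here is a small sketch on a made-up line (the sample string is an assumption, not taken from token_data.txt):

```python
# Hypothetical sample line with runs of several spaces between columns.
line = "alpha   beta  gamma\n"

# Splitting on a single space yields an empty string for every extra space
# and leaves the trailing newline attached to the last field.
by_single_space = line.split(' ')

# Splitting with no argument treats any run of whitespace as one separator
# and drops leading/trailing whitespace, including the newline.
by_whitespace = line.split()

print(by_single_space)  # ['alpha', '', '', 'beta', '', 'gamma\n']
print(by_whitespace)    # ['alpha', 'beta', 'gamma']
```

With split(' '), the column index would point at an empty string for most lines, which is why the no-argument form is the safer choice here.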

How to read 2 columns of .txt file & store it in x & y variables in python

You can use csv.reader. Its default delimiter is a comma, so this assumes the two columns are comma-separated:

import csv
x = []
y = []
with open('file.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        x.append(float(row[0]))
        y.append(float(row[1]))

Output:

>>> x
[6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107]
>>> y
[17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522]
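
As a self-contained sketch of the same idea, the following uses an in-memory file built from the first three rows of the output above (the comma-separated layout is an assumption) and transposes the rows into columns with zip:

```python
import csv
import io

# Assumed file layout: two comma-separated numeric columns per line.
sample = io.StringIO("6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n")

# Convert every field to float, skipping blank lines (csv yields [] for them).
rows = [[float(value) for value in row] for row in csv.reader(sample) if row]

# zip(*rows) turns a list of rows into a sequence of columns.
x, y = (list(column) for column in zip(*rows))

print(x)  # [6.1101, 5.5277, 8.5186]
print(y)  # [17.592, 9.1302, 13.662]
```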

reading columns from text file python

This creates an empty DataFrame whose column names are imported from a txt file using pandas:

import pandas as pd

df = pd.DataFrame(columns = pd.read_csv('column_names.txt', header=None)[0].values)

If you already have a DataFrame and want to select only the columns listed in the txt file, you can do:

import pandas as pd

new_df = old_df[pd.read_csv('column_names.txt', header=None)[0].values]

print(new_df)
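
A minimal, self-contained sketch of the column selection, using in-memory stand-ins for column_names.txt and old_df (both the names and the values are made up for illustration):

```python
import io

import pandas as pd

# Stand-in for column_names.txt: one column name per line (hypothetical names).
names_file = io.StringIO("a\nc\n")

# Stand-in for an existing DataFrame.
old_df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Read the names as a single unnamed column and take it as a plain list.
wanted = pd.read_csv(names_file, header=None)[0].tolist()

# Keep only the columns listed in the file, in the file's order.
new_df = old_df[wanted]

print(list(new_df.columns))  # ['a', 'c']
```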

read column of a text file using loop

Your requirement can be fulfilled with the pandas library.

The function is called pandas.read_csv. Use it as in the following example:

import pandas as pd

data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]

See the pandas.read_csv documentation for more about the function.
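
Note that sep=" " treats every single space as a separator. If the columns may be separated by runs of spaces or tabs, one option is the regular expression r"\s+". A small sketch with made-up in-memory data standing in for output_list.txt:

```python
import io

import pandas as pd

# Hypothetical file content with uneven runs of spaces between columns.
text = io.StringIO("1   2  3\n4 5   6\n")

# sep=r"\s+" treats any run of whitespace as a single separator.
data = pd.read_csv(text, sep=r"\s+", header=None)
data.columns = ["a", "b", "c"]

print(data["b"].tolist())  # [2, 5]
```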

Reading a txt file and saving individual columns as lists

I would suggest the following:

import pandas as pd

# Placeholder: replace with a list of your own column names.
column_names = ['a', 'b', 'c']
df = pd.read_csv('./rvt.txt', sep='\t', names=column_names)

Then you can use list(df['your_column']) to work with your columns as lists.
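
For instance, with a small tab-separated sample standing in for rvt.txt (the column names and values here are hypothetical), each column can be turned into a list. The sketch uses Series.tolist(), which returns plain Python floats:

```python
import io

import pandas as pd

# Hypothetical tab-separated content without a header row.
text = io.StringIO("0.0\t1.5\n0.1\t2.5\n0.2\t3.5\n")

# names= supplies the column labels for a file that has no header line.
df = pd.read_csv(text, sep="\t", names=["time", "value"])

# Convert each column (a pandas Series) into an ordinary Python list.
time_list = df["time"].tolist()
value_list = df["value"].tolist()

print(time_list)   # [0.0, 0.1, 0.2]
print(value_list)  # [1.5, 2.5, 3.5]
```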

Reading a specific column of a text file into a list (python 3.6.3)

Solution

Sounds like a case for a small helper function:

def read_col(fname, col=1, convert=int, sep=None):
    """Read text files with columns separated by `sep`.

    fname - file name
    col - index of column to read
    convert - function to convert column entry with
    sep - column separator
    If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.
    """
    with open(fname) as fobj:
        return [convert(line.split(sep=sep)[col]) for line in fobj]

res = read_col('mydata.txt')
print(res)

Output:

[34, 65, 106]

If you want the first column, i.e. at index 0:

read_col('mydata.txt', col=0)

If you want them to be floats:

read_col('mydata.txt', col=0, convert=float)

If the columns are separated by commas:

read_col('mydata.txt', sep=',')

You can use any combination of these optional arguments.
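
To try these calls without a real mydata.txt, the sketch below repeats the helper and runs it on a temporary file whose contents are made up for illustration:

```python
import os
import tempfile


def read_col(fname, col=1, convert=int, sep=None):
    """Read one column of a text file (same helper as above)."""
    with open(fname) as fobj:
        return [convert(line.split(sep=sep)[col]) for line in fobj]


# Hypothetical file contents: three whitespace-separated columns.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 34 7.5\n2 65 8.5\n3 106 9.5\n")
    path = tmp.name

try:
    second = read_col(path)                       # defaults: col=1, convert=int
    first = read_col(path, col=0, convert=float)  # first column as floats
    third = read_col(path, col=2, convert=float)  # third column as floats
finally:
    os.remove(path)  # clean up the temporary file

print(second)  # [34, 65, 106]
print(first)   # [1.0, 2.0, 3.0]
print(third)   # [7.5, 8.5, 9.5]
```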

Explanation

We define a new function with default parameters:

def read_col(fname, col=1, convert=int, sep=None):

This means you have to supply the file name fname. All other arguments are optional, and their default values are used if they are not provided when calling the function.

In the function, we open the file with:

with open(fname) as fobj:

Now fobj is an open file object. The file is closed automatically when the with block ends, which here coincides with the function returning.
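
The closing behaviour can be checked directly. This is only a small sketch on a throwaway temporary file (the file and its content are assumptions made for the demonstration):

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("first line\n")
    path = tmp.name

with open(path) as fobj:
    inside = fobj.closed          # False: the file is open inside the block
    first_line = fobj.readline()

outside = fobj.closed             # True: leaving the block closed the file
os.remove(path)                   # clean up the temporary file

print(inside, outside)  # False True
```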

This:

[convert(line.split(sep=sep)[col]) for line in fobj]

creates a list by going through all lines of the file. Each line is split at the separator sep. We take only the value for the column with index col. We also convert the value with the function convert, i.e. into an integer by default.
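
For readers less familiar with list comprehensions, the same logic can be written as an explicit loop. This is only an equivalent sketch: read_col_loop is a hypothetical name, and it takes an already open file-like object so that it can be tried on in-memory data:

```python
import io


def read_col_loop(fobj, col=1, convert=int, sep=None):
    """Loop version of the list comprehension (hypothetical helper)."""
    result = []
    for line in fobj:
        fields = line.split(sep=sep)   # split the line into columns
        value = convert(fields[col])   # pick one column and convert it
        result.append(value)
    return result


# Made-up sample data standing in for the lines of a file.
sample = io.StringIO("a 34\nb 65\nc 106\n")
print(read_col_loop(sample))  # [34, 65, 106]
```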

Edit

You can also skip the first line in the file:

    with open(fname) as fobj:
        next(fobj)
        return [convert(line.split(sep=sep)[col]) for line in fobj]

Or, more flexibly, with an optional argument:

def read_col(fname, col=1, convert=int, sep=None, skip_lines=0):
    with open(fname) as fobj:
        # skip first `skip_lines` lines
        for _ in range(skip_lines):
            next(fobj)
        return [convert(line.split(sep=sep)[col]) for line in fobj]
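
One caveat: next(fobj) raises StopIteration if the file has fewer lines than skip_lines. A possible variant, sketched here with itertools.islice (read_col_skip is a hypothetical name, and the sample file is made up), simply returns an empty list in that case:

```python
import os
import tempfile
from itertools import islice


def read_col_skip(fname, col=1, convert=int, sep=None, skip_lines=0):
    """Variant of read_col that skips lines with islice (hypothetical)."""
    with open(fname) as fobj:
        # islice(fobj, skip_lines, None) yields the lines after the skipped
        # ones and yields nothing at all if the file is too short.
        return [convert(line.split(sep=sep)[col])
                for line in islice(fobj, skip_lines, None)]


# Made-up file with one header line followed by two data lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("name count\nfoo 34\nbar 65\n")
    path = tmp.name

try:
    values = read_col_skip(path, skip_lines=1)
    nothing = read_col_skip(path, skip_lines=10)
finally:
    os.remove(path)  # clean up the temporary file

print(values)   # [34, 65]
print(nothing)  # []
```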

Read in txt file with fixed width columns

Use read_fwf instead of read_csv.

[read_fwf reads] a table of fixed-width formatted lines into DataFrame.

https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html

import pandas as pd

colspecs = (
(0, 44),
(46, 47),
(48, 49),
(50, 51),
(52, 53),
(54, 55),
(56, 57),
(58, 59),
(60, 66),
(67, 73),
(74, 77),
(78, 80),
(81, 84),
(85, 87),
(88, 90),
(91, 95),
(96, 99),
(100, 103),
(104, 106),
)
data_url = "http://jse.amstat.org/datasets/04cars.dat.txt"

df = pd.read_fwf(data_url, colspecs=colspecs)
df.columns = (
"Vehicle Name",
"Is Sports Car",
"Is SUV",
"Is Wagon",
"Is Minivan",
"Is Pickup",
"Is All-Wheel Drive",
"Is Rear-Wheel Drive",
"Suggested Retail Price",
"Dealer Cost",
"Engine Size (litres)",
"Number of Cylinders",
"Horsepower",
"City Miles Per Gallon",
"Highway Miles Per Gallon",
"Weight (pounds)",
"Wheel Base (inches)",
"Length (inches)",
"Width (inches)",
)

And the output for print(df) would be:

                        Vehicle Name  ...  Width (inches)
0        Chevrolet Aveo LS 4dr hatch  ...              66
1             Chevrolet Cavalier 2dr  ...              69
2             Chevrolet Cavalier 4dr  ...              68
3          Chevrolet Cavalier LS 2dr  ...              69
4                  Dodge Neon SE 4dr  ...              67
..                               ...  ...             ...
422         Nissan Titan King Cab XE  ...               *
423                      Subaru Baja  ...               *
424                    Toyota Tacoma  ...               *
425     Toyota Tundra Regular Cab V6  ...               *
426  Toyota Tundra Access Cab V6 SR5  ...               *

[427 rows x 19 columns]

Column names and specifications retrieved from here:

  • http://jse.amstat.org/datasets/04cars.txt
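
To make the meaning of colspecs more concrete: each (start, end) pair is a half-open interval of character positions, much like a Python slice. The following plain-Python sketch shows the idea. The function parse_fixed_width, the sample line and its three column ranges are all made up, and this is not how pandas is implemented:

```python
def parse_fixed_width(line, colspecs):
    """Cut one line into fields at the given (start, end) positions."""
    return [line[start:end].strip() for start, end in colspecs]


# Hypothetical layout: a 20-character name, then two 5-character numbers,
# each separated by a single space.
colspecs = [(0, 20), (21, 26), (27, 32)]
line = f"{'Sample Car 4dr':<20} {'11690':>5} {'10965':>5}"

fields = parse_fixed_width(line, colspecs)
print(fields)  # ['Sample Car 4dr', '11690', '10965']
```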

Note: Don't forget to specify where each column starts and ends. Without colspecs, pandas infers the column boundaries from the first rows of the data, which can lead to data errors. Below is an extract of a unified diff between the generated CSV files (with and without colspecs):

[Sample image: unified diff extract]


