Reading columns of a txt file on python
x.split(' ')
is not useful, because columns of your text file separated by more than one space. Use x.split()
to ignore spaces:
token = open('token_data.txt','r')
linestoken=token.readlines()
tokens_column_number = 1
resulttoken=[]
for x in linestoken:
resulttoken.append(x.split()[tokens_column_number])
token.close()
print(resulttoken)
How to read 2 columns of .txt file & store it in x & y variables in python
you can use csv reader
import csv
x = []
y = []
with open('file.txt') as f:
reader = csv.reader(f)
for row in reader:
x.append(float(row[0]))
y.append(float(row[1]))
Output:
>>x
[6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107]
>>y
[17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522]
reading columns from text file python
this will create an empty DataFrame where the column names were imported from a txt using pandas:
import pandas as pd
df = pd.DataFrame(columns = pd.read_csv('column_names.txt', header=None)[0].values)
If you already have a Dataframe and you want to select only the columns in the txt file you can do:
import pandas as pd
new_df = old_df[pd.read_csv('column_names.txt', header=None)[0].values]
print(new_df)
read column of a text file using loop
Hi Your requirement can be fulfilled by a library called pandas.
the function is called pandas.read_csv please use the following example
data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]
more about the function
Reading a txt file and saving individual columns as lists
I would suggest the following:
import pandas as pd
df=pd.read_csv('./rvt.txt', sep='\t'), header=[a list with your column names])
Then you can use list(your_column) to work with your columns as lists
Reading a specific column of a text file into a list (python 3.6.3)
Solution
Sound like the case for a small helper function:
def read_col(fname, col=1, convert=int, sep=None):
"""Read text files with columns separated by `sep`.
fname - file name
col - index of column to read
convert - function to convert column entry with
sep - column separator
If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
"""
with open(fname) as fobj:
return [convert(line.split(sep=sep)[col]) for line in fobj]
res = read_col('mydata.txt')
print(res)
Output:
[34, 65, 106]
If you want the first column, i.e. at index 0
:
read_col('mydata.txt', col=0)
If you want them to be floats:
read_col('mydata.txt', col=0, convert=float)
If the columns are separated by commas:
read_col('mydata.txt', sep=',')
You can use any combination of these optional arguments.
Explanation
We define a new function with default parameters:
def read_col(fname, col=1, convert=int, sep=None):
This means you have to supply the file fname
. All other arguments are optional and the default values will be used if not provide when calling the function.
In the function, we open the file with:
with open(fname) as fobj:
Now fobj
is an open file object. The file will be closed when we de-dent, i.e. here when we end the function.
This:
[convert(line.split(sep=sep)[col]) for line in fobj]
creates a list by going through all lines of the file. Each line is split at the separator sep
. We take only the value for the column with index col
. We also convert the value in the datatype of convert
, i.e. into an integer per default.
Edit
You can also skip the first line in the file:
with open(fname) as fobj:
next(fobj)
return [convert(line.split(sep=sep)[col]) for line in fobj]
Or more sophisticated as optional argument:
def read_col(fname, col=1, convert=int, sep=None, skip_lines=0):
# skip first `skip_lines` lines
for _ in range(skip_lines):
next(fobj)
with open(fname) as fobj:
return [convert(line.split(sep=sep)[col]) for line in fobj]
Read in txt file with fixed width columns
Use read_fwf
instead of read_csv
.
[
read_fwf
reads] a table of fixed-width formatted lines into DataFrame.
https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html
import pandas as pd
colspecs = (
(0, 44),
(46, 47),
(48, 49),
(50, 51),
(52, 53),
(54, 55),
(56, 57),
(58, 59),
(60, 66),
(67, 73),
(74, 77),
(78, 80),
(81, 84),
(85, 87),
(88, 90),
(91, 95),
(96, 99),
(100, 103),
(104, 106),
)
data_url = "http://jse.amstat.org/datasets/04cars.dat.txt"
df = pd.read_fwf(data_url, colspecs=colspecs)
df.columns = (
"Vehicle Name",
"Is Sports Car",
"Is SUV",
"Is Wagon",
"Is Minivan",
"Is Pickup",
"Is All-Wheel Drive",
"Is Rear-Wheel Drive",
"Suggested Retail Price",
"Dealer Cost",
"Engine Size (litres)",
"Number of Cylinders",
"Horsepower",
"City Miles Per Gallon",
"Highway Miles Per Gallon",
"Weight (pounds)",
"Wheel Base (inches)",
"Lenght (inches)",
"Width (inches)",
)
And the output for print(df)
would be:
Vehicle Name ... Width (inches)
0 Chevrolet Aveo LS 4dr hatch ... 66
1 Chevrolet Cavalier 2dr ... 69
2 Chevrolet Cavalier 4dr ... 68
3 Chevrolet Cavalier LS 2dr ... 69
4 Dodge Neon SE 4dr ... 67
.. ... ... ...
422 Nissan Titan King Cab XE ... *
423 Subaru Baja ... *
424 Toyota Tacoma ... *
425 Toyota Tundra Regular Cab V6 ... *
426 Toyota Tundra Access Cab V6 SR5 ... *
[427 rows x 19 columns]
Column names and specifications retrieved from here:
- http://jse.amstat.org/datasets/04cars.txt
Note: Don't forget to specify where each column starts and ends. Without using colspecs
, pandas
is making an assumption based on the first row which leads to data errors. Below an extract of a unified diff between generated csv
files (with specs and without):
Related Topics
Check If Dataframe Has a Zero Element
Retrieving Subfolders Names in S3 Bucket from Boto3
How to Pad a String With Leading Zeros in Python 3
Discord.Py Show Who Invited a User
Jsondecodeerror: Expecting Value: Line 1 Column 1 (Char 0)
How to Convert an Integer to Time
Python Key Error=0 - Can't Find Dict Error in Code
Matching Text Between a Pair of Single Quotes
Regex to Append Some Characters in a Certain Position
Python 3D Polynomial Surface Fit, Order Dependent
Creating Random Pairs from Lists
Convert String to Python Class Object
How to Replace Nan Values Where the Other Columns Meet a Certain Criteria
What Is the Easiest Way to Convert Ndarray into Cv::Mat
How to Display the Value of the Bar on Each Bar With Pyplot.Barh()
How to Disable the Security Certificate Check in Python Requests