Reading Excel File using Python, how do I get the values of a specific column with indicated column name?
This is one approach:
from xlrd import open_workbook


class Arm(object):
    def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
        self.id = id
        self.dsp_name = dsp_name
        self.dsp_code = dsp_code
        self.hub_code = hub_code
        self.pin_code = pin_code
        self.pptl = pptl

    def __str__(self):
        return ("Arm object:\n"
                "  Arm_id = {0}\n"
                "  DSPName = {1}\n"
                "  DSPCode = {2}\n"
                "  HubCode = {3}\n"
                "  PinCode = {4} \n"
                "  PPTL = {5}"
                .format(self.id, self.dsp_name, self.dsp_code,
                        self.hub_code, self.pin_code, self.pptl))


wb = open_workbook('sample.xls')
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols

    items = []

    # Start at row 1 to skip the header row.
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            value = sheet.cell(row, col).value
            # xlrd returns numeric cells as floats; turn them into integer
            # strings and leave text cells unchanged.
            try:
                value = str(int(value))
            except ValueError:
                pass
            finally:
                values.append(value)
        # One Arm per row; the sheet must have exactly six columns.
        item = Arm(*values)
        items.append(item)

for item in items:
    print(item)
    print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
    print()
You don't have to use a custom class; you can simply use a dict(). If you use a class, however, you can access all values via dot notation, as shown above.
Here is the output of the script above:
Arm object:
Arm_id = 1
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282001
PPTL = 1
Accessing one single value (eg. DSPName): JaVAS
Arm object:
Arm_id = 2
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282002
PPTL = 3
Accessing one single value (eg. DSPName): JaVAS
Arm object:
Arm_id = 3
DSPName = JaVAS
DSPCode = 1
HubCode = AGR
PinCode = 282003
PPTL = 5
Accessing one single value (eg. DSPName): JaVAS
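The dict() alternative mentioned in this answer could look roughly like the sketch below. It assumes the first row of the sheet holds the column headings and relies only on the xlrd sheet attributes nrows and row_values(). The helper name sheet_to_dicts and the 'DSPName' heading in the usage comment are illustrative assumptions, since the real headings in sample.xls are not shown.

```python
def sheet_to_dicts(sheet, header_row=0):
    """Return one dict per data row, keyed by the cells of the header row."""
    headers = sheet.row_values(header_row)
    records = []
    for row_index in range(header_row + 1, sheet.nrows):
        # Pair each heading with the cell in the same column of this row.
        records.append(dict(zip(headers, sheet.row_values(row_index))))
    return records


# Possible usage with xlrd (heading name is an assumption):
# from xlrd import open_workbook
# sheet = open_workbook('sample.xls').sheet_by_index(0)
# for record in sheet_to_dicts(sheet):
#     print(record['DSPName'])
```

Values are then looked up by heading (record['DSPName']) rather than by attribute, and no class has to be kept in sync with the column order.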
Python read only specific columns excel sheet by column name
If you want Col2 and Col3, then you can use the following code:
import pandas as pd
df = pd.read_excel(file_path, sheet_name=sheet_name, usecols = ['Col2','Col3'])
or you can use this:
import pandas as pd
df = pd.read_excel(file_path, sheet_name=sheet_name)[['Col2', 'Col3']]
Get column data by Column name and sheet name
Yes, you are looking for the col_values() worksheet method. Instead of
arrayofvalues = sheet['columnname']
you need to do
arrayofvalues = sheet.col_values(columnindex)
where columnindex is the number of the column (counting from zero, so column A is index 0, column B is index 1, etc.). If you have a descriptive heading in the first row (or first few rows), you can give a second parameter that tells which row to start from (again, counting from zero). For example, if you have one header row, and thus want values starting in the second row, you could do
arrayofvalues = sheet.col_values(columnindex, 1)
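Since col_values() takes an index rather than a name, one way to look a column up by its heading is to search the header row first. The following is a minimal sketch, assuming the headings sit in a single row; column_by_name is a hypothetical helper, not part of xlrd, and it uses only the sheet methods row_values() and col_values().

```python
def column_by_name(sheet, name, header_row=0):
    """Return the values below the heading `name` as a Python list."""
    headers = sheet.row_values(header_row)
    try:
        column_index = headers.index(name)
    except ValueError:
        raise KeyError("no column headed {0!r}".format(name))
    # Start one row below the header so the heading itself is excluded.
    return sheet.col_values(column_index, header_row + 1)
```

With a sheet obtained from xlrd, the call would then be arrayofvalues = column_by_name(sheet, 'columnname').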
Please check out the tutorial for a reasonably readable discussion of the xlrd package. (The official xlrd documentation is harder to read.)
Also note that (1) while you are free to use the name arrayofvalues, what you are really getting is a Python list, which technically isn't an array, and (2) the on_demand workbook parameter has no effect when working with .xlsx files, which means xlrd will attempt to load the entire workbook into memory regardless. (The on_demand feature works for .xls files.) Since version 2.0, xlrd reads only .xls files, so .xlsx workbooks need a different library such as openpyxl.
Read certain column in excel to dataframe
There is a solution, but CSV files are not handled the same way as Excel files.
From the documentation, for CSV:
usecols : list-like or callable, default None
For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
For Excel:
usecols : int or list, default None
- If None then parse all columns,
- If int then indicates last column to be parsed
- If list of ints then indicates list of column numbers to be parsed
- If string then indicates comma separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of both sides
so you need to call it like this:
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='ForeignKey')
and if you also need 'number':
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='number,ForeignKey')
EDIT:
You need to pass the Excel column letter (for example 'B'), not the name of the column in the data.
The other answer solves this.
However, you won't need 'B:B'; 'B' will do the trick, but that does not improve on using usecols with numbers.
If the whole sheet loads in a reasonable time, the best way to solve this may be to parse all columns and then select the desired ones:
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')['ForeignKey']
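The CSV behaviour quoted from the documentation above (usecols as a list of names or as a callable) can be tried without a file on disk. This is only a sketch: the inline data and the 'other' column are made up for illustration, while 'number' and 'ForeignKey' are the names used in this answer.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file.
CSV_TEXT = "number,ForeignKey,other\n1,a,x\n2,b,y\n"

# List-like usecols: keep only the named columns.
by_name = pd.read_csv(io.StringIO(CSV_TEXT), usecols=['number', 'ForeignKey'])

# Callable usecols: called with each column name, keeps those returning True.
by_callable = pd.read_csv(io.StringIO(CSV_TEXT),
                          usecols=lambda name: name != 'other')

print(list(by_name.columns))
print(list(by_callable.columns))
```

Both calls skip the unwanted column while parsing, instead of loading everything and selecting afterwards.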
Pandas read columns from csv in a given data type with unknown column name
With the pandas read_excel() or read_csv() function, you can pass the dtype parameter to specify the type you want any column to have. In your case, you can add that parameter like this:
df_model= pd.read_excel('filename.xlsx', dtype={'Std': int})
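The same dtype mapping can be passed to read_csv(). The sketch below uses inline sample data so it runs without a file; the 'Code' column and its values are assumptions added to show why forcing a type can matter, and 'Std' is the column name from the call above.

```python
import io

import pandas as pd

# Made-up sample data; 'Code' has leading zeros that int parsing would drop.
CSV_TEXT = "Code,Std\n007,1\n042,2\n"

df_model = pd.read_csv(io.StringIO(CSV_TEXT),
                       dtype={'Std': int, 'Code': str})

print(df_model.dtypes)
print(df_model['Code'].tolist())
```

Columns not listed in the dtype dict keep the type pandas infers for them.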