Faster way to read Excel files to pandas dataframe
As others have suggested, csv reading is faster. So if you are on windows and have Excel, you could call a vbscript to convert the Excel to csv and then read the csv. I tried the script below and it took about 30 seconds.
# create a list with sheet numbers you want to process
sheets = map(str,range(1,6))
# convert each sheet to csv and then read it using read_csv
df={}
from subprocess import call
excel='C:\\Users\\rsignell\\OTT_Data_All_stations.xlsx'
for sheet in sheets:
csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv'
call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet])
df[sheet]=pd.read_csv(csv)
Here's a little snippet of python to create the ExcelToCsv.vbs script:
#write vbscript to file
vbscript="""if WScript.Arguments.Count < 3 Then
WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv <xls/xlsx source file> <csv destination file> <worksheet number (starts at 1)>"
Wscript.Quit
End If
csv_format = 6
Set objFSO = CreateObject("Scripting.FileSystemObject")
src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))
worksheet_number = CInt(WScript.Arguments.Item(2))
Dim oExcel
Set oExcel = CreateObject("Excel.Application")
Dim oBook
Set oBook = oExcel.Workbooks.Open(src_file)
oBook.Worksheets(worksheet_number).Activate
oBook.SaveAs dest_file, csv_format
oBook.Close False
oExcel.Quit
""";
f = open('ExcelToCsv.vbs','w')
f.write(vbscript.encode('utf-8'))
f.close()
This answer benefited from Convert XLS to CSV on command line and csv & xlsx files import to pandas data frame: speed issue
Best /Fastest way to read an Excel Sheet into a DataTable?
I have always used OLEDB
for this, something like...
Dim sSheetName As String
Dim sConnection As String
Dim dtTablesList As DataTable
Dim oleExcelCommand As OleDbCommand
Dim oleExcelReader As OleDbDataReader
Dim oleExcelConnection As OleDbConnection
sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1"""
oleExcelConnection = New OleDbConnection(sConnection)
oleExcelConnection.Open()
dtTablesList = oleExcelConnection.GetSchema("Tables")
If dtTablesList.Rows.Count > 0 Then
sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString
End If
dtTablesList.Clear()
dtTablesList.Dispose()
If sSheetName <> "" Then
oleExcelCommand = oleExcelConnection.CreateCommand()
oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]"
oleExcelCommand.CommandType = CommandType.Text
oleExcelReader = oleExcelCommand.ExecuteReader
nOutputRow = 0
While oleExcelReader.Read
End While
oleExcelReader.Close()
End If
oleExcelConnection.Close()
The ACE.OLEDB
provider will read both .xls
and .xlsx
files and I have always found the speed quite good.
How to read an excel(.xls) file like text?
If I'm understanding your question, you're trying to read the excel file like a text file.
The challenge is that .xls files are actually binary files containing the text, formatting, sheet information, macro information, etc...
You'd either need to save the files as .csv (Either via Excel before running your program or through your program directly), upgrade them to .xlsx (which has numerous libraries that can read the file as an XML at that point) or use a library (such as apache POI or anything similar) or even query the data out using ADO.
Good luck and I hope that's what you were implying via your question.
How can I read an excel file with python?
you can use pandas library by :
import pandas as pd
then you have to make sure the location of your excel file
to initialize the excel_credenciales variable, the most important thing is the correct file path
if your excel file is in the same hierarchy as your python file then its use can be done like this example
excel_credenciales = 'prueba.xlsx'
df = pd.read_excel(excel_credenciales)
but if your file is in an external folder you can add the path ../ before the filename
example excel_credenciales = '../prueba.xlsx'
How to construct Dataframe from a Excel (xls,xlsx) file in Scala Spark?
The solution to your problem is to use Spark Excel
dependency in your project.
Spark Excel has flexible options
to play with.
I have tested the following code to read from excel
and convert it to dataframe
and it just works perfect
def readExcel(file: String): DataFrame = sqlContext.read
.format("com.crealytics.spark.excel")
.option("location", file)
.option("useHeader", "true")
.option("treatEmptyValuesAsNulls", "true")
.option("inferSchema", "true")
.option("addColorColumns", "False")
.load()
val data = readExcel("path to your excel file")
data.show(false)
you can give sheetname
as option
if your excel sheet has multiple sheets
.option("sheetName", "Sheet2")
I hope its helpful
Related Topics
Send Smtp Email Using System.Net.Mail via Exchange Online (Office 365)
Cancelling a Task Is Throwing an Exception
How to Implement a Progress Bar in C#
How to Use Dependency Injection with an Attribute
Ziparchive Creates Invalid Zip File
Make ASP.NET Wcf Convert Dictionary to JSON, Omitting "Key" & "Value" Tags
Find the Most Occurring Number in a List<Int>
What's So Bad About Building Xml with String Concatenation
Create Dynamic Buttons in a Grid Layout - Create a Magic Square Ui
Fast Creation of Objects Instead of Activator.Createinstance(Type)
How to Check If a List Is Ordered
Creating Composite Key Entity Framework
String Interning in .Net Framework - What Are the Benefits and When to Use Interning
Deserialize Nested JSON into C# Objects