How to convert xls to xlsx using R
In case someone else is also interested, I really didn't figure out how to do this in R but I did found this https://www.extendoffice.com/documents/excel/1349-excel-batch-convert-xls-to-xlsx.html#a2 and I can keep the exact same format from the original file.
Converting xls files to xlsx using python
use pandas
import pandas as pd
df = pd.read_excel("file.xls")
df.to_excel("file.xlsx")
Powershell - Convert .XLS file to .XLSX
The error message is clear. You forgot to close the try{..}
block with an ending bracket }
and a try{..}
should be followed up by either one or more catch{..}
blocks and optionally a finally{..}
block.
You can read about that on about Try Catch Finally.
Then, there are some other things wrong and/or can be improved upon in your code as well.
$folderpath
is not defined and should be the source folder$downloadfolder
- use
-Filter
instead of-Include
as it is much faster. Also you have left out the dot in '*.xls' - append switch
-File
to the Get-ChildItem cmdlet to make sure you will not receive and try to process directories as well - you can save the converted .xlsx files directly to the uploadfolder, no need to create first and then move
- to remove the used COM objects, release them from memory first and then initiate the Garbage Collect.
Do this after you have quit Excel.
# set folders
$downloadfolder = "C:\Users\Test" # folder where the .xls files are
$uploadfolder = "C:\Users\Test\Upload" # folder that uploads the .xlsx files
$backupfolder = "C:\Users\Test\Backup" # folder that has .xls files as backup
# open and convert xls to xlsx
Add-Type -AssemblyName Microsoft.Office.Interop.Excel
$xlFixedFormat = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlOpenXMLWorkbook
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false # it is much faster if Excel is not visible
# loop through the .xls files and process them
Get-ChildItem -Path $downloadfolder -Filter '*.xls' -Recurse -File |
ForEach-Object {
try {
$xlsfilename = $_.FullName
#copy file to backup folder
Copy-Item -Path $xlsfilename -Destination $backupfolder -Force
# open the xls
Write-Host "Converting $xlsfilename"
$workbook = $excel.Workbooks.Open($xlsfilename)
# save converted file (as xlsx) directly to the upload folder
$newfilename = Join-Path -Path $uploadfolder -ChildPath ('{0}.xlsx' -f $_.BaseName)
$workbook.SaveAs($newfilename, $xlFixedFormat)
$workbook.Close()
#remove old file
Write-Host "Delete old file '$xlsfilename'"
Remove-Item -Path $xlsfilename -Force
}
catch {
# write out a warning as to why something went wrong
Write-Warning "Could not convert '$xlsfilename':`r`n$($_.Exception.Message)"
}
}
# close excel
$excel.Quit()
# garbage collection
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
Converting .xls to .xlsx files with VBA doesn't seem to reduce size
I seem to have figured it out.
For some reason if you don't reopen and close with saving changes a freshly converted file excel doesn't reduce files size. Maybe it compresses file in the background or something - not sure. Right now I just added a few lines opening and closing converted files with saving changes - works for me! Sometimes reducing size so drastically that it becomes 10MBs less
Convert xls to xlsx with excess headers
Answering the second part of my question. Still not certain on how to take in xls files.
If I convert the file to a CSV file, then use this command to remove the top few lines. skiprows is the method to use that cuts out the top section of a csv or xlsx file during the df's intialization.
df = pd.read_csv('file_name.csv', skiprows = 8)
How to convert xls to xlsx with Python Azure Webjob
On Azure WebApps for Windows, due to there is a secure environment called Azure Web App Sandbox which will restricts many operations include Win32k.sys (User32/GDI32) Restrictions
. So you can not call any COM components via win32com
that cause your issue.
In pure Python, there are many soultions to convert xls
to xlsx
, such as the common two as below.
Install
pyexcel
viapip install pyexcel pyexcel-xls pyexcel-xlsx
to do it.import pyexcel as p
p.save_book_as(file_name='<your input file>.xls', dest_file_name='<your output file>.xlsx')Install
pandas
viapip install pandas
to do it.import pandas as pd
dataFrame = pd.read_excel('<your input file>.xls')
dataFrame.to_excel('<your output file>.xlsx', index=False)
Note: the two solutions above just can convert these xls
files of common data types which like string, number, not rich content (chart or image, etc.)
If you want to convert a xls
file with rich content or without any format changes via win32com
with Excel.Application
, you have to use Azure Windows VM to run your script and integrate with your WebApp.
Related Topics
What Determines Which Strings Are Interned and When
When Are Objects Garbage Collected in Python
Elegant Way to Check If a Nested Key Exists in a Dict
Tkinter Vanishing Photoimage Issue
Differencebetween .Quit and .Quit in Pygame
How to Count the Nan Values in a Column in Pandas Dataframe
A Tool to Convert Matlab Code to Python
Progress Indicator During Pandas Operations
Ssl Insecureplatform Error When Using Requests Package
Adding a Legend to Pyplot in Matplotlib in the Simplest Manner Possible
Super() Raises "Typeerror: Must Be Type, Not Classobj" for New-Style Class
How to Add a Custom Ca Root Certificate to the Ca Store Used by Pip in Windows
How to Read File N Lines at a Time
Fastest Way to Grow a Numpy Numeric Array
How to Get All the Contiguous Substrings of a String in Python