Fastest Way to Interface Between Live (Unsaved) Excel Data and C# Objects

If the C# application is a stand-alone application, then you will always have cross-process marshaling involved, and that will overwhelm any optimizations you can make by switching languages from, say, C# to C++. Stick to your preferred language in this situation, which sounds like it is C#.

If you are willing to make an add-in that runs within Excel, however, then your operations will avoid cross-process calls and run about 50x faster.

If you run within Excel as an add-in, then VBA is among the fastest options, but it does still involve COM and so C++ calls using an XLL add-in would be fastest. But VBA is still quite fast in terms of calls to the Excel object model. As for actual calculation speed, however, VBA runs as pcode, not as fully compiled code, and so executes about 2-3x slower than native code. This sounds very bad, but it isn't because the vast majority of the execution time taken with a typical Excel add-in or application involves calls to the Excel object model, so VBA vs. a fully compiled COM add-in, say using natively compiled VB 6.0, would only be about 5-15% slower, which is not noticeable.

VB 6.0 is a compiled COM approach, and runs 2-3x faster than VBA for non-Excel related calls, but VB 6.0 is about 12 years old at this point and won't run in 64 bit mode, say if installing Office 2010, which can be installed to run 32 bit or 64 bit. Usage of 64 bit Excel is tiny at the moment, but will grow in usage, and so I would avoid VB 6.0 for this reason.

C#, if running in-process as an Excel add-in, would execute calls to the Excel object model as fast as VBA, and execute non-Excel calls 2-3x faster than VBA -- if running unshimmed. The approach recommended by Microsoft, however, is to run fully shimmed, for example, by making use of the COM Shim Wizard. By being shimmed, Excel is protected from your code (if it's faulty) and your code is fully protected from other 3rd party add-ins that could otherwise potentially cause problems. The downside to this, however, is that a shimmed solution runs within a separate AppDomain, which requires cross-AppDomain marshaling that incurs an execution speed penalty of about 40x -- which is very noticeable in many contexts.

Add-ins using Visual Studio Tools for Office (VSTO) are automatically loaded within a shim and execute within a separate AppDomain. There is no avoiding this if using VSTO. Therefore, calls to the Excel object model would also incur an approximately 40x execution speed degradation. VSTO is a gorgeous system for making very rich Excel add-ins, but execution speed is its weakness for applications such as yours.

ExcelDna is a free, open source project that allows you to use C# code, which is then converted for you to an XLL add-in that uses C++ code. That is, ExcelDna parses your C# code and creates the required C++ code for you. I've not used it myself, but I am familiar with the process and it's very impressive. ExcelDna gets very good reviews from those that use it. [Edit: Note the following correction as per Govert's comments below: "Hi Mike - I want add a small correction to clarify the Excel-Dna implementation: all the managed-to-Excel glue works at runtime from your managed assembly using reflection - there is no extra pre-compilation step or C++ code generation. Also, even though Excel-Dna uses .NET, there need not be any COM interop involved when talking to Excel - as an .xll the native interface can be used directly from .NET (though you can also use COM if you want). This makes high-performance UDFs and macros possible." – Govert]

You also might want to look at Add-in Express. It's not free, but it would allow you to code in C#, and although it shims your solution into a separate AppDomain, I believe that its execution speed is outstanding. If I am understanding its execution speed correctly, then I'm not sure how Add-in Express is doing this, but it might be taking advantage of something called FastPath AppDomain marshaling. Don't quote me on any of this, however, as I'm not very familiar with Add-in Express. You should check it out though and do your own research. [Edit: Reading Charles Williams' answer, it looks like Add-in Express enables both COM and C API access. And Govert states that Excel DNA also enables both COM and the faster C API access. So you'd probably want to check out both and compare them to ExcelDna.]

My advice would be to research Add-in Express and ExcelDna. Both approaches would allow you to code using C#, which you seem most familiar with.

The other main issue is how you make your calls. For example, Excel is very fast when handling an entire range of data passed back-and-forth as an array. This is vastly more efficient than looping through the cells individually. For example, the following code makes use of the Excel.Range.set_Value accessor method to assign a 10 x 10 array of values to a 10 x 10 range of cells in one shot:

void AssignArrayToRange()
{
    // Create the array.
    object[,] myArray = new object[10, 10];

    // Initialize the array.
    for (int i = 0; i < myArray.GetLength(0); i++)
    {
        for (int j = 0; j < myArray.GetLength(1); j++)
        {
            myArray[i, j] = i + j;
        }
    }

    // Create a Range of the correct size:
    int rows = myArray.GetLength(0);
    int columns = myArray.GetLength(1);
    Excel.Range range = myWorksheet.get_Range("A1", Type.Missing);
    range = range.get_Resize(rows, columns);

    // Assign the Array to the Range in one shot:
    range.set_Value(Type.Missing, myArray);
}

One can similarly make use of the Excel.Range.get_Value accessor method to read an array of values from a range in one step. Doing this and then looping through the values within the array is vastly faster than looping through the values within the cells of the range individually.
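To illustrate the read direction, here is a minimal sketch of walking such an array once it has been read. It assumes the block came from something like (object[,])range.get_Value(Type.Missing) on a multi-cell range; the names RangeArrayDemo, SumNumericCells and MakeSampleBlock are made up for this example, and MakeSampleBlock only builds a stand-in array so the code runs without Excel. The one thing to watch is that arrays returned by Excel are 1-based, so the loops use the array's own bounds rather than starting at 0.

```csharp
using System;

static class RangeArrayDemo
{
    // Sums every numeric entry of a 2-D block such as the one returned by
    // range.get_Value(Type.Missing). Uses the array's own bounds, because
    // arrays coming back from Excel are 1-based rather than 0-based.
    public static double SumNumericCells(object[,] values)
    {
        double total = 0;
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                // Plain numeric cells normally arrive as double;
                // text and empty cells are skipped.
                if (values[r, c] is double number)
                {
                    total += number;
                }
            }
        }
        return total;
    }

    // Hypothetical stand-in for a block read from a 2 x 2 range,
    // built with 1-based bounds like the arrays Excel returns.
    public static object[,] MakeSampleBlock()
    {
        var block = (object[,])Array.CreateInstance(
            typeof(object), new[] { 2, 2 }, new[] { 1, 1 });
        block[1, 1] = 1.5;
        block[1, 2] = "label";
        block[2, 1] = null;   // empty cell
        block[2, 2] = 2.5;
        return block;
    }
}
```

Note that a single-cell range returns a scalar value rather than an array, so the cast to object[,] only applies to ranges of more than one cell.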

Fastest way to access live Excel data/properties from another process with .NET

Maybe you can read the file using one of the direct file readers, instead of going through Excel. (If need be you can automate Excel to save the file, or a copy of part of the file.)

As a start I suggest you look at ClosedXML, but there are many similar projects. ClosedXML looks like an active project and tries to mirror the COM interface. [I have no experience with it myself.]

Fastest way to write cells to Excel with Office Interop?

You should avoid reading and writing cell by cell if you can. It is much faster to work with arrays, and read or write entire blocks at once. I wrote a post a while back on reading from worksheets using C#; basically, the same code works the other way around (see below), and will run much faster, especially with larger blocks of data.

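var sheet = (Worksheet)Application.ActiveSheet;
var range = sheet.get_Range("A1", "B2");

// The array has the same shape as the target range: 2 rows x 2 columns.
var data = new string[2, 2];
data[0, 0] = "A1";
data[0, 1] = "B1";
data[1, 0] = "A2";
data[1, 1] = "B2";

// Write the whole block in a single call.
range.Value2 = data;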

ExcelDna/NetOffice/Excel: Most efficient way to insert values?

Setting data per cell is slow. You can set the value of a large range to an array of values, and this will be immensely faster.

Temporarily switching off screen updating and recalculation will help too.
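A minimal sketch of that idea, assuming early-bound interop (a reference to Microsoft.Office.Interop.Excel, so it needs Excel to actually run). The helper name BulkWrite.WithUpdatesSuspended is made up for this example; ScreenUpdating and Calculation are the regular Excel.Application properties. The previous settings are restored in a finally block so a failed write does not leave Excel stuck in manual calculation.

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

static class BulkWrite
{
    // Runs 'work' with screen updating and automatic recalculation switched
    // off, then restores the previous settings even if 'work' throws.
    public static void WithUpdatesSuspended(Excel.Application app, Action work)
    {
        bool previousScreenUpdating = app.ScreenUpdating;
        Excel.XlCalculation previousCalculation = app.Calculation;
        try
        {
            app.ScreenUpdating = false;
            app.Calculation = Excel.XlCalculation.xlCalculationManual;
            work();
        }
        finally
        {
            app.Calculation = previousCalculation;
            app.ScreenUpdating = previousScreenUpdating;
        }
    }
}
```

Usage would look something like BulkWrite.WithUpdatesSuspended(app, () => range.Value2 = data); where app, range and data are whatever you already have. As far as I know, Excel only lets you read or set Calculation while a workbook is open, so treat this as something to call once your target workbook is loaded.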

For lengthy and detailed discussions, including code for using the C API from your Excel-DNA add-in, see Fastest way to interface between live (unsaved) Excel data and C# objects.

Processing large Excel files using multiple threads

Excel is essentially a single-threaded application (technically, the COM objects live in a Single-Threaded Apartment). That means any COM access gets automatically marshalled to the main thread, so there is no benefit in using extra threads to make COM calls.

For your use case, it would make sense to get the whole data array in a single call to Range.Value and then process this array further without using extra COM calls.
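As a rough sketch of what that can look like: once the values are sitting in a plain object[,] (for example from a single read of Range.Value), the processing itself touches no COM objects, so that part can be spread over worker threads. The names SnapshotProcessing and RowSums are made up for this example, and summing each row is just a placeholder for your real per-row work.

```csharp
using System;
using System.Threading.Tasks;

static class SnapshotProcessing
{
    // Computes one sum per row of a value block that has already been copied
    // out of Excel. No COM calls happen here, so the rows can be processed
    // on worker threads.
    public static double[] RowSums(object[,] values)
    {
        int firstRow = values.GetLowerBound(0);   // 1 for arrays returned by Excel
        int firstCol = values.GetLowerBound(1);
        int lastCol = values.GetUpperBound(1);
        var sums = new double[values.GetLength(0)];

        Parallel.For(0, sums.Length, i =>
        {
            double total = 0;
            for (int c = firstCol; c <= lastCol; c++)
            {
                if (values[firstRow + i, c] is double number)
                {
                    total += number;
                }
            }
            // Each iteration writes only its own slot, so no locking is needed.
            sums[i] = total;
        });
        return sums;
    }
}
```

The single read from Excel and any write-back of results should still happen in one call each on the thread that owns the Excel objects; only the work in between is parallelised.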

You might also have a look at this question for ideas on how to read and write the range data quickly, including an example that uses the Excel C API.

A different approach is to read the Excel data file directly, not interacting with the Excel application. For this you can use a high-level wrapper over the xml-based file format like ClosedXML.

Open Excel programmatically in Read Shared mode

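You can check the workbook's ReadOnly property, or whether it has unsaved changes (in the Excel object model that is Workbook.Saved, which is false when the workbook is dirty).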

With a sample program this may be checked quickly by opening the same workbook in Excel at the same time.

VSTO excel object performance issue

You're diving into the world of COM threading models. This is as good a start as any: http://msdn.microsoft.com/en-us/library/ms693344(VS.85).aspx.

If the code runs on the Excel main thread (which you achieve by setting up the Dispatcher), the COM calls are not marshaled across different threads. Since you have many COM calls (each .Value counts as one) the overhead adds up to the differences you see.

One of the reasons why the marshaling is expensive in this context is that the Excel COM objects are running in a single-threaded apartment (STA), which means there is a message loop set up (actually a Windows message loop) in Excel to serialize the COM calls. Every cross-apartment call you make results in a message being posted to this message loop, which is then processed on the main Excel thread.

So the two cases differ in performance due to the COM cross-apartment marshaling. It's actually remarkably fast, given what is going on behind the scenes.

In both cases, making a single call to set a large range's .Value to an array of values will be much faster. And for the fastest (million cells a second) way to set data into your Excel sheet, see here: Fastest way to interface between live (unsaved) Excel data and C# objects.

ExcelDnaUtil vs Interop.Excel

If performance matters then avoid VSTO-Interop.

If you need to target multiple Excel versions avoid VSTO-Interop.

If you think you might want UDFs in the future avoid VSTO-Interop.

Otherwise VSTO is OK.

As well as Excel-DNA, you should also look at Add-in Express, which also does not suffer from the VSTO shortcomings.


