Favourite Performance Tuning Tricks

SQL Profiler and Tuning Advisor

This script can be used to determine whether you have chosen the right indexes. Look at how often an index is used for seeks and compare that with how often it is updated. Seek performance comes at the cost of update performance. Worse, a frequently updated index becomes fragmented and its statistics go out of date.

You should also compare range_scan_count to singleton_lookup_count. A range scan is preferable to a singleton lookup. A singleton lookup may be the result of an index seek plus a key lookup operation: for every row found by the index seek, SQL Server looks up the data page in the clustered index. That is fine for, say, a couple of thousand rows, but not for millions.

CREATE PROCEDURE [ADMIN].[spIndexCostBenefit]
@dbname [nvarchar](75)
WITH EXECUTE AS CALLER
AS
--set @dbname='Chess'
declare @dbid nvarchar(5)
declare @sql nvarchar(2000)
select @dbid = convert(nvarchar(5),db_id(@dbname))

set @sql=N'select ''object'' = object_name(iu.object_id, iu.database_id)
, i.name
,''user reads'' = iu.user_seeks + iu.user_scans + iu.user_lookups
,''system reads'' = iu.system_seeks + iu.system_scans + iu.system_lookups
,''user writes'' = iu.user_updates
,''system writes'' = iu.system_updates
from '+ @dbname + '.sys.dm_db_index_usage_stats iu
,' + @dbname + '.sys.indexes i
where
iu.database_id = ' + @dbid + '
and iu.index_id=i.index_id
and iu.object_id=i.object_id
and (iu.user_seeks + iu.user_scans + iu.user_lookups)<iu.user_updates
order by ''user reads'' desc'

exec sp_executesql @sql

set @sql=N'SELECT
''object'' = object_name(o.object_id, o.database_id),
o.index_id,
''usage_reads'' = user_seeks + user_scans + user_lookups,
''operational_reads'' = range_scan_count + singleton_lookup_count,
range_scan_count,
singleton_lookup_count,
''usage writes'' = user_updates,
''operational_leaf_writes'' = leaf_insert_count + leaf_update_count + leaf_delete_count,
leaf_insert_count,
leaf_update_count,
leaf_delete_count,
''operational_leaf_page_splits'' = leaf_allocation_count,
''operational_nonleaf_writes'' = nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count,
''operational_nonleaf_page_splits'' = nonleaf_allocation_count
FROM
' + @dbname + '.sys.dm_db_index_operational_stats(' + @dbid + ', NULL, NULL, NULL) o,
' + @dbname + '.sys.dm_db_index_usage_stats u
WHERE
u.object_id = o.object_id
AND u.index_id = o.index_id
ORDER BY
operational_reads DESC,
operational_leaf_writes,
operational_nonleaf_writes'

exec sp_executesql @sql

GO

What Simple Changes Made the Biggest Improvements to Your Delphi Programs

The biggest improvement came when I started using AsyncCalls to convert single-threaded applications that used to freeze the UI into (sort of) multi-threaded apps.

Although AsyncCalls can do a lot more, I've found it useful for this very simple purpose. Let's say you have a subroutine structured like this: disable button, do work, enable button.
You move the 'do work' part into a local function (call it AsyncDoWork) and add four lines of code:

var
  a: IAsyncCall;

a := LocalAsyncCall(@AsyncDoWork);
while not a.Finished do
  Application.ProcessMessages;
a.Sync;

What this does for you is run AsyncDoWork in a separate thread, while your main thread remains available to respond to the UI (dragging the window or clicking Abort, for example). When AsyncDoWork is finished, the code continues. Because I moved it to a local function, all local variables are available and the code does not need to be changed.

This is a very limited type of 'multi-threading'. Specifically, it's dual threading. You must ensure that your Async function and the UI do not both access the same VCL components or data structures. (I disable all controls except the stop button.)

I don't use this to write new programs. It's just a really quick & easy way to make old programs more responsive.

What are some good PHP performance tips?

This question is really vague. When you want to optimize your script, first check your database and your algorithms. There aren't many pure-PHP performance tips that will actually matter. Let's see:

  • Concatenating variables is faster than interpolating them in a double-quoted string.

    $var = 'Hello ' . $world; // is faster than
    $var = "Hello $world"; // or
    $var = "Hello {$world}";

Yes, it's faster, but the second and third forms are more readable, and the speed difference is so small it doesn't matter.

  • When using a loop, if your condition calls a function whose result never changes, hoist it out of the loop. For instance:

    for ($i = 0; $i < count($my_array); $i++)

This evaluates count($my_array) on every iteration. Just assign it to a variable before the loop, or even inside the for statement:

    for ($i = 0, $count = count($my_array); $i < $count; $i++)

  • The worst thing is definitely queries inside loops, whether from lack of knowledge (trying to simulate a JOIN in PHP) or simple oversight (many INSERTs in a loop, for instance).

    $query = mysql_query("SELECT id FROM your_table");
    while ($row = mysql_fetch_assoc($query)) {
        $query2 = mysql_query("SELECT * FROM your_other_table WHERE id = {$row['id']}");
        // etc
    }

Never do this. That's a simple INNER JOIN.

There are probably more, but really, it's not worth writing all of them down. Write your code, optimize later.


What is the best way to improve performance of NHibernate?

The first and most dramatic performance problem that you can run into with NHibernate is if you are creating a new session factory for every session you create. Only one session factory instance should be created for each application execution and all sessions should be created by that factory.

Along those lines, you should continue using the same session as long as it makes sense. This will vary by application, but for most web applications, a single session per request is recommended. If you throw away your session frequently, you aren't gaining the benefits of its cache. Intelligently using the session cache can change a routine with a linear (or worse) number of queries to a constant number without much work.

Equally important: make sure you are lazy loading your object references. Otherwise, entire object graphs can be loaded for even the simplest queries. There are few reasons not to do this; it is always better to start with lazy loading and switch it off selectively as needed.

That brings us to eager fetching, the opposite of lazy loading. While traversing object hierarchies or looping through collections, it is easy to lose track of how many queries you are making, and you end up with the classic N+1 (or worse) query problem. Eager fetching can be done on a per-query basis with a fetch join. In rare circumstances, such as a particular pair of tables you always fetch together, consider turning off lazy loading for that relationship.

As always, SQL Profiler is a great way to find queries that are running slow or being made repeatedly. At my last job we had a development feature that counted queries per page request as well. A high number of queries for a routine is the most obvious indicator that your routine is not working well with NHibernate. If the number of queries per routine or request looks good, you are probably down to database tuning; making sure you have enough memory to store execution plans and data in the cache, correctly indexing your data, etc.

One tricky little problem we ran into was with SetParameterList(). The function allows you to easily pass a list of parameters to a query. NHibernate implemented this by creating one parameter for each item passed in. This results in a different query plan for every number of parameters, so our execution plans were almost always getting evicted from the cache. Also, numerous parameters can significantly slow down a query. We did a custom hack of NHibernate to send the items as a delimited list in a single parameter. The list was split inside SQL Server by a table-valued function that our hack automatically inserted into the IN clause of the query. There could be other land mines like this depending on your application. SQL Profiler is the best way to find them.

JNA Fortran performance tuning

As noted in the JNA FAQ, direct mapping would be your best performance increase, but you've excluded that as an option. It also notes that the calling overhead for each native call is another performance hit, which you've partially addressed by changing setAutoWrite().

You also did mention flattening your structures to an array of primitives, but rejected that due to encoding/decoding complexity. However, moving in this direction is probably the next best choice, and it's possible that the biggest performance issue you're currently facing is a combination of JNA's Structure access using reflection and native reads. Oracle notes:

Because reflection involves types that are dynamically resolved,
certain Java virtual machine optimizations can not be performed.
Consequently, reflective operations have slower performance than their
non-reflective counterparts, and should be avoided in sections of code
which are called frequently in performance-sensitive applications.

Since you are here asking a performance-related question and using JNA Structures, I can only assume you're writing a "performance-sensitive application". Internally, the Structure does this:

for (StructField structField : fields().values()) {
    readField(structField);
}

which does a single Native read for each field, followed by this, which ends up using reflection under the hood.

setFieldValue(structField.field, result, true);

The moral of the story is that with Structures, each field normally involves a native read plus a reflective write, or a reflective read plus a native write.

The first step you can make without making any other changes is to setAutoSynch(false) on the structure. (You've already done half of this with the "write" version; this does both read and write.) From the docs:

For extremely large or complex structures where you only need to
access a small number of fields, you may see a significant performance
benefit by avoiding automatic structure reads and writes. If auto-read
and -write are disabled, it is up to you to ensure that the Java
fields of interest are synched before and after native function calls
via readField(String) and writeField(String,Object). This is typically
most effective when a native call populates a large structure and you
only need a few fields out of it. After the native call you can call
readField(String) on only the fields of interest.

To really go all out, flattening will possibly help a little more by eliminating the remaining reflection overhead. The trick is making the offset conversions easy.

Some directions to go, balancing complexity vs. performance:

  • To write to native memory, allocate and clear a buffer of bytes (mem = new Memory(size); mem.clear(); or just new byte[size]), and write specific fields to the byte offset you determine using the value from Structure.fieldOffset(name). This does use reflection, but you could do this once for each structure and store a map of name to offset for later use.
  • For reading from native memory, make all your native read calls using a flat buffer to reduce the native overhead to a single read/write. You can cast that buffer to a Structure when you read it (incurring reflection for each field once) or read specific byte offsets per the above strategy.

HTML5 Canvas Performance and Optimization Tips, Tricks and Coding Best Practices


Redraw Regions

The best canvas optimization technique for animations is to limit the number of pixels that get cleared/painted on each frame. The easiest solution to implement is resetting the entire canvas element and drawing everything over again, but that is an expensive operation for your browser to process.

Reuse as many pixels as possible between frames: the fewer pixels that need to be processed each frame, the faster your program will run. For example, when erasing pixels with the clearRect(x, y, w, h) method, it is very beneficial to clear and redraw only the pixels that have changed, not the full canvas.
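A dirty-rectangle redraw can be sketched like this. This is a minimal sketch, not browser-specific code: the context is passed in as `ctx`, and the `{x, y, w, h}` rectangle shape and the `drawScene` callback are illustrative assumptions, not part of the canvas API.

```javascript
// Redraw only the regions that changed, instead of wiping the whole canvas.
function redrawDirtyRegions(ctx, dirtyRects, drawScene) {
  for (const r of dirtyRects) {
    ctx.save();
    ctx.beginPath();
    ctx.rect(r.x, r.y, r.w, r.h);      // limit painting to the dirty rectangle
    ctx.clip();
    ctx.clearRect(r.x, r.y, r.w, r.h); // erase just the changed pixels
    drawScene(ctx);                    // clipped, so only dirty pixels repaint
    ctx.restore();
  }
}
```

Because the scene draw is wrapped in a clip region, even a naive "draw everything" callback only touches the pixels inside the dirty rectangles.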

Procedural Sprites

Generating graphics procedurally is often the way to go, but sometimes it is not the most efficient one. If you're drawing simple shapes with solid fills, drawing them procedurally is the best way to do so. But if you're drawing more detailed entities with strokes, gradient fills and other performance-sensitive make-up, you'd be better off using image sprites.

It is possible to get away with a mix of both. Draw graphical entities procedurally on the canvas once as your application starts up. After that you can reuse the same sprites by painting copies of them instead of generating the same drop-shadow, gradient and strokes repeatedly.
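The pre-render-once idea can be sketched as below. The `createCanvas` factory is an assumption for illustration (in a browser it would be `document.createElement('canvas')`), and `paint` stands for whatever expensive gradient/shadow drawing your sprite needs.

```javascript
// Pay for the expensive drawing once, into an offscreen buffer.
function makeSprite(createCanvas, size, paint) {
  const sprite = createCanvas(size, size); // offscreen canvas buffer
  paint(sprite.getContext('2d'));          // gradients/shadows drawn one time
  return sprite;
}

// Afterwards, each frame just blits copies of the finished sprite.
function stamp(ctx, sprite, positions) {
  for (const p of positions) {
    ctx.drawImage(sprite, p.x, p.y);       // cheap copy instead of re-drawing
  }
}
```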

State Stack & Transformation

The canvas can be manipulated via transformations such as rotation and scaling, resulting in a change to the canvas coordinate system. This is where it's important to know about the state stack for which two methods are available: context.save() (pushes the current state to the stack) and context.restore() (reverts to the previous state). This enables you to apply transformation to a drawing and then restore back to the previous state to make sure the next shape is not affected by any earlier transformation. The states also include properties such as the fill and stroke colors.
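As a sketch (with the context passed in as `ctx`; the coordinates are arbitrary), isolating a transformation looks like this:

```javascript
// save() pushes the current state (transform, styles, clip); restore() pops it,
// so the second rectangle is unaffected by the rotation.
function drawRotatedBox(ctx, angle) {
  ctx.save();
  ctx.rotate(angle);              // affects drawing only until restore()
  ctx.fillRect(-10, -10, 20, 20); // drawn in the rotated coordinate system
  ctx.restore();
  ctx.fillRect(40, 40, 20, 20);   // drawn in the original coordinate system
}
```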

Compositing

A very powerful tool at hand when working with canvas is compositing modes which, amongst other things, allow for masking and layering. There's a wide array of available composite modes and they are all set through the canvas context's globalCompositeOperation property. The composite modes are also part of the state stack properties, so you can apply a composite operation, stack the state and apply a different one, and restore back to the state before where you made the first one. This can be especially useful.
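A masking sketch using this interplay of compositing and the state stack (context passed in as `ctx`; 'destination-in' is the standard mode that keeps only existing pixels covered by the newly drawn shape):

```javascript
// Keep only the pixels inside a circle, then revert the composite mode by
// restoring the saved state so later drawing is unaffected.
function maskCircle(ctx, x, y, r) {
  ctx.save();
  ctx.globalCompositeOperation = 'destination-in';
  ctx.beginPath();
  ctx.arc(x, y, r, 0, Math.PI * 2);
  ctx.fill();    // existing pixels outside the circle are erased
  ctx.restore(); // composite mode reverts along with the rest of the state
}
```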

Anti-Aliasing

To allow for sub-pixel drawing, all browser implementations of canvas employ anti-aliasing (although this does not seem to be a requirement of the HTML5 spec). Anti-aliasing is important to keep in mind if you want to draw crisp lines and the result looks blurred. It occurs because the browser interpolates the image as though it were actually between those pixels. It allows for much smoother animation (you can genuinely move at half a pixel per update) but it makes your images appear fuzzy.

To work around this you will need to either round to whole integer values or offset by half a pixel, depending on whether you're drawing fills or strokes.
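The half-pixel trick for strokes can be sketched like this (context passed in as `ctx`; the function name is illustrative). A 1px stroke centred on an integer x spans two pixel columns and gets anti-aliased into a blur; centring it on x + 0.5 aligns it with one column.

```javascript
// Align a 1px vertical stroke to the pixel grid so it renders crisp.
function crispVerticalLine(ctx, x, y0, y1) {
  const px = Math.round(x) + 0.5; // centre the stroke on one pixel column
  ctx.beginPath();
  ctx.moveTo(px, y0);
  ctx.lineTo(px, y1);
  ctx.stroke();
}
```

For fills the opposite applies: round to whole integers so the fill edge lands exactly on a pixel boundary.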

Using Whole Numbers for drawImage() x and y positions

If you call drawImage on the Canvas element, it's much faster if you round the x and y position to a whole number.

Here's a test case on jsperf showing how much faster using whole numbers is compared to using decimals.

So round your x and y position to whole numbers before rendering.

Faster than Math.round()

Another jsperf test shows that Math.round() is not necessarily the fastest method for rounding numbers. Using a bitwise hack actually turns out to be faster than the built-in method.
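A sketch combining both tips (context passed in as `ctx`; the function name is illustrative). The bitwise form `(0.5 + n) | 0` truncates after adding 0.5, which matches Math.round() for the non-negative coordinates typical on a canvas; note it behaves differently for negative values.

```javascript
// Round to whole pixels with the bitwise hack before calling drawImage.
function drawSpriteAt(ctx, sprite, x, y) {
  const rx = (0.5 + x) | 0; // equivalent to Math.round for x >= 0
  const ry = (0.5 + y) | 0;
  ctx.drawImage(sprite, rx, ry);
}
```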

Canvas Sprite Optimization

Clearing the Canvas

To clear the entire canvas of any existing pixels, context.clearRect(x, y, w, h) is typically used, but there is another option available. Whenever the width or height of the canvas is set, even to the same value repeatedly, the canvas is reset. This is good to know when working with a dynamically sized canvas, as you may notice drawings disappearing.
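The two approaches can be sketched side by side (canvas and context passed in as parameters; the function names are illustrative). The key behavioural difference: re-assigning the width wipes the pixels and also resets the context state (transforms, styles, clip), which clearRect() does not.

```javascript
// Option 1: clear the pixels but keep the current context state.
function clearAll(canvas, ctx) {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
}

// Option 2: re-assigning width wipes pixels AND resets the 2d context state.
function hardReset(canvas) {
  canvas.width = canvas.width;
}
```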

Computation Distribution

The Chrome Developer Tools profiler is very useful for finding out what your performance bottlenecks are. Depending on your application you may need to refactor some parts of your program to improve the performance and how browsers handle specific parts of your code.


Performance tips for classic asp?

Use GetRows. This will give you the speed you seek. Here are some other tips I have used.


