What Happens When QueryPerformanceCounter Is Called

What happens when QueryPerformanceCounter is called?

Windows' QueryPerformanceCounter() has logic to determine the number of processors and invoke synchronization logic if necessary. It attempts to use the TSC register, but on multiprocessor systems this register is not guaranteed to be synchronized between processors (and, more importantly, can vary greatly due to intelligent downclocking and sleep states).

MSDN says that it doesn't matter which processor this is called on, so you may be seeing extra synchronization code for such a situation causing overhead. Also remember that it can involve a bus transfer, so you may be seeing bus contention delays.

Try using SetThreadAffinityMask() if possible to bind the timing thread to a specific processor. Otherwise you might just have to live with the delay, or you could try a different timer (for example, take a look at http://en.wikipedia.org/wiki/High_Precision_Event_Timer).
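
If the affinity route is an option, a minimal sketch might look like the following. This is not from the original answer: the function name is hypothetical, and the mask value 1 (CPU 0) is just an arbitrary example.

#include <windows.h>

void TimeSomething()
{
    // Pin the current thread to CPU 0 so both QPC reads come from the same
    // processor's counter; the mask value 1 is an arbitrary example choice.
    DWORD_PTR oldMask = SetThreadAffinityMask(GetCurrentThread(), 1);

    LARGE_INTEGER start, stop;
    QueryPerformanceCounter(&start);
    // ... code being timed ...
    QueryPerformanceCounter(&stop);

    // Restore the previous affinity so the rest of the program is unaffected.
    SetThreadAffinityMask(GetCurrentThread(), oldMask);
}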

Performance impact of QueryPerformanceCounter

Although old, this Dr Dobb's article gives a nice summary of your options and their costs, pros and cons (see the tables right at the end), including QPC.

TBH, to get real timings for your situation, use a profiler (like AMD's CodeAnalyst) or something like Agner Fog's performance-monitoring tools (his site also has other useful material, depending on how far you want to go down the rabbit hole of 'how much will this call/instruction/action cost').

How to use QueryPerformanceCounter?

#include <windows.h>
#include <iostream>

using namespace std;

double PCFreq = 0.0;
__int64 CounterStart = 0;

void StartCounter()
{
    LARGE_INTEGER li;
    if(!QueryPerformanceFrequency(&li))
        cout << "QueryPerformanceFrequency failed!\n";

    // The frequency is reported in counts per second; scale to counts per millisecond.
    PCFreq = double(li.QuadPart)/1000.0;

    QueryPerformanceCounter(&li);
    CounterStart = li.QuadPart;
}

double GetCounter()
{
    LARGE_INTEGER li;
    QueryPerformanceCounter(&li);
    // Elapsed milliseconds since StartCounter() was called.
    return double(li.QuadPart-CounterStart)/PCFreq;
}

int main()
{
    StartCounter();
    Sleep(1000);
    cout << GetCounter() << "\n";
    return 0;
}

This program should output a number close to 1000 (Windows' Sleep() isn't that accurate, but the result should be close, e.g. 999).

The StartCounter() function records the number of ticks the performance counter has in the CounterStart variable. The GetCounter() function returns the number of milliseconds since StartCounter() was last called as a double, so if GetCounter() returns 0.001 then it has been about 1 microsecond since StartCounter() was called.

If you want to have the timer use seconds instead then change

PCFreq = double(li.QuadPart)/1000.0;

to

PCFreq = double(li.QuadPart);

or if you want microseconds then use

PCFreq = double(li.QuadPart)/1000000.0;

But really it's just a matter of convenience, since GetCounter() returns a double either way.

QueryPerformanceCounter and overflows

QueryPerformanceCounter is notorious for its unreliability. It's fine to use for individual short-interval timing, if you're prepared to handle abnormal results. It is not exact; it's typically based on the PCI bus frequency, and a heavily loaded bus can lead to lost ticks.

GetTickCount is actually more stable, and can give you 1ms resolution if you've called timeBeginPeriod. It will eventually wrap, so you need to handle that.
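
For example, a minimal sketch of that approach (the 250 ms Sleep is just a stand-in for the work being timed, and the #pragma is simply the MSVC way of linking winmm):

#include <windows.h>
#include <iostream>
#pragma comment(lib, "winmm.lib")   // timeBeginPeriod/timeEndPeriod live in winmm

int main()
{
    timeBeginPeriod(1);                        // request ~1 ms tick resolution

    DWORD start = GetTickCount();
    Sleep(250);                                // stand-in for the work being timed
    DWORD elapsedMs = GetTickCount() - start;  // unsigned subtraction survives one wrap

    timeEndPeriod(1);                          // always pair with timeBeginPeriod
    std::cout << elapsedMs << " ms\n";
    return 0;
}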

__rdtsc should not be used, unless you're profiling and have control of which core you're running on and are prepared to handle variable CPU frequency.

GetSystemTime is decent for longer measurement periods, but will jump when the system time is adjusted.

Also, Sleep(0) does not do what you think it does. It will yield the cpu if another context wants it - otherwise it'll return immediately.

In short, timing on Windows is a mess. One would think that today it'd be possible to get accurate long-term timing from a computer without jumping through hoops, but this isn't the case. In our game framework we use several time sources, plus corrections from the server, to ensure all connected clients have the same game time, and there are a lot of bad clocks out there.

Your best bet would likely be to just use GetTickCount or GetSystemTime and wrap it in something that adjusts for time jumps and wrap-arounds.

Also, you should convert your double interval to int64 milliseconds and then use only integer math; this avoids problems caused by floating-point accuracy varying with the magnitude of the value.
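
A minimal sketch of such a wrapper (the class name is hypothetical, and it assumes the clock is polled at least once every ~49.7 days so no wrap is missed):

#include <windows.h>

// Hypothetical wrapper: accumulates GetTickCount() deltas into a 64-bit
// millisecond counter using only integer math. Unsigned 32-bit subtraction
// handles the wrap, as long as Milliseconds() is called at least once per
// ~49.7-day wrap period.
class TickClock64
{
public:
    TickClock64() : last_(GetTickCount()), total_(0) {}

    unsigned __int64 Milliseconds()
    {
        DWORD now = GetTickCount();
        total_ += now - last_;   // wrap-safe unsigned difference
        last_ = now;
        return total_;
    }

private:
    DWORD last_;
    unsigned __int64 total_;
};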

QueryPerformanceCounter() vs QueryInterruptTime() vs KeQueryInterruptTime()

QueryInterruptTime() and QueryInterruptTimePrecise() require Windows 10 / Server 2016 as minimum versions.

You must have read Acquiring high-resolution time stamps while reading the QueryPerformanceCounter() documentation; it clearly discloses the advantages and pitfalls of QueryPerformanceCounter(). The function only shows notable overhead on HPET and PM-timer platforms, because it requires a kernel transition there. More recent platforms perform timekeeping and performance-counter duties using the CPU's Time Stamp Counter (TSC) with very little overhead.
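
If you want to see which camp your machine falls into, a quick sketch like this gives a rough per-call cost (the one-million call count is arbitrary):

#include <windows.h>
#include <iostream>

int main()
{
    const int kCalls = 1000000;            // arbitrary sample count
    LARGE_INTEGER freq, begin, end, dummy;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&begin);
    for (int i = 0; i < kCalls; ++i)
        QueryPerformanceCounter(&dummy);   // the call whose overhead we estimate
    QueryPerformanceCounter(&end);

    double seconds = double(end.QuadPart - begin.QuadPart) / double(freq.QuadPart);
    std::cout << "~" << (seconds / kCalls) * 1e9
              << " ns per QueryPerformanceCounter() call\n";
    return 0;
}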

If it is just about precise time, you may also look into GetSystemTimePreciseAsFileTime(). This function is supported from Windows 8 / Server 2012 (desktop apps only). However, it also uses the performance counter under the hood and therefore suffers from the same overhead bottleneck (HPET/PM timer).

The same applies for the function QueryInterruptTimePrecise().
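
For the plain system-time route, a minimal sketch of reading GetSystemTimePreciseAsFileTime(), assuming Windows 8 / Server 2012 or later:

#include <windows.h>
#include <iostream>

int main()
{
    FILETIME ft;
    GetSystemTimePreciseAsFileTime(&ft);   // requires Windows 8 / Server 2012 or later

    ULARGE_INTEGER t;
    t.LowPart  = ft.dwLowDateTime;
    t.HighPart = ft.dwHighDateTime;

    // 100-nanosecond intervals since January 1, 1601 (UTC).
    std::cout << t.QuadPart << "\n";
    return 0;
}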

The function KeQueryUnbiasedInterruptTime() can be used in kernel-mode drivers, and it can bridge sleep states accurately when compared to KeQueryInterruptTimePrecise(). Note: there is no kernel transition here, but you'd have to do the transition elsewhere, I suspect.

Unfortunately, there is no KeQueryUnbiasedInterruptTimePrecise().

QueryPerformanceCounter throwing incorrect numbers

You are assuming that the frequency from the performance counter is exactly 1000000 Hz.

You need to call QueryPerformanceFrequency instead, as the frequency can vary (some kernels use a motherboard timer such as the ACPI PM timer or HPET, others use the CPU's time-stamp counter, which runs at approximately the CPU's clock frequency).
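
In other words, something along these lines (a sketch; the function name is just for illustration, and multiplying before dividing preserves sub-tick precision):

#include <windows.h>

// Sketch: never hard-code the frequency; query it and convert. At typical
// frequencies the multiply-then-divide is safe for intervals up to several days.
__int64 MeasureMicroseconds()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // counts per second, fixed at boot

    QueryPerformanceCounter(&t0);
    // ... work being measured ...
    QueryPerformanceCounter(&t1);

    return (t1.QuadPart - t0.QuadPart) * 1000000LL / freq.QuadPart;
}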

Time running backwards with QueryPerformanceCounter()

Counter Value: 6536266821

Counter Value: 6536266262

Counter Value: 6536266604

Even the third read is smaller than the first! But what is the relevance here? You should read the performance counter frequency using QueryPerformanceFrequency() to see what a difference of a few hundred counts actually means. With a frequency in the MHz range this would still be well below a millisecond. Can you provide a longer list of consecutive reads of QueryPerformanceCounter()?

You should also provide more details about the hardware. What resource is used for the performance counter? Acquiring high-resolution time stamps may help you to get a more detailed view.

Assuming your loop behaves linearly, you could plot the values against time. This may narrow down the problem, and it may also let you establish a rejection/interpolation scheme.
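
To collect the raw data for such a list or plot, a small sketch like this could be used (the 20-sample count is arbitrary; a negative delta would show a backwards step in nanoseconds rather than raw counts):

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, prev, now;
    QueryPerformanceFrequency(&freq);
    std::cout << "Frequency: " << freq.QuadPart << " counts/s\n";

    QueryPerformanceCounter(&prev);
    for (int i = 0; i < 20; ++i)
    {
        QueryPerformanceCounter(&now);
        __int64 delta = now.QuadPart - prev.QuadPart;
        std::cout << now.QuadPart << "  (delta " << delta << " counts, "
                  << delta * 1000000000LL / freq.QuadPart << " ns)\n";
        prev = now;
    }
    return 0;
}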


