Microsecond Resolution Timestamps on Windows

I believe this is still useful: System Internals: Guidelines For Providing Multimedia Timer Support.

It does a good job of explaining the various timers available and their limitations. It might be that your archenemy will not so much be resolution, but latency.

QueryPerformanceCounter will not always run at CPU speed. In fact, it might try to avoid RDTSC, especially on multi-processor(/multi-core) systems: it will use the HPET on Windows Vista and later if it is available or the ACPI/PM timer.
On my system (Windows 7 x64, dual core AMD) the timer runs at 14.31818 MHz.

The same is true for earlier systems:

"By default, Windows Server 2003 Service Pack 2 (SP2) uses the PM timer for all multiprocessor APIC or ACPI HALs, unless the check process to determine whether the BIOS supports the APIC or ACPI HALs fails."

The problem arises when the check fails. That simply means your computer or BIOS is broken in some way. You can then either fix your BIOS (recommended) or, at least for the time being, switch to the ACPI timer with the /usepmtimer boot switch.

It is easy from C# - without P/Invoke - to check for high-resolution timer support with Stopwatch.IsHighResolution and then peek at Stopwatch.Frequency. It will make the necessary QueryPerformanceCounter call internally.
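For readers working in Python rather than C#, a roughly comparable check is sketched below. It uses the standard-library time.get_clock_info() to report which OS timer backs time.perf_counter() and what resolution it advertises. The helper name describe_clock is made up for this illustration. On Windows builds of CPython the implementation field typically names QueryPerformanceCounter, but treat that as something to verify on your own machine.

```python
import time

def describe_clock(name="perf_counter"):
    """Return (implementation, resolution in seconds, monotonic flag) for a Python clock."""
    info = time.get_clock_info(name)
    return info.implementation, info.resolution, info.monotonic

# Print what the interpreter reports for its high-resolution clock.
impl, resolution_s, is_monotonic = describe_clock()
print("implementation:", impl)
print("resolution (s):", resolution_s)
print("monotonic:", is_monotonic)
```

The advertised resolution is what the OS reports, not a measured value, so it says nothing about call latency.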

Also consider that if the timers are broken, the whole system will go haywire and behave strangely in general (reporting negative elapsed times, slowing down, and so on), not just your application.

This means that you can actually rely on QueryPerformanceCounter.

... and contrary to popular belief, QueryPerformanceFrequency() "cannot change while the system is running".

Edit: As the documentation on QueryPerformanceCounter() states, "it should not matter which processor is called" - and in fact the whole hacking around with thread affinity is only needed if the APIC/ACPI detection fails and the system resorts to using the TSC. It is a resort that should not happen. If it happens on older systems, there is likely a BIOS update/driver fix from the manufacturer. If there is none, the /usepmtimer boot switch is still there. If that fails as well, because the system does not have a proper timer apart from the Pentium TSC, you might in fact consider messing with thread affinity - even then, the sample provided by others in the "Community Content" area of the page is misleading as it has a non-negligible overhead due to setting thread affinity on every start/stop call - that introduces considerable latency and likely diminishes the benefits of using a high resolution timer in the first place.

Game Timing and Multicore Processors is a recommendation on how to use them properly. Please consider that it is now five years old, and at that time fewer systems were fully ACPI compliant/supported - that is why while bashing it, the article goes into so much detail about TSC and how to work around its limitations by keeping an affine thread.

I believe it is a fairly hard task nowadays to find a common PC with zero ACPI support and no usable PM timer. The most common case is probably BIOS settings, when ACPI support is incorrectly set (sometimes sadly by factory defaults).

Anecdotes tell that eight years ago, the situation was different in rare cases. (Makes a fun read, developers working around design "shortcomings" and bashing chip designers. To be fair, it might be the same way vice versa. :-)

Use QueryPerformanceCounter and QueryPerformanceFrequency for finest grain timing on Windows.

MSDN article on code timing with these APIs here (sample code is in VB - sorry).

Timers and timing are a tricky enough subject that, in my opinion, current cross-platform implementations are not quite up to scratch. So I'd recommend a Windows-specific version with appropriate #ifdefs. See other answers if you want a cross-platform version.

If you have to (or want to) use a Windows-specific call, then GetSystemTimeAsFileTime (or, on Windows 8 and later, GetSystemTimePreciseAsFileTime) is the best call for getting UTC time, and QueryPerformanceCounter is good for high-resolution timestamps. GetSystemTimeAsFileTime returns the number of 100-nanosecond intervals since January 1, 1601 UTC in a FILETIME structure.

This fine article goes into the gory details of measuring timers and timestamps in windows and is well worth a read.

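EDIT: To convert a FILETIME to microseconds, you need to go via a ULARGE_INTEGER.

FILETIME ft;
GetSystemTimeAsFileTime(&ft);
ULARGE_INTEGER li;
li.LowPart = ft.dwLowDateTime;
li.HighPart = ft.dwHighDateTime;
unsigned long long valueAsHns = li.QuadPart; // 100-nanosecond intervals since 1601
unsigned long long valueAsUs = valueAsHns/10; // microseconds since 1601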
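The same arithmetic can be sketched in Python, with one extra step that is often wanted: shifting the 1601-based value to the Unix epoch. The function name and constant name below are made up for this illustration. The offset itself is the 11,644,473,600 seconds between 1601-01-01 and 1970-01-01, expressed in 100 ns units.

```python
# Number of 100 ns intervals between 1601-01-01 and 1970-01-01 (UTC).
FILETIME_UNIX_EPOCH_DIFF_HNS = 116_444_736_000_000_000

def filetime_to_unix_us(low_part, high_part):
    """Combine the two 32-bit FILETIME halves; return microseconds since 1970-01-01 UTC."""
    value_hns = (high_part << 32) | low_part  # same role as ULARGE_INTEGER.QuadPart
    return (value_hns - FILETIME_UNIX_EPOCH_DIFF_HNS) // 10

# Example: one second after the Unix epoch, split into FILETIME-style halves.
example_hns = FILETIME_UNIX_EPOCH_DIFF_HNS + 10_000_000
low = example_hns & 0xFFFFFFFF
high = example_hns >> 32
print(filetime_to_unix_us(low, high))  # 1000000
```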

Update Aug. 2022:

In modern Python 3, import time followed by time.monotonic_ns() might be sufficient. See my new answer to this other question here: High-precision clock in Python. At the time of my answer in 2016 using Python 3.1 on a Raspberry Pi, that didn't exist.

See https://docs.python.org/3/library/time.html#time.monotonic_ns. This is new in Python 3.7. I haven't tested it myself yet though. Thanks to @HenrikMadsen for posting this in his answer here, which he has since deleted, unfortunately.

I still need to test these new Python 3.7 and later functions to see if they are as good as what I have below.

So, try this first and compare it to what I have done below:

import time

time_ns = time.monotonic_ns()
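One rough way to compare the built-in clocks is to look at the smallest step they ever report. The sketch below is only an assumption about how such a test could look, and the helper name smallest_observed_step_ns is made up. It reads a clock in a tight loop and keeps the smallest nonzero difference between consecutive readings. It returns None if the clock never advanced during the loop, which can happen with a coarse clock.

```python
import time

def smallest_observed_step_ns(clock=time.monotonic_ns, samples=100_000):
    """Return the smallest nonzero difference between consecutive readings, in ns."""
    smallest = None
    previous = clock()
    for _ in range(samples):
        current = clock()
        step = current - previous
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        previous = current
    return smallest

print("monotonic_ns step (ns):", smallest_observed_step_ns())
print("perf_counter_ns step (ns):", smallest_observed_step_ns(time.perf_counter_ns))
```

The result includes the cost of the Python call itself, so it is an upper bound on the clock's real granularity rather than a measurement of it.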

You might also try time.clock_gettime_ns() on Unix or Linux systems. Based on its name, it appears to call the underlying clock_gettime() C function which I use in my nanos() function in C in my answer here and in my C Unix/Linux library here: timinglib.c.

Original answer in 2016:

Here's a fully functional module for both Linux and Windows. It is unique among the answers here in that it works in pre-Python 3.3: the other answers require Python 3.7 or later in most cases, and Python 3.3 or later in the rest. Again, my answer below works on Windows and Linux in any version of Python going back at least as far as Python 3.0 or so, in case you need that (I can't remember whether it works on Python 2.7).

It uses the ctypes library to call C or C++ dynamic libraries in Python via .dll "dynamically linked library" files in Windows, or .so "shared object" library files in Unix or Linux.

Functions and code samples.

Functions include:

  • micros()
  • millis()
  • delay()
  • delayMicroseconds()

Download GS_timing.py from my eRCaGuy_PyTime repo, then do:

import GS_timing

time_ms = GS_timing.millis()
time_us = GS_timing.micros()
GS_timing.delay(10) # delay 10 ms
GS_timing.delayMicroseconds(10000) # delay 10000 us

Python code module (on GitHub as eRCaGuy_PyTime):

"""
-create some low-level Arduino-like millis() (milliseconds) and micros()
 (microseconds) timing functions for Python
By Gabriel Staples
-click "Contact me" at the top of my website to find my email address
Started: 11 July 2016
Updated: 13 Aug 2016

History (newest on top):
20160813 - v0.2.0 created - added Linux compatibility, using ctypes, so that it's compatible with pre-Python 3.3 (for Python 3.3 or later just use the built-in time functions for Linux, shown here: https://docs.python.org/3/library/time.html)
 -ex: time.clock_gettime(time.CLOCK_MONOTONIC_RAW)
20160711 - v0.1.0 created - functions work for Windows *only* (via the QPC timer)

References:
-personal (C++ code): GS_PCArduino.h
1) Acquiring high-resolution time stamps (Windows)
2) QueryPerformanceCounter function (Windows)
3) QueryPerformanceFrequency function (Windows)
4) LARGE_INTEGER union (Windows)
"""

import ctypes, os

VERSION = '0.2.0'

#OS-specific low-level timing functions:
if (os.name=='nt'): #for Windows:
    def micros():
        "return a timestamp in microseconds (us)"
        tics = ctypes.c_int64()
        freq = ctypes.c_int64()

        #get ticks on the internal ~2MHz QPC clock
        ctypes.windll.kernel32.QueryPerformanceCounter(ctypes.byref(tics))
        #get the actual freq. of the internal ~2MHz QPC clock
        ctypes.windll.kernel32.QueryPerformanceFrequency(ctypes.byref(freq))

        t_us = tics.value*1e6/freq.value
        return t_us

    def millis():
        "return a timestamp in milliseconds (ms)"
        tics = ctypes.c_int64()
        freq = ctypes.c_int64()

        #get ticks on the internal ~2MHz QPC clock
        ctypes.windll.kernel32.QueryPerformanceCounter(ctypes.byref(tics))
        #get the actual freq. of the internal ~2MHz QPC clock
        ctypes.windll.kernel32.QueryPerformanceFrequency(ctypes.byref(freq))

        t_ms = tics.value*1e3/freq.value
        return t_ms

elif (os.name=='posix'): #for Linux:

    CLOCK_MONOTONIC_RAW = 4 # see <linux/time.h> here: https://github.com/torvalds/linux/blob/master/include/uapi/linux/time.h

    #prepare ctype timespec structure of {long, long}
    class timespec(ctypes.Structure):
        _fields_ = [
            ('tv_sec', ctypes.c_long),
            ('tv_nsec', ctypes.c_long)
        ]

    #Configure Python access to the clock_gettime C library, via ctypes:
    #-ctypes.CDLL: https://docs.python.org/3.2/library/ctypes.html
    #-librt.so.1 with clock_gettime: https://docs.oracle.com/cd/E36784_01/html/E36873/librt-3lib.html
    #-Linux clock_gettime(): http://linux.die.net/man/3/clock_gettime
    librt = ctypes.CDLL('librt.so.1', use_errno=True)
    clock_gettime = librt.clock_gettime
    #specify input arguments and types to the C clock_gettime() function
    # (int clock_ID, timespec* t)
    clock_gettime.argtypes = [ctypes.c_int, ctypes.POINTER(timespec)]

    def monotonic_time():
        "return a timestamp in seconds (sec)"
        t = timespec()
        #(Note that clock_gettime() returns 0 for success, or -1 for failure, in
        # which case errno is set appropriately)
        #-see here: http://linux.die.net/man/3/clock_gettime
        if clock_gettime(CLOCK_MONOTONIC_RAW , ctypes.pointer(t)) != 0:
            #if clock_gettime() returns an error
            errno_ = ctypes.get_errno()
            raise OSError(errno_, os.strerror(errno_))
        return t.tv_sec + t.tv_nsec*1e-9 #sec

    def micros():
        "return a timestamp in microseconds (us)"
        return monotonic_time()*1e6 #us

    def millis():
        "return a timestamp in milliseconds (ms)"
        return monotonic_time()*1e3 #ms

#Other timing functions:
def delay(delay_ms):
    "delay for delay_ms milliseconds (ms)"
    t_start = millis()
    while (millis() - t_start < delay_ms):
        pass #do nothing

def delayMicroseconds(delay_us):
    "delay for delay_us microseconds (us)"
    t_start = micros()
    while (micros() - t_start < delay_us):
        pass #do nothing

#Only execute this block of code if running this module directly,
#*not* if importing it
#-see here: http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm
if __name__ == "__main__": #if running this module as a stand-alone program

    #print loop execution time 100 times, using micros()
    tStart = micros() #us
    for x in range(0, 100):
        tNow = micros() #us
        dt = tNow - tStart #us; delta time
        tStart = tNow #us; update
        print("dt(us) = " + str(dt))

    #print loop execution time 100 times, using millis()
    tStart = millis() #ms
    for x in range(0, 100):
        tNow = millis() #ms
        dt = tNow - tStart #ms; delta time
        tStart = tNow #ms; update
        print("dt(ms) = " + str(dt))

    #print a counter once per second, for 5 seconds, using delay
    for i in range(1,6):
        delay(1000) #ms
        print(i)

    #print a counter once per second, for 5 seconds, using delayMicroseconds
    for i in range(1,6):
        delayMicroseconds(1000000) #us
        print(i)

The first version of this module (v0.1.0) was Windows-only. As of v0.2.0 it works for Linux too, including in pre-Python 3.3, since I'm using C functions via the ctypes module in order to read the time stamps.

(Note: code above originally posted here: http://www.electricrcaircraftguy.com/2016/07/arduino-like-millisecond-and-microsecond-timestamps-in-python.html)

Special thanks to @ArminRonacher for his brilliant pre-Python 3.3 Linux answer here: https://stackoverflow.com/a/1205762/4561887

Timestamp and clock references:

  1. Windows: QueryPerformanceCounter(): https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter:

    Retrieves the current value of the performance counter, which is a high resolution (<1us) time stamp that can be used for time-interval measurements.

  2. Linux: clock_gettime(): https://man7.org/linux/man-pages/man3/clock_gettime.3.html (emphasis added):


    CLOCK_MONOTONIC

    A nonsettable system-wide clock that represents monotonic time since—as described by POSIX—"some unspecified point in the past". On Linux, that point corresponds to the number of seconds that the system has been running since it was booted.

    CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)

    Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments or the incremental adjustments performed by adjtime(3). This clock does not count time that the system is suspended.

  3. Note that both clocks on both systems do NOT provide "wall clock" type timestamps. Rather, they both provide high-resolution (sub-microsecond) timestamps which generally count time since boot. These timestamps are useful for precision timing of events, producing repeatable, periodic loops, and measuring small time intervals in code, with great resolution, precision, and accuracy.

Update: prior to Python 3.3, the built-in Python time library (https://docs.python.org/3.5/library/time.html) didn't have any explicitly high-resolution functions. Now, however, it does provide other options, including some high-resolution functions.

My module above, however, provides high-resolution timestamps for Python code before Python 3.3, as well as after, and it does so on both Linux and Windows.

Here's an example of what I mean, showing that the time.sleep() function is NOT necessarily a high-resolution function. On my Windows machine, its resolution is perhaps 8 ms at best, whereas my module above has 0.5 us resolution (16000 times better!) on the same machine.

Code demonstration:

import time
import GS_timing as timing

def delayMicroseconds(n):
    time.sleep(n / 1000000.)

def delayMillisecond(n):
    time.sleep(n / 1000.)

t_start = 0
t_end = 0

#using time.sleep
print('using time.sleep')
for x in range(10):
    t_start = timing.micros() #us
    delayMicroseconds(1) #short delay (value assumed)
    t_end = timing.micros() #us
    print('dt (us) = ' + str(t_end - t_start))
for x in range(10):
    t_start = timing.micros() #us
    delayMicroseconds(2000) #value assumed from the ~2000 us results below
    t_end = timing.micros() #us
    print('dt (us) = ' + str(t_end - t_start))

#using GS_timing
print('\nusing GS_timing')
for x in range(10):
    t_start = timing.micros() #us
    timing.delayMicroseconds(1) #short delay (value assumed)
    t_end = timing.micros() #us
    print('dt (us) = ' + str(t_end - t_start))
for x in range(10):
    t_start = timing.micros() #us
    timing.delayMicroseconds(2000) #value assumed from the ~2000 us results below
    t_end = timing.micros() #us
    print('dt (us) = ' + str(t_end - t_start))

SAMPLE RESULTS ON MY WINDOWS 8.1 MACHINE (notice how much worse time.sleep does):

using time.sleep
dt (us) = 2872.059814453125
dt (us) = 886.3939208984375
dt (us) = 770.4649658203125
dt (us) = 1138.7698974609375
dt (us) = 1426.027099609375
dt (us) = 734.557861328125
dt (us) = 10617.233642578125
dt (us) = 9594.90576171875
dt (us) = 9155.299560546875
dt (us) = 9520.526611328125
dt (us) = 8799.3056640625
dt (us) = 9609.2685546875
dt (us) = 9679.5439453125
dt (us) = 9248.145263671875
dt (us) = 9389.721923828125
dt (us) = 9637.994262695312
dt (us) = 9616.450073242188
dt (us) = 9592.853881835938
dt (us) = 9465.639892578125
dt (us) = 7650.276611328125

using GS_timing
dt (us) = 53.3477783203125
dt (us) = 36.93310546875
dt (us) = 36.9329833984375
dt (us) = 34.8812255859375
dt (us) = 35.3941650390625
dt (us) = 40.010986328125
dt (us) = 38.4720458984375
dt (us) = 56.425537109375
dt (us) = 35.9072265625
dt (us) = 36.420166015625
dt (us) = 2039.526611328125
dt (us) = 2046.195068359375
dt (us) = 2033.8841552734375
dt (us) = 2037.4747314453125
dt (us) = 2032.34521484375
dt (us) = 2086.2059326171875
dt (us) = 2035.4229736328125
dt (us) = 2051.32470703125
dt (us) = 2040.03955078125
dt (us) = 2027.215576171875

SAMPLE RESULTS ON MY RASPBERRY PI VERSION 1 B+ (notice that the results between using time.sleep and my module are basically identical...apparently the low-level functions in time are already accessing better-resolution timers here, since it's a Linux machine (running Raspbian)...BUT in my GS_timing module I am explicitly calling the CLOCK_MONOTONIC_RAW timer. Who knows what's being used otherwise):

using time.sleep
dt (us) = 1022.0
dt (us) = 417.0
dt (us) = 407.0
dt (us) = 450.0
dt (us) = 2078.0
dt (us) = 393.0
dt (us) = 1297.0
dt (us) = 878.0
dt (us) = 1135.0
dt (us) = 2896.0
dt (us) = 2746.0
dt (us) = 2568.0
dt (us) = 2512.0
dt (us) = 2423.0
dt (us) = 2454.0
dt (us) = 2608.0
dt (us) = 2518.0
dt (us) = 2569.0
dt (us) = 2548.0
dt (us) = 2496.0

using GS_timing
dt (us) = 572.0
dt (us) = 673.0
dt (us) = 1084.0
dt (us) = 561.0
dt (us) = 728.0
dt (us) = 576.0
dt (us) = 556.0
dt (us) = 584.0
dt (us) = 576.0
dt (us) = 578.0
dt (us) = 2741.0
dt (us) = 2466.0
dt (us) = 2522.0
dt (us) = 2810.0
dt (us) = 2589.0
dt (us) = 2681.0
dt (us) = 2546.0
dt (us) = 3090.0
dt (us) = 2600.0
dt (us) = 2400.0


  1. My 3 sets of timestamp functions (cross-linked to each other):
    1. For C timestamps, see my answer here: Get a timestamp in C in microseconds?
    2. For C++ high-resolution timestamps, see my answer here: Getting an accurate execution time in C++ (micro seconds)
    3. For Python high-resolution timestamps, see my answer here: How can I get millisecond and microsecond-resolution timestamps in Python?
  2. My C and C++ Linux high-resolution timing library with millis(), micros(), nanos(), sleep_ns(), sleep_until_ns, use_realtime_scheduler(), get_estimated_resolution(), etc.
    1. timinglib.h
    2. timinglib.c
  3. [my answer for C and C++, including microcontrollers (or any other system)] How to do timestamp-based, non-blocking, single-threaded cooperative multi-tasking
  4. [my answer for C and C++, including microcontrollers and Arduino (or any other system)] Full coulomb counter example demonstrating the above concept with timestamp-based, single-threaded, cooperative multi-tasking
  5. [my answer for C and C++ in Linux--could be easily adapted to Python using the ctypes module, as shown above] How to run a high-resolution, high-precision periodic loop in Linux easily, at any frequency (ex: up to 10 KHz~100 KHz) using a soft real-time scheduler and nanosecond delays

The article has been deleted from CodeProject, but this seems to be a copy: DateTimePrecise C# Class. The idea is to use the QueryPerformanceCounter API for accurate small increments and to adjust it periodically in order to keep long-term accuracy. This gives approximately microsecond accuracy ("approximately" because it's still not exactly precise, but still quite usable).
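The general idea (a coarse wall-clock anchor plus a fine counter, re-anchored now and then) can be sketched in Python as follows. This is not a port of the DateTimePrecise class: the class name PreciseClock, its parameter, and the simple hard re-anchoring are all assumptions made for illustration, and time.perf_counter_ns() stands in for QueryPerformanceCounter.

```python
import time

class PreciseClock:
    """Wall-clock timestamps built from a coarse system-time anchor plus a fine counter."""

    def __init__(self, resync_interval_s=10.0):
        self._resync_interval_ns = int(resync_interval_s * 1e9)
        self._resync()

    def _resync(self):
        # Pair one wall-clock reading with one performance-counter reading.
        self._anchor_wall_ns = time.time_ns()
        self._anchor_counter_ns = time.perf_counter_ns()

    def now_us(self):
        """Return microseconds since the Unix epoch."""
        elapsed_ns = time.perf_counter_ns() - self._anchor_counter_ns
        if elapsed_ns > self._resync_interval_ns:
            # Re-anchor so that counter drift cannot accumulate indefinitely.
            self._resync()
            elapsed_ns = 0
        return (self._anchor_wall_ns + elapsed_ns) // 1000

clock = PreciseClock()
print("now (us since epoch):", clock.now_us())
```

A hard re-anchor like this can make consecutive results jump slightly, even backwards, at the resync point. A production version would need to smooth that correction instead of applying it in one step.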

GetSystemTimePreciseAsFileTime only became available with Windows 8, and only for desktop applications. It mimics Linux's gettimeofday. The implementation uses QueryPerformanceCounter to achieve microsecond resolution: a timestamp is taken at the time of a system time increment, and subsequent calls to GetSystemTimePreciseAsFileTime take that system time and add the elapsed "performance counter time" (elapsed ticks / performance counter frequency) as the high-resolution part.

The functionality of QueryPerformanceCounter in turn depends on platform-specific details (HPET, ACPI PM timer, invariant TSC, etc.). See MSDN: Acquiring high-resolution time stamps and SO: Is QueryPerformanceFrequency acurate when using HPET? for details.
The various versions of Windows have their own schemes for updating the system time. Windows XP has a fixed file time granularity that is independent of the system's timer resolution. Only post-Windows XP versions allow the system time granularity to be modified by changing the system timer resolution.

This can be accomplished by means of the multimedia timer API timeBeginPeriod and/or the hidden API NtSetTimerResolution (see this SO answer for more details about using timeBeginPeriod and NtSetTimerResolution).
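As a minimal sketch of the timeBeginPeriod route from Python, the context manager below requests a finer timer period through winmm.dll and releases it again on exit. The name timer_resolution and the choice to do nothing on non-Windows systems are assumptions made for this example. Only the documented timeBeginPeriod/timeEndPeriod pair is used; NtSetTimerResolution is not shown.

```python
import contextlib
import ctypes
import os
import time

@contextlib.contextmanager
def timer_resolution(period_ms=1):
    """Request a finer system timer period on Windows; do nothing on other systems."""
    if os.name != "nt":
        yield False
        return
    winmm = ctypes.WinDLL("winmm")
    # timeBeginPeriod returns TIMERR_NOERROR (0) on success.
    granted = winmm.timeBeginPeriod(period_ms) == 0
    try:
        yield granted
    finally:
        if granted:
            # Every successful timeBeginPeriod must be matched by timeEndPeriod.
            winmm.timeEndPeriod(period_ms)

with timer_resolution(1) as active:
    start = time.perf_counter_ns()
    time.sleep(0.001)
    elapsed_us = (time.perf_counter_ns() - start) / 1000
    print("resolution request active:", active, "| 1 ms sleep took (us):", elapsed_us)
```

The request affects system-wide timer behaviour and power use, so it is best held only for as long as the finer granularity is actually needed.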

As stated, GetSystemTimePreciseAsFileTime is only available for desktop applications. The reason for this is the need for specific hardware.

What I'm interested in is why Boost implemented it that way, when in turn there are possibly solutions that would be more fitting?

Taking the facts stated above into account makes the implementation very complex and the result very platform-specific. Every (!) Windows version has undergone severe changes in timekeeping. Even the latest small step from 8 to 8.1 changed the timekeeping procedure considerably. However, there is still room to improve time matters on Windows.

I should mention that GetSystemTimePreciseAsFileTime is, as of Windows 8.1, not giving results as accurate as expected or as specified at MSDN: GetSystemTimePreciseAsFileTime function. It combines the system file time with the result of QueryPerformanceCounter to fill the gap between consecutive file time increments, but it does not take system time adjustments into account. An active system time adjustment, e.g. one made with SetSystemTimeAdjustment, modifies the system time granularity and the progress of the system time. However, the performance counter frequency used to build the result of GetSystemTimePreciseAsFileTime is kept constant. As a result, the microseconds part is off by the adjustment gain set by SetSystemTimeAdjustment.

First, some functions:

// ==========================================================================
#define NOMINMAX
#define _AFXDLL
#include "afxwin.h" // TRACE
#include "windows.h" // ULARGE_INTEGER
#include "mmSystem.h" // timeGetTime
#pragma comment(lib, "Winmm.lib") // timeGetTime

// ==========================================================================
// convert FILETIME to ULONGLONG
// (casting won't work on 64-bit platforms, due to alignment of FILETIME members)
inline void ToULL(const FILETIME& ft, ULONGLONG& uft)
{
    ULARGE_INTEGER uli;
    uli.LowPart = ft.dwLowDateTime ;
    uli.HighPart= ft.dwHighDateTime;
    uft= uli.QuadPart;
}

// --------------------------------------------------------------------------
// convert ULONGLONG to FILETIME
// (casting won't work on 64-bit platforms, due to alignment of FILETIME members)
inline void ToFILETIME(const ULONGLONG& uft, FILETIME& ft)
{
    ULARGE_INTEGER uli;
    uli.QuadPart= uft;
    ft.dwLowDateTime = uli.LowPart ;
    ft.dwHighDateTime= uli.HighPart;
}

// --------------------------------------------------------------------------
// ULONGLONG version for GetSystemTimeAsFileTime
inline void GetSystemTimeAsULL(ULONGLONG& uft)
{
    FILETIME ft;
    ::GetSystemTimeAsFileTime(&ft);
    ToULL(ft, uft);
}

// --------------------------------------------------------------------------
// convert ULONGLONG to time-components
bool ULLToSystemTime(const ULONGLONG nTime , // [i]
                     WORD& nYear , // [o] 1601 - 30827
                     WORD& nMonth , // [o] 1 - 12
                     WORD& nDay , // [o] 1 - 31
                     WORD& nHour , // [o] 0 - 23
                     WORD& nMinute , // [o] 0 - 59
                     WORD& nSecond , // [o] 0 - 59
                     WORD& nMilliseconds ) // [o] 0 - 999
{
    SYSTEMTIME sysTime;
    FILETIME ft;
    ToFILETIME(nTime, ft);

    // the wDayOfWeek member of the SYSTEMTIME structure is ignored
    if (0 == ::FileTimeToSystemTime(&ft, &sysTime))
        return false;

    nYear = sysTime.wYear ;
    nMonth = sysTime.wMonth ;
    nDay = sysTime.wDay ;
    nHour = sysTime.wHour ;
    nMinute = sysTime.wMinute ;
    nSecond = sysTime.wSecond ;
    nMilliseconds= sysTime.wMilliseconds;
    return true;
}

// --------------------------------------------------------------------------
void TraceTime(const ULONGLONG nTime) // [i]
{
    WORD nYear,nMonth,nDay,nHour,nMinute,nSecond,nMilliseconds;
    ULLToSystemTime(nTime, nYear,nMonth,nDay,nHour,nMinute,nSecond,nMilliseconds);
    TRACE("Time: %02u-%02u-%04u %02u:%02u:%02u.%03u\n", nDay,nMonth,nYear,nHour,nMinute,nSecond,nMilliseconds);
}

Now, how to use:


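ULONGLONG u0,u1;
::GetSystemTimeAsULL(u0);

// wait for tick (each 14.4mS)
do
{
    ::GetSystemTimeAsULL(u1);
}
while (u0==u1);

DWORD d1= ::timeGetTime();

// d1 and u1 are now synchronized

// ... do some work

// get current time:
// (widen to 64 bits before multiplying, so the product cannot overflow a DWORD)
ULONGLONG u2= u1+ULONGLONG(::timeGetTime() - d1)*10000; // mSec --> HectoNanoSec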


Note that you should resync d1 and u1 every 2-3 minutes to keep the accuracy.
Actually, you can measure the drift between the clocks to find the optimal resync interval.
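Measuring that drift could look roughly like the Python sketch below. It is an illustration under assumptions, not the method used above: time.time_ns() stands in for GetSystemTimeAsFileTime, time.monotonic_ns() stands in for timeGetTime, and the function name measure_drift_ppm is made up. It samples both clocks, waits, samples again, and reports the relative difference in parts per million.

```python
import time

def measure_drift_ppm(interval_s=0.2):
    """Estimate how far the wall clock drifts from the monotonic clock, in ppm."""
    wall_start, mono_start = time.time_ns(), time.monotonic_ns()
    time.sleep(interval_s)
    wall_elapsed = time.time_ns() - wall_start
    mono_elapsed = time.monotonic_ns() - mono_start
    return (wall_elapsed - mono_elapsed) / mono_elapsed * 1e6

print("drift over a short interval (ppm):", measure_drift_ppm())
```

Over a fraction of a second the result is dominated by clock granularity and read jitter. A usable estimate needs an interval of minutes, after which the resync interval can be chosen so that the accumulated drift stays below the error you can tolerate.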

If you have a threaded application running on a multicore computer QueryPerformanceCounter can (and will) return different values depending on which core the code is executing on. See this MSDN article. (rdtsc has the same problem)

This is not just a theoretical problem; we ran into it with our application and had to conclude that the only reliable time source is timeGetTime, which only has ms precision (which fortunately was sufficient in our case). We also tried pinning the thread affinity of our threads to guarantee that each thread always got a consistent value from QueryPerformanceCounter. This worked, but it absolutely killed the performance of the application.

To sum things up, there isn't a reliable timer on Windows that can be used to time things with microsecond precision (at least not when running on a multicore computer).

GetTickCount will not get it done for you.

Look into QueryPerformanceFrequency / QueryPerformanceCounter. The only gotcha here is CPU scaling though, so do your research.

