SystemTap Script to Profile Latency of Functions

The gettimeofday_*() functions only approximate wall-clock time. Across CPUs, or across a clock adjustment, their values may not advance monotonically the way you expect. get_cycles() is more reliably monotonic on a given CPU, and a few other clock-related functions are also available.

Also, your begin variable is a simple scalar. If the same function is called from multiple threads or CPUs, or if recursion occurs, it will be overwritten. The following should be enough, and it behaves correctly with respect to nesting and concurrency:

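global stats  // aggregate of per-call latencies, in nanoseconds

// No separate FOO.call probe is needed: @entry() samples its expression at function entry.
probe module(@1).function(@2).return {
    stats <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}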

How do I profile an F# script?

I recommend using BenchmarkDotNet for any benchmarking tasks (well, micro-benchmarks). Since it's a statistical tool, it accounts for many things that hand-rolled benchmarking will not. And just by applying a few attributes you can get a nifty report.

Create a .NET Core console app, add the BenchmarkDotNet package, create a benchmark, and run it to see the results. Here's an example that tests two trivial parsing functions, with one marked as the baseline for comparison, and that tells BenchmarkDotNet to capture memory usage statistics when running the benchmark:

open System
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

module Parsing =
    /// "123,456" --> (123, 456)
    let getNums (str: string) (delim: char) =
        let idx = str.IndexOf(delim)
        let first = Int32.Parse(str.Substring(0, idx))
        let second = Int32.Parse(str.Substring(idx + 1))
        first, second

    /// "123,456" --> struct (123, 456), parsed from spans instead of substrings
    let getNumsFaster (str: string) (delim: char) =
        let sp = str.AsSpan()
        let idx = sp.IndexOf(delim)
        let first = Int32.Parse(sp.Slice(0, idx))
        let second = Int32.Parse(sp.Slice(idx + 1))
        struct(first, second)

[<MemoryDiagnoser>]
type ParsingBench() =
    let str = "123,456"
    let delim = ','

    [<Benchmark(Baseline=true)>]
    member __.GetNums() =
        Parsing.getNums str delim |> ignore

    [<Benchmark>]
    member __.GetNumsFaster() =
        Parsing.getNumsFaster str delim |> ignore

[<EntryPoint>]
let main _ =
    let summary = BenchmarkRunner.Run<ParsingBench>()
    printfn "%A" summary

    0 // return an integer exit code

In this case, the results will show that the getNumsFaster function allocates 0 bytes and runs about 33% faster.

Once you've found something that consistently performs better and allocates less, you can transfer that over to a script or some other environment where the code will actually execute.
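
After moving the code back into a script, a rough sanity check of elapsed time and allocations can be done inline. The sketch below is only that: `measure` is a hypothetical helper name, not part of BenchmarkDotNet, and it assumes a runtime where `GC.GetAllocatedBytesForCurrentThread` is available (.NET Core 3.0 or later). It does none of the statistical work BenchmarkDotNet does, so treat its numbers as coarse indicators only.

```fsharp
open System
open System.Diagnostics

/// Hypothetical helper: runs `f` `iterations` times and reports elapsed
/// wall-clock time plus bytes allocated on the current thread.
let measure (label: string) (iterations: int) (f: unit -> 'a) =
    f () |> ignore                      // warm-up call so one-time JIT cost is not timed
    let allocBefore = GC.GetAllocatedBytesForCurrentThread()
    let sw = Stopwatch.StartNew()
    for i = 1 to iterations do
        f () |> ignore
    sw.Stop()
    let allocated = GC.GetAllocatedBytesForCurrentThread() - allocBefore
    printfn "%s: %d ms, %d bytes allocated over %d iterations"
        label sw.ElapsedMilliseconds allocated iterations
    sw.Elapsed, allocated

// Example: the substring-based parser from the benchmark above.
let getNums (str: string) (delim: char) =
    let idx = str.IndexOf(delim)
    Int32.Parse(str.Substring(0, idx)), Int32.Parse(str.Substring(idx + 1))

let elapsed, allocated = measure "getNums" 100000 (fun () -> getNums "123,456" ',')
```

Running this with `dotnet fsi` prints one line per measured function. Comparing the allocated byte counts of two variants is usually more stable than comparing their millisecond timings.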

As for hotspots, your best tool is to actually run the script under a profiler like PerfView and look at CPU time and allocations caused by the script while it's executing. There's no simple answer here: interpreting profiling results correctly is challenging and time-consuming work.

There's no way to compile an F# script to an executable for .NET Core. It's possible only on Windows/.NET Framework, but this is legacy behavior that is considered deprecated. It's recommended that you convert code in your script to an application if you'd like it to run as an executable.


