SystemTap script to profile latency of functions
The gettimeofday_*() functions can only approximate wall-clock time. Across CPUs, or across a clock adjustment, the values may not advance monotonically the way you expect. get_cycles() is more monotonic on a given CPU, and a few other clock-related functions are available.
Also, your begin variable is a simple scalar. What if the same function is called from multiple threads or CPUs, or if recursion occurs? It will get overwritten. The following should be enough, and it works correctly from a nesting/concurrency point of view:
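global stats  // aggregate shared by all probes

// No separate FOO.call probe is needed: @entry() captures the value at function entry.
probe module(@1).function(@2).return {
    stats <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}

// Reporting probe (an assumed addition; adapt the output as needed).
probe end { if (@count(stats)) print(@hist_log(stats)) }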
How do I profile F# script
I recommend using BenchmarkDotNet for any benchmarking tasks (well, micro-benchmarks). Since it's a statistical tool, it accounts for many things that hand-rolled benchmarking will not. And just by applying a few attributes you can get a nifty report.
Create a .NET Core console app, add the BenchmarkDotNet package, create a benchmark, and run it in Release mode to see the results. Here's an example that tests two trivial parsing functions, with one marked as the baseline for comparison, and that tells BenchmarkDotNet to capture memory usage statistics while running the benchmark:
open System
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

module Parsing =
    /// "123,456" --> (123, 456)
    let getNums (str: string) (delim: char) =
        let idx = str.IndexOf(delim)
        let first = Int32.Parse(str.Substring(0, idx))
        let second = Int32.Parse(str.Substring(idx + 1))
        first, second

    /// "123,456" --> struct (123, 456), parsed from spans to avoid substring allocations
    let getNumsFaster (str: string) (delim: char) =
        let sp = str.AsSpan()
        let idx = sp.IndexOf(delim)
        let first = Int32.Parse(sp.Slice(0, idx))
        let second = Int32.Parse(sp.Slice(idx + 1))
        struct (first, second)

[<MemoryDiagnoser>]
type ParsingBench() =
    let str = "123,456"
    let delim = ','

    [<Benchmark(Baseline=true)>]
    member __.GetNums() =
        Parsing.getNums str delim |> ignore

    [<Benchmark>]
    member __.GetNumsFaster() =
        Parsing.getNumsFaster str delim |> ignore

[<EntryPoint>]
let main _ =
    let summary = BenchmarkRunner.Run<ParsingBench>()
    printfn "%A" summary
    0 // return an integer exit code
In this case, the results will show that the getNumsFaster function allocates 0 bytes and runs about 33% faster.
Once you've found something that consistently performs better and allocates less, you can transfer that over to a script or some other environment where the code will actually execute.
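Once the code has been moved into a script, a rough sanity check can confirm that it still behaves as expected there. The sketch below is not a replacement for BenchmarkDotNet; it assumes a script run with `dotnet fsi` on a recent .NET runtime. The file name, the iteration counts, and the use of Stopwatch and GC.GetAllocatedBytesForCurrentThread are illustrative choices, not part of the original answer.

```fsharp
// parse-check.fsx (hypothetical file name); run with: dotnet fsi parse-check.fsx
open System
open System.Diagnostics

/// "123,456" --> struct (123, 456); the span-based parser chosen from the benchmark
let getNumsFaster (str: string) (delim: char) =
    let sp = str.AsSpan()
    let idx = sp.IndexOf(delim)
    let first = Int32.Parse(sp.Slice(0, idx))
    let second = Int32.Parse(sp.Slice(idx + 1))
    struct (first, second)

let iterations = 1_000_000 // assumed workload size

// Warm up first so JIT compilation is not part of the measurement.
for _ in 1 .. 1_000 do
    getNumsFaster "123,456" ',' |> ignore

let allocatedBefore = GC.GetAllocatedBytesForCurrentThread()
let sw = Stopwatch.StartNew()
for _ in 1 .. iterations do
    getNumsFaster "123,456" ',' |> ignore
sw.Stop()
let allocatedAfter = GC.GetAllocatedBytesForCurrentThread()

printfn "%d calls: %d ms, %d bytes allocated on this thread"
    iterations sw.ElapsedMilliseconds (allocatedAfter - allocatedBefore)
```

Numbers from a loop like this are noisy and only useful for spotting large regressions; for anything precise, go back to the benchmark project.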
As for hotspots, your best tool is to run the script under a profiler such as PerfView and look at the CPU time and allocations caused by the script while it's executing. There's no simple answer here: interpreting profiling results correctly is challenging and time-consuming work.
There's no way to compile an F# script to an executable for .NET Core. It's possible only on Windows/.NET Framework, but this is legacy behavior that is considered deprecated. It's recommended that you convert code in your script to an application if you'd like it to run as an executable.