Call graphs for CUDA
You can do this with the CUDA support of clang 3.8.
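First, compile your CUDA code to LLVM IR (example on Windows with CUDA 7.5 installed). If clang writes an object file rather than IR with this command, adding -S -emit-llvm should make it emit textual IR: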
clang++ -c main.cu --cuda-gpu-arch=sm_35 -o main.ll -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"
Then, use the generated .ll file to build the call graph with opt:
opt.exe main.ll -analyze -dot-callgraph
Note that opt is not part of the default binary distribution, so you may need to build it yourself (I had a 3.7.1 build, and it was able to handle the .ll file from 3.8).
Example main.cu file:
#include <cuda_runtime.h>
__device__ int f() { return 1; }
__device__ float g(float* a) { return a[f()] ; }
__device__ float h() { return 42.0f ; }
__global__ void kernel (int a, float* b)
{
    int c = a + f();
    g(b);
    b[c] = h();
}
Generated dot file:
digraph "Call graph" {
label="Call graph";
Node0x1e3d438 [shape=record,label="{external node}"];
Node0x1e3d438 -> Node0x1e3cfb0;
Node0x1e3d438 -> Node0x1e3ce48;
Node0x1e3d438 -> Node0x1e3d0a0;
Node0x1e3d438 -> Node0x1e3d258;
Node0x1e3d438 -> Node0x1e3cfd8;
Node0x1e3d438 -> Node0x1e3ce98;
Node0x1e3d438 -> Node0x1e3d000;
Node0x1e3d438 -> Node0x1e3cee8;
Node0x1e3d438 -> Node0x1e3d078;
Node0x1e3d000 [shape=record,label="{__cuda_module_ctor}"];
Node0x1e3d000 -> Node0x1e3ce98;
Node0x1e3d000 -> Node0x1e3d168;
Node0x1e3d078 [shape=record,label="{__cuda_module_dtor}"];
Node0x1e3d078 -> Node0x1e3cee8;
Node0x1e3cfb0 [shape=record,label="{^A?f@@YAHXZ}"];
Node0x1e3d0a0 [shape=record,label="{^A?h@@YAMXZ}"];
Node0x1e3ce48 [shape=record,label="{^A?g@@YAMPEAM@Z}"];
Node0x1e3ce48 -> Node0x1e3cfb0;
Node0x1e3d258 [shape=record,label="{^A?kernel@@YAXHPEAM@Z}"];
Node0x1e3d258 -> Node0x1e3cfb0;
Node0x1e3d258 -> Node0x1e3ce48;
Node0x1e3d258 -> Node0x1e3d0a0;
Node0x1e3d168 [shape=record,label="{__cuda_register_kernels}"];
Node0x1e3cee8 [shape=record,label="{__cudaUnregisterFatBinary}"];
Node0x1e3cee8 -> Node0x1e3d528;
Node0x1e3cfd8 [shape=record,label="{__cudaRegisterFunction}"];
Node0x1e3cfd8 -> Node0x1e3d528;
Node0x1e3ce98 [shape=record,label="{__cudaRegisterFatBinary}"];
Node0x1e3ce98 -> Node0x1e3d528;
}
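The node IDs in the dot output are opaque, so it can help to print the edges by label instead. Here is a minimal sketch that does this for the one-statement-per-line layout shown above. The helper name readable_edges is hypothetical (it is not part of LLVM), and the parser only assumes lines of the form Node... [...label="{name}"]; and Node... -> Node...; rather than the full DOT grammar.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (caller, callee) pairs from a call-graph dot
// stream, replacing node IDs with their label="{...}" names where known.
std::vector<std::pair<std::string, std::string>> readable_edges(std::istream& dot) {
    std::map<std::string, std::string> label_of;              // node ID -> label
    std::vector<std::pair<std::string, std::string>> raw;     // edges as node IDs
    std::string line;
    while (std::getline(dot, line)) {
        std::istringstream tokens(line);
        std::string first;
        // Only lines whose first token is a node ID are of interest.
        if (!(tokens >> first) || first.rfind("Node", 0) != 0) continue;
        std::size_t open = line.find("label=\"{");
        if (open != std::string::npos) {
            // Node definition: take the text between label="{ and }".
            open += 8;  // length of label="{
            std::size_t close = line.find("}\"", open);
            if (close != std::string::npos) {
                label_of[first] = line.substr(open, close - open);
            }
        } else if (line.find("->") != std::string::npos) {
            // Edge: NodeA -> NodeB;
            std::string arrow, target;
            tokens >> arrow >> target;
            if (!target.empty() && target.back() == ';') target.pop_back();
            if (!target.empty()) raw.emplace_back(first, target);
        }
    }
    // Resolve IDs after reading everything, since a node may be used before
    // its label line appears. Unlabeled nodes keep their raw ID.
    auto name = [&](const std::string& id) {
        auto it = label_of.find(id);
        return it == label_of.end() ? id : it->second;
    };
    std::vector<std::pair<std::string, std::string>> edges;
    for (const auto& e : raw) edges.emplace_back(name(e.first), name(e.second));
    return edges;
}

// Possible use from a main() that reads the dot file on standard input:
//   for (const auto& e : readable_edges(std::cin))
//       std::cout << e.first << " -> " << e.second << '\n';
```

On Windows the labels are MSVC-mangled names, so the output will still show entries such as ?f@@YAHXZ unless they are demangled separately.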
Generating dynamic call graph of c/c++ programs on the run
I've used etrace to trace executions of programs.
Egypt does the same thing, but has only limited support for C++.
Both require the program to be instrumented with gcc.
Faster backtrace for call graph generation?
Is there a much faster way to generate call graphs?
Of course there is (using GDB for this is completely inappropriate).
The simplest solution is to use GCC's -finstrument-functions option to insert a call at every function entry and exit, and to implement the data collection in these "injected" functions (the hooks __cyg_profile_func_enter and __cyg_profile_func_exit). There is an example here.
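As a rough illustration of the data collection side, here is a minimal sketch of the two hooks. The hook names and signatures are the ones GCC documents for -finstrument-functions; everything else (CallEdge, g_edges, g_depth, dump_edges, the buffer size, the file name) is my own assumption for the example, and the log is not thread-safe.

```cpp
// trace_hooks.cpp (hypothetical file name): link it into the instrumented program.
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical fixed-size log of (call site, callee) pairs. Not thread-safe.
struct CallEdge {
    void* call_site;  // address in the caller where the call was made
    void* callee;     // entry address of the function being entered
};

static const std::size_t kMaxEdges = 1 << 16;
static CallEdge g_edges[kMaxEdges];
static std::size_t g_edge_count = 0;
static int g_depth = 0;  // current call depth, maintained by the two hooks

extern "C" {

// Called on entry to every function compiled with -finstrument-functions.
// The attribute keeps the hook itself from being instrumented (which would recurse).
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    if (g_edge_count < kMaxEdges) {  // silently drop edges once the log is full
        g_edges[g_edge_count].call_site = call_site;
        g_edges[g_edge_count].callee = this_fn;
        ++g_edge_count;
    }
    ++g_depth;
}

// Called just before every instrumented function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    if (g_depth > 0) --g_depth;
}

}  // extern "C"

// Writes one "call_site -> callee" line per recorded call.
__attribute__((no_instrument_function))
void dump_edges(std::FILE* out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < g_edge_count; ++i) {
        std::fprintf(out, "%p -> %p\n", g_edges[i].call_site, g_edges[i].callee);
    }
}
```

Building with something like g++ -g -finstrument-functions app.cpp trace_hooks.cpp (app.cpp being a placeholder for your program, which would declare dump_edges and call it before exiting) should give a list of raw addresses. These can then be mapped to function names, for example with addr2line -f -e on the executable; with position-independent executables the addresses may need the load base subtracted first.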