Where Does Output of Print in Kernel Go

Where does output of print in kernel go?

It depends on the distribution, but many use klogd(8) to get the messages from the kernel and will either log them to a file (sometimes /var/log/dmesg or /var/log/kernel) or to the system log via syslog(3). In the latter case, where the log entries end up will depend on the configuration of syslogd(8).

One note about the dmesg command: Kernel messages are stored in a circular buffer, so large amounts of output will be overwritten.

Where does printk() print to?

dmesg should display printk messages.

Unusual kernel behaviour when outputting directly to text video memory

I was missing the fact that I need to compile 32-bit code delicately on a 64-bit system. I solved the problem by adding -m32 and -fno-PIC flags to GCC, -melf_i386 to LD, and changing elf64 to elf32 in NASM in my Makefile.

What does DebugView's Enable Verbose Kernel Output do?

TL;DR: it calls a driver to repeatedly call NtSetDebugFilterState on all kernel components, so that they are all able to print something on the debug output.

Program

Let start with the program itself; there's only one occurrence of the sentence "Enable Verbose Kernel Output":

mov     [rsp+78h+mi.wID], 9C7Ch
lea rax, aEnableVerboseK ; "Enable &Verbose Kernel Output"
sbb ecx, ecx
mov [rsp+78h+mi.dwTypeData], rax
and ecx, 8
mov [rsp+78h+mi.fState], ecx
mov rcx, cs:hMenu ; hMenu
call cs:GetSubMenu
mov rcx, rax ; hmenu
lea r9, [rsp+78h+mi] ; lpmi
lea edx, [rdi+3] ; item
lea r8d, [rdi+1] ; fByPosition
call cs:InsertMenuItemA

The above code insert the sub-menu into the main menu. What's important here is the the menu ID, namely 0x9C7C.

This menu ID is used only once more here:

movzx   edx, al ; al can either be 0 or 1
xor edi, edi
mov qword ptr [rsp+830h+iNumButtons], rdi ; lpOverlapped
lea rax, [rsp+830h+BytesReturned]
mov [rsp+830h+lpButtons], rax ; lpBytesReturned
xor edx, 1
mov dword ptr [rsp+830h+wBMID], edi ; nOutBufferSize
xor r9d, r9d ; nInBufferSize
xor r8d, r8d ; lpInBuffer
mov [rsp+830h+dwInitParam], rdi ; lpOutBuffer
lea edx, ds:0FFFFFFFF8305003Ch[rdx*4] ; dwIoControlCode
call cs:DeviceIoControl
movzx eax, cs:byte_1400935A3
mov edx, 9C7Ch ; uIDCheckItem
mov rcx, cs:hMenu ; hMenu
mov cs:byte_1400A2776, al
neg al
sbb r8d, r8d
and r8d, 8 ; uCheck
call cs:CheckMenuItem

The above code calls DeviceIoControl and then checks the menu item. The former means the program is actually talking with a device driver.

If we remove a bit of code we can see which IOCTL can be sent to the driver:

movzx   edx, al ; al can either be 0 or 1
; snip
xor edx, 1 ; invert AL
; snip
lea edx, ds:0FFFFFFFF8305003Ch[rdx*4] ; dwIoControlCode
call cs:DeviceIoControl

Since RDX can be either 0 or 1 we end up with (base 10):

  • [rdx*4-2096824260]

Thus:

  • 4 - 2096824260 = -2096824256
  • 0 - 2096824260 = -2096824260

Looking at the handles opened by dbgview64.exe we can see a \Device\dbgv is currently opened.

0: kd> !devobj \Device\dbgv
Device object (ffffd58a97007630) is for:
Dbgv \Driver\DBGV DriverObject ffffd58a8688aaa0
Current Irp 00000000 RefCount 0 Type 00008305 Flags 00000048
SecurityDescriptor ffffe58fb8bdeea0 DevExt 00000000 DevObjExt ffffd58a97007780
ExtensionFlags (0x00000800) DOE_DEFAULT_SD_PRESENT
Characteristics (0000000000)
Device queue is not busy.

0: kd> dt _driver_object ffffd58a8688aaa0
nt!_DRIVER_OBJECT
+0x000 Type : 0n4
+0x002 Size : 0n336
+0x008 DeviceObject : 0xffffd58a`97007630 _DEVICE_OBJECT
+0x010 Flags : 0x12
+0x018 DriverStart : 0xfffff800`dcf90000 Void
+0x020 DriverSize : 0x9000
+0x028 DriverSection : 0xffffd58a`a3ba9be0 Void
+0x030 DriverExtension : 0xffffd58a`8688abf0 _DRIVER_EXTENSION
+0x038 DriverName : _UNICODE_STRING "\Driver\DBGV"
+0x048 HardwareDatabase : 0xfffff800`8372e990 _UNICODE_STRING "\REGISTRY\MACHINE\HARDWARE\DESCRIPTION\SYSTEM"
+0x050 FastIoDispatch : (null)
+0x058 DriverInit : 0xfffff800`dcf97058 long +0
+0x060 DriverStartIo : (null)
+0x068 DriverUnload : (null)
+0x070 MajorFunction : [28] 0xfffff800`dcf91b80 long +0

0: kd> dt nt!_LDR_DATA_TABLE_ENTRY 0xffffd58a`a3ba9be0 Full*
+0x048 FullDllName : _UNICODE_STRING "\??\C:\WINDOWS\system32\Drivers\Dbgv.sys"

So the driver is currently loaded from C:\WINDOWS\system32\Drivers\Dbgv.sys (or you can extract it from the .rsrc section...).

Driver

Looking at the driver, in the driver entry we spot the function used for IRP_MJ_DEVICE_CONTROL:

lea     rax, sub_180001B80
mov [rdi+0E0h], rax ; IRP_MJ_DEVICE_CONTROL
mov [rdi+80h], rax
mov [rdi+70h], rax

Inside that function we have the usual setup before calling the right IOCTL:

movzx   eax, [rcx+_IO_STACK_LOCATION.MajorFunction]
mov r9d, [rcx+_IO_STACK_LOCATION.Parameters.DeviceIoControl.OutputBufferLength]
mov r10d, [rcx+_IO_STACK_LOCATION.Parameters.DeviceIoControl.IoControlCode]
test al, al ; IRP_MJ_CREATE
jz loc_180001C6C
cmp al, 2 ; IRP_MJ_CLOSE
jz short loc_180001C0C
cmp al, 0Eh ; IRP_MJ_DEVICE_CONTROL
jnz @@CompleteRequest
mov eax, r10d
and eax, 3
cmp al, METHOD_NEITHER
jnz short loc_180001BDF
mov rdx, [rdi+_IRP.UserBuffer]

loc_180001BDF:
mov [rsp+98h+do], r11 ; _DEVICE_OBJECT*
mov [rsp+98h+IoStatus], rbx ; IoStatus
mov [rsp+98h+ioctl], r10d ; IoCtl
mov [rsp+98h+OutputBufferLength], r9d ; OuputBufferLength
mov r9d, [rcx+_IO_STACK_LOCATION.Parameters.DeviceIoControl.InputBufferLength] ; int
mov rcx, [rcx+_IO_STACK_LOCATION.FileObject]
mov qword ptr [rsp+98h+Buffer], rdx ; Buffer
mov dl, 1 ; int
call sub_1800017E0
jmp @@CompleteRequest

Inside the call (sub_1800017E0) we have a big switch for the IOCTL, here's the case -2096824260 (case -2096824256 is slightly different):

loc_1800018B9:
call sub_180002470 ; jumptable 000000018000182F case -2096824260
jmp loc_180001AEB

This function is mostly comprised of two loops:

loc_1800024A0:
xor ebx, ebx

@@LoopQuerySetDebugFilter:
mov edx, ebx
mov ecx, esi
call cs:qword_180005438 ; DbgQueryDebugFilterState
mov r8b, 1 ; State
mov edx, ebx ; Level (keeps incrementing up to 0x1E)
mov ecx, esi ; ComponentId (keeps incrementing up to 0x82)
mov [rdi], al ; save current state.
call cs:qword_180005440 ; DbgSetDebugFilterState
inc ebx
inc rdi
cmp ebx, 1Eh
jb short @@LoopQuerySetDebugFilter
inc esi
cmp esi, 82h ; '‚'
jb short loc_1800024A0

Both calls are on DbgQueryDebugFilterState and DbgSetDebugFilterState (reactos source)
which is just a minimal wrapper around NtSetDebugFilterState (reactos source).

As far as we can see the debug filter state is queried, saved, and then set for all kernel components (following is the component tables from the kernel, there are a lot of them):

.rdata:00000001400073E0 KdComponentTable dq offset Kd_SYSTEM_Mask
.rdata:00000001400073E0 ; DATA XREF: NtQueryDebugFilterState+36↓o
.rdata:00000001400073E0 ; NtSetDebugFilterState+43↓o ...
.rdata:00000001400073E8 dq offset Kd_SMSS_Mask
.rdata:00000001400073F0 dq offset Kd_SETUP_Mask
.rdata:00000001400073F8 dq offset Kd_NTFS_Mask
.rdata:0000000140007400 dq offset Kd_FSTUB_Mask
.rdata:0000000140007408 dq offset Kd_CRASHDUMP_Mask
.rdata:0000000140007410 dq offset Kd_CDAUDIO_Mask
.rdata:0000000140007418 dq offset Kd_CDROM_Mask
.rdata:0000000140007420 dq offset Kd_CLASSPNP_Mask
....

Which finally means that all kernel components are able to print something to the debug output.

Note that the other IOCTL just reset the components masks to what they were before checking the menu in the main program.

CUDA kernel printf() produces no output in terminal, works in profiler

You are implicitly, and mistakenly, assuming a certain order of occurrences when main() is done. Specifically, you're assuming that because the default stream is synchronous, everything having to do with your kernel is over and done with by the time the next line of code after the kernel launch gets executed. That is not 100% true - as @RobertCrovella suggests; specifically, it's not guaranteed that the device's printf() buffer will be ferried back into host memory and dumped into the standard output stream before control returns to your program.

You will need to synchronize the (default, current) CUDA device with the host, i.e. execute:

cuda::device::current::get().synchronize();

or at least synchronize the device's default stream:

cuda::device::current::get().default_stream().synchronize();

and this will ensure the printf() results make it to standard output.

Now, nvvp instruments your execution in some way (probably just by having the profiler running - but nvprof instruments the execution through the hooks which are the CUDA runtime API calls). So, the behavior is different when you run your program that way.


Somewhat-related question: The behavior of stream 0 (default) and other streams .

How to view print messages in Windbg when Windbg is in local kernel debugging mode?

obviously you need a two computer setup

local kernel debugging is similar to a memory snapshot it is somewhat stale or maybe intermittantly refreshed

you can try using a vm running inside a physical machine which you can kernel debug over network or serial VMware player does the job most of the time

Edit:

you cannot attach in kernel mode (attach is a term normally associated with debugging an already running user mode process )
under kernel mode debugging ([two computer] ,[ vm,vm ] , [physical,vm], [physical hypervisor] ) you can inspect state and set breakpoints on a live
kernel module

a critical kernel module like ntos , hal etc is associated with all the processes

Local kernel debugging does not do anything live

it is like a specimen testing on a pathological lab

(like a syringe full of blood to diagnose if there is a strain of some pandemical organisms you cannot treat the blood you need the physical person to treat the diagnosed problem)

you cannot step ,

you cannot view registers,

you cannot run

you cannot pagein

you cannot set breakpoints

I can't get output numbers with ctypes cuda

cudaMemcpy is expecting pointers for dst and src.

In your function resoult, h_answer is a pointer to a long allocated by the caller.

Since it's already the pointer where the data should be copied to, you should use it as is and not take it's address by using &h_answer.

Therefore you need to change your cudaMemcpy from:

cudaMemcpy(&h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);

To:

cudaMemcpy(h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);

DbgView on Vmware Workstation 12 not capturing output

By default DbgPrint() outputs are disabled from Vista/Server2008 onwards. You may have to do below settings in registry to enable the debug prints.

  1. Open up the registry.
  2. Go to path, “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter”. If "Debug Print Filter" is not present then create it.
  3. Add value “DEFAULT” : REG_DWORD : 0xFFFFFFFF and then reboot.

After rebooting you will be able to get debug prints.

My code caused the kernel to restart. Why is the kernel restarting?


>    def countdown(n):
> for n in range(0,5):
> print(n)
> countdown(n-1)
> countdown(2)

In your code above, each function call will call itself recursively 5 times. So the first call is 5, second call there will be 25 calls, third call 125 calls, and the recursive calls went exponentially high, causing the Kernel to restart.

If you use recursive function, there must be an exit condition. There are a few ways to achieve your goal:

(1) recursive with if-condition exit (this is your successful code)

def countdown(n):
if n == 0:
print("blast off!")
else:
print(n)
countdown(n-1)
countdown(5)

(2) recursive with while-condition exit

def countdown(n):
while n != 0:
print(n)
countdown(n-1)
return
print("blast off!")
countdown(5)

(3) for loop (no need recursive)

def countdown(n):
for i in range(n, 0, -1):
print(i)
print("blast off!")
countdown(5)

Output:

5
4
3
2
1
blast off!


Related Topics



Leave a reply



Submit