MDB-Cheats
A collection of commands for analyzing kernel crash dumps. There is a lot more you can do with mdb; perhaps I will update this page soon.
# cd /var/crash/`uname -n`
# mdb 0
or, on a running Solaris system:
# mdb -k /dev/ksyms
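If the dump was written in compressed form (vmdump.N), it usually has to be expanded with savecore before mdb can read it. A minimal sketch, assuming vmdump.0 in the current crash directory and that the expanded pair keeps the same sequence number:
# savecore -f vmdump.0
# mdb unix.0 vmcore.0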
> ::msgbuf      -> look at the message buffer
> $<msgbuf      -> the same as a macro; sometimes one syntax works where the other does not
> ::cpuinfo -v  -> what is running on which CPU?
> ::ps          -> processes
> ::ptree       -> processes in a tree
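Most dcmds can also be run non-interactively by piping them into mdb (the same echo pattern is used further down this page), and ::dcmds / ::help show what is available:
# echo "::msgbuf" | mdb -k
> ::dcmds        -> list all available dcmds
> ::help memstat -> usage of a single dcmd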
A process panicked the machine?
> panic_thread/K
panic_thread:
panic_thread: 2a10001bd40
> 2a10001bd40$<thread
[...]
0x2a10001be50: lwp procp audit_data
0 1438788 0
[...]
> 1438788$<proc2u
[...]
0x30003e15971: psargs /usr/ccs/bin/sparcv9/nm /dev/ksyms\0\0\0\0\0\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0
[...]
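On more recent releases the same question can often be answered in one step with ::panicinfo, which summarizes the panicking CPU, thread and panic message. Availability of the dcmd depends on your mdb/kernel version, so treat this as an alternative to try:
> ::panicinfo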
Real memory consumption
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1947820              7608   47%
ZFS File Data             1448500              5658   35%
Anon                       131224               512    3%
Exec and libs               29826               116    1%
Page cache                 237040               925    6%
Free (cachelist)             8618                33    0%
Free (freelist)            339957              1327    8%

Total                     4142985             16183
Physical                  4054870             15839
>
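The same summary can be collected without an interactive session, e.g. for a quick look at a running box (same echo-pipe pattern as used further below):
# echo "::memstat" | mdb -k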
Change values
# mdb -kw
Loading modules: [ unix krtld genunix ip logindmux ptm nfs ipc lofs ]
> maxusers/D
maxusers:
maxusers:       495
> maxusers/W 200
maxusers:       0x1ef           =       0x200
> $q
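Note that mdb reads the value in the current radix, which is hexadecimal by default - that is why writing 200 above results in 0x200 (= 512). To store the decimal value 200, prefix it with 0t:
> maxusers/W 0t200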
Do not forget how to quit mdb ;)
> ::quit
Debugging a core file
# echo ::status | mdb core
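Beyond ::status, a quick C stack backtrace of the thread that dumped core can be taken from the same core file ($C prints the stack with frame pointers); a small sketch:
# mdb core
> ::status
> $C
> ::quit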
Solaris 11.4 example
Who panicked my box? Which files were opened? In this example I hit a bug where Solaris could not handle too many procfs consumers interested in the same target process - so the monitoring killed the server...
vmcore.0> $c
vpanic(10817d08, 2a13c710340, 0, 18403ca7b0dc0, 18403ca7b0dc1, 0)
turnstile_will_prio+0x210(18403ca7b0dc0, 1, 0, 18403a2f056c0, 31, 208c9c00)
turnstile_block+0x168(184039bf68c50, 0, 18403f73272b0, 2012d188, 0, 0)
mutex_vector_enter+0x3f0(4, 18403a2f056c0, 18403a2f056c0, 2012d208, 18403f73272b0, 209092e8)
holdf+4(18403f73272b0, 0, 0, 10, 18403a46563c0, 1)
flist_fork+0x12c(18403d1112198, 184042321d408, 1, 7fffffff, 1ff, 184042321c560)
cfork+0xa50(0, 1, 18403f4b244c0, 3, 1, 184042321c560)
spawn+0x1b0(ffffffff4e6d50d0, 18403f4b244c8, ac, 18403f4b244c0, 1224, 0)
syscall_trap+0x238(ffffffff4e6d50d0, ffffffff7f501100, ac, ffffffff7f504000, 1224, 1018d54ac)
vmcore.0>
vmcore.0> ::msgbuf
[...]
panic[cpu111]/thread=18403ca7b0dc0:
Deadlock: cycle in blocking chain
000002a131201530 genunix:turnstile_will_prio+210 (18403ca7b0dc0, 1, 0, 18403a2f056c0, 31, 208c9c00)
%l0-3: 0000000000000000 00000000208c9e48 0000000000000000 0000000000000000
%l4-7: 0000000000000000 0000000000000000 0000000000000000 00018403ca7b0dc0
000002a1312015e0 genunix:turnstile_block+168 (184039bf68c50, 0, 18403f73272b0, 2012d188, 0, 0)
%l0-3: 0000000000069286 0000000000000001 0000000000000001 00018403d11112f0
%l4-7: 00018403d643f160 00018403ca7b0dc0 000184039bf68c80 0000000000000001
000002a131201690 unix:mutex_vector_enter+3f0 (4, 18403a2f056c0, 18403a2f056c0, 2012d208, 18403f73272b0, 209092e8)
%l0-3: 00022ecb6ef08e34 0000000000000000 0000000000000000 000000002012d1f0
%l4-7: 00018403a2f056c0 0000000000000001 0000000000000000 0000000000000000
000002a131201740 genunix:holdf+4 (18403f73272b0, 0, 0, 10, 18403a46563c0, 1)
%l0-3: 000184038cc06500 ffffffffffffffff 0000000000000000 00000300000dc000
%l4-7: 00018404dfcdb148 0001840391137388 00018403faaf2150 0000000000000001
000002a1312017f0 genunix:flist_fork+12c (18403d1112198, 184042321d408, 1, 7fffffff, 1ff, 184042321c560)
%l0-3: 0000000000000001 0000000000000001 0000000000000000 0000000000000100
%l4-7: 00018403c5658000 00018403f73272b0 00018403a465a000 00000000000001ff
000002a1312018a0 genunix:cfork+a50 (0, 1, 18403f4b244c0, 3, 1, 184042321c560)
%l0-3: 00018403d11112f0 0000000000000000 00018404867ef2b0 000002a131201978
%l4-7: 000184038cc06500 00018403ca7b0dc0 00000000209ce978 00000000209ce930
000002a1312019c0 genunix:spawn+1b0 (ffffffff4e6d50d0, 18403f4b244c8, ac, 18403f4b244c0, 1224, 0)
%l0-3: 0000000000000002 000000007fffffff 0000000097c818a3 0000000097c818a2
%l4-7: 0000000097c96f17 0000000000000000 0001840455e61800 000184046431a000
syncing file systems...
done
vmcore.0> panic_thread/K
panic_thread:
panic_thread: 18403ca7b0dc0
vmcore.0> 18403ca7b0dc0$<thread ! grep procp
t_procp = 0x18403d11112f0
vmcore.0> 0x18403d11112f0$<proc2u ! grep psargs
p_user.u_psargs = [ "/opt/IBM/ITM/sol296/ux/bin/kuxagent" ]
vmcore.0> 0x18403d11112f0::pfiles
FD TYPE VNODE INFO
0 CHR 00018403530daa00 /zoneHome/z03a/root/dev/null
1 REG 00018403b90a2480 /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_1571165952.log
2 REG 00018403b90a2480 /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_1571165952.log
4 DOOR 00018403c0d9ab00 /zoneHome/z03a/root/system/volatile/name_service_door [door to 'nscd' (proc=18403c6119210)]
5 SOCK 000184036b8e7780 socket: AF_INET 0.0.0.0 17658 listen backlog: 8
6 SOCK 00018403b4fec400 socket: AF_INET 0.0.0.0 33005 listen backlog: 8
7 SOCK 00018403a49cd980 socket: AF_INET 127.0.0.1 62718 remote: AF_INET 127.0.0.1 1920
8 SOCK 00018403b5a08880 socket: AF_INET 127.0.0.1 14856 listen backlog: 49
9 SOCK 00018403b13c4e80 socket: AF_INET 127.0.0.1 22620 remote: AF_INET 127.0.0.1 3661
10 SOCK 00018403d14c9580 socket: AF_INET 127.0.0.1 14856 remote: AF_INET 127.0.0.1 13294
11 SOCK 00018403a4583240 socket: AF_INET 127.0.0.1 13294 remote: AF_INET 127.0.0.1 14856
12 SOCK 00018403afd68b80 socket: AF_UNIX remote: AF_?? (0)
13 SOCK 00018403ac6e4940 socket: AF_UNIX remote: AF_?? (0)
14 REG 000184036ad61a40 /zoneHome/z03a/root/opt/IBM/ITM/auditlogs/root.z03a_ux_audit.log
15 SOCK 00018403b4906640 socket: AF_UNIX /opt/IBM/ITM/sol296/ux/bin/pasipc/.pas_sock listen backlog: 16
16 CHR 00018403bafb8980 /zoneHome/z03a/root/dev/kstat
17 SOCK 0001840373fb8c00 socket: AF_INET 192.168.111.146 15797 remote: AF_INET 192.168.100.97 63358
18 SOCK 00018403cbe12780 socket: AF_INET 0.0.0.0 18302 listen backlog: 8
19 SOCK 0001840373eaaf00 socket: AF_INET 192.168.111.146 23827 remote: AF_INET 192.168.100.97 1918
20 SOCK 00018403a0cc9880 socket: AF_INET 127.0.0.1 14856 remote: AF_INET 127.0.0.1 16927
21 SOCK 00018403a1667700 socket: AF_INET 127.0.0.1 16927 remote: AF_INET 127.0.0.1 14856
22 REG 00018403ac48f240 /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_kuxagent_5da61702-02.log
23 DIR 00018403b775fb80 /zoneHome/z03a/root/opt/IBM/ITM/tmp/osfcp
24 FIFO 00018403b31b3e80
25 DIR 00018403a2ee2b80 /zoneHome/z03a/root/proc
26 PROC 000184050cf70980 /zoneHome/z03a/root/proc/67521/psinfo (proc=184042fc42788)
27 FIFO 000184052c1df340
28 FIFO 00018403b5e37e80
256 DIR 00018403b8634780 /zoneHome/z03a/root/proc/67521
vmcore.0>
vmcore.0> ::ps ! grep 67521
R 67521 1 67521 67521 1001 0x4a004400 000184042fc42788 oracle
vmcore.0> ::ps -zf ! grep 67521
R 67521 1 67521 67521 1 1001 0x4a004400 000184042fc42788 oracleSID_1 (LOCAL=NO)
vmcore.0>
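The proc address shown by ::ps (000184042fc42788) can be fed straight back into other dcmds, and ::pgrep can do the lookup by name instead. A sketch; treat the availability of ::pgrep as an assumption for your release:
vmcore.0> 184042fc42788::pfiles
vmcore.0> ::pgrep oracle | ::pfiles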
Some mdb commands
Kernel cage - a case where practically all free memory ended up in the KCAGE bucket:
> ::fed -v
USER MN T TOTAL 8k 64k 4m 256m
credit 0 u - [-] [-] [-] [-]
fed 0 u - - [-] [-] -
total 0 u - - - - -
credit 1 u - [-] [-] [-] [-]
fed 1 u - [-] [-] [-] [-]
total 1 u - - - - -
credit 2 u - [-] [-] [-] [-]
fed 2 u - - [-] [-] [-]
total 2 u - - - - -
total u - - - - -
KCAGE MN T TOTAL 8k 64k 4m 256m
credit 0 k 12.6m 56 - 3 [-]
fed 0 k 317.8g 27b6b73 - 2b [-]
total 0 k 317.8g 27b6bc9 - 2e -
credit 1 k 21.7m 4f6 bd [-] [-]
fed 1 k 2.3g 16e44 880 17d [-]
total 1 k 2.3g 1733a 93d 17d -
credit 2 k 3.8m 1e9 - [-] [-]
fed 2 k 13.9g 1bd3f5 - 16 [-]
total 2 k 14.0g 1bd5de - 16 -
total k 334.2g 298b4e1 93d 1c1 - <---- TOTAL 334 GB!
reserve k 4.6g 94d71 - - -
rsrvhigh k 2.3g 4a6b8 - - -
In this case, ::memstat still reports plenty of available memory:
Page Summary                     Pages             Bytes  %Tot
----------------------------  ----------------  ----------------  ----
Kernel                             5691438             43.4G    4%
Kernel (ZFS ARC excess)               2452             19.1M    0%
Defdump prealloc                   2029989             15.4G    1%
ZFS Kernel Data                    2043300             15.5G    1%
ZFS Metadata                        376392              2.8G    0%
ZFS File Data                      1406002             10.7G    1%
Anon                              15523780            118.4G   10%
Exec and libs                       299223              2.2G    0%
Page cache                         1339587             10.2G    1%
OSM                               86447616            659.5G   54%
Free (cachelist)                     16284            127.2M    0%
Free (freelist)                   43814273            334.2G   28%

Total                            158990336              1.1T
Access to this memory is much slower because a user process requesting a page first has to "uncage" it. This manifests itself as random delays... That is a bug and should be fixed in Solaris 11.4 and later.
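Whether the kernel cage is active at all can be checked via the kcage_on kernel variable (SPARC-specific; treat the presence of this variable in your release as an assumption):
> kcage_on/D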
You might see threads blocked in a wait queue for memory (empty output is good):
echo "::walk fed_blocked |::stacks" | mdb -k
On a healthy system, the ::fed output should look like this:
> ::fed -v
USER MN T TOTAL 8k 64k 4m 256m 2g 16g
credit 0 u 11.4m 519 14 - [-] [-] [-]
fed 0 u 5.4g 9ec51 61f 6b - - [-]
total 0 u 5.4g 9f16a 633 6b - - -
credit 1 u 25.0m 3ca 98 2 - [-] [-]
fed 1 u 5.2g a1a72 2ff 20 - - [-]
total 1 u 5.2g a1e3c 397 22 - - -
credit 2 u 34.4m 6a6 93 3 [-] [-] [-]
fed 2 u 903.6m 131de cbf 16 - [-] [-]
total 2 u 938.1m 13884 d52 19 - - -
total u 11.6g 15482a 171c a6 - - -
KCAGE MN T TOTAL 8k 64k 4m 256m 2g 16g
credit 0 k 22.6m 454 20 3 - - -
fed 0 k 3.3g 663d2 - 34 - - -
total 0 k 3.4g 66826 20 37 - - -
credit 1 k 12.6m 44e - 1 - - -
fed 1 k 3.9g 76a05 - 49 - - -
total 1 k 4.0g 76e53 - 4a - - -
credit 2 k 18.7m 558 1 2 - - -
fed 2 k 6.5g cd4dd - 2a - - -
total 2 k 6.5g cda35 1 2c - - -
total k 14.0g 1ab0ae 21 ad - - -
reserve k 4.7g 97a00 - - - - -
rsrvhigh k 2.3g 4bd00 - - - - -
>