MDB-Cheats
A collection of commands for analyzing kernel crash dumps. There is a lot more you can do with mdb; perhaps I will update this page soon.
# cd /var/crash/`uname -n`
# mdb 0
or on a running Solaris system:
# mdb -k /dev/ksyms
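Either way, a quick sanity check is to ask mdb what it is actually looking at; ::status is a standard dcmd, the exact fields it prints vary by release:

> ::status    -> debugging target (crash dump or live kernel), OS release and, for a dump, the panic string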
> ::msgbuf       -> look at the message buffer
> $<msgbuf       -> an older syntax that is also seen; it does not always work
> ::cpuinfo -v   -> what's running?
> ::ps           -> processes
> ::ptree        -> processes in a tree
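mdb ships far more dcmds than this handful; a hedged way to explore what your release offers from inside the debugger:

> ::dcmds         -> list all available dcmds
> ::walkers       -> list all walkers (usable with ::walk)
> ::help memstat  -> built-in help for a single dcmd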
A process panicked the machine?
> panic_thread/K
panic_thread:
panic_thread:   2a10001bd40
> 2a10001bd40$<thread
[...]
0x2a10001be50:  lwp             procp           audit_data
                0               1438788         0
[...]
> 1438788$<proc2u
[...]
0x30003e15971:  psargs
                /usr/ccs/bin/sparcv9/nm /dev/ksyms\0\0\0\0\0\0\0\0\0\0\0
                \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
                \0\0\0
[...]
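On newer releases the same information is usually one dcmd away; a sketch, assuming ::panicinfo is available on your release (it is on Solaris 10 and later):

> ::panicinfo    -> panicking CPU, thread pointer and panic string in one go
> $c             -> stack trace of the panic thread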
Real memory consumption
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1947820              7608   47%
ZFS File Data             1448500              5658   35%
Anon                       131224               512    3%
Exec and libs               29826               116    1%
Page cache                 237040               925    6%
Free (cachelist)             8618                33    0%
Free (freelist)            339957              1327    8%

Total                     4142985             16183
Physical                  4054870             15839
>
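::memstat also works non-interactively, which is handy for tracking memory usage over time on a live system; a minimal sketch (the log path is just an example):

# echo ::memstat | mdb -k
# echo ::memstat | mdb -k >> /var/tmp/memstat.log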
Change values
# mdb -kw
Loading modules: [ unix krtld genunix ip logindmux ptm nfs ipc lofs ]
> maxusers/D
maxusers:
maxusers:       495
> maxusers/W 200
maxusers:       0x1ef           =       0x200
> $q
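Note that mdb interprets numbers as hexadecimal by default, which is why writing 200 above resulted in 0x200 (512 decimal). A sketch of being explicit about radix and width; some64bitvar is a placeholder name, not a real kernel variable:

> maxusers/W 0t200        -> 0t forces decimal, /W writes 4 bytes
> some64bitvar/Z 0t1024   -> /Z writes 8 bytes (for 64-bit variables)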
Do not forget how to quit mdb ;)
> ::quit
Debugging a core file
# echo ::status | mdb core
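For a userland core a few more commands are usually worth a look; a sketch, assuming you pass the matching binary on the command line (the path is just an example):

# mdb /usr/bin/someprog core
> ::status   -> which signal killed the process, and its psargs
> $C         -> stack backtrace with frame pointers
> $r         -> register state at the time of the crash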
Solaris 11.4 example
Who panicked my box? Which files were opened? In this example I hit a bug where Solaris could not handle too many procfs interests on the same target, so monitoring killed the server...
vmcore.0> $c
vpanic(10817d08, 2a13c710340, 0, 18403ca7b0dc0, 18403ca7b0dc1, 0)
turnstile_will_prio+0x210(18403ca7b0dc0, 1, 0, 18403a2f056c0, 31, 208c9c00)
turnstile_block+0x168(184039bf68c50, 0, 18403f73272b0, 2012d188, 0, 0)
mutex_vector_enter+0x3f0(4, 18403a2f056c0, 18403a2f056c0, 2012d208, 18403f73272b0, 209092e8)
holdf+4(18403f73272b0, 0, 0, 10, 18403a46563c0, 1)
flist_fork+0x12c(18403d1112198, 184042321d408, 1, 7fffffff, 1ff, 184042321c560)
cfork+0xa50(0, 1, 18403f4b244c0, 3, 1, 184042321c560)
spawn+0x1b0(ffffffff4e6d50d0, 18403f4b244c8, ac, 18403f4b244c0, 1224, 0)
syscall_trap+0x238(ffffffff4e6d50d0, ffffffff7f501100, ac, ffffffff7f504000, 1224, 1018d54ac)
vmcore.0>
vmcore.0> ::msgbuf
[...]
panic[cpu111]/thread=18403ca7b0dc0:
Deadlock: cycle in blocking chain

000002a131201530 genunix:turnstile_will_prio+210 (18403ca7b0dc0, 1, 0, 18403a2f056c0, 31, 208c9c00)
  %l0-3: 0000000000000000 00000000208c9e48 0000000000000000 0000000000000000
  %l4-7: 0000000000000000 0000000000000000 0000000000000000 00018403ca7b0dc0
000002a1312015e0 genunix:turnstile_block+168 (184039bf68c50, 0, 18403f73272b0, 2012d188, 0, 0)
  %l0-3: 0000000000069286 0000000000000001 0000000000000001 00018403d11112f0
  %l4-7: 00018403d643f160 00018403ca7b0dc0 000184039bf68c80 0000000000000001
000002a131201690 unix:mutex_vector_enter+3f0 (4, 18403a2f056c0, 18403a2f056c0, 2012d208, 18403f73272b0, 209092e8)
  %l0-3: 00022ecb6ef08e34 0000000000000000 0000000000000000 000000002012d1f0
  %l4-7: 00018403a2f056c0 0000000000000001 0000000000000000 0000000000000000
000002a131201740 genunix:holdf+4 (18403f73272b0, 0, 0, 10, 18403a46563c0, 1)
  %l0-3: 000184038cc06500 ffffffffffffffff 0000000000000000 00000300000dc000
  %l4-7: 00018404dfcdb148 0001840391137388 00018403faaf2150 0000000000000001
000002a1312017f0 genunix:flist_fork+12c (18403d1112198, 184042321d408, 1, 7fffffff, 1ff, 184042321c560)
  %l0-3: 0000000000000001 0000000000000001 0000000000000000 0000000000000100
  %l4-7: 00018403c5658000 00018403f73272b0 00018403a465a000 00000000000001ff
000002a1312018a0 genunix:cfork+a50 (0, 1, 18403f4b244c0, 3, 1, 184042321c560)
  %l0-3: 00018403d11112f0 0000000000000000 00018404867ef2b0 000002a131201978
  %l4-7: 000184038cc06500 00018403ca7b0dc0 00000000209ce978 00000000209ce930
000002a1312019c0 genunix:spawn+1b0 (ffffffff4e6d50d0, 18403f4b244c8, ac, 18403f4b244c0, 1224, 0)
  %l0-3: 0000000000000002 000000007fffffff 0000000097c818a3 0000000097c818a2
  %l4-7: 0000000097c96f17 0000000000000000 0001840455e61800 000184046431a000

syncing file systems... done

vmcore.0> panic_thread/K
panic_thread:
panic_thread:   18403ca7b0dc0
vmcore.0> 18403ca7b0dc0$<thread ! grep procp
    t_procp = 0x18403d11112f0
vmcore.0> 0x18403d11112f0$<proc2u ! grep psargs
    p_user.u_psargs = [ "/opt/IBM/ITM/sol296/ux/bin/kuxagent" ]
vmcore.0> 0x18403d11112f0::pfiles
FD   TYPE  VNODE             INFO
   0  CHR  00018403530daa00  /zoneHome/z03a/root/dev/null
   1  REG  00018403b90a2480  /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_1571165952.log
   2  REG  00018403b90a2480  /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_1571165952.log
   4  DOOR 00018403c0d9ab00  /zoneHome/z03a/root/system/volatile/name_service_door [door to 'nscd' (proc=18403c6119210)]
   5  SOCK 000184036b8e7780  socket: AF_INET 0.0.0.0 17658  listen backlog: 8
   6  SOCK 00018403b4fec400  socket: AF_INET 0.0.0.0 33005  listen backlog: 8
   7  SOCK 00018403a49cd980  socket: AF_INET 127.0.0.1 62718  remote: AF_INET 127.0.0.1 1920
   8  SOCK 00018403b5a08880  socket: AF_INET 127.0.0.1 14856  listen backlog: 49
   9  SOCK 00018403b13c4e80  socket: AF_INET 127.0.0.1 22620  remote: AF_INET 127.0.0.1 3661
  10  SOCK 00018403d14c9580  socket: AF_INET 127.0.0.1 14856  remote: AF_INET 127.0.0.1 13294
  11  SOCK 00018403a4583240  socket: AF_INET 127.0.0.1 13294  remote: AF_INET 127.0.0.1 14856
  12  SOCK 00018403afd68b80  socket: AF_UNIX  remote: AF_?? (0)
  13  SOCK 00018403ac6e4940  socket: AF_UNIX  remote: AF_?? (0)
  14  REG  000184036ad61a40  /zoneHome/z03a/root/opt/IBM/ITM/auditlogs/root.z03a_ux_audit.log
  15  SOCK 00018403b4906640  socket: AF_UNIX /opt/IBM/ITM/sol296/ux/bin/pasipc/.pas_sock  listen backlog: 16
  16  CHR  00018403bafb8980  /zoneHome/z03a/root/dev/kstat
  17  SOCK 0001840373fb8c00  socket: AF_INET 192.168.111.146 15797  remote: AF_INET 192.168.100.97 63358
  18  SOCK 00018403cbe12780  socket: AF_INET 0.0.0.0 18302  listen backlog: 8
  19  SOCK 0001840373eaaf00  socket: AF_INET 192.168.111.146 23827  remote: AF_INET 192.168.100.97 1918
  20  SOCK 00018403a0cc9880  socket: AF_INET 127.0.0.1 14856  remote: AF_INET 127.0.0.1 16927
  21  SOCK 00018403a1667700  socket: AF_INET 127.0.0.1 16927  remote: AF_INET 127.0.0.1 14856
  22  REG  00018403ac48f240  /zoneHome/z03a/root/opt/IBM/ITM/logs/z03a_ux_kuxagent_5da61702-02.log
  23  DIR  00018403b775fb80  /zoneHome/z03a/root/opt/IBM/ITM/tmp/osfcp
  24  FIFO 00018403b31b3e80
  25  DIR  00018403a2ee2b80  /zoneHome/z03a/root/proc
  26  PROC 000184050cf70980  /zoneHome/z03a/root/proc/67521/psinfo (proc=184042fc42788)
  27  FIFO 000184052c1df340
  28  FIFO 00018403b5e37e80
 256  DIR  00018403b8634780  /zoneHome/z03a/root/proc/67521
vmcore.0>
vmcore.0> ::ps ! grep 67521
R  67521      1  67521  67521   1001  0x4a004400 000184042fc42788 oracle
vmcore.0> ::ps -zf ! grep 67521
R  67521      1  67521  67521      1   1001  0x4a004400 000184042fc42788 oracleSID_1 (LOCAL=NO)
vmcore.0>
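The same chain also works in the other direction: if you already know the PID, a hedged shortcut from PID to proc pointer, argument string and open files is:

vmcore.0> 0t67521::pid2proc | ::print proc_t p_user.u_psargs
vmcore.0> 0t67521::pid2proc | ::pfiles

(0t forces decimal, since mdb otherwise treats the PID as hex.)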
Some mdb commands
Kernel cage - all free memory ended up in the kernel cage (KCAGE) bucket:
::fed -v
USER     MN T      TOTAL       8k      64k       4m     256m
credit    0 u          -      [-]      [-]      [-]      [-]
fed       0 u          -        -      [-]      [-]        -
total     0 u          -        -        -        -        -
credit    1 u          -      [-]      [-]      [-]      [-]
fed       1 u          -      [-]      [-]      [-]      [-]
total     1 u          -        -        -        -        -
credit    2 u          -      [-]      [-]      [-]      [-]
fed       2 u          -        -      [-]      [-]      [-]
total     2 u          -        -        -        -        -
total       u          -        -        -        -        -

KCAGE    MN T      TOTAL       8k      64k       4m     256m
credit    0 k      12.6m       56        -        3      [-]
fed       0 k     317.8g  27b6b73        -       2b      [-]
total     0 k     317.8g  27b6bc9        -       2e        -
credit    1 k      21.7m      4f6       bd      [-]      [-]
fed       1 k       2.3g    16e44      880      17d      [-]
total     1 k       2.3g    1733a      93d      17d        -
credit    2 k       3.8m      1e9        -      [-]      [-]
fed       2 k      13.9g   1bd3f5        -       16      [-]
total     2 k      14.0g   1bd5de        -       16        -
total       k     334.2g  298b4e1      93d      1c1        -    <---- TOTAL 334 GB!
reserve     k       4.6g    94d71        -        -        -
rsrvhigh    k       2.3g    4a6b8        -        -        -
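The same view is available on a live system without an interactive session; a sketch for watching the cage grow over time (the log path and interval are arbitrary):

# while :; do date; echo '::fed -v' | mdb -k; sleep 300; done >> /var/tmp/fed.log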
In this case, ::memstat still reports plenty of available memory:
Page Summary                           Pages             Bytes  %Tot
----------------------------  ----------------  ----------------  ----
Kernel                               5691438             43.4G    4%
Kernel (ZFS ARC excess)                 2452             19.1M    0%
Defdump prealloc                     2029989             15.4G    1%
ZFS Kernel Data                      2043300             15.5G    1%
ZFS Metadata                          376392              2.8G    0%
ZFS File Data                        1406002             10.7G    1%
Anon                                15523780            118.4G   10%
Exec and libs                         299223              2.2G    0%
Page cache                           1339587             10.2G    1%
OSM                                 86447616            659.5G   54%
Free (cachelist)                       16284            127.2M    0%
Free (freelist)                     43814273            334.2G   28%

Total                              158990336              1.1T
Access to this memory is much slower because a user process requesting a page first has to "uncage" it. This manifests itself as random delays... That is a bug and should be fixed in 11.4+.
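Whether the cage is active at all can be checked via the kcage_on kernel variable; this is an assumption based on the SPARC memory cage code, so verify the variable exists on your release:

> kcage_on/D    -> non-zero means the kernel cage is enabled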
You might see threads blocked in a queue waiting for memory (empty output is good):
echo "::walk fed_blocked |::stacks" | mdb -k
On a healthy system, ::fed -v should look like this:
> ::fed -v
USER     MN T      TOTAL       8k      64k       4m     256m       2g      16g
credit    0 u      11.4m      519       14        -      [-]      [-]      [-]
fed       0 u       5.4g    9ec51      61f       6b        -        -      [-]
total     0 u       5.4g    9f16a      633       6b        -        -        -
credit    1 u      25.0m      3ca       98        2        -      [-]      [-]
fed       1 u       5.2g    a1a72      2ff       20        -        -      [-]
total     1 u       5.2g    a1e3c      397       22        -        -        -
credit    2 u      34.4m      6a6       93        3      [-]      [-]      [-]
fed       2 u     903.6m    131de      cbf       16        -      [-]      [-]
total     2 u     938.1m    13884      d52       19        -        -        -
total       u      11.6g   15482a     171c       a6        -        -        -

KCAGE    MN T      TOTAL       8k      64k       4m     256m       2g      16g
credit    0 k      22.6m      454       20        3        -        -        -
fed       0 k       3.3g    663d2        -       34        -        -        -
total     0 k       3.4g    66826       20       37        -        -        -
credit    1 k      12.6m      44e        -        1        -        -        -
fed       1 k       3.9g    76a05        -       49        -        -        -
total     1 k       4.0g    76e53        -       4a        -        -        -
credit    2 k      18.7m      558        1        2        -        -        -
fed       2 k       6.5g    cd4dd        -       2a        -        -        -
total     2 k       6.5g    cda35        1       2c        -        -        -
total       k      14.0g   1ab0ae       21       ad        -        -        -
reserve     k       4.7g    97a00        -        -        -        -        -
rsrvhigh    k       2.3g    4bd00        -        -        -        -        -
>