OOM 1970-01-01 00:00

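Our production MongoDB service has been stopping unexpectedly of late. The MongoDB log showed nothing unusual. After a long search online, I found that others kept raising the same question: was the process killed by the OOM killer?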

Why does an OOM kill happen?

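It usually happens when an application requests a large amount of memory at some moment and the system runs short. This typically triggers the Out of Memory (OOM) killer in the Linux kernel, which kills a process to free memory for the system so that the system does not crash outright.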
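To see how the kernel currently ranks a process as a kill candidate, one option is to read the per-process files `/proc/<pid>/oom_score` and `/proc/<pid>/oom_score_adj`. The sketch below is only an illustration: the helper name `read_oom_info` and its `proc_root` parameter are made up for this example, and it assumes a Linux system where those files exist.

```python
from pathlib import Path


def read_oom_info(pid="self", proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for a process as integers.

    A higher oom_score means the kernel is more likely to pick the
    process when memory runs out. oom_score_adj is the user-tunable
    bias, ranging from -1000 to 1000.
    `proc_root` is a hypothetical parameter so the helper can be
    pointed at a different directory, e.g. in tests.
    """
    base = Path(proc_root) / str(pid)
    score = int((base / "oom_score").read_text().strip())
    adj = int((base / "oom_score_adj").read_text().strip())
    return score, adj
```

For example, `read_oom_info(1067)` would report the values for PID 1067, and `read_oom_info()` reports them for the calling Python process itself.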

So I checked the system log (on Ubuntu this is /var/log/syslog) and found the key entries below.

Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.758208] Out of memory: Kill process 26379 (tmux) score 6 or sacrifice child
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.758279] Killed process 26551 (bash) total-vm:23220kB, anon-rss:2768kB, file-rss:8kB
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859171] mongod invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859177] mongod cpuset=/ mems_allowed=0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859180] Pid: 1067, comm: mongod Not tainted 3.2.0-67-generic #101-Ubuntu
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859183] Call Trace:
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859193]  [<ffffffff8111d381>] dump_header+0x91/0xe0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859196]  [<ffffffff8111d735>] oom_kill_process+0x85/0xb0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859199]  [<ffffffff8111dada>] out_of_memory+0xfa/0x220
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859203]  [<ffffffff811234fc>] __alloc_pages_nodemask+0x8dc/0x8f0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859207]  [<ffffffff8115a8a6>] alloc_pages_current+0xb6/0x120
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859213]  [<ffffffff8104552b>] pte_alloc_one+0x1b/0x50
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859217]  [<ffffffff8113d952>] __pte_alloc+0x32/0x160
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859222]  [<ffffffff81053a92>] ? ttwu_queue+0x92/0xd0
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859225]  [<ffffffff81141556>] handle_mm_fault+0x356/0x370
Aug 11 03:48:24 iZ23syelc0bZ kernel: [4612204.859230]  [<ffffffff8166723e>] do_page_fault+0x17e/0x540

As the entries below show, the mongod process was killed at 03:52.

Aug 11 03:52:07 iZ23syelc0bZ kernel: [4612427.964874] Out of memory: Kill process 19716 (mongod) score 6 or sacrifice child
Aug 11 03:52:07 iZ23syelc0bZ kernel: [4612427.965110] Killed process 19716 (mongod) total-vm:135707920kB, anon-rss:103312kB, file-rss:0kB
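The memory figures in a "Killed process" line are easier to read once they are parsed out. Here is a minimal sketch that assumes the line format shown in this log (kernel 3.2; other kernel versions print additional fields) and that `kB` means 1024 bytes. The names `KILLED_RE` and `parse_killed_line` are made up for this example.

```python
import re

# Matches the "Killed process ..." format seen in the log above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_killed_line(line):
    """Return the fields of a 'Killed process' line, or None if no match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "total_vm_kb": int(m.group("total_vm")),
        "anon_rss_kb": int(m.group("anon_rss")),
        "file_rss_kb": int(m.group("file_rss")),
    }
```

Applied to the mongod line above, this gives a `total_vm_kb` of 135707920 (roughly 129 GiB of virtual address space) and an `anon_rss_kb` of 103312 (roughly 101 MiB resident).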

The next time a service stops unexpectedly, remember to check whether the system killed it.
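That check could be scripted along the following lines. This is a sketch rather than a finished tool: it assumes the traditional syslog line format seen in the excerpts above, and `find_oom_kills` is a hypothetical helper name. The log path in the usage comment is also an assumption and may differ by distribution. Reading the log may require elevated privileges.

```python
import re

# Timestamp, hostname, then a kernel "Killed process <pid> (<name>)" message.
KILL_LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: .*"
    r"Killed process (\d+) \(([^)]+)\)"
)


def find_oom_kills(lines, process_name=None):
    """Return a list of (timestamp, pid, name) for OOM kills in `lines`.

    If `process_name` is given, only kills of that process are returned.
    """
    kills = []
    for line in lines:
        m = KILL_LINE_RE.match(line)
        if m is None:
            continue
        timestamp, pid, name = m.group(1), int(m.group(2)), m.group(3)
        if process_name is None or name == process_name:
            kills.append((timestamp, pid, name))
    return kills


# Example usage (path is an assumption; adjust for your system):
# with open("/var/log/syslog", errors="replace") as f:
#     for ts, pid, name in find_oom_kills(f, process_name="mongod"):
#         print(ts, pid, name)
```

An empty result for the time window in which the service stopped suggests looking for another cause.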

For more about the OOM killer, see this article.