How the OOM Killer Works

Date: 2015-08-19

Recently the Redis process on one of our production machines was repeatedly killed by the system. syslog (/var/log/syslog) recorded the following at the time:

Aug 19 11:16:31 jsldg kernel: [4836881.950256] Out of memory: Kill process 11063 (redis-cli) score 538 or sacrifice child
Aug 19 11:16:31 jsldg kernel: [4836881.950338] Killed process 11063 (redis-cli) total-vm:4649480kB, anon-rss:3098176kB, file-rss:4kB

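These entries show that the kernel's OOM (Out of Memory) Killer was triggered and picked the Redis process (redis-cli, PID 11063) as its victim.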
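When the same kill happens repeatedly, it can help to pull the numbers out of the "Killed process" line instead of reading them by eye. The following is a minimal sketch that assumes the exact line format shown above. The struct and function names (oom_kill_record, parse_oom_kill_line) are made up for illustration, and other kernel versions print additional fields that this format string does not handle.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "Killed process" line, as printed in the log above. */
struct oom_kill_record {
	int pid;
	char comm[64];
	unsigned long total_vm_kb;
	unsigned long anon_rss_kb;
	unsigned long file_rss_kb;
};

/*
 * Parse one syslog line. Returns 0 on success, -1 if the line is not a
 * "Killed process" record in the format shown above.
 */
int parse_oom_kill_line(const char *line, struct oom_kill_record *rec)
{
	const char *start = strstr(line, "Killed process ");
	int matched;

	if (!start)
		return -1;

	/* %63[^)] reads the command name up to the closing parenthesis. */
	matched = sscanf(start,
			 "Killed process %d (%63[^)]) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB",
			 &rec->pid, rec->comm, &rec->total_vm_kb,
			 &rec->anon_rss_kb, &rec->file_rss_kb);
	return matched == 5 ? 0 : -1;
}
```

Feeding it the second log line above fills in pid 11063, comm "redis-cli" and anon_rss_kb 3098176; the first line ("Out of memory: Kill process ...") is rejected because it is not a "Killed process" record.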

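When the system runs out of memory, the Linux kernel calls out_of_memory, which picks a process that is using too much memory and kills it. To choose the victim, out_of_memory calls select_bad_process, which returns the process with the highest score.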
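The score the kernel currently assigns to a process can be observed from user space: Linux exposes it in /proc/<pid>/oom_score, and the per-process adjustment in /proc/<pid>/oom_score_adj (range -1000 to 1000). The sketch below is one way to read those files. The helper names oom_proc_path and read_long_from_file are hypothetical, not part of any library.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Build "/proc/<pid>/<name>", where name is e.g. "oom_score" or
 * "oom_score_adj". Returns the path length on success, -1 if the
 * buffer is too small.
 */
int oom_proc_path(pid_t pid, const char *name, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/%ld/%s", (long)pid, name);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Read a single integer from a procfs file into *value.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * A separate status code is used because oom_score_adj may be negative.
 */
int read_long_from_file(const char *path, long *value)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (!fp)
		return -1;
	ok = (fscanf(fp, "%ld", value) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}
```

Calling oom_proc_path(pid, "oom_score", ...) and passing the result to read_long_from_file for the Redis PID before memory runs out gives an early indication of how likely the process is to be chosen.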

The OOM Killer code is in mm/oom_kill.c in the kernel source tree. The following excerpt is from out_of_memory:

p = select_bad_process(&points, totalpages, mpol_mask, force_kill);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
	dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
	panic("Out of memory and no killable processes...\n");
}
if (p != (void *)-1UL) {
	oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
			 nodemask, "Out of memory");
	killed = 1;
}

Once a victim has been selected, out_of_memory calls oom_kill_process to kill it.

select_bad_process chooses the process to kill: it iterates over every thread of every process and keeps the one with the highest score. The implementation:

static struct task_struct *select_bad_process(unsigned int *ppoints,
		unsigned long totalpages, const nodemask_t *nodemask,
		bool force_kill)
{
	struct task_struct *g, *p;
	struct task_struct *chosen = NULL;
	unsigned long chosen_points = 0;

	rcu_read_lock();
	for_each_process_thread(g, p) {
		unsigned int points;

		switch (oom_scan_process_thread(p, totalpages, nodemask,
						force_kill)) {
		case OOM_SCAN_SELECT:
			chosen = p;
			chosen_points = ULONG_MAX;
			/* fall through */
		case OOM_SCAN_CONTINUE:
			continue;
		case OOM_SCAN_ABORT:
			rcu_read_unlock();
			return (struct task_struct *)(-1UL);
		case OOM_SCAN_OK:
			break;
		};
		points = oom_badness(p, NULL, nodemask, totalpages);
		if (!points || points < chosen_points)
			continue;
		/* Prefer thread group leaders for display purposes */
		if (points == chosen_points && thread_group_leader(chosen))
			continue;

		chosen = p;
		chosen_points = points;
	}
	if (chosen)
		get_task_struct(chosen);
	rcu_read_unlock();

	*ppoints = chosen_points * 1000 / totalpages;
	return chosen;
}
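Stripped of locking and the oom_scan_process_thread cases, the loop above is a maximum search followed by a normalization. The following user-space sketch shows just that part. The struct candidate type and the pick_victim function are hypothetical stand-ins, and ties are resolved by keeping the earlier entry, which is simpler than the kernel's preference for thread group leaders.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task: a pid and its badness points. */
struct candidate {
	int pid;
	unsigned long points;	/* 0 means "not killable" */
};

/*
 * Return the candidate with the highest non-zero points, or NULL if no
 * candidate is killable. *score receives points * 1000 / totalpages,
 * mirroring the last assignment in select_bad_process.
 */
const struct candidate *pick_victim(const struct candidate *c, size_t n,
				    unsigned long totalpages,
				    unsigned int *score)
{
	const struct candidate *chosen = NULL;
	unsigned long chosen_points = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip unkillable tasks and anything not strictly better. */
		if (!c[i].points || c[i].points <= chosen_points)
			continue;
		chosen = &c[i];
		chosen_points = c[i].points;
	}

	/* Guard against division by zero in this sketch. */
	*score = totalpages ?
		(unsigned int)(chosen_points * 1000 / totalpages) : 0;
	return chosen;
}
```

The normalized value is what appears as "score" in the kernel log line: it expresses the victim's points in thousandths of totalpages.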

select_bad_process calls oom_badness to score each process; the process with the highest points value is the one that gets killed.
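oom_badness itself is not shown here. As a rough sketch of the heuristic as I understand it for kernels of this era: the baseline is the number of pages the task holds (resident pages, page-table pages and swap entries), root-owned tasks get a 3% discount, and oom_score_adj shifts the result in units of totalpages / 1000. The code below is a user-space approximation under those assumptions, not the kernel function. struct task_mem, badness_sketch and SKETCH_OOM_SCORE_ADJ_MIN are hypothetical names, and the details differ between kernel versions.

```c
/* Hypothetical inputs; the kernel reads these from the task itself. */
struct task_mem {
	unsigned long rss_pages;	/* resident pages */
	unsigned long pagetable_pages;	/* pages used for page tables */
	unsigned long swap_pages;	/* swapped-out pages */
	int oom_score_adj;		/* -1000 .. 1000 */
	int is_root;			/* assumed: task has CAP_SYS_ADMIN */
};

#define SKETCH_OOM_SCORE_ADJ_MIN (-1000)

unsigned long badness_sketch(const struct task_mem *t,
			     unsigned long totalpages)
{
	long points;
	long adj = t->oom_score_adj;

	/* The minimum adjustment exempts the task from OOM killing. */
	if (adj == SKETCH_OOM_SCORE_ADJ_MIN)
		return 0;

	/* Baseline: how many pages the task is holding. */
	points = (long)(t->rss_pages + t->pagetable_pages + t->swap_pages);

	/* Assumed 3% discount for root-owned tasks. */
	if (t->is_root)
		points -= (points * 3) / 100;

	/* Scale oom_score_adj to pages and apply it. */
	adj *= (long)(totalpages / 1000);
	points += adj;

	/* An eligible task never scores 0, so it stays selectable. */
	return points > 0 ? (unsigned long)points : 1;
}
```

Under this model, lowering a process's oom_score_adj makes it less likely to be chosen, and a value of -1000 takes it out of consideration entirely.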