Linux Memory Safety Mechanisms: A Summary

1. 0x01 Application Layer
2. 0x02 How checksec Works
3. 0x03 Kernel
4. 0x04 Recommended Resources

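For programs to run robustly and resist exploitation, an operating system has to provide baseline protection from three angles: the kernel, the application layer, and the compiler. This article therefore does not separate mechanisms by whether the kernel or the compiler provides them, because they all serve the same purpose: keeping programs safe while they run in memory.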

1. 0x01 Application Layer

1.1. ASLR

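ASLR (Address Space Layout Randomization) is a protection that mitigates buffer overflow exploits by randomizing the memory addresses a program uses at run time.

The following C program shows whether ASLR is enabled:

#include <stdio.h>

int main(void) {
  int local = 0;
  /* Print the address of a stack variable. %p is the correct conversion
     for pointers; %x would truncate the address on 64-bit systems. */
  printf("%p\n", (void *)&local);
  return 0;
}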

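Run the compiled program several times. If the address differs on every run, ASLR is enabled. Whether the address space is randomized is controlled by the kernel variable randomize_va_space, defined in mm/memory.c: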

int randomize_va_space __read_mostly =
#ifdef CONFIG_COMPAT_BRK
                                        1;
#else
                                        2;
#endif

Why a variable rather than a macro? Because the value is exposed through /proc/sys/kernel/randomize_va_space and can be changed at run time; a macro would be fixed at compile time.

Immediately below the variable is the handler for the norandmaps boot parameter, which turns randomization off:

static int __init disable_randmaps(char *s)
{
        randomize_va_space = 0;
        return 1;
}
__setup("norandmaps", disable_randmaps);

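Code that randomizes memory layout checks this variable first. For example, arch_align_stack (arch/x86/kernel/process.c) uses it to randomize the initial stack pointer: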

unsigned long arch_align_stack(unsigned long sp)
{
        if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
                sp -= get_random_int() % 8192;
        return sp & ~0xf;
}

1.1.1. Disabling and Enabling ASLR

To disable ASLR, run the following command (write 2 instead of 0 to restore full randomization):

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Or:

sudo /sbin/sysctl -w kernel.randomize_va_space=0

The possible values are:

Value	Meaning
0	No randomization. Everything is static.
1	Conservative randomization. Shared libraries, stack, mmap(), and VDSO are randomized; the brk() heap is not.
2	Full randomization. In addition to elements listed in the previous point, memory managed through brk() is also randomized.

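1.2. Stack Canary

The term comes from coal miners, who carried canaries to detect toxic gas: if the canary died, gas was leaking. Likewise, the compiler inserts a canary value into the stack frame and, before the function returns, checks whether the value has been modified; if it has, the stack has overflowed. The mechanism originally comes from StackGuard (announced in 1997, see https://marc.info/?m=88255929032288).

GCC has included it since version 4.1 under the name SSP (Stack Smashing Protector). To protect every function, add the following option (-fstack-protector and, since GCC 4.9, -fstack-protector-strong protect only the functions the compiler considers at risk):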

-fstack-protector-all

To disable it:

-fno-stack-protector

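With stack protection enabled, GCC reorganizes the stack frame (including the order of local variables) and places a canary word between the local variables and the saved frame pointer and return address. Before the function returns, it checks whether the canary has been overwritten and, if so, calls __stack_chk_fail. GCC's libssp provides an implementation in libssp/ssp.c (on glibc systems, glibc supplies its own __stack_chk_fail):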

void
__stack_chk_fail (void)
{
  const char *msg = "*** stack smashing detected ***: ";
  fail (msg, strlen (msg), "stack smashing detected: terminated");
}

Disassembly of a function compiled with stack protection:

0x0000000000400566 <+0>:     push   %rbp
0x0000000000400567 <+1>:     mov    %rsp,%rbp
0x000000000040056a <+4>:     sub    $0x10,%rsp
0x000000000040056e <+8>:     mov    %fs:0x28,%rax ; read the canary from thread-local storage
0x0000000000400577 <+17>:    mov    %rax,-0x8(%rbp) ; store it just below the saved %rbp
0x000000000040057b <+21>:    xor    %eax,%eax
0x000000000040057d <+23>:    movb   $0x61,-0x10(%rbp)
0x0000000000400581 <+27>:    mov    $0x0,%eax
0x0000000000400586 <+32>:    mov    -0x8(%rbp),%rdx ; reload the stored canary
0x000000000040058a <+36>:    xor    %fs:0x28,%rdx ; zero if the canary is unchanged
0x0000000000400593 <+45>:    je     0x40059a <main+52> ; unchanged: return normally, otherwise fall through to __stack_chk_fail
0x0000000000400595 <+47>:    callq  0x400440 <__stack_chk_fail@plt>
0x000000000040059a <+52>:    leaveq
0x000000000040059b <+53>:    retq

When the canary is overwritten, the program aborts with:

$ ./a.out 12345534902039092391900000000000000000000000000000000000000000000000234
*** stack smashing detected ***: ./a.out terminated
zsh: segmentation fault (core dumped)  ./a.out

1.3. NX

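W^X (Writable xor eXecutable) was first implemented in OpenBSD. Linux later gained NX (Non-Executable memory), and Windows has DEP (Data Execution Prevention). All of them mark memory regions as non-executable: if shellcode is written into such a region and the program then tries to execute it, the CPU raises a fault and the process is terminated.

By default, programs compiled with GCC have a non-executable stack: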

$ readelf -l a.out| fgrep stack -i -A1
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10

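In the stack entry above, the flags "RW" mean the stack is readable and writable but not executable.

To disable NX for the stack and allow code to execute there, there are two options:

Option 1: pass the flag to gcc at compile time: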

$ gcc hello.c -z execstack

Option 2: use the execstack tool:

$ execstack -s a.out

Once the stack is executable, an "E" appears in its flags:

GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
               0x0000000000000000 0x0000000000000000  RWE    10

1.4. Fortify

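If the _FORTIFY_SOURCE macro is defined when compiling C/C++ code with GCC (with optimization enabled), calls to certain functions that can cause buffer overflows are replaced with checked variants; see man feature_test_macros for details. For example, the following code uses strcpy to copy the string "hello world" into an array that holds only 5 bytes, which clearly overflows: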

#include <string.h>

int main() {
  char string[5];
  strcpy(string, "hello world");
  return 0;
}

With _FORTIFY_SOURCE defined, GCC warns at compile time:

$ gcc -D_FORTIFY_SOURCE=1 test_fortify.c -O
In file included from /usr/include/string.h:635:0,
                 from test_fortify.c:1:
In function ‘strcpy’,
    inlined from ‘main’ at test_fortify.c:5:3:
/usr/include/bits/string3.h:110:10: warning: call to __builtin___strcpy_chk will always overflow destination buffer
   return __builtin___strcpy_chk (__dest, __src, __bos (__dest));
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

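This example can be caught statically at compile time, but such cases are rare: most buffer overflows cannot be detected by static analysis, as in the following: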

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  char string[5];
  strcpy(string, argv[1]);
  printf("%s\n", string);
  return 0;
}

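The compiler cannot tell statically whether the first argument will overflow the buffer: any argument longer than 4 characters (plus the terminating NUL) overflows it, but that depends on user input. With _FORTIFY_SOURCE enabled, the program is stopped as soon as the overflow happens, with this message: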

$ gcc -D_FORTIFY_SOURCE=1 test_fortify.c -O
$ ./a.out 1111111111111111111
*** buffer overflow detected ***: ./a.out terminated
======= Backtrace: =========
/lib64/libc.so.6(+0x791fb)[0x7f1b455091fb]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1b455aa187]
/lib64/libc.so.6(+0x118120)[0x7f1b455a8120]
/lib64/libc.so.6(+0x117482)[0x7f1b455a7482]
./a.out[0x40057b]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7f1b454b0401]
./a.out[0x40049a]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fd:00 657528                             /home/lu4nx/tmp/shellcode/test_fortify/a.out
00600000-00601000 r--p 00000000 fd:00 657528                             /home/lu4nx/tmp/shellcode/test_fortify/a.out
00601000-00602000 rw-p 00001000 fd:00 657528                             /home/lu4nx/tmp/shellcode/test_fortify/a.out
01334000-01355000 rw-p 00000000 00:00 0                                  [heap]
7f1b45279000-7f1b4528f000 r-xp 00000000 fd:00 2505812                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7f1b4528f000-7f1b4548e000 ---p 00016000 fd:00 2505812                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7f1b4548e000-7f1b4548f000 r--p 00015000 fd:00 2505812                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7f1b4548f000-7f1b45490000 rw-p 00016000 fd:00 2505812                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7f1b45490000-7f1b4564d000 r-xp 00000000 fd:00 2503196                    /usr/lib64/libc-2.24.so
7f1b4564d000-7f1b4584c000 ---p 001bd000 fd:00 2503196                    /usr/lib64/libc-2.24.so
7f1b4584c000-7f1b45850000 r--p 001bc000 fd:00 2503196                    /usr/lib64/libc-2.24.so
7f1b45850000-7f1b45852000 rw-p 001c0000 fd:00 2503196                    /usr/lib64/libc-2.24.so
7f1b45852000-7f1b45856000 rw-p 00000000 00:00 0
7f1b45856000-7f1b4587b000 r-xp 00000000 fd:00 2500300                    /usr/lib64/ld-2.24.so
7f1b45a47000-7f1b45a49000 rw-p 00000000 00:00 0
7f1b45a78000-7f1b45a7b000 rw-p 00000000 00:00 0
7f1b45a7b000-7f1b45a7c000 r--p 00025000 fd:00 2500300                    /usr/lib64/ld-2.24.so
7f1b45a7c000-7f1b45a7d000 rw-p 00026000 fd:00 2500300                    /usr/lib64/ld-2.24.so
7f1b45a7d000-7f1b45a7e000 rw-p 00000000 00:00 0
7ffd73d54000-7ffd73d75000 rw-p 00000000 00:00 0                          [stack]
7ffd73dbb000-7ffd73dbd000 r--p 00000000 00:00 0                          [vvar]
7ffd73dbd000-7ffd73dbf000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
zsh: abort (core dumped)  ./a.out 1111111111111111111

Now let's compare the program with and without _FORTIFY_SOURCE.

Without _FORTIFY_SOURCE, main compiles to:

0000000000400546 <main>:
  400546:	55                   	push   %rbp
  400547:	48 89 e5             	mov    %rsp,%rbp
  40054a:	48 83 ec 20          	sub    $0x20,%rsp
  40054e:	89 7d ec             	mov    %edi,-0x14(%rbp)
  400551:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
  400555:	48 8b 45 e0          	mov    -0x20(%rbp),%rax
  400559:	48 83 c0 08          	add    $0x8,%rax
  40055d:	48 8b 10             	mov    (%rax),%rdx
  400560:	48 8d 45 f0          	lea    -0x10(%rbp),%rax
  400564:	48 89 d6             	mov    %rdx,%rsi
  400567:	48 89 c7             	mov    %rax,%rdi
  40056a:	e8 c1 fe ff ff       	callq  400430 <strcpy@plt> ; strcpy is called directly
  40056f:	48 8d 45 f0          	lea    -0x10(%rbp),%rax
  400573:	48 89 c7             	mov    %rax,%rdi
  400576:	e8 c5 fe ff ff       	callq  400440 <puts@plt>
  40057b:	b8 00 00 00 00       	mov    $0x0,%eax
  400580:	c9                   	leaveq
  400581:	c3                   	retq
  400582:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  400589:	00 00 00
  40058c:	0f 1f 40 00          	nopl   0x0(%rax)

Recompiled with _FORTIFY_SOURCE, main becomes:

0000000000400566 <main>:
  400566:	48 83 ec 18          	sub    $0x18,%rsp
  40056a:	48 8b 76 08          	mov    0x8(%rsi),%rsi
  40056e:	ba 05 00 00 00       	mov    $0x5,%edx
  400573:	48 89 e7             	mov    %rsp,%rdi
  400576:	e8 e5 fe ff ff       	callq  400460 <__strcpy_chk@plt> ; __strcpy_chk is called instead of strcpy, with the buffer size (5) in %edx
  40057b:	48 89 e7             	mov    %rsp,%rdi
  40057e:	e8 cd fe ff ff       	callq  400450 <puts@plt>
  400583:	b8 00 00 00 00       	mov    $0x0,%eax
  400588:	48 83 c4 18          	add    $0x18,%rsp
  40058c:	c3                   	retq
  40058d:	0f 1f 00             	nopl   (%rax)

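1.4.1. How It Works

In GCC's libssp headers, defining _FORTIFY_SOURCE (for example _FORTIFY_SOURCE=1) causes a new macro, __SSP_FORTIFY_LEVEL, to be defined. As the condition below shows, this only happens when optimization is enabled (__OPTIMIZE__ > 0), which is why the commands above pass -O. glibc applies the same rule with its own __USE_FORTIFY_LEVEL macro in <features.h>.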

/* File: libssp/ssp/ssp.h.in */

#if _FORTIFY_SOURCE > 0 && __OPTIMIZE__ > 0 \
    && defined __GNUC__ \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)) \
    && !defined __cplusplus
# if _FORTIFY_SOURCE == 1
#  define __SSP_FORTIFY_LEVEL 1
# elif _FORTIFY_SOURCE > 1
#  define __SSP_FORTIFY_LEVEL 2
# endif
#endif

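If __SSP_FORTIFY_LEVEL is greater than 0, the header undefines the functions that can cause buffer overflows and redefines them as checked versions. For example, memcpy is redefined as: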

/* File: libssp/ssp/string.h */
#if __SSP_FORTIFY_LEVEL > 0

/* Undefine these unsafe functions */
#undef memcpy
#undef memmove
#undef memset
#undef strcat
#undef strcpy
#undef strncat
#undef strncpy
#undef mempcpy
#undef stpcpy
#undef bcopy
#undef bzero

/* Reimplement memcpy with a size check */
#define memcpy(dest, src, len) \
  ((__ssp_bos0 (dest) != (size_t) -1)                                   \
   ? __builtin___memcpy_chk (dest, src, len, __ssp_bos0 (dest))         \
   : __memcpy_ichk (dest, src, len))
static inline __attribute__((__always_inline__)) void *
__memcpy_ichk (void *__restrict__ __dest, const void *__restrict__ __src,
               size_t __len)
{
  return __builtin___memcpy_chk (__dest, __src, __len, __ssp_bos0 (__dest));
}

Enabling _FORTIFY_SOURCE when compiling is strongly recommended.

1.5. AAAS

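AAAS (ASCII Armored Address Space) loads shared libraries at addresses whose most significant byte is 0x00 (NULL). This hinders overflows through string functions: functions such as strcpy stop copying at a 0x00 byte, so such an address cannot be followed by further data in a string payload.

In the following example, every address begins with 0x00: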

0x00401199 <+0>:	lea    0x4(%esp),%ecx
0x0040119d <+4>:	and    $0xfffffff0,%esp
0x004011a0 <+7>:	pushl  -0x4(%ecx)
0x004011a3 <+10>:	push   %ebp
0x004011a4 <+11>:	mov    %esp,%ebp
0x004011a6 <+13>:	push   %ebx
0x004011a7 <+14>:	push   %ecx

1.6. RELRO

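RELRO (RELocation Read-Only) protects the GOT from being overwritten. It comes in two forms:

Partial RELRO: sections such as .got and .dynamic become read-only after relocation, but .got.plt stays writable for lazy binding
Full RELRO: all symbols are resolved at startup (BIND_NOW), so the whole GOT, including .got.plt, becomes read-only

With Full RELRO the GOT is read-only, which blocks attacks that overwrite GOT entries to get around defenses such as ASLR.

On many distributions GCC (through the linker) enables Partial RELRO by default. Starting from that default, the options are:

Disable RELRO:
gcc -z norelro

Keep Partial RELRO with lazy binding:
gcc -z lazy

Enable Full RELRO: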
gcc -z now

1.7. PIE

PIE (Position Independent Executable). To enable it, pass these two options to GCC:

gcc -fpie -pie <filename.c>

First, prepare a test source file:

#include <stdio.h>

int main(void) {
  printf("hello world\n");
  return 0;
}

Then build it twice, without and with PIE (if your GCC produces PIE by default, add -no-pie to the first command):

$ gcc main.c -o no_pie
$ gcc main.c -fpie -pie -o pie

Now compare the two files:

$ file no_pie
no_pie: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=ecd3698edf4419692385897cdccad87fdf3c4cd8, for GNU/Linux 3.2.0, not stripped
$ file pie
pie: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f0ebfd43c5e535b96e2fc64008c50a67f08e8608, for GNU/Linux 3.2.0, not stripped

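As shown above, with PIE enabled the file type becomes "pie executable".

The difference lies in being "position independent". To see what "position dependent" means, first look at the load address of no_pie: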

$ readelf -l no_pie | fgrep LOAD | head -1
 LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000

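This means the program is always loaded starting at address 0x0000000000400000. It has a fixed address, which is what "position dependent" means. The program also calls printf (GCC compiles a printf call whose constant string ends in a newline into a call to puts). Disassemble it: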

$ objdump -d no_pie | fgrep puts@plt
40112f:       e8 fc fe ff ff          callq  401030 <puts@plt>

The address of the called puts stub is fixed as well.

Now do the same for the PIE build, starting with the load address:

$ readelf -l pie | fgrep LOAD | head -1
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000

The load address is 0x0000000000000000, meaning no load address is specified. Next, the call to puts:

$ objdump -d pie | fgrep puts@plt
1144:       e8 e7 fe ff ff          callq  1030 <puts@plt>

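Here 1030 is only an offset. Both the load address and the addresses of called functions are now independent of position.

A PIE program can therefore be loaded at any address, just like a shared library, and its PLT entries are referenced only by offset. Combined with ASLR, the program's addresses become random on every run, so even ret2plt no longer works unless the attacker first leaks the program's base address.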

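2. 0x02 How checksec Works

pwntools and PEDA both include a checksec feature that reports whether a binary has RELRO, Canary, NX, and PIE enabled.

Both obtain this from the ELF file itself. PEDA, for example, runs readelf and inspects its output. The key code is: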

def checksec(self, filename=None):
    result = {}
    result["RELRO"] = 0
    result["CANARY"] = 0
    result["NX"] = 1
    result["PIE"] = 0
    result["FORTIFY"] = 0

    # Run readelf -W -a on the file and capture its output
    out =  execute_external_command("%s -W -a \"%s\" 2>&1" % (config.READELF, filename))

    for line in out.splitlines():
        if "GNU_RELRO" in line:
            result["RELRO"] |= 2
        if "BIND_NOW" in line:
            result["RELRO"] |= 1
        if "__stack_chk_fail" in line:  # the binary references __stack_chk_fail
            result["CANARY"] = 1
        if "GNU_STACK" in line and "RWE" in line:  # the stack segment is executable
            result["NX"] = 0
        if "Type:" in line and "DYN (" in line:
            result["PIE"] = 4
        if "(DEBUG)" in line and result["PIE"] == 4:
            result["PIE"] = 1
        if "_chk@" in line:
            result["FORTIFY"] = 1

    if result["RELRO"] == 1:
        result["RELRO"] = 0
    return result

3. 0x03 Kernel

3.1. Zero Address

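Address 0x00 lies in user space. When the kernel dereferences a NULL pointer, it accesses address zero; if user space has mapped a page there and placed shellcode at 0x0, the bug becomes a kernel privilege escalation.

Starting with 2.6.22, released in July 2007, the Linux kernel added the vm.mmap_min_addr option, which sets the lowest address mmap may map and so blocks NULL dereference attacks.

The default on my system (Fedora 31):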

$ sysctl vm.mmap_min_addr
vm.mmap_min_addr = 65536

3.2. SMAP/SMEP

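Early kernel exploits could escalate privileges by making the kernel execute code or read data placed in user-space memory; this attack is known as ret2usr. SMEP and SMAP defeat it.

SMAP (Supervisor Mode Access Prevention): prevents kernel mode from accessing user-space data, except where the kernel explicitly allows it (as in copy_from_user()). Supported since Linux 3.7.

SMEP (Supervisor Mode Execution Prevention): prevents kernel mode from executing user-space code. Supported since Linux 3.0 (from a patch Fenghua Yu submitted against the 2.6.39 kernel).

Both features require CPU support, which can be checked in /proc/cpuinfo: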

$ grep 'sm[ae]p' /proc/cpuinfo

To disable them, edit the kernel command line at the GRUB menu during boot and append:

nosmap nosmep

ARM processors have similar features, called PXN (Privileged Execute-Never) and PAN (Privileged Access-Never).

3.3. Fortify

The kernel's string functions also have a Fortify mechanism. If an overflow is detected (for example in memcpy), the kernel reports an oops like this:

[ 1073.003328] detected buffer overflow in memcpy
[ 1073.003353] ------------[ cut here ]------------
[ 1073.003358] kernel BUG at lib/string.c:1072!
[ 1073.003373] invalid opcode: 0000 [#4] SMP PTI
[ 1073.003382] CPU: 0 PID: 2082 Comm: perl Tainted: P      D    OE     4.19.0-6-686-pae #1 Debian 4.19.67-2+deb10u2
[ 1073.003386] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 1073.003398] EIP: fortify_panic+0xe/0x19

The kernel ends up in fortify_panic, defined in lib/string.c:

void fortify_panic(const char *name)
{
        pr_emerg("detected buffer overflow in %s\n", name);
        BUG();
}
EXPORT_SYMBOL(fortify_panic);

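To enable kernel Fortify, turn it on when configuring the kernel build: under "Security options", select "Harden common str/mem functions against buffer overflows" (CONFIG_FORTIFY_SOURCE).

With Fortify enabled, the kernel's string functions use checked versions, for example memcpy (see include/linux/string.h for more):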

__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
{
        size_t p_size = __builtin_object_size(p, 0);
        size_t q_size = __builtin_object_size(q, 0);
        if (__builtin_constant_p(size)) {
                if (p_size < size)
                        __write_overflow();
                if (q_size < size)
                        __read_overflow2();
        }
        if (p_size < size || q_size < size)
                fortify_panic(__func__);
        return __builtin_memcpy(p, q, size);
}

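When the sizes are compile-time constants, an overflow is reported at build time through __write_overflow() or __read_overflow2(); otherwise the check runs at run time, and fortify_panic is called when an overflow is detected.

3.4. KASLR

KASLR (Kernel Address Space Layout Randomization) was introduced in Linux 3.14 and can be enabled in the kernel configuration: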

Processor type and features > Randomize the address of the kernel image (KASLR)

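If the kernel is built with KASLR support but it is not active by default, append kaslr to the kernel command line (append nokaslr to disable it), for example: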

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.19.0-6-686-pae root=UUID=245a4418-e066-4e6c-ad25-de8bf896ae94 ro quiet nosmap nosmep kaslr

With KASLR enabled, the addresses of function symbols in /proc/kallsyms change on every boot:

$ sudo fgrep commit_creds /proc/kallsyms # first boot
d408f8d0 T commit_creds

$ sudo fgrep commit_creds /proc/kallsyms # second boot
de08f8d0 T commit_creds

$ sudo fgrep commit_creds /proc/kallsyms # third boot
cf08f8d0 T commit_creds

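3.5. KADR

KADR stands for Kernel Address Display Restriction. Kernel privilege-escalation exploits often contain code like the main function below, which reads the addresses of prepare_kernel_cred and commit_creds from /proc/kallsyms so that they can be called: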

#include <stdio.h>
#include <string.h>

/* regparm(3) matches the 32-bit x86 kernel calling convention;
   GCC ignores it (with a warning) on x86-64. */
typedef void *(*kfunc_t)(void *) __attribute__((regparm(3)));

static kfunc_t prepare_kernel_cred;
static kfunc_t commit_creds;

int main(void) {
  FILE *fd;
  unsigned long addr;
  char type;
  char sname[512];
  char line[1024];

  fd = fopen("/proc/kallsyms", "r");
  if (fd == NULL) {
    perror("open /proc/kallsyms failed");
    return -1;
  }

  /* Each line is "address type name [module]". Parsing line by line keeps
     the optional module column from desynchronizing the scan. */
  while (fgets(line, sizeof line, fd) != NULL) {
    if (sscanf(line, "%lx %c %511s", &addr, &type, sname) != 3)
      continue;
    if (strcmp(sname, "prepare_kernel_cred") == 0)
      prepare_kernel_cred = (kfunc_t)addr;
    else if (strcmp(sname, "commit_creds") == 0)
      commit_creds = (kfunc_t)addr;
    if (prepare_kernel_cred && commit_creds)
      break;
  }

  fclose(fd);
  fprintf(stdout, "commit_creds at %p\n", (void *)commit_creds);
  fprintf(stdout, "prepare_kernel_cred at %p\n", (void *)prepare_kernel_cred);
  return 0;
}

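KADR prevents kernel symbol addresses from leaking. For example, when an unprivileged account reads /proc/kallsyms, the addresses are not shown, which hampers exploits like the one above: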

$ head /proc/kallsyms
0000000000000000 A fixed_percpu_data
0000000000000000 A __per_cpu_start
0000000000000000 A cpu_debug_store
0000000000000000 A irq_stack_backing_store
...

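In fact, whenever the kernel prints an address with "%pK", the output depends on the value of /proc/sys/kernel/kptr_restrict, to avoid leaking memory addresses:

Value	Effect
0	No restriction: the address is printed for privileged and unprivileged users alike
1	Only privileged users (with CAP_SYSLOG) see the address; for others it is replaced with 0
2	The address is printed as 0 for all users

The implementation is in lib/vsprintf.c: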

static noinline_for_stack
char *restricted_pointer(char *buf, char *end, const void *ptr,
                         struct printf_spec spec)
{
  switch (kptr_restrict) {
  case 0:
    /* Always print %pK values */
    break;
  case 1: {
    const struct cred *cred;

    /*
     * kptr_restrict==1 cannot be used in IRQ context
     * because its test for CAP_SYSLOG would be meaningless.
     */
    if (in_irq() || in_serving_softirq() || in_nmi()) {
      if (spec.field_width == -1)
        spec.field_width = 2 * sizeof(ptr);
      return string(buf, end, "pK-error", spec);
    }

    /*
     * Only print the real pointer value if the current
     * process has CAP_SYSLOG and is running with the
     * same credentials it started with. This is because
     * access to files is checked at open() time, but %pK
     * checks permission at read() time. We don't want to
     * leak pointer values if a binary opens a file using
     * %pK and then elevates privileges before reading it.
     */
    cred = current_cred();
    if (!has_capability_noaudit(current, CAP_SYSLOG) ||
        !uid_eq(cred->euid, cred->uid) ||
        !gid_eq(cred->egid, cred->gid))
      ptr = NULL;
    break;
  }
  case 2:
  default:
    /* Always print 0's for %pK */
    ptr = NULL;
    break;
  }

  return pointer_string(buf, end, ptr, spec);
}

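On newer kernels, however, you may find that even with kptr_restrict set to 0, an unprivileged user still sees 0 for every address in /proc/kallsyms. To find out why, let's look at the 4.19 kernel, where kallsyms is implemented in kernel/kallsyms.c: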

static inline int kallsyms_for_perf(void)
{
#ifdef CONFIG_PERF_EVENTS
  extern int sysctl_perf_event_paranoid;
  if (sysctl_perf_event_paranoid <= 1)
    return 1;
#endif
  return 0;
}

/*
 * We show kallsyms information even to normal users if we've enabled
 * kernel profiling and are explicitly not paranoid (so kptr_restrict
 * is clear, and sysctl_perf_event_paranoid isn't set).
 *
 * Otherwise, require CAP_SYSLOG (assuming kptr_restrict isn't set to
 * block even that).
 */
int kallsyms_show_value(void)
{
  switch (kptr_restrict) {
  case 0:
    if (kallsyms_for_perf())
      return 1;
    /* fallthrough */
  case 1:
    if (has_capability_noaudit(current, CAP_SYSLOG))
      return 1;
    /* fallthrough */
  default:
    return 0;
  }
}

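When kptr_restrict is 0, kallsyms_show_value calls kallsyms_for_perf. As its implementation shows, addresses are displayed to unprivileged users only if the kernel was built with CONFIG_PERF_EVENTS and perf_event_paranoid is 1 or lower; otherwise the reader needs CAP_SYSLOG.

Check whether the running kernel was built with CONFIG_PERF_EVENTS: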

$ fgrep CONFIG_PERF_EVENTS= /boot/config-`uname -r`
CONFIG_PERF_EVENTS=y

View and set perf_event_paranoid:

$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 0
$ sudo sysctl -w kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1

Then read /proc/kallsyms as an unprivileged user:

$ fgrep commit_creds /proc/kallsyms
ffffffff9c10ae50 T commit_creds

3.6. dmesg restrict

When /proc/sys/kernel/dmesg_restrict is set to 1, the kernel log is treated as sensitive: unprivileged users (without CAP_SYSLOG) can no longer read it with dmesg.

$ sudo sysctl kernel.dmesg_restrict=1
kernel.dmesg_restrict = 1
$ dmesg
dmesg: read kernel buffer failed: Operation not permitted

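3.7. CPU Vulnerability Mitigations

Some vulnerabilities exist at the processor level and cannot be fixed in existing hardware, but software can mitigate them. The directory /sys/devices/system/cpu/vulnerabilities/ contains one file per known CPU vulnerability, reporting its status and the mitigation in use: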

$ ls /sys/devices/system/cpu/vulnerabilities/
itlb_multihit  l1tf  mds  meltdown  spec_store_bypass  spectre_v1  spectre_v2  tsx_async_abort
$ cat /sys/devices/system/cpu/vulnerabilities/*
KVM: Mitigation: Split huge pages
Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Mitigation: Clear CPU buffers; SMT vulnerable
Mitigation: PTI
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Not affected

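Each file corresponds to one vulnerability, and its content is one of three kinds:

Vulnerable: the CPU is affected and no mitigation is active
Not affected: the CPU is not affected by this vulnerability
Mitigation: the CPU is affected and a mitigation is in use (the rest of the line names it)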

For example, Meltdown:

$ cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: PTI

This means the PTI mitigation, that is KPTI (Kernel Page-Table Isolation), is active on my system.

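4. 0x04 Recommended Resources

Latest news on Linux kernel security improvements and hardening: https://outflux.net/blog/
"Privilege Separation and Pledge": these slides describe OpenBSD's security mechanisms in detail and are a useful reference. Download: https://www.openbsd.org/papers/dot2016.pdf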