The Emacs Build Process

Emacs's architecture has two layers: a core written in C, and the Emacs Lisp code that runs on top of it.

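When Emacs is built, the C sources are first compiled into a binary named "temacs". temacs is then started and loads the core code written in Emacs Lisp. Finally, temacs writes the code it has loaded into memory, together with itself, into the binary "emacs".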

src/Makefile shows how temacs is produced:

temacs$(EXEEXT): $(LIBXMENU) $(ALLOBJS) \
		 $(lib)/libgnu.a $(EMACSRES) ${charsets} ${charscript}
	$(AM_V_CCLD)$(CC) $(ALL_CFLAGS) $(TEMACS_LDFLAGS) $(LDFLAGS) \
	  -o temacs $(ALLOBJS) $(lib)/libgnu.a $(W32_RES_LINK) $(LIBES)
	$(MKDIR_P) $(etc)

Notes:

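$(EXEEXT) holds the file extension for executables (for example .exe on Windows, empty on most other systems) and can be ignored here.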

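Most of the rest are compiler and linker options. The most important variable is ALLOBJS, which lists the object files that are compiled and linked into temacs: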

ALLOBJS = $(FIRSTFILE_OBJ) $(VMLIMIT_OBJ) $(obj) $(otherobj)

### The dependencies are as follows:
# NS_OBJC_OBJ can be ignored
obj = $(base_obj) $(NS_OBJC_OBJ)

base_obj = dispnew.o frame.o scroll.o xdisp.o menu.o $(XMENU_OBJ) window.o \
	charset.o coding.o category.o ccl.o character.o chartab.o bidi.o \
	$(CM_OBJ) term.o terminal.o xfaces.o $(XOBJ) $(GTK_OBJ) $(DBUS_OBJ) \
	emacs.o keyboard.o macros.o keymap.o sysdep.o \
	buffer.o filelock.o insdel.o marker.o \
	minibuf.o fileio.o dired.o \
	cmds.o casetab.o casefiddle.o indent.o search.o regex.o undo.o \
	alloc.o data.o doc.o editfns.o callint.o \
	eval.o floatfns.o fns.o font.o print.o lread.o $(MODULES_OBJ) \
	syntax.o $(UNEXEC_OBJ) bytecode.o \
	process.o gnutls.o callproc.o \
	region-cache.o sound.o atimer.o \
	doprnt.o intervals.o textprop.o composite.o xml.o $(NOTIFY_OBJ) \
	$(XWIDGETS_OBJ) \
	profiler.o decompress.o \
	$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
	$(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)

Now let's look at how emacs itself is built:

emacs$(EXEEXT): temacs$(EXEEXT) \
		lisp.mk $(etc)/DOC $(lisp) $(leimdir)/leim-list.el \
		$(lispsource)/international/charprop.el ${charsets}
ifeq ($(CANNOT_DUMP),yes)
	ln -f temacs$(EXEEXT) $@
else
	LC_ALL=C $(RUN_TEMACS) -batch -l loadup dump
	$(PAXCTL_if_present) -zex $@
	ln -f $@ bootstrap-emacs$(EXEEXT)
endif

The most important line is this one:

$(RUN_TEMACS) -batch -l loadup dump

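This starts temacs and has it load lisp/loadup.el. Once loadup.el has loaded the core Lisp code, it calls dump-emacs, which writes all the Emacs Lisp objects currently in memory, together with temacs itself, into a new binary file. That binary is emacs: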

(if (member (car (last command-line-args)) '("dump" "bootstrap"))
    (progn
      (message "Dumping under the name emacs")
      (condition-case ()
	  (delete-file "emacs")
	(file-error nil))
      ;; We used to dump under the name xemacs, but that occasionally
      ;; confused people installing Emacs (they'd install the file
      ;; under the name `xemacs'), and it's inconsistent with every
      ;; other GNU program's build process.
      (dump-emacs "emacs" "temacs")
      (message "%d pure bytes used" pure-bytes-used)
      ;; Recompute NAME now, so that it isn't set when we dump.
      (if (not (or (eq system-type 'ms-dos)
		   ;; Don't bother adding another name if we're just
		   ;; building bootstrap-emacs.
		   (equal (last command-line-args) '("bootstrap"))))
	  (let ((name (concat "emacs-" emacs-version))
		(exe (if (eq system-type 'windows-nt) ".exe" "")))
	    (while (string-match "[^-+_.a-zA-Z0-9]+" name)
	      (setq name (concat (downcase (substring name 0 (match-beginning 0)))
				 "-"
				 (substring name (match-end 0)))))
	    (setq name (concat name exe))
	    (message "Adding name %s" name)
	    ;; When this runs on Windows, invocation-directory is not
	    ;; necessarily the current directory.
	    (add-name-to-file (expand-file-name (concat "emacs" exe)
						invocation-directory)
			      (expand-file-name name invocation-directory)
			      t)))
      (kill-emacs)))
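
The listing above also gives the dumped binary a second, versioned name. As a rough illustration of what that while loop computes, here is a minimal C sketch. The function versioned_name and its signature are hypothetical and not part of Emacs; it only mirrors the regexp "[^-+_.a-zA-Z0-9]+" and the repeated downcase of the prefix seen in the Lisp code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that the regexp "[^-+_.a-zA-Z0-9]+" in loadup.el leaves alone. */
static int is_name_char(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '+' || c == '_' || c == '.';
}

/* Hypothetical helper (not an Emacs function): writes "emacs-" followed by
   VERSION into OUT.  Each run of other characters becomes a single '-',
   and everything before the last such run is lowercased, which is the net
   effect of the Lisp loop.  Returns 0 on success, -1 if OUT is too small. */
int versioned_name(const char *version, char *out, size_t size)
{
    static const char prefix[] = "emacs-";
    size_t plen = sizeof prefix - 1;
    size_t vlen;
    size_t r = 0, w = 0, lower_end = 0;

    assert(version != NULL && out != NULL);
    vlen = strlen(version);
    if (plen + vlen + 1 > size)
        return -1;
    memcpy(out, prefix, plen);
    memcpy(out + plen, version, vlen + 1);   /* copies the trailing NUL too */

    /* Rewrite in place; the write index never overtakes the read index. */
    while (out[r] != '\0') {
        if (is_name_char((unsigned char) out[r])) {
            out[w++] = out[r++];
        } else {
            while (out[r] != '\0' && !is_name_char((unsigned char) out[r]))
                r++;
            lower_end = w;       /* text before this '-' gets lowercased */
            out[w++] = '-';
        }
    }
    out[w] = '\0';

    for (size_t i = 0; i < lower_end; i++)
        out[i] = (char) tolower((unsigned char) out[i]);
    return 0;
}
```

With a version string such as "25.1.1", nothing matches the regexp and the result is simply emacs-25.1.1, which is the extra name that add-name-to-file links to the freshly dumped emacs.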

dump-emacs is implemented in C and defined in src/emacs.c:

DEFUN ("dump-emacs", Fdump_emacs, Sdump_emacs, 2, 2, 0,
       doc: /* Dump current state of Emacs into executable file FILENAME.
Take symbols from SYMFILE (presumably the file you executed to run Emacs).
This is used in the file `loadup.el' when building Emacs.

You must run Emacs in batch mode in order to dump it.  */)
  (Lisp_Object filename, Lisp_Object symfile)
{
  /* ... omitted ... */

  alloc_unexec_pre ();

  unexec (SSDATA (filename), !NILP (symfile) ? SSDATA (symfile) : 0);

  alloc_unexec_post ();

  /* ... omitted ... */
  return unbind_to (count, Qnil);
}

The key function is unexec, which takes the contents of the running process's memory and writes them out as a new binary file.
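
The idea behind dumping can be illustrated with a toy example. The sketch below is only an analogy and not how unexec is implemented: the real unexec has to produce a valid executable for the platform, whereas this code just saves a struct to a data file and reads it back. All names here (struct image, build_image, dump_image, load_image, IMAGE_MAGIC) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IMAGE_MAGIC 0x454d4143u   /* arbitrary tag for this toy format */
#define TABLE_SIZE  1000

/* Stand-in for "everything loadup.el has put into memory". */
struct image {
    unsigned magic;
    long table[TABLE_SIZE];
};

/* Slow path: compute the state from scratch, comparable to loading
   the Lisp files one by one. */
void build_image(struct image *img)
{
    assert(img != NULL);
    memset(img, 0, sizeof *img);   /* also zeroes any padding bytes */
    img->magic = IMAGE_MAGIC;
    for (long i = 0; i < TABLE_SIZE; i++)
        img->table[i] = i * i;
}

/* Dump: write the in-memory state to PATH in one go.  Returns 0 on success. */
int dump_image(const char *path, const struct image *img)
{
    FILE *f = fopen(path, "wb");
    size_t written;

    if (f == NULL)
        return -1;
    written = fwrite(img, sizeof *img, 1, f);
    if (fclose(f) != 0 || written != 1)
        return -1;
    return 0;
}

/* Fast path: read the saved state back instead of rebuilding it.
   Returns 0 on success, -1 if the file is missing or not a toy image. */
int load_image(const char *path, struct image *img)
{
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(img, sizeof *img, 1, f);
    fclose(f);
    if (got != 1 || img->magic != IMAGE_MAGIC)
        return -1;
    return 0;
}
```

A program would call build_image and dump_image once at build time, and load_image on every later start. In this analogy temacs plays the role of the builder, and the dumped emacs is the program that already carries the saved state.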

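Emacs starts quickly because the Lisp code was written into the binary ahead of time, so it is loaded into memory along with the executable rather than being loaded file by file at startup. temacs serves as the bridge between the C layer and the Lisp layer.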