From 7f46a240b0a1797eb641c046d445f026563463d4 Mon Sep 17 00:00:00 2001
From: Rob Landley <rob@landley.net>
Date: Mon, 7 Nov 2005 01:01:09 -0800
Subject: [PATCH] ramfs, rootfs, and initramfs docs

Docs for ramfs, rootfs, and initramfs.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 .../filesystems/ramfs-rootfs-initramfs.txt         | 195 +++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 Documentation/filesystems/ramfs-rootfs-initramfs.txt

(limited to 'Documentation/filesystems/ramfs-rootfs-initramfs.txt')

diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
new file mode 100644
index 000000000000..b3404a032596
--- /dev/null
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -0,0 +1,195 @@
+ramfs, rootfs and initramfs
+October 17, 2005
+Rob Landley <rob@landley.net>
+=============================
+
+What is ramfs?
+--------------
+
+Ramfs is a very simple filesystem that exports Linux's disk caching
+mechanisms (the page cache and dentry cache) as a dynamically resizable
+ram-based filesystem.
+
+Normally all files are cached in memory by Linux.  Pages of data read from
+backing store (usually the block device the filesystem is mounted on) are kept
+around in case it's needed again, but marked as clean (freeable) in case the
+Virtual Memory system needs the memory for something else.  Similarly, data
+written to files is marked clean as soon as it has been written to backing
+store, but kept around for caching purposes until the VM reallocates the
+memory.  A similar mechanism (the dentry cache) greatly speeds up access to
+directories.
+
+With ramfs, there is no backing store.  Files written into ramfs allocate
+dentries and page cache as usual, but there's nowhere to write them to.
+This means the pages are never marked clean, so they can't be freed by the
+VM when it's looking to recycle memory.
+
+The amount of code required to implement ramfs is tiny, because all the
+work is done by the existing Linux caching infrastructure.  Basically,
+you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
+an optional component removable via menuconfig, since there would be negligible
+space savings.
+
+ramfs and ramdisk:
+------------------
+
+The older "ram disk" mechanism created a synthetic block device out of
+an area of ram and used it as backing store for a filesystem.  This block
+device was of fixed size, so the filesystem mounted on it was of fixed
+size.  Using a ram disk also required unnecessarily copying memory from the
+fake block device into the page cache (and copying changes back out), as well
+as creating and destroying dentries.  Plus it needed a filesystem driver
+(such as ext2) to format and interpret this data.
+
+Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
+unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
+to avoid this copying by playing with the page tables, but they're unpleasantly
+complicated and turn out to be about as expensive as the copying anyway.)
+More to the point, all the work ramfs is doing has to happen _anyway_,
+since all file access goes through the page and dentry caches.  The ram
+disk is simply unnecessary, ramfs is internally much simpler.
+
+Another reason ramdisks are semi-obsolete is that the introduction of
+loopback devices offered a more flexible and convenient way to create
+synthetic block devices, now from files instead of from chunks of memory.
+See losetup (8) for details.
+
+ramfs and tmpfs:
+----------------
+
+One downside of ramfs is you can keep writing data into it until you fill
+up all memory, and the VM can't free it because the VM thinks that files
+should get written to backing store (rather than swap space), but ramfs hasn't
+got any backing store.  Because of this, only root (or a trusted user) should
+be allowed write access to a ramfs mount.
+
+A ramfs derivative called tmpfs was created to add size limits, and the ability
+to write the data to swap space.  Normal users can be allowed write access to
+tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
+
+What is rootfs?
+---------------
+
+Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
+(It's used internally as the starting and stopping point for searches of the
+kernel's doubly-linked list of mount points.)
+
+Most systems just mount another filesystem over it and ignore it.  The
+amount of space an empty instance of ramfs takes up is tiny.
+
+What is initramfs?
+------------------
+
+All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
+extracted into rootfs when the kernel boots up.  After extracting, the kernel
+checks to see if rootfs contains a file "init", and if so it executes it as PID
+1.  If found, this init process is responsible for bringing the system the
+rest of the way up, including locating and mounting the real root device (if
+any).  If rootfs does not contain an init program after the embedded cpio
+archive is extracted into it, the kernel will fall through to the older code
+to locate and mount a root partition, then exec some variant of /sbin/init
+out of that.
+
+All this differs from the old initrd in several ways:
+
+  - The old initrd was a separate file, while the initramfs archive is linked
+    into the linux kernel image.  (The directory linux-*/usr is devoted to
+    generating this archive during the build.)
+
+  - The old initrd file was a gzipped filesystem image (in some file format,
+    such as ext2, that had to be built into the kernel), while the new
+    initramfs archive is a gzipped cpio archive (like tar only simpler,
+    see cpio(1) and Documentation/early-userspace/buffer-format.txt).
+
+  - The program run by the old initrd (which was called /initrd, not /init) did
+    some setup and then returned to the kernel, while the init program from
+    initramfs is not expected to return to the kernel.  (If /init needs to hand
+    off control it can overmount / with a new root device and exec another init
+    program.  See the switch_root utility, below.)
+
+  - When switching another root device, initrd would pivot_root and then
+    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
+    rootfs, nor unmount it.  Instead delete everything out of rootfs to
+    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
+    with the new root (cd /newmount; mount --move . /; chroot .), attach
+    stdin/stdout/stderr to the new /dev/console, and exec the new init.
+
+    Since this is a remarkably persnickity process (and involves deleting
+    commands before you can run them), the klibc package introduced a helper
+    program (utils/run_init.c) to do all this for you.  Most other packages
+    (such as busybox) have named this command "switch_root".
+
+Populating initramfs:
+---------------------
+
+The 2.6 kernel build process always creates a gzipped cpio format initramfs
+archive and links it into the resulting kernel binary.  By default, this
+archive is empty (consuming 134 bytes on x86).  The config option
+CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
+in menuconfig, and living in usr/Kconfig) can be used to specify a source for
+the initramfs archive, which will automatically be incorporated into the
+resulting binary.  This option can point to an existing gzipped cpio archive, a
+directory containing files to be archived, or a text file specification such
+as the following example:
+
+  dir /dev 755 0 0
+  nod /dev/console 644 0 0 c 5 1
+  nod /dev/loop0 644 0 0 b 7 0
+  dir /bin 755 1000 1000
+  slink /bin/sh busybox 777 0 0
+  file /bin/busybox initramfs/busybox 755 0 0
+  dir /proc 755 0 0
+  dir /sys 755 0 0
+  dir /mnt 755 0 0
+  file /init initramfs/init.sh 755 0 0
+
+One advantage of the text file is that root access is not required to
+set permissions or create device nodes in the new archive.  (Note that those
+two example "file" entries expect to find files named "init.sh" and "busybox" in
+a directory called "initramfs", under the linux-2.6.* directory.  See
+Documentation/early-userspace/README for more details.)
+
+If you don't already understand what shared libraries, devices, and paths
+you need to get a minimal root filesystem up and running, here are some
+references:
+http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
+http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
+http://www.linuxfromscratch.org/lfs/view/stable/
+
+The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
+designed to be a tiny C library to statically link early userspace
+code against, along with some related utilities.  It is BSD licensed.
+
+I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
+myself.  These are LGPL and GPL, respectively.
+
+In theory you could use glibc, but that's not well suited for small embedded
+uses like this.  (A "hello world" program statically linked against glibc is
+over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
+name lookups, even when otherwise statically linked.)
+
+Future directions:
+------------------
+
+Today (2.6.14), initramfs is always compiled in, but not always used.  The
+kernel falls back to legacy boot code that is reached only if initramfs does
+not contain an /init program.  The fallback is legacy code, there to ensure a
+smooth transition and allowing early boot functionality to gradually move to
+"early userspace" (I.E. initramfs).
+
+The move to early userspace is necessary because finding and mounting the real
+root device is complex.  Root partitions can span multiple devices (raid or
+separate journal).  They can be out on the network (requiring dhcp, setting a
+specific mac address, logging into a server, etc).  They can live on removable
+media, with dynamically allocated major/minor numbers and persistent naming
+issues requiring a full udev implementation to sort out.  They can be
+compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
+and so on.
+
+This kind of complexity (which inevitably includes policy) is rightly handled
+in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
+packages to drop into a kernel build, and when standard solutions are ready
+and widely deployed, the kernel's legacy early boot code will become obsolete
+and a candidate for the feature removal schedule.
+
+But that's a while off yet.
-- 
cgit v1.2.3