Linux Foundation
This set of notes was taken from the Linux Foundation course Introduction to Linux (LFS101x), with some enrichments from linux.vbird.org.
About Linux¶
Linux History¶
Linux was created in 1991 by Linus Torvalds; Greg Kroah-Hartman is today the lead maintainer of the stable kernel releases. Linux was initially developed on and for Intel x86-based personal computers. It has subsequently been ported to an astoundingly long list of other hardware platforms.
In 1992, Linux was re-licensed using the General Public License (GPL) by GNU (a project of the Free Software Foundation or FSF, which promotes freely available software), which made it possible to build a worldwide community of developers.
The Linux distributions created in the mid-90s provided the basis for fully free computing (in the sense of freedom, not zero cost) and became a driving force in the open source software movement.
The success of Linux has catalyzed growth in the open source community, demonstrating the commercial efficacy of open source and inspiring countless new projects across all industries and levels of the technology stack.
Today, Linux powers more than half of the servers on the Internet, the majority of smartphones, consumer products, automobiles, and all of the world’s most powerful supercomputers.
Linux Philosophy¶
Linux borrows heavily from the well-established UNIX operating system. It was written to be a free and open source system to be used in place of UNIX, which at the time was designed for computers much more powerful than PCs and was quite expensive.
Files are stored in a hierarchical filesystem, with the top node of the system being the root or simply "/". Whenever possible, Linux makes its components available via files or objects that look like files.
Processes, devices, and network sockets are all represented by file-like objects, and can often be worked with using the same utilities used for regular files.
Linux is a fully multitasking, multiuser operating system, with built-in networking and service processes known as daemons in the UNIX world.
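The name Linux is a blend of Linus and UNIX. "Linux is not UNIX" is a popular backronym modeled on GNU ("GNU's Not Unix"), not the origin of the name.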
Linux Terminology¶
- kernel - brain of the Linux OS; controls the hardware and lets applications interact with it
- distribution (Distro) - collection of programs combined with the Linux kernel to make up a Linux-based OS
- boot loader - a program that boots the OS, e.g. GRUB, ISOLINUX
- service - a program that runs as a background process, e.g. httpd, nfsd, ntpd, ftpd, named
- filesystem - the method for storing and organizing files in Linux, e.g. ext3, ext4, FAT, XFS, Btrfs
- X Window System - provides the standard toolkit and protocol to build graphical UIs on nearly all Linux distros
- desktop environment - a graphical user interface on top of the OS, e.g. GNOME, KDE, Xfce, Fluxbox
- command line - interface for typing commands on top of the OS
- shell - command line interpreter that interprets the command line input and instructs the OS to perform tasks, e.g. bash, tcsh, zsh
Linux Distributions¶
Linux is constantly evolving, both at the technical level (including kernel features) and at the distribution and interface level.
A full Linux distribution consists of the kernel plus a number of other software tools for file-related operations, user management, and software package management. Linux distributions may be based on different kernel versions.
Examples of other essential tools and ingredients provided by distributions include the C/C++ compiler, the gdb debugger, the core system libraries applications need to link with in order to run, the low-level interface for drawing graphics on the screen, as well as the higher-level desktop environment, and the system for installing and updating the various components, including the kernel itself.
Three widely used Linux distribution families:
- Red Hat Enterprise Linux (RHEL) Family - CentOS, Fedora, Oracle Linux
- SUSE Family - SLES, openSUSE
- Debian Family - Ubuntu, Linux Mint
All major distributors provide update services for keeping your system primed with the latest security and bug fixes, and performance enhancements, as well as provide online support resources.
RHEL¶
RHEL is the most popular Linux distribution in enterprise environments. Some facts:
- Fedora is the community-driven upstream of RHEL; it ships with a lot more software and serves as an upstream testing platform for RHEL.
- CentOS is a close clone of RHEL now owned by Red Hat, while Oracle Linux is mostly a copy with some changes
- A heavily patched version 3.10 kernel is used in RHEL/CentOS 7, while version 4.18 is used in RHEL/CentOS 8.
- It supports hardware platforms such as Intel x86, Arm, Itanium, PowerPC, and IBM System z.
- It uses the RPM-based `yum` and `dnf` package managers to install, update, and remove packages in the system.
CentOS is a popular free alternative to RHEL and is often used by organizations that are comfortable operating without paid technical support.
SUSE¶
SUSE is an acronym for Software- und System-Entwicklung (Software and Systems Development). And SLES stands for SUSE Linux Enterprise Server. Some facts:
- SLES is upstream for openSUSE.
- Kernel version 4.12 is used in openSUSE Leap 15.
- It uses the RPM-based `zypper` package manager to install, update, and remove packages in the system.
- It includes the `YaST` (Yet Another Setup Tool) application for system administration purposes.
- SLES is widely used in retail and many other sectors.
Debian¶
Debian provides by far the largest and most complete software repository to its users of any Linux distribution, and has a strong focus on stability. Some facts:
- The Debian family is upstream for Ubuntu, and Ubuntu is upstream for Linux Mint and others.
- Debian is a pure open source community project not owned by any corporation.
- Kernel version 4.15 is used in Ubuntu 18.04 LTS.
- It uses the DPKG-based `APT` package manager (using `apt`, `apt-get`, `apt-cache`, etc.) to install, update, and remove packages in the system.
- Ubuntu has been widely used for cloud deployments.
- While Ubuntu is built on top of Debian and is GNOME-based under the hood, it differs visually from the interface on standard Debian, as well as other distributions.
Ubuntu and Fedora are widely used by developers and are also popular in the educational realm.
How Linux Works¶
Boot Process¶
The Linux boot process is the procedure for initializing the system, from pressing the power switch to a fully operational user interface.
Power ON
-> BIOS
--> Master Boot Record (MBR)
---> Boot Loader
----> Kernel
-----> Initial RAM disk
------> /sbin/init (parent process)
-------> Command Shell using getty
--------> X Window System (GUI)
BIOS¶
BIOS stands for Basic Input/Output System. It runs and initializes the I/O hardware such as screen and keyboard, and tests the main memory, a process called POST (Power On Self Test).
BIOS software is stored on a ROM chip on the motherboard.
MBR and Boot Loader¶
After POST, control is passed to the boot loader, usually stored on one of the hard disks either in the boot sector (MBR) or the EFI/UEFI partition ((Unified) Extensible Firmware Interface).
Date, time, and other peripheral settings are loaded from the CMOS values (a battery-powered memory store that allows the machine to keep track of the date and time when powered off).
MBR¶
The MBR is just 512 bytes in size and holds the first stage of the boot loader. The boot loader examines the partition table and finds a bootable partition, then searches for a second stage boot loader and loads it into RAM.
EFI/UEFI¶
With the EFI/UEFI boot method, the firmware reads its Boot Manager data to determine which UEFI application to launch and which disk and partition to launch it from.
Second stage boot loader¶
The second stage boot loader resides under `/boot`. Common boot loaders:
- GRUB - GRand Unified Boot Loader, on most machines and Linux distributions
- ISOLINUX - for booting from removable media
- DAS U-Boot - for booting on embedded devices and appliances
When booting Linux, the boot loader is responsible for loading and uncompressing the kernel image and for loading the initial RAM disk, filesystem, or drivers into memory. Most boot loaders provide a UI for choosing boot options or another OS to boot into.
Initial RAM Disk¶
The initramfs filesystem image is a RAM-based filesystem containing the programs and binary files that perform all actions needed to mount the proper root filesystem, such as providing kernel functionality for the required filesystem, locating devices and loading their drivers, and checking the filesystem for errors.
The mount program instructs the OS that a filesystem is ready for use, and associates it with a particular point in the overall hierarchy of the filesystem (the mount point).
Then initramfs is cleared from RAM and `/sbin/init` is executed, which handles the mounting and pivoting over to the final real root filesystem. It then starts a number of text-mode login prompts (ttys) which allow you to type in a username and password to get a command shell.
Most distributions start six text terminals and one graphics terminal, and you can switch between them with `CTRL-ALT + F1~F7`. The default command shell is `bash` (the GNU Bourne Again Shell).
Linux Kernel¶
When the kernel is loaded in RAM, it immediately initializes and configures the computer’s memory and all the hardware attached to the system, and loads some necessary user space applications.
`/sbin/init` is run only after the kernel has set up all the hardware and mounted the root filesystem. It is the origin of all non-kernel processes and is responsible for keeping the system running and for clean shutdowns.
SysVinit¶
In older distros, process startup follows a System V UNIX convention (aka SysVinit) where the system passes through a serial sequence of runlevels containing collections of scripts that start and stop services. Each runlevel supports a different mode of running the system. Within each runlevel, individual services can be set to run, or to be shut down if running. This startup method is slow and does NOT benefit from the parallel processing offered by multi-core processors.
systemd¶
Major recent distros have moved away from runlevels, first to Upstart and then to systemd. Upstart was developed by Ubuntu in 2006 and adopted in Fedora 9 in 2008 and in RHEL 6. systemd was adopted by Fedora in 2011, by RHEL 7 and SUSE, and by Ubuntu 16.04.
All major distros now use systemd. Complicated startup shell scripts are replaced with simpler configuration files, which enumerate what has to be done before a service is started, how to execute service startup, and what conditions the service should indicate have been accomplished when startup is finished. `/sbin/init` just points to `/lib/systemd/systemd`.
- Starting, stopping, restarting a service
$ sudo systemctl start|stop|restart nfs.service
- Enabling or disabling a system service from starting up at system boot
$ sudo systemctl enable|disable nfs.service
The `.service` suffix can be omitted.
Linux Filesystem¶
A filesystem is a method of storing and finding files on a hard disk (usually in a partition). A partition is a physically contiguous section of a disk and acts like a container in which a filesystem resides.
By dividing the hard disk into partitions, data can be grouped and separated as needed. When a failure or mistake occurs, only the data in the affected partition will be damaged, while the data on the other partitions will likely survive.
Different types of filesystems supported by Linux:
- Conventional disk filesystems: ext2, ext3, ext4, XFS, Btrfs, JFS, NTFS, etc.
- Flash storage filesystems: ubifs, JFFS2, YAFFS, etc.
- Database filesystems
- Special purpose filesystems: procfs, sysfs, tmpfs, squashfs, debugfs, etc.
- Other filesystems: FAT, vfat, hfs, hfs+
It is often the case that more than one filesystem type is used on a machine, based on considerations such as the size of files, how often they are modified, what kind of hardware they sit on and what kind of access speed is needed, etc.
Linux filesystems follow a standard layout, the Filesystem Hierarchy Standard (FHS). Paths are separated with `/` and there are no drive letters. File names are case-sensitive.
Multiple drives and/or partitions are mounted as directories in the single filesystem tree; these directories are called mount points and are usually empty. The `mount` command is used to attach a filesystem somewhere within the filesystem tree, e.g. `sudo mount /dev/sda5 /home`. `umount` does the opposite.
The file `/etc/fstab` can be used to configure disks to be mounted automatically at system startup. `df -Th` can be used to display information about mounted filesystems, including their type and usage statistics.
In Linux, all open files are represented internally by what are called file descriptors. Simply put, these are represented by numbers starting at zero. stdin is file descriptor 0, stdout is file descriptor 1, and stderr is file descriptor 2. Typically, if other files are opened in addition to these three, which are opened by default, they will start at file descriptor 3 and increase from there.
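The descriptor numbering described above can be seen directly from the shell. The following is a minimal sketch, assuming a POSIX shell and a throwaway directory from `mktemp`; the file names `fd3.txt`, `out.txt`, and `err.txt` are made up for illustration.

```shell
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)

# Open descriptor 3 for writing; 0, 1, and 2 are already in use.
exec 3> "$workdir/fd3.txt"
echo "written through fd 3" >&3
# Close descriptor 3 again.
exec 3>&-

# Send stdout (fd 1) and stderr (fd 2) to separate files.
{ echo "to stdout"; echo "to stderr" >&2; } 1> "$workdir/out.txt" 2> "$workdir/err.txt"

fd3_text=$(cat "$workdir/fd3.txt")
out_text=$(cat "$workdir/out.txt")
err_text=$(cat "$workdir/err.txt")
printf '%s\n%s\n%s\n' "$fd3_text" "$out_text" "$err_text"
rm -rf "$workdir"
```

On a running Linux system, `ls -l /proc/$$/fd` lists the descriptors the current shell has open.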
NFS¶
A network filesystem (NFS) may have all its data on one machine or have it spread out on more than one network node. It is used to share data across physical systems which may be either in the same location or anywhere that can be reached by the Internet.
NFS can be started as a daemon with `sudo systemctl start nfs`. The text file `/etc/exports` configures the directories and permissions that a host is sharing with other systems over NFS. After updating the config file while NFS is running, use `exportfs -av` to notify NFS to re-apply the configuration.
# example entry in /etc/exports
/projects *.example.com(rw)
# export /projects over NFS with read and write permissions
# to hosts within the example.com domain
A client machine can mount the remote directory via NFS with:
mkdir -p /mnt/nfs/projects
sudo mount <server_hostname/IP>:/projects /mnt/nfs/projects
# let system boot auto mount the remote directory
# add to /etc/fstab
<server_hostname/IP>:/projects /mnt/nfs/projects nfs defaults 0 0
Directories under /¶
directory | function | examples |
---|---|---|
/bin (might link to /usr/bin) | essential commands used to boot the system or in single-user mode, and required by all system users | cat, cp, ls, mv, ps, rm |
/sbin (might link to /usr/sbin) | essential binaries related to system administration | fsck, ip |
/proc (a type of pseudo-filesystem) | no permanent presence on the disk, contains virtual files (in memory) for constantly changing runtime system information | system memory, devices mounted, hardware configs |
/dev | contains device nodes, pseudo-files that are used by most hardware and software devices | sda1 (first partition on the first hard disk), lp1 (second printer), random (source of random numbers), null (special file to safely dump unwanted data) |
/var | contains files that are expected to change in size and content as the system is running | log (system log files), lib (packages and database files), spool (print queue), tmp (temporary files), ftp (FTP service), www (HTTP web service) |
/var/cache | temporary cache files generated while programs run | |
/var/lib | data files used by programs while they run | |
/var/lock | lock files that prevent simultaneous modification of the same file or device | |
/var/log | log files, including the login records of who used the system | |
/var/mail | personal mailboxes | |
/var/run | PID files of running processes | |
/var/spool | queued data waiting to be processed in order, often deleted after use | |
/etc | home for system configuration files or scripts, normally writable only by the superuser | passwd, shadow, group (for managing user accounts), resolv.conf (DNS settings) |
/boot | essential files needed to boot the system | vmlinuz (compressed Linux kernel), initramfs/initrd (initial RAM filesystem), config (kernel config file), System.map (kernel symbol table), grub.conf (boot loader config) |
/lib and /lib64 (might link to /usr/lib) | contains kernel modules and common code shared by applications and needed for them to run, mostly known as dynamically loaded libraries (aka Shared Objects) | libncurses.so.5.9 |
/media, /run, /mnt | any of these can be used for mounting removable media onto the system | NFS, loopback filesystems, USB drive |
/opt | optional application software packages | |
/sys | virtual pseudo-filesystem giving information about the system and the hardware | |
/srv | site-specific data served up by the system | |
/tmp | temporary files; on some distributions erased across a reboot and/or may actually be a ramdisk in memory | |
/usr | stands for Unix Software Resource, for sharing in the Linux's multi-user setup | applications, utilities and data, mostly static files |
/usr/bin | This is the primary directory of executable commands on the system | |
/usr/include | Header files used to compile applications | |
/usr/lib | Libraries for programs in /usr/bin and /usr/sbin | |
/usr/lib64 | 64-bit libraries for 64-bit programs in /usr/bin and /usr/sbin | |
/usr/local | Data and programs specific to the local machine. Subdirectories include bin, sbin, lib, share, include, etc. | |
/usr/sbin | Non-essential system binaries, such as system daemons | |
/usr/share | Shared data used by applications, generally architecture-independent | |
/usr/src | Source code, usually for the Linux kernel | |
Some directories to consider for larger space allocation via partitioning:
- /
- /usr
- /home
- /var
- /tmp
- Swap
Use `basename` on a path to get the file's name; use `dirname` on a path to get the path of the directory that contains the file.
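A short sketch of the two commands; the paths are only examples and do not need to exist, since both tools work on the string alone.

```shell
path=/etc/ssh/sshd_config       # example path, treated purely as a string
name=$(basename "$path")        # last path component
dir=$(dirname "$path")          # everything before the last component
echo "$name"
echo "$dir"

# basename can also strip a known suffix given as a second argument.
stem=$(basename /var/log/syslog.1.gz .gz)
echo "$stem"
```

This prints `sshd_config`, `/etc/ssh`, and `syslog.1`.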
inode and block¶
- superblock: records information about the filesystem as a whole, including the total number of inodes/blocks, the amounts used and remaining, the filesystem format, etc.
- inode: records a file's properties and the numbers of the blocks that hold its data
  - each file uses one inode
  - each inode is 128 bytes in size
  - to address a large file, an inode holds twelve direct block numbers, one indirect block number, one double-indirect block number, and one triple-indirect block number. P247 Book.
- block: records the actual content of the file; larger files span multiple blocks
  - once a file's inode is known, the numbers of its blocks are known
  - this way, all the blocks of a file can be located up front and read within a short amount of time; this approach is called indexed allocation
  - to keep performance acceptable on large filesystems, the storage is divided into block groups, each with its own inode/block/superblock system.
Data blocks store the content of files. Block size: 1K, 2K or 4K
- a small block size makes large files use more blocks, which slows down reads and writes
- a large block size may leave many blocks not fully utilized
The inode/block bitmaps record which blocks and inodes are used and unused, which gives a fast lookup of free blocks/inodes.
For example, the process of reading the file at `/etc/passwd`:
- the filesystem finds the `/` inode
- the filesystem locates the `/` block and looks for the `etc` inode
- it finds the `etc` inode and checks whether the current user has `rx` access
- it finds the `etc` block and looks for the `passwd` inode
- it finds the `passwd` inode and checks whether the current user has `r` access
- it reads the `passwd` block content
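The name-to-inode mapping behind these lookup steps can be observed from the shell. This is a minimal sketch, assuming GNU coreutils (`stat -c` is GNU syntax) and a temporary directory; the file names `original` and `hardlink` are made up for illustration. A hard link is simply a second directory entry that records the same inode number.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/original"
# Create a second name (directory entry) for the same inode.
ln "$workdir/original" "$workdir/hardlink"

# %i = inode number, %h = number of hard links to that inode.
inode_a=$(stat -c %i "$workdir/original")
inode_b=$(stat -c %i "$workdir/hardlink")
links=$(stat -c %h "$workdir/original")
echo "original inode: $inode_a"
echo "hardlink inode: $inode_b"
echo "link count:     $links"

# ls -i prints the inode number recorded for each name in the directory.
ls -i "$workdir"
rm -rf "$workdir"
```

Both names report the same inode number and a link count of 2, because the directory only stores names and inode numbers while the file's properties live in the inode itself.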
Journaling filesystem: if power is suddenly lost while data is being written to the disk, the filesystem's metadata and the actual data on disk can become inconsistent.
To deal with this without scanning the whole filesystem, a journaling filesystem:
- records each write to the filesystem in a log (the journal).
- records the preparation, writing, and completion stages of each write.
- lets the system check the journal after a failure to quickly find which files are inconsistent.
Journaling is available in the ext3 filesystem on Linux. It helps servers recover faster from a power outage.
Commonly seen devices¶
Device | name in Linux |
---|---|
IDE | /dev/hd[a-d] |
SCSI/SATA/USB | /dev/sd[a-p] |
Floppy drive | /dev/fd[0-1] |
printer | /dev/lp[0-2] or /dev/usb/lp[0-15] |
mouse | /dev/usb/mouse[0-15] or /dev/psaux |
CDROM/DVDROM | /dev/cdrom |
current mouse | /dev/mouse |
tape | /dev/ht0 (IDE) or /dev/st0 (SCSI) |
Searching commands¶
`file` gives information on what kind of file its argument is.
`which` gives the exact path of the command inspected.
`whereis` and `locate` can be used to find files. `whereis` only looks in a fixed set of standard directories and `locate` looks names up in a prebuilt database, so both are faster than `find`.
`find` searches the filesystem itself, which can be slow and expensive.
find [PATH] [option] [action]
- some frequently used options
  - `-mtime n`: n is a number of days. Putting `+` or `-` before the number makes a big difference: `+n` means modified more than n days ago, `-n` means modified within the past n days, and plain `n` means modified n days ago.
  - `-newer`: `find /dir1 -newer /dir1/file` finds files newer than /dir1/file
  - `-atime`, `-ctime`: similar to `-mtime`, but for access time and status-change time
  - `-perm`: find files with/above/below certain access rights
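The effect of the sign in `-mtime` is easiest to see on files with known ages. This is a minimal sketch, assuming GNU coreutils and findutils (`touch -d` with a relative date is GNU syntax); the file names are made up for illustration.

```shell
workdir=$(mktemp -d)
# Create files with controlled modification times.
touch -d '10 days ago' "$workdir/old.log"
touch -d '1 hour ago' "$workdir/recent.log"
touch "$workdir/marker"

# +7: modified more than 7 days ago
older=$(find "$workdir" -type f -mtime +7 -exec basename {} \;)
# -7: modified within the past 7 days
within=$(find "$workdir" -type f -mtime -7 -exec basename {} \; | sort | paste -sd ' ' -)
# -newer: modified more recently than the reference file
newer=$(find "$workdir" -type f -newer "$workdir/old.log" -exec basename {} \; | sort | paste -sd ' ' -)

echo "older than 7 days:  $older"
echo "within 7 days:      $within"
echo "newer than old.log: $newer"
rm -rf "$workdir"
```

Only `old.log` matches `+7`, while `marker` and `recent.log` match both `-7` and `-newer old.log`.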
Comparing Files¶
`diff` is used to compare files and directories. `cmp` can be used for comparing binary files. You can compare three files at once using `diff3`, which uses one file (the second file argument) as the reference basis for the other two.
Some common diff options
diff Option | Usage |
---|---|
-c | Provides a listing of differences that include three lines of context before and after the lines differing in content |
-r | Used to recursively compare subdirectories, as well as the current directory |
-i | Ignore the case of letters |
-w | Ignore differences in spaces and tabs (white space) |
-q | Be quiet: only report if files are different without listing the differences |
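A few of these options in action, as a minimal sketch on two made-up files in a temporary directory. `diff` and `cmp` report their result through the exit status: 0 when the inputs match and 1 when they differ.

```shell
workdir=$(mktemp -d)
printf 'Hello World\nsecond line\n' > "$workdir/a.txt"
printf 'hello   world\nsecond line\n' > "$workdir/b.txt"

# -q only reports whether the files differ.
plain_status=0
diff -q "$workdir/a.txt" "$workdir/b.txt" || plain_status=$?

# -i ignores case and -w ignores white space, so the files now compare equal.
relaxed_status=0
diff -i -w "$workdir/a.txt" "$workdir/b.txt" || relaxed_status=$?

# cmp compares byte by byte; -s prints nothing and only sets the exit status.
cmp_status=0
cmp -s "$workdir/a.txt" "$workdir/b.txt" || cmp_status=$?

echo "diff -q: $plain_status, diff -i -w: $relaxed_status, cmp -s: $cmp_status"
rm -rf "$workdir"
```

The exit statuses make these commands convenient in scripts, for example to copy a file only when it differs from the existing one.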
Many modifications to source code and configuration files are distributed as patches and applied with the `patch` program. A patch file contains the deltas (changes) required to update an older version of a file to the new one. Use:
diff -Nur originFile newFile > patchFile # create a patch file
patch -p1 < patchFile # apply a patch file to an entire directory tree
patch originFile patchFile # apply patch on one file
In Linux, a file's extension does not categorize it; most applications directly examine a file's contents to see what kind of object it is rather than relying on an extension. Use the `file` utility to ascertain the real nature of a file.
Backing up Data¶
While a simple `cp` can back up files or an entire directory, `rsync` is more robust for synchronizing directory trees, using the `-r` option, e.g. `rsync -r sourceDir destinationDir`.
`rsync` checks whether the file being copied already exists and skips the copy if there is no change in size or modification time, which avoids unnecessary operations and saves time. Furthermore, `rsync` only copies the parts of files that actually changed, which makes it very fast. `rsync` can also copy files from one machine to another, with remote paths given in the form `user@host:filepath`. A good combination of options is `rsync --progress -avrxH --delete sourceDir destinationDir`.
Note that `rsync` can be destructive if not used properly: a lot of files could be created at the target and use up all the space, and `--delete` removes files at the target that no longer exist at the source. Always use the `--dry-run` option to see what will be done before executing it.
The disk-to-disk copying program `dd` is very useful for making exact copies of raw disk space. It is mostly used to back up an MBR, create a disk image, or install an OS, e.g. `dd if=/dev/sda of=sda.mbr bs=512 count=1`.
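The same `bs`/`count` arithmetic can be tried safely on a regular file instead of a real device. This is a minimal sketch; `disk.img` and `disk.mbr` are made-up names in a temporary directory, and the image is just zero bytes standing in for a disk.

```shell
workdir=$(mktemp -d)
# Stand-in for a disk: 4 blocks of 1024 bytes read from /dev/zero.
dd if=/dev/zero of="$workdir/disk.img" bs=1024 count=4 2>/dev/null
# Copy only the first 512-byte sector, as the MBR backup does.
dd if="$workdir/disk.img" of="$workdir/disk.mbr" bs=512 count=1 2>/dev/null

img_size=$(wc -c < "$workdir/disk.img")
mbr_size=$(wc -c < "$workdir/disk.mbr")
echo "image: $img_size bytes, first sector: $mbr_size bytes"
rm -rf "$workdir"
```

The amount copied is always `bs` times `count`, so the image is 4096 bytes and the sector copy is 512 bytes. Because `dd` writes to whatever `of=` names without asking, it is worth double-checking that argument before pointing it at a device.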
Compressing Data¶
File data is often compressed to save disk space and reduce the time it takes to transmit files over networks. Some good compression programs:
- gzip - the most frequently used Linux compression utility
  - `gzip` to compress and `gunzip` or `gzip -d` to decompress
  - compresses very well and is very fast, produces `.gz` files
- bzip2 - produces files significantly smaller than those produced by gzip but takes longer
  - `bzip2` to compress and `bunzip2` or `bzip2 -d` to decompress
  - produces `.bz2` files
- xz - the most space-efficient compression utility used in Linux
  - `xz` to compress and `xz -d` to decompress
  - used to store archives of the Linux kernel
- zip - often required to examine and decompress archives from other operating systems
  - `zip` to compress and `unzip` to decompress
- tar - groups files into an archive and can then compress the whole archive at once
  - `tar czf` to compress with `gzip`, giving `xxx.tar.gz`
  - `tar cjf` to compress with `bzip2`, giving `xxx.tar.bz2`
  - `tar cJf` to compress with `xz`, giving `xxx.tar.xz`
  - `tar xf` to decompress; there is no need to pass an option to tell it which format
  - originally used to archive files to magnetic tape
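A round trip with `tar` and `gzip`, as a minimal sketch on a made-up `project` directory inside a temporary directory. The `-C` option, which changes directory before archiving or extracting, is an addition not covered in the notes above.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs"
echo "readme text" > "$workdir/project/README"
echo "notes" > "$workdir/project/docs/notes.txt"

# c = create, z = compress with gzip, f = name of the archive file
tar czf "$workdir/project.tar.gz" -C "$workdir" project

# t = list the archive's contents without extracting
listing=$(tar tzf "$workdir/project.tar.gz" | sort)
echo "$listing"

# x = extract; the compression format is detected automatically
mkdir "$workdir/restore"
tar xf "$workdir/project.tar.gz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/project/docs/notes.txt")
echo "restored content: $restored"
rm -rf "$workdir"
```

Listing an archive with `t` before extracting is a cheap way to check where its files will land.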
Generally, the more space-efficient techniques take longer. Decompression time does NOT vary as much across different methods.
`du` can be used to check file sizes and the total size of a directory. Use `du -shc [list of files and dirs]` to get a quick overview of the sizes of selected files/dirs.
Use utilities such as `zcat`, `zless`, `zdiff`, `zgrep` to work directly with compressed files.
X Window System¶
The X Window System (aka X) is loaded as one of the final steps in the boot process. It can also be started from text mode with the `startx` command, or by starting one of the display managers `gdm`, `lightdm`, `kdm`, `xdm`.
A service called the Display Manager keeps track of the displays being provided and loads the X server. The display manager also handles graphical logins and starts the appropriate desktop environment after a user logs in.
A desktop environment consists of a session manager, which starts and maintains the components of the graphical session, the window manager, which controls the placement and movement of windows, window title-bars, and controls, and a set of utilities.
X is old software and has deficiencies in security. A newer system, Wayland, is superseding it and is used on Fedora, RHEL 8, and other Distros.
For distros using a GNOME-based window manager, use `gnome-tweak-tool` to customize and remap keys.
Package Management Systems¶
A Package Management System distributes packages that each contains the files and other instructions needed to make one software component work well and cooperate with the other components that comprise the entire system. Two broad families of package managers: Debian and RPM.
A package management system operates on two levels:
- a low-level tool, such as `dpkg` or `rpm`, is responsible for unpacking individual packages, running scripts, and getting the software installed correctly
- a high-level tool, such as `apt`, `yum`, `dnf`, or `zypper`, works with groups of packages, downloads packages from the vendor, and figures out dependencies
  - `apt` stands for Advanced Packaging Tool, used on Debian-based systems
  - `yum` stands for Yellowdog Updater Modified, an open source tool for RPM-compatible distros
  - `dnf`, aka Dandified YUM, is also RPM-based and used on Fedora and RHEL 8 systems
  - `zypper` is RPM-based and used on openSUSE
Operation | RPM | Debian |
---|---|---|
Install package | rpm -i foo.rpm | dpkg --install foo.deb |
Install package, dependencies | yum install foo | apt-get install foo |
Remove package | rpm -e foo | dpkg --remove foo |
Remove package, dependencies | yum remove foo | apt-get autoremove foo |
Update package | rpm -U foo.rpm | dpkg --install foo.deb |
Update package, dependencies | yum update foo | apt-get install foo |
Update entire system | yum update | apt-get update && apt-get upgrade or apt-get dist-upgrade |
Show all installed packages | rpm -qa or yum list installed | dpkg --list |
Get information on package | rpm -qil foo | dpkg --listfiles foo |
Show packages named foo | yum list "foo" | apt-cache search foo |
Show all available packages | yum list | apt-cache dumpavail |
What package is <file> part of? | rpm -qf <file> | dpkg --search <file> |
Package documentation is pulled directly from the upstream source code and placed under the `/usr/share/doc` directory, grouped in subdirectories named after each package, perhaps including the version number in the name.
Linux Documentation¶
info pages¶
`info` is the other form of documentation pages besides `man`.
Navigation within `info`:
Action | Keys |
---|---|
Next Line | CTRL-n or Down Arrow |
Previous Line | CTRL-p or Up Arrow |
Beginning of Line | CTRL-a or Home |
End of Line | CTRL-e or End |
Forward 1 Character | CTRL-f or Right Arrow |
Backward 1 Character | CTRL-b or Left Arrow |
Forward 1 Word | ALT-f or CTRL-Right Arrow |
Backward 1 Word | ALT-b or CTRL-Left Arrow |
Beginning of Node | ALT-< |
End of Node | ALT-> |
End of Current Node | e |
Quit | q |
Action | Keys |
---|---|
Next Node | n |
Previous Node | p |
Up a Node | u |
Last Node Viewed | l |
Top Node | t |
Directory Node | d |
First Node | < |
Last Node | > |
Global Next Node | ] |
Global Previous Node | [ |
Action | Keys |
---|---|
Search Forward | s(string) or /(string) |
Search Backwards | ?(string) |
Search Case-Sensitive | S(string) |
Next word in Search | n |
Next word Case-Sensitive | N |
Interactively Search Forwards | CTRL-s(string) |
Interactively Search Backwards | CTRL-r(string) |
Index Search | i(string) |
Next Index Search | , |
Ask the man¶
`man` is short for "manual". man pages are present on all Linux distributions and offer in-depth documentation about many programs and utilities, as well as other topics, including configuration files, and programming APIs for system calls, library routines, and the kernel.
The man pages are divided into chapters numbered 1 through 9. In some cases, a letter is appended to the chapter number to identify a specific topic.
It is common to have multiple pages across multiple chapters with the same name, especially for names of library functions or system calls. The chapter number can be used to force man to display the page from a particular chapter, e.g. `man 2 socket`. Display all pages with the `-a` option.
The `man` program searches, formats, and displays the information contained in the man page system. To list all pages whose name matches a topic, use the `-f` option (same result as `whatis`). To list all pages that discuss a specified topic, use the `-k` option (same result as `apropos`).
Man page number¶
Number | Meaning |
---|---|
1 | shell executables or commands (important) |
2 | system calls (functions provided by the kernel) |
3 | library functions, mostly from libc |
4 | device files, usually found under /dev |
5 | configuration files and file formats (important) |
6 | games |
7 | conventions and protocols |
8 | system administrator's commands (important) |
9 | kernel routines |
navigation (less)¶
Keys | Functions |
---|---|
[Space] | next page |
[PageDown] | next page |
[PageUp] | previous page |
[Home] | first page |
[End] | last page |
/string | search string after current position |
?string | search string before current position |
n, N | when searching, find next matching entry |
q | quit |
search man pages¶
- To search for a specific man page, use `man -f command_name`
- To find any man page related to a term, use `man -k searching_term`, which returns all man pages whose name or short description contains the term
Info Page¶
The info page is a feature of GNU/Linux systems that displays help documentation split into small sections (nodes) linked together like web pages. Use `info command`.
The top of the screen shows information about the page being displayed, including how far through the whole document you are.
Keys | Functions |
---|---|
[Space] | next page |
[PageDown] | next page |
[PageUp] | previous page |
[Home] | first page |
[End] | last page |
[b] | move cursor to the first node in current screen |
[e] | move cursor to the last node in current screen |
[n] | next node |
[p] | previous node |
[u] | upper layer |
[s] or [/] | search in current info page |
[h] | show help |
[?] | view commands |
[q] | exit |
Additionally, `/usr/share/doc/` usually contains a lot of documentation.
GNU Info System¶
This is the GNU project's standard documentation format, which it prefers as an alternative to `man`. The Info System is free-form, and its topics are connected using links.
You can view help for a particular topic by typing `info <topic name>`, or view a top level index of available topics by typing `info` alone. The system then searches for the topic in all available info files. The topic which you view in an info page is called a node. You can move between nodes or view each node sequentially. Each node may contain menus and linked subtopics, aka items. Use `n` to go to the next node, `p` for the previous node, and `u` for moving one node up in the index.
Items function like browser links and are identified by an asterisk (*) at the beginning of the item name. Named items (outside a menu) are identified with double-colons (::) at the end of the item name. Items can refer to other nodes within the file or to other files.
--help option¶
Most commands have an available short description which can be viewed using the --help
or the -h
option along with the command or application. This offers a quick reference and displays information faster than the man
or info
pages.
Process¶
A process is simply an instance of one or more related tasks (threads) executing on your computer. A single command may start several processes simultaneously. Some processes are independent of each other and others are related.
- program: usually a binary, stored on physical media such as a hard drive
- process: when a program is executed, the executor's access rights and the program data are loaded into memory, and the OS assigns the process a PID
- fork and exec
  - the system forks the parent process, creating a temporary process that will run the child program
  - the temporary process is assigned a new PID, and its PPID is the parent's PID
  - the temporary process then execs the child program and becomes the child process
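The PID/PPID relationship can be observed from a shell. The following is a minimal sketch, assuming a POSIX shell and the `mktemp` utility; the variable names are illustrative only.

```shell
# $$ expands to the PID of the current shell, which acts as the parent here.
parent_pid=$$

# Start a child shell: the current shell forks, and the fork execs `sh`.
# Inside the child, $$ is the child's own PID and $PPID is its parent's PID.
tmp=$(mktemp)
sh -c 'echo "$$ $PPID"' > "$tmp"
read -r child_pid child_ppid < "$tmp"
rm -f "$tmp"

echo "this shell's PID: $parent_pid"
echo "child PID:        $child_pid"
echo "child's PPID:     $child_ppid"
```

When run directly in a terminal, the child's PPID is normally the PID of the shell that launched it.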
Processes use many system resources, such as memory, CPU cycles, and peripheral devices, such as network cards, hard drives, printers and displays. The OS (especially the kernel) is responsible for allocating a proper share of these resources to each process and ensuring overall optimized system utilization.
Types¶
Process Type | Description | Example |
---|---|---|
Interactive Processes | Need to be started by a user, either at a command line or through a graphical interface such as an icon or a menu selection. | bash, firefox, top |
Batch Processes | Automatic processes which are scheduled from and then disconnected from the terminal. These tasks are queued and work on a FIFO (First-In, First-Out) basis. | updatedb, ldconfig |
Daemons | Server processes that run continuously. Many are launched during system startup and then wait for a user or system request indicating that their service is required. | httpd, sshd, libvirtd |
Threads | Lightweight processes. These are tasks that run under the umbrella of a main process, sharing memory and other resources, but are scheduled and run by the system on an individual basis. An individual thread can end without terminating the whole process and a process can create new threads at any time. Many non-trivial programs are multi-threaded. | firefox, gnome-terminal-server |
Kernel Threads | Kernel tasks that users neither start nor terminate and have little control over. These may perform actions like moving a thread from one CPU to another, or making sure input/output operations to disk are completed. | kthreadd, migration, ksoftirqd |
Scheduling¶
The kernel scheduler constantly shifts processes on and off the CPU, sharing time according to relative priority, how much time is needed and how much has already been granted to a task. Some process states:
- running state means the process is either currently executing instructions on a CPU, or is waiting to be granted a share of time. All processes in this state reside on what is called a run queue. For machines with multi-core CPUs, there is a run queue on each core.
- sleep state means the process is waiting for something to happen before it can resume. It is said to be sitting on a wait queue.
- zombie state means a child process has completed but its parent process has not yet asked about its exit state. The process still shows up in the system's list of processes but is no longer alive.
The OS assigns each process a unique process ID (PID) to track process state, CPU usage, memory use, precisely where resources are located in memory, and other characteristics. You can forcibly terminate a process by issuing kill -SIGKILL <pid>
or, using the signal number, kill -9 <pid>
. To ask a process to terminate gracefully instead, use kill -SIGTERM <pid>
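As a small illustration of signalling a process by PID, the sketch below starts a throwaway `sleep` in the background and terminates it. It assumes a POSIX shell; `-TERM` is used as the portable spelling of `-SIGTERM`.

```shell
# Start a long-running process in the background; $! holds its PID.
sleep 60 &
pid=$!

# Ask it to terminate gracefully (SIGTERM is signal 15, the default for kill).
kill -TERM "$pid"

# wait returns the child's exit status. For a process ended by a signal,
# the shell reports 128 + the signal number, so 143 means SIGTERM.
status=0
wait "$pid" || status=$?
echo "sleep ended with status $status"
```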
ID Type | Description |
---|---|
Process ID (PID) | Unique Process ID number |
Parent Process ID (PPID) | Process (Parent) that started this process. If the parent dies, the PPID will refer to an adoptive parent, normally the init process (PID 1) or a designated subreaper process. |
Thread ID (TID) | Thread ID number. This is the same as the PID for single-threaded processes. For a multi-threaded process, each thread shares the same PID, but has a unique TID. |
Users and Groups¶
The OS identifies the user who starts a process by the Real User ID (RUID) assigned to the user. The access rights of the process are determined by the Effective UID (EUID), which may differ from the RUID in some situations.
Users can be categorized into various groups. Each group is identified by the Real Group ID (RGID). The access rights of the group are determined by the Effective Group ID (EGID).
Priority and NICE¶
The priority (PRI) for a process can be set by specifying a nice value, aka niceness (NI). The lower the nice value, the higher the priority. Higher priority processes get preferential access to the CPU, and therefore more CPU time.
In Linux, a nice value of -20 represents the highest priority and +19 represents the lowest. This convention was adopted from UNIX.
You can view the nice values using ps -lf
and use renice +5/-5 <pid>
to set the nice value. A child process inherits the nice value of its parent when it is started.
root can change the NI
of any process, while a normal user can only adjust the NI of processes they own, only
within [0, 19]
, and can only move
NI
to a higher value. Start a command with a given niceness using nice [-n numb] command
or adjust an existing process with renice [numb] PID
. NI
adjustments are passed from a parent process to the children it starts afterwards.
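The effect of starting a command with a raised nice value can be checked without root privileges. This is a minimal sketch, assuming the coreutils `nice` command, which prints the current niceness when run without arguments.

```shell
# With no arguments, nice prints the niceness of the current shell.
base=$(nice)

# Start a command with its niceness raised by 5. The command here is nice
# itself, so it prints the niceness it was started with.
raised=$(nice -n 5 nice)

echo "current niceness:              $base"
echo "niceness under 'nice -n 5':    $raised"
```

Raising the value never needs special rights; lowering it below the current value is what is reserved for root.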
The load average is displayed using three numbers (e.g. 0.45, 0.17, and 0.12) with command w
, interpreted as the average system load over the last 1 minute, the last 5 minutes, and the last 15 minutes.
background process¶
You can put a job in the background by suffixing & to the command, e.g. updatedb &
. Use CTRL-Z
to suspend a foreground job and bg
to resume it in the background. Use fg
to bring a process back to foreground, and jobs
to see a list of background jobs (-l
option for showing PIDs).
ps
command¶
For the BSD variation of ps
command, use ps aux
to display all processes of all users, and use ps axo <attributes>
to specify a list of attributes to view.
For the SystemV variation of ps
command, options need the dash prefixes and are different.
Several useful ps
combinations are worth remembering:
ps -l
shows only the processes related to the current bash session. Some columns explained:
- F: process flags, describing the process's privileges
  - 4 means the process has root privileges
  - 1 means the process was forked but has not called exec
- S: process status
  - R: running
  - S: sleeping (idle), can be woken up by a signal
  - D: uninterruptible sleep, usually waiting on I/O, cannot be woken up
  - T: stopped, possibly under job control
  - Z: zombie, the process has terminated but has not yet been removed from memory
- C: CPU usage percentage
- PRI/NI: short for priority/nice, the priority with which the CPU executes the process. A smaller number means a higher priority
- ADDR/SZ/WCHAN: memory related. ADDR is a kernel function showing which part of memory the process is in; SZ is the memory size used; WCHAN shows whether the process is running ('-' means running)
- TTY: the terminal the user logged in from
- TIME: CPU time used
- CMD: the command that started the process
ps aux
shows all processes. Some columns explained:
- USER: the user the process belongs to
- PID: the process ID
- %CPU: percentage of CPU used
- %MEM: percentage of physical memory used
- VSZ: virtual memory usage (Kbytes)
- RSS: physical memory usage (Kbytes)
- TTY: the terminal the process runs on; pts/n means the user logged in from a remote (pseudo) terminal
- STAT: status, using the same codes as ps -l
- TIME: actual CPU time used
- COMMAND: the command that started the process
ps -axjf
shows all processes in a tree view fashion
pstree
displays the processes running on the system in the form of a tree diagram showing the relationship between a process and its parent process and any other processes that it created, and threads displayed within {}
.
pstree [-A|U] [-up]
- -A: use ASCII characters to draw the tree
- -U: use UTF characters to draw the tree
- -p: show process PID
- -u: show process user
top
command¶
top
gives a live overview of system performance over time.
The first line of the top output displays a quick summary of what is happening in the system:
- How long the system has been up
- How many users are logged on
- What is the load average
- load average of 1.00 per CPU indicates a fully subscribed system
- if greater than 1, the system is overloaded and processes are competing for CPU time
- if very high, it indicates the system may have a runaway process (non-responding state)
The second line displays the total number of processes, the number of running, sleeping, stopped, and zombie processes.
The third line indicates how the CPU time is being divided by displaying the percentage of CPU time used for each:
- us - CPU for user initiated processes
- sy - CPU for kernel processes
- ni - niceness, CPU for user jobs running at a lower priority
- id - idle CPU
- wa - waiting, CPU for jobs waiting for I/O
- hi - CPU for hardware interrupts
- si - CPU for software interrupts
- st - steal time, seen on virtual machines, where some of the CPU time is taken by the hypervisor for other guests
The fourth and fifth lines indicate memory usage, which is divided into two categories; both display total memory, used memory, and free space:
- Physical memory (RAM) on line 4.
- Swap space on line 5.
Once the physical memory is exhausted, the system starts using swap space (temporary storage space on the hard drive) as an extended memory pool, and since accessing disk is much slower than accessing memory, this will negatively affect system performance.
top [-d numb] | top [-bnp]
- -d: screen refresh interval in seconds
- -b: run top in batch mode, used with output redirection
- -n: used with -b, number of times top outputs
- -p: specify some PID for monitoring
- commands in top:
  - ?: show available commands
  - P: sort by CPU usage
  - M: sort by memory usage
  - N: sort by PID
  - T: sort by CPU time
  - k: send a signal to a PID
  - r: give a PID a new nice value
  - q: quit
free
command¶
free [-b|-k|-m|-g] [-t]
shows memory usage
- -b|-k|-m|-g: output is in Kbytes by default; use these to select bytes, Kbytes, Mbytes, or Gbytes
- -t: also show the total of physical and swap memory
uname
command¶
uname [-asrmpi]
checks system and kernel information
- -a: show all system related information
- -s: kernel name
- -r: kernel release (version)
- -m: system hardware architecture
- -p: CPU type
- -i: hardware platform
netstat
command¶
netstat -[atunlp]
can track network usage on a process level
- -a: show all connections, listening ports, and sockets on the current system
- -t: list TCP connections
- -u: list UDP connections
- -n: show services by port number instead of by name
- -l: list services that are listening
- -p: show the PID of each service
vmstat
¶
vmstat
can track system resource changes
- -a [delay [total examine times]]: show active/inactive memory in place of buffer/cache info
- -f: show the number of forks
- -s: show memory changes
- -S <unit>: use K/M units in place of bytes
- -d: show the number of disk reads/writes
- -p <partition>: show read/write stats for a partition
- categories shown: procs, memory, swap, io, system, cpu
  - procs: the larger r and b are, the busier the system
    - r: processes waiting to run
    - b: processes in uninterruptible sleep (cannot be woken up)
  - memory: same as shown by free
    - swpd: virtual memory usage
    - free: unused memory
    - buff: memory used as buffers
    - cache: memory used as cache
  - swap: when si and so get larger, the system is short of memory
    - si: amount swapped in from disk
    - so: amount swapped out to disk
  - io: when bi and bo get larger, the system is doing lots of I/O
    - bi: blocks read from disk
    - bo: blocks written to disk
  - system: when in and cs get larger, the system communicates with external devices quite often
    - in: interrupts per second
    - cs: context switches per second
  - cpu:
    - us: CPU time spent in non-kernel (user) code
    - sy: CPU time spent in kernel code
    - id: idle time
    - wa: time spent waiting for I/O
    - st: time stolen from a virtual machine
fuser
command¶
fuser [-umv] [-k [i] [-signal]] file/dir
can find out which process is using which file/directory, from the point of the file/directory
- -u: show both PID and process owner
- -m: treat the name as a mounted filesystem and list every process using any file on it
- -v: show each file and the related processes
- -k: find the processes using this file/dir and send them a kill signal
- -i: use with -k, ask for confirmation before killing each process
- -<signal>: send the given signal code
- What will be shown is USER PID ACCESS COMMAND
- the ACCESS column represents:
  - c: the file/dir is the process's current directory
  - e: the file is being executed
  - f: the file is open
  - r: the file/dir is the process's root directory
  - F: the file is open for writing
  - m: the file is a memory-mapped file or shared library
lsof
command¶
lsof [-aUu] [+d]
lists which process is using which files
- -a: show only entries that satisfy all of the given criteria
- -U: show only Unix domain socket files
- -u username: list files opened by the user
- +d directory: list files opened under a directory
pidof
command¶
pidof [-sx] program_name
list the active PIDs of a program
- -s: show only one of the PIDs, not all of them
- -x: also show the PIDs of shells running the named script
Process List¶
Process list shows information about each process. By default, processes are ordered by highest CPU usage, with other information:
- PID - process id
- USER - process owner
- PR - priority
- NI - nice values
- VIRT - virtual memory
- RES - physical memory
- SHR - shared memory
- S - status
- %CPU - percentage of CPU used
- %MEM - percentage of memory used
- TIME+ - execution time
- COMMAND - the command that started the process
top
can be used interactively for monitoring and controlling processes
Command | Output |
---|---|
t | Display or hide summary information (rows 2 and 3) |
m | Display or hide memory information (rows 4 and 5) |
A | Sort the process list by top resource consumers |
r | Renice (change the priority of) a specific process |
k | Kill a specific process |
f | Enter the top configuration screen |
o | Interactively select a new sort order in the process list |
Schedule Processes¶
at and sleep¶
Use the at
program to execute any non-interactive command once, at a specified future time.
$ at now + 2 days
at> cat file1.txt
at> <EOT> (CTRL-D)
job 1231 at xxxx-xx-xx xx:xx
Use sleep
to delay execution of a command for a specific period.
sleep NUMBER[SUFFIX]
# SUFFIX can be s(seconds, default if not provided), m(minutes), h(hours), d(days)
cron¶
cron is a time-based scheduling utility program. It can launch routine background jobs at specific times and/or days on an on-going basis. cron is configured at /etc/crontab
(cron table) which contains the various shell commands that need to be run at the properly scheduled times.
cron can be configured with the system-wide or the user-specific crontab. Each line of a crontab is composed of a CRON expression and a shell command. Use crontab -e
to edit existing or add new jobs.
# CRON expression
MIN HOUR DOM MON DOW CMD
# minute(0-59), hour(0-23), day of month(1-31), month(1-12), day of week(0-6), shell command
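To make the field layout concrete, the sketch below splits one crontab entry into its five time fields and the command. The entry and the script path `/usr/local/bin/backup.sh` are hypothetical; the snippet only parses the text and does not install anything into cron.

```shell
# Hypothetical crontab entry: run a backup script at 02:30 every Monday.
entry='30 2 * * 1 /usr/local/bin/backup.sh'

# Split the entry into the five time fields plus the command.
# Here-document contents are not subject to filename expansion,
# so the '*' fields are read literally.
read -r min hour dom mon dow cmd <<EOF
$entry
EOF

echo "minute=$min hour=$hour day-of-month=$dom month=$mon day-of-week=$dow"
echo "command=$cmd"
```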
System Services (Daemon)¶
System service programs are called daemons. A daemon is usually named after its service with a suffix d
Stand-alone daemons start without being managed by other programs. Once started, a stand-alone daemon stays in system memory and uses resources, which lets it respond quickly to users.
A super daemon is a single daemon that starts other daemons upon request from a client. The daemons it starts are closed when the client session ends, e.g. telnet
is a service managed by the super daemon
Each service maps to a unique port and this mapping is in the /etc/services
file
Starting a daemon requires an executable, a configuration, and an environment. They are stored at:
- /etc/init.d/: startup scripts
- /etc/sysconfig/: initialization environment config
- /etc/xinetd.conf, /etc/xinetd.d/: super daemon config
- /etc/: services' configuration files
- /var/lib/: services' database files
- /var/run/: PID records of all services
service
is a command (in fact, a script) to start, terminate, and monitor any services.
service [service_name] (start|stop|restart|status|--status-all)
Job Control¶
Foreground jobs are jobs that actively occupy the terminal and can be interacted with. Background jobs run in the background without interaction with the user. Appending &
to a command sends it to the background.
- switching jobs: in the middle of running a command, press
ctrl-z
to pause it and send it to the background
- use
jobs
command to check running/stopped jobs
  - it lists the processes recently put into the background; (+) marks the job that fg retrieves by default, and (-) marks the second most recent job put into the background
  - jobs [-lrs]
    - -l: show PID
    - -r: show running only
    - -s: show stopped only
- fg brings a suspended job back to the foreground
  - fg %<jobnumber>; used without a job number, it brings back the job marked (+)
  - fg - brings back the job marked (-)
- bg makes a stopped job run in the background again
  - bg %<jobnumber>
  - this also appends & to the job command
- kill can remove jobs or restart jobs
  - kill -<signal> %<jobnumber>, where <signal> can be a number or a name:
    - -l: list all kill signals
    - -1: reload configuration files (SIGHUP)
    - -2: like entering ctrl-c to interrupt a process (SIGINT)
    - -9: forced stop (SIGKILL)
    - -15: normal termination (SIGTERM)
    - -19: stop (pause) a process, much like entering ctrl-z (SIGSTOP); stop signal numbers differ between architectures, so check kill -l
  - kill -<signal> PID also works
- killall works on all running processes of a command, useful if you don't want to bother looking up its PID
  - killall [-iIe] [-signal] [command_name]
    - -i: interactive
    - -e: exact, the command_name must match exactly
    - -I: ignore case in command_name
Offline Jobs¶
Notice the background from job control is not "system background", it is just a way to help you run and manage multiple things in the terminal.
If there is need to run a job even after logged out of the system, then offline jobs may help. While at
works for this case, nohup
can also work!
nohup <command>
ornohup <command> &
to run in the background
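What `nohup` protects against is the hangup signal (SIGHUP) that is sent when the terminal session ends. The sketch below simulates that with `kill -HUP` on a throwaway `sleep`; it assumes a POSIX shell and the coreutils `nohup`, and the one-second pauses are only there to give the process time to start.

```shell
# Start a command that ignores SIGHUP, in the background.
nohup sleep 30 > /dev/null 2>&1 &
pid=$!

# Give nohup a moment to start, then send the signal a logout would send.
sleep 1
kill -HUP "$pid"
sleep 1

# End the demo process and collect its exit status.
kill -TERM "$pid" 2> /dev/null || true
status=0
wait "$pid" || status=$?

# 143 (128 + 15) means the process survived SIGHUP and ended on SIGTERM;
# 129 (128 + 1) would mean SIGHUP had ended it.
echo "exit status: $status"
```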
Linux Users and Groups¶
Linux is a multi-user operating system. To identify the current user, use whoami
. To list the currently logged-on users, use who
or users
. who -a
gives more detailed information.
All Linux users are assigned a unique integer user ID (uid); normal users start with a uid of 1000 or greater. Use id
to get information about current user, and id <username>
gets information about another user.
Linux uses groups for organizing users. Groups are collections of accounts with certain shared permissions, defined in the /etc/group
file. Permissions on various files and directories can be modified at the group level. Users also have one or more group IDs (gid), including a default one which is the same as the user ID.
Groups are used to establish a set of users who have common interests for the purposes of access rights, privileges, and security considerations.
Only the root user can add and remove users and groups. Adding a new user is done with useradd
and removing is done with userdel
. For example, sudo /usr/sbin/useradd bjmoose
sets the home directory to /home/bjmoose
, populates it with some basic files (copied from /etc/skel
) and adds a line to /etc/passwd
such as: bjmoose:x:1002:1002::/home/bjmoose:/bin/bash
.
Removing a user with userdel
will leave the user home directory, and is good for a temporary inactivation. Use userdel -r
to remove the home directory too.
Similarly, add a new group with groupadd
and remove with groupdel
. To add a user to a new group, use usermod
, e.g. usermod -aG <newgroup> <username>
. To remove a user from a group, you must give the full list of the user's groups except the one you want to remove, e.g. usermod -G <groups>... <username>
.
To temporarily become the superuser for a series of commands, you can use su
and then be prompted for the root password. To execute just one command with root privilege use sudo <command>
.
sudo access privilege is granted per user and its configuration is stored in the /etc/sudoers
file and in the /etc/sudoers.d/
directory. By default, the sudoers.d directory is empty.
File Ownership, Permission¶
In Linux, every file is associated with a user who is the owner and a group whose members have the right to access it in certain ways: read(r), write(w), execute(x)
.
For a file, execute(x)
means whether it can be executed; for a directory it means whether a user can cd
into this directory as the working directory. Whether a user can delete a file depends on the user's access rights on the directory that contains it, which must include write(w)
File permission
[-][rwx][r-x][r--]
0 123 456 789
0 - file type
123 - owner access right
456 - group access right
789 - global access right
file types:
- -: regular file
- d: directory
- l: link
- b: block device file, like a hard drive
- c: character device file, like a mouse or keyboard
- s: socket, for network data
- p: pipe (FIFO), allows many processes to read the same file
chown
is used to change user ownership (and group) of a file or directory, chgrp
for changing group ownership. chmod
is for changing the permissions on the file at user(u) group(g) others(o)
levels.
A single digit is sufficient to specify all three permission bits for each entity. Each permission has a value, read(4), write(2), execute(1)
, and the digit is the sum of the values of the permissions granted.
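As a worked example of the digit arithmetic, the sketch below applies mode 754 to a scratch file and reads the result back. It assumes `mktemp` is available; the file is temporary and removed afterwards.

```shell
# Work on a scratch file so nothing real is modified.
f=$(mktemp)

# 7 = 4+2+1 (rwx) for the owner, 5 = 4+1 (r-x) for the group, 4 (r--) for others.
chmod 754 "$f"

# The first 10 characters of `ls -l` are the file type and the permission bits.
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"    # -rwxr-xr--

rm -f "$f"
```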
umask¶
umask
can be used to disable certain rights for newly created files or directories, e.g. umask 023
means new files created will NOT have w
for the group and will not have wx
for others (world)
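The effect of a mask of 023 can be checked on scratch files. This is a minimal sketch, assuming `mktemp`; the mask is set inside a subshell so the calling shell's own umask is left untouched.

```shell
# Scratch directory for the demonstration.
d=$(mktemp -d)

# Set the mask in a subshell so it does not leak into the calling shell.
(
    umask 023
    touch "$d/newfile"    # files start from rw-rw-rw- (666) before masking
    mkdir "$d/newdir"     # directories start from rwxrwxrwx (777) before masking
)

file_perms=$(ls -ld "$d/newfile" | cut -c1-10)
dir_perms=$(ls -ld "$d/newdir" | cut -c1-10)
echo "file: $file_perms"    # -rw-r--r--
echo "dir:  $dir_perms"     # drwxr-xr--

rm -rf "$d"
```

The group loses write (2) and others lose write and execute (2+1); bits that were never set, such as execute on a regular file, are unaffected.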
Hidden attributes¶
Hidden attributes on a file are useful for security reasons.
- lsattr allows you to view the hidden attributes of a file
- chattr allows you to change the hidden attributes of a file
  - i (set with chattr +i): makes a file unchangeable
  - a (set with chattr +a): allows appending to the file, but not changing or deleting its existing content
Clean shutdown¶
- use
who
to see who is using the current system.
- use
netstat -a
to see the status of Internet connections
- use
ps -aux
to see the processes running in the background
sync
command will sync data to the hard drives. It is best to run this command before rebooting or shutting down the system.
shutdown
or halt
can do many things, such as shutting down, rebooting, or entering single-user mode
- set shutdown time, now or in the future
- set shutdown message to online users
- send warning info broadcast. Useful when need to notify others for important messages
- whether use fsck to check file system
shutdown [-t seconds] [-arkhncfF] [time] [warning_info]
- usage below:
Option | Setting |
---|---|
-t sec | shutdown in some seconds |
-k | send warning message without shutting down |
-r | reboot after system services terminate |
-h | shutdown after system services terminate |
-n | shutdown without the init process |
-f | reboot skipping fsck check |
-F | reboot force fsck check |
-c | cancel current shutdown directive |
Linux shell¶
Startup file¶
The command shell program uses one or more startup files to configure the user environment. Files in the /etc
directory define global settings for all users, while initialization files in the user's home directory can include and/or override the global settings. Things can be configured:
- Customizing the prompt
- Defining command line aliases
- Setting the default text editor
- Setting the path for where to find executable programs
Order of startup file evaluation (when a user first logs onto the system): /etc/profile
, then ~/.bash_profile
or ~/.bash_login
or ~/.profile
.
Every time you create a new shell, or terminal window, etc., you do NOT perform a full system login; only a file named ~/.bashrc
is read and evaluated.
PATH
is a variable of an ordered list of directories (the path) which is scanned when a command is given to find the appropriate program or script to run.
Running alias
with no arguments lists the currently defined aliases. unalias
will remove an alias. Alias definition needs to be placed within either single or double quotes if it contains any spaces. i.e. alias ls='ls --color -l'
Prompt Statement (the PS1
variable) is used to customize your prompt string in your terminal windows to display the information you want.
Environment Variables¶
Environment variables are quantities that have specific values which may be utilized by the command shell or other utilities and applications. Some are set by the system and others are set by the user, either at the command line or within startup and other scripts.
An environment variable is actually just a character string that contains information used by one or more applications. Use set, env, export
to view the values of currently set environment variables.
Variables created within a script are only available to the current shell; child processes (sub-shells) will NOT have access to values that have been set or modified. Allowing child processes to see the values requires use of the export
command. You can also set environment variables to be fed as a one shot to a command as in: $ SDIRS=s_0* KROOT=/lib/modules/$(uname -r)/build make modules_install
.
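The difference between a plain shell variable, an exported variable, and a one-shot assignment can be seen with a child `sh`. This is a minimal sketch; `GREETING` is a made-up variable name used only for the demonstration.

```shell
# Make sure the demo variable does not already exist in the environment.
unset GREETING

# A plain shell variable is visible only in the current shell.
GREETING="hello"
before=$(sh -c 'echo "${GREETING:-unset}"')

# After export, the variable is passed into the environment of child processes.
export GREETING
after=$(sh -c 'echo "${GREETING:-unset}"')

# A one-shot assignment applies to that single command only.
oneshot=$(GREETING="bye" sh -c 'echo "$GREETING"')

echo "before export: $before"     # unset
echo "after export:  $after"      # hello
echo "one-shot:      $oneshot"    # bye
echo "still here:    $GREETING"   # hello
```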
Command History¶
bash keeps track of previously entered commands and statements in a history buffer, stored in ~/.bash_history
(each session saves its history at the very end). Recall previous commands using the arrow keys
, search with CTRL-r
, or use history
to view all and use !<number>
to re-execute a past command.
Shell shortcuts¶
Keyboard Shortcut | Task |
---|---|
CTRL-L | Clears the screen |
CTRL-D | Exits the current shell |
CTRL-Z | Puts the current process into suspended background |
CTRL-C | Kills the current process |
CTRL-H | Works the same as backspace |
CTRL-A | Goes to the beginning of the line |
CTRL-W | Deletes the word before the cursor |
CTRL-U | Deletes from beginning of line to cursor position |
CTRL-K | Deletes from cursor position to end of line |
CTRL-E | Goes to the end of the line |
Tab | Auto-completes files, directories, and binaries |
Text Manipulation¶
sed
is short for stream editor. It is a powerful text processing tool and one of the oldest and most popular UNIX utilities. It is used to modify the contents of a file or input stream, usually placing the contents into a new file or output stream.
sed can filter text, as well as perform substitutions in data streams.
- sed -e command <filename>: specify editing commands at the command line, operate on the file, and put the output on standard out
  - specify multiple -e command options to perform multiple operations
- sed -f scriptfile <filename>: specify a scriptfile containing sed commands, operate on the file, and put the output on standard out
Basic sed substitutions:
Command | Usage |
---|---|
sed s/pattern/replace_string/ file | Substitute first string occurrence in every line |
sed s/pattern/replace_string/g file | Substitute all string occurrences in every line |
sed 1,3s/pattern/replace_string/g file | Substitute all string occurrences in a range of lines |
sed -i s/pattern/replace_string/g file | Save changes for string substitution in the same file |
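The substitutions in the table can be tried on inline sample text instead of a file. The sketch below uses made-up input and pipes it into `sed`; nothing is written to disk.

```shell
line='cat bat cat'

# Without a flag, only the first match on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/cat/dog/')
echo "$first"     # dog bat cat

# The g flag replaces every match on each line.
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')
echo "$all"       # dog bat dog

# An address range limits the substitution to lines 1 through 2.
ranged=$(printf 'cat\ncat\ncat\n' | sed '1,2s/cat/dog/')
echo "$ranged"    # dog, dog, cat (one per line)

# Several -e options apply several commands in order.
multi=$(printf '%s\n' "$line" | sed -e 's/cat/dog/g' -e 's/bat/owl/')
echo "$multi"     # dog owl dog
```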
awk
is used to extract and then print specific contents of a file and is often used to construct reports. It got its name from the authors, Alfred Aho, Peter Weinberger, and Brian Kernighan.
- awk 'command' <filename>: specify a command directly at the command line
- awk -f scriptfile <filename>: specify a file that contains the script to be executed
Basic awk usage:
Command | Usage |
---|---|
awk '{ print $0 }' /etc/passwd | Print entire file |
awk -F: '{ print $1 }' /etc/passwd | Print first field (column) of every line, using ':' as the field separator |
awk -F: '{ print $1 $7 }' /etc/passwd | Print first and seventh field of every line, joined together with nothing in between |
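The field-splitting behaviour can be tried on sample data in the `/etc/passwd` format rather than on the real file. The two records below are made up for illustration.

```shell
# Two lines in /etc/passwd format (sample data, not read from the system).
sample='root:x:0:0:root:/root:/bin/bash
bjmoose:x:1002:1002::/home/bjmoose:/bin/bash'

# -F: sets the input field separator to a colon.
names=$(printf '%s\n' "$sample" | awk -F: '{ print $1 }')
echo "$names"

# A comma between print arguments separates the output fields with a space;
# without the comma ($1 $7) the two fields are joined together.
shells=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $7 }')
echo "$shells"
```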
File Manipulation¶
sort
is used to rearrange the lines of a text file, in either ascending or descending order according to a sort key, or sort with respect to particular fields (columns) in a file
Syntax | Usage |
---|---|
sort <filename> | Sort the lines in the specified file, according to the characters at the beginning of each line |
cat file1 file2 | sort | Combine the two files, then sort the lines and display the output on the terminal |
sort -r <filename> | Sort the lines in reverse order |
sort -k 3 <filename> | Sort the lines by the 3rd field on each line instead of the beginning |
sort -u <filename> | Sort the lines then keep only unique lines, same as piping the sorted output through uniq |
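A few of these sort invocations applied to a small made-up data set, as a sketch. The `-n` option (numeric comparison) is not covered in the table above and is added here because the second column holds numbers.

```shell
data='pear 3
apple 10
pear 3
fig 2'

# Plain sort orders whole lines alphabetically.
sorted=$(printf '%s\n' "$data" | sort)
echo "$sorted"

# -u drops duplicate lines after sorting.
unique=$(printf '%s\n' "$data" | sort -u)
echo "$unique"

# -k 2 sorts by the second field; -n compares it as a number, so 10 comes last.
by_count=$(printf '%s\n' "$data" | sort -n -k 2)
echo "$by_count"
```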
uniq
removes duplicate consecutive lines in a text file and is useful for simplifying the text display. It requires duplicate entries be consecutive to be removed. Use uniq -c
to prefix each output line with the number of times it occurred.
paste
can be used to combine file contents with respect to columns. paste -s
processes one file at a time, joining all the lines of each file into a single line instead of merging the files side by side
join
can be used when two files share a column of values, so that their data can be combined based on that column, like a join in an SQL statement.
split
is used to break up a file into equal-sized segments of new files for easier viewing and manipulation, by default 1000 lines per file segment. An optional prefix of the new files can be specified with split <file> <prefix>
grep
is extensively used as a primary text searching tool. It scans files for specified patterns and can be used with regular expressions.
Command | Usage |
---|---|
grep [pattern] <filename> | Search for a pattern in a file and print all matching lines |
grep -v [pattern] <filename> | Print all lines that do not match the pattern |
grep -C 3 [pattern] <filename> | Print context of lines (specified number of lines above and below the pattern) for matching the pattern |
strings book1.xls | grep my_string | Take text input from pipe |
strings
extracts printable character strings from binary files.
tr
is used to translate specified characters into other characters or to delete or keep some of them
Command | Usage |
---|---|
tr a-z A-Z | Convert lower case to upper case |
tr '{}' '()' < inputfile > outputfile | Translate braces into parentheses |
echo "This is for testing" | tr '[:space:]' '\t' | Translate white-space to tabs |
echo "This is for testing" | tr -s '[:space:]' | Squeeze repetition of characters using -s |
echo "the geek stuff" | tr -d 't' | Delete specified characters using -d option |
echo "my username is 432234" | tr -cd '[:digit:]' | Complement the sets using -c option. Combined with -d, means only keep the characters in the set |
tr -cd '[:print:]' < file.txt | Remove all non-printable characters from a file |
tr -s '\n' ' ' < file.txt | Join all the lines in a file into a single line |
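Several rows of the table, run on inline strings, as a sketch. The character classes are quoted so the shell cannot mistake them for filename patterns.

```shell
# Translate lower case to upper case.
upper=$(echo "hello" | tr 'a-z' 'A-Z')
echo "$upper"       # HELLO

# -c complements the set and -d deletes, so only digits are kept.
digits=$(echo "my username is 432234" | tr -cd '[:digit:]')
echo "$digits"      # 432234

# -s squeezes each run of repeated whitespace into a single character.
squeezed=$(echo "a   b    c" | tr -s '[:space:]')
echo "$squeezed"    # a b c

# -d deletes every occurrence of the listed characters.
deleted=$(echo "the geek stuff" | tr -d 't')
echo "$deleted"     # he geek suff
```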
tee
takes the output from any command, and, while sending it to standard output, it also saves to a file
wc
counts the number of lines (-l
option), words (-w
option), and bytes (-c
option) in a file or list of files.
cut
is used for manipulating column-based files and is designed to extract specific columns using option -f <number>
. Default separator is tab; use cut -d ';'
to override that.
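A short sketch of `cut` and `wc` on inline sample data; the record below is made up, with fields separated by semicolons.

```shell
record='alice;30;Berlin'

# -d sets the delimiter and -f selects the field number.
city=$(echo "$record" | cut -d ';' -f 3)
echo "$city"          # Berlin

# Several fields can be selected at once; the delimiter is kept between them.
name_city=$(echo "$record" | cut -d ';' -f 1,3)
echo "$name_city"     # alice;Berlin

# wc -l counts lines; tr strips the padding some wc implementations add.
lines=$(printf 'one\ntwo\nthree\n' | wc -l | tr -d ' ')
echo "$lines"         # 3
```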
Linux Networking¶
Exchanging information across the network requires using streams of small packets, each of which contains a piece of the information going from one machine to another. These packets contain data buffers, together with headers which contain information about where the packet is going to and coming from, and where it fits in the sequence of packets that constitute the stream.
A network requires the connection of many nodes. Data moves from source to destination by passing through a series of routers and potentially across multiple networks.
IP Address¶
Devices attached to a network must have at least one unique network address identifier known as the IP (Internet Protocol) address. The address is essential for routing packets of information through the network.
IPv4 uses 32-bits for address and is older and by far the more widely used, while IPv6 uses 128-bits for addresses and is newer and designed to get past address pool limitations inherent in the older standard and furnish many more possible addresses.
NAT (Network Address Translation) enables sharing one IP address among many locally connected computers, each of which has a unique address only seen on the local network.
A 32-bit IPv4 address is divided into four 8-bit sections called octets, or bytes.
Network addresses are divided into five classes: A, B, C, D and E. Classes A, B, C
are divided into two parts: the network address (Net ID, identifying the network) and the host address (Host ID, identifying a host within the network). Class D
is used for special multicast applications (information is broadcast to multiple computers simultaneously) and Class E
is reserved for future use.
Class A Address¶
Class A addresses use the first octet as Net ID and use the other three as the Host ID.
The first bit of the first octet is always set to zero, so you can use only 7-bits for unique network numbers, leaving a maximum of 126 Class A networks available (the addresses 0000000 and 1111111 are reserved).
Each Class A network can have up to 16.7 million unique hosts on its network. The range of host addresses is from 1.0.0.0 to 127.255.255.255, although the 127.x.x.x block is reserved for loopback.
Class B Address¶
Class B addresses use the first two octets of the IP address as their Net ID and the last two octets as the Host ID.
The first two bits of the first octet are always set to binary 10, so there are a maximum of 16,384 (14 bits) Class B networks. The first octet of a Class B address has values from 128 to 191.
Each Class B network can support a maximum of 65,536 unique hosts on its network. The range of host addresses is from 128.0.0.0 to 191.255.255.255.
Class C Address¶
Class C addresses use the first three octets of the IP address as their Net ID and the last octet as their Host ID.
The first three bits of the first octet are set to binary 110, so almost 2.1 million (21 bits) Class C networks are available. The first octet of a Class C address has values from 192 to 223. These are most common for smaller networks which don't have many unique hosts.
Each Class C network can support up to 256 (8 bits) unique hosts. The range of host addresses is from 192.0.0.0 to 223.255.255.255.
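The class rules above can be sketched as a lookup on the first octet. The function name address_class is made up for this note; the Class D (224 to 239) and Class E (240 to 255) boundaries follow from their leading bits 1110 and 1111. The counts below just redo the arithmetic from the text.

```python
def address_class(address: str) -> str:
    """Return the classful class (A to E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"invalid first octet: {first_octet}")
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110 (multicast)
        return "D"
    return "E"              # leading bits 1111 (reserved)


# Network counts: bits left in the Net ID after the fixed leading bits.
CLASS_A_NETWORKS = 2 ** 7 - 2   # all-zeros and all-ones are reserved
CLASS_B_NETWORKS = 2 ** 14
CLASS_C_NETWORKS = 2 ** 21

# Host address counts: bits in the Host ID.
HOST_ADDRESSES = {"A": 2 ** 24, "B": 2 ** 16, "C": 2 ** 8}

for sample in ("10.0.0.1", "172.16.0.1", "192.168.1.1"):
    cls = address_class(sample)
    print(sample, cls, HOST_ADDRESSES[cls])
```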
IP Address Allocation¶
Typically, a range of IP addresses is requested from your Internet Service Provider (ISP) by your organization's network administrator. The class of IP address given depends on the size of your network and its growth needs. If NAT is in operation, you only get one externally visible address.
You can assign IP addresses to computers over a network either manually (static address) or dynamically (can change when machine reboots) using Dynamic Host Configuration Protocol (DHCP).
Name Resolution¶
Name Resolution maps between numerical IP address values and a human-readable format known as the hostname.
The special hostname localhost is associated with the IP address 127.0.0.1, and describes the machine you are currently on.
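A small sketch of name resolution from Python's standard library. The helper names resolve and is_loopback are invented here; what localhost actually resolves to depends on the machine's resolver configuration, so the lookup is wrapped in a try block.

```python
import ipaddress
import socket


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})


def is_loopback(address: str) -> bool:
    """True when the address points back at the local machine."""
    return ipaddress.ip_address(address).is_loopback


try:
    for address in resolve("localhost"):
        print(address, is_loopback(address))
except socket.gaierror as error:
    print("could not resolve localhost:", error)
```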
Network Configuration¶
Network configuration files are located in the /etc directory tree. Debian family distros store them under /etc/network, while Fedora and SUSE store them under /etc/sysconfig/network.
Network interfaces are a connection channel between a device and a network. Physically, a network interface can be provided by a network interface card (NIC), or it can be implemented more abstractly as software; either kind can be activated or deactivated at any time. Use the ip or ifconfig utilities to view network interface information.
Network utils¶
ping is used to check whether or not a machine attached to the network can receive and send data; i.e. it confirms that the remote host is online and is responding.
One can use the route utility or the ip route command to view or change the IP routing table to add, delete, or modify specific (static) routes to specific hosts or networks.
traceroute is used to inspect the route which the data packet takes to reach the destination host, which makes it quite useful for troubleshooting network delays and errors. By using traceroute, you can isolate connectivity issues between hops, which helps resolve them faster.
Some other networking tools:
Networking Tools | Description |
---|---|
ethtool | Queries network interfaces and can also set various parameters such as the speed |
netstat | Displays all active connections and routing tables. Useful for monitoring performance and troubleshooting |
nmap | Scans open ports on a network. Important for security analysis |
tcpdump | Dumps network traffic for analysis |
iptraf | Monitors network traffic in text mode |
mtr | Combines functionality of ping and traceroute and gives a continuously updated display |
dig | Tests DNS workings. A good replacement for host and nslookup |
wget is a command line utility for handling large file downloads, recursive downloads, password-protected downloads, or multi-file downloads.
curl can be used from the command line or a script to read information about an HTTP call, or save the contents to a file.
File Transfer Protocol (FTP) is a well-known and popular method for transferring files between computers using the Internet, built on a client-server model. Web browsers historically supported FTP, though major browsers have since removed it. Some CLI clients are ftp, sftp (which runs over SSH rather than FTP), ncftp, and yafc.
Secure Shell (SSH) is a cryptographic network protocol used for secure data communication (using ssh), remote services, and other secure services between two devices on the network.
Move files securely between two networked hosts using Secure Copy (scp). scp uses the SSH protocol for transferring data.
Linux Security¶
User Accounts¶
The Linux kernel allows properly authenticated users to access files and applications. Each user is identified by a unique integer (UID) and a separate database associates a username with each UID. Related tools are useradd and userdel, for creating and removing accounts.
Upon account creation, new user information is added to the user database and the user's home directory must be created and populated with some essential files. For each user, the following seven fields are maintained in the /etc/passwd file:
Field Name | Details | Remarks |
---|---|---|
Username | User login name | Should be between 1 and 32 characters long |
Password | User password (or the character x if the password is stored in the /etc/shadow file) in encrypted format | Is never shown in Linux when it is being typed; this stops prying eyes |
User ID (UID) | Every user must have a user id (UID) | UID 0 is reserved for root user; UID's ranging from 1-99 are reserved for other predefined accounts; UID's ranging from 100-999 are reserved for system accounts and groups; Normal users have UID's of 1000 or greater |
Group ID (GID) | The primary Group ID (GID); Group Identification Number stored in the /etc/group file | Is covered in detail in the chapter on Processes |
User Info | This field is optional and allows insertion of extra information about the user such as their name | For example: Rufus T. Firefly |
Home Directory | The absolute path location of user's home directory | For example: /home/rtfirefly |
Shell | The absolute location of a user's default shell | For example: /bin/ba |
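As a rough illustration of the seven colon-separated fields, here is a sketch that parses one /etc/passwd-style line. The sample line, the PasswdEntry type and the parse_passwd_line helper are all made up for this note; the UID threshold of 1000 comes from the table above.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    username: str
    password: str        # "x" means the hash lives in /etc/shadow
    uid: int
    gid: int
    user_info: str
    home_directory: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd-style line into its seven fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    username, password, uid, gid, user_info, home, shell = fields
    return PasswdEntry(username, password, int(uid), int(gid), user_info, home, shell)


# Hypothetical entry, not read from a real system.
sample = "rtfirefly:x:1000:1000:Rufus T. Firefly:/home/rtfirefly:/bin/bash"
entry = parse_passwd_line(sample)

print(entry.username, entry.uid, entry.home_directory)
print("normal user" if entry.uid >= 1000 else "root, predefined or system account")
```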
For a safe working environment, it is advised to grant the minimum privileges possible and necessary to accounts, and remove inactive accounts. The last utility can be used to identify potential inactive users.
root is the most privileged account on a Linux/UNIX system. This account has the ability to carry out ALL facets of system administration, and utmost care must be taken when using this account. root privilege is required for performing administration tasks such as restarting most services, manually installing packages and managing parts of the filesystem that are outside the normal user’s directories.
SUID, SGID, SBIT¶
SUID (Set owner User ID upon execution - similar to the Windows "run as" feature) is a special kind of file permission given to a file.
Use of SUID provides temporary permissions to a user to run a program with the permissions of the file owner (which may be root) instead of the permissions held by the user.
For example, I have x access to /usr/bin/passwd and passwd is owned by root. When I execute passwd I temporarily get root access so I can change /etc/shadow.
SUID can only be used on binary programs, NOT on shell scripts, and NOT on directories.
SGID can be used on binary programs and directories, NOT on shell scripts.
SBIT, the Sticky Bit, is only used on directories. When a user has wx access on a directory and creates a file under it, only this user or root can delete that file.
How to set these bits: SUID: 4, SGID: 2, SBIT: 1, e.g. chmod 4755 file_name
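To see how the leading octal digit shows up in a permission string, here is a sketch using Python's stat module. Nothing is changed on disk; the describe helper is a made-up name that only renders a numeric mode the way ls -l would.

```python
import stat

SUID, SGID, SBIT = 4, 2, 1  # values of the leading octal digit


def describe(mode: int, is_dir: bool = False) -> str:
    """Render a numeric permission mode as an ls -l style string."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


print(describe(0o4755))               # SUID on a regular file: s in the owner slot
print(describe(0o2755))               # SGID on a regular file: s in the group slot
print(describe(0o1777, is_dir=True))  # sticky bit on a directory: t in the other slot
print(bool(0o4755 & stat.S_ISUID))    # test whether a mode carries the SUID bit
```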
sudo¶
In Linux you can use either su (requires the root password, lets you stay root for as long as needed, limited logging trail) or sudo (requires the user's own password, temporary access, more detailed logging trail) to temporarily grant root access to a normal user.
sudo has the ability to keep track of unsuccessful attempts at gaining root access (usually logged in /var/log/secure). Users' authorization for using sudo is based on configuration information stored in the /etc/sudoers file and in the /etc/sudoers.d directory, which should be edited with the visudo command so that changes are validated.
sudo commands and any failures are logged in /var/log/auth.log under the Debian distribution family, and in /var/log/messages and/or /var/log/secure on other systems. A typical entry of the message for sudo contains: caller's username, terminal info, working dir, user account invoked, command & args.
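A hedged sketch of pulling those five pieces out of a sudo log entry. The sample line below is invented, and the pattern assumes the common "user : TTY=... ; PWD=... ; USER=... ; COMMAND=..." layout; the exact prefix and spacing can differ between syslog setups and sudo versions, so treat the regular expression as an assumption, not a specification.

```python
import re

# Assumed layout of the message part of a sudo log entry.
SUDO_ENTRY = re.compile(
    r"sudo:\s+(?P<caller>\S+) : "
    r"TTY=(?P<tty>\S+) ; "
    r"PWD=(?P<pwd>\S+) ; "
    r"USER=(?P<user>\S+) ; "
    r"COMMAND=(?P<command>.+)$"
)


def parse_sudo_entry(line: str):
    """Return the caller, tty, pwd, user and command, or None if no match."""
    match = SUDO_ENTRY.search(line)
    return match.groupdict() if match else None


# Made-up sample, not copied from a real log.
sample = (
    "Mar  3 10:15:42 host1 sudo:   alice : TTY=pts/0 ; PWD=/home/alice ; "
    "USER=root ; COMMAND=/usr/bin/systemctl restart cups"
)
fields = parse_sudo_entry(sample)
print(fields)
```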
sudo inherits the PATH of the user, not that of the full root user, so the directories /sbin and /usr/sbin are not searched when a user executes a command with sudo. It is best to add these two dirs to the PATH in the user's .bashrc.
passwords¶
On modern systems, passwords are actually stored in an encrypted (hashed) format in a secondary file named /etc/shadow. Only those with root access can read or modify this file.
Most Linux distributions rely on a modern password encryption algorithm called SHA-512 (Secure Hash Algorithm 512 bits), developed by the U.S. National Security Agency (NSA), to encrypt passwords. SHA-512 is widely used by security applications and protocols such as TLS, SSL, PHP, SSH, S/MIME and IPSec, and is one of the most tested hashing algorithms. Its CLI tool is sha512sum.
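For a feel of what SHA-512 produces, here is a sketch with Python's hashlib; it computes the same kind of plain digest that sha512sum prints for a file. Note the assumption being illustrated: this is the raw hash only. Entries in /etc/shadow use a salted, iterated scheme built on SHA-512, so they will not match this output. The input string and the helper name sha512_hex are arbitrary.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a hexadecimal string."""
    return hashlib.sha512(data).hexdigest()


digest = sha512_hex(b"correct horse battery staple")  # arbitrary example input

print(digest)
print(len(digest) * 4)  # each hex character encodes 4 bits of the digest
```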
chage can be used to configure the password expiry for users. Pluggable Authentication Modules (PAM) can be configured to automatically verify that a password created or modified using the passwd utility is sufficiently strong.
You can secure the boot process with a secure password to prevent someone from bypassing the user authentication step (such as by editing the bootloader configuration during boot). This should work in conjunction with password protection for the BIOS, which guards against what a bootloader password alone cannot stop (such as booting from alternative boot media, mounting the hard drives and viewing their contents).
You should NEVER edit /boot/grub/grub.cfg directly; instead, modify the configuration files in /etc/grub.d and /etc/default/grub, and then run update-grub or grub2-mkconfig to generate and save the new configuration file.
Physical Hardware Vulnerability¶
Physical access to a system makes it possible for attackers to easily leverage several attack vectors, in a way that makes all operating system level recommendations irrelevant. Some possible attacks:
- Key logging
- Recording the real time activity of a computer user including the keys they press. The captured data can either be stored locally or transmitted to remote machines.
- Network sniffing
- Capturing and viewing the network packet level data on your network.
- Booting with a live or rescue disk
- Remounting and modifying disk content.
The guidelines of enhancing security are:
- Lock down workstations and servers.
- Protect your network links so that they cannot be accessed by people you do not trust.
- Protect your keyboards where passwords are entered to ensure the keyboards cannot be tampered with.
- Ensure a password protects the BIOS in such a way that the system cannot be booted with a live or rescue DVD or USB key.
Process Isolation¶
Linux is considered to be more secure than many other operating systems because processes are naturally isolated from each other. One process normally cannot access the resources of another process, even when that process is running with the same user privileges.
More recent additional security mechanisms that limit risks even further include:
- Control Groups (cgroups) - Allows system administrators to group processes and associate finite resources to each cgroup.
- Containers - Makes it possible to run multiple isolated Linux systems (containers) on a single system by relying on cgroups.
- Virtualization - Hardware is emulated in such a way that not only processes can be isolated, but entire systems are run simultaneously as isolated and insulated guests (virtual machines) on one physical host.
Hardware Device Access¶
Linux limits user access to non-networking hardware devices in a manner that is extremely similar to regular file access.
Applications interact with devices by engaging the filesystem layer, which opens a device special file (aka device node) under /dev that corresponds to the device being accessed. Each device special file has standard owner, group and world permission fields. Security is naturally enforced just as it is when standard files are accessed.
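A quick way to see that a device node carries ordinary owner, group and world permission fields is to stat it like any other file. This sketch assumes /dev/null exists (it does on typical Linux systems); describe_device is a made-up helper name.

```python
import os
import stat


def describe_device(path: str) -> str:
    """Summarise the type, permissions and ownership of a path under /dev."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "character device"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block device"
    else:
        kind = "not a device node"
    permissions = stat.filemode(info.st_mode)  # same string ls -l would show
    return f"{path}: {kind}, {permissions}, uid={info.st_uid}, gid={info.st_gid}"


print(describe_device("/dev/null"))
```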
Linux System Troubleshoot¶
Syslog files¶
Syslog files log the timestamp, source IP, service name, and user actions.
They are useful in many ways:
- system side error debugging
- monitor service actions for abnormal activities
- fix network issues
Some of the most frequently accessed system logs:
- /var/log/cron: for crontab
- /var/log/dmesg: kernel checks at start up
- /var/log/lastlog: last login for each account
- /var/log/maillog: records the SMTP and POP3 providers' info and logs
- /var/log/messages: all system error info will be here
- /var/log/secure: logs for any actions to do with passwords
- /var/log/wtmp, /var/log/faillog: records of successful logins and failed login attempts
- /var/log/httpd/, /var/log/news/, /var/log/samba/: each service's own logs
System services related to logs:
- syslogd: for logging system and network info
- klogd: for logging anything from the kernel
- logrotate: for rotating and getting rid of old large log files
Other Misc. Linux Utilities¶
Printing¶
Printing itself requires software that converts information from the application you are using to a language your printer can understand. The Linux standard for printing software is the Common UNIX Printing System (CUPS).
CUPS uses a modular printing system which accommodates a wide variety of printers and also processes various data formats. It acts as a print server for both local and network printers. CUPS can be managed with the systemctl utility. The CUPS web interface is available in your browser at: http://localhost:631.
How CUPS works¶
The print scheduler reads server settings from several configuration files, commonly /etc/cups/cupsd.conf (system-wide settings, mostly related to network security and allow-listed devices), and /etc/cups/printers.conf (printer-specific settings).
CUPS stores print requests as files under the /var/spool/cups directory, where they are accessible before a document is sent to a printer. Data files are prefixed with the letter d while control files are prefixed with the letter c. Data files are removed after a printer handles a job successfully.
Log files are placed in /var/log/cups and are used by the scheduler to record activities that have taken place.
CUPS uses filters to convert job file formats to printable formats. Printer drivers contain descriptions for currently connected and configured printers, and are usually stored under /etc/cups/ppd/. The print data is then sent to the printer through a filter, and via a backend that helps to locate devices connected to the system.
Print from CLI¶
CUPS provides two command-line interfaces, lp (System V; actually a front-end to lpr) and lpr (BSD), which are useful in cases where printing operations must be automated. Some lp commands:
Command | Usage |
---|---|
lp <filename> | To print the file to default printer |
lp -d printer <filename> | To print to a specific printer (useful if multiple printers are available) |
program \| lp or echo string \| lp | To print the output of a program |
lp -n number <filename> | To print multiple copies |
lpoptions -d printer | To set the default printer |
lpq -a | To show the queue status |
lpadmin | To configure printer queues |
lpstat -p -d | To get a list of available printers, along with their status |
lpstat -a | To check the status of all connected printers, including job numbers |
cancel job-id OR lprm job-id | To cancel a print job |
lpmove job-id newprinter | To move a print job to new printer |
Print formats¶
PostScript is a standard page description language. It effectively manages scaling of fonts and vector graphics to provide quality printouts. It is purely a text format that contains the data fed to a PostScript interpreter. The format itself is a language that was developed by Adobe in the early 1980s to enable the transfer of data to printers. enscript is a tool that is used to convert a text file to PostScript and other formats.
PostScript has been for the most part superseded by the PDF format (Portable Document Format). Files can be converted from one format to the other with tools like pdf2ps, pdftops, and convert. Some other operations such as:
- Merging/splitting/rotating PDF documents
- Repairing corrupted PDF pages
- Pulling single pages from a file
- Encrypting and decrypting PDF files
- Adding, updating, and exporting a PDF’s metadata
- Exporting bookmarks to a text file
- Filling out PDF forms
can be done with tools like qpdf, pdftk, and gs (ghostscript). Some additional tools, pdfinfo, flpsed, and pdfmod, provide basic information-fetching and editing capabilities.
Tricks¶
Calculator in Terminal
bc can be a quick and light-weight calculator
- set scale = 4 to fix the division precision (number of digits after the decimal point) at 4
- quit to leave
check filesystem space
- df gives the overall filesystem usage
- du evaluates the filesystem usage of a certain directory
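A rough Python analogue of the two commands, for intuition only. The function names df and du are borrowed from the commands but are just helpers defined here. The real du reports allocated disk blocks, while this sketch sums apparent file sizes, so the numbers can differ.

```python
import os
import shutil
import tempfile


def df(path: str = "/") -> tuple:
    """Return (total, used, free) bytes for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def du(directory: str) -> int:
    """Sum the apparent sizes of regular files below directory."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):  # do not follow symlinks
                total += os.path.getsize(full_path)
    return total


# Build a throwaway directory so the example does not depend on real data.
with tempfile.TemporaryDirectory() as scratch:
    with open(os.path.join(scratch, "sample.bin"), "wb") as handle:
        handle.write(b"\0" * 4096)
    scratch_bytes = du(scratch)

print(df("/"))
print(scratch_bytes)
```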
create partitions
- fdisk
  - use fdisk [-l] device_name: with -l it shows the device's partitions; without -l it enters interactive mode (P264 for more info)
- df
  - use df pathname to find the name and usage of the hosting device
- It is best to partition in single-user mode
disk check
- fsck is a serious command to use when a filesystem has problems
  - actually calls e2fsck
  - must only be used when the inspected partition is unmounted
- badblocks can check whether the drive has broken sectors
  - badblocks -[svw] device_name
End of File
- [Ctrl]+[d] means End of File, End of Input. It can be used in place of entering the exit command
Format a partition
- mkfs
  - to format and make a filesystem
  - use mkfs [-t filesystem_format] device_name
  - typing mkfs[tab][tab] gives you a list of supported filesystem formats
- mke2fs
  - a very detailed and sophisticated command
  - can set the filesystem label, block size, one inode per N bytes, and journal system configuration
  - e.g. mke2fs -j -L "vbird_logical" -b 2048 -i 8192 /dev/hdc6
Linux X Window and Terminal Switching
- [Ctrl]+[Alt]+[F1]~[F6] are the pre-loaded tty1 ~ tty6 Terminal workspaces
- [Ctrl]+[Alt]+[F7] switches back to the X Window interface
- if started without X Window, can start it using the command startx
- To change run levels, change /etc/inittab (on systems that use SysV init)
mount/unmount a partition
- Things to ensure before mounting
  - a single filesystem should not be mounted at different mount points
  - a single directory should not mount multiple filesystems
  - directories used as mount points should be empty to begin with
- mount
  - mount -l shows mounted info
  - mount -a mounts all not-yet-mounted filesystems listed in /etc/fstab
  - mount [-t filesystem] [-L Label_name] [-o otheroptions] device_name mounting_point is the typical use of the command
  - mount -o remount,rw,auto / when root became read-only, use this to remount and make it writable again (saves a reboot)
- umount
  - umount [-fn] device_name[or]mounting_point
Mount at boot time
- Some limitations:
  - root '/' must be the first to mount
  - other mount points must be existing directories
  - each mount point can be used only once
  - each partition can be mounted only once
- /etc/fstab file
  - contents listed in order:
    - Device_label Mount_point filesystem parameters dump fsck
  - device_label can be checked using dumpe2fs
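The six-column layout above can be sketched as a tiny parser. The sample line, the FstabEntry type and parse_fstab_line are hypothetical; real fstab files may omit the last two columns (they then default to 0), which this simplified version rejects on purpose.

```python
from typing import NamedTuple, Optional


class FstabEntry(NamedTuple):
    device: str        # device name or label
    mount_point: str
    filesystem: str
    options: list      # the "parameters" column, split on commas
    dump: int
    fsck_order: int


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Parse one fstab-style line; return None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) != 6:  # simplification: require all six columns
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device, mount_point, filesystem, options, dump, fsck_order = fields
    return FstabEntry(
        device, mount_point, filesystem, options.split(","), int(dump), int(fsck_order)
    )


# Made-up example line, not taken from a real system.
sample = "LABEL=/home  /home  ext3  defaults,noatime  1  2"
entry = parse_fstab_line(sample)
print(entry)
```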
Softlink vs. hardlink
- use ln to make hard links
- use ln -s to make soft links
- a hardlink to a file shares the original's inode
  - hardlink has the same access rights as the original
  - the original inode exists as long as there is a pointer to this inode
  - content is not lost if the original file is deleted
- a softlink is just a pointer to another file
  - can span different filesystems
  - can work on directories
  - if the original file is deleted, the content is lost and the softlink becomes invalid
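The difference can be checked in a scratch directory. This sketch uses os.link and os.symlink, the system calls behind ln and ln -s; the file names are arbitrary, and it assumes the temporary directory lives on a filesystem that supports both kinds of link.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    original = os.path.join(scratch, "original.txt")
    hard = os.path.join(scratch, "hard.txt")
    soft = os.path.join(scratch, "soft.txt")

    with open(original, "w") as handle:
        handle.write("hello")

    os.link(original, hard)     # like: ln original.txt hard.txt
    os.symlink(original, soft)  # like: ln -s original.txt soft.txt

    # A hard link is a second name for the same inode.
    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink

    os.remove(original)

    # The inode survives through the hard link, so the content is still there.
    with open(hard) as handle:
        hard_content = handle.read()

    # The soft link still exists as a link, but its target is gone.
    soft_is_dangling = os.path.islink(soft) and not os.path.exists(soft)

print(same_inode, link_count, hard_content, soft_is_dangling)
```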
troubleshoot file system errors
Possible causes:
- abnormal shutdown, like a sudden cut off of power
- frequent hard disk access, overheating, high humidity
If the error happens in the partition /dev/sda7, then at boot time, when prompted to enter the root password or press ctrl-D, enter the root password, then run fsck /dev/sda7 to check for disk errors. If errors are found, enter Y to fix them, then reboot
If root is broken, unplug the hard disk and connect it to another working Linux machine
- do not mount that drive
- log in as root and execute fsck /dev/sdb1, assuming sdb1 is the broken disk
- the same thing can be done using a Linux bootable USB to rescue the disk
use Single User Mode to reset forgotten root password
- reboot; while the boot menu is counting down, press any key to enter the grub menu
- press [e] to enter grub editing mode
- move the cursor to the line starting with 'kernel' and add 'single' at the end of the line
- press [enter] to save
- press [b] to boot into single user maintenance mode
- enter passwd and type the new root password twice