These lecture notes are designed to support an intensive introduction to the Unix operating system.
You should already be familiar with the notion of an operating system by now (cf. the "Administration de Windows" course), but here is a tentative definition:
That is, the operating system presents computing hardware as an extended and virtual machine
Shells are regular, ordinary processes.
Several shells are available:
In this tutorial we will focus exclusively on the bash shell.
The prompt (description)
Commands, options, long, short options, abbreviated options
Special characters:
' " `cmd` | > ;
Exercise. Enter the following commands in the terminal and try to interpret the output. What are the following commands for?
echo hello world
passwd
date
ls
ls -l
hostname
uname -a
dmesg | less (you may need to press q to quit)
uptime
w
who
id
df
du -s -h /tmp
top (you may need to press q to quit)
echo $SHELL
echo {con,pre}-{sent,fer}/{s,ed}
man "automatic door"
man ls (you may need to press q to quit)
man who (idem)
man man (idem)
clear
cal 2014
bc -l (type quit or press Ctrl-d to quit)
echo 5+4 | bc -l
time sleep 5
history
For external commands, the main source of documentation is the manual, or the man pages. Use the command man to get access to the manual pages (manpages) of each (documented) command in the system. For instance, type:
man ls
The manual pages can be searched using the option -k (k for keyword) of the command man. For instance, search for all man pages whose description contains the keyword "search":
$ man -k search
apropos(1) - search the whatis database for strings
grep(1), egrep(1), ... - file pattern searcher
ldapsearch(1) - LDAP search tool
leaks(1) - Search a process's memory for unreferenced ...
lkbib(1) - search bibliographic databases
...
The manual is organized in sections:
Section | Description |
---|---|
1 | General commands |
2 | System calls |
3 | Library functions, covering in particular the C standard library |
4 | Special files (usually devices, those found in /dev) and drivers |
5 | File formats and conventions |
6 | Games and screensavers |
7 | Conventions, Protocols, and Miscellanea |
8 | Administration and privileged commands and daemons |
An online copy of most manpages available in Linux can be found at http://linux.die.net/man/.
For internal commands (or shell builtins) use the command help.
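As a small sketch of how to decide between man and help, the function below asks the shell how it would resolve a command name. The function name doc_hint is hypothetical, and command -v is a standard shell facility that these notes do not otherwise cover; aliases and shell keywords are simply treated like builtins here.

```shell
# doc_hint NAME: suggest where to look for documentation on NAME.
# (doc_hint is a hypothetical helper written for illustration only.)
doc_hint() {
    case "$(command -v "$1")" in
        /*) echo "man $1" ;;               # resolved to a file path: external command
        "") echo "unknown command: $1" ;;  # the shell cannot find it at all
        *)  echo "help $1" ;;              # builtin (also keywords and aliases)
    esac
}

doc_hint cd   # cd is a shell builtin
doc_hint ls   # ls is normally an external program
```

In bash you can then run the suggested command, e.g. help cd or man ls.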
Ordinary files
Directories
Devices
Hard and soft links
Directory | Function |
---|---|
/ | The root directory |
/bin | Essential command binaries that need to be available in single user mode; for all users, e.g., cat, ls, cp. |
/boot | Boot loader files, e.g., kernels, initrd. |
/dev | Essential devices, e.g., /dev/null. |
/etc | System-wide configuration files |
/home | Users' home directories, containing saved files, personal settings, etc. |
/lib | Libraries essential for the binaries in /bin/ and /sbin/. |
/media | Mount points for removable media such as CD-ROMs |
/mnt | Temporarily mounted filesystems. |
/opt | Optional application software packages. |
/proc | Virtual filesystem providing information about processes and kernel information as files. In Linux, corresponds to a procfs mount. |
/root | Home directory for the root user. |
/run | Information about the running system since last boot, e.g., currently logged-in users and running daemons. |
/sbin | Essential system binaries, e.g., init, ip, mount. |
/srv | Site-specific data which are served by the system. |
/tmp | Temporary files, often not preserved between system reboots. |
/usr | Secondary hierarchy for read-only user data; contains the majority of (multi-)user utilities and applications. |
/usr/bin | Non-essential command binaries (not needed in single user mode); for all users. |
/usr/include | Standard include files. |
/usr/lib | Libraries for the binaries in /usr/bin/ and /usr/sbin/. |
/usr/src | Source code, e.g., the kernel source code with its header files. |
/var | Variable files: files whose content is expected to change continually during normal operation of the system (logs, spool files, and temporary e-mail files). |
The following commands (with the exception of cat, more and less) only allow you to navigate, copy, move and remove files and folders.
The command pwd is an internal command of the shell. It prints the current working directory, that is, the directory relative to which relative paths are interpreted.
The current working directory is also usually visible in the shell's prompt:
cesar@gaia:/usr/bin$ pwd
/usr/bin
ls [-lihdR] [path1, path2, ...] -- List directory contents
cd [path] -- Change directory
cp source target -- Copy files
mv source destination -- Move files
rm path -- Remove files
mkdir dir -- Make new directories
rmdir dir -- Remove directories
cat [path1, path2, ...] -- Concatenate and print files
more [path...] and less [path...] -- Paginate large files in the screen
For all the above commands, see Section 2.4 of http://www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture2.html
The command find searches for files and directories within the tree(s) starting at one (or more) directory path(s). The argument [expression] instructs find which files are to be searched for.
Examples:
$ find /usr/
/usr/
/usr/bin
/usr/bin/env
/usr/bin/tzselect
/usr/bin/test
...
$ find /path/to/dir1 /path/to/dir2 ...
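The expression argument restricts which entries find prints. The following sketch builds a small throw-away tree (all file and directory names are made up for the illustration) and uses two standard tests, -type and -name, to select entries.

```shell
# Build a small scratch tree to search in (names are illustrative only).
tree=$(mktemp -d)
mkdir -p "$tree/src/lib" "$tree/doc"
touch "$tree/src/main.c" "$tree/src/lib/util.c" "$tree/doc/notes.txt"

# Regular files whose name ends in .c, anywhere below $tree.
c_files=$(find "$tree" -type f -name '*.c' | sort)
echo "$c_files"

# Directories only (the starting directory itself is included).
dir_count=$(find "$tree" -type d | wc -l)
echo "directories: $dir_count"

rm -rf "$tree"
```

The pattern given to -name is quoted so that the shell passes it unchanged to find instead of expanding it itself.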
We will revisit the command find later in this chapter.
The command du (disk usage) recursively estimates the size of the files present in the subtree rooted at the given path.
Important options: -s -h
Example:
$ du -sh /tmp
82M /tmp/
The command df displays the free and used space within a disk.
Important options: -h
Every file and directory in the file system is uniquely identified by an index node, usually abbreviated as inode. Option -i of the command ls displays the inode number:
$ ls -li /usr/bin/
total 19424
396972 -rwxr-xr-x 1 root root 6216 Dec 11 2012 addpart
402848 lrwxrwxrwx 1 root root 6 Jun 18 2012 apropos -> whatis
397423 -rwxr-xr-x 1 root root 122080 Oct 17 2014 apt-cache
...
In the UNIX file system it is possible to make the same file appear in two directories. This is called a hard link, and it is possible thanks to the internal structure in which directories are stored. We use the command ln to achieve this:
ln original destination
Observe that this does not create a copy of the file original; it only creates a second directory entry that refers to the same file. Both the original and the destination entry share the same inode, in contrast to copying a file with cp.
The UNIX file system provides a second way of creating directory entries that refer to the same file. These are called soft links:
ln -s original destination
In this case ln creates a new file (new inode, new directory entry), but the file is marked with a special flag and stores only the path (absolute or relative, exactly as given to ln) used to reach the original file. Soft links are very similar to Windows' shortcuts.
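To observe the difference between the three kinds of "copies", one can compare inode numbers. The sketch below works in a scratch directory; the helper inode and all file names are hypothetical, and readlink (which prints the path stored in a soft link) is a widely available tool not otherwise covered in these notes.

```shell
work=$(mktemp -d)
echo "some data" > "$work/original"

cp "$work/original" "$work/copy"   # independent file with its own inode
ln "$work/original" "$work/hard"   # second directory entry, same inode
ln -s original "$work/soft"        # new file storing the path "original"

# inode PATH: print the inode number of PATH itself (hypothetical helper).
inode() { ls -di "$1" | awk '{print $1}'; }

ino_original=$(inode "$work/original")
ino_copy=$(inode "$work/copy")
ino_hard=$(inode "$work/hard")
ino_soft=$(inode "$work/soft")
soft_target=$(readlink "$work/soft")

echo "original=$ino_original hard=$ino_hard copy=$ino_copy soft=$ino_soft"
echo "soft link stores the path: $soft_target"

rm -rf "$work"
```

Only original and hard print the same inode number; the soft link stores the relative path it was given and is resolved relative to the directory that contains the link.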
Exercise. Copy a file with cp, with ln (hard link) and with ln -s (soft link). What happens if you now remove the original file? And the destination file?
Mounting and unmounting disks makes it possible to insert and remove subtrees from the root file system (main tree).
Run the command mount without arguments to display the mounted file systems:
$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=13785,mode=755)
/dev/disk/by-uuid/bdc...07f on / type ext4 (rw,relatime,...)
...
The file /etc/fstab contains a list of devices and mount points known to the system. Traditionally, this file was used to know where to automatically mount removable media such as CD-ROMs or floppy disks:
$ cat /etc/fstab
/dev/fd0 /mnt/floppy auto rw,user,noauto 0 0
/dev/hdc /mnt/cdrom iso9660 ro,user,noauto 0 0
mount [-t vfstype] [-o options] device dir
Attaches the file system stored in the disk designated by the file device to the directory dir (mount point).
The command umount detaches a currently attached file system, given either its mount point or the device storing the file system.
It is very important to unmount (non read-only) file systems before removing the associated device from the system (USB disk, floppy, etc).
Each file and directory in a UNIX file system belongs to a user and to a group.
It additionally contains metadata stating whether the owner user, the owner group, or any other user in the system can read, write, or execute it.
Finally, two more metadata flags are associated to every file:
Only the owner of a file (or the superuser, root) can update its access rights.
See Section 3.2 in http://www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture3.html for details.
Relevant commands:
Important inode metadata fields:
chmod mode path -- Change file mode bits
Examples:
chown cesar /tmp
chown :floppy /tmp /bin/ls
chown cesar:audio /bin/cp
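The chown examples above normally require root privileges. chmod, in contrast, can be tried safely on a file you own. The sketch below uses a temporary file and shows both the octal and the symbolic way of writing a mode; the variable names are illustrative only.

```shell
f=$(mktemp)

chmod 644 "$f"                            # octal form: rw- for owner, r-- for group and others
mode_initial=$(ls -l "$f" | cut -c1-10)   # first 10 characters: type and permissions
echo "$mode_initial"

chmod u+x "$f"                            # symbolic form: add execute permission for the owner
chmod go-r "$f"                           # remove read permission for group and others
mode_final=$(ls -l "$f" | cut -c1-10)
echo "$mode_final"

rm -f "$f"
```

The first character of the mode string printed by ls -l is the file type (- for an ordinary file), followed by three permission triples for owner, group, and others.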
The shell interprets certain characters in a special way. Among those we find the following:
Examples:
cat [path...] -- Concatenate and print files
file [path...] -- Guesses the type of a file
hexdump -C [path...] -- Prints in hexadecimal a binary file
head -n number [path...] -- Prints the first number lines of a file
tail -n number [-f] [path...] -- Prints the last lines of a file
wc [-c] [-w] [-l] [path...] -- Prints number of characters, words, lines in a file
sort [-n] [-r] [-u] [path...] -- Sorts the lines of a text file
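These small tools are usually combined. As a sketch, the commands below run sort, head, tail and wc on a temporary file containing a few made-up numbers, one per line.

```shell
data=$(mktemp)
printf '%s\n' 42 7 19 7 3 > "$data"

smallest=$(sort -n "$data" | head -n 2)     # two smallest values, numeric order
largest=$(sort -n -r "$data" | head -n 1)   # largest value (reverse numeric order)
distinct=$(sort -n -u "$data" | wc -l)      # number of distinct values
last_line=$(tail -n 1 "$data")              # last line of the file, as stored
line_count=$(wc -l < "$data")               # total number of lines

echo "smallest: $smallest"
echo "largest: $largest, distinct: $distinct, last: $last_line, lines: $line_count"

rm -f "$data"
```

Without -n, sort compares lines as text, so 19 would be placed before 3.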
Creating a tar file:
tar cvf file.tar folder1/ folder2/
This creates the file file.tar.
Listing the contents of a tar file:
tar tvf file.tar
Extracting the files packed in a tar file:
tar xvf file.tar
This will recreate the directory structure packed into the tar file.
Any file can be compressed with the gzip and gunzip tools:
$ ls -lh
total 112K
-rw-r--r-- 1 cesar cesar 112K Sep 9 18:52 file.txt
$ gzip file.txt
$ ls -lh
total 52K
-rw-r--r-- 1 cesar cesar 51K Sep 9 18:52 file.txt.gz
Uncompressing:
$ gunzip file.txt.gz
$ ls -lh
total 112K
-rw-r--r-- 1 cesar cesar 112K Sep 9 18:52 file.txt
The tar tool can also automatically compress and decompress tar files with the flag z. For instance, creating and compressing a tar file, and subsequently decompressing it (in the /tmp directory):
tar czvf file.tar.gz folder1/ folder2/
tar xzvf file.tar.gz -C /tmp
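The full cycle (create, list, extract) can be tried without touching your own files. The sketch below works entirely inside a scratch directory; the names backup.tar.gz, src and dest are made up for the illustration, and the v flag is omitted to keep the output short.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/folder1" "$work/dest"
echo "hello" > "$work/src/folder1/a.txt"

# Create a compressed archive; -C makes tar change directory first,
# so the archive stores the relative path folder1/a.txt.
tar czf "$work/backup.tar.gz" -C "$work/src" folder1

# List the archive contents without extracting anything.
listing=$(tar tzf "$work/backup.tar.gz")
echo "$listing"

# Extract into another directory and check that the file came back.
tar xzf "$work/backup.tar.gz" -C "$work/dest"
restored=$(cat "$work/dest/folder1/a.txt")
echo "restored content: $restored"

rm -rf "$work"
```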
The commands zip and unzip can produce and unpack the usual .zip files that are popular among Windows users.
A regular expression is an expression that denotes a set of character strings. Just like an arithmetic expression is the result of combining basic operators (such as + or /) with basic terms (numbers) to denote a number, a regular expression combines basic operators (see below) with basic terms (strings) to denote a set of strings.
Often regular expressions are evaluated over the lines of a text file, with the purpose of discarding the lines not denoted by the expression.
The tool grep prints the lines of a file that match a given regular expression:
grep [-iqvR] [-ABC num] [--color] expression [files...]
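A few typical invocations, as a sketch on a made-up sample file (its lines only imitate the layout of /etc/passwd). The anchors ^ and $ match the beginning and the end of a line, respectively.

```shell
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' 'cesar:x:1087:1087' '# comment' 'Daemon:x:1:1' > "$sample"

starts_with_c=$(grep '^c' "$sample")            # lines beginning with c
ends_with_zero=$(grep ':0$' "$sample")          # lines ending in :0
ignoring_case=$(grep -i 'daemon' "$sample")     # -i: ignore upper/lower case
not_comments=$(grep -v '^#' "$sample" | wc -l)  # -v: lines that do NOT match

# -q prints nothing; only the exit status tells whether a line matched.
if grep -q '^root:' "$sample"; then root_present=yes; else root_present=no; fi

echo "$starts_with_c"
echo "$ends_with_zero"
echo "$ignoring_case"
echo "non-comment lines: $not_comments, root present: $root_present"

rm -f "$sample"
```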
adduser username -- Interactively adds a user to the system
addgroup group -- Adds a group to the system
passwd -- Modifies user's password
To remove users and groups, use the commands deluser and delgroup.
Processes thus form a tree, where the root of the tree is the so-called init process, with PID 1.
Here are the main attributes that each process has:
Field | Description |
---|---|
PID | Process ID, an integer number that uniquely identifies the process |
PPID | Process ID of the parent process |
State | Scheduling state, essentially runnable, sleeping, or stopped (see below) |
Priority | Positive or negative integer affecting the process scheduling |
UID | ID of the user owning the process |
File descriptor table | Table describing the files currently opened by the process |
Environment variables | List of pairs variable=value communicated to the process from the parent process |
PGID | Process Group ID, an integer number uniquely identifying the process group to which the process belongs |
SID | Session ID, unique integer identifying the session |
Controlling Terminal | The terminal from which the process was started. Relevant for signal management. |
The attribute state can have the following values:
State | Description |
---|---|
R | Running or runnable (on run queue) |
D | Uninterruptible sleep (usually IO) |
S | Interruptible sleep (waiting for an event to complete) |
T | Stopped, either by a job control signal or because it is being traced. |
The priority is an integer ranging from -20 to 19. The (numerically) lower this value is, the more favourably the operating system scheduler will treat the process in the presence of other processes in the runnable state.
The UID is used to determine the process permission to, for instance, access files.
The content of the file descriptor table and the environment variables of a given process can be retrieved from the /proc file system (as well as the values of most of its attributes):
See the manpages proc(5) and ps(1) for more information.
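As a sketch, assuming a Linux system with /proc mounted, the commands below inspect the shell's own entry under /proc. The special parameter $$ expands to the PID of the shell; the helper name proc_ppid is hypothetical.

```shell
# proc_ppid PID: print the parent PID recorded in /proc/PID/status (Linux only).
proc_ppid() {
    awk '/^PPid:/ {print $2}' "/proc/$1/status"
}

if [ -r "/proc/$$/status" ]; then
    echo "PID of this shell:    $$"
    echo "PID of its parent:    $(proc_ppid "$$")"

    # One symbolic link per open file descriptor.
    ls -l "/proc/$$/fd"

    # Environment received at start-up; entries are separated by NUL bytes.
    tr '\0' '\n' < "/proc/$$/environ" | head -n 3
else
    echo "no /proc entry for this shell (not a Linux system?)"
fi
```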
A process belongs to a process group, identified by the attribute PGID. A session is a set of process groups. Each process group belongs to exactly one session. A session is associated with zero or one controlling terminal (every terminal is associated with exactly one session).
These three notions are mechanisms offered by Unix to implement useful features of the system. For instance:
A process (belonging to a group which is in a session) without a controlling terminal is called a daemon.
Whenever a process opens a file, the operating system assigns a number to the file and returns it to the process, which will use it to identify the opened file in subsequent operations. This integer is called the file descriptor.
File descriptors are indexes into a table which maps them to opened files. The same file can be opened twice, thus being accessible via two file descriptors. This table is the file descriptor table.
By default, the first three file descriptors (indexes 0, 1, and 2 of the table) are open and have well-known names: standard input (stdin, descriptor 0), standard output (stdout, descriptor 1), and standard error (stderr, descriptor 2).
When the shell forks a new command, by default all three descriptors access a single file: the device file associated with the terminal, a special character-oriented device file located in the /dev/ directory. (In turn, this device file represents the keyboard and the screen for a physical user, or the endpoints of a TCP connection for a remote user; other configurations are possible.)
The shell provides mechanisms to change this default behaviour, called redirections and pipes.
Redirect standard output to file out; leave unmodified standard input and error output (i.e., they are still attached to the terminal device):
$ COMMAND > out
Read standard input from file f instead of the terminal (the keyboard); leave unmodified standard output and error output:
$ COMMAND < f
Redirect standard error output:
$ COMMAND 2> out
Of course we can combine the previous operators:
$ COMMAND < in > out
Other redirections are possible:
Redirection | Explanation |
---|---|
[n]< file | Open file for reading and attach it to descriptor n; n is 0 (stdin) if missing |
[n]> file | Truncate file and redirect descriptor n to it; n is 1 (stdout) if missing |
[n]>> file | Open file for appending and redirect descriptor n to it; n is 1 (stdout) if missing |
[n]>&m | Make descriptor n a copy of output descriptor m; n is 1 (stdout) if missing |
(More information in the bash(1) manpage, section "Redirections".)
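The operators of the table can be tried with a command that writes to both outputs. In the sketch below, the shell function both is a hypothetical stand-in for such a command, and all file names live in a scratch directory.

```shell
work=$(mktemp -d)

# Stand-in command: one line to standard output, one to standard error.
both() { echo "to stdout"; echo "to stderr" >&2; }

both > "$work/out" 2> "$work/err"   # stdout and stderr to separate files
both >> "$work/out" 2> /dev/null    # append stdout; discard stderr
both > "$work/all" 2>&1             # stderr becomes a copy of (redirected) stdout

out_lines=$(wc -l < "$work/out")    # two runs appended one line each
err_text=$(cat "$work/err")
all_lines=$(wc -l < "$work/all")    # both lines ended up in the same file

echo "out: $out_lines lines, err: '$err_text', all: $all_lines lines"

rm -rf "$work"
```

Redirections are processed from left to right, so the order matters: writing 2>&1 before > file would duplicate the original standard output (the terminal) instead of the file.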
Apart from redirecting input or output to a file, it is also possible to redirect inputs or outputs to another process, using a pipe.
Redirect the standard output of cmd1 to the standard input of cmd2; leave unmodified the other inputs or outputs of both processes:
$ cmd1 | cmd2
The shell will create one process for executing the program cmd1 and another one for executing cmd2. The operating system provides a specific system call (see manpage pipe(2)) giving access to a channel-like data structure that makes this functionality possible. The shell will connect both ends of this channel to the corresponding descriptors of both processes and then will let each process execute the corresponding program (which will transparently use the pipe instead of the terminal).
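Only standard output travels through the pipe; standard error stays attached to wherever it was. A minimal sketch, where the shell function noisy is a hypothetical command writing to both outputs:

```shell
# Stand-in command: one line to standard output, one to standard error.
noisy() { echo "visible to the pipe"; echo "bypasses the pipe" >&2; }

# Only the stdout line reaches tr; stderr is discarded here to keep the terminal quiet.
upper=$(noisy 2> /dev/null | tr '[:lower:]' '[:upper:]')
echo "$upper"

# After 2>&1, stderr is a copy of stdout, so both lines go through the pipe.
both=$(noisy 2>&1 | wc -l)
echo "lines through the pipe: $both"
```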
UNIX provides a mechanism to communicate a number of pairs of form variable=value to a process. This is usually employed to transfer configurations to a newly created process. Usually a process receives at least the following variables:
See environ(7) for more information.
A copy of this environment table is passed to every child process forked by a process. A process can, before or after forking new processes, modify this table using specific functions of the C library.
The shell provides a mechanism to store and update the so-called shell variables, a list of pairs variable=value different from the environment table. You set a shell variable using the syntax:
$ TEST="hello world"
$ echo $TEST
hello world
The variable is subsequently available using the syntax $VAR, as shown above. A shell variable is by default not copied to the shell's environment table, and so it is not visible in the environment table of subsequently forked processes.
However, a variable can be copied to the environment table. The internal command export copies a shell variable to the shell's environment, thus modifying the environment table that subsequent commands will receive:
$ TEST="hello world"
$ env | grep TEST
$ export TEST
$ env | grep TEST
TEST=hello world
$ export -n TEST
$ env | grep TEST
The (external) command env, run without arguments, allows you to see the environment being passed to a command. Observe that export -n removes the variable from the shell's environment table.
It is also possible to unset a shell variable (it will also be removed from the shell's environment):
$ unset TEST
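The same experiment can be written as a sketch that asks a child process what it actually received. The variable name GREETING is made up; the expansion ${GREETING-not set} yields the text "not set" when the variable does not exist in the child.

```shell
unset GREETING
GREETING="hello world"     # shell variable only, not in the environment

before=$(sh -c 'echo "${GREETING-not set}"')   # the child does not see it

export GREETING            # copy it to the environment table
after=$(sh -c 'echo "${GREETING-not set}"')    # now the child receives it

unset GREETING             # removed from the shell and from its environment
final=$(sh -c 'echo "${GREETING-not set}"')

echo "before export: $before"
echo "after export:  $after"
echo "after unset:   $final"
```

The single quotes matter: they prevent the parent shell from expanding the variable itself, so the expansion really happens in the child.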
ps [axuj] [PID] -- Report a snapshot of the current processes
By default ps only selects processes of the calling user that have a controlling terminal.
In combination, both options (ax) select all running processes. Options u and j select the columns to display:
pstree -- Visually display the tree of processes
top -- Interactively display running processes
nice -n prio command -- run a program with modified scheduling priority
kill -9 PID -- send signal 9 (KILL) to process PID
killall -9 name -- kill a process by name
sleep nr -- waits for nr seconds
true and false (internal commands)
By default, whenever the user executes a command, the shell waits for the command to finish before it shows the prompt again, asking for a new command. We say that such a command is a foreground command. The shell also offers a mechanism to launch background commands. In this case, the command will be executed in the background and the shell will immediately display the prompt, potentially before the command finishes. This is useful, e.g., to run commands that take a long time to execute.
The operator & tells the shell to start a background command. The internal command jobs lists the currently running commands:
$ find / > output.txt 2> errors.txt &
[1] 14785
$ jobs
[1]+ Running find / > output.txt 2> errors.txt &
$ ps u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1087 14785 10.4 0.2 11904 1180 pts/3 D 14:55 0:02 find /
1087 14799 0.0 0.6 79396 3040 pts/3 R+ 14:55 0:00 ps u
1087 22536 0.0 1.3 82920 6736 pts/3 Ss 11:34 0:00 -bash
Observe that immediately after entering the find command, the shell launches job number [1] and prints the PID of the process (which we see again in the output of ps).
A command launched in background can be brought to foreground using the internal command fg:
$ jobs
[1]+ Running find / > output.txt 2> errors.txt &
$ fg 1
find / > output.txt 2> errors.txt
Observe that now the shell does not show the prompt and instead waits for find to finish.
It is also possible to put in the background a command launched as a foreground process (using the bg command). To learn about it we first need to learn about suspending (temporarily stopping) the execution of commands and also about signals (next chapter).
When a process in the background terminates, the shell reports it in the terminal. The process could terminate naturally, or it could be killed by a signal:
$ find / > output.txt 2> errors.txt &
[1] 16585
$ kill 16585
[1]+ Terminated find / > output.txt 2> errors.txt
It can also be brought to foreground, with fg, and terminated with Ctrl+C.
Process groups are the enabling mechanism that UNIX provides to implement foreground and background process management. Without entering into details, each shell job (foreground or background) corresponds to a process group. Recall that a command could require launching several processes, for instance a command using pipes. It therefore corresponds to a process group, not only to one process.
The bash shell provides relevant information about the notions learned in this chapter via certain shell (pseudo-)variables (which it automatically updates):
Examples:
$ sleep 100 &
[1] 18399
$ sleep 200 &
[2] 18400
$ echo $!
18400
$ false
$ echo $?
1
$ true
$ echo $?
0
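The two variables shown above combine naturally in scripts: $! remembers the PID of a background command so that its exit status can be collected later. The sketch below uses the internal command wait, which these notes do not otherwise cover; the child command and its exit code 3 are made up for the illustration.

```shell
# Background command that finishes after one second with exit code 3.
sh -c 'sleep 1; exit 3' &
pid=$!                       # PID of the most recent background command

echo "started background process $pid"

# wait blocks until that process finishes and returns its exit status.
status=0
wait "$pid" || status=$?

echo "process $pid finished with exit code $status"
```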
Signals are a UNIX mechanism for asynchronous inter-process communication. A signal is a notification sent to a process that interrupts its normal flow of execution to run the signal handler associated with the signal, if one is registered.
Signals are sent in UNIX for various reasons, including:
The manpage signal(7) provides a general introduction to signal management in UNIX, as well as additional pointers to the manual.
Signal | Value | Action | Comment |
---|---|---|---|
SIGHUP | 1 | Term | Hangup detected on controlling terminal or death of controlling process |
SIGINT | 2 | Term | Interrupt from keyboard |
SIGQUIT | 3 | Core | Quit from keyboard |
SIGILL | 4 | Core | Illegal Instruction |
SIGABRT | 6 | Core | Abort signal from abort(3) |
SIGFPE | 8 | Core | Floating point exception |
SIGKILL | 9 | Term | Kill signal |
SIGSEGV | 11 | Core | Invalid memory reference |
SIGPIPE | 13 | Term | Broken pipe: write to pipe with no readers |
SIGALRM | 14 | Term | Timer signal from alarm(2) |
SIGTERM | 15 | Term | Termination signal |
SIGUSR1 | 30,10,16 | Term | User-defined signal 1 |
SIGUSR2 | 31,12,17 | Term | User-defined signal 2 |
SIGCHLD | 20,17,18 | Ign | Child stopped or terminated |
SIGCONT | 19,18,25 | Cont | Continue if stopped |
SIGSTOP | 17,19,23 | Stop | Stop process |
SIGTSTP | 18,20,24 | Stop | Stop typed at tty |
SIGTTIN | 21,21,26 | Stop | tty input for background process |
SIGTTOU | 22,22,27 | Stop | tty output for background process |
kill -signal pid -- send signal signal (number or name) to process pid
trap handler signal -- sets shell function handler as handler of signal signal (number or name)
We already know that typing Ctrl+C while a command executes in the foreground normally terminates the command. In fact, what happens in this case is that the kernel sends a SIGINT signal to all processes in the foreground process group associated with that terminal. The default action for SIGINT is to terminate the process, although it is possible to define an alternative handler or even ignore this signal.
In contrast, the signal SIGKILL cannot be ignored or caught, and will invariably terminate the process.
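The commands trap and kill can be tried together without any risk for your interactive shell by running the experiment in a child shell. In the sketch below, the function name on_usr1 and the messages are made up; inside the child, $$ is the child's own PID, so the child sends SIGUSR1 to itself.

```shell
# With a handler installed, the signal runs the handler and execution continues.
handled=$(sh -c '
    on_usr1() { echo "caught SIGUSR1"; }
    trap on_usr1 USR1      # register the handler
    kill -USR1 $$          # send the signal to this very shell
    echo "still running"   # reached: the handler replaced the default action
')
echo "$handled"

# Without a handler, the default action (Term) applies: the child shell is
# terminated before it can print anything.
unhandled=$(sh -c 'kill -USR1 $$; echo "not reached"' 2> /dev/null) || true
echo "output without handler: '$unhandled'"
```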
Processes in UNIX can also be suspended (temporarily paused) and resumed by means of, respectively, the signals SIGTSTP and SIGCONT. Upon reception of SIGTSTP, the process stops executing indefinitely. The process can choose to ignore or catch (define a handler for) SIGTSTP, thus overriding its default behaviour. Signal SIGSTOP also stops the process, but cannot be caught or ignored. Process execution resumes upon reception of SIGCONT.
There are several ways of sending signals SIGTSTP and SIGCONT to a process. One way is of course manually doing it with the kill command. UNIX (more specifically, the kernel together with the shell) provides a less cumbersome way:
Example:
$ sleep 200
^Z
[1]+ Stopped sleep 200
$ jobs
[1]+ Stopped sleep 200
$ bg 1
[1]+ sleep 200 &
$ fg 1
sleep 200
^C
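The manual route, using kill instead of Ctrl+Z, can be sketched as follows. SIGSTOP is used because it cannot be caught. The sketch assumes a ps that supports the stat output column (as the procps and BSD versions do); the one-second pauses only give the kernel time to deliver the signals.

```shell
sleep 30 &                   # a harmless background process to experiment with
pid=$!

kill -STOP "$pid"            # suspend it
sleep 1
stopped=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "state after SIGSTOP: $stopped"    # expected to start with T (stopped)

kill -CONT "$pid"            # resume it
sleep 1
resumed=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "state after SIGCONT: $resumed"    # expected to start with S (sleeping)

kill "$pid"                  # clean up: terminate with the default signal, SIGTERM
wait "$pid" 2> /dev/null || true
```

The letters printed are the process states listed in the state table of the previous chapter.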
netstat [-tuanlp] -- Print network connections, routing tables, interface statistics, etc.
netcat -- Establish arbitrary TCP and UDP connections and listen for incoming ones
Can act as a TCP client and server. As a client:
netcat -v www.google.fr 80
As a server:
netcat -vnl 1234
ifconfig -a
ifconfig eth0 [up|down]
ifconfig eth0 192.168.0.2 netmask 255.0.0.0
route -n
route add default gw 192.168.1.1 eth0
dig /etc/resolv.conf
dhclient eth0
arp -na
ssh [-Xv] [-p port] user@host
The Debian package management tools are essentially composed of two main collections of tools:
Packages are files with extension .deb. They contain
The dpkg(1) program is the actual tool that performs basic operations with the package, including
The APT toolset mainly includes two tools, apt-get(1) and apt-cache(1). While the first is the tool used to install and uninstall packages, together with the packages they depend on, the second serves to query the package list currently known by your system.
Here we give a list of the most commonly used commands:
Displaying the list of installed packages:
dpkg -l
Displaying the list of (installed) files for the package wget:
dpkg -L wget
Searching for the keyword tcp in the (local) list of candidate packages for installation:
apt-cache search tcp
Displaying the (control) information for the package netcat:
apt-cache show netcat
Updating the list of packages known by your system:
apt-get update
Installing package netcat:
apt-get install netcat
Uninstalling package python3:
apt-get remove python3
Updating to its latest version any package currently installed:
apt-get upgrade
Quite often the problem is not actually installing a package, but knowing which package provides the command you are looking for. Fortunately, Debian and Ubuntu each provide a search engine that looks inside packages, where you can search for the packages providing files of a given name:
Follow http://www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture8.html.
Exercise. Write a script that receives only one argument, a path to a directory. Your script will determine the type of each file recursively found in the directory, with the following simplifications:
Type of file | Your script will print |
---|---|
Ordinary files | "ordinary" |
Directories | "directory" |
Device (character) | "char" |
Device (block) | "block" |
Soft link | "link" |
Any other type | "other" |
The output format of the script should look like this:
directory .
ordinary ./script.sh
directory ./mydir/
link ./mydir/mylink
char ./mydir/ttys003
...
In the following situations, your script should print an error message to the standard error output and terminate with exit code 1: