Linux Disk Management

Latest revision as of 08:57, 8 May 2024

Related pages

References

Benchmarks

http://www.ilsistemista.net/index.php/virtualization/47-zfs-btrfs-xfs-ext4-and-lvm-with-kvm-a-storage-performance-comparison.html?limitstart=0

LVM

LVM snapshots explained

Very detailed and practical article on how LVM snapshots work.

How to create snapshots of LVM thin volumes using Snapper

snapper is a simple tool to manage LVM snapshots (create, compare).

SSD Management

See SSD Tuning for Linux.

Devices and Partitions

Some GUI software:

gparted

Some CLI software:

fdisk
sfdisk
parted
gdisk (to deal with GPT partition tables, see this link from Microsoft for more info)

Typically, to view all devices and partitions:

lsblk                                               # View ALL block devices (incl. not formatted ones)
# NAME                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
# loop0                 7:0    0  97,9M  1 loop  /snap/core/10444
# loop1                 7:1    0  97,8M  1 loop  /snap/core/10185
# nvme0n1             259:0    0   477G  0 disk  
# ├─nvme0n1p1         259:1    0   499M  0 part  /boot/efi
# ├─nvme0n1p5         259:2    0   954M  0 part  /boot
# └─nvme0n1p6         259:3    0 475,5G  0 part  
#   └─nvme0n1p6_crypt 254:0    0 475,5G  0 crypt 
#     ├─crypt-root    254:1    0  37,3G  0 lvm   /
#     ├─crypt-swap    254:2    0   7,5G  0 lvm   [SWAP]
#     └─crypt-home    254:3    0 428,3G  0 lvm   /home
lsblk -f                                            # Same, but show filesystem type, label and UUID
# NAME                FSTYPE      LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
# loop0               squashfs                                                       0   100% /snap/core/10444
# loop1               squashfs                                                       0   100% /snap/core/10185
# nvme0n1
# ├─nvme0n1p1         vfat        BOOT  5E11-D01C                               447,4M    10% /boot/efi
# ├─nvme0n1p5         ext2        boot  47d99617-71ac-4759-853d-162da25ff551    773,3M    13% /boot
# └─nvme0n1p6         crypto_LUKS       dfce6a71-3d86-424f-8076-ca7349dea331
#   └─nvme0n1p6_crypt LVM2_member       SeZtDO-IGwe-Senj-3gsc-VJIH-iYNX-oYqR2i
#     ├─crypt-root    ext4        root  73d7c4ef-4e20-4b88-af6f-ada78f498b33      5,7G    79% /
#     ├─crypt-swap    swap              28842d82-9e34-47f0-9026-7f16c813fe70                  [SWAP]
#     └─crypt-home    ext4        home  ec59bb43-12a4-4b3d-92bd-8466f8a90c53     40,4G    85% /home

sudo fdisk -l                                       # View ALL devices and partitions
sudo sfdisk -l                                      # idem

Some examples:

$ sudo fdisk -l /dev/sda                            # Show partition table for device /dev/sda
$ sudo fdisk -l -u /dev/sda                         # ... using sector as unit
$ sudo parted -l                                    # Show partition table of all devices
$ sudo parted /dev/sda print                        # ... of only device /dev/sda
$ sudo parted /dev/sda unit cyl print               #     ... using cylinder as unit
$ sudo parted /dev/sda unit s print                 #     ... using sector as unit (more accurate)
$ sudo sfdisk -l -uS /dev/sda                       # Show partition table for device /dev/sda
$ sudo sfdisk -d /dev/sda >sda-sfdisk.dump          # Dump partition in a format that can be understood by sfdisk
$ sudo sfdisk /dev/sda <sda-sfdisk.dump             # Restore a dumped partition table
$ sudo dd if=/dev/sda of=sda.mbr bs=512 count=1     # Save the complete MBR (table + boot code)
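
The 512-byte image saved by the last command has a fixed layout: 446 bytes of boot code, a 64-byte partition table (four 16-byte entries), and the 2-byte boot signature 0x55 0xAA. The sketch below splits such an image into its parts. It works on a synthetic stand-in file so that it can be run without touching a real disk, and the file names bootcode.bin and table.bin are assumptions chosen for illustration.

```shell
# sda.mbr is a synthetic stand-in for the file produced by:
#   sudo dd if=/dev/sda of=sda.mbr bs=512 count=1
workdir=$(mktemp -d)
mbr="$workdir/sda.mbr"
head -c 510 /dev/zero > "$mbr"      # empty boot code + empty partition table
printf '\125\252' >> "$mbr"         # boot signature 0x55 0xAA (octal escapes)

# Bytes 0-445: boot code only
dd if="$mbr" of="$workdir/bootcode.bin" bs=1 count=446 2>/dev/null
# Bytes 446-509: the four 16-byte primary partition entries
dd if="$mbr" of="$workdir/table.bin" bs=1 skip=446 count=64 2>/dev/null
# Bytes 510-511: boot signature, printed as hex
signature=$(dd if="$mbr" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
echo "boot signature: $signature"
```

On a real disk, writing back only the first 446 bytes (dd if=sda.mbr of=/dev/sda bs=446 count=1) restores the boot code while leaving the current partition table untouched.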

Use partprobe to force the kernel to re-read the partition table (see [1]). Alternatively, use fdisk to rewrite the same partition table, which also forces a re-read. There are more solutions too ([2]):

$ sudo partprobe
# Or use fdisk
$ sudo fdisk /dev/sda
Command: v
Command: w
# Or use blockdev
$ sudo /sbin/blockdev --rereadpt /dev/sda
# Or use sfdisk (option -R was removed in util-linux 2.26; use blockdev on newer systems)
$ sudo sfdisk -R /dev/sda

UUID and labels

Run sudo blkid to get the UUID number.

blkid
# /dev/sda1: LABEL="AWS_System" UUID="023C4FC93C4FB687" TYPE="ntfs" 
# /dev/sda2: LABEL="BDEdrive" UUID="7C53861201698F3D" TYPE="ntfs" 
# /dev/sda3: LABEL="BOOT" UUID="0af7ef1a-cf55-4e67-913f-e53711178a70" TYPE="ext3" 
# /dev/sda5: UUID="754ca35b-fe65-4fce-a06d-8197f9494d7a" TYPE="reiserfs"

sudo lsblk -f shows label and UUID in a tree representation (see previous section).

Note:

blkid caches its results, so an unprivileged run may show stale data from the last execution by root. If you created/removed partitions, do:

sudo blkid -g                  # Remove devices that no longer exist
sudo blkid                     # Update uuid cache & show the uuid list

Alternatively, list /dev/disk/by-uuid/ or /dev/disk/by-label/:

ls -l /dev/disk/by-uuid
# total 0
# lrwxrwxrwx 1 root root 10 Jul  8 16:20 023C4FC93C4FB687 -> ../../sda1
# lrwxrwxrwx 1 root root 10 Jul  8 16:20 0af7ef1a-cf55-4e67-913f-e53711178a70 -> ../../sda3
# lrwxrwxrwx 1 root root 10 Jul  8 16:20 754ca35b-fe65-4fce-a06d-8197f9494d7a -> ../../sda5
# lrwxrwxrwx 1 root root 10 Jul 12 17:56 7C53861201698F3D -> ../../sda2
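
Since these entries are plain symlinks, a script can resolve a UUID to its device node with readlink. The helper below is only a sketch: the function name resolve_uuid and its optional second argument (an alternative base directory, used here for a dry run on a fake tree) are assumptions made for illustration.

```shell
# resolve_uuid UUID [BASEDIR]: print the device node a UUID symlink points to.
# BASEDIR defaults to /dev/disk/by-uuid; overriding it is only meant for dry runs.
resolve_uuid() {
    uuid_link="${2:-/dev/disk/by-uuid}/$1"
    [ -L "$uuid_link" ] || { echo "unknown UUID: $1" >&2; return 1; }
    readlink -f "$uuid_link"
}

# Dry run against a fake tree that mimics the listing above
fake=$(mktemp -d)
mkdir -p "$fake/dev/disk/by-uuid"
touch "$fake/dev/sda1"
ln -s ../../sda1 "$fake/dev/disk/by-uuid/023C4FC93C4FB687"
resolve_uuid 023C4FC93C4FB687 "$fake/dev/disk/by-uuid"
```

On a live system the second argument is simply omitted, e.g. resolve_uuid 023C4FC93C4FB687.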

To change UUID of ext filesystem:

tune2fs -U {uuid} /dev/{device}           # See man tune2fs for options

On GPT systems, you can view the partition GUID under Linux with:

sudo sgdisk -i 1 /dev/sda
# Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
# Partition unique GUID: 2C47C282-EE6E-45DE-A5AD-E8658CA67DE6
# First sector: 2048 (at 1024.0 KiB)
# Last sector: 390625 (at 190.7 MiB)
# Partition size: 388578 sectors (189.7 MiB)
# Attribute flags: 1000000000000000
# Partition name: 'EFI System'

GUID is set with

sudo sgdisk -u 1:2C47C282-EE6E-45DE-A5AD-E8658CA67DE6 /dev/sda

GPT, EFI, MS reserved partition

GUID Partition Table

The GUID Partition Table (GPT) is the partition scheme that replaces the legacy MBR scheme.

The GPT usually has a protective MBR, which is a legacy MBR sector with a single partition (type code 0xEE) that spans the whole disk (or as much of it as possible).
GPT imposes no fixed limit on the number of partitions (but Windows currently limits it to 128).
Partitions in the GPT are identified by their GUID.

How does the GUID in the GPT relate to the one in the partition itself, like the one set by tune2fs -U <uuid>? They are independent: the partition GUID is stored in the GPT (blkid reports it as PARTUUID), whereas the UUID set by tune2fs is stored in the filesystem superblock (blkid reports it as UUID).

EFI System Partition

The EFI System Partition (ESP) contains all the files that are necessary for booting the operating system
It is usually 100MB in size.
It has a specific GUID

DEFINE_GUID (PARTITION_SYSTEM_GUID, 0xC12A7328L, 0xF81F, 0x11D2, 0xBA, 0x4B, 0x00, 0xA0, 0xC9, 0x3E, 0xC9, 0x3B)

Microsoft Specific Partition

Reserved for future use by Windows in case some extra space is needed (for instance for dynamic disks). In that case, the reserved partition is shrunk and a new partition is created. This avoids using hidden sectors.
Contains no relevant information.
It has a specific GUID

DEFINE_GUID (PARTITION_MSFT_RESERVED_GUID, 0xE3C9E316L, 0x0B5C, 0x4DB8, 0x81, 0x7D, 0xF9, 0x2D, 0xF0, 0x02, 0x15, 0xAE)

Backup GPT

As a binary image: run sudo gdisk /dev/nvme0n1, then use command b.
As text: use sudo sfdisk -l /dev/nvme0n1 or sudo gdisk -l /dev/nvme0n1.

Using command-line parted

parted is a command-line utility to create and edit partition tables.

parted

select /dev/sdb
mktable msdos                # Create partition table (aka disklabel)
mkpart primary ext4 0% 100%  # Create a new partition
print all
quit

Note that the new partition must still be formatted. For instance:

mkfs.ext4 -L HOME /dev/sdb1

Resizing Partitions

gparted

Probably one of the best ways to edit/resize/move partitions is to use the GUI tool gparted. It supports many different file systems, and both resizes the file system and updates the partition table.

If no GUI is available, here are a few command-line recipes.

fixparts

fixparts is a specialized partitioning tool that can:

Remove stray GUID Partition Table (GPT) data.
Repair mis-sized extended partitions.
Change primary partitions into logical (extended) partitions or vice-versa.

Reiserfs

Use resize_reiserfs to resize the file system, and get the new size

resize_reiserfs -s -4G /dev/sda6               # File system must be unmounted
df

Change the partition table

sudo sfdisk -d /dev/sda >sda-sfdisk.dump          # Dump the table, then edit sda-sfdisk.dump
sudo sfdisk /dev/sda <sda-sfdisk.dump             # Write the edited table back

Run reiserfsck

sudo reiserfsck --rebuild-sb /dev/sda6
sudo reiserfsck --fix-fixable /dev/sda6

Repair Master Boot Record

Using package mbr [3]:

sudo apt-get install mbr
sudo install-mbr -i n -p D -t 0 /dev/sda

Using lilo [4]:

sudo apt-get install lilo
sudo lilo -M /dev/sda mbr

Using syslinux [5]:

sudo apt-get install syslinux
sudo dd if=/usr/lib/syslinux/mbr.bin of=/dev/sda

Using ms-sys (might be dangerous):

ms-sys /dev/sda     # Inspect
ms-sys -m /dev/sda  # Write an MBR

Alternatively, boot from a GRUB CD, chainload the Windows partition on the disk, and fix the MBR from Windows:

chainloader (hd0,<win7 partition>)+1

Then in an administrative Windows console:

bootrec /fixmbr       # Windows 7
fdisk   /mbr          # Windows XP

Mounting Partitions

Using `/etc/fstab`

# NTFS
UUID=XXXXXXXXXXXXXXXXXXXXX /media/windows ntfs defaults,umask=007,gid=46 0 1

Partitions can then be mounted with mount <mount-point>

Using `mount`

# NTFS - mount point /media/windows must be chgrp plugdev
sudo mount -t ntfs -o defaults,umask=007,gid=46 /dev/sda1 /media/windows
# SAMBA
sudo mount -t cifs -o username=baddreams,uid=1000,gid=124 //phoenix/D$ /net/phoenix/d

Remounting root partition read-write

If /etc/fstab is corrupted, boot process might stop while root partition is mounted read-only. To remount it in read-write mode in order to fix /etc/fstab (see [6]):

mount -n -o remount,defaults /dev/sda1 /       # -n means do not update /etc/mtab (when /etc is ro)

Boost ext3/4 performance by enabling data writeback and disabling atime

Data writeback leads to faster performance on ext3/4 filesystem, at the cost of possible loss of new data in case of system crash (old data magically reappear) (see [7]). To enable it simply add data=writeback to mount options in /etc/fstab. Also disable update of atime (access time):

/dev/hda1 / ext3 defaults,errors=remount-ro,noatime,data=writeback 0 1

Unmount the partition first! If it cannot be unmounted (e.g. the root partition), run tune2fs instead to set writeback as a default mount option:

tune2fs -o journal_data_writeback /dev/sda1

Tips / How-to

Clone a root partition / disk

We can use sfdisk and tar to clone locally a root partition. This can be used for instance to create a new root image (eg. to shrink a VM footprint).

# We assume source disk /dev/sda, and target disk /dev/sdb
# We assume /dev/sda1 = root, /dev/sda5 = swap
sudo su -
sfdisk -l                                     # Check everything's ok
dd if=/dev/sda of=/dev/sdb bs=512 count=2048  # Copy grub
sfdisk -d /dev/sda | sfdisk /dev/sdb
blkid                                         # Write down UUID of root and swap partition
mkfs.ext4 -L root -U "..." /dev/sdb1          # Use same UUID as reported by blkid
mkswap -U "..." /dev/sdb5                     # Use same UUID as reported by blkid
mkdir -p /mnt/sda1 /mnt/sdb1
mount /dev/sda1 /mnt/sda1
mount /dev/sdb1 /mnt/sdb1
cd /mnt/sda1
tar cf - --one-file-system . | tar xvCf /mnt/sdb1 -

Benchmark a HDD / SSD on Linux

https://askubuntu.com/questions/87035/how-to-check-hard-disk-performance

Read benchmark

Use hdparm:

sudo hdparm -Tt /dev/sda
# /dev/sda:
#  Timing cached reads:   11584 MB in  2.00 seconds = 5805.22 MB/sec
#  Timing buffered disk reads: 1306 MB in  3.00 seconds = 434.93 MB/sec

Use gnome-disks (app Disks, Manage Drives and Media)

Read / Write benchmark

For Write access, ???
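
For write throughput, one common approach (an assumption here, not taken from the pages linked above) is to let dd write a scratch file and flush it to disk with conv=fdatasync, a GNU dd option. A minimal sketch, where BENCH_DIR is a hypothetical variable naming a directory on the disk under test:

```shell
# Scratch file on the filesystem to benchmark; BENCH_DIR is a hypothetical
# variable, falling back to the usual temp directory when unset.
testfile=$(mktemp "${BENCH_DIR:-${TMPDIR:-/tmp}}/ddtest.XXXXXX")

# Write 32 MiB of zeros. conv=fdatasync makes dd flush the data to disk
# before exiting, so the reported rate is not just the page cache speed.
dd if=/dev/zero of="$testfile" bs=1M count=32 conv=fdatasync

written=$(wc -c < "$testfile")
rm -f "$testfile"
echo "wrote $written bytes"
```

dd prints the elapsed time and throughput on stderr. For a meaningful figure, count should be raised well beyond this small value, and the scratch file must not sit on a tmpfs, since that would only measure memory.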

I/O

For I/O speed, https://support.binarylane.com.au/support/solutions/articles/1000055889-how-to-benchmark-disk-i-o

Clone

Here we list tools for copying complete disks or partitions, either file-level or block-level. We exclude Backup software (like BackupPC or BorgBackup), which are meant to be run regularly.

References

[1] — A comprehensive analysis Backing up Linux and other Unix(-like) systems
Recommends DAR. tar, rsync, rdiff-backup are also options
[2] — [8] for using cpio in order to preserve hardlinks

dd, cat, pv

dd, cat or pv are all tools for doing byte-level copies of files or block devices.

Local copy

The best tools are either cat or pv. dd can also be used, but it is only really useful for copying a specific fraction of a file or block device.

dd if=/dev/sda of=/dev/sdb                   # Use default block size 512, very slow
dd if=/dev/sda of=/dev/sdb bs=16M            # Faster. Override default block size
dd if=/dev/sda of=/dev/sdb bs=512 count=2048 # Copy a fraction of the input block device
cat </dev/sda >/dev/sdb                      # FASTEST. Use optimal block size
pv </dev/sda >/dev/sdb                       # ... idem, but also show progress

Benchmarks ([9], [10]) indicate that the choice of block size for dd matters, and cat automatically finds the best way to make a fast copy. dd was only slightly faster when copying files on the same disk. pv also checks for the fastest speed and then proceeds with the cloning.

So dd if=/dev/sdb of=/dev/sdc is just a complicated, error-prone, slow way of writing cat /dev/sdb >/dev/sdc. While dd is still useful for some relatively rare tasks, it is a lot less useful than the number of tutorials mentioning it would have you believe. There is no magic in dd; the magic is all in /dev/sdb.
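
The equivalence is easy to check on ordinary files, which behave like block devices for this purpose. The sketch below copies a scratch image once with dd and once with cat, then compares the results. The file names are illustrative.

```shell
work=$(mktemp -d)
head -c 4194304 /dev/urandom > "$work/src.img"       # 4 MiB of random data

dd if="$work/src.img" of="$work/dd.img" bs=1M 2>/dev/null
cat < "$work/src.img" > "$work/cat.img"

# cmp -s is silent and returns 0 only when the files are byte-identical
if cmp -s "$work/dd.img" "$work/cat.img"; then
    echo "dd and cat produced identical copies"
fi
```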

Some remarks:

Typical writing speed [11]:

Connected Device  -  Connection Type  -  Speed (Write Speed)
  USB 2.0                 USB 2.0              25 MB/s
  USB 3.0                 USB 2.0              35 MB/s
  USB 3.0                 USB 3.0              73 MB/s
  eSATA                   eSATA                80 MB/s
  SATA 2G HDD             SATA 2G              120 MB/s
  SATA 3G HDD             SATA 2G              140 MB/s
  SATA 3G HDD             SATA 3G              190 MB/s
  SATA 2G SSD             SATA 2G              170 MB/s
  SATA 3G SSD             SATA 2G              210 MB/s
  SATA 3G SSD             SATA 3G              550 MB/s

Block size: as a rule of thumb, for the fastest results the block size (in MB) should be about half the lowest write speed (in MB/s) you typically get.
To view progress for dd, send a SIGUSR1 to the process:

sudo pkill -SIGUSR1 dd

To view progress for cat, look at the position of its input or output file descriptor:

cat /proc/1234/fdinfo/0
# pos:    64155648 
# flags:  0100000
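
The pos value is a byte offset, so dividing it by the size of the source gives a completion percentage. A sketch, assuming the fdinfo text is supplied on standard input and the total size in bytes is passed as an argument (the function name copy_progress is hypothetical):

```shell
# copy_progress TOTAL_BYTES: read fdinfo text on stdin, print percent copied
# (truncated to an integer).
copy_progress() {
    awk -v total="$1" '$1 == "pos:" { printf "%d%%\n", $2 * 100 / total }'
}

# Sample taken from the output above, assuming a 256 MiB source
printf 'pos:\t64155648\nflags:\t0100000\n' | copy_progress 268435456
```

On a live copy this would be called as copy_progress "$(sudo blockdev --getsize64 /dev/sda)" < /proc/1234/fdinfo/0, with 1234 replaced by the PID of cat. GNU dd can also report progress by itself with status=progress.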

Some tips to further speed up dd ([12]):

Use separate dd invocations for reading and writing, and connect them with a pipe

dd if=/dev/sda bs=1M | dd of=/dev/sdb bs=1M

Make sure both invocations share the same block size.

Copying over network

Use nc (netcat) for copying over the network [13]. For instance:

# On Host A (receiver):
nc -l 2222 > /dev/sdb
# On Host B (sender):
nc hosta 2222 < /dev/sda

When copying over the network, use bzip2 to compress the data:

# On Host A (receiver):
nc -l 2222 | bzip2 -d > /dev/sdb
# On Host B (sender):
bzip2 -c /dev/sda | nc hosta 2222

Combine with pv to monitor progress. Benchmarks indicate that using pv is the fastest method (achieving 111 MB/s over 1 Gb Ethernet, with SSDs on both ends) [14]:

# On Host A (receiver):
nc -l 2222 > /dev/sdb
# On Host B (sender):
pv < /dev/sda | nc hosta 2222        # Achieves 111 MB/s over 1Gb eth, SSD disks
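
After a transfer it is worth verifying that both ends hold the same bytes. The sketch below uses sha256sum and is demonstrated on scratch files standing in for /dev/sda and /dev/sdb. The helper name same_content is an assumption for illustration.

```shell
# same_content A B: succeed when both files (or devices) hash identically.
same_content() {
    sum_a=$(sha256sum < "$1" | cut -d' ' -f1)
    sum_b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}

src=$(mktemp); dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"
cat < "$src" > "$dst"                  # stand-in for the nc transfer
same_content "$src" "$dst" && echo "copy verified"
```

Across two hosts, the same idea amounts to running sha256sum on each side and comparing the printed digests. If the target device is larger than the source, only the first N bytes of the target should be hashed (e.g. with head -c N), otherwise the digests differ even for a correct copy.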

cp

cp can preserve all metadata, ownership, permissions, etc, as long as the user has the necessary rights and metadata are supported by the destination file system [15].

cp -a src dst              # GNU cp -a copies recursively preserving as much structure and metadata as possible.
sudo cp -a src dst         # ... running as root to preserve ownership

rsync

rsync can preserve all metadata, ownership, permissions, etc, as long as the user has the necessary rights and metadata are supported by the destination file system [16].

Advantages over cp ([17]):

Only copy updated parts of an updated file (handy for incremental copies)

See --inplace option, and original paper [18]

has a --delete option
Use encryption / decryption (handy over network)

rsync -a src dst           # -a, --archive         archive mode; equals -rlptgoD (no -H,-A,-X)
rsync -aH src dst          # ... -H, --hard-links  preserve hard links
rsync -aHA src dst         # ... -A, --acls        preserve ACLs (implies -p)
rsync -aHAX src dst        # ... -X, --xattrs      preserve extended attributes

Pro	Con
Standard. Partition to partition cloning. Network support (using ssh). File-level backup, so can copy to a different filesystem type and size.	Does not create a single archive. Metadata stored as metadata in destination filesystem. Complex set of options.

rsync's goal is to synchronize two file systems, typically over the network.

My set of command-line options (sudo pre-activation credits to http://crashingdaily.wordpress.com/2007/06/29/rsync-and-sudo-over-ssh/ and [19]):

#If needed, pre-activate sudo on remote system. Flag -t required to solve 'sudo: no tty present and no askpass program specified'
#
# Also, this requires the following line in /etc/sudoers:
#
#    Defaults     !tty_tickets
#
stty -echo; ssh -t user@server sudo -v; stty echo

sudo rsync -aHAXS --delete --rsync-path "sudo rsync" --numeric-ids -h -v --exclude='lost+found' user@server:/remote/path  /local/dest
# This will copy /remote/path on remote server as /local/dest/path on local machine.
#
# -a, --archive           aka. preserve almost everything (equiv. to -rlptgoD, i.e. --recursive, --links, --perms, --times, 
#                         --group, --owner, --devices, --specials)
# -H, --hard-links        preserve hardlinks
# -A, --acls              preserve ACLs (implies --perms)
# -X, --xattrs            preserve extended attributes
# -S, --sparse            handle sparse file efficiently
# --delete                delete extraneous files from the receiving side
# --rsync-path            command executed on remote system
# --numeric-ids           use gid / uid instead of user/group name for file permissions
# -v, --verbose           display files while transferring
# --exclude='lost+found'  useful on ext3/ext4

Some options to consider adding

# -z, --compress          might increase transfer speed on slow network (internet)
# -h, --human-readable
# --stats
# -P                      equiv. to --partial --progress (quite verbose)
# -v -v                   more verbose

Tips

Use --list-only to get a list of files instead of copying them. Handy to test exclude rules:

rsync -aHAXS --exclude='this' --exclude='and/that' --list-only user@server:/remote/path  /local/dest

Add a trailing slash / to source name to transfer a directory content, and not the directory itself

rsync -aHAXS user@server:/remote/path  /local/dest      # Will create a directory /local/dest/path
rsync -aHAXS user@server:/remote/path/ /local/dest      # Will copy content of /remote/path into /local/dest

This is particularly important when using a filter rule with a slash:

rsync -aHAXS --exclude '/backup'     user@server:/remote/path/ /local/dest   # Will skip /remote/path/backup
rsync -aHAXS --exclude 'path/backup' user@server:/remote/path  /local/dest   # Idem
rsync -aHAXS --exclude '/backup'     user@server:/remote/path  /local/dest   # Likely WRONG

More examples

How to use rsync

tar

Pro	Con
Standard. Easy to use. Perfect file-level backup, can copy to a different filesystem type and size, can make a perfect clone of root / partition. Produce a single (compressed) archive file, or can be piped into other tool (eg. for over-the-network clone using SSH).	No incremental backup.

tar can preserve all metadata, ownership, permissions, etc, as long as the user has the necessary rights and metadata are supported by the destination file system [20]. tar is then perfectly suitable to clone the root partition from one system to another.

(cd src;tar cf - --one-file-system .) | (mkdir dst;cd dst;tar xf -)    # Recreate the content of src inside dst
tar cf - --one-file-system src | tar xCf dst -                         # Shorter (dst must already exist)

Note that the second variant is not 100% identical. The paths in the archive will start with src/, whereas in the first variant they will start with ./
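
The difference between the two variants shows up in where the files land. A small sketch on a scratch tree, assuming GNU tar (the directory names dst1 and dst2 are chosen for illustration):

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p src/sub dst1 dst2
echo hello > src/sub/file

# Variant 1: archive from inside src, so member names start with ./
(cd src; tar cf - --one-file-system .) | (cd dst1; tar xf -)
# Variant 2: archive src by name, so member names start with src/
tar cf - --one-file-system src | tar xCf dst2 -

ls dst1          # sub
ls dst2          # src
```

With the first variant, dst1 receives the content of src directly. With the second, dst2 receives a src directory holding that content.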

tar can be used to copy partition over the network. For instance, to copy a directory /mnt/root from remote server to /mnt/root locally:

# This requires the following line in /etc/sudoers:
#    Defaults     !tty_tickets
#
stty -echo; ssh -t user@server sudo -v; stty echo
ssh user@server "(cd /mnt/root; sudo tar cf - --one-file-system .)" | sudo tar xvCf /mnt/root -

or to simply backup the root partition:

cd /mnt/root; sudo tar -czf - --one-file-system . | ssh user@server "cat > rootfs.tgz"

To preserve the integrity of the backup, it is best to mount the filesystem read-only while the backup is done. For the root partition, this requires either booting the system from a Live CD, or using snapshot tools (like LVM snapshots, or Dattobd).

pax (POSIX tar)

pax can preserve all metadata, ownership, permissions, etc, as long as the user has the necessary rights and metadata are supported by the destination file system [21].

mkdir dst
pax -rw src dst          # Same as tar, but pack and unpack in a single process

CloneZilla

Pro	Con
	Not a standalone program (only bootable cd / usb)

cpio

A standard tool (see also [2] above).

DAR

Pro	Con
Single archive (containing all metadata)	transfer over network not easy. No immediate support for pipes (like ntfsclone)

DAR is recommended by [1] above. I personally tried its transfer-over-network capability, but without success (broken image).

dump / restore

Backup tool for ext2/ext3 (/ext4 ?). See Backup or snapshot tool for ext4, but requires LVM2 for snapshot.

FSArchiver

Pro	Con
	* Does not support archiving through network (pipe). So one cannot save a partition, and restore it immediately on another machine through network for instance.

See this tutorial.

fsarchiver -v savefs /mnt/backupdrive/my-backup.fsa /dev/sda4
fsarchiver restfs -v /mnt/backupdrive/my-backup.fsa id=0,dest=/dev/sda4
sudo mount -o remount,ro /dev/sda4                                         # Remount read-only if fsarchiver complains that it is already mounted

ntfsclone

Pro	Con
Simple and fast output/input through pipes (for compression, network transfer) partition-2-partition cloning

Simply the best for ntfs backup (support partition-2-partition backup through network).

Partclone, PartImage

Pro	Con
Single archive	No support for pipes (compression, transfer via network). Destination partition must be the same size as the source partition. Lacks support for some FS (partimage does not support ext4, partclone does not support reiserfs, despite what the manual says).

PartImage is another solution, but it does not support ext4.

ddrescue / gddrescue

A block-level copy utility that focuses on backing up damaged disks.

RAMFS / TMPFS

Using RAMFS and TMPFS you can allocate part of the physical memory to be used as a partition. This partition can be mounted like a regular hard disk partition to accelerate tasks that require heavy disk access (this partition could store for instance a database, or a version control repository...)

Access Control

Using SGID bit to Control Group Ownership

SGID bit allows for controlling the Group Ownership of files within a directory:

mkdir /data/testacl
chgrp git /data/testacl                # Set group to 'git'
chmod g+s /data/testacl                # Set SGID bit
cd /data/testacl
touch file                             # Now 'file' has group 'git', independently of current user primary group

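This is nice, but access permissions still depend on the user's umask setting. Also, files moved into the directory (or copied with their ownership preserved, e.g. with cp -a) keep their original group: the SGID bit only applies to newly created files.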

Using ACL to set default access control

ACL must be installed:

sudo apt-get install acl

... and enabled on the target file system in /etc/fstab:

/dev/sda7    /data     ext4      defaults,acl     0     2

Now, let's say that the default permission should be 'rwx' for files created in our 'testacl' directory above:

cd
setfacl -m d:group:git:rwx /data/testacl  # By default, all members of group 'git' will have rwx access
                                          # Independently of user's umask setting
umask 022
touch /data/testacl/file022               # File 'file022' is still writable for group 'git'

However this does not work if files are copied or moved into the directory. In that case, files may either lose the group access flags, or even lose group ownership (see [22] for more). This could be a problem if for instance some application is unpacking some files in a temporary directory and then moves them to our ACL-controlled directory.

Change session primary group

We can change the primary group of the current session (and all sub-processes) so that any file created in the session belongs to some given group. This method is robust against moving / copying files into a directory, as long as these files have been created in the same session. As a drawback, however, it requires first running a command to do the group switch:

See commands sg or newgrp.

efs attributes

See command lsattr and chattr (for instance the i, immutable, attribute).

Set default ACL for /www folder

# Set default access condition to rwxr-xr-x / www / www-data
cd /
sudo chgrp -R www-data www
find www -type d -print0|sudo xargs -0 chmod g+s
find www -type d -print0|sudo xargs -0 setfacl -m d:group:www-data:r-x
find www -type d -print0|sudo xargs -0 setfacl -m d:user:www:r-x           # TODO: this one does not work with root...

Secure Wipe

Easiest and fastest method: use shred with one random pass and one zero pass. This is safe enough according to this article:

sudo shred -v -z -n 1 /dev/sda
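
Because the command above is destructive, it can be worth rehearsing the same options on a scratch file first. The sketch below does that and then checks that only zero bytes remain after the final -z pass. The scratch file is a stand-in for a device, not part of the original recipe.

```shell
# Rehearse on a scratch file instead of a real device
scratch=$(mktemp)
head -c 1048576 /dev/urandom > "$scratch"     # 1 MiB of stand-in data

shred -z -n 1 "$scratch"                      # one random pass, then zeros

# After the final -z pass the file should contain only zero bytes
nonzero=$(tr -d '\000' < "$scratch" | wc -c)
echo "non-zero bytes left: $nonzero"
```

Note that the shred manual itself warns that overwriting in place is less reliable on journaling or copy-on-write file systems, which matters for files but not for a whole raw device.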

S.M.A.R.T.

SMART is a system that monitors the health of hard drives and reports failures when detected [23], [24].

Installation

On Debian/Ubuntu

sudo apt-get install smartmontools

Capabilities - Initial tests

sudo smartctl -i /dev/sda                        # Query the device
sudo smartctl -s on -o on -S on /dev/sda         # Turn on some features
sudo smartctl -H /dev/sda                        # Check overall health
sudo smartctl -c /dev/sda                        # Get SMART capabilities
sudo smartctl -t short /dev/sda                  # Run the short self-test
sudo smartctl -l selftest /dev/sda               # Get the test result, if available
sudo smartctl -t long /dev/sda                   # Run the long self-test

Installing the daemon smartd

Edit /etc/smartd.conf:

--- a/smartd.conf
+++ b/smartd.conf
@@ -18,7 +18,7 @@
 # Directives listed below, which will be applied to all devices that
 # are found.  Most users should comment out DEVICESCAN and explicitly
 # list the devices that they wish to monitor.
-DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
+# DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
 
 # Alternative setting to ignore temperature and power-on hours reports
 # in syslog.
@@ -110,6 +110,10 @@ DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smar
 #/dev/sdd -d hpt,1/4/1 -a -s L/../../2/01
 #/dev/sdd -d hpt,1/4/2 -a -s L/../../2/03
 
+/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
+# Short-test: daily at 02:00 am
+# Long-test: every saturday, at 03:00 am
+
 # HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
 # PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
 #

Edit /etc/default/smartmontools:

 # uncomment to start smartd on system startup
-#start_smartd=yes
+start_smartd=yes

Test smartd

sudo /etc/init.d/smartmontools restart

On failure, smartd will send an email to root. To test email configuration, add -M test to line above in /etc/smartd.conf. Then restart the daemon:

sudo /etc/init.d/smartmontools restart

Verify that the mail is sent. Then remove -M test afterwards.

Understanding SMART report

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       175616936
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   097   097   020    Old_age   Always       -       3901
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       377515785
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5683
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       24
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   056   031   045    Old_age   Always   In_the_past 44 (69 149 44 39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   039   039   000    Old_age   Always       -       122697
194 Temperature_Celsius     0x0022   044   069   000    Old_age   Always       -       44 (0 18 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       72610717110240
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       14056648957
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       26560954159

lime-technology.com gives some explanations:

Most values are on a normalized scale from 100 (very good) down to 1 (very bad). A value above 100 means the drive behaves better than the reference expectation.
VALUE shows the current value of the attribute. WORST shows the worst value recorded for this attribute in past scans. As the disk ages, WORST typically drifts down towards 1. When it goes below the THRESH value, it is reported as a failure in the self-test.
Exceptions to the rule above are the read attributes 1 and 7, and the temperature attributes 190 and 194.
Attributes in the category Pre-fail are considered critical. If the VALUE of such an attribute is below THRESH, the drive fails the overall SMART health test, and failure may be imminent. The Old_age term means that the attribute relates to normal aging, i.e. normal wear and tear of the drive.
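
As a sketch of how these rules can be applied mechanically, the awk filter below flags every attribute whose WORST value is at or below a non-zero THRESH (smartctl treats a normalized value equal to the threshold as failed). It assumes the column layout of the table shown above. The function name smart_flag is hypothetical, and the three sample rows are copied from the report.

```shell
# smart_flag: read a smartctl attribute table on stdin and print the ID and
# name of each attribute whose WORST value is <= a non-zero THRESH.
# Columns: $1=ID  $2=NAME  $3=FLAG  $4=VALUE  $5=WORST  $6=THRESH
smart_flag() {
    awk '$1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $5 + 0 <= $6 + 0 { print $1, $2 }'
}

smart_flag <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   056   031   045    Old_age   Always   In_the_past 44 (69 149 44 39)
194 Temperature_Celsius     0x0022   044   069   000    Old_age   Always       -       44 (0 18 0 0)
EOF
```

Only attribute 190 is printed, matching the In_the_past entry in its WHEN_FAILED column. On a live system the filter would be fed with sudo smartctl -A /dev/sda | smart_flag.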

Snapshots

See

LVM.
Dattobd

For snapshotting at block device level.

Blktrace

User and kernel code to trace into block device performance. Used here to do a block-device snapshot.

SSD

See SSDOptimization.

Mainly, enable TRIM:

sudo systemctl enable fstrim.timer
sudo systemctl start fstrim.timer

To check that TRIMming works:

sudo fstrim -va

If the drive shows issues with the queued TRIMming, add to kernel cmdline:

libata.force=noncq

LVM

References

LVM Basics

To install lvm2:

sudo apt-get install lvm2

Some frequently used commands:

sudo pvs                   # List 'physical volumes'
sudo vgs                   # List 'volume groups' and stats (available space, number of lv)
sudo lvs                   # List 'logical volumes' and stats (size, etc)
sudo vgdisplay             # Detailed information on 'volume groups'
sudo lvdisplay             # Detailed information on 'logical volumes'

To create an LVM2 system, first, in gparted, create an LVM partition (type lvm2 pv), say /dev/sda1. This partition must have the flag lvm checked. Then:

sudo pvcreate /dev/sda1                # Identifies /dev/sda1 as a physical volume
sudo vgcreate vg /dev/sda1             # Create a group 'vg', and add pv /dev/sda1 to it
sudo lvcreate -n root -L 20g vg        # Create a logical volume named 'root', of size 20GB, in group 'vg'
sudo lvcreate -n swap -L 4g vg         # Create a logical volume named 'swap', of size 4GB, in group 'vg'
sudo lvcreate -n home -l 100%FREE vg   # Create a logical volume named 'home', taking all the remaining free space, in group 'vg'

Next each partition must be formatted:

sudo mkfs.reiserfs -l root /dev/vg/root      # Format lv 'root' as reiserfs
sudo mkswap /dev/vg/swap                     # Setup a linux swap area in lv 'swap'
sudo mkfs.ext4 -L home /dev/vg/home          # Format lv 'home' as ext4

Some administration commands:

sudo vgchange -a y                           # Activate all known volume groups in the system
sudo vgchange -a n                           # Deactivate all known volume groups in the system
ls /dev/mapper                               # View all activated logical volumes

Reduce / extend a logical volume / partition

Use lvreduce to shrink an existing logical volume. Before doing so, the filesystem in that volume must be shrunk. To avoid data loss, we proceed in three steps [25], [26]:

# First umount fs, check it, and reduce fs to 90% of target size
umount /home
e2fsck -f /dev/crypt/home                    # resize2fs refuses to shrink a fs that was not checked first
resize2fs /dev/crypt/home 385g

# Then resize logical volume
lvreduce -L -2.5g /dev/crypt/home

# Grow fs to match container volume size, and remount
resize2fs /dev/crypt/home
mount /home

The 10% margin is used to make sure that the LV resize won't truncate the underlying FS, which would lead to permanent data loss.
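
The intermediate size passed to resize2fs can be computed rather than guessed. A minimal sketch, assuming sizes are given in GiB and using the 90% margin described above; safe_fs_size is a hypothetical helper, not an LVM tool.

```shell
# safe_fs_size CURRENT_GIB REDUCE_GIB
# Print 90% of the target LV size, truncated to whole GiB, for use as the
# temporary filesystem size before lvreduce.
safe_fs_size() {
    LC_ALL=C awk -v cur="$1" -v red="$2" 'BEGIN { printf "%d\n", (cur - red) * 0.9 }'
}

# Example with the 'home' volume used in this section:
#   safe_fs_size 428.32 2.5     # target LV size is 425.82g
```

Any value between the space actually used by the filesystem and the target LV size works; the final resize2fs without a size argument grows the filesystem back to fill the volume.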

Extension works the same with lvextend:

lvextend -L +2.5g /dev/crypt/home

LVM snapshots

We can use lvcreate to create snapshots, for instance for backup or sandboxing. An LVM snapshot is basically an image of the original LV as it was when the snapshot was created. Snapshots can be read-only (backup) or read-write (sandboxing). Snapshots can be merged back into the original LV, or dropped completely.

To create a snapshot, we need some free space in the VG. The size of the snapshot defines how many COW (copy-on-write) blocks we have, which limits the amount of changes that can occur in either the original LV or the snapshot LV. Note that this is true even for read-only snapshots, since whenever the original changes, the old blocks are copied to the snapshot.

lvs
# LV   VG    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
# home crypt -wi-ao---- <428.32g
# root crypt -wi-ao----   37.25g
# swap crypt -wi-ao----   <7.45g
vgs
#  VG    #PV #LV #SN Attr   VSize    VFree
#  crypt   1   3   0 wz--n- <475.52g 2.50g

We create the snapshot with lvcreate and option -s.

lvcreate -L1G -s -p r -n root-snap /dev/crypt/root       # -p r: read-only
#  Logical volume "root-snap" created.
lvs
#  LV        VG    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#  home      crypt -wi-ao---- <428.32g
#  root      crypt owi-aos---   37.25g
#  root-snap crypt sri-a-s---    1.00g      root   0.01
#  swap      crypt -wi-ao----   <7.45g

New devices were created in the device mapper

dmsetup table
# nvme0n1p6_crypt: 0 997232640 crypt aes-xts-plain64 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0 259:3 4096 1 allow_discards
# crypt-root--snap: 0 78118912 snapshot 254:4 254:5 P 8
# crypt-root-real: 0 78118912 linear 254:0 2048
# crypt-home: 0 898244608 linear 254:0 93743104
# crypt-swap: 0 15622144 linear 254:0 78120960
# crypt-root: 0 78118912 snapshot-origin 254:4
# crypt-root--snap-cow: 0 2097152 linear 254:0 991987712
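
The second column of each dmsetup table row is a length in 512-byte sectors, which is how the sizes reported by lvs can be recognized in this output. A small sketch of the conversion; sectors_to_gib is a name invented for this example.

```shell
# sectors_to_gib SECTORS
# Convert a device-mapper length (512-byte sectors) to GiB, two decimals.
sectors_to_gib() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.2f\n", s * 512 / 1073741824 }'
}

# Lengths taken from the table above:
#   sectors_to_gib 78118912    # crypt-root and its snapshot view
#   sectors_to_gib 2097152     # crypt-root--snap-cow, the COW area
```

The first call gives the 37.25g of the root LV, and the second the 1.00g allocated to the snapshot with -L1G.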

We mount the snapshot as usual

mount /dev/crypt/root-snap /mnt/snap/root
# mount: /mnt/snap/root: WARNING: device write-protected, mounted read-only.

Writing data to the original LV will consume space on the snapshot

dd if=/dev/zero of=zeroes bs=1M count=512
lvs
#  LV        VG    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#  home      crypt -wi-ao---- <428.32g
#  root      crypt owi-aos---   37.25g
#  root-snap crypt sri-aos---    1.00g      root   51.83
#  swap      crypt -wi-ao----   <7.45g

Writing too much data will invalidate the snapshot, which will then be unmounted automatically

dd if=/dev/zero of=morezeroes bs=1M count=512
lvs
#  LV        VG    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
#  home      crypt -wi-ao---- <428.32g
#  root      crypt owi-aos---   37.25g
#  root-snap crypt sri-I-s---    1.00g      root   100.00
#  swap      crypt -wi-ao----   <7.45g
dmesg | tail
# ...
# [126953.996761] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
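
Since a full snapshot is lost, it may be worth watching the Data% column before it reaches 100. The following is a sketch only: snap_usage_warn is a hypothetical helper, and it assumes lvs is called with the lv_name, origin and data_percent report fields so that only snapshot rows carry three columns.

```shell
# Warn about snapshots whose COW area is filling up.
# stdin: output of `lvs --noheadings -o lv_name,origin,data_percent`
# $1:    warning threshold in percent (default 80)
snap_usage_warn() {
    LC_ALL=C awk -v limit="${1:-80}" 'NF == 3 && $3 + 0 >= limit + 0 {
        printf "snapshot %s of %s is %s%% full\n", $1, $2, $3
    }'
}

# Usage (VG name 'crypt' as in the example above):
#   sudo lvs --noheadings -o lv_name,origin,data_percent crypt | snap_usage_warn 80
```

When a warning shows up, the snapshot can be given more room with lvextend (for instance lvextend -L +1G /dev/crypt/root-snap), provided the VG still has free space.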

We remove the snapshot with lvremove

lvremove /dev/crypt/root-snap
# Do you really want to remove active logical volume crypt/root-snap? [y/n]: y
#   Logical volume "root-snap" successfully removed

References

Full disk encryption (DM-Crypt)

References

There are several ways to set up full disk encryption using DM-Crypt:

Use Ubuntu installer — the easiest method, but creates only a single partition.
Setup custom partition and encrypt each with DM-crypt
Use LVM2

Using the Ubuntu installer is the easiest solution, but it only creates a single partition.

The second method consists in creating separate partitions for, say, /boot, /, /home and swap, and encrypting each of them (except /boot) with cryptsetup luksFormat. The problem is that, by default, it requires entering as many passphrases as there are encrypted partitions. This can be changed by adding a keyfile to the encrypted partitions, such that only the / partition is opened with a passphrase, while the /home and swap partitions are opened with a keyfile stored on the root partition (using /etc/crypttab) [27]. Obviously the keyfiles must be readable only by root. The drawback of this method is that the keyfiles may be the target of an eavesdropping attack.

The third method is a bit more complicated, but requires entering only a single passphrase at boot, and there is no need to store extra keyfiles.
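
For the second method, the keyfile setup can be sketched as follows. This is an illustration under assumptions, not a tested procedure: make_keyfile and crypttab_entry are hypothetical helpers, and the device /dev/sda3, the mapping name home_crypt and the path /root/home.key are placeholders.

```shell
# make_keyfile PATH
# Create a 4096-byte random keyfile readable only by its owner.
make_keyfile() {
    ( umask 077 && dd if=/dev/urandom of="$1" bs=512 count=8 status=none ) && chmod 400 "$1"
}

# crypttab_entry NAME UUID KEYFILE
# Print the /etc/crypttab line that opens the partition with that keyfile.
crypttab_entry() {
    printf '%s UUID=%s %s luks\n' "$1" "$2" "$3"
}

# Typical use, as root (all names below are examples):
#   make_keyfile /root/home.key
#   cryptsetup luksAddKey /dev/sda3 /root/home.key
#   crypttab_entry home_crypt "$(blkid -s UUID -o value /dev/sda3)" /root/home.key >> /etc/crypttab
```

The passphrase set at luksFormat time remains valid, so the partition can still be opened by hand if the keyfile is lost.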

DM-Crypt basics

Some frequently used commands:

sudo cryptsetup luksOpen /dev/sda2 root      # Open encrypted partition on /dev/sda2 and name it 'root'
sudo mount /dev/mapper/root /mnt/root        # Mount opened encrypted partition as /mnt/root
sudo cryptsetup luksClose root               # Close encrypted partition 'root'
sudo cryptsetup luksChangeKey /dev/sda2      # Change passphrase (takes the LUKS device, not the mapping name)

Note that an encrypted partition may have several passphrases and/or keyfiles associated with it. Any one of these passphrases or keyfiles is sufficient to open the partition. To have the partition opened automatically at boot, add an entry for it in /etc/crypttab.

If the encrypted partition is an LVM physical volume, the corresponding volume group will be activated as well. The logical volumes in that group are visible in /dev/mapper or with lvscan. Also note that the encrypted partition can only be closed after deactivating the volume group:

sudo sfdisk -l                               # List available partitions
sudo cryptsetup luksOpen /dev/sda2 volume    # Open encrypted PV, this will activate the vg

sudo lvscan                                  # Scan for LVM volumes, or ...
#  ACTIVE              '/dev/ubuntu-vg/root' [20.00GiB] inherit
#  ACTIVE              '/dev/ubuntu-vg/swap_1' [4.00GiB] inherit
#  ACTIVE              '/dev/ubuntu-vg/home' [214.23GiB] inherit

sudo lvs                                     # ... Info on logical volumes, or
ls /dev/mapper                               # ... View active LV in the group

sudo mount /dev/mapper/ubuntu--vg-root /mnt/root  # Mount the volume using dev-mapper id, or...
sudo mount /dev/ubuntu-vg/root /mnt/root          # ... or using LVM id

# Close the encrypted LVM PV
sudo vgchange -a n                           # First disable all groups in the system
sudo cryptsetup luksClose volume

Setting up DM-Crypt encryption with LVM

In gparted, create

/dev/sda1, type ext2, label boot.
/dev/sda2, type lvm2 pv, label volume.

sudo cryptsetup luksFormat /dev/sda2             # Enter passphrase
sudo cryptsetup luksOpen /dev/sda2 crypt         # Enter passphrase
sudo pvcreate /dev/mapper/crypt
sudo pvs                                         # List available physical volume and stats (size...)
sudo vgcreate vg /dev/mapper/crypt               # Create group 'vg' and add pv /dev/mapper/crypt to it
sudo vgs                                         # List available groups and stats
sudo lvcreate -n root -L 20g vg                  # Create a 20g volume named 'root' on group 'vg'
sudo lvcreate -n swap -L 4g vg                   # Create a  4g volume named 'swap' on group 'vg'
sudo lvcreate -n home -l 100%FREE vg             # Create a volume named 'home' on group 'vg', taking all remaining free space
sudo lvs                                         # List available logical volumes and stats
sudo mkfs.reiserfs -l root /dev/vg/root          # Format lv 'root' as reiserfs
sudo mkswap /dev/vg/swap                         # Create swap on lv 'swap'
sudo mkfs.ext4 -L home /dev/vg/home              # Format lv 'home' as ext4

From now on, partitions can be used as usual. We still must configure /etc/fstab, /etc/crypttab, and initramfs and grub. First get the relevant UUID:

sudo blkid

Now we build a chroot environment:

sudo mount /dev/mapper/vg-root /mnt/root
sudo mount /dev/mapper/vg-home /mnt/root/home
sudo mount /dev/sda1 /mnt/root/boot
for a in dev proc sys run; do sudo mount --bind /$a /mnt/root/$a; done
                                               # Run needed to recover /etc/resolv.conf
sudo chroot /mnt/root
sudo apt-get install lvm2 cryptsetup
sudo vi /etc/fstab
sudo vi /etc/crypttab

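Edit /etc/fstab as follows (this example comes from a system whose volume group is named ubuntu-vg; adapt the device names to your own VG and LV names):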

# <file system>                            <mount point> <type>   <options>                           <dump>  <pass>
/dev/mapper/ubuntu--vg-root                /             reiserfs notail,noatime,acl                  0       1
UUID=3e697768-238a-4210-9ad9-5e7e3ae1b4ce  /boot         ext2     defaults                            0       2
/dev/mapper/ubuntu--vg-home                /home         ext4     defaults,noatime,data=writeback,acl 0       2
/dev/mapper/ubuntu--vg-swap_1              none          swap     sw                                  0       0
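
The /dev/mapper names in the fstab entries above follow the device-mapper convention of joining VG and LV names with a single hyphen, and doubling any hyphen that is part of a name. A small sketch of that rule; dm_name is a hypothetical helper.

```shell
# dm_name VG LV
# Print the /dev/mapper path for a logical volume: hyphens inside the VG or
# LV name are doubled, then both parts are joined with a single hyphen.
dm_name() {
    printf '/dev/mapper/%s-%s\n' \
        "$(printf '%s' "$1" | sed 's/-/--/g')" \
        "$(printf '%s' "$2" | sed 's/-/--/g')"
}

# Examples:
#   dm_name ubuntu-vg root      # VG 'ubuntu-vg', LV 'root'
#   dm_name crypt root-snap     # VG 'crypt', LV 'root-snap'
```

The /dev/VG/LV form (for instance /dev/ubuntu-vg/root) refers to the same device and avoids the escaping.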

Edit /etc/crypttab as follows:

sda5_crypt UUID=3c3978f1-9c51-4290-a351-94146b54dd50 none luks,discard

Then

sudo grub-install
sudo update-grub
sudo update-initramfs -u

TODO

Update hibernate [28]
backup LUKS header

Resize an encrypted partition (LVM)

This requires several steps [29]:

If necessary, with gparted, move the encrypted partition.
Resize the encrypted partition with sfdisk. This cannot be done by gparted. Make sure to align the partition on 1 MiB boundaries:

sudo sfdisk -d /dev/nvme0n1 > gpt
vi gpt                             # Edit partition, make sure to align on 1M
sudo sfdisk /dev/nvme0n1 < gpt

Alternatively, it seems we could resize with cryptsetup as well (option resize) [30].

Open the resized partition:

sudo cryptsetup luksOpen /dev/nvme0n1p6 crypt

Enlarge the physical volume with pvresize.
Enlarge the logical volume with lvresize.
Enlarge the file system with resize2fs.
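
The last three steps can be sketched as a small helper. This is an assumption-laden illustration: grow_stack is a made-up name, it gives all free space in the VG to a single LV, and it only prints the commands unless RUN=1 is set. The device names reuse the examples from this page.

```shell
# grow_stack PV LV
# Print the three enlarge steps; run them with sudo only when RUN=1 is set.
grow_stack() {
    pv=$1
    lv=$2
    for cmd in "pvresize $pv" "lvresize -l +100%FREE $lv" "resize2fs $lv"; do
        if [ "${RUN:-0}" = 1 ]; then
            sudo $cmd        # intentionally unquoted: split into command and arguments
        else
            echo "$cmd"
        fi
    done
}

# Dry run, then the real thing:
#   grow_stack /dev/mapper/crypt /dev/crypt/home
#   RUN=1 grow_stack /dev/mapper/crypt /dev/crypt/home
```

resize2fs applies to ext2/ext3/ext4 only; other filesystems need their own grow tool.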

Reference

ResizeEncryptedPartitions — ubuntu.com.

Decrypt a partition permanently

To permanently decrypt an encrypted partition, the easiest way is to back up the mounted partition to a separate unencrypted partition. Then:

Edit /etc/fstab.
Edit /etc/crypttab.
If root partition, reinstall grub.
If root partition, regenerate /boot (eg. by reinstalling the current kernel).

Reference

ArchLinux

Add a second passphrase

sudo cryptsetup luksAddKey /dev/nvme0n1p6

Backup

We follow the recommendation from Calum, and back up both the LUKS header and the LVM configuration. We encrypt the backup header with GPG to keep the nice property of fast wipe by overwriting the header and key slots.

# Backup LUKS header
sudo cryptsetup luksHeaderBackup /dev/nvme0n1p6 --header-backup-file=/tmp/luks-header-$HOSTNAME
sudo gpg -e /tmp/luks-header-$HOSTNAME
sudo rm /tmp/luks-header-$HOSTNAME
sudo chmod 400 /tmp/luks-header-$HOSTNAME.gpg
sudo mkdir /boot/backup
sudo cp /tmp/luks-header-$HOSTNAME.gpg /boot/backup

# Backup LVM config
sudo tar cvzf /boot/backup/etc_lvm.tgz /etc/lvm
sudo chmod 400 /boot/backup/etc_lvm.tgz

Improve performance

Check Cloudflare patches.

Troubleshooting

Some recommendations from thesimplecomputer.info:

Check the UUID in /etc/fstab and /etc/crypttab
A possible Plymouth issue [31]
Check boot flag on boot partition, and then after chrooting in the installed system:

sudo mkinitrd
sudo update-initramfs -u

File systems

ext2, ext3, ext4

To create a new ext4 partition on a brand new disk:

# Identify the new disk, say /dev/sdb
sudo fdisk -l

# Create a partition
# ... we use fdisk which now supports GPT table
sudo fdisk /dev/sdb
# g - create GPT table
# n - new partition, accept default values for size etc
# w - write the changes

# Format
sudo mkfs.ext4 /dev/sdb1

btrfs

ZFS, BTRFS, XFS, EXT4 and LVM with KVM – a storage performance comparison

zfs

No, ZFS really doesn't need a fsck

This explains how ZFS withstands hardware failures thanks to copy-on-write and micro-snapshots, even with sloppy hardware components.

Hacker News post on this article

vfat

To format

sudo mkfs.fat -F 32 -s 64 -S 512 -v '/dev/mmcblk0p1'
# -F 32: 32-bit fat
# -s 64: 64 sectors / cluster (32kB)
# -S 512: 512 bytes / sector
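
The cluster size follows directly from the -s and -S values. A one-line sketch of the arithmetic; cluster_bytes is a name chosen for this example.

```shell
# cluster_bytes SECTORS_PER_CLUSTER BYTES_PER_SECTOR
# Print the resulting FAT cluster size in bytes.
cluster_bytes() {
    echo $(( $1 * $2 ))
}

# With the options used above (-s 64 -S 512):
#   cluster_bytes 64 512
```

This gives 32768 bytes, i.e. the 32kB noted in the comment above.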

Monitoring and auditing

fsck

Running at reboot

How to force fsck to check filesystem after system reboot on Linux
Make sure fsck PASS is set to 1 in /etc/fstab (6th column)

/dev/vda1  /  ext3  errors=remount-ro  0  1

Force fsck every 15 reboots (standard is either -1 or 30):

tune2fs -l /dev/vda1 | grep -i "mount count"
# Mount count:              19
# Maximum mount count:      -1
tune2fs -c 15 /dev/vda1
tune2fs -l /dev/vda1 | grep -i "mount count"
# Mount count:              19
# Maximum mount count:      15
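
To see whether the next boot is expected to trigger a check, the two counters can be compared. A minimal sketch, assuming a check is due once the mount count reaches a positive maximum mount count; fsck_due is a hypothetical helper, and time-based checks (check interval) are ignored.

```shell
# fsck_due
# stdin: output of `tune2fs -l DEVICE`
# Exit status 0 when the mount count has reached a positive maximum.
fsck_due() {
    LC_ALL=C awk -F: '
        /^Mount count/         { count = $2 + 0 }
        /^Maximum mount count/ { max = $2 + 0 }
        END { exit !(max > 0 && count >= max) }'
}

# Usage (device as in the example above):
#   sudo tune2fs -l /dev/vda1 | fsck_due && echo "fsck expected at next boot"
```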

Running as cron job

Reddit -- Can I set fsck as a cron job?

Running fsck in a cron job is a bad idea. It can report false positives (typically orphaned inodes caused by files that were deleted but are still open) and it puts needless wear on the system. It is better to use a more robust file system.

dosfsck

Use dosfsck (package dosfstools) to collect advanced information on a vfat partition:

sudo dosfsck -v /dev/sdb1
# fsck.fat 4.2 (2021-01-31)
# Checking we can access the last sector of the filesystem
# There are differences between boot sector and its backup.
# This is mostly harmless. Differences: (offset:original/backup)
#   65:01/00
# 1) Copy original to backup
# 2) Copy backup to original
# 3) No action
# [123?q]? 3
# Boot sector contents:
# System ID "MSWIN4.1"
# Media byte 0xf8 (hard disk)
#        512 bytes per logical sector
#      32768 bytes per cluster
#       7680 reserved sectors
# First FAT starts at byte 3932160 (sector 7680)
#          2 FATs, 32 bit entries
#    7569408 bytes per FAT (= 14784 sectors)
# Root directory start at cluster 2 (arbitrary size)
# Data area starts at byte 19070976 (sector 37248)
#    1891590 data clusters (61983621120 bytes)
# 63 sectors/track, 128 heads
#     109824 hidden sectors
#  121099008 sectors total
# Checking for unused clusters.
# Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
# 1) Remove dirty bit
# 2) No action
# [12?q]? 2
# Checking free cluster summary.
# /dev/sdb1: 3956 files, 623168/1891590 clusters

badblocks

Use badblocks to detect bad sectors on the disk.

Identify damaged files.

Error correction

On a faulty disk, it may be worth using filesystems that tolerate faults (e.g. XFS, BTRFS), or tools like par2 (see Linux Commands) to create error correction files (ECC).

Directory hierarchy - Linux/POSIX model

/ and /usr are the domain of the distribution. Users must not install anything there. Administrative group is root.
/usr/local is the domain of the machine admin. Administrative group is staff.
/opt/$vendor is the domain of third-party vendors. Each vendor (like Steam, Eclipse, Arduino Studio) can get its own subdirectory and its own administrative user group.