Linux Admin: Difference between revisions



* '''(Web-UI)''' [http://munin-monitoring.org/ <code>munin</code>]

* '''BPF-based''': [https://0x.tools/ 0x.tools], [https://github.com/elastic/otel-profiling-agent otel-profiling-agent]
:* More info on BPF: [https://brendangregg.com/ebpf.html Linux Extended BPF (eBPF) Tracing Tools]. See also [https://news.ycombinator.com/item?id=40869877 HN comments]


=== Disk / file monitoring ===

Latest revision as of 06:05, 5 July 2024

Documentation / Getting Help

yelp
Default Gnome help system. Contains basic documentation, manpages, and guides (which can even install applications if clicked on)
doc-base
The doc-base package implements a flexible mechanism for handling and presenting documentation. See doc-base on debian.org.
dwww
dwww is a web-based documentation reader. When installed, you can browse the documentation available on your machine by opening your browser at http://localhost/dwww/. dwww also has command-line support.

Kernel

Architecture (32/64-bit)

32-bit executables can still run on a 64-bit architecture (amd64). Check package ia32-libs.

Note that 32-bit libraries are located in /usr/lib32 and not in /usr/lib

OOM Score (Out of Memory)

The kernel has an advanced algorithm to decide which process to kill when an out-of-memory condition occurs (from [1]):

[...] But, actually, Linux doesn't just pick the process with the failed allocation to kill. Instead, when a process makes a memory request which cannot be fulfilled, the OS runs a quick calculation of the memory usage "badness" of all processes. The base of the badness score is the processes resident memory, plus the resident memory of child processes. Processes that have been "nice'd" get a score boost (on the theory they're likely to be less important), but long-running processes get a score decrease (on the theory they're likely to be more important). Superuser processes have their score decreased. Finally, processes have their scores decreased by a user-settable value in /proc/<pid>/oom_adj (default is no adjustment). Also, if /proc/<pid>/oom_adj is set to the constant OOM_DISABLE, then the process is not killable.

When memory runs out, Linux kills the process with the highest score. If a single ordinary user process, especially a short-lived desktop process, has consumed nearly all of the system RAM, and no one has messed with oom_adj for that process, then it WILL be the one that dies. [...]

The OOM score of each process can be obtained with:

find /proc -maxdepth 2 -name oom_score | while read -r i; do echo -n "$i "; cat "$i"; done | sort -n -k2

/etc/sudoers

The man page gives a complete but unclear description of the file specification. Here is a simplified but complete version:

First the description of possible entries in the file:

# Alias
'User_Alias'  NAME '=' User...         (':' NAME '=' User...        )*
'Runas_Alias' NAME '=' Runas_Member... (':' NAME '=' Runas_Member...)*
'Host_Alias'  NAME '=' Host...         (':' NAME '=' Host...        )*
'Cmnd_Alias'  NAME '=' Cmnd...         (':' NAME '=' Cmnd...        )*

#Default_Entry
'Defaults' ('@' Host... | ':' User... | '!' Cmnd... | '>' Runas_Member...)? Parameter...

#User_Spec
User... Host... '=' Cmnd_Spec...       (':' Host... '=' Cmnd_Spec...)*

Now the description of the syntactical elements (note the description of ..., which is simply a comma-separated list):

identifier... ::= identifier (',' identifier)*

NAME          ::= [A-Z]([A-Z][0-9]_)*

User /
Runas_Member  ::= '!'* ( username | '#'uid | '%'group | '+'netgroup | Alias | 'ALL' )

Host          ::= '!'* ( hostname | ip_addr | network(/netmask)? | '+'netgroup | Alias| 'ALL' )

Cmnd          ::= '!'* ( command filename (args | '""')? | directory | "sudoedit" | Alias | 'ALL' )

Parameter     ::= Parameter '=' Value | Parameter '+=' Value | Parameter '-=' Value | '!'* Parameter

Cmnd_Spec     ::= ('(' Runas_Member...? (':' Runas_Member...?)? ')')? ('NOPASSWD:'|'PASSWD:'|'NOEXEC:'|'EXEC:'|'SETENV:'|'NOSETENV:')* Cmnd
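
As an illustration of the grammar above, here is a minimal sketch of a sudoers fragment. All names (alice, bob, WEBADMINS, WEBSVC) and the nginx commands are hypothetical, chosen only for the example:

```text
# Alias entries (hypothetical users and commands)
User_Alias  WEBADMINS = alice, bob
Cmnd_Alias  WEBSVC    = /usr/bin/systemctl restart nginx, /usr/bin/systemctl status nginx

# Default_Entry restricted to a user list: keep http_proxy in the environment
Defaults:WEBADMINS env_keep += "http_proxy"

# User_Spec: User... Host... = (Runas_Member...) Tag: Cmnd...
WEBADMINS ALL = (root) NOPASSWD: WEBSVC
```

Such a fragment can be checked before use with visudo -c -f FILE.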


  • HTTP Proxy — When using an HTTP proxy defined through the variable http_proxy, add or change the following line in /etc/sudoers:
Defaults	env_reset, env_keep=http_proxy

File Systems & Backup

See page Linux Disk Management.

sysfs and configfs

Reference:

The virtual filesystem sysfs contains device and driver information as well as various kernel data. sysfs provides an interface between kernelspace and userspace.

/sys/block/
This directory contains shortcuts to the sysfs files of the selected block device.
/sys/bus/
This directory contains the sysfs files and data for the different buses on the system.
/sys/class/
This directory contains folders named by device type like "printers", "mem", "leds", "input", etc. The subdirectories then contain shortcuts to the sysfs files pertaining to the selected device. This is useful for when a user is looking for a particular device. They can then go through this route.
/sys/dev/
Inside, there are two folders - "block" and "char" which direct users to the block and character devices respectively.
/sys/devices/
Most of the symbolic/soft links (shortcuts) in the sysfs system link to devices and files here.
/sys/firmware/
Files containing information and settings for the system's firmware reside here.
/sys/fs/
Information and settings pertaining to each mounted filesystem are placed here, grouped by filesystem type. For example, a folder titled "ext4" holds the folders for each device/partition formatted with ext4.
/sys/hypervisor/
Data pertaining to the hypervisor is stored here.
/sys/kernel/
Settings, information, and security policies reside in this folder.
/sys/module/
All of the loaded modules can be seen here. Each folder contains information and settings for that particular module.
/sys/power/
This directory contains files with information on the power state, the number of times the system hibernated/slept, etc.
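
The files under /sys are plain text and can be read with ordinary shell tools. As a minimal sketch (the function name list_block_sizes and its optional root argument are assumptions made for this example), the following lists each block device with its size, relying on the fact that /sys/block/<dev>/size is expressed in 512-byte sectors:

```shell
# List block devices and their size in MiB, read directly from sysfs.
# $1: sysfs root (defaults to /sys; the parameter only exists to ease testing).
list_block_sizes() {
    local root=${1:-/sys} dev sectors
    for dev in "$root"/block/*; do
        [ -r "$dev/size" ] || continue
        read -r sectors < "$dev/size"
        # sysfs reports sizes in 512-byte sectors, whatever the real sector size is
        printf '%s\t%d MiB\n' "${dev##*/}" $(( sectors * 512 / 1024 / 1024 ))
    done
}
```

Call it as list_block_sizes to inspect the running system.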

configfs is also a RAM-based virtual filesystem. Whereas sysfs exposes existing kernel objects, configfs is used to create, change, and delete kernel objects and data from userspace. In effect, configfs is a kernel object manager in the form of a filesystem.

To mount configfs:

sudo mount -t configfs none /sys/kernel/config

Settings in configfs are usually set as (with root privilege):

echo "STRING" > /sys/kernel/config/FILE/PATH

System

top, vmstat, lsof, tcpdump, netstat, htop, iotop, iostat, iptraf, psacct or acct, Monit, nethogs, iftop, Monitorix, Arpwatch, Suricata, VnStat PHP, NMon, Collectl.

Some more tools:

System monitoring

  • (CLI) ps — displays the processes
  • (CLI) free — Display amount of free and used memory in the system
  • (CLI) vmstat — system activity, hardware and system information
  • (CLI) strace — trace system calls and signals
  • (CLI) ltrace — trace calls to dynamic libraries
  • /proc file system — various kernel statistics
cat /proc/stat
cat /proc/$PID/stat
cat /proc/cpuinfo
cat /proc/meminfo
cat /proc/zoneinfo
cat /proc/mounts
  • (CLI) mpstat — multiprocessor usage
Part of sysstat package.
  • (CLI) pmap — report memory map of a process
  • (CLI) smem a more clever tool to monitor memory usage
Can report RSS, but also PSS (proportional set size), a more meaningful representation of memory used [2]
Nice!
  • (TUI) top — process activity command
  • (TUI) atop — AT Computing's System & Process Monitor
BEST! An even better top program, with disk io stats, busy rate... Very easy to see what's going on. Use h for help, t to trigger update, 1 to switch to value/sec.
Nice! An improved top program.
  • (CLI) sar — collect and report system activity
Part of sysstat package.
  • sysstat suite, containing sar, iostat, tapestat, mpstat, pidstat, sadf, cifsiostat.
Nice! With data collection...
  • (GUI) KDE System Guard - Real-time Systems Reporting and Graphing
  • (GUI) Gnome System Monitor - Real-time Systems Reporting and Graphing
  • (GUI) Conky — Lightweight system monitoring for X
  • (Web-UI) Nagios Server And Network Monitoring
  • (Web-UI) Cacti - Web-based Monitoring Tool

Disk / file monitoring

  • (CLI) iostat — average cpu load, disk activity
Part of sysstat package.
  • lsof — list open files
lsof -f | grep bash
lsof -p $$            # view all files opened by current shell

Network monitoring

  • (CLI) netstat — Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
  • (CLI) ss — another utility to investigate sockets
ss -tupan         # Roughly equivalent to netstat -tupan
  • (CLI) iptraf — real-time network statistics
  • (CLI) tcpdump — detailed network traffic analysis
  • ifstat — shows network traffic by interface in a vmstat/iostat-like manner
  • (CLI) nethogs — Nethogs monitors traffic going to/from a machine, per process.
Nice!
  • (TUI) iotop — simple top-like I/O monitor
  • (TUI) bmon — bandwidth monitor and rate estimator
Nice! A kind of console mode darkstat.
  • (TUI) iftop — display bandwidth usage on an interface
shows network traffic by service and host
  • (GUI) darkstat— Captures network traffic, calculates statistics about usage, and serves reports over HTTP.
Breaks down traffic by host, protocol, etc. Geared towards analysing traffic gathered over a longer period, rather than `live' viewing.
Nice!
  • (Web-UI) ntopng — High-speed web-based traffic analysis and flow collection tool
See ntopng.

Network tools

  • ettercap is a network sniffer/interceptor/logger for ethernet

Other tools

  • w — find out who is logged on and what they are doing
  • uptime — tell how long the system has been running
  • uptimed and uprecords — daemon to record uptime records
uprecords -a -B
#      #               Uptime | System                                     Boot up
# ----------------------------+---------------------------------------------------
# ->   1     0 days, 00:35:30 | Linux 4.19.0-2-amd64      Wed Mar 27 09:12:36 2019
#      2     1 day , 02:42:32 | Linux 4.19.0-2-amd64      Tue Mar 26 06:29:24 2019
#      3     9 days, 09:16:49 | Linux 4.19.0-2-amd64      Sat Mar 16 21:12:05 2019

Monitor a process activity

  • strace is the tool of choice to monitor a process activity.
  • Detect library usage (to debug missing library for instance)
  • Detect all IO activity
  • And more...
strace myprogram
strace -p $PID 
strace -f myprogram          # Follow child processes
  • lsof (list open files) to view files opened by a given process, etc:
lsof -f | grep rsync
lsof -p $$
  • ldd and readelf to list libraries
ldd /bin/ls                  # List libraries loaded by given executable
readelf -a /bin/ls           # List libraries, and much more
  • Monitor access to log file. For instance with:
ls -lS /var/log/*log | head
  • See apropos process for even more tools.
apropos process
# ...
# fuser (1)            - identify processes using files or sockets
# ...
  • st (space-time) that monitors memory usage over time, and number of forks/threads.

Stop / continue a process

pkill -TSTP process     # STOP gracefully
pkill -STOP process     # STOP forcefully
pkill -CONT process     # Resume (CONT) a stopped process

IO usage tips

To view IO usage of a specific process, check /proc/self/io, /proc/...pid.../io:

cat /proc/1234/io               # View usage of process pid 1234
cat /proc/$(pgrep -o '^dd$')/io # View usage of the oldest process named 'dd'
cat /proc/self/io               # View usage of current process
# rchar: 2012
# wchar: 0
# syscr: 7
# syscw: 0
# read_bytes: 36864
# write_bytes: 0
# cancelled_write_bytes: 0
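
The counters in /proc/<pid>/io are cumulative, so a rate is obtained by sampling twice. The following is a minimal sketch; the function names io_counter and io_delta are made up for this example, and reading another user's /proc/<pid>/io usually requires root:

```shell
# Print one counter (e.g. read_bytes) from a file in /proc/<pid>/io format.
# $1: counter name, $2: file to read
io_counter() {
    # Exact match on the first field, so write_bytes does not match cancelled_write_bytes
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

# Print the bytes read and written by a process during an interval.
# $1: pid, $2: interval in seconds (default 1)
io_delta() {
    local pid=$1 interval=${2:-1} r0 w0 r1 w1
    r0=$(io_counter read_bytes "/proc/$pid/io")  || return
    w0=$(io_counter write_bytes "/proc/$pid/io") || return
    sleep "$interval"
    r1=$(io_counter read_bytes "/proc/$pid/io")  || return
    w1=$(io_counter write_bytes "/proc/$pid/io") || return
    printf 'read: %d B, written: %d B in %ss\n' $(( r1 - r0 )) $(( w1 - w0 )) "$interval"
}
```

For instance, io_delta 1234 5 reports the storage traffic of pid 1234 over 5 seconds.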

Memory usage tips

Free buffers and cache

To free buffers and cache [3]
free && sync && echo 3 > /proc/sys/vm/drop_caches && free

Get peak memory usage of a process

From StackExchange [4]:

  • Use cgmemtime, measures the high-water RSS+CACHE memory usage of a process and its descendant processes.
Use kernel cgroup to measure memory usage of process and descendants:
sudo ./cgmemtime --setup -g myusergroup --perm 775
# This created /sys/fs/cgroup/memory/cgmemtime
./cgmemtime ./testa x 10 20 30
tstime date
  • Use memusg, a script that does polling.
However polling is a poor solution to measure peak statistics.
  • Use standard /usr/bin/time:
/usr/bin/time -v sort myfile
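
For a process that is already running, and hence cannot be wrapped with /usr/bin/time, the kernel also records the peak resident set size in the VmHWM field of /proc/<pid>/status. A minimal sketch follows (the function name peak_rss_kb is an assumption for this example). Unlike cgmemtime, it covers a single process only, not its descendants:

```shell
# Print the peak resident set size (VmHWM, in kB) from a /proc/<pid>/status file.
# $1: status file, e.g. /proc/1234/status
peak_rss_kb() {
    awk '$1 == "VmHWM:" { print $2 }' "$1"
}
```

Example: peak_rss_kb /proc/$$/status prints the peak RSS of the current shell.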

Manage Out-of-Memory Killer

When the system runs out of memory (OOM), the kernel triggers the OOM killer. The OOM killer computes the oom_score of every process and kills the process with the highest score. This score can be adjusted through oom_adj (ranging from -16 to +15; deprecated in favor of oom_score_adj) or oom_score_adj (ranging from -1000 to +1000).

To see the oom_score of every process [5]:

# Print directly from /proc
cat /proc/*/oom_score

# A nicer list with pid / oom_score / process name
while read -r pid comm; do 
  printf '%d\t%d\t%s\n' "$pid" "$(cat /proc/$pid/oom_score)" "$comm"
done < <(ps -e -o pid= -o comm=)

Likewise, oom_adj and oom_score_adj are found in /proc:

cat /proc/*/oom_adj
cat /proc/*/oom_score_adj

To change the adjustment value [6]:

# Change oom_score_adj of current process
echo 1000 > /proc/$$/oom_score_adj # OR
echo 1000 > /proc/self/oom_score_adj

# Change score on process startup
choom -n 1000 COMMAND [OPTION...]

Note that oom_score_adj is inherited by forked processes. Also, setting negative values requires root access.
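
To see at a glance which processes the OOM killer would pick first, the scores can be sorted. This is a minimal sketch; the function name oom_top and its optional proc root argument are assumptions made for the example:

```shell
# Print the N processes with the highest oom_score, as "score<TAB>pid<TAB>name".
# $1: number of processes (default 10), $2: proc root (default /proc, eases testing)
oom_top() {
    local n=${1:-10} root=${2:-/proc} dir pid score name
    for dir in "$root"/[0-9]*; do
        pid=${dir##*/}
        # A process may exit while we iterate: skip entries that cannot be read anymore
        { read -r score < "$dir/oom_score"; } 2>/dev/null || continue
        { read -r name < "$dir/comm"; } 2>/dev/null || name='?'
        printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
    done | sort -rn | head -n "$n"
}
```

Example: oom_top 5 lists the five most likely victims.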

Send commands via D-BUS

See D-Bus.

Power management

See Power management.

BIOS update

Using fwupdmgr (for system BIOS, Thunderbolt docking station...).

  • Tested successfully on HP EliteBook G4 + Thunderbolt docking station.
# Refresh DB
sudo fwupdmgr refresh

# Get list of available updates
sudo fwupdmgr get-upgrades

# Update all devices
sudo fwupdmgr update

Hardware

Links

Speaks about lspci but also GUI tool hardinfo.

Info / troubleshooting commands

Interesting commands:

lshw -C network
lshw -C display                  # See video controller
lshw -C display | grep driver    # ... see driver in use
  • lspci, listing all PCI devices
lspci                            
lspci | grep -i wireless         # Write down slot id of device
lspci | grep -i vga              # See video controller
sudo lspci -vv -nn -s 0c:        # Slot id 0c:...
  • lsmem, listing the ranges of available memory with their online status
  • lscpu, displaying information about the CPU architecture
  • inxi, Command line system information script for console and IRC
inxi -Gx                         # as current X desktop user, no sudo
  • lsmod, Show the status of modules in the Linux Kernel
lsmod | sort
  • modprobe, add and remove modules from the Linux kernel
  • modinfo, show information about a kernel module (incl. available parameters)
modinfo iwlagn
To get BIOS version information:
sudo dmidecode | grep "BIOS Information" -A3 
# BIOS Information
# 	Vendor: Hewlett-Packard
# 	Version: L71 Ver. 01.30
# 	Release Date: 12/09/2014
  • uname, print system information
uname -rm
lsusb
# Bus 008 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
# Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
# Bus 006 Device 008: ID 056e:0056 Elecom Co., Ltd 
# Bus 006 Device 006: ID 1131:1004 Integrated System Solution Corp. Bluetooth Device

lsusb -t
# /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ohci_hcd/8p, 12M
#     |__ Port 1: Dev 2, If 0, Class=HID, Driver=usbhid, 12M
# /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/8p, 480M
USB devices are located at /dev/bus/usb, according to their bus and device id (for instance /dev/bus/usb/001/002 for bus 001, dev 002)
usb-ctrl -v                 # Get list of devices and status
usb-ctrl -b 1 -d 1 -P 1     # Shut down port 1 on bus 1, dev 1
usb-devices
# T:  Bus=01 Lev=02 Prnt=02 Port=03 Cnt=02 Dev#=  4 Spd=12  MxCh= 0
# D:  Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
# P:  Vendor=413c ProdID=8197 Rev=01.12
# S:  Manufacturer=Dell Computer Corp
# S:  Product=DW380 Bluetooth Module
# S:  SerialNumber=2016D895569F
# C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=0mA
# I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb
# I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb
# I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
# I:  If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)
  • Debug information
dmesg | grep -Ei "wlan|iwl"
  • To test availability of OpenGL:
glxinfo | grep -i direct
# The result should be: 
#   direct rendering: Yes
  • Input devices in /proc (some not listed by commands above — e.g. synaptic touchpad):
cat /proc/bus/input/devices         # List input devices
  • udevadm — Query udev database
udevadm info -q all -n /dev/bus/usb/002/001             # Can query using the device name 
udevadm info -q all -p /sys/bus/usb/devices/usb2        # Or using udev device paths (or symlink)
udevadm info -a -n /dev/bus/usb/002/040                 # Walk up the chain of parent device, and for each, show attributes
  • View DHCP lease info on client:
cat /var/lib/dhcp/dhclient.leases

WiFi

See Wifi.

Firmware

Install package firmware-linux or firmware-linux-nonfree to update hardware with latest firmware

sudo apt install firmware-linux-nonfree
# sudo apt install firmware-linux

Sometimes we get messages like the following when running apt (actually generated by update-initramfs)

update-initramfs: Generating /boot/initrd.img-5.17.0-1-amd64
W: Possible missing firmware /lib/firmware/i915/adlp_dmc_ver2_14.bin for module i915

We can find the latest available firmware in the kernel git repository linux-firmware.git.

To fix the message, copy the firmware to /lib/firmware (see also note below), and reconfigure the package

sudo cp *.bin /lib/firmware/i915                  # for i915 firmware
sudo dpkg-reconfigure linux-image-5.17.0-1-amd64  # Trigger initramfs again
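
When several firmware files are reported, it helps to extract the list from the warnings. A minimal sketch, assuming the warning format shown above (the function name missing_firmware is made up for this example):

```shell
# Read update-initramfs output on stdin and print "module<TAB>firmware path"
# for every "Possible missing firmware" warning.
missing_firmware() {
    awk '/^W: Possible missing firmware / { print $NF "\t" $5 }'
}
```

Example: sudo update-initramfs -u 2>&1 | missing_firmware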
Debian

(From Debian Wiki) firmware is sourced from the following places (see udev's /lib/udev/hotplug.functions and /lib/udev/firmware.agent):

  • /lib/firmware/$(uname -r) - Firmware provided by a package, specific for a kernel.
  • /lib/firmware/ - Firmware provided by a package, valid for all kernels.
  • /usr/local/lib/firmware - Location for manually installed firmware.
NOTE: This directory seems to be ignored: we still get the message about missing firmware, and the message disappears once the file is copied to /lib/firmware/i915.
  • /usr/lib/hotplug/firmware - Firmware provided by a package, valid for all kernels

Wireless firmware

Locate your wireless card:

lspci | grep -i network
# 0c:00.0 Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection
# =======
lspci -s 0c: -v
#0c:00.0 Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection
#...
#	Kernel modules: iwlagn
#                       ======

To get driver version:

sudo lshw -C network
# configuration: broadcast=yes driver=iwlagn driverversion=2.6.32-32-generic-pae firmware=8.24.2.12 latency=0 link=yes multicast=yes wireless=IEEE 802.11abgn
#                                                                                ==================

Print module information, the loaded firmware is located at the very beginning:

modinfo iwlagn
# filename:       /lib/modules/2.6.32-32-generic-pae/updates/compat-wireless-2.6.37/iwlagn.ko
#                                                            ====================== ----------------> interesting
# description:    Intelsl(R) Wireless WiFi Link AGN driver for Linux
# ...
# firmware:       iwlwifi-4965-2.ucode                        \
# firmware:       iwlwifi-5150-2.ucode                        |
# firmware:       iwlwifi-5000-2.ucode                        |
# firmware:       iwlwifi-130-5.ucode                         |
# firmware:       iwlwifi-6000g2b-5.ucode                     |
# firmware:       iwlwifi-6000g2a-5.ucode                     |___ The firmwares used by this driver
# firmware:       iwlwifi-6050-5.ucode                        |
# firmware:       iwlwifi-6000-4.ucode                        |
# firmware:       iwlwifi-100-5.ucode                         |
# firmware:       iwlwifi-1000-3.ucode                        /

For INTEL cards:

Other links:

System information

  • lshw is available by default,
  • or use sysinfo (sudo apt-get install sysinfo),
  • or use hardinfo (sudo apt-get install hardinfo),

To dump BIOS data:

  • sudo dmidecode
  • sudo dmidecode -t system
  • sudo dmidecode -t system | grep Serial, this returns the serial number label on HP computer for instance.

udev & devfs

Reference: [7]

This chapter is about the devices in /dev. Since kernel 2.6, the content of this directory is generated by udev rules.

These rules are located at:

  • /lib/udev/rules.d
  • /etc/udev/rules.d (these can be customized)

Use udevadm to get information on a given device:

udevadm info -q path -n /dev/sda2                                     # To get the path to the device /dev/sda2
udevadm info -q all -n /dev/sda2                                      # To get all the attributes of device /dev/sda2
udevadm info -a -p $(udevadm info -q path -n /dev/sda2)               # Same as above
udevadm test $(udevadm info -q path -n /dev/sda2) 2>&1 | grep OWNER   # Test the effect of a new rule on device /dev/sda2

USB Serial (FTDI, Prolific)

The usbserial module (usbserial.ko) provides a generic serial interface (aka virtual COM port in Windows) for USB devices. There are basically two (low-cost) chip providers on the market: FTDI and Prolific. For FTDI chips, drivers are included in the kernel since 2.6.31. The module is called ftdi_sio, and exposes FTDI devices as generic usbserial devices.

When connecting the device, the following can be seen in /var/log/messages, or by running dmesg:

dmesg
usb 7-1: new full speed USB device using uhci_hcd and address 18
usb 7-1: configuration #1 chosen from 1 choice
ftdi_sio 7-1:1.0: FTDI USB Serial Device converter detected
usb 7-1: Detected FT232RL
usb 7-1: Number of endpoints 2
usb 7-1: Endpoint 1 MaxPacketSize 64
usb 7-1: Endpoint 2 MaxPacketSize 64
usb 7-1: Setting MaxPacketSize 64
usb 7-1: FTDI USB Serial Device converter now attached to ttyUSB0
usb 7-1: USB disconnect, address 18

Now, the device is available as /dev/ttyUSB0

Problems, Issues, Workaround
  • If you see a disconnect message, and no /dev/ttyUSB0 showing up:
ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
ftdi_sio 7-1:1.0: device disconnected
The most probable cause is that the package brltty is installed. This package automatically identifies devices with an FTDI chip as braille devices. The workaround is simply to uninstall the package.
  • VirtualBox might also interfere with the USB device. The user can give a list of USB devices that a virtual machine should immediately connect to when the device is plugged in. If the FTDI device is listed, VirtualBox will grab the device and it will not be available to the host. Simply unlist the device to make it available on the host again.
  • Other potential conflicts around libusb, usbfs, libftdi
  • Insufficient access rights (typically user must be member of group dialout)
ls -l /dev/ttyUSB*
# crw-rw---- 1 root dialout 188, 0 2011-09-22 17:26 /dev/ttyUSB0
sudo gpasswd -a $USER dialout
id
# uid=6659(beq06659) gid=6659(beq06659) groups=4(adm),20(dialout),...
Links


Advanced Kung-fu (Create a symlink with udev when attaching a device)
find /sys -name 'ttyUSB*'
# Look up the attributes of the directory that should (hopefully) be listed (the path below is a guess).

udevadm info -a -p /sys/class/tty/ttyUSB0     # udevinfo on older systems
# Pick a unique attribute to copy.

sudo gedit /etc/udev/rules.d/60-symlinks.rules

# Add lines:
# # Create /dev/bstamp symlink for FTDI Device
# KERNEL=="ttyUSB*", ATTRS{product}=="FT232RL", \
# SYMLINK+="bstamp"
#
# ATTRS{attr} will be the information taken from the udevadm info command.

sudo /etc/init.d/udev restart
# Or, without restarting the daemon:
# sudo udevadm control --reload-rules && sudo udevadm trigger


USB Serial on cygwin

Some interesting links:

This explains that after manually setting the COM port number in Windows (say COM16), one can access that port in Cygwin by using /dev/ttyS15 or /dev/com16. However, Cygwin may only allow up to 16 serial ports (from /dev/com1 up to /dev/com16), although this limitation may no longer apply.
An answer in this post says that Cygwin supports more than 16 ports (it mentions /dev/ttyS26). However, it also mentions another issue: input blocks until a CR is received.
A post from Corinna Vinschen warns not to use the Windows names \\.\COMx or COMx in Cygwin, otherwise you will not get any POSIX serial I/O support from Cygwin.

NVidia

Check nvidia website for detailed information on nvidia drivers for linux:

Some tips:

  • Use xdpyinfo to show the current extensions. For OpenGL, it should show extension "glx" and "nv-glx".
  • Check dmesg for error messages related to nvidia
  • Force reload of module with modprobe nvidia
  • Prevent Nouveau from being loaded. Create a file /etc/modprobe.d/disable-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
Note that this will not prevent the X server from loading Nouveau. If loaded, it can be unloaded with modprobe -r nouveau, as long as Nouveau has been prevented from doing a kernel modeset.

My /etc/X11/xorg.conf on Maverick:

Section "Screen"
	Identifier	"Default Screen"
	DefaultDepth	24
EndSection

Section "Module"
	Load	"glx"
EndSection

Section "Device"
	Identifier	"Default Device"
	Driver	"nvidia"
	Option	"NoLogo"	"True"
EndSection

BlueTooth

  • Install BlueTooth Manager to solve bluetooth connection issue (package blueman)
    • For instance, solved issues I had with Microsoft Sculpt Mouse (pair device, then set it as trusted, then connect) (see [8], [9])

Udisks

The udisks daemon serves as an interface to system block devices. It is responsible for automatically mounting inserted DVDs.

Troubleshooting - Fix broken permissions on UDF
  • Some DVD recorders do not set the directory permissions correctly on DVDs (UDF file system): the 'x' flags are missing.
  • There is a fix in udisks that forces directory permissions to 0500, but only if the DVD is read-only (bug 635499, see fix [10]).
  • The fix does not work for DVD-RW discs that are not finalized yet. These discs are then not readable on Linux.
  • There is apparently no way to override the default mount options in udisks ([11], [12]), so our only hope is to patch the udisks package directly.
  • The following is a patch on udisks=1.0.4-5ubuntu2.1 to force dmode=0500 for all optical discs with UDF file systems (to rebuild a package, see this page):
diff --git a/src/device.c b/src/device.c
index a7f8880..3174628 100644
--- a/src/device.c
+++ b/src/device.c
@@ -6204,10 +6204,8 @@ struct Job
   /* dynamic default options */
 
   /* some broken DVDs come with 0400 directory permissions, making them
-   * unreadable; overwrite readonly UDF media with a 0500 dmode. */
-  if (g_strcmp0 (device->priv->id_type, "udf") == 0 && device->priv->device_is_optical_disc &&
-      device->priv->drive_media != NULL && 
-      strstr(device->priv->drive_media, "_rw") == NULL && strstr(device->priv->drive_media, "_ram") == NULL)
+   * unreadable; overwrite all UDF media with a 0500 dmode. */
+  if (g_strcmp0 (device->priv->id_type, "udf") == 0 && device->priv->device_is_optical_disc)
     {
       g_ptr_array_add (options, g_strdup("dmode=0500"));
     }
  • Install the fixed package, and check that mount has the new dmode=0500 flag:
/dev/sr0 on /media/DVD VR type udf (ro,nosuid,nodev,uid=6659,gid=6659,iocharset=utf8,umask=0077,dmode=0500,uhelper=udisks)
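  • Alternate patch. This forces flag dmode=0500 for all UDF file systems, similar to iso9660. Note that the diff below seems to read in the direction of the upstream change (it removes dmode=0500 from udf_defaults), so it would have to be applied in reverse (patch -R) to get this behavior: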
diff --git a/src/device.c b/src/device.c
index 2d7621b..a7f8880 100644
--- a/src/device.c
+++ b/src/device.c
@@ -5926,7 +5926,7 @@ struct Job
 
 /* ---------------------- udf -------------------- */
 
-static const char *udf_defaults[] = { "uid=", "gid=", "iocharset=utf8", "umask=0077", "dmode=0500", NULL };
+static const char *udf_defaults[] = { "uid=", "gid=", "iocharset=utf8", "umask=0077", NULL };
 static const char *udf_allow[] = { "iocharset=", "umask=", "mode=", "dmode=", NULL };
 static const char *udf_allow_uid_self[] = { "uid=", NULL };
 static const char *udf_allow_gid_self[] = { "gid=", NULL };
@@ -6203,6 +6203,15 @@ struct Job
 
   /* dynamic default options */
 
+  /* some broken DVDs come with 0400 directory permissions, making them
+   * unreadable; overwrite readonly UDF media with a 0500 dmode. */
+  if (g_strcmp0 (device->priv->id_type, "udf") == 0 && device->priv->device_is_optical_disc &&
+      device->priv->drive_media != NULL && 
+      strstr(device->priv->drive_media, "_rw") == NULL && strstr(device->priv->drive_media, "_ram") == NULL)
+    {
+      g_ptr_array_add (options, g_strdup("dmode=0500"));
+    }
+
   /* user supplied options */
   for (n = 0; given_options[n] != NULL; n++)
     {

For udisks2:

diff --git a/src/udiskslinuxfilesystem.c b/src/udiskslinuxfilesystem.c
-static const gchar *udf_defaults[] = { "uid=", "gid=", "iocharset=utf8", "umask=0077", NULL };
-static const gchar *udf_allow[] = { "iocharset=", "umask=", NULL };
+static const gchar *udf_defaults[] = { "uid=", "gid=", "iocharset=utf8", "umask=0077", "dmode=0500", NULL };
+static const gchar *udf_allow[] = { "iocharset=", "umask=", "dmode=", NULL };

Software

Packages

See page Package Management.

Libraries

See the Library HOWTO.

Static Libraries

See Library HOWTO - Static Libraries

Shared Libraries

See Library HOWTO - Shared Libraries

Path conventions according to the GNU Standards (info:standards#Directory_Variables), used by developers:

  • /usr/local/lib: for all libraries when distributing source code (executables go to /usr/local/bin).

Path conventions according to the Filesystem Hierarchy Standard (used by distributors through package management):

  • /usr/lib: for most libraries (executables go to /usr/bin, executables that users should not call directly go to /usr/libexec/).
  • /lib: for libraries needed at boot time.
  • /usr/local/lib: for libraries that are not part of the system (/usr/local/bin for executables, and /usr/local/libexec for library executable)
soname — real name — linker name
  • /usr/lib/libreadline.so.3 is a fully-qualified soname (symlinked to realname below by ldconfig)
  • /usr/lib/libreadline.so.3.0 is the realname
  • /usr/lib/libreadline.so is the linker name (symlinked to soname /usr/lib/libreadline.so.3)
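
The chain from linker name to soname to real name can be followed on an installed library. A minimal sketch (the function name lib_chain is an assumption; the hop limit only guards against symlink loops):

```shell
# Print the symlink chain of a library, e.g. linker name -> soname -> real name.
# $1: path to start from
lib_chain() {
    local path=$1 target hops=0
    while [ -L "$path" ] && [ "$hops" -lt 16 ]; do
        printf '%s -> ' "$path"
        target=$(readlink "$path")
        case $target in
            /*) path=$target ;;                      # absolute link target
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
        hops=$(( hops + 1 ))
    done
    printf '%s\n' "$path"
}
```

With the layout described above, lib_chain /usr/lib/libreadline.so would show the linker name, then the soname, then the real name.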
Environment variables
  • LD_LIBRARY_PATH temporarily overrides the usual library path for a given executable (should only be used for debugging)
  • LD_DEBUG triggers debugging in C loader (e.g. LD_DEBUG=files /bin/ls)
Utilities
ldconfig -n directory_with_shared_libraries      #Creates soname links to realname when installing new libraries
ldd /bin/ls                                      #List shared libraries needed by a given executable
List symbols exported by libraries

From SO:

nm -g yourLib.so      # Standard tool 
nm -gC yourLib.so     # ... C++ lib, demangle the symbols
objdump -TC libz.so   # For library in ELF format - -C to demangle
readelf -Ws libz.so   # ... Another solution for ELF
readelf -a /some/exec

Dynamically Loaded (DL) Libraries

See Library HOWTO - Dynamically Loaded Libraries.

Managing Alternatives

For instance, to define the default cursor-theme, use update-alternatives:

sudo update-alternatives --config x-cursor-theme

Troubleshooting

Use LD_PRELOAD to fix symbol '...': can't resolve symbol

cat bup_intl_issue.py
# import posix1e;           # From package pylibacl
python bup_intl_issue.py
# /opt/bin/python2.7: symbol 'libintl_gettext': can't resolve symbol
# Traceback (most recent call last):
#   File "bup_intl_issue.py", line 1, in <module>
#     import posix1e;
# ImportError: unknown dlopen() error

We can fix the issue using the LD_PRELOAD trick though:

LD_PRELOAD=/opt/lib/libintl.so python2.7 bup_intl_issue.py

No such file or directory when executing a program

Bash returns an error No such file or directory even though the executable is there:

./lmgrd
# bash: ./lmgrd: No such file or directory
ls -l ./lmgrd
# -rwxr-xr-x 1 superman superman 1.5 Nov 21  2016 ./lmgrd

The problem is that the executable requires a file that is not present, most likely the dynamic loader (the ELF program interpreter):

readelf -a ./lmgrd | grep -i interpreter               # Could also do 'ldd ./lmgrd'
#    [Requesting program interpreter: /lib64/ld-lsb-x86-64.so.3]
cd /lib64
sudo ln -sf /lib/x86_64-linux-gnu/ld-2.19.so ld-lsb-x86-64.so.3
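The same check can be scripted. This is a minimal sketch, assuming readelf from binutils is installed; interp_path and check_interp are made-up helper names.

```shell
# Extract the interpreter path from 'readelf -l' (or 'readelf -a') output read on stdin.
interp_path() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Report whether the interpreter requested by an executable exists on this system.
check_interp() {
    interp=$(readelf -l "$1" | interp_path)
    if [ -z "$interp" ]; then
        echo "no interpreter requested (static binary?)"
    elif [ -e "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing: $interp"
    fi
}
```

Running check_interp ./lmgrd then tells directly whether the loader is the missing piece.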

Troubleshooting process using libusb

strace is the tool of choice.

strace someprocess &> strace.txt
  • Look for the application's error messages in strace.txt.
  • Look at the system calls that happened just before those error messages. Is the error due to bad permissions, or to a busy device?
  • Bad permissions: check the device permissions (usually under /dev/bus/usb). The device probably needs a udev rule to set up the permissions.
  • Device busy: check which module is grabbing the device. That module may need to be blacklisted in /etc/modprobe.d.
  • Look at which libraries are loaded (calls to open). Are all libraries found?
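To speed up this triage, the failed calls can be filtered out of the log. The sketch below relies only on the usual "syscall(args) = -1 ERRNO (message)" line format of strace; the helper names strace_errors and strace_perm_busy are made up.

```shell
# Count failed system calls in a strace log, grouped by errno name (most frequent first).
strace_errors() {
    grep -E '= -1 E[A-Z0-9]+' "${1:-strace.txt}" |
        sed -E 's/.*= -1 (E[A-Z0-9]+).*/\1/' |
        sort | uniq -c | sort -rn
}

# Show only the calls that failed with a permission or busy error.
strace_perm_busy() {
    grep -E '= -1 (EACCES|EPERM|EBUSY) ' "${1:-strace.txt}"
}
```

Many ENOENT lines are normal (the loader probes several library paths), so the EACCES / EPERM / EBUSY lines are usually the interesting ones.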

It might be necessary to follow child processes. Use strace -f for this.

Another useful tool is ltrace (to trace calls to dynamically-linked libraries).

Network

See also Linux networking and Wifi.

Commands

See Hardware section above.

ZeroConfig

ZeroConfig refers to all utilities that help set up a network without any additional configuration. Some links:

To install on Linux (Ubuntu):

sudo apt-get install libnss-mdns avahi-daemon mdns-scan
Address resolution
To be completed.
Name resolution
  • By default, any given host can be accessed via the name hostname.local without need of a local DNS server.
ping myserver.local
  • Name resolution relies on the mDNS (multicast DNS) protocol. An mDNS client sends its request to a well-known multicast address (224.0.0.251 for IPv4 and ff02::fb for IPv6 link-local addressing). On Linux, the avahi package implements the Apple Zeroconf specification.
  • To list all names that are broadcast:
mdns-scan
Service discovery
To be completed?

Network Manager - Search Path

See NetworkManager Ubuntu documentation for how to add a static local domain to resolv.conf search path.

Basically:

  1. In the NM applet, change the network from DHCP (auto) to DHCP (address only).
  2. Edit the network configuration file in /etc/NetworkManager/system-connections to appear as follows:
    [ipv4]
    method=auto
    dns-search=domain1.com;domain2.org;domain3.edu;
    ignore-auto-routes=false
    ignore-auto-dns=false                                # !!! Set this line back to FALSE !!!
    
  3. Select the network in the list of wired networks.

Import Windows Settings for Enterprise Wireless Network (Dynamic WEP or WPA & WPA2 Enterprise, TLS)

This chapter explains how to import the network configuration settings from Windows for an enterprise wireless network using Dynamic WEP (802.1x), with TLS authentication.

  1. In Windows, export the client Authentication certificate and private key from Windows Certificate Store:
    • In Control Panel → Internet Options → Content tab, click Certificates, or
      alternatively, type Win-R, certmgr.msc.
    • In the Personal tab, select the certificate used for client authentication, and click Export.
    • In the new window, click Next, then select Yes, export the private key and click Next.
    If the option export the private key is greyed out, see Windows Administration.
    • Select format Personal information interchange - PKCS #12 (.PFX), and select Include all the certificates in the certificate path if possible and Enable strong protection.
    • Select a password, and save the file (say mywindowscert.pfx).
  2. In Ubuntu, split the exported certificate in the components CA / Cert / Private key.
  3. Now create a new wireless network connection in Ubuntu:
    • Security: Dynamic WEP (802.1x) or WPA & WPA2 Enterprise
    • Authentication: TLS
    • Identity: the account name (this is not necessarily the same as the name the certificate was issued to)
    • User Certificate: mycert.crt.pem
    • CA certificate: mycert.ca.pem
    • Private key: mycert.key.p12
    • Private key password: as required

More information:

Firewall

To troubleshoot firewall connection issues:

  • See firewall log (ufw.log for UFW)
  • Use netstat:
#For instance, troubleshooting Samba server firewall issues:
service smb stop
netstat -ln > netstat-ln-smb.before
service smb start
netstat -ln > netstat-ln-smb.after
diff -u netstat-ln-smb.*
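When the diff is noisy, it can help to extract only the sockets that appear after the service is started. A minimal sketch, assuming the usual 'netstat -ln' column layout (protocol in column 1, local address in column 4); new_listeners is a made-up helper name.

```shell
# Print "proto local-address" for TCP/UDP sockets that are listed in the second
# snapshot but not in the first one.
new_listeners() {
    awk '
        $1 !~ /^(tcp|udp)/     { next }                        # skip headers and UNIX sockets
        FILENAME == ARGV[1]    { seen[$1 " " $4] = 1; next }   # first snapshot: remember
        !(($1 " " $4) in seen) { print $1, $4 }                # second snapshot: report new ones
    ' "$1" "$2"
}
```

For instance, new_listeners netstat-ln-smb.before netstat-ln-smb.after lists the ports that must be opened in the firewall for the service.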

Network Time Protocol (ntp)

See ntp.

Chroot

See Chroot page.

TTY

Stuff to read some day:

Printing

CUPS


Reset CUPS printer
Go to http://localhost:631 to reset CUPS printer
More troubleshooting
See [13]
Setup backend error handler
  • See [14], [15]
  • beh is already installed on Ubuntu 12.04+.
  • Another solution is to edit the policy ([16]).
  • The default policy is retry-job, but I still got the error message "file is rejected". I then tried abort-job.
Command line interface
# Get printers status
lpstat -p
# printer Canon_MG5300_series is idle.  enabled since Wed 25 Dec 2019 06:52:57 PM CET
# printer DESKJET-940C is idle.  enabled since Wed 25 Dec 2019 06:50:32 PM CET
# printer TS8100 is idle.  enabled since Tue 13 Aug 2019 04:22:10 PM CEST

# printer Canon_MG5300_series disabled since Wed 25 Dec 2019 06:52:37 PM CET -
# 	Paused
# printer DESKJET-940C is idle.  enabled since Wed 25 Dec 2019 06:50:32 PM CET
# printer TS8100 is idle.  enabled since Tue 13 Aug 2019 04:22:10 PM CEST

# printer Canon_MG5300_series is idle.  enabled since Wed 25 Dec 2019 06:13:46 PM CET
# printer DESKJET-940C now printing DESKJET-940C-288.  enabled since Wed 25 Dec 2019 06:47:01 PM CET
# 	Processing page 2...
# printer TS8100 is idle.  enabled since Tue 13 Aug 2019 04:22:10 PM CEST

# Enable paused printer
cupsenable DESKJET-940C

Rescue

Some tips to rescue a broken Linux installation.

Using GRUB

See Grub#Rescue on how to fix a broken GRUB installation or on how to use GRUB to fix a broken linux installation.

Kernel line

To boot a minimal bash shell, edit the kernel line as follows:

  • Change ro to rw to allow read-write access to the file system
  • Add init=/bin/bash to run Bash shell

After that, one can use e.g. nano to edit text configuration files.

To get boot messages:

  • Remove quiet splash
  • Add --verbose

Alt-SysRq-REISUB / Alt-PrtScr-REISUB

Using the Magic SysRq key, one can usually reboot one's system cleanly (better than holding the power button for 5 seconds).

Press and hold Alt-SysRq, then press the following keys in sequence, waiting 1 second between each press: R, E, I, S, U, B.

On laptop, you often have to press and hold Alt, then Fn-SysRq, then release Fn while holding Alt-SysRq, and finally R E I S U B.

On keyboard without SysRq, the combination Alt-PrtScr also works.

unRaw      (take control of keyboard back from X),
 tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
 kIll      (send SIGKILL to all processes, forcing them to terminate immediately),
  Sync     (flush data to disk),
  Unmount  (remount all filesystems read-only),
reBoot.

A mnemonic is "Reboot Even If System Utterly Broken", or "BUSIER" read backwards.
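Whether each of these keys has any effect from the keyboard depends on the kernel.sysrq setting (/proc/sys/kernel/sysrq): 1 allows everything, 0 allows nothing, and any other value is a bitmask. The sketch below decodes such a value for the six REISUB keys. The helper name sysrq_allowed is made up, and the bit values are those I know from the kernel's sysrq documentation (4 keyboard control, 16 sync, 32 remount read-only, 64 process signalling, 128 reboot); double-check them against your kernel version.

```shell
# Given a value of /proc/sys/kernel/sysrq (default: the current one), print
# which REISUB keys the keyboard combination is allowed to trigger.
sysrq_allowed() {
    mask=${1:-$(cat /proc/sys/kernel/sysrq)}
    out=
    for entry in R:4 E:64 I:64 S:16 U:32 B:128; do
        key=${entry%%:*}
        bit=${entry##*:}
        # 1 enables every function; otherwise test the matching bit
        if [ "$mask" -eq 1 ] || [ $(( mask & bit )) -ne 0 ]; then
            out="$out$key "
        fi
    done
    echo "${out% }"
}
```

With a value such as 176 (128+32+16), only S, U and B are honoured, so the R, E and I steps are silently ignored.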

/proc/sysrq-trigger

SysRq can also be triggered by writing to /proc/sysrq-trigger [17]:

echo u | sudo tee /proc/sysrq-trigger          # Remount all fs read-only

Troubleshooting

References:

Serial Console

References:

Use the serial console to debug kernel freezes, hangs... that leave no trace in the kernel logs.

To enable the serial console, you need a serial port (named /dev/ttyS0 on Linux, COM1 on Windows) and a process like minicom that connects to that port on the machine being debugged.

To enable it, edit /etc/default/grub and change the line:

-GRUB_CMDLINE_LINUX_DEFAULT="splash quiet"
+GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 console=tty0 ignore_loglevel"

The first parameter ensures that all kernel output is sent to the serial console. The second parameter ensures that the kernel output is still shown on the guest text console. The third parameter forces the guest Linux kernel to print all kernel messages to the console, whatever the log level.
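After the reboot, it is worth checking that the parameters really reached the kernel. A minimal sketch that lists the console= arguments of a kernel command line (by default the running one, from /proc/cmdline); kernel_consoles is a made-up helper name.

```shell
# List the console= arguments of a kernel command line, in order of appearance.
kernel_consoles() {
    cmdline=${1:-$(cat /proc/cmdline)}
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^console=//p'
}
```

If it prints nothing, the GRUB change was not applied (e.g. update-grub was not run).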

Update grub and restart.

sudo update-grub
sudo shutdown -r now
VirtualBox

On VirtualBox [18], one can easily configure a virtual serial port. In the settings, open the Serial Ports tab:

  • Enable serial port COM1 (irq 4, io 0x3F8).
  • Select mode Raw File, and select a file on the host file system.

Restart the guest, and verify that the port is running:

sudo stty -F /dev/ttyS0 -a
# speed 9600 baud; rows 0; columns 0; line = 0;
# intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
# eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
# werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;

Another option is to use Host pipe, and use for instance minicom to attach to the named pipe [19], [20], [21].

Netconsole

References:

netconsole is similar to the serial console, except that messages are sent to a network socket instead of a serial link.

Setup Receiver side

On the remote host, the simplest option is to start a netcat process or equivalent to collect the log:

nc -l -p 6666 -u                     # ... OR ...
netcat -u -l -p 6666                 # ... OR ...
nc -l -p 6666 -u > netconsole.txt &

A more stable solution is to use syslog-ng (see Ubuntu wiki above) or rsyslog (see Stapelberg's link above).

To setup rsyslog:

  • First create the log folder and rsyslog config:
# Here we assume sender hostname is 'zavcxl0006' with IP 192.168.1.4
sudo mkdir -p /var/log/remote/zavcxl0006
# Use 'sudo tee' since a plain '>' redirection would run without root rights
sudo tee /etc/rsyslog.d/remote.conf >/dev/null <<'EOT'
$ModLoad imudp
$RuleSet remote

# For each IP address that you want to store logs from,
# add and modify the following two (!) lines:
if $fromhost-ip=='192.168.1.4' then /var/log/remote/zavcxl0006/console.log
& stop

$InputUDPServerBindRuleset remote
$UDPServerRun 6666

$RuleSet RSYSLOG_DefaultRuleset
EOT
  • Create the logrotate config:
sudo tee /etc/logrotate.d/remote >/dev/null <<'EOT'
/var/log/remote/*/*.log
{
        copytruncate
        rotate 30
        daily
        missingok
        dateext
        notifempty
        delaycompress
        compress
        maxage 31
        postrotate
                invoke-rc.d rsyslog reload > /dev/null
        endscript
}
EOT
Setup Sender side

To use it, first install it from debian packages:

sudo apt install netconsole

To enable netconsole you must change kernel option at boot time. Edit /etc/default/grub:

-GRUB_CMDLINE_LINUX_DEFAULT="splash quiet"
+GRUB_CMDLINE_LINUX_DEFAULT="debug ignore_loglevel"

Then update grub:

sudo update-grub

Or set it via dmesg (see dmesg(8) man page and Documentation/kernel-parameters.txt for details)

netconsole can be started automatically with a static configuration. To start at boot time, update /etc/modules and create a config file in /etc/modprobe.d:

sudo sh -c 'echo netconsole >> /etc/modules'
sudo sh -c 'echo options netconsole netconsole=6666@192.168.1.102/eth0,6666@192.168.1.103/08:00:46:d4:1d:82 > /etc/modprobe.d/netconsole.conf'

Or start it at command-line:

sudo modprobe netconsole netconsole=6666@192.168.1.102/eth0,6666@192.168.1.103/08:00:46:d4:1d:82

The syntax is:

netconsole="[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr][;...]"
# src-port      source for UDP packets (defaults to 6665)
# src-ip        source IP to use (interface address)      - MANDATORY at boot time
# dev           network interface (eth0)
# tgt-port      port for logging agent (6666)
# tgt-ip        IP address for logging agent
# tgt-macaddr   ethernet MAC address for logging agent (broadcast)

Quotes are mandatory for multi-host configuration (semi-colon) but optional otherwise. Giving the src-ip is mandatory at boot time (i.e. when starting the module from /etc/modules), because the interface does not have an IP address yet.
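To avoid typos in this string, it can be assembled from its parts. This is only a sketch for a single target; netconsole_param is a made-up helper name, and empty arguments are simply left empty so that the defaults apply.

```shell
# Build the value of the netconsole= module parameter for one target.
# Usage: netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
netconsole_param() {
    # The target IP is the only field without a default
    if [ -z "$5" ]; then
        echo "netconsole_param: tgt-ip is required" >&2
        return 1
    fi
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

For instance, sudo modprobe netconsole netconsole="$(netconsole_param 6666 192.168.1.102 eth0 6666 192.168.1.103 08:00:46:d4:1d:82)".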

netconsole can also be configured dynamically [22] using configfs [23]:

sudo mount -t configfs none /sys/kernel/config	# if not mounted yet
cd /sys/kernel/config/netconsole		# exists once netconsole is loaded
sudo mkdir target_a
cd target_a
# The writes below need a root shell (e.g. sudo -i): the redirection is done by the calling shell
cat enabled				# check if enabled is 1
echo 0 > enabled			# disable the target (if required)
echo eth2 > dev_name			# set local interface
echo 10.0.0.4 > remote_ip		# update some parameter
echo cb:a9:87:65:43:21 > remote_mac	# update more parameters
echo 1 > enabled			# enable target again

Note netconsole module cannot be removed if configfs contains a dynamic target.

Alternatively, here is a script to configure the netconsole module dynamically:

unset NETCON_TGTHOST
eval $(get-network.sh < ~/etc/network_definition)
case $LOCATION in
    home)
        # Require
        #   ipkg install netcat
        #   mkdir -p /shares/beq06659/netconsole
        #   nc -l -p 6666 -u >/shares/beq06659/netconsole/zavcxl0005-netconsole&
        NETCON_TGTHOST=lacie-cloudbox
        ;;
    stzav)
        NETCON_TGTHOST=apple-pi.zav.st.com
        ;;
esac

echo "Resetting netconsole for location '$LOCATION'..."
sudo rmmod netconsole 2> /dev/null
if [ -n "$NETCON_TGTHOST" ]; then
    NETCON_PORT=6666
    # Adding -v -v to nc to avoid it to block sometimes. Go figure...
    echo "[------------] Redocking on $(date +"%Y-%m-%d %H:%M:%S")..." | nc -v -v -u -p 6665 $NETCON_TGTHOST $NETCON_PORT
    NETCON_TGTIP=$(getent hosts $NETCON_TGTHOST|awk '{print $1}')
    ping -c 1 $NETCON_TGTIP >/dev/null
    NETCON_TGTMAC=$(arp -n $NETCON_TGTIP|awk '/ether/{print $3}')
    echo "... setting up netconsole for location '$LOCATION' ($NETCON_TGTHOST,$NETCON_TGTIP/$NETCON_TGTMAC)"
    sudo modprobe netconsole netconsole=@/,$NETCON_PORT@$NETCON_TGTIP/$NETCON_TGTMAC
else 
    echo "... no target available for netconsole at location $LOCATION"
fi

On Debian, netconsole can be setup more easily using netconsole-setup:

echo "Resetting netconsole for location '$LOCATION'..."
sudo rmmod netconsole 2> /dev/null
if [ -n "$NETCON_TGTHOST" ]; then
    NETCON_PORT=6666
    # Adding -v -v to nc to avoid it to block sometimes. Go figure...
    echo "[------------] Redocking on $(date +"%Y-%m-%d %H:%M:%S")..." | nc -v -v -w 1 -u -p 6665 $NETCON_TGTHOST $NETCON_PORT
    echo "... using target host '$NETCON_TGTHOST'."
    sudo netconsole-setup 6666@$NETCON_TGTHOST
else 
    echo "... no target available for netconsole at location $LOCATION."
fi
Troubleshooting
  • If getting message
   $ sudo modprobe netconsole netconsole=@/,6666@192.168.1.3/b8:27:eb:69:3e:df
   modprobe: ERROR: could not insert 'netconsole': No such device
Try to add the name of the network interface to use (see [24]).

System Logs

Using systemd

To view the logs, kernel messages included (with highlighting, grouping...); add -k to show kernel messages only:

sudo journalctl -b        # Since last boot
# or
sudo journalctl -fe       # Last event and follow

/var/log

Some external links:


syslog is a utility to log all system messages, from information messages to critical errors. Log files are stored in /var/log. On Ubuntu, the default logging system is rsyslog, with configuration files /etc/rsyslog.conf and those in /etc/rsyslog.d/.

Logs generated by rsyslog (see /etc/rsyslog.d/50-default.conf):

file              | source  | description
aptitude          |         |
auth.log          | rsyslog | Messages to facilities auth and authpriv
boot              |         |
boot.log          |         |
btmp              |         |
daemon.log        | rsyslog | Messages to facility daemon
debug             | rsyslog | Messages with debug priority, but excluding facilities auth, mail and news
dmesg             | kernel  | Boot time hardware detection and driver setup (i.e. kernel messages before syslog daemon is launched). This is *not* the same as dmesg output (see kern.log)!
dpkg.log          |         |
faillog           |         |
fontconfig.log    |         |
jockey.log        |         |
kern.log          | rsyslog | Messages to facility kern (apparently dmesg will display the last 16392 octets of /var/log/kern.log since last boot [25])
lastlog           | lastlog | last login of each user ([26]). It looks big, but it's a sparse file (du -h lastlog) !!!
lpr.log           | rsyslog | Messages to facility lpr
mail.info         | rsyslog | Messages to facility mail, priority ≥ info
mail.err          | rsyslog | Messages to facility mail, priority ≥ err
mail.log          | rsyslog | Messages to facility mail
mail.warn         | rsyslog | Messages to facility mail, priority ≥ warn
messages          | rsyslog | Messages with info, notice and warn priority, but excluding facilities auth, daemon, mail and news
MountManager.log  |         |
mysql.err         |         |
mysql.log         |         |
pm-powersave.log  |         |
pm-suspend.log    |         |
pycentral.log     |         |
syslog.log        | rsyslog | All messages except those in auth.log (i.e. facilities auth and authpriv)
udev              |         |
ufw.log           | rsyslog | All messages from UFW firewall
user.log          | rsyslog | All messages targeting facility user
vbox-install.log  |         |
wtmp              |         |
Xorg.0.log        |         |
Xorg.failsafe.log |         |