Preparations for kernel space problems
kdump may fail due to hardware problems
BIOS/firmware on the latest hardware may not be sufficiently tested
Set /proc/sys/kernel/sysrq to 1 using /etc/sysctl.conf
An unexpected system freeze may happen during the boot process
quiet and rhgb command line parameters
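
The sysrq setting noted above can be verified before trouble strikes. The following is a minimal sketch, not a definitive tool: the helper name `sysrq_fully_enabled` is hypothetical, and it assumes that the value 1 in /proc/sys/kernel/sysrq means all SysRq functions are enabled (other non-zero values are treated here as "not fully enabled").

```python
from pathlib import Path

# Path taken from the notes above; the helper name is made up for this sketch.
SYSRQ_PATH = "/proc/sys/kernel/sysrq"


def sysrq_fully_enabled(path: str = SYSRQ_PATH) -> bool:
    """Return True when the sysrq control file contains exactly 1."""
    return Path(path).read_text().strip() == "1"


if __name__ == "__main__":
    try:
        state = "enabled" if sysrq_fully_enabled() else "not fully enabled"
        print(f"{SYSRQ_PATH}: all SysRq functions {state}")
    except OSError as exc:
        # Not a Linux system, or /proc is not readable.
        print(f"could not read {SYSRQ_PATH}: {exc}")
```

The path is a parameter so that the check can be exercised against a copy of the file collected from another machine.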
The OOM killer is a built-in feature which tries to survive an out-of-memory condition by killing some processes.
The Linux kernel is not designed carefully enough to completely avoid OOM killer deadlocks
do not rely on the OOM killer too much
unexpected system reboot?
One of the most annoying troubles in Linux, because it is difficult to understand the reason for the reboot.
a relatively reliable way to capture kernel messages
/sys/module/printk/parameters/time (RHEL 6) /etc/rc.local (RHEL 5)
Some hardware supports redirection of the serial console
The network interface is not available during boot-up and initialization of the kdump kernel
a utility for saving kernel messages is available
Service failover
Failover will happen without any prior warning if the watchdog timeout is shorter than the timeouts of the kernel's warning mechanisms.
The cause of the timeout can lie in hardware drivers or in the hardware itself.
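
The timeout relationship above can be checked on paper before deployment. The sketch below is only an illustration: the function name `warnings_missed` is hypothetical, and the mechanism names and second values in the example are made-up inputs, not defaults read from any system.

```python
def warnings_missed(watchdog_timeout: int, kernel_timeouts: dict[str, int]) -> list[str]:
    """List kernel warning mechanisms that cannot fire before the watchdog does.

    A mechanism is "missed" when the watchdog timeout is shorter than the
    mechanism's own timeout, which is the condition described in the notes.
    """
    return sorted(
        name for name, secs in kernel_timeouts.items() if watchdog_timeout < secs
    )


if __name__ == "__main__":
    # Illustrative values only (assumptions, in seconds).
    example = {"hung task check": 120, "soft lockup check": 20}
    print(warnings_missed(60, example))   # ['hung task check']
```

Any mechanism that appears in the result would be silent at failover time, so either the watchdog timeout or that mechanism's timeout needs adjusting.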
Programs like shell scripts are vulnerable to this kind of disturbance
Heartbeat/watchdog software needs painstaking error checking and retry mechanisms
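
As a rough illustration of the error checking and retry point, here is a minimal sketch of a health check that does not give up after a single failure. The function name `run_with_retry` and its parameters are hypothetical and the values are arbitrary; a real heartbeat would need tuning against the watchdog timeout.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, timeout=5.0, delay=1.0):
    """Run cmd, retrying on non-zero exit, timeout, or launch failure.

    Returns True as soon as one attempt succeeds, False when all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout, capture_output=True)
            if result.returncode == 0:
                return True
            reason = f"exit status {result.returncode}"
        except subprocess.TimeoutExpired:
            reason = f"no answer within {timeout}s"
        except OSError as exc:
            reason = f"could not start: {exc}"
        # Record every failure so that a later investigation has something to read.
        print(f"attempt {attempt}/{attempts} failed: {reason}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Each failure mode (bad exit status, stall, failure to launch) is handled separately, so a temporary stall is logged and retried instead of being silently treated as a dead service.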
What tools can we use for recording unexpected events?
- System call auditing
Watch out for integer overflow problems
- This kind of problem suddenly happens one day
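
A small worked example shows why an integer overflow problem appears "suddenly one day". This is a sketch under stated assumptions: a 32-bit counter and a rate of 100 ticks per second are chosen purely for illustration, and all names are hypothetical.

```python
COUNTER_BITS = 32          # assumed counter width
TICKS_PER_SECOND = 100     # assumed tick rate, for illustration only
MODULUS = 1 << COUNTER_BITS


def days_until_wrap(ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Days of continuous running before the counter wraps back to zero."""
    return MODULUS / ticks_per_second / 86400


def naive_elapsed(then: int, now: int) -> int:
    """Plain subtraction: goes wrong once the counter has wrapped."""
    return now - then


def wrap_safe_elapsed(then: int, now: int) -> int:
    """Subtraction modulo the counter size: still correct across one wrap."""
    return (now - then) % MODULUS


if __name__ == "__main__":
    print(f"wraps after {days_until_wrap():.1f} days")   # wraps after 497.1 days
    then = MODULUS - 50   # sampled just before the wrap
    now = 25              # sampled just after the wrap
    print(naive_elapsed(then, now))       # -4294967221
    print(wrap_safe_elapsed(then, now))   # 75
```

Under these assumptions the system runs correctly for more than a year, and the naive calculation fails only at the moment the counter wraps.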
Is SystemTap good at everything?
SystemTap can be used not only for measuring the performance of functionality but also for tracing functionality
Unfortunately, SystemTap is not a tool designed for monitoring over a span of years
Preparations for user space problems
When system trouble happens, you need to retrieve and examine log files as soon as possible
- A tool for tracking/restricting various operations from boot.
- A mainlined version has been available since the Linux 2.6.30 kernel
- Name-based access tracking. (/sbin/rsyslogd accessing resources)
the listener process of ssh daemon /usr/sbin/sshd is accessing
What is nice about TOMOYO Linux as a tracking tool?
AKARI is an unexpected usage of the LSM interface, but a useful technique for implementing various "single function LSM" modules
A new type of rule-based in-kernel access auditing and restricting tool.
- Troubleshooting is something that compares the normal state and the current state.
- It is important that you know what the normal state is before you encounter troubles
- There are parameters and tools which help you understand what the normal state of your system is and what is happening to your system.
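
The idea of comparing the normal state with the current state can be sketched as follows. This is a minimal illustration, not one of the tools the notes refer to: the function names are hypothetical, and the two /proc/sys files in the example are just an arbitrary choice of parameters to record.

```python
from pathlib import Path


def snapshot(paths):
    """Map each path to its stripped contents, or None when it cannot be read."""
    state = {}
    for p in paths:
        try:
            state[p] = Path(p).read_text().strip()
        except OSError:
            state[p] = None
    return state


def changed(baseline, current):
    """Return {path: (old, new)} for every entry that differs from the baseline."""
    keys = sorted(set(baseline) | set(current))
    return {
        k: (baseline.get(k), current.get(k))
        for k in keys
        if baseline.get(k) != current.get(k)
    }


if __name__ == "__main__":
    # Example parameters to watch (an assumption; pick what matters on your system).
    watched = ["/proc/sys/kernel/sysrq", "/proc/sys/kernel/panic"]
    normal = snapshot(watched)      # taken while the system is healthy
    current = snapshot(watched)     # taken again when trouble is suspected
    print(changed(normal, current))
```

In practice the baseline would be saved to a file while the system is healthy and loaded again when trouble is suspected, so that only the differences need to be examined.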