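Monit monitors services on Unix/Linux systems. The items it can check include:
(1) services (2) processes (3) disk space
(4) file timestamp / permission / uid / gid
(5) partition inode usage, and more
When a check fails, rules can be defined to restart the service or program, or to send an alert.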
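To make the checks above more concrete, here is a minimal Python sketch of the raw values that a space, inode, or permission check looks at. It is only an illustration using the standard library, not how monit itself is implemented; the function names `usage_percent` and `ownership` are made up for this example, and monit's own percentages may be computed differently (for example, in how reserved blocks are counted).

```python
import os


def usage_percent(path):
    """Return (space_pct, inode_pct) for the filesystem containing path."""
    st = os.statvfs(path)
    # Share of blocks and inodes in use. Some filesystems report a total
    # of zero (for inodes in particular), so guard against dividing by zero.
    space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inode = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return space, inode


def ownership(path):
    """Return (permission bits, uid, gid) for path."""
    st = os.stat(path)
    return st.st_mode & 0o7777, st.st_uid, st.st_gid


if __name__ == "__main__":
    space, inode = usage_percent("/")
    mode, uid, gid = ownership("/")
    print("space %.1f%%, inode %.1f%%" % (space, inode))
    print("mode %o, uid %d, gid %d" % (mode, uid, gid))
```

A rule such as "alert above 80 %" then amounts to comparing these values against a threshold on every polling cycle.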
Installing monit
1. Download monit-4.9-2.rf.src.rpm from http://dag.wieers.com/rpm/packages/monit/
2. # rpm -Uvh monit-4.9-2.rf.src.rpm
# rpmbuild -ba /usr/src/redhat/SPECS/monit.spec
# rpm -ivh /usr/src/redhat/RPMS/x86_64/monit-4.9-2.rf.x86_64.rpm
3. Edit /etc/monit.conf
# Poll at 60 sec intervals
set daemon 60
# logfile location
set logfile /var/log/monit.log
# mail servers used to send alerts
set mailserver 10.145.138.22, # primary
               10.145.138.69, # secondary
               localhost      # fallback relay
# alert recipients here
set alert samson@noc.ttn.net
# monit's admin (HTTP) interface; the default port is 2812
set httpd port 2812 and
    use address 192.168.1.1   # bind the admin server to 192.168.1.1
    allow 192.168.1.0/24      # allow clients in 192.168.1.0/24 to connect
    allow admin:123456        # admin user:password
set mail-format {
from: monit@$HOST
subject: $SERVICE $EVENT at $DATE
message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
#
# Service checks follow
#
# wanttoexec listens on TCP port 1984, so probe TCP port 1984 to tell whether the program is running.
# Run the program only after three consecutive checks have failed.
check host localhost with address 127.0.0.1
    if failed port 1984 type tcp with 3 cycles then
        exec "/usr/local/sbin/wanttoexec"
# Check the DNS service; after 3 restarts within 5 cycles, stop trying to restart it
check process named with pidfile /var/run/named/named.pid
    start program = "/etc/init.d/named start"
    stop program = "/etc/init.d/named stop"
    if failed host 127.0.0.1 port 53 type udp then restart
    if failed host 127.0.0.1 port 53 type tcp then restart
    if 3 restarts within 5 cycles then timeout
# Check partition permission / uid / gid / space / inode
check device datafs with path /dev/sdb1
    group server
    start program = "/bin/mount /data"
    stop program = "/bin/umount /data"
    if failed permission 660 then unmonitor
    if failed uid root then unmonitor
    if failed gid disk then unmonitor
    if space usage > 80 % then alert
    if space usage > 98 % then stop
    if inode usage > 80 % then alert
    if inode usage > 98 % then stop
# The following example covers three related services: if the check on the main service fails,
# the two services that depend on it are restarted along with the main service.
check process oracle with pidfile /var/run/oracle.pid
    start = "/etc/init.d/oracle start"
    stop = "/etc/init.d/oracle stop"
    if failed port 9001 then restart
check process oracle-import
    with pidfile /var/run/oracle-import.pid
    start = "/etc/init.d/oracle-import start"
    stop = "/etc/init.d/oracle-import stop"
    depends on oracle
check process oracle-export
    with pidfile /var/run/oracle-export.pid
    start = "/etc/init.d/oracle-export start"
    stop = "/etc/init.d/oracle-export stop"
    depends on oracle
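The effect of "depends on" in the last example can be pictured with a small Python model. This is a simplified sketch and not monit's actual code: it assumes that dependents are stopped before the service they depend on and started again after it, and it only follows direct dependencies. The names `restart_plan` and `DEPENDS_ON` are invented for this illustration.

```python
def restart_plan(target, depends_on):
    """Return the ordered (action, service) steps for restarting target."""
    # Services that declare a direct dependency on target, in config order.
    dependents = [svc for svc, parents in depends_on.items() if target in parents]
    # Stop dependents first (in reverse order), then the target itself.
    plan = [("stop", svc) for svc in reversed(dependents)]
    plan.append(("stop", target))
    # Start the target first, then bring the dependents back up.
    plan.append(("start", target))
    plan.extend(("start", svc) for svc in dependents)
    return plan


# Mirrors the three check blocks in the example configuration.
DEPENDS_ON = {
    "oracle": [],
    "oracle-import": ["oracle"],
    "oracle-export": ["oracle"],
}

if __name__ == "__main__":
    for action, service in restart_plan("oracle", DEPENDS_ON):
        print(action, service)
```

Restarting oracle-import or oracle-export on its own does not touch oracle in this model, because nothing depends on them.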
4. Start monit
# chkconfig monit on
# service monit start
5. Commonly used commands
# monit summary
The monit daemon 4.9 uptime: 0m
Remote Host 'localhost' online with all services
Process 'named' running
System 'ziv' Monit instance changed
# monit status
The monit daemon 4.9 uptime: 3m
Remote Host 'localhost'
status online with all services
monitoring status monitored
port response time 0.009s to 127.0.0.1:1984 [DEFAULT via TCP]
data collected Thu Aug 30 15:29:17 2007
Process 'named'
status running
monitoring status monitored
pid 32724
parent pid 1
uptime 22h 28m
childrens 0
memory kilobytes 1424
memory kilobytes total 1424
memory percent 1.1%
memory percent total 1.1%
cpu percent 0.0%
cpu percent total 0.0%
port response time 0.001s to 127.0.0.1:53 [DEFAULT via TCP]
port response time 2.000s to 127.0.0.1:53 [DEFAULT via UDP]
data collected Thu Aug 30 15:29:19 2007
System 'ziv'
status running
monitoring status monitored
load average [0.00] [0.00] [0.00]
cpu 0.0%us 0.0%sy 0.0%wa
memory usage 31012 kB [24.6%]
data collected Thu Aug 30 15:29:19 2007
# monit reload - run this after changing /etc/monit.conf to reload the configuration
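If the summary output needs to be consumed by a script, a small parser along these lines may help. It is a sketch that assumes the line layout shown in the "monit summary" sample above (a type, a quoted name, then a status); other monit versions may print a different layout, and `parse_summary` is a name made up for this example.

```python
import re

# A type, a single-quoted service name, then free-form status text.
SUMMARY_LINE = re.compile(r"^(?P<type>.+?) '(?P<name>[^']+)'\s+(?P<status>.+)$")


def parse_summary(text):
    """Map each service name to a (type, status) tuple."""
    result = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:  # the header line has no quoted name and is skipped
            result[match.group("name")] = (match.group("type"), match.group("status"))
    return result


# Sample copied from the summary output shown above.
SAMPLE = """The monit daemon 4.9 uptime: 0m
Remote Host 'localhost' online with all services
Process 'named' running
System 'ziv' Monit instance changed
"""

if __name__ == "__main__":
    for name, (kind, status) in parse_summary(SAMPLE).items():
        print("%s (%s): %s" % (name, kind, status))
```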
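6. The monit admin interface
Open http://192.168.1.1:2812 and enter the user name and password to log in.
The admin interface after login
For process entries that define start / stop programs,
the web interface can be used to stop / start / restart / unmonitor them.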

Monitoring a TCP port on a host
7. Test that the following rule works as expected
check host localhost with address 127.0.0.1
    if failed port 1984 type tcp with 3 cycles then
        exec "/usr/local/sbin/wanttoexec"
/var/log/monit.log :
[CST Aug 30 09:36:21] info : Starting monit daemon with http interface at [10.20.1.16:2812]
[CST Aug 30 09:36:21] info : Starting monit HTTP server at [10.20.1.16:2812]
[CST Aug 30 09:36:21] info : monit HTTP server started
[CST Aug 30 09:36:21] info : Monit started
[CST Aug 30 09:37:23] info : Monit has not changed
[CST Aug 30 09:37:23] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:38:25] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:39:27] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:40:30] info : 'localhost' connection passed to INET[127.0.0.1:1984] via TCP
The check on TCP port 1984 failed in three consecutive cycles, and only then was "/usr/local/sbin/wanttoexec" executed.
By the fourth cycle the check had already passed.
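The behaviour seen in this log can be imitated with a short Python sketch: probe the port once per cycle, count consecutive failures, and fire the action when the count reaches the threshold. This is an illustrative model and not monit's implementation; `port_open` and `FailureCounter` are hypothetical names, and the sketch assumes the action fires once, at the moment the threshold is reached.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailureCounter:
    """Track consecutive failed cycles against a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        """Record one cycle; return True when the action should fire."""
        if ok:
            self.failures = 0  # a passing check resets the streak
            return False
        self.failures += 1
        return self.failures == self.threshold


if __name__ == "__main__":
    # Replay the four cycles from the log: three failures, then a pass.
    counter = FailureCounter(3)
    for cycle, ok in enumerate([False, False, False, True], start=1):
        print(cycle, "fire" if counter.record(ok) else "wait")
```

In a live loop, `counter.record(port_open("127.0.0.1", 1984))` would be called once per polling interval, which is 60 seconds in the configuration above.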