
2007/08/30 monit - Monitoring Unix/Linux System Services

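Monit can monitor Unix/Linux system services. The items it can check include:

(1) Service (2) Process (3) Disk space
(4) File timestamp / permission / uid / gid
(5) Partition inodes, and so on

When a check fails, you can define rules that restart the service or program, or send an alert.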


Installing monit

1. Get monit-4.9-2.rf.src.rpm - http://dag.wieers.com/rpm/packages/monit/
2. # rpm -Uvh monit-4.9-2.rf.src.rpm
# rpmbuild -ba /usr/src/redhat/SPECS/monit.spec
# rpm -ivh /usr/src/redhat/RPMS/x86_64/monit-4.9-2.rf.x86_64.rpm
3. Edit /etc/monit.conf

# Poll at 60 sec intervals

set daemon 60

# logfile location

set logfile /var/log/monit.log

# mail servers used to send alerts

set mailserver 10.145.138.22, # primary
10.145.138.69, # Secondary
localhost # fallback relay

# alert recipients here

set alert samson@noc.ttn.net

# monit's admin interface, default port 2812

set httpd port 2812 and
use address 192.168.1.1 # admin Server bind on 192.168.1.1
allow 192.168.1.0/24 # allow 192.168.1.0/24 to connect to the server
allow admin:123456 # admin id:passwd

set mail-format {
from: monit@$HOST
subject: $SERVICE $EVENT at $DATE
message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}

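#
# Service checks follow
#

# wanttoexec listens on TCP port 1984, so check TCP port 1984 to detect whether the program is still running
# Restart it only after the check has failed three times in a row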

check host localhost with address 127.0.0.1
if failed port 1984 type tcp with 3 cycles then
exec "/usr/local/sbin/wanttoexec"

# Check the DNS service; if it has been restarted 3 times within 5 cycles, stop restarting it

check process named with pidfile /var/run/named/named.pid
start program = "/etc/init.d/named start"
stop program = "/etc/init.d/named stop"
if failed host 127.0.0.1 port 53 type udp then restart
if failed host 127.0.0.1 port 53 type tcp then restart
if 3 restarts within 5 cycles then timeout

# Check Partition permission / uid / gid / space / inode

check device datafs with path /dev/sdb1
group server
start program = "/bin/mount /data"
stop program = "/bin/umount /data"
if failed permission 660 then unmonitor
if failed uid root then unmonitor
if failed gid disk then unmonitor
if space usage > 80 % then alert
if space usage > 98 % then stop
if inode usage > 80 % then alert
if inode usage > 98 % then stop

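# The following example covers three related services: if the check on the main service fails,
# monit restarts the main service and also the two services that depend on it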

check process oracle with pidfile /var/run/oracle.pid
start = "/etc/init.d/oracle start"
stop = "/etc/init.d/oracle stop"
if failed port 9001 then restart

check process oracle-import
with pidfile /var/run/oracle-import.pid
start = "/etc/init.d/oracle-import start"
stop = "/etc/init.d/oracle-import stop"
depends on oracle

check process oracle-export
with pidfile /var/run/oracle-export.pid
start = "/etc/init.d/oracle-export start"
stop = "/etc/init.d/oracle-export stop"
depends on oracle
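
Before starting the daemon, it can help to validate the control file. The following is a minimal sketch, assuming the config lives at /etc/monit.conf as above. `check_monit_conf` is a hypothetical helper name; `monit -t` (syntax check) and `-c` (control file path) are standard monit options. Monit is known to reject control files whose permissions are wider than 0700, and this file holds the admin password, so the sketch tightens it to 600 first.

```shell
# Hypothetical helper: tighten permissions on a monit control file,
# then ask monit to syntax-check it without starting the daemon.
check_monit_conf() {
    conf="${1:-/etc/monit.conf}"
    # The file contains the admin password, so keep it owner-only.
    chmod 600 "$conf" || return 1
    # -t runs a syntax check; -c selects the control file to check.
    monit -t -c "$conf"
}
```

Run as root, `check_monit_conf /etc/monit.conf` returns a non-zero status when the file is missing or monit reports a syntax error.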

4. Start monit

# chkconfig monit on

# service monit start

5. Commonly used commands

# monit summary
The monit daemon 4.9 uptime: 0m

Remote Host 'localhost' online with all services
Process 'named' running
System 'ziv' Monit instance changed

# monit status
The monit daemon 4.9 uptime: 3m

Remote Host 'localhost'
status online with all services
monitoring status monitored
port response time 0.009s to 127.0.0.1:1984 [DEFAULT via TCP]
data collected Thu Aug 30 15:29:17 2007

Process 'named'
status running
monitoring status monitored
pid 32724
parent pid 1
uptime 22h 28m
childrens 0
memory kilobytes 1424
memory kilobytes total 1424
memory percent 1.1%
memory percent total 1.1%
cpu percent 0.0%
cpu percent total 0.0%
port response time 0.001s to 127.0.0.1:53 [DEFAULT via TCP]
port response time 2.000s to 127.0.0.1:53 [DEFAULT via UDP]
data collected Thu Aug 30 15:29:19 2007

System 'ziv'
status running
monitoring status monitored
load average [0.00] [0.00] [0.00]
cpu 0.0%us 0.0%sy 0.0%wa
memory usage 31012 kB [24.6%]
data collected Thu Aug 30 15:29:19 2007

# monit reload - run this to reload after changing /etc/monit.conf
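
The monit command line also accepts per-service actions such as `unmonitor` and `monitor`. As a rough sketch, the helper below pauses monit's checks for one service while a maintenance command runs, so that monit does not react to the service being down on purpose. `with_unmonitored` is a hypothetical name, and the `named` entry in the usage line comes from the config above.

```shell
# Hypothetical helper: pause monit's checks for one service while a
# maintenance command runs, then resume monitoring.
with_unmonitored() {
    service_name="$1"
    shift
    monit unmonitor "$service_name" || return 1
    "$@"                              # the maintenance command itself
    status=$?
    monit monitor "$service_name"     # resume checks even if it failed
    return "$status"
}
```

For example, `with_unmonitored named /etc/init.d/named restart` restarts named by hand and returns the exit status of the init script.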

6. monit admin interface

http://192.168.1.1:2812 - enter the username and password to log in




[Screenshot: the admin interface after logging in]


For items of type process that have start / stop programs configured, you can stop / start / restart / unmonitor them from the web interface.

[Screenshot: monitoring of a host TCP port]
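
The admin interface can also be reached from the command line, which is a quick way to confirm that the httpd settings took effect. This is a sketch that assumes curl is installed and reuses the address and credentials from the config above. `monit_http_ok` is a hypothetical helper name. The `allow user:password` entry is checked via HTTP Basic authentication, which is what curl's `-u` option sends.

```shell
# Hypothetical helper: succeed when monit's web interface answers
# with HTTP 200 for the given credentials.
monit_http_ok() {
    url="$1"
    credentials="$2"
    # -s: quiet, -o /dev/null: discard the page body,
    # -w '%{http_code}': print only the HTTP status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' -u "$credentials" "$url")
    [ "$code" = "200" ]
}
```

Usage: `monit_http_ok http://192.168.1.1:2812/ admin:123456 && echo "web interface reachable"`.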


7. Test whether the following rule works

check host localhost with address 127.0.0.1
if failed port 1984 type tcp with 3 cycles then
exec "/usr/local/sbin/wanttoexec"

/var/log/monit.log :

[CST Aug 30 09:36:21] info : Starting monit daemon with http interface at [10.20.1.16:2812]
[CST Aug 30 09:36:21] info : Starting monit HTTP server at [10.20.1.16:2812]
[CST Aug 30 09:36:21] info : monit HTTP server started
[CST Aug 30 09:36:21] info : Monit started
[CST Aug 30 09:37:23] info : Monit has not changed
[CST Aug 30 09:37:23] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:38:25] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:39:27] error : 'localhost' failed, cannot open a connection to INET[127.0.0.1:1984] via TCP
[CST Aug 30 09:40:30] info : 'localhost' connection passed to INET[127.0.0.1:1984] via TCP
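
The failure streak can be tallied straight from the log. The following is a minimal sketch that assumes the log line format shown above. `count_failures` is a hypothetical helper name. It counts the "failed" lines for one service and stops at the first "connection passed" line.

```shell
# Hypothetical helper: read a monit log on stdin and print how many
# "failed" lines the given service logged before its first
# "connection passed" line.
count_failures() {
    awk -v svc="'$1'" '
        index($0, svc " failed")            { n++ }
        index($0, svc " connection passed") { exit }
        END { print n + 0 }
    '
}
```

Usage: `count_failures localhost < /var/log/monit.log`. For the log excerpt above it prints 3.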

"/usr/local/sbin/wanttoexec" was executed only after the TCP port 1984 check had failed for 3 consecutive cycles.

By the 4th cycle the check had already passed.
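
To confirm by hand, independently of monit, whether something is accepting connections on the port, a small probe can be run from the shell. This sketch relies on bash's `/dev/tcp` redirection, so it assumes bash rather than a plain POSIX sh. `probe_tcp` is a hypothetical helper name.

```shell
# Hypothetical helper: print "open" if a TCP connection to host:port
# succeeds, "closed" otherwise. Relies on bash's /dev/tcp redirection.
probe_tcp() {
    # The subshell keeps file descriptor 3 from leaking into the caller.
    if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}
```

Usage: `probe_tcp 127.0.0.1 1984`, run once after stopping wanttoexec and again after monit has executed it.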
