Python监控SQL Server数据库服务器磁盘使用情况
本篇博客总结一下Python采集SQL Server数据库服务器的磁盘使用信息,其实这里也是根据需求不断推进演化的一个历程,我们监控服务器的磁盘走了大概这样一个历程:
1:使用SQL Server作业监控磁盘空间
很久之前写过一篇博客“MS SQL 监控磁盘空间告警”,后面对这个脚本进行过多次完善和优化,做成了一个模板。在每台SQL Server服务器上都部署了,确实也很实用。告警也很给力,但是缺点也非常明显。
优点:1: 自己动手DIY,在没有部署运维工具的前提下,确实能提前预警,抛开不足来说,告警还是非常给力的。
缺点: 1: 每台服务器都需要部署,升级也很是麻烦。
2: 数据分散,在模式上有致命的先天不足。监控工具一般为星型结构,采集集中数据。
每台服务器都需要部署,如果有修改,每台服务器都需要发布更新,维护管理不方便。采集的一些数据分散,每台SQL Server数据库都需要保存一点数据。
3: 通用性差,只能监控SQL Server服务器,Linux服务器,我们用的是crontab跑shell+perl脚本来监控磁盘空间并告警。
2: Zabbix监控磁盘告警
后面部署了Zabbix监控工具,Zabbix监控工具功能强大,不仅仅提供了磁盘空间告警功能,还提供了监控磁盘I/O等功能。更重要的是通用性很强大:只要是服务器就能监控,而方法1仅仅能监控SQL Server数据库服务器。
优点: 1: Zabbix监控工具功能强大
2: Zabbix监控工具通用性强
缺点: 1:需要分析磁盘空间的历史数据比较麻烦,二次开发也比较麻烦(个人对zabbix了解不深入,可能对于高手而言,也是非常容易简单的事情,仅仅是个人的一点体会感受)。例如我要获取某个服务器的历史数据,对磁盘的增长情况做分析。需要关联好多表,非常麻烦。简单研究后就直接放弃了。
自从Zabbix监控工具上线后,方法1就显得可有可无。基本上处于被替换的尴尬境地。
3:使用Python脚本采集
这里存粹是为了扩展我自己的工具MyDB的功能,顺手写点Python脚本练手,而且目前而言,只能采集SQL Server服务器的磁盘空间使用情况。功能和通用性不能和Zabbix监控工具比。功能有很多局限性和不足之处,通用性也很差,但是也有一些不错的优点。
优点: 1:不需要部署客户端,也不需要每台服务器去部署(与方法1对比而言)。
2:批量采集,集中保存采集数据。方便统一告警,数据分析。
3:简洁与简单,灵活性高:目前就2个表,一个表[dbo].[SERVER_DISK_INFO]保存最近一次采集的磁盘空间使用情况数据,另外一个表[dbo].[SERVER_DISK_INFO_HIS]保存历史数据,可以做一些扩容分析等。
例如,磁盘空间告警了,系统管理员会咨询我,如果进行扩容,需要多大的空间,保证半年内,不会再次出现告警,那么就可以一个脚本计算一下(平均每个月增长值* 月数)
缺点: 1:通用性差,目前只能采集SQL Server数据库服务器的磁盘信息。后续再考虑扩展性。实在没有精力一步到位,慢慢完善扩充。
2:功能单一,不像Zabbix,还可以监控磁盘I/O,这个Python脚本就只能采集磁盘空间使用率,并不能扩展其功能,有一定局限性。
脚本get_win_disk_info.py如下所示:
# -*- coding: utf-8 -*-
'''
-------------------------------------------------------------------------------------------
-- Script Name : get_win_disk_info.py
-- Script Auotor : 潇湘隐者
-- Script Description : 采集SQL Server数据库的磁盘使用数据,方便统一分析和告警处理!
-------------------------------------------------------------------------------------------
'''
import pymssql
import logging
import os.path
import base64
from cryptography.fernet import Fernet
# 第一步,创建一个logger
logger = logging.getLogger()
logger.setLevel(logging.DEBUG) # Log等级开关
# 第二步,创建一个handler,用于写入日志文件
#log_path = os.path.dirname(os.getcwd()) + '/logs/'
log_path = '/home/konglb/logs/'
log_name = log_path + 'get_win_disk_info.log'
logfile = log_name
file_handler = logging.FileHandler(logfile, mode='a+')
file_handler.setLevel(logging.ERROR) # 输出到file的log等级的开关
# 第三步,定义handler的输出格式
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
file_handler.setFormatter(formatter)
# 第四步,将handler添加到logger里面
logger.addHandler(file_handler)
# 如果需要同時需要在終端上輸出,定義一個streamHandler
print_handler = logging.StreamHandler() # 往屏幕上输出
print_handler.setFormatter(formatter) # 设置屏幕上显示的格式
logger.addHandler(print_handler)
key=bytes(os.environ.get('key'),encoding="utf8")
cipher_suite = Fernet(key)
with open('/home/konglb/python/conf/ms_db_conf.bin', 'rb') as file_object:
for line in file_object:
encryptedpwd = line
decrypt_pwd = (cipher_suite.decrypt(encryptedpwd))
password_decrypted = bytes(decrypt_pwd).decode("utf-8") #convert to string
env_db_user=os.environ.get('db_user')
db_user=base64.b64decode(bytes(env_db_user, encoding="utf8"))
try:
dest_db_conn = pymssql.connect(host=os.environ.get('db_host'),
user=bytes.decode(db_user),
password=password_decrypted,
database='DATABASE_REPOSITORY',
charset="utf8");key=bytes(os.environ.get('key'),encoding="utf8")
# cursor = dest_db_conn.cursor();
# as_dict(bool) :如果设置为True,则后面的查询结果返回的是字典,关键字为查询结果的列名;否则(默认)返回的为list。
# 可以通过在创建游标时指定as_dict参数来使游标返回字典变量,字典中的键为数据表的列名
cursor = dest_db_conn.cursor(as_dict=True)
#DELETE FROM [dbo].[DB_JOB_RUN_ERROR]
# WHERE RUN_DATE_TIME >= CAST(CONVERT(VARCHAR(10),GETDATE(),120) AS DATETIME);
sql_text = """INSERT INTO dbo.SERVER_DISK_INFO_HIS
( COLLECT_DATE ,
FACTORY_CD ,
SERVER_NAME ,
DISK_NAME ,
TOTAL_SPACE ,
USED_SPACE ,
FREE_SPACE ,
FREE_PERCENT
)
SELECT COLLECT_DATE ,
FACTORY_CD ,
SERVER_NAME ,
DISK_NAME ,
TOTAL_SPACE ,
USED_SPACE ,
FREE_SPACE ,
FREE_PERCENT
FROM [dbo].[SERVER_DISK_INFO];
TRUNCATE TABLE [dbo].[SERVER_DISK_INFO];
"""
cursor.execute(sql_text);
dest_db_conn.commit()
sql_text = """
SELECT SERVER_CD ,
SERVER_IP ,
USER_NAME ,
dbo.DecryptByPassPhrasePwd(PASSWORD) AS PASSWORD ,
SERVER_NAME,
DB_VERSION ,
INSTANCE_NAME
FROM dbo.DB_SERVER_CONFIG
WHERE DATABASE_TYPE = 'SQL SERVER'
AND COLLECT_DATA = 1;
"""
cursor.execute(sql_text);
rows = cursor.fetchall();
for row in rows:
try:
src_db_conn = pymssql.connect(host=row['SERVER_IP'],
user=row['USER_NAME'],
password=row['PASSWORD'],
database='master',
charset="utf8",
autocommit=True);
sub_cursor = src_db_conn.cursor(as_dict=True)
if row['DB_VERSION'] <= 2000:
logger.info(row['SERVER_NAME'] + ' not gather')
continue
else:
#logger.info(row['DB_VERSION'])
sql_db_patch="SELECT SERVERPROPERTY('productlevel') AS PRODUCT_LEVEL"
sub_cursor.execute(sql_db_patch)
db_patch = sub_cursor.fetchone()
#必须转换,否则返回的为bytes,不是str
patch_info= str(db_patch['PRODUCT_LEVEL'], encoding = "utf-8")
if ((row['DB_VERSION']== 2005) or (row['DB_VERSION'] ==2008 and patch_info == 'RTM')):
#logger.info(row['SERVER_NAME'] + ' patch is ' + patch_info)
#continue
sql_job_info="""
DECLARE @Result INT;
DECLARE @objectInfo INT;
DECLARE @DriveInfo CHAR(1);
DECLARE @TotalSize VARCHAR(20);
DECLARE @OutDrive INT;
DECLARE @UnitGB FLOAT;
DECLARE @CurrentDate DATETIME;
DECLARE @ConfValue INT;
SET @UnitGB = 1073741824.0;
--创建临时表保存服务器磁盘容量信息
CREATE TABLE #DiskCapacity
(
COLLECT_DATE DATETIME ,
FACTORY_CD NVARCHAR(24),
SERVER_NAME NVARCHAR(64),
DISK_NAME NVARCHAR(2) ,
TOTAL_SPACE FLOAT,
USED_SPACE FLOAT,
FREE_SPACE FLOAT,
FREE_PERCENT FLOAT
);
INSERT #DiskCapacity
(DISK_NAME,FREE_SPACE )
EXEC master.dbo.xp_fixeddrives;
EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE;
SELECT @ConfValue = value FROM sys.sysconfigures WHERE comment LIKE '%Ole Automation Procedures%'
IF @ConfValue = 0
BEGIN
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE WITH OVERRIDE;
END
EXEC @Result = master.sys.sp_OACreate 'Scripting.FileSystemObject',@objectInfo OUT;
DECLARE CR_DiskInfo CURSOR LOCAL FAST_FORWARD
FOR SELECT DISK_NAME FROM #DiskCapacity
ORDER by DISK_NAME
OPEN CR_DiskInfo;
SET @CurrentDate = GETDATE();
FETCH NEXT FROM CR_DiskInfo INTO @DriveInfo
WHILE @@FETCH_STATUS=0
BEGIN
EXEC @Result = sp_OAMethod @objectInfo,'GetDrive', @OutDrive OUT, @DriveInfo
EXEC @Result = sp_OAGetProperty @OutDrive,'TotalSize', @TotalSize OUT
UPDATE #DiskCapacity
SET TOTAL_SPACE=ROUND(@TotalSize/@UnitGB,2), COLLECT_DATE=@CurrentDate,
FACTORY_CD=%s, SERVER_NAME=@@SERVERNAME,
--USED_SPACE=(@TotalSize-FREE_SPACE)/@UnitGB,
--FREE_PERCENT=FREE_SPACE/@TotalSize*100,
FREE_SPACE=ROUND(FREE_SPACE/1024,2)
WHERE DISK_NAME =@DriveInfo
UPDATE #DiskCapacity
SET USED_SPACE=(TOTAL_SPACE-FREE_SPACE),
FREE_PERCENT=ROUND(FREE_SPACE/TOTAL_SPACE*100,2)
WHERE DISK_NAME =@DriveInfo
FETCH NEXT FROM CR_DiskInfo INTO @DriveInfo
END
CLOSE CR_DiskInfo
DEALLOCATE CR_DiskInfo;
EXEC @Result=sp_OADestroy @objectInfo
EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE;
IF @ConfValue = 0
BEGIN
EXEC sp_configure 'Ole Automation Procedures', 0;
RECONFIGURE WITH OVERRIDE;
END
EXEC sp_configure 'show advanced options', 0
RECONFIGURE WITH OVERRIDE;
SELECT * FROM #DiskCapacity
"""
sub_cursor.execute(sql_job_info, row['SERVER_CD']);
job_rows = sub_cursor.fetchall();
src_db_conn.close()
error_job_info = []
for sub_row in job_rows:
data = (
sub_row['COLLECT_DATE'], sub_row['FACTORY_CD'], sub_row['SERVER_NAME'], sub_row['DISK_NAME'],
sub_row['TOTAL_SPACE'], sub_row['USED_SPACE'], sub_row['FREE_SPACE'], sub_row['FREE_PERCENT'])
error_job_info.append(data)
'''
logger.info('2008 2005')
logger.info(row['SERVER_NAME'])
sub_cursor.callproc('[msdb].[dbo].[sp_get_diskinfo]')
result_rows =sub_cursor.fetchall()
src_db_conn.close()
error_job_info = []
for sub_row in result_rows:
data = (
sub_row['COLLECT_DATE'], sub_row['FACTORY_CD'], sub_row['SERVER_NAME'], sub_row['DISK_NAME'],
sub_row['TOTAL_SPACE'], sub_row['USED_SPACE'], sub_row['FREE_SPACE'], sub_row['FREE_PERCENT'])
error_job_info.append(data)
'''
else:
sql_job_info = """WITH Server_Disk AS
(
SELECT DISTINCT
REPLACE(vs.volume_mount_point, ':\\' , '') AS DISK_NAME,
CAST(VS.total_bytes/1024.0/1024/1024 AS NUMERIC(18,2) ) AS [TOTAL_SPACE],
CAST(VS.available_bytes/1024.0/1024/1024 AS NUMERIC(18,2)) AS [FREE_SPACE]
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) AS vs
)
SELECT GETDATE() AS COLLECT_DATE,
%s AS FACTORY_CD ,
@@SERVERNAME AS SERVER_NAME ,
D.DISK_NAME AS DISK_NAME,
D.[TOTAL_SPACE] AS TOTAL_SPACE,
D.[TOTAL_SPACE] - D.[FREE_SPACE]
AS USED_SPACE,
D.[FREE_SPACE] AS FREE_SPACE,
CAST(D.[FREE_SPACE] * 100 / D.[TOTAL_SPACE] AS NUMERIC(18, 2)) AS FREE_PERCENT
FROM Server_Disk AS D
ORDER BY D.DISK_NAME;
"""
sub_cursor.execute(sql_job_info, row['SERVER_CD']);
job_rows = sub_cursor.fetchall();
src_db_conn.close()
error_job_info = []
for sub_row in job_rows:
data = (sub_row['COLLECT_DATE'], sub_row['FACTORY_CD'], sub_row['SERVER_NAME'], sub_row['DISK_NAME'],
sub_row['TOTAL_SPACE'], sub_row['USED_SPACE'], sub_row['FREE_SPACE'], sub_row['FREE_PERCENT'])
error_job_info.append(data)
save_job_info = """
INSERT INTO dbo.SERVER_DISK_INFO
( COLLECT_DATE
, FACTORY_CD
, SERVER_NAME
, DISK_NAME
, TOTAL_SPACE
, USED_SPACE
, FREE_SPACE
, FREE_PERCENT
)
VALUES(%s,%s,%s,%s,%d,%d,%d,%d)"""
cursor.executemany(save_job_info, error_job_info);
dest_db_conn.commit()
logger.info(row['SERVER_NAME'] + ' gather successful')
except pymssql.InterfaceError as fe:
logger.error(fe.message)
except pymssql.DatabaseError as e:
dest_db_conn.rollback();
logger.error(row['SERVER_IP'] + ' 采集出错,请检查处理异常')
logger.error(e)
finally:
src_db_conn.close()
except pymssql.InterfaceError as fe:
logger.error(fe.message)
except pymssql.DatabaseError as e:
dest_db_conn.rollback();
logger.error(row['SERVER_IP'] + ' 采集出错,请检查处理异常')
logger.error(e)
finally:
dest_db_conn.close()