博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
prometheus(version 2.0.0)系列之三
阅读量:6458 次
发布时间:2019-06-23

本文共 4092 字,大约阅读时间需要 13 分钟。

hot3.png

本文将结合配置和web界面对prometheus的基本使用做示例解说

首先来看一下初始配置

prometheus.yml

# my global configglobal:  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.  # scrape_timeout is set to the global default (10s).# Alertmanager configurationalerting:  alertmanagers:  - static_configs:    - targets:       - localhost:9093# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.rule_files:   - "/etc/prometheus/rules/*.yml"  # - "second_rules.yml"# A scrape configuration containing exactly one endpoint to scrape:# Here it's Prometheus itself.scrape_configs:  # The job name is added as a label `job=
` to any timeseries scraped from this config. - job_name: 'prometheus' # metrics_path defaults to '/metrics' # scheme defaults to 'http'. static_configs: - targets: ['localhost:9090']
在prometheus的基本配置中我们指定了Alertmanager为本地的Alertmanager;并且添加了一个服务,服务的端点只有一个,即prometheus自己;指定的报警规则读取路径为/etc/prometheus/rules/*.yml,稍后我们将会向这个路径中添加规则

alertmanager.yml

global:  # The smarthost and SMTP sender used for mail notifications.  smtp_smarthost: 'localhost:25'  smtp_from: 'alertmanager@example.org'  smtp_auth_username: 'alertmanager'  smtp_auth_password: 'password'  # The auth token for Hipchat.# The directory from which notification templates are read.templates: - '/etc/alertmanager/template/*.tmpl'# The root route on which each incoming alert enters.route:  # The labels by which incoming alerts are grouped together. For example,  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would  # be batched into a single group.  group_by: ['alertname', 'cluster', 'service']  # When a new group of alerts is created by an incoming alert, wait at  # least 'group_wait' to send the initial notification.  # This way ensures that you get multiple alerts for the same group that start  # firing shortly after another are batched together on the first   # notification.  group_wait: 30s  # When the first notification was sent, wait 'group_interval' to send a batch  # of new alerts that started firing for that group.  group_interval: 5m  # If an alert has successfully been sent, wait 'repeat_interval' to  # resend them.  repeat_interval: 3h   # A default receiver  receiver: webhook  # All the above attributes are inherited by all child routes and can   # overwritten on each.  # The child route trees.  routes:# Inhibition rules allow to mute a set of alerts given that another alert is# firing.# We use this to mute any warning-level notifications if the same alert is # already critical.inhibit_rules:- source_match:    severity: 'critical'  target_match:    severity: 'warning'  # Apply inhibition if the alertname is the same.  equal: ['alertname', 'cluster', 'service']receivers:- name: 'webhook'  webhook_configs:  - url: 'http://127.0.0.1:8090/alert_webhook'
  •  
  •  
alertmanager的配置中我们只需要注意两处,即全局配置的接收者webhook以及路由部分指定的唯一接受者为webhook,等会我们将使用一个微服务来示例如何通过webhook接受报警信息

接着我们结合内置的web界面修改配置来看一下效果

在graph导航中我们输入查询语句然后选择需要展示的时间段就有了下面的展示结果

这里写图片描述

然后我们在status->targets导航中查看当前的targets状态,因为基本配置中只有一个并且属于在线状态所以就会有一下展示:

这里写图片描述

然后我们虚构一个端点信息,在基本配置的targets列表中添加一项: 

这里写图片描述

然后使服务重新加载配置:

curl -XPOST http://192.168.0.77:9090/-/reload

刷新targets对应的界面可以看到如下展示(以下为隔了几秒刷新出现的两个状态展示)

这里写图片描述

这里写图片描述

因为端点'localhost:9100'当前不存在所以状态最终为'DOWN'

接下来我们上一个规则看一下效果

要配置规则就必须要对应一定的指标,想知道有哪些指标可以访问以下链接所指: 

这里写图片描述

首先我们先对条件做一个查询看一下图标信息: 

这里写图片描述

然后我们来写规则,然后校验规则:

[root@vm7-1201-pure rules]# promtool check rules http_requests_total.yml Checking http_requests_total.yml  SUCCESS: 1 rules found[root@vm7-1201-pure rules]# cat http_requests_total.yml groups:- name: http_requests_total  rules:  - alert: http_requests    expr: job:http_requests_total:mean5m{job="prometheus"} > 900    for: 10m    labels:      severity: page    annotations:      summary: High request latency

因为当前没有重载使规则生效,所以界面上还是这样的 

这里写图片描述

然后我们重新加载一下:

curl -XPOST http://127.0.0.1:9090/-/reload

再来看一下界面: 

这里写图片描述

当前的报警还未生效,然后我们一直刷新页面使http_requests_total的值上升到出发报警 

这里写图片描述

这时候再去看alertmanager的状态,发现报警已经被接收到: 

这里写图片描述

转载于:https://my.oschina.net/xiaominmin/blog/1800953

你可能感兴趣的文章
团队项目测试报告与用户反馈
查看>>
对软件工程课程的期望
查看>>
Mysql中文字符串提取datetime
查看>>
IOS 与ANDROID框架及应用开发模式对比一
查看>>
由中序遍历和后序遍历求前序遍历
查看>>
我学习参考的网址
查看>>
easyui的combotree以及tree,c#后台异步加载的详细介绍
查看>>
[Processing]点到线段的最小距离
查看>>
考研随笔2
查看>>
乱码的情况
查看>>
虚拟机centos 同一个tomcat、不同端口访问不同的项目
查看>>
在不花一分钱的情况下,如何验证你的创业想法是否可行?《转》
查看>>
Linux/Android 性能优化工具 perf
查看>>
GitHub使用教程、注册与安装
查看>>
论以结果为导向
查看>>
CODE[VS] 1294 全排列
查看>>
<<The C Programming Language>>讀書筆記
查看>>
JS详细入门教程(上)
查看>>
Android学习笔记21-ImageView获取网络图片
查看>>
线段树分治
查看>>