paint-brush
How to Create a Telegram Bot for Monitoring Your Service Uptime in Python (Part 2/3: Alerting) by @balakhonoff

by Kirill Balakhonov, July 19th, 2023

Too Long; Didn't Read

In part 2 of 3, we explore simple alerting: getting a message from the Telegram bot when the system starts to fail.


Hello everyone! In the previous article, I started describing my experience developing a Python Telegram bot for checking the health of my service running on a remote server and monitoring it. In a nutshell, when you're working on a pet project or even on work tasks, you may want to have all the current information about the system's status at hand (I particularly like being able to see and manage everything through a Telegram bot) without spending a lot of time on development.


In the previous part, we looked at a way to get instant metrics on demand. In this part, we will set up simple alerting, i.e., receiving a message from the bot when the system starts to fail. In the third and final part, we will cover collecting analytics and retrieving charts on request.


As in the previous part, the example is based on a real task, but I will mark the places in the code that you can change to fit your own logic, so that the rest of the example can be reused.


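In this case, I need to receive an alert as a message in the Telegram bot whenever the node I'm interested in stops responding, or when its last synchronized block drifts away from the network's latest block (for example, because the node has lost its network connection). The network's latest block is estimated as the median across a set of public RPC nodes, and the drift is flagged once it exceeds a fixed threshold.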


As before, we first need to set up the virtual environment:


cd ~
virtualenv -p python3.8 up_env # creating an environment
source ~/up_env/bin/activate # activating the environment


and install the necessary dependencies:


pip install python-telegram-bot==13.6.0 # the code was written before version 20, so the version is pinned explicitly; v13 already includes the JobQueue

pip install numpy # needed for the median value function
pip install web3 # needed for requests to nodes (replace with what you need)


The functions.py file is unchanged from the previous part:


import numpy as np
import multiprocessing

from web3 import Web3 #  add those libraries needed for your task

# Helper function that checks a single node
def get_last_block_once(rpc):
    try:
        w3 = Web3(Web3.HTTPProvider(rpc))
        block_number = w3.eth.block_number
        if isinstance(block_number, int):
            return block_number
        else:
            return None
    except Exception as e:
        print(f'{rpc} - {repr(e)}')
        return None


# Main function to check the status of the service that will be called
def check_service():
    # pre-prepared list of reference nodes
    # for any network, it can be found on the website https://chainlist.org/
    list_of_public_nodes = [
        'https://polygon.llamarpc.com',
        'https://polygon.rpc.blxrbdn.com',
        'https://polygon.blockpi.network/v1/rpc/public',
        'https://polygon-mainnet.public.blastapi.io',
        'https://rpc-mainnet.matic.quiknode.pro',
        'https://polygon-bor.publicnode.com',
        'https://poly-rpc.gateway.pokt.network',
        'https://rpc.ankr.com/polygon',
        'https://polygon-rpc.com'
    ]
    
    # parallel processing of requests to all nodes
    with multiprocessing.Pool(processes=len(list_of_public_nodes)) as pool:
        results = pool.map(get_last_block_once, list_of_public_nodes)
        last_blocks = [b for b in results if b is not None and isinstance(b, int)]

    # define the maximum and median value of the current block
    med_val = int(np.median(last_blocks))
    max_val = int(np.max(last_blocks))
    # determine the number of nodes with the maximum and median value
    med_support = np.sum([1 for x in last_blocks if x == med_val])
    max_support = np.sum([1 for x in last_blocks if x == max_val])

    return max_val, max_support, med_val, med_support
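Because the requests to the public nodes mostly wait on the network, a thread pool is usually sufficient for this kind of fan-out, and the aggregation step can be written with the standard library alone. The sketch below is an alternative, not the code from the repository: `collect_last_blocks` and `summarize_blocks` are hypothetical helper names, and `fetch` stands for any function that, like `get_last_block_once`, returns a block number or `None`. It also fails with an explicit error when no public node answers, a case in which the numpy version above stops with a less descriptive exception.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple


def collect_last_blocks(fetch: Callable[[str], Optional[int]],
                        rpcs: Sequence[str]) -> List[int]:
    """Query all RPC URLs concurrently and keep only the valid block numbers."""
    # Network-bound calls: threads are enough, one worker per URL (at least one)
    with ThreadPoolExecutor(max_workers=max(1, len(rpcs))) as executor:
        results = list(executor.map(fetch, rpcs))
    # Nodes that failed return None and are dropped here
    return [b for b in results if isinstance(b, int)]


def summarize_blocks(last_blocks: List[int]) -> Tuple[int, int, int, int]:
    """Return (max_val, max_support, med_val, med_support), like check_service()."""
    if not last_blocks:
        raise ValueError("none of the public RPC nodes returned a block number")
    # statistics.median averages the two middle values for an even count,
    # just like np.median; int() then truncates the result
    med_val = int(statistics.median(last_blocks))
    max_val = max(last_blocks)
    # How many nodes report exactly the median / maximum block
    med_support = last_blocks.count(med_val)
    max_support = last_blocks.count(max_val)
    return max_val, max_support, med_val, med_support


if __name__ == "__main__":
    # Stub fetcher instead of real RPC calls: "c" simulates a node that did not answer
    fake_nodes = {"a": 100, "b": 101, "c": None, "d": 101}
    blocks = collect_last_blocks(fake_nodes.get, list(fake_nodes))
    print(blocks, summarize_blocks(blocks))
```

In the bot itself, the call would look like `summarize_blocks(collect_last_blocks(get_last_block_once, list_of_public_nodes))`. Note that with an even number of responding nodes the median can fall between two blocks (for example, 99 and 101 give 100), in which case `med_support` is 0; the numpy version behaves the same way.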


Now let's look at the main bot file, alert_bot.py. Since different tasks may need only alerting or only on-demand values, I don't build one bot that does everything at once; instead, I split the functionality into small separate examples. Here, the main bot file contains only the alerting logic, but you can combine everything you need into a single bot.


So, we import libraries and functions from the file above and set the necessary constants:


import telegram
from telegram.ext import Updater
from functions import get_last_block_once, check_service

# The address of the node whose state I'm tracking (also a public node in this case)
OBJECT_OF_CHECKING = 'https://polygon-mainnet.chainstacklabs.com'

# Threshold (in blocks) for flagging a critical lag
THRESHOLD = 5

# Your Telegram account ID. The easiest way to find it is through the @chatIDrobot bot
USER_ID = 123456789
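Hard-coding the token and IDs is fine for a quick experiment, but on a VPS it is often more convenient to pass them in through environment variables. A minimal sketch, assuming hypothetical variable names `OBJECT_OF_CHECKING`, `THRESHOLD`, `USER_ID` and `BOT_TOKEN` (the last one would replace the `token = "xxx"` line in the initialization code); the defaults mirror the constants above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    object_of_checking: str
    threshold: int
    user_id: int
    token: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the bot settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return Settings(
        # Optional values fall back to the constants used in the article
        object_of_checking=env.get("OBJECT_OF_CHECKING",
                                   "https://polygon-mainnet.chainstacklabs.com"),
        threshold=int(env.get("THRESHOLD", "5")),
        # Required values: a missing variable raises KeyError at startup
        user_id=int(env["USER_ID"]),
        token=env["BOT_TOKEN"],
    )
```

Failing at startup on a missing `USER_ID` or `BOT_TOKEN` is deliberate in this sketch: a bot that silently starts without a recipient would never deliver an alert.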


Next, we describe a function that will be called regularly by the timer:


def check_for_alert(context):

    # Call the main function to check the state of the public network
    max_val, max_support, med_val, med_support = check_service()
    # Query the monitored node for its last block
    last_block = get_last_block_once(OBJECT_OF_CHECKING)

    # Forming a message to be sent to Telegram
    message = ""
    # Information about the state of nodes in the public network (median, maximum, and number of nodes)
    message += f"Public median block number {med_val} (on {med_support} RPCs)\n"
    message += f"Public maximum block number +{max_val - med_val} (on {max_support} RPCs)\n"

    # this variable will store the decision whether to send an alert
    # in case the node is lagging or didn't respond
    to_send = False

    # state check
    if last_block is not None:
        out_text = str(last_block - med_val) if last_block - med_val < 0 else '+' + str(last_block - med_val)
        # Comparison with the threshold
        if abs(last_block - med_val) > THRESHOLD:
            to_send = True
            message += f"The node block number shift ⚠️<b>{out_text}</b>⚠️"
        else:
            message += f"The node block number shift {out_text}"
    else: # Handling the exception if the node didn't respond
        to_send = True
        message += f"The node has ⚠️<b>not responded</b>⚠️"

    # triggering the alert and sending a message to the user
    if to_send:
        context.bot.send_message(chat_id=USER_ID, text=message, parse_mode="HTML")
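One way to make this logic testable without a running bot is to move the message assembly into a pure function and keep only the Telegram call in the job. The sketch below is a refactoring suggestion, not the repository code; `build_alert` is a hypothetical name. It also uses the `+d` format specifier, which prints the sign for positive and negative shifts alike, in place of the manual `out_text` branching (a shift of 0 is rendered as `+0`, as in the original).

```python
from typing import Optional, Tuple


def build_alert(max_val: int, max_support: int, med_val: int, med_support: int,
                last_block: Optional[int], threshold: int) -> Tuple[bool, str]:
    """Return (to_send, message) without touching Telegram or the network."""
    lines = [
        f"Public median block number {med_val} (on {med_support} RPCs)",
        f"Public maximum block number +{max_val - med_val} (on {max_support} RPCs)",
    ]
    # A node that did not answer always triggers an alert
    if last_block is None:
        lines.append("The node has ⚠️<b>not responded</b>⚠️")
        return True, "\n".join(lines)
    shift = last_block - med_val
    # Lagging behind or running ahead by more than the threshold triggers an alert
    if abs(shift) > threshold:
        lines.append(f"The node block number shift ⚠️<b>{shift:+d}</b>⚠️")
        return True, "\n".join(lines)
    lines.append(f"The node block number shift {shift:+d}")
    return False, "\n".join(lines)
```

With it, the body of `check_for_alert` could shrink to `to_send, message = build_alert(*check_service(), get_last_block_once(OBJECT_OF_CHECKING), THRESHOLD)` followed by the same `send_message` call, since `check_service()` returns its values in the order `build_alert` expects.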


All that remains is to initialize the bot and attach to it a recurring job that checks the node's state:


# Your Telegram bot token obtained through BotFather
token = "xxx"

# creating a bot instance
bot = telegram.Bot(token=token)
updater = Updater(token=token, use_context=True)
dispatcher = updater.dispatcher
job_queue = updater.job_queue

# Here, the interval variable (in seconds) sets the frequency
# of launching the check - in this case, every 10 minutes
job_queue.run_repeating(check_for_alert, interval=10.0 * 60.0, first=0.0)

# start polling Telegram and block until the process is stopped (Ctrl+C or SIGTERM)
updater.start_polling()
updater.idle()
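Conceptually, `run_repeating` calls the callback once after `first` seconds and then every `interval` seconds, in a background thread managed by the job queue. If you want to experiment with the check without the job queue (or just see the timing spelled out), here is a rough stdlib equivalent; `run_every` is a hypothetical helper, not part of python-telegram-bot, and it runs in the calling thread until a stop event is set or `max_runs` is reached.

```python
import threading
from typing import Callable, Optional


def run_every(callback: Callable[[], None], interval: float, first: float = 0.0,
              stop_event: Optional[threading.Event] = None,
              max_runs: Optional[int] = None) -> int:
    """Call `callback` after `first` seconds, then every `interval` seconds.

    Blocks the calling thread and returns the number of calls made.
    """
    if stop_event is None:
        stop_event = threading.Event()
    delay = first
    runs = 0
    # Event.wait() returns True as soon as the event is set, which ends the loop
    while not stop_event.wait(delay):
        callback()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        delay = interval
    return runs
```

For example, `run_every(lambda: print("checking"), interval=10.0 * 60.0, max_runs=3)` prints three times with ten-minute pauses between them. In this sketch the pause is counted from the end of each check, so a slow check shifts the following runs slightly.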


The code can then be run on any VPS with:


source ~/up_env/bin/activate
python alert_bot.py


Before that, it is worth configuring a systemd unit file so that the bot runs as a service.


As a result, whenever an alert is triggered, I receive a message in the bot with information about the problem.


In the following article, I will describe how to implement the remaining task:

  • Retrieve charts on request showing how everything has been progressing over the past X hours/days. This will consist of two parts: a cron-triggered script that logs events, and a bot that builds charts from the logs on user request.

The source code of the project is available in the GitHub repository. If you found this tutorial helpful, feel free to give it a star on GitHub; I would appreciate it 🙂


Also published here.