If you are using Databricks serving endpoint, and you wish to export metrics to Datadog, you can face with some challenges in Datadog documentation. This article will help you to solve them, at least it will help you to make a PoC.
Here are list of links which you can use:
Databricks serving endpoint is a service which allows you to deploy your model and serve it. It is a part of Databricks MLflow. You can find more information about it here:
Once you created an endpoint, out of the box you can see some metrics in Databricks UI. Also list of metrics can be received via REST API:
curl --request GET "https://${DATABRICKS_HOST}/api/2.0/serving-endpoints/{endpoint_name}/metrics" --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
But if you want to export them to Datadog, you need to install and configure Datadog agent.
Datadog offers several ways to install agent, they are listed here:
Before agent installation, you need to create a file `datadog.yaml`, which should be located `/etc/datadog-agent/datadog.yaml`, with the following content:
api_key: "__DD_API_KEY__"
site: "datadoghq.com"
hostname: "__YOUR_HOST_NAME__"
Then you need to create `metric-config.yaml` file, which should be located `/etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default`, with the following content:
instances:
- openmetrics_endpoint: https://__DATABRICKS_HOST__/api/2.0/serving-endpoints/__ENDPOINT_NAME__/metrics
metrics:
- cpu_usage_percentage:
name: cpu_usage_percentage
type: gauge
- mem_usage_percentage:
name: mem_usage_percentage
type: gauge
- provisioned_concurrent_requests_total:
name: provisioned_concurrent_requests_total
type: gauge
- request_4xx_count_total:
name: request_4xx_count_total
type: gauge
- request_5xx_count_total:
name: request_5xx_count_total
type: gauge
- request_count_total:
name: request_count_total
type: gauge
- request_latency_ms:
name: request_latency_ms
type: histogram
tag_by_endpoint: false
send_distribution_buckets: true
headers:
Authorization: Bearer __DATABRICKS_TOKEN__
Content-Type: application/openmetrics-text
The full possible options for `conf.yaml.default` can be found in template file which is located in a container under `/opt/datadog-agent/agent/conf.d/openmetrics.d/`
I’ve created the next Dockerfile, because I have some difficulties with that one which is provided in Datadog documentation:
FROM ubuntu:20.04
# Set environment variables
ENV DD_API_KEY=[SPECIFY_DD_API_KEY]
ENV DD_SITE=datadoghq.com
ENV DD_HOSTNAME=datadog-agent-ubuntu
ENV DATABRICKS_HOST=[SPECIFY_DATABRICKS_HOST]
ENV DATABRICKS_TOKEN=[SPECIFY_DATABRICKS_TOKEN]
ENV ENDPOINT_NAME=[SPECIFY_ENDPOINT_NAME]
# Install curl
RUN apt-get update && apt-get install -y curl
# Copy the datadog.yaml file into the Docker image
COPY datadog.yaml /etc/datadog-agent/datadog.yaml
RUN sed -i -e "s@__DD_API_KEY__@$DD_API_KEY@g" /etc/datadog-agent/datadog.yaml
# Install the Datadog agent
RUN bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"
# Copy the metrics-config.yaml file into the Docker image
COPY metrics-config.yaml /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default
RUN sed -i -e "s@__DATABRICKS_HOST__@$DATABRICKS_HOST@g; s@__ENDPOINT_NAME__@$ENDPOINT_NAME@g; s@__DATABRICKS_TOKEN__@$DATABRICKS_TOKEN@g; " /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default
RUN service datadog-agent restart
# Run a process that doesn't exit
CMD ["tail", "-f", "/dev/null"]
# Step 1: Build the Docker image
# Navigate to the directory containing the Dockerfile
cd /path/to/directory/with/dockerfile
# Build the Docker image
# Replace 'my-image' with the name you want to give to your Docker image
docker build -t datadoc-agent-image .
# Step 2: Run a container from the built image
# Replace 'my-container' with the name you want to give to your Docker container
docker run --name datadog-agent -d datadoc-agent-image
# Step 3: Connect to a container and restart agent
docker exec -it datadog-agent bash
service datadog-agent restart
Some usefully commands for DataDog agent:
datadog-agent status
datadog-agent health
service datadog-agent start
service datadog-agent restart
Also appears here.