How to Use Datadog for the APM Metrics Application

by Social Discovery Group, November 7th, 2023

Too Long; Didn't Read

The SDG team shares how they built a stable application monitoring and metrics system and reduced bug detection time on their products with the help of Datadog.

Effective metrics and monitoring play a key role in high-quality development, bug fixes, and handling of user requests and incidents. At Social Discovery Group, we employ a diverse range of tools to evaluate the performance of our products.


In this article, we will share a few techniques that helped us build a stable application monitoring and metrics system and reduce bug detection time on our products with the help of Datadog.


SDG products help more than 250 million people connect and build relationships globally, and our user base is constantly growing. The main factor in this success is our ability to respond promptly to user needs. We have a lot of experience working with various monitoring systems, and Datadog stands out among them for several reasons:


  1. Datadog offers a variety of integration options with systems at different levels. You can explore these capabilities in Datadog's integrations documentation.


  2. Datadog provides comprehensive documentation for configuring integrations with various services.


  3. We have built a strong partnership with Datadog.


Here, we're sharing our experience of setting up and configuring APM metrics in Datadog for applications running in our Kubernetes cluster. The article won't cover the deployment of projects in AKS, the CI/CD process, or other DevOps details.


Instead, we will focus on the finer points of configuring Datadog monitoring for APM metrics.


The technology stack used: Azure Services, Azure Kubernetes Service (AKS), ASP.NET Core 7, Datadog.


For monitoring applications and services, we utilize the Datadog agent deployed within the cluster through a Helm chart with additional parameters from the values.yaml file. In the main agent configuration file, you'll need to enable the DogStatsD module and specify the port (default is 8125).
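
To make the DogStatsD part more concrete, here is a minimal Python sketch of what a client sends to that port. It is an illustration only, not code from our services (they run on ASP.NET Core): the metric name `checkout.requests`, the tag values, and the helper names are hypothetical. The datagram layout `<name>:<value>|<type>|#<tags>` and the UDP port 8125 follow the standard DogStatsD conventions.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build one DogStatsD datagram: <name>:<value>|<type>[|#key:val,key:val]."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are appended as a comma-separated list after "|#".
        datagram += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP. DogStatsD is fire-and-forget: no reply is read."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `format_metric("checkout.requests", 1, "c", {"env": "feature"})` produces the counter line `checkout.requests:1|c|#env:feature`, which `send_metric` would then deliver to the agent address. In a real service you would normally use an official DogStatsD client library rather than raw sockets.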


To accept data sent from an external host, the Datadog agent requires the following settings: dogstatsd_non_local_traffic: true and apm_non_local_traffic: true. Here is the values.yaml file for one of our clusters; some variables are passed in at the deployment stage. The deployment is performed with Azure DevOps.


" 
datadog:
  apiKey: #{apiKey}#
  appKey: #{appKey}#
  clusterName:  #{ClusterName}#
  kubeStateMetricsEnabled: true
  clusterChecks:
    enabled: true
  dogstatsd:
    useSocketVolume: false
    nonLocalTraffic: true
  collectEvents: false
  apm:
    portEnabled: true #important to include
  env:
    - name: "DD_KUBELET_TLS_VERIFY"
      value: "false"
  systemProbe:
    collectDNSStats: false
  orchestratorExplorer:
    enabled: false
clusterAgent:
  image:
    repository: public.ecr.aws/datadog/cluster-agent
    tag: #{tag}#
  admissionController:
    enabled: false
agents:
  image:
    repository: public.ecr.aws/datadog/agent
    tag: #{tag}#
    doNotCheckTag:  true
clusterChecksRunner:
  image:
    repository: public.ecr.aws/datadog/agent
    tag: #{tag}#
"
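
The `#{...}#` placeholders in this file are the variables filled in at the deployment stage, before Helm reads the file. The article does not name the pipeline task that performs the substitution, so the Python sketch below only illustrates the idea under a simple assumption: every `#{name}#` token is replaced by the value of a variable with the same name. The function name and the sample value are hypothetical.

```python
import re

# Matches tokens such as #{apiKey}# and captures the variable name.
TOKEN = re.compile(r"#\{(\w+)\}#")


def replace_tokens(text, variables):
    """Replace every #{name}# token in text with variables[name]."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly instead of deploying a file with an unresolved token.
            raise KeyError(f"no value supplied for token {name!r}")
        return str(variables[name])

    return TOKEN.sub(lookup, text)
```

Calling `replace_tokens("clusterName: #{ClusterName}#", {"ClusterName": "aks-feature"})` returns `clusterName: aks-feature`. Raising an error on an unknown token is a design choice for this sketch: an unresolved `#{...}#` would otherwise be read by YAML as a comment and silently leave the key empty.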


Next, specify the agent's address in the application settings so that metrics are transmitted to Datadog. The dashboards for app monitoring are based on metrics we use internally, so we had to build them ourselves.


To set up monitoring of ASP.NET services, we followed the official Datadog documentation.


Since the agent was already configured, one option was to add the necessary lines to the image build and pass the variables DD_ENV, DD_SERVICE, and DD_AGENT_HOST from the CI/CD system to specify the environment, service name, and agent address, respectively. We also needed to add the following to the Dockerfiles for the services:


"
# Download the latest tracer release (runs in the "build" stage)
RUN TRACER_VERSION=$(curl -s https://api.github.com/repos/DataDog/dd-trace-dotnet/releases/latest | grep tag_name | cut -d '"' -f 4 | cut -c2-) \
    && curl -Lo /tmp/datadog-dotnet-apm.deb https://github.com/DataDog/dd-trace-dotnet/releases/download/v${TRACER_VERSION}/datadog-dotnet-apm_${TRACER_VERSION}_amd64.deb

# Copy the tracer from build target
COPY --from=build /tmp/datadog-dotnet-apm.deb /tmp/datadog-dotnet-apm.deb
# Install the tracer
RUN mkdir -p /opt/datadog \
    && mkdir -p /var/log/datadog \
    && dpkg -i /tmp/datadog-dotnet-apm.deb \
    && rm /tmp/datadog-dotnet-apm.deb
 
# Enable the tracer
ENV CORECLR_ENABLE_PROFILING=1
ENV CORECLR_PROFILER={846F5F1C-F9AE-4B07-969E-05C26BC060D8}
ENV CORECLR_PROFILER_PATH=/opt/datadog/Datadog.Trace.ClrProfiler.Native.so
ENV DD_DOTNET_TRACER_HOME=/opt/datadog
ENV DD_INTEGRATIONS=/opt/datadog/integrations.json

"
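
Because this approach depends on three variables arriving from CI/CD, a missing one is easy to overlook until traces fail to appear. The following Python sketch shows one way such a check could look as a pipeline step. It is an assumption on our part rather than something from our actual pipeline, and the function name is hypothetical; only the variable names come from the setup described above.

```python
import os

# Variables this method expects the CI/CD system to pass to the container.
REQUIRED_VARS = ("DD_ENV", "DD_SERVICE", "DD_AGENT_HOST")


def missing_datadog_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A pipeline step could call `missing_datadog_vars()` and fail the build when the returned list is not empty.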


This method works, but it didn't seem optimal because the tracer is baked into every service image. We decided to take it a step further and added the following to the deployments of our services:

  "
metadata.labels:
    tags.datadoghq.com/env: feature
    tags.datadoghq.com/service: service_name
    tags.datadoghq.com/version: '1552'

spec.template.metadata.labels:
    admission.datadoghq.com/config.mode: service
    admission.datadoghq.com/enabled: 'true'
    tags.datadoghq.com/env: feature
    tags.datadoghq.com/service: service_name
    tags.datadoghq.com/version: '1111'

spec.template.metadata.annotations:
    admission.datadoghq.com/dotnet-lib.version: v2.38.0

spec.template.spec.containers.name.env:
    - name: DD_TRACE_AGENT_URL
      value: datadog-agent.monitoring
    - name: DD_TRACE_STARTUP_LOGS
      value: 'true'
    - name: DD_LOGS_INJECTION
      value: 'true'
    - name: DD_RUNTIME_METRICS_ENABLED
      value: 'true'
    - name: DD_PROFILING_ENABLED
      value: 'true'
    - name: DD_APPSEC_ENABLED
      value: 'true'
"
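
The listing above uses dotted paths such as `spec.template.metadata.labels` as shorthand for nested fields of the Deployment manifest. The Python sketch below expands that shorthand into the nested structure and generates the labels and annotation from a single set of values. It rests on two assumptions that are ours: the function name is hypothetical, and the same version string is used at both levels (the listing shows two different values, which we leave as published). The container environment variables are omitted here.

```python
import json


def datadog_deployment_metadata(env, service, version, dotnet_lib_version):
    """Build the label/annotation part of a Deployment as a nested dict."""
    tags = {
        "tags.datadoghq.com/env": env,
        "tags.datadoghq.com/service": service,
        "tags.datadoghq.com/version": str(version),
    }
    return {
        # metadata.labels
        "metadata": {"labels": dict(tags)},
        "spec": {
            "template": {
                "metadata": {
                    # spec.template.metadata.labels
                    "labels": {
                        **tags,
                        "admission.datadoghq.com/enabled": "true",
                        "admission.datadoghq.com/config.mode": "service",
                    },
                    # spec.template.metadata.annotations
                    "annotations": {
                        "admission.datadoghq.com/dotnet-lib.version": dotnet_lib_version,
                    },
                }
            }
        },
    }


def to_json(fragment):
    """Serialize the fragment, for example to review it or merge it into a manifest."""
    return json.dumps(fragment, indent=2, sort_keys=True)
```

Generating both label sets from one place keeps the environment, service, and version tags identical on the Deployment and on its pods, which is easy to get wrong when the values are typed by hand.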


Here is what we changed in the agent configuration:


"
datadog:
  apm:
    socketEnabled: true
    portEnabled: true
    enabled: true
clusterAgent:
  admissionController:
    enabled: true
    mutateUnlabelled: false
providers:
  aks:
    enabled: true
"


After all these steps, highly detailed metrics for all services started flowing into Datadog under the APM -> Services section, and the graphs were displayed automatically.


Datadog Dashboard


We had to tinker with the annotations settings for the second method; not everything started working smoothly right away.
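
When traces do not show up, one basic thing to rule out is network reachability of the agent from the service's namespace. The Python sketch below is a generic TCP check, not a Datadog tool, and the function name is hypothetical. It assumes the agent listens for traces on port 8126, the default APM port that `portEnabled: true` exposes.

```python
import socket


def agent_port_is_open(host, port=8126, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from a pod in the cluster, `agent_port_is_open("datadog-agent.monitoring")` tells you whether the address used in the deployment settings is reachable at all. It says nothing about whether the tracer itself was injected correctly.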


Regarding the notification system, it's worth mentioning that it is user-friendly and intuitive in Datadog. Notifications are created in the "Monitors -> Manage Monitors" section.


Notifications at Datadog


The improvements we described above yielded several valuable outcomes. We now have a deeper understanding of how our system operates and adapts to various changes.


Additionally, we've established a stable application monitoring and metrics system that operates independently of service builds, helping to reduce bug detection times.


This, in turn, has allowed us to optimize our services and improve development speed and overall system quality.