Introduction

OpenTelemetry Tracing

Traces track the progression of a single request, called a trace, as it is handled by the services that make up an application. The request may be initiated by a user or an application. Distributed tracing is a form of tracing that traverses process, network and security boundaries.

Each unit of work in a trace is called a span; a trace is a tree of spans. Spans are objects that represent the work being done by individual services or components involved in a request as it flows through a system. A span contains a span context, a set of globally unique identifiers that represent the unique request that each span is a part of. A span provides Request, Error and Duration (RED) metrics that can be used to debug availability as well as performance issues.

A trace contains a single root span which encapsulates the end-to-end latency for the entire request. You can think of this as a single logical operation, such as clicking a button in a web application to add a product to a shopping cart. The root span would measure the time from an end user clicking that button to the operation being completed or failing (so, the item is added to the cart or some error occurs) and the result being displayed to the user. A trace is comprised of the single root span and any number of child spans, which represent operations taking place as part of the request. Each span contains metadata about the operation, such as its name, start and end timestamps, attributes, events, and status.

To create and manage spans in OpenTelemetry, the OpenTelemetry API provides the tracer interface. This object is responsible for tracking the active span in your process, and allows you to access the current span in order to perform operations on it, such as adding attributes and events, and finishing it when the work it tracks is complete. One or more tracer objects can be created in a process through the tracer provider, a factory interface that allows multiple tracers to be instantiated in a single process with different options.

References: https://opentelemetry.io/docs/concepts/data-sources/
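As an illustration, here is a minimal sketch of obtaining a tracer and creating a span with the opentelemetry-python API; the span name, attribute and event are made up for this example, and the exact API may differ between SDK versions:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# the tracer provider is the factory from which tracers are obtained
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

tracer = trace.get_tracer(__name__)

# start a span, make it the active span, and end it when the block exits
with tracer.start_as_current_span("add_to_cart") as span:
    span.set_attribute("product.id", "espresso-beans")  # illustrative attribute
    span.add_event("cart updated")                      # illustrative event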

OpenTelemetry

Libraries

How to set up OT libraries to create signals:

Logs

OpenTelemetry does not currently provide a dedicated logging API or libraries for creating logs. There are already mature libraries which do that, and support for those projects is planned instead. Most likely this will take the form of an appender which correlates logs with traces.

How to create simple logs

Standard libraries for programming languages provide simple logging capabilities out of the box. You can use them to create logs. If your application follows the Twelve-Factor App standard, "it should write its logs, unbuffered, to stdout".
Later, the execution environment, such as systemd, Docker or Kubernetes, takes care of routing this event stream to a local file or sending it directly to other systems like Sumo Logic.
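For example, in Python the standard logging module can write logs to stdout; the configuration and message below are illustrative:

import logging
import sys

# route all log records to stdout, as the Twelve-Factor App methodology recommends
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

logging.getLogger(__name__).info("order received: 1x espresso")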

If your application logs are written to a local file, they can be collected by the OpenTelemetry Collector for further processing.
You can find more on that topic in the OpenTelemetry Collector / Logs section.

Example

Let's assume that your application writes logs to stdout and runs inside a Docker container.

As a result, by default on a Linux machine those logs are written to the /var/lib/docker/containers/<container-id>/<container-id>-json.log file.

The log file from the Docker container can be scraped by the OpenTelemetry Collector with the help of the filelogreceiver. This receiver tails the log file and, for each line, emits a new record for further processing inside the OpenTelemetry Collector. It also attaches the file name as metadata to each record.
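A minimal sketch of such a receiver configuration; the path matches the default Docker location mentioned above and should be adjusted to your environment:

receivers:
  filelog:
    include:
      - /var/lib/docker/containers/*/*-json.log
    start_at: beginning
    operators:
      # parse each line of Docker's json-file log format
      - type: json_parser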

Correlating logs and traces

When you use an OpenTelemetry appender for your logging library, you can correlate logs with traces.
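Until such an appender is available for your library, the correlation can be sketched by hand: read the active span context and stamp its identifiers onto each log record. The filter below is an illustrative, hand-rolled example using the opentelemetry-python API, not an official appender:

import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    # attach the current trace and span ids to every log record
    def filter(self, record):
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x")
        record.span_id = format(ctx.span_id, "016x")
        return True

handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter("%(trace_id)s %(span_id)s %(message)s"))
logging.getLogger().addHandler(handler)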

Metrics

A metric is a measurement about a service, captured at runtime. Logically, the moment of capturing one of these measurements is known as a metric event, which consists not only of the measurement itself but also the time it was captured and associated metadata. Application and request metrics are important indicators of availability and performance. Custom metrics can provide insights into how availability indicators impact user experience or the business. Collected data can be used to alert on an outage or trigger scheduling decisions to scale a deployment up automatically upon high demand. Refer to: https://opentelemetry.io/docs/concepts/data-sources/

Statsd Metrics for Python App:

StatsD is originally a simple daemon to aggregate and summarize application metrics. With StatsD, developers instrument applications using language-specific client libraries. These libraries then communicate with the StatsD daemon using its dead-simple protocol, and the daemon generates aggregate metrics and relays them to virtually any graphing or monitoring backend.

In the coffee bar app we have used the StatsD library to collect the metrics. In the Python app we have imported the StatsD library directly:

import statsd

StatsD provides different types of metrics (a usage sketch follows the list below):

  • Counters: they are treated as a count of a type of event per second.
  • Timers: timers are meant to track how long something took.
  • Gauges: gauges are a constant data type. They are not subject to averaging, and they don’t change unless you change them.
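As an illustration, here is a minimal sketch using the statsd Python client; the daemon address and metric names are assumptions for this example:

import statsd

# connect to the StatsD daemon; host and port are illustrative
client = statsd.StatsClient("localhost", 8125)

client.incr("coffee.orders")            # counter: count one order event
client.timing("coffee.brew_time", 320)  # timer: the operation took 320 ms
client.gauge("coffee.beans_left", 42)   # gauge: current stock level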

OpenTelemetry Manual Instrumentation for Python App:

The libraries used for manual instrumentation can be imported directly:

from opentelemetry import metrics
from opentelemetry.sdk.metrics import Counter, MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricsExporter
from opentelemetry.sdk.metrics.export.controller import PushController

OpenTelemetry defines three metric instruments today:

  • counter: a value that is summed over time – you can think of this like an odometer on a car; it only ever goes up.
  • measure: a value that is aggregated over time. This is more akin to the trip odometer on a car; it represents a value over some defined range.
  • observer: captures a current set of values at a particular point in time, like a fuel gauge in a vehicle.

Auto-Instrumentation Examples

Python app

The Python apps that are part of the framework are instrumented with OpenTelemetry-Python.

For each of the applications the-coffee-lover, the-coffee-bar, the-coffee-machine and the-cashdesk we execute opentelemetry-auto-instrumentation python app_name -h, e.g. opentelemetry-auto-instrumentation python the-coffee-bar -h. Configuration can be provided to the application either by a config file (an example can be found in src/config/config.yaml) or by application arguments.

More details in this readme

Ruby APP

The Ruby applications that are part of the framework are instrumented with OpenTelemetry-Ruby.

Each of the applications machine-svc, water-svc and coffee-svc needs additional configuration; in this case everything is based on environment variables.

For more details

Dotnet app

The ASP.NET Core application is auto-instrumented by OpenTelemetry-Dotnet.

In this situation the application requires some environment variables to be set (an example follows the list below).

Common:

  • SERVICE_NAME - defines the name of the service (calculator-svc by default)
  • EXPORTER - defines the span exporter (otlp, zipkin, jaeger, console - otlp by default)

Specific for OTLP Exporter:

  • OTEL_EXPORTER_OTLP_ENDPOINT - defines the OTLP gRPC Collector Endpoint (e.g. http://localhost:4317)

Specific for Zipkin Exporter:

  • OTEL_EXPORTER_ZIPKIN_ENDPOINT - defines the Zipkin HTTP Collector Endpoint (e.g. localhost:9411/api/v2/spans)
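For example, to send spans to an OTLP endpoint, the variables could be set as follows (the values are illustrative):

SERVICE_NAME=calculator-svc
EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317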

For more details

OpenTelemetry Collector

How to set up the OpenTelemetry Collector to process signals:

Logs

Metrics

Manual Instrumentation

The Python apps that are part of the framework are manually instrumented; the metrics are emitted using StatsD. To receive the StatsD metrics, configure the statsd receiver in otelcol.yaml. The otelcol.yaml file can be found at deployments/docker-compose.

Sample Configuration for Python App:

statsd:
  endpoint: "0.0.0.0:8125" # default
  aggregation_interval: 60s # default
  enable_metric_type: true # default
  is_monotonic_counter: true # default
  timer_histogram_mapping:
    - statsd_type: "histogram"
      observer_type: "gauge"
    - statsd_type: "timing"
      observer_type: "gauge"
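For the metrics to flow, the receiver also has to be referenced in a metrics pipeline. A minimal sketch; the logging exporter here is only a placeholder for whatever exporter your deployment actually uses:

service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [logging]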

Samples of metrics generated in the Sumo Logic portal:

{
_collector="tcb-otc-distro",
_collectorId="000000000*****",
_source="tcb-otc-distro",
_sourceCategory="tcb-otc-distro",
_sourceHost="otelcol",
_sourceId="0000000000000000",
_sourceName="OTC Metric Input",
metric_type="counter"
}

Manual Instrumentation for Python App:

The Coffee Bar app can be manually instrumented using OpenTelemetry.

# a stateful batcher keeps accumulating metric values across collection intervals
batcher_mode = "stateful"
metrics.set_meter_provider(MeterProvider())
meter = metrics.get_meter(__name__, batcher_mode == "stateful")

# push metrics to the console exporter every 5 seconds
exporter = ConsoleMetricsExporter()
controller = PushController(meter, exporter, 5)

staging_label_set = meter.get_label_set({"environment": "staging"})

requests_counter = meter.create_metric(
    name="requests",
    description="number of requests",
    unit="1",
    value_type=int,
    metric_type=Counter,
    label_keys=("environment",),
)

# record 25 requests against the staging label set
requests_counter.add(25, staging_label_set)

Samples of metrics generated:

ConsoleMetricsExporter(data="Counter(name="requests", description="number of requests")", label_set="(('environment', 'staging'),)", value=25)

For more details, refer to: https://opentelemetry-python-yusuket.readthedocs.io/en/latest/getting-started.html#adding-metrics

Traces

Pipelines

Receivers

Processors

Exporters

The Coffee Bar

Contributors

Created with ❤️ by Sumo Logic and contributors: