This blog post will introduce you to Fluent Bit 3.0 and some best practices for using it in your observability pipeline.
The recent release of Fluent Bit 3.0 offers some new opportunities for Fluent Bit best practices. Let’s take a look at Fluent Bit and what’s new for v3.
What is Fluent Bit?
Fluent Bit is a telemetry agent designed to receive data (logs, traces, and metrics), process or modify it, and export it to a destination. Fluent Bit can serve as a proxy before you send data to Dynatrace or a similar backend. However, you can also use Fluent Bit as a processor because you can perform various actions on the data. Fluent Bit was designed to help you adjust your data and add the proper context, which can be helpful once the data reaches the observability backend.
What’s the difference between Fluent Bit and Fluentd?
Fluent Bit and Fluentd were created for the same purpose: collecting and processing logs, traces, and metrics. However, Fluent Bit was designed to be lightweight, multi-threaded, and run on edge devices.
Fluent Bit was created before Kubernetes existed, when the internet of things (IoT) was a new buzzword. The market prediction didn't quite pan out, however; the cloud became more successful than IoT.
Learn more about configuring Fluent Bit to collect logs for your Kubernetes cluster and collecting logs with Fluentd.
What’s new in Fluent Bit 3.0
Fluent Bit recently released version 3.0, which offers a range of updates:
- HTTP/2 support: Fluent Bit now supports HTTP/2, enabling efficient data transmission with Gzip compression for OpenTelemetry data and enhancing pipeline performance.
- New processors: The new Metric Selector and Content Modifier processors enable selective data processing and metadata adjustment, improving data relevance and storage efficiency.
- SQL processor: Integration of SQL processing capabilities for precise data selection and routing, allowing users to filter logs and traces based on specific criteria for optimized backend storage.
- Enhanced filtering: Streamlined data collection with selective metrics inclusion/exclusion, efficient metadata adjustment, and precise data routing, ensuring only relevant data is transmitted to the observability backend.
- Scalability and efficiency: Improved scalability, performance enhancements, optimized compression, and efficient data routing contribute to a more streamlined and cost-effective observability solution with Fluent Bit 3.0.
In the following paragraphs, we will introduce a few best practices, particularly when using Fluent Bit 3.0. If you’d like to learn more about the updates, don’t miss Henrik’s video: Fluent Bit 3.0 Observability: Elevating Logs, Metrics, and Traces!
Fluent Bit best practices for v3.0
Here are some best practices for using Fluent Bit 3.0, as recommended by our internal expert, Henrik Rexed.
Understand what you want to accomplish
Before using Fluent Bit, you should clearly understand what you want to accomplish with your data. If you have logs, Fluent Bit is the right choice. For metrics, Fluent Bit has some limitations. And for traces, sampling won’t be possible. Be sure you can achieve what you want with Fluent Bit; otherwise, you might need to look into other options. Ask yourself, how much data should Fluent Bit process? Would it be better to send the data somewhere else, like Dynatrace, and let Grail process it?
Use the Expect plugin
Start small and test that you’re achieving what you expect by using the Expect plugin, a gatekeeper that helps your pipeline be more efficient by blocking content that doesn’t follow the rules you put in place.
Before reading real logs, you can either create a file with log data designed for testing or use the dummy input plugin, which sends dummy data into the pipeline (see the example below).
pipeline:
  inputs:
    - name: dummy
      dummy: '{"message": "custom dummy"}'
  outputs:
    - name: stdout
      match: '*'
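Once dummy data is flowing, you can add the expect filter itself to assert that each record has the shape you expect. Here is a minimal sketch that builds on the dummy record above (the key name message comes from that record; action: warn only logs a warning, while action: exit stops Fluent Bit on a mismatch):
pipeline:
  inputs:
    - name: dummy
      dummy: '{"message": "custom dummy"}'
  filters:
    - name: expect
      match: '*'
      key_exists: message            # each record must contain a "message" key
      key_val_is_not_null: message   # and its value must not be null
      action: warn                   # log a warning on mismatch (use exit to stop the agent)
  outputs:
    - name: stdout
      match: '*'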
Apply modifications directly at the source
Attaching processors to the input or output sections allows you to apply modifications directly at the source, which is more efficient. These localized modifications happen before logs go through the parser (see the sketch after this list), which means you can:
- Make global rules
- Be more tactical
- Save CPU time
- Have a more robust and reliable pipeline
Processing at the source relieves pressure on the Fluent Bit agent.
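As an illustration, here is a minimal sketch of a processor attached directly to an input, assuming the YAML configuration format; the path, tag, and attribute values are placeholders:
pipeline:
  inputs:
    - name: tail
      path: /var/log/app/*.log
      tag: app.logs
      processors:
        logs:
          - name: content_modifier
            action: upsert
            key: "service.name"       # add context at the source, before any filters run
            value: "payment-service"
  outputs:
    - name: stdout
      match: '*'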
Other types of modifications
Traces
It might be a rare use case, but if you want to obfuscate some traces, you can use the content_modifier. The content modifier can only be used for traces and logs, not metrics.
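A rough sketch of what that could look like, assuming an OpenTelemetry input and the span_attributes context; the attribute key user.email is just an example:
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318
      processors:
        traces:
          - name: content_modifier
            context: span_attributes
            action: hash              # replace the attribute value with a hash to obfuscate it
            key: "user.email"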
Metrics
Use the labels processor to add or remove labels on metrics, which helps reduce cardinality and metric costs. The only thing missing is converting metrics between types (cumulative and delta).
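A rough sketch of the labels processor attached to a metrics input; the label names are illustrative, and the exact option names should be checked against the processor documentation for your release:
pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: internal.metrics
      processors:
        metrics:
          - name: labels
            delete: hostname                 # drop a high-cardinality label
            insert: environment production   # add a static label to every metric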
Logs
The SQL processor makes your pipeline easier to work with, since you can use SELECT statements to filter precisely which logs you want to change.
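For instance, here is a minimal sketch that keeps only error-level records; the field names and values are made up for the example:
pipeline:
  inputs:
    - name: dummy
      dummy: '{"message": "payment failed", "level": "error"}'
      processors:
        logs:
          - name: sql
            query: "SELECT message, level FROM STREAM WHERE level = 'error';"
  outputs:
    - name: stdout
      match: '*'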
Understand the concept of tags
Your experience with Fluent Bit will improve if you learn how tags function, especially if you’ve never used Fluent before. Tags make routing possible and are set in the configuration of the input definitions. They can be used to apply specific operations only to specific data.
For example, if you've ingested OpenTelemetry metrics, logs, and traces, you might have tags such as:
- otlp.metrics
- otlp.logs
Using the otlp.metrics tag, you can target just the metrics for certain operations.
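To make this concrete, here is a small sketch of tag-based routing with two inputs and two outputs; the paths and tags are placeholders:
pipeline:
  inputs:
    - name: tail
      path: /var/log/app/*.log
      tag: app.logs                  # everything this input reads carries the tag app.logs
    - name: fluentbit_metrics
      tag: internal.metrics
  outputs:
    - name: stdout
      match: 'app.*'                 # only records tagged app.* are printed
    - name: prometheus_exporter
      match: 'internal.*'            # only the internal metrics reach the exporter
      host: 0.0.0.0
      port: 2021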
There's also a filter plugin called rewrite_tag that, as the name suggests, you can use to re-emit a record under a new tag.
Use the service section to check your pipeline’s health
In the service section, you can enable two crucial things: Health_Check and http_server. These help you get more insight into the health of your pipeline in Kubernetes. See the example below.
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fluent-bit
    meta.helm.sh/release-namespace: fluentbit
  labels:
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: fluent-bit
    app.kubernetes.io/version: 2.2.1
    helm.sh/chart: fluent-bit-0.42.0
  name: fluent-bit
  namespace: fluentbit
data:
  custom_parsers.conf: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
  fluent-bit.yaml: |
    service:
      http_server: "on"
      Health_Check: "on"
    pipeline:
      inputs:
        - name: tail
          path: /var/log/containers/*.log
          multiline.parser: docker, cri
          tag: kube.*
          mem_buf_limit: 5MB
          skip_long_lines: On
          processors:
            logs:
              - name: content_modifier
                action: insert
                key: k8s.cluster.name
                value: ${CLUSTERNAME}
              - name: content_modifier
                action: insert
                key: dt.kubernetes.cluster.id
                value: ${CLUSTER_ID}
              - name: content_modifier
                context: attributes
                action: upsert
                key: "agent"
                value: "fluentbitv3"
        - name: fluentbit_metrics
          tag: metric.fluent
          scrape_interval: 2
      filters:
        - name: kubernetes
          match: kube.*
          merge_log: on
          keep_log: off
          k8s-logging.parser: on
          k8s-logging.exclude: on
        - name: nest
          match: kube.*
          operation: lift
          nested_under: kubernetes
          add_prefix: kubernetes_
        - name: nest
          match: kube.*
          operation: lift
          nested_under: kubernetes_labels
        - name: modify
          match: kube.*
          rename:
            - log content
            - kubernetes_pod_name k8s.pod.name
            - kubernetes_namespace_name k8s.namespace.name
            - kubernetes_container_name k8s.container.name
            - kubernetes_pod_id k8s.pod.uid
          remove:
            - kubernetes_container_image
            - kubernetes_docker_id
            - kubernetes_annotations
            - kubernetes_host
            - time
            - kubernetes_container_hash
        - name: throttle
          match: "*"
          rate: 800
          window: 3
          print_status: true
          interval: 30s
      outputs:
        - name: opentelemetry
          host: ${DT_ENDPOINT_HOST}
          port: 443
          match: "kube.*"
          metrics_uri: /api/v2/otlp/v1/metrics
          traces_uri: /api/v2/otlp/v1/traces
          logs_uri: /api/v2/otlp/v1/logs
          log_response_payload: true
          tls: On
          tls.verify: Off
          header:
            - Authorization Api-Token ${DT_API_TOKEN}
            - Content-type application/x-protobuf
        - name: prometheus_exporter
          match: metric.*
          host: 0.0.0.0
          port: 2021
Enabling the http_server also exposes many API endpoints.
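For example, once the HTTP server is on (it listens on port 2020 by default), you can query a few of the monitoring endpoints directly; the host and port depend on your deployment:
curl http://127.0.0.1:2020/api/v1/health    # overall health status (requires Health_Check on)
curl http://127.0.0.1:2020/api/v1/uptime    # how long the agent has been running
curl http://127.0.0.1:2020/api/v1/metrics   # Fluent Bit's internal metrics in JSON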
Utilize Hot Reload to save time with changes
By default, any change you make requires an agent restart, which can be time-consuming. If you enable Hot Reload, you can instead send an HTTP request that tells Fluent Bit to reload and re-read the pipeline files, which makes the whole process smoother. It's particularly useful in a bare-metal environment.
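Hot Reload is enabled in the service section alongside the HTTP server, and the reload is then triggered over HTTP (a SIGHUP signal also works). A minimal sketch:
service:
  http_server: "on"
  hot_reload: "on"
After editing the pipeline files, trigger the reload with: curl -X POST http://127.0.0.1:2020/api/v2/reload (the default HTTP port is 2020).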
Expose data for Prometheus with Fluent Bit Metrics
If you want to expose data from a Prometheus standpoint, use the Fluent Bit Metrics input plugin together with the prometheus_exporter output, which turns Fluent Bit into a Prometheus endpoint. The plugin is valuable because it exposes Fluent Bit's own internal metrics, such as records processed, errors, and retries.
service:
  flush: 1
  log_level: info
pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: internal_metrics
      scrape_interval: 2
  outputs:
    - name: prometheus_exporter
      match: internal_metrics
      host: 0.0.0.0
      port: 2021
Mind your memory and resources
When you modify your data and reject what you don't need, retry logic keeps the rejected data in memory. If too many exceptions pile up, the Fluent Bit agent becomes unstable and can crash. So it's important to regularly look at Fluent Bit's stdout and remove the noise; otherwise, the logs may become unreliable.
Every input plugin for logs and traces uses a (memory) buffer. Data can be ingested faster than it can be sent to its destination, which creates backpressure and leads to high memory consumption. Plan ahead to avoid out-of-memory situations: the default storage type is memory, and you can exceed its buffer limit if you have a lot of data. It's crucial to fine-tune this memory buffer limit to avoid instability.
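One common way to protect the agent is to cap the per-input memory buffer and, if needed, spill chunks to disk instead of keeping everything in memory. A minimal sketch (the path and limit are illustrative and need tuning for your workload):
service:
  storage.path: /var/log/flb-storage/   # required if any input uses filesystem buffering
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: kube.*
      mem_buf_limit: 5MB           # with the default memory storage, pause this input at ~5MB of buffered data
      # storage.type: filesystem   # alternative: buffer chunks on disk to survive bursts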
On a similar note, pay attention to the resource usage of the agent (CPU, behavior, throttling) and the pipeline metrics. Create a dashboard to see those metrics, as you can see in the example below from Dynatrace:
An overview like this can help you see if you’re losing data. For example, “I’m receiving 100 metrics, but only 30 are being outputted, which means I am losing 70% of my data.” Is this intentional? Are there a lot of retries? Retries could be the cause of this issue.
In the dashboard, the Fluent Bit logs are very telling. The logs are rich in information and can help you identify issues that might cause Fluent Bit to be unstable. For example, you can see if there is a major difference between the data being sent into Fluent Bit and how much arrives in Dynatrace.
Conclusion
In this blog post, we delved into the latest updates in Fluent Bit 3.0 and outlined some best practices for leveraging its capabilities effectively within your observability pipeline.
Learn more about Fluent Bit 3.0 in this YouTube video created by Henrik: Fluent Bit 3.0 Observability: Elevating Logs, Metrics, and Traces!