Logagent: The Swiss Army Knife for Log Processing?

Guest post by Stefan Thies, DevOps Evangelist at Sematext Group Inc.

Dealing with log files or extracting data from various data sources is a daily task in IT administration. Users ask for statistics or details about their technical or business operations, and the relevant data is often spread across log files, databases or message queues. Turning data from these sources into actionable knowledge can be a challenging task, and for automating such ETL jobs we often see a mix of scripting languages in use. The Elastic Stack is a popular toolset for analyzing structured and unstructured data. While Logstash is a very feature-rich tool, it uses a lot of system resources. Filebeat is lightweight but not very flexible when it comes to data transformations, so people combine Filebeat with Logstash or Elasticsearch ingest pipelines to work around these limitations. Is there a simpler alternative? Logagent is one of the open source Logstash alternatives.

Logagent is not limited to use with Elasticsearch. It has output plugins for files, MQTT, InfluxDB, Apache Kafka, ZeroMQ, Graylog, Sematext Cloud, and more.

 

What is Logagent?

Logagent is a modern, open-source, lightweight log shipper with out-of-the-box, extensible log parsing, on-disk buffering, secure transport and bulk indexing to Elasticsearch, InfluxDB and Sematext Cloud. Its low memory footprint and low CPU overhead make Logagent suitable for deployment on edge nodes and devices, while its ability to parse and structure logs makes it a great Logstash alternative for centralized buffering and processing. A rich set of plugins provides inputs and outputs for various SQL, NoSQL and time series databases.

 

Introduction to Logagent

Logagent can be used either as a command line tool or as a service.

The Logagent setup requires the Node.js runtime.
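Installing it is typically a one-liner via npm; the package name below is the official Sematext package at the time of writing, so double-check against the Logagent docs if it has moved:

# install the Logagent command line tool globally (requires Node.js and npm)
sudo npm install -g @sematext/logagent

# quick smoke test: pipe a line through Logagent and print it as YAML
echo 'hello Logagent' | logagent --yaml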

A typical task is to parse log files and structure the text lines into multiple fields, which can later be used to build analytics dashboards in Kibana, InfluxDB or Sematext Cloud.

Logagent is able to parse many log formats such as web server, database, message queue or search engine logs. The log parsing patterns are defined in a file called patterns.yml, and the configuration accepts file references to additional pattern files. The “hot reload” of patterns makes changes in production environments painless.

The default pattern definition file comes with patterns for:

  • Web server (Nginx, Apache Httpd)
  • MongoDB
  • MySQL
  • Redis
  • Elasticsearch
  • Zookeeper
  • Cassandra
  • Postgres
  • RabbitMQ
  • Kafka
  • HBase HDFS Data Node
  • HBase Region Server
  • Hadoop YARN Node Manager
  • Apache Solr
  • various other formats, e.g. Linux/Mac OS X system log files

If an application logs in JSON, Logagent recognises it out of the box. By default, Logagent operates in multi-line mode, which means indented log entries such as stack traces are handled out of the box. Application-specific multi-line parser rules can be specified in the pattern definitions.
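To give an idea of what a custom pattern looks like, here is a minimal sketch of a patterns.yml entry for a hypothetical application log, following the same structure (sourceName, match, regex, fields) used in the Prometheus example later in this post. The blockStart key shown for multi-line handling is an assumption; check the Logagent pattern documentation for the exact option names in your version.

patterns:
  - # hypothetical app log: "2019-01-01T10:00:00 ERROR something broke"
    sourceName: !!js/regexp /myapp\.log/
    # assumption: per-source regex marking the first line of a multi-line entry
    blockStart: !!js/regexp /^\d{4}-\d{2}-\d{2}/
    match:
      - type: myapp
        regex: !!js/regexp /^(\S+)\s+(\w+)\s+(.*)/
        fields:
          - timestamp
          - severity
          - message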

 

Parse and output web server logs

Let us start with some handy commands to demonstrate Logagent usage. Parsing a web server log on the command line and writing the result as JSON to a new file is as easy as:

cat access_log | logagent -n httpd > access_log.json

The “-n httpd” argument references the web server log pattern pre-defined in patterns.yml to structure web server logs. Logagent can also detect the log structure automatically (no need for -n) when you set the environment variable SCAN_ALL_PATTERNS=true. This option costs CPU time, because Logagent iterates over all pattern definitions until it finds a matching one, which is why SCAN_ALL_PATTERNS=false is the default.
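For example, automatic detection could be triggered like this (mixed.log is just a placeholder for a file whose format you don't know up front):

SCAN_ALL_PATTERNS=true cat mixed.log | logagent --yaml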

To see web server logs in YAML format on the console, enriched with GeoIP information, just type:

tail -f access_log | logagent -n httpd --yaml --geoipEnabled=true

Logagent offers various ways to mask fields, such as truncating IP addresses, hashing or encryption. Anyone shipping logs and wanting to comply with the GDPR will be interested in masking sensitive data in web server logs.
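As a rough sketch of how such masking can look: the default patterns.yml documents an autohash option that replaces the values of fields whose names match a regular expression with a hash before the logs leave the machine. Treat the snippet below as an illustration of the idea rather than a reference, and verify the option name against the Logagent documentation for your version:

# patterns.yml (excerpt), assuming the autohash option is available in your version
# hash the values of fields whose names match this regex before shipping
autohash: !!js/regexp /user|client_ip|password|email|credit_card/i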

 

Ship log files to Elasticsearch

Seeing structured logs on the console is nice, but typically we want to store logs in Elasticsearch for analysis with Kibana. We just need to tell Logagent where Elasticsearch is running and which index the data should be written to:

tail -f access_log | logagent -n httpd --yaml -e https://localhost:9200 -i my_log_index

Nice and easy for one file, right? But if we want to read all logs from /var/log and ingest them into Elasticsearch, we can use glob patterns to discover new log files in a log directory:

logagent -g '/var/log/**/*.log' -e http://localhost:9200 -i my_log_index

Logagent buffers logs on disk and retries failed indexing requests. Failures happen for various reasons (e.g. network down, red cluster state, wrong mapping settings, node down). Reliable log shipping is thus possible without complicating the log processing pipeline with external persistent queues such as Apache Kafka.
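When Logagent runs as a service instead of a one-off command, the same pipeline is typically described in a configuration file. The sketch below mirrors the command above and reuses the input/output structure from the SQL and Prometheus examples later in this post; it assumes the files input accepts the same glob pattern as -g, and the file name logagent-es.yml is arbitrary:

input:
  files:
    - '/var/log/**/*.log'

output:
  elasticsearch:
    module: elasticsearch
    url: http://localhost:9200
    index: my_log_index

Start it with logagent --config logagent-es.yml.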

 

Ship log files to cloud services and time series databases

You don’t have a local Elasticsearch & Kibana setup? Not a problem. Sematext Cloud is, in effect, managed Elasticsearch, so you can use Logagent to ship logs there. Simply use your Sematext logs token and the Sematext receiver URL in the Logagent command:

logagent -g '/var/log/**/*.log' -e https://logsene-receiver.sematext.com -i SEMATEXT_LOG_TOKEN

You can see more detailed info at https://apps.sematext.com/ui/howto/Logsene/logagent.

Once the data gets shipped, you can create your dashboards in the Sematext UI or in the integrated Kibana. Sematext Cloud is not the only supported Elasticsearch alternative: Logagent also supports the AWS Elasticsearch Service plugin, the InfluxDB or Influx Cloud plugin, and other destinations via Logagent plugin modules.

 

Aggregate logs with in-memory SQL

In case you have a large number of log entries, it makes sense to pre-aggregate logs with the in-memory SQL feature. The SQL output filter buffers all logs for a specified period and runs a SQL query; the result of the query is then passed to the Logagent output modules. The configuration file below creates a logging pipeline with the following functionality:

  • Stream web server logs from a file (like tail -F)
  • Run an aggregation query with in-memory SQL. Use the SQL filter to select e.g. security-related messages such as logins from the log file. You can apply multiple SQL statements to the same input.
  • Output results to Elasticsearch
input: 
  files:
    - '/var/log/*/access.log'

outputFilter:
  - module: sql
    config:
      source: !!js/regexp /access.log|httpd/
      interval: 1 # every second
      queries:
        - # calculate average page size for different HTTP methods
          SELECT 'apache_stats' AS _type, 
                  AVG(size) AS size_avg, 
                  COUNT(method) AS method_count, 
                  method as http_method
          FROM ? 
          GROUP BY method
        - # log each request to the login page 
          SELECT * 
          FROM ? 
          WHERE path like "/wp-login%" 
output:
  elasticsearch:
    module: elasticsearch
    url: http://localhost:9200
    index: mylogs

Finally, you run Logagent with the SQL config:

logagent --config logagent-web-sql.yml --yaml --printStats 10

The --yaml option above shows the output on the console, so you can combine static configuration file settings with command line arguments. This is very handy while testing configurations before going live. The argument --printStats N prints processing statistics to the console every N seconds.

Ahh, one more Logagent-specific thing: it writes its own logs to stderr only. Why? Because stdin and stdout remain available for piping input and output to other command line tools. For example:

cat something | logagent --config myconfig.yml | netcat ... 
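Because of that, Logagent's own diagnostic messages can be redirected with plain shell redirection without touching the processed data, for example (logagent.err is just a placeholder file name):

cat access_log | logagent -n httpd 2> logagent.err > access_log.json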

 

Transform data structures with JavaScript

Finally, let’s explore Logagent’s powerful JavaScript scripting feature. In the following example we read data with the command input plugin (curl) from a Prometheus metrics URL and ship it to Elasticsearch. The Prometheus exposition format is not very handy for creating Kibana visualisations, so we use a JavaScript function in the configuration file to transform the data structure:

input:
  docker-prometheus: 
    module: command
    command: curl http://127.0.0.1:9323/metrics
    sourceName: prometheus_metrics
    debug: false
    restart: 10

parser:
  patternFiles: []
  patterns:
    - # prometheus
      sourceName: !!js/regexp /prometheus_metrics/
      match:
        - type: prometheus_metrics
          regex: !!js/regexp /\sHELP|\sTYPE\s/i
          inputDrop: !!js/regexp /#\sHELP|#\sTYPE\s/i
        - type: prometheus_metrics
          regex: !!js/regexp /(\S+)({.+?})\s([\d|\.]+)/i
          fields: 
            - name
            - labels
            - value
          transform: !!js/function > 
            function (p) {
              try {
                if (p.name) {
                  p.name_space=p.name.split('_')[0]
                }
                if (p.labels) {
                  // hack, make JS code out of key/value format to parse properties
                  p.labels = eval ('x=' + p.labels.replace(/=/g,':'))
                  var l = Object.keys(p.labels)
                  // convert prometheus numbers in quotes to JS numbers
                  for (var i=0; i<l.length; i++) {
                    if (!isNaN(p.labels[l[i]])) {
                      p.labels[l[i]] = Number(p.labels[l[i]])
                    }
                  }
                }
              } catch (err) {
                console.log(err)
              } 
              p[p.name]=p.value
              return p
            }
        - type: prometheus_metrics
          regex: !!js/regexp /(\S+)\s([\d|\.]+)/i
          fields: 
            - name
            - value
          transform: !!js/function > 
            function (p) {
              if (p.name) {
                p.name_space = p.name.split('_')[0]
              }
              p[p.name] = p.value
              return p
            }


output:
  stdout: yaml
  logsene:
    module: elasticsearch
    url: https://localhost:9200
    index: prom_stats
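Running this pipeline works exactly like the SQL example above; only the (arbitrary) config file name changes:

logagent --config logagent-prometheus.yml --yaml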

Once you browse the plugin repository, you will see that Logagent can be used in a number of advanced scenarios beyond the ones covered here.

We hope this post helps you get started with Logagent. If you like the tool, send us a tweet. If you face any issues, don’t hesitate to file them on GitHub!



