Install Guide

YALP is designed to be installed on multiple servers, with different components running on separate machines. It can just as easily be installed on a single machine. This guide will show how to setup all components on a single host, but will also describe how the components could easily be distributed.

Celery Broker

Since YALP uses Celery for communication between components, a broker must be installed. For this guide, the default broker rabbitmq will be used.

To install rabbitmq under Ubuntu.

$ sudo apt-get install rabbitmq-server

Installing YALP

For now the easiest way to YALP is installed in a virtualenv.

$ virtualenv /srv/yalp_env
$ source /srv/yalp_env/bin/activate

Then install via pypi using pip or easy_install.

(yalp_env) $ pip install yalp

The three components yalp-inputs, yalp-parsers and yalp-outputs, should now be accessible.

Configuration

YALP uses a single YAML configuration file for all three components. Generally the config file should be consistent throughout the infrastructure, with the exception of the yalp-inputs configuration, which should be specific to the host where the input is being collected.

The first section of the config file deals with Celery configuration.

# Celery configuration
broker_url: amqp://guest:guest@localhost:5672//
parser_queue: parsers
output_queue: outputs
parser_worker_name: parser-workers
output_worker_name: output-workers
broker_url
This is the connection uri for connecting to the broker.
parser_queue
This is the name of the queue that the Parsers will watch for tasks. This can be set to any name so that it is easily identifiable, especially if the broker is being used for other services. The default name is parsers.
output_queue
This is the name of the queue that the Outputs will watch for tasks. This can be set to any name so that it is easily identifiable, especially if the broker is being used for other services. The default name is outputs.
parser_worker_name
This is the name on the Parser processes so that can easily be identifies via tools like ps.
output_worker_name
This is the name on the Output processes so that can easily be identifies via tools like ps.

The next section of the config is for plugin configuration.

# Plugin configuration
input_packages:
  - yalp.inputs
parser_packages:
  - yalp.parsers
output_packages:
  - yalp.outputs

Each option contains a list of python packages that contain plugin modules for the specific component. This allows to specifying custom or third-part plugins. The defaults are in the example above.

Next is the inputs section.

# Input configuration
inputs:
  - file:
      path: '/var/log/nginx/access.log'

This section contains a list of inputs to monitor for events. This example is set to monitor /var/log/messages. The type option limits what parsers and outputers will process this input. Only parsers are outputs that have the same type will process the message. The general format is as follows.

inputs:
  - <module>:
      <option>: <value>
      ...
      <option>: <value>
  - <module>:
      <option>: <value>
      ...
      <option>: <value>

The last two sections are similar to the inputs section but are for configuring the parsers and outputs.

parsers:
  - grok:
      pattern: '%{COMBINEDAPACHELOG}'

outputs:
  - elasticsearch:
      uri: http://localhost:9200

This configures the parsers to pass the message to the outpers without modifing it. The message will then to output to mongodb running on the same machine.