Smule Grapher (SMG) - Configuration Reference

Configuration overview

SMG uses YAML for its configuration file format. YAML is flexible and is easy to both read and write.

By default SMG will look for a file named /etc/smg/config.yml and read its configuration from there. That file name can be changed in the conf/application.conf file inside the SMG installation dir, if desired.

All config items are defined as a list, i.e. the top-level YAML structure of a config file must be a list (SMG will fail to parse it otherwise). This means that every object definition starts with a dash at the beginning of a line, e.g.:

...

- $rrd_dir: smgrrd
- $include: /path/to/conf/*.yml
...

- id: some.rrd.object
  ...

...

- id: ^some.index.object
  ...

...

Note that ordering matters in general - SMG will try to preserve the order of the graphs to be displayed to match the order in which the objects were defined.

Apart from global variables, which take the form of “- $name: value” pairs, SMG has several types of structured objects. The type is indicated either by an explicit “type” property or, for some objects, by the first character of the object id: ‘+’ indicates an Aggregate object, ‘^’ an Index, and ‘~’ a Hidden index. An object with no type and none of the special id prefix characters is assumed to be an RRD or a Graph object. So normally an object definition would look like:

# id-prefix-typed objects - the (possibly prefixed) id
# is just a named property
- id: object.id
  ...
- id: +agg.object.id
  ...

# explicitly-typed objects have the type and id as
# named properties
- type: object_type
  id: ...
  ...

Normally the top-level config file only contains globals, including a number of “- $include: …” definitions, while the actual object definitions live in drop-in files in sub-directories. For example, the SMG container image bundles an /etc/smg/config.yml file which includes several common directories/globs, among them /etc/smg/conf.d/*.yml (where one can drop yml files, which can in turn include other dirs/globs) and /opt/smg/data/conf/autoconf-private.d/*.yml (where by default the Autoconf plugin drops and removes its per-target template outputs).
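
As a rough sketch of that layout, the top-level file and one drop-in file could look like the listing below. The globals and the include glob are the ones mentioned above; the drop-in file name and the object defined in it are hypothetical.

```yaml
# /etc/smg/config.yml - top-level file, globals and includes only
- $rrd_dir: smgrrd
- $include: /etc/smg/conf.d/*.yml

# /etc/smg/conf.d/localhost.yml - hypothetical drop-in file
# (a separate file, shown in the same listing for brevity)
- id: host.localhost.sysload
  command: "smgscripts/mac_localhost_sysload.sh"
  vars:
    - label: sl1min
    - label: sl5min
    - label: sl15min
```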

Old (deprecated) format

A side note is that originally all object types would be defined using key -> (prop1 -> value1, …) syntax where the above would look like this:

 # id-prefix-typed objects - key is the (possibly
 # prefixed) id
 - object.id:
     ...
 - +agg.object.id:
     ...

 # explicitly-typed objects - key is the type
 # (these used to be special cases of globals)
 - $object_type:
     id: ...
     ...

While now deprecated, this syntax is still supported and can be seen in older config templates and/or examples.

RRD objects

RRD objects usually represent the majority of the SMG config. Each one defines an rrd file to be updated, its structure, the interval at which to update it and the command used to retrieve the values. An rrd object is represented by a yaml structure which could look like this:

- id: host.localhost.sysload                            # rrd object id
  command: "smgscripts/mac_localhost_sysload.sh"        # mandatory command outputting values
  timeout: 30                                           # optional - fetch command timeout, default 30
  title: "Localhost sysload (1/5/15 min)"               # optional title - object id will be used if no title is present
  interval: 60                                          # optional - default is 60
  data_delay: 0                                         # optional - default 0
  delay: 0.0                                            # optional - default 0.0
  rrd_type: GAUGE                                       # optional - default is GAUGE. Can be COUNTER etc (check rrdtool docs)
  rrd_init_source: "/path/to/existing/file.rrd"         # optional - if defined SMG will pass --source  to rrdtool create
  stack: false                                          # optional - stack graph lines if true, default - false
  pre_fetch: some_pf_id                                 # optional - specify pre_fetch command id.
  notify-fail: mail-asen,notif-pd                       # optional - send command failures to these recipients (see notify- conf below)
  labels:                                               # optional - arbitrary map of key/value labels, useful for
    foo: bar                                            #            filtering and grouping by
  vars:                                                 # mandatory list of all variables to graph
    - label: sl1min                                     # the variable label.
      min: 0                                            # optional - min accepted rrd value, default 0
      max: 1000                                         # optional - max accepted rrd value, default U (unlimited)
      maxy: 1005                                        # optional - use this to set upper limit on the displayed graph (similar to maxy index option)
      mu: "lavg"                                        # optional - measurement units, default - empty string
      lt: "LINE1"                                       # optional - line type, default is LINE1
      alert-warn-gt: 8                                  # optional - monitoring thresholds, details below
    - label: sl5min                                     # another variable def
    - label: sl15min                                    # a 3rd variable def

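Most properties are optional. The exceptions are the object id and the command and vars properties, which are marked as mandatory in the comments above.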

Note that changing the number of vars when the rrd object already exists is not supported and will cause update errors, and changing their order will result in messed up data. It is still possible to make such a change outside SMG using rrdtool though.
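
A minimal sketch of an object that relies on the defaults, assuming that id, command and vars are sufficient on their own. The id and the script path are hypothetical, and the vars are assumed to be listed in the order in which the command outputs its values.

```yaml
- id: host.host1.sysload
  command: "smgscripts/host1_sysload.sh"   # hypothetical script outputting three values
  vars:                                    # keep the number and order of vars stable once the rrd file exists
    - label: sl1min
    - label: sl5min
    - label: sl15min
```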

All other properties shown in the example above are optional; their defaults are given in the comments.

Pre-fetch command objects

As explained in the concepts overview, SMG RRD objects can specify a pre_fetch command to execute before their own command gets executed (on every interval run). That way multiple objects can be updated from a given source (e.g. host/service) while hitting it only once per interval. A pre_fetch can itself have another pre_fetch defined as a parent, so one can form command trees to be run top-to-bottom (stopping on failure).

Note that the “run tree” defined in this way must not have cycles, which could in theory be created by circularly pointing pre_fetch parents to each other (possibly via other ones). Currently SMG will reject the config update if circular parent dependencies in the run tree are detected (detection simply enforces a hard limit of 10 parent levels when constructing the run trees).

Here are two example pre_fetch definitions, one referencing the other as a parent:


    - type: pre_fetch
      desc: "check if localhost is up"
      id: host.host1.up
      command: "ping -c 1 host1 >/dev/null"
      notify-fail: mail-asen, notif-pd
      child_conc: 2
      ignorets: true
      timeout: 5

    - type: pre_fetch
      id: host.host1.snmp.vals
      command: "snmp_get.sh o1 laLoad.1 laLoad.2 ssCpuRawUser.0 ssCpuRawNice.0 ..."
      pre_fetch: host.host1.up
      pass_data: true
      timeout: 30
      delay: 5.5

Old (deprecated) syntax - using “$pre_fetch:” instead of “type: pre_fetch”:


    - $pre_fetch:
      id: host.host1.up
      ...

A pre_fetch defines a unique id and a command to execute, together with an optional timeout for the command (30 seconds by default) and an optional “parent” pre_fetch id.

If a pass_data property is defined and is set to true the stdout of the command will be passed to all child commands as stdin.
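
To illustrate pass_data, here is a sketch in which one pre_fetch collects data once per interval and two objects parse that data from stdin. All ids and scripts below are hypothetical and not part of SMG.

```yaml
- type: pre_fetch
  id: host.host1.stats
  command: "smgscripts/fetch_host1_stats.sh"   # hypothetical script, runs once per interval
  pass_data: true                              # its stdout becomes the stdin of the child commands

- id: host.host1.sysload
  pre_fetch: host.host1.stats
  command: "smgscripts/parse_sysload.sh"       # hypothetical script, reads stdin instead of querying host1
  vars:
    - label: sl1min
    - label: sl5min
    - label: sl15min

- id: host.host1.mem
  pre_fetch: host.host1.stats
  command: "smgscripts/parse_mem.sh"           # hypothetical script, parses the same stdin data
  vars:
    - label: used
      mu: "B"
```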

By default SMG will use the timestamp of the top-level pre-fetch for RRD updates. One can set the ignorets property to ignore the timestamp of the pre-fetch and use the timestamp of the first child which doesn’t have that property set.

In addition, a pre_fetch can have a child_conc property (default 1 if not specified) which determines how many threads can execute the child pre-fetches of this pre-fetch (but not object commands, which are always parallelized).

One can also specify a delay, expressed as a floating point value in seconds. This will cause the command (and its children) to be executed with a delay of the specified number of seconds. If a negative delay is specified, the command will be run with a random delay between 0 and (interval - abs(delay)) seconds.
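
A sketch of both delay forms, assuming the objects below these commands use a 60 second interval. The ids are hypothetical.

```yaml
- type: pre_fetch
  id: host.host1.fixed
  command: "ping -c 1 host1 >/dev/null"
  delay: 5.5    # fixed delay: the command is run 5.5 seconds late

- type: pre_fetch
  id: host.host2.random
  command: "ping -c 1 host2 >/dev/null"
  delay: -10    # random delay between 0 and (60 - abs(-10)) = 50 seconds
```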

A pre_fetch also supports notify-fail, to override the alert recipients for failures, as well as notify-disable, notify-backoff etc. (check the monitoring config for details).

Aggregate RRD objects

In addition to the “regular” RRD objects described above, one can define “aggregate” RRD objects. Similar to the aggregate functions which can be applied to multiple View objects, aggregate RRD objects represent values produced by applying an aggregate function (SUM, AVG, etc., or a special RPN op) to a set of update objects. The resulting value is then updated in an RRD file, and the object also represents a View object subject to display and view aggregation functions. An aggregate RRD object is defined by prepending a ‘+’ to the object id. Example:

- id: +agg.hosts.sysload                                # aggregate rrd object id, must start with + char
  op: AVG                                               # mandatory aggregate function to apply
  ids:                                                  # list of object ids, either that or a filter must be specified
    - host.host1.sysload                                # regular rrd object id, currently must be defined before this object in the config
    - host.host2.sysload                                # regular rrd object id, currently must be defined before this object in the config
  interval: 60                                          # optional - default is the interval of the first object listed in the ids list
  title: "Localhost sysload (1/5/15 min)"               # optional title - object id will be used if no title is present
  rrd_type: GAUGE                                       # optional - if not set, the rrd_type of the first object will be used
  rrd_init_source: "/path/to/existing/file.rrd"         # optional - if defined SMG will pass --source  to rrdtool create
  stack: false                                          # optional - stack graph lines if true, default - false
  notify-fail: mail-asen,notif-pd                       # optional - send command failures to these recipients (see notify- conf below)
  labels:                                               # optional - arbitrary map of key/value labels, useful for filtering
    foo: bar                                            #            and grouping by
  vars:                                                 # optional list of all variables to graph. If not set the first object vars list will be used.
    - label: sl1min                                     # the variable label.
      min: 0                                            # optional - min accepted rrd value, default 0
      max: 1000                                         # optional - max accepted rrd value, default U (unlimited)
      mu: "lavg"                                        # optional - measurement units, default - empty string
      lt: "LINE1"                                       # optional - line type, default is LINE1
      alert-warn-gt: 8                                  # optional - monitoring thresholds, details below
    - label: sl5min                                     # another variable def
    - label: sl15min                                    # a 3rd variable def

As mentioned in the yaml comments above, some of the properties of the aggregate object will be assumed from the first object vars list, unless explicitly defined. It is up to the config to ensure that the list of objects to aggregate is compatible (and meaningful).

There is one additional aggregate operation supported by aggregate objects which is currently not supported by the UI - the RPN:expression op. The syntax for this is like “op: RPN:$ds0,$ds1,+”. That would apply the rpn operation on each of the vars of the list of objects where ds0 would map to the first object value, ds1 to the second one and so on.
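
For instance, a sketch of an aggregate that stores the difference between two objects (ds0 - ds1 for each variable). The aggregate id is hypothetical, and the op is quoted only to keep the YAML unambiguous.

```yaml
- id: +agg.hosts.sysload.diff
  op: "RPN:$ds0,$ds1,-"        # per variable: value of host1 minus value of host2
  ids:
    - host.host1.sysload       # first object, mapped to $ds0
    - host.host2.sysload       # second object, mapped to $ds1
```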

As of v1.4+ SMG also supports supplying filter property instead of ids list (technically - it is possible to use and combine both). The filter property value is a map representing a standard (regex-based) SMG Filter which in turn is applied at config generation time to the entire list of RRD objects defined in the config. Note that this filter can only work with local objects.
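
A sketch of an aggregate using a filter instead of an ids list. It assumes that the filter map accepts the same keys as the index filter properties (px, sx, rx and so on); the id and the regex are hypothetical.

```yaml
- id: +agg.hosts.sysload.avg
  op: AVG
  filter:
    rx: '^host\..*\.sysload$'   # assumed key: regex matched against local rrd object ids
```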

View (Graph) objects

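As mentioned in the concepts overview, every RRD object is implicitly also a View object. Additional View objects can be defined in the configuration by referencing existing RRD objects. These have two main purposes, illustrated by the examples below: selecting a subset of (or reordering) the variables of the referenced object using gv, and graphing values calculated from those variables using cdef_vars.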

    - host.localhost.sysload.5m:                             # View object id
      title: "Localhost sysload - 5min only"                 # optional title - object id will be used if no title is present
      ref: host.localhost.sysload                            # object id of an already defined rrd object
      stack: false                                           # optional - stack graph lines if true, default - false
      gv:                                                    # a list of integers ("graph vars")
        - 1                                                  # index in ref object vars list

Note that a special use case of a view object can be to simply define an alias (another object id) for an existing object, possibly to simplify filtering and grouping of somehow related objects with unrelated IDs and labels.
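
A sketch of such an alias, assuming that both gv and cdef_vars can be omitted so that all variables of the referenced object are shown. The alias id and title are hypothetical.

```yaml
- id: dashboards.frontend.sysload     # hypothetical alias id
  ref: host.localhost.sysload         # object id of an already defined rrd object
  title: "Frontend sysload"
```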

    - host.va1.varnishstat.hitperc:
      title: "va1 - varnishstat: cache_hit/client_req %"
      ref: host.va.varnishstat.client_req_hitttl
      cdef_vars:
        - label: "hitperc"
          mu: "%"
          cdef: "$ds1,100,*,$ds0,/"

One should not set both gv and cdef_vars on the same View object. If one does that the gv values will be ignored. Note that technically it is possible to do what gv does using just cdef_vars except that gv provides a simpler way to just select a subset or reorder graph lines.

View objects support the following properties:

    ...
    gv:
      - 2
      - 0

      ...
      cdef_vars:
          ...
          cdef: "$ds1,100,*,$ds0,/"

The cdef expression can be read (from right to left) as: divide the product of the 2nd (ds1) var and 100 by the value of the 1st (ds0) var, or ds1 * 100 / ds0 in more conventional notation. In our case ds0 represents “requests/sec” and ds1 represents “cache hits/sec”, so the expression calculates the cache hit % (out of all requests). Rrdtool has great documentation on RPN expressions and I strongly recommend reading it for anyone who wants to write cdef expressions.
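
A worked evaluation of that expression, written as comments next to the cdef. The sample values (200 requests/sec and 150 cache hits/sec) are made up for illustration.

```yaml
cdef_vars:
  - label: "hitperc"
    mu: "%"
    # sample values (assumed): ds0 = 200 requests/sec, ds1 = 150 cache hits/sec
    # RPN stack after each token:
    #   $ds1 -> [150]
    #   100  -> [150, 100]
    #   *    -> [15000]
    #   $ds0 -> [15000, 200]
    #   /    -> [75]          i.e. a 75% cache hit rate
    cdef: "$ds1,100,*,$ds0,/"
```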

Interval definitions

The interval_def object defines the behavior of the thread pool associated with a given poll interval. It has a mandatory interval property specifying the interval (in seconds) to which it applies, an optional threads property specifying the max number of threads (default is 4; if set to 0 it will be dynamically set to the number of cpu cores available) and a pool property specifying the type of thread pool, one of FIXED (default) and WORK_STEALING. Examples:

    - type: interval_def
      interval: 60
      threads: 20
      pool: FIXED

    - type: interval_def
      interval: 200
      threads: 0 # will use num cpu cores
      pool: WORK_STEALING

Old (deprecated) syntax:

    - $interval_def:
      interval: 60
      ...

Note that these are optional. SMG will assume some sane defaults if they are omitted, but depending on the workload some tuning of the number of polling threads may be needed.

Indexes

In SMG an “index” represents a named “filter” which can be used to identify and display a group of graphs together. Also an index can define a parent and child indexes, so one can define a tree-like navigational structure using indexes. If an index does not have a parent index it is considered a “top-level” index. All top-level indexes are displayed on the SMG main page (by remote), together with their first-level child indexes.

Index objects are defined in the yaml along with the RRD and View objects and their object ids start with the ‘^’ special symbol.

Here is an example Index definition:

    - id: ^hosts.localhost       # index id, must start with a ^ char
      title: "localhost graphs"  # optional - id (sans the ^ char) will be used if not specified
      cols: 6                    # optional (default - the value of $dash-default-cols) how many columns of graph images to display
      rows: 10                   # optional (default - the value of $dash-default-rows) how many rows (max) to display. excess rows are paginated
      px: "host.localhost."      # optional filter: rrd object id prefix to match. Ignored if null (which is the default).
      sx: ...                    # optional filter: rrd object id suffix to match. Ignored if null (which is the default).
      rx: ...                    # optional filter: regex to match against rrd object id. Ignored if null (which is the default).
      rxx: ...                   # optional filter: exclude objects which ids match the supplied regex. Ignored if null (which is the default).
      trx: ...                   # optional filter: regex to match against object text representation (including title, var names etc). Ignored if null (which is the default).
      lbls: "foo=bar ..."        # optional filter: labels filter expression, see below for examples
      agg_op: ...                # optional "aggregate op"
      gb: ...                    # optional "group by" value
      gbp: ...                   # optional "group by param" value
      parent: some.index.id      # optional parent index id. Indexes without parent are considered top-level
      children:                  # optional list of child index ids to display under this index
        - example.index.id1
        - example.index.id2
        - ...
      alerts:                    # Alerts (and notifications) definitions for objects matching the index filter
        - label: ...             # Individual time series within objects are matched using the "label" property
          alert-crit-gt: ...     # See below for more details
          notify-crit: ...
        - ..
      remote: "*"                # optional - index remote, default is null, use "*" to specify index matching all remotes.
                                 # Individual remote instances can be specified by using their comma-separated ids.
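
A sketch of a small two-level index tree built with the parent property. The ids and titles are hypothetical, and the parent is referenced without the ^ char, following the example above.

```yaml
- id: ^hosts                    # no parent, so this is a top-level index
  title: "All hosts"
  px: "host."

- id: ^hosts.host1              # child index, displayed under ^hosts
  title: "host1 graphs"
  parent: hosts
  px: "host.host1."
```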

Index objects support the following properties:

The remaining index properties represent a Filter with Graph Options. These deserve their own subsections:

Filter

A filter (and its graph options) is configured as part of an index definition, along with the rest of the index properties. The filter properties shown in the example above are px, sx, rx, rxx, trx and lbls.

Graph options

The remaining properties represent “Graph Options” - generally specifying some non-default options to display graphs. These are rarely specified in index definitions but more often - come from UI requests.

Hidden Indexes

The only difference in defining a hidden index compared to a regular one is that its object id starts with a ‘~’ character (instead of ‘^’), and the only difference in behavior is that a hidden index is not displayed on the index page(s).

Currently hidden indexes are only useful in the context of monitoring - to define a group of objects for which in turn to define alert thresholds, but not necessarily clutter the main page with all of the groups.
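
A sketch of a hidden index used only to attach alert thresholds to a group of objects. The index id and the thresholds are hypothetical, and a notify-command with id pagerduty is assumed to be defined elsewhere in the config.

```yaml
- id: ~alerts.sysload
  sx: ".sysload"                 # match all objects whose id ends with .sysload
  alerts:
    - label: sl1min
      alert-warn-gt: 8
      alert-crit-gt: 16
      notify-crit: pagerduty     # assumed notify-command id
```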

Notification commands

The notify-command object defines a named command object to be executed for delivery of alert notifications (can have many of those). The command id can then be referenced as “recipient” in notify-{crit/warn/spike} object/index or global definitions. Example notify-command definitions, together with globals referencing them:

- type: notify-command
  id: mail-people
  command: "smgscripts/notif-mail.sh 'asen@smule.com somebodyelse@smule.com' "

- type: notify-command
  id: pagerduty
  command: "smgscripts/notif-pagerduty.sh"

- $notify-crit: mail-people,pagerduty
- $notify-warn: mail-people

The actual command gets executed with the following variables set by SMG in the child process environment:

Check smgscripts/notif-mail.sh for an example of how this could work.

Monitoring configuration

Every object variable (mapping to a graph line) can have zero or more alert configurations applied. Each alert config is a key -> value pair where the key is a special keyword recognized by SMG and the value usually represents some alert threshold. Currently the following alert keywords are defined:

    alert-warn: NUM      # same as alert-warn-gte
    alert-warn-gte: NUM  # warning alert if value greater than or equal to NUM
    alert-warn-gt: NUM   # warning alert if value greater than NUM
    alert-warn-lte: NUM  # warning alert if value is less than or equal to NUM
    alert-warn-lt: NUM   # warning alert if value is less than NUM
    alert-warn-eq: NUM   # warning alert if value is equal to NUM
    alert-warn-neq: NUM  # warning alert if value is not equal to NUM
    alert-crit: NUM      # same as alert-crit-gte
    alert-crit-gte: NUM  # critical alert if value greater than or equal to NUM
    alert-crit-gt: NUM   # critical alert if value greater than NUM
    alert-crit-lte: NUM  # critical alert if value is less than or equal to NUM
    alert-crit-lt: NUM   # critical alert if value is less than NUM
    alert-crit-eq: NUM   # critical alert if value is equal to NUM
    alert-crit-neq: NUM  # critical alert if value is not equal to NUM
    alert-p-_pluginId_-_checkId_: # configure a plugin-implemented check for the value

The built-in “mon” plugin implements the following three checks:

  alert-p-mon-anom: "" # anomaly alert - detecting unusual spikes or drops
                       #   it accepts a string with 3 values separated by ":"
                       #   the default value (when empty string is provided) is
                       #   "1.5:30m:30h" which means 1.5 (relative) change
                       #   in the last 30 minutes period compared to the previous
                       #   30h period.

  alert-p-mon-pop: ... # period over period value check - detecting if the current
                       #   value has changed with certain thresholds over the same value
                       #   some period ago. It accepts a string with 4 values separated
                       #   by ":".
                       #   The first value is a period and an optional resolution
                       #   separated by "-". E.g. "24h-1M" means compare with value 24 hrs
                       #   ago over a 1 minute average. If the -1M part is omitted the
                       #   object interval will be used as resolution (that would be the
                       #   highest available resolution in the RRD file).
                       #   The second value is the comparison operator - one of lt(e),
                       #   gt(e) or eq.
                       #   The third and fourth values are the warning and critical
                       #   thresholds of change.
                       #   E.g. to define a warning alert if some value drops below 0.7
                       #   from yesterday and a critical alert if the value drops below 0.5
                       #   from yesterday, at a 5min-average resolution, one can use the
                       #   following config string: "24h-5M:lt:0.7:0.5"
                       #   Both warning and critical thresholds are optional, e.g. use
                       #   something like "24h:lt:0.7" to set only a warning threshold and
                       #   something like "24h:lt::0.5" to set only critical threshold.

  alert-p-mon-ex: ...  # "Extended" check, supporting some special use cases, mainly related
                       #   to using different data resolution than the update interval (e.g.
                       #   to check the hourly average of given value despite the value being
                       #   updated every minute). Format is
                       #   "_step_:_op_-_warn_thresh_:_op_-_crit_thresh_[:HH_MM-HH_MM[*_day_],...]"
                       #   step is the time resolution at which we want the current value fetched
                       #   op is one of gt, gte, eq, lte, lt
                       #   warn_thresh and crit_thresh are the respective warning and critical
                       #   threshold numbers.
                       #   The final portion is optional and is a comma separated list of time
                       #   period specifications. Time period is specified by setting time of day
                       #   and/or day of week (first 3 letters from English weekdays) or month (number),
                       #   separated via *. The time of day is defined as a start and end (separated
                       #   by -) hour and minute (separated by _). The check can only trigger alerts
                       #   when the current time is within the time of day (if specified) and day of
                       #   week/month (if specified)
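
A sketch of how the anomaly and period-over-period checks could be attached to variables, using only config strings described in the comments above. The variable labels are hypothetical.

```yaml
vars:
  - label: requests
    alert-p-mon-anom: "1.5:30m:30h"        # explicit form of the default used for ""
    alert-p-mon-pop: "24h-5M:lt:0.7:0.5"   # warning below 0.7 and critical below 0.5 of yesterday's value, 5 min average
  - label: errors
    alert-p-mon-pop: "24h:lt::0.5"         # critical threshold only, object interval used as resolution
```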

Check here for more details on how alert-p-mon-anom/anomaly detection works.

In addition to alert-* properties one can also define the following notify- settings. These specify a list of “recipients” ($notify-command ids) to be executed on “hard” errors (and recoveries) at the appropriate severity level (crit/warn/spike). The special “notify-disable” flag explicitly disables notifications for the applicable object var, and the notify-backoff value specifies at what interval notifications for non-recovered object alerts should be re-sent. The notify-strikes value determines how many consecutive error states are considered a hard error and, in turn, trigger alert notifications.

notify-crit: notify-cmd-id-1,notify-cmd-id-2,...
# (notify-fail is not relevant for var states)
notify-fail: notify-cmd-id-1,notify-cmd-id-2,...
notify-warn: notify-cmd-id-1,notify-cmd-id-2,...
notify-anom: notify-cmd-id-1,notify-cmd-id-2,...
notify-disable: true
notify-backoff: 6h
notify-strikes: 3

In order to disable fetch error notifications for a given object one must set “notify-disable: true” at the object level. Pre-fetch commands support notify-disable too.

When multiple conflicting notify-strikes values apply, SMG will use the smallest of them. When multiple conflicting notify-backoff values apply, SMG will use the largest of them. Any applicable notify-disable set to true will result in disabled notifications.
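
A sketch of how these rules would resolve a conflict, assuming that notify-strikes and notify-backoff can be set both on a variable and in an index alerts entry. The ids, script and thresholds are hypothetical.

```yaml
# object definition
- id: host.host1.sysload
  command: "smgscripts/host1_sysload.sh"   # hypothetical script
  vars:
    - label: sl1min
      alert-warn-gt: 10
      notify-strikes: 5
      notify-backoff: 1h

# hidden index matching the object above
- id: ~mon.sysload
  sx: ".sysload"
  alerts:
    - label: sl1min
      alert-crit-gt: 20
      notify-strikes: 3
      notify-backoff: 6h

# expected outcome for sl1min under the rules above:
#   notify-strikes: 3   (the smallest of 5 and 3)
#   notify-backoff: 6h  (the largest of 1h and 6h)
```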

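Currently there are two ways to apply alert/notify configs to any given object variable: inline, as part of the variable definition in the object’s vars list, or via the alerts property of an index (regular or hidden) whose filter matches the object. The two fragments below show each in turn: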

  ...
  vars:
    - label: sl1min
      ...
      alert-warn-gt: 10
      notify-disable: true
    - label: sl5min
      ...
      alert-warn-gt: 8
      notify-warn: mail-on-warn
      notify-backoff: 1h
  ...
  notify-fail: ...
  notify-strikes: 5
  alerts:
   - label: sl1min   # any objects matching the index filter but also variables having the sl1min label
     alert-warn-gt: 3
     notify-warn: mail-on-warn
     alert-crit-gt: 8
     notify-crit: mail-on-crit
   - label: 1         # when number - it will be used as a 0-based index in the variables array
     alert-warn-gt: 3
     alert-crit-gt: 8
   - label: iowait
     alert-p-mon-anom: ""

The alerts property is an array of yaml objects each specifying a “label” and one or more alert thresholds or notify- properties. If the label is an integer number it will be interpreted as an index in the list of variables of the objects matching the index filter. For example one can define default anomaly (spike/drop) detection on all objects using the following config (just add more alerts defs if there are objects with more than 8 vars):

- ~all.alert.spikes:
  # empty filter means match all
  alerts:
    - label: 0
      alert-p-mon-anom: ""
    - label: 1
      alert-p-mon-anom: ""
    - label: 2
      alert-p-mon-anom: ""
    - label: 3
      alert-p-mon-anom: ""
    - label: 4
      alert-p-mon-anom: ""
    - label: 5
      alert-p-mon-anom: ""
    - label: 6
      alert-p-mon-anom: ""
    - label: 7
      alert-p-mon-anom: ""

Remote SMG instances

A $remote defines a unique remote id and a URL at which the remote SMG instance is accessible. Here is an example remote definition:

- type: remote
  id: another-dc
  url: "http://smg.dc2.company.com:9080"
# slave_id: dc1
# graph_timeout_ms: 30000
# config_fetch_timeout_ms: 300000

If the optional slave_id parameter is provided, it indicates that this instance is a “worker” in the context of that remote. Its value must be the id under which this instance is configured on the “master”. A slave instance will not load and display the relevant remote instance config and graphs but will only notify it of its own config changes.

One can run a setup where the “main” instance (there can be two of them, for redundancy) has multiple remotes configured, while each remote instance has only the “main” one configured as its remote (with slave_id set). With such a setup one only needs a single “beefy” (more memory) “main” instance which holds all objects available across the remotes, while the other instances only keep their own.
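
A sketch of such a setup with one main instance and two data centers. The remote ids and host names are hypothetical.

```yaml
# on the "main" instance: one remote per data center
- type: remote
  id: dc1
  url: "http://smg.dc1.company.com:9080"
- type: remote
  id: dc2
  url: "http://smg.dc2.company.com:9080"

# on the dc1 instance: the main instance is its only remote
- type: remote
  id: main
  url: "http://smg.main.company.com:9080"
  slave_id: dc1      # the id under which this instance is configured on main
```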

The graph_timeout_ms and config_fetch_timeout_ms values allow one to override the global timeouts for graph/monitor state API calls and config fetch (which can be much slower).

Custom RRA definitions

Whenever rrdtool creates a new rrd file it must get a set of definitions for Round Robin Archives (RRAs, explained better here). A rra_def has an id and a list of RRA definitions under the rra key. The actual RRA definitions are strings defined using rrdtool syntax. Here is what the default SMG RRA for a 1 minute interval would look like if defined as a $rra_def:

- type: rra_def
  id: smg_1m
  rra:
    - "RRA:AVERAGE:0.5:1:5760"
    - "RRA:AVERAGE:0.5:5:1152"
    - "RRA:AVERAGE:0.5:30:1344"
    - "RRA:AVERAGE:0.5:120:1440"
    - "RRA:AVERAGE:0.5:360:5840"
    - "RRA:AVERAGE:0.5:1440:1590"
    - "RRA:MAX:0.5:1:5760"
    - "RRA:MAX:0.5:5:1152"
    - "RRA:MAX:0.5:30:1344"
    - "RRA:MAX:0.5:120:1440"
    - "RRA:MAX:0.5:360:5840"
    - "RRA:MAX:0.5:1440:1590"

Note that normally one does not need to define or use any $rra_def objects; the defaults which SMG picks work just fine in most cases (these have been inspired by mrtg/cacti). Still, there are some use cases where one wants to use different RRAs, e.g. to keep some important graphs at higher resolutions for a longer period (the drawback being a bigger RRD file size).
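
A sketch of a higher-resolution definition for a 1 minute interval. In rrdtool syntax each string is RRA:CF:xff:steps:rows, so the retention of an archive is steps x rows x the update interval. The id and the chosen retention periods are hypothetical.

```yaml
- type: rra_def
  id: hires_1m                        # hypothetical id
  rra:
    - "RRA:AVERAGE:0.5:1:43200"       # 1 step (60s) per row, 43200 rows = 30 days at 1 minute resolution
    - "RRA:AVERAGE:0.5:60:8760"       # 60 steps (1 hour) per row, 8760 rows = 365 days
    - "RRA:MAX:0.5:1:43200"
    - "RRA:MAX:0.5:60:8760"
```

For comparison, the first archive of the default definition above (1 step, 5760 rows) keeps 4 days of 1 minute data.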

Globals

Global SMG variables are defined as name -> value pairs where the name is prefixed with a ‘$’ sign. Here is the list of supported global variables, together with their default values (all of these are optional):

Note that in production setups where there is already a reverse proxy serving the static images directly, it is recommended to keep $proxy-disable set to “false” (same as omitting it), to intercept the /proxy/<remote-id> URLs at the reverse proxy and to proxy these directly to the respective SMG instances (at their root URL) instead of hitting the local SMG instance for proxying. Here is what an example reverse proxy configuration for apache could look like for a remote named “some-dc” where SMG is running on smg1.some-dc.myorg.com and listening on port 9080 (possibly another reverse proxy there):

        ProxyPass        /proxy/some-dc  http://smg1.some-dc.myorg.com:9080/
        ProxyPassReverse /proxy/some-dc  http://smg1.some-dc.myorg.com:9080/

Check the Running and troubleshooting section for more details on reverse proxy setup in production.

Authentication configuration

TODO

- $auth-anonymous-root: false
- $auth-anonymous-admin: false
- $auth-anonymous-viewer: true

- $auth-default-session-ttl: "24h"
- $auth-system-allow-localhost: true
- $auth-system-xff-header: "X-Forwarded-For"
- $auth-system-authorization-header: "Authorization"
- $auth-system-allowed-networks: "192.168.0.0/16 10.0.0.0/8"
- $auth-system-login-url: "/login"
- $auth-system-logout-url: "/logout"

- type: auth-user-password
  handle: asen
  name: "Asen Lazarov"
  password_hash: "plain:asen:1234"
  role: admin

- type: auth-user-password
  handle: test
  # echo -n test:test | shasum -a 256
  password_hash: "sha-256:31f014b53e5861c8b28a8707a1d6a2a2737ce2c22fd671884173498510a063f0"
  role: viewer

- type: auth-user-token
  token: 1d6a2a2737ce2c22fd671884173498510a063f0

The Auth plugin also uses globals to configure itself:

- $auth-plugin-role-access: admin
- $auth-plugin-trusted-header-enabled: false
- $auth-plugin-trusted-header-handle: "X-SMG-Auth-handle"
- $auth-plugin-trusted-header-name: "X-SMG-Auth-name"
- $auth-plugin-trusted-header-role: "X-SMG-Auth-role"
- $auth-plugin-trusted-header-default-role: admin

Custom dashboards configuration

Custom dashboards are defined in the yaml configuration using an object of type cdash.

The cdash object is a yaml map which must have a unique id property, an optional title and a list of items of various types. All item types share a common set of properties: a unique id, an optional title, and width and height properties. The other mandatory property is the type; the example below uses the MonitorProblems, MonitorLog, IndexGraphs, Container, IndexStates, Plugin and External types.

Example:

- type: cdash
  id: noc
  title: NOC
  items:
    - id: alerts
      title: Active alerts
      type: MonitorProblems
      width: 350
      height: 600
    - id: alert-log
      title: Alert Log
      type: MonitorLog
      width: 350
      height: 600
      limit: 50
      ms: WARNING
    - id: jmx.premote-graphs
      type: IndexGraphs
      title: jmx.premote Graphs
      width: 700
      height: 600
      ix: jmx.premote
      limit: 2
    - id: group1
      type: Container
      width: 700
      height: 800
      items:
      - id: index.states
        title: Index States
        type: IndexStates
        width: 450
        height: 500
        img_width: 300
        ixes:
          - jmx.premote
          - localhost
      - id: calc-netext
        type: Plugin
        width: 700
        height: 300
        plugin_id: calc
        ix: some.id
    - id: google
      title: External web page
      type: External
      width: 800
      height: 620
      fheight: 600
      url: https://google.com/

Bundled Plugins configuration

Calc Plugin

     calc:
       expressions:
         - expr_id:
            title: "Test expression"
            expr: "localhost.sysload[0] + localhost.dummy[1]"
            period: 24h
            step: 60
            maxy: 1000
            dpp: off
            d95p: off

Common Commands Plugin

JMX Plugin

Scrape Plugin

Note: The automatic config generation from scrape plugin is now deprecated. AutoConf and its openmetrics template essentially replace that in a more consistent manner.

Autoconf Plugin

Kube Plugin

Mon Plugin

InfluxDb plugin

This is able to forward all updates to an InfluxDb URL. Somewhat underdeveloped.