If you open the Nagios interface for your server and select the Status Map, you will see your group's servers and devices arranged around your Nagios instance. For Nagios to tell a host that is down from one that is only unreachable behind a failed device, you need to include parent relationships for each device defined.
Go to http://vmX-gY.lab.workalaya.net/nagios3 and click on the "Map" link.
Now we will add parent relationships for router, switch and server.
Add the parents to templates/nagios/routers.cfg as follows:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi templates/nagios/routers.cfg
define hostgroup {
hostgroup_name routers
alias Router Group
}
define host {
use generic-host
host_name gw-rtr
alias LAB Transit Provider Router
address gw-rtr.lab.workalaya.net
hostgroups routers,ssh-servers
}
define host {
use generic-host
host_name rtr1-g{{class_group}}
alias Group {{class_group}} Router
address rtr1-g{{class_group}}.lab.workalaya.net
hostgroups routers,ssh-servers
parents gw-rtr
}
Then add the parents to templates/nagios/vms.cfg as follows:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi templates/nagios/vms.cfg
define hostgroup {
hostgroup_name vms
alias VM Group
}
define host {
use generic-host
host_name srv1-g{{class_group}}
alias Server, Group {{class_group}}
address srv1-g{{class_group}}.lab.workalaya.net
hostgroups vms,ssh-servers,http-servers,ubuntu-servers
parents rtr1-g{{class_group}}
}
{% for i in range(1,4) %}
define host {
use generic-host
host_name vm{{i}}-g{{class_group}}
alias VM {{i}}, Group {{class_group}}
address vm{{i}}-g{{class_group}}.lab.workalaya.net
hostgroups vms,ssh-servers,http-servers,ubuntu-servers
parents rtr1-g{{class_group}}
}
{% endfor %}
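To see what the parent entries buy you, here is a minimal Python sketch of the tree the two templates describe and of a simplified version of the reachability rule Nagios applies. It is an illustration only, not Nagios code: the function names, the group number 1 and the "failing" set are assumptions made for the example.

```python
def build_parents(group):
    """Build the host -> parents map described by routers.cfg and vms.cfg."""
    router = f"rtr1-g{group}"
    parents = {
        "gw-rtr": [],              # no parent defined in routers.cfg
        router: ["gw-rtr"],
        f"srv1-g{group}": [router],
    }
    # Same bounds as the Jinja loop in vms.cfg: range(1, 4) yields 1, 2, 3.
    for i in range(1, 4):
        parents[f"vm{i}-g{group}"] = [router]
    return parents


def host_state(host, failing, parents):
    """Simplified rule: a failing host is UNREACHABLE when it has parents
    and none of them is UP, otherwise it is DOWN."""
    if host not in failing:
        return "UP"
    ups = [p for p in parents[host] if host_state(p, failing, parents) == "UP"]
    if parents[host] and not ups:
        return "UNREACHABLE"
    return "DOWN"


if __name__ == "__main__":
    tree = build_parents(1)  # hypothetical group number
    # Pretend the group router has failed, taking everything behind it along.
    failing = set(tree) - {"gw-rtr"}
    for name in sorted(tree):
        print(name, host_state(name, failing, tree))
```

With the group router failing, only the router itself is reported as DOWN; the server and VMs behind it are UNREACHABLE, which is what keeps one router failure from looking like five separate outages.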
Then push the configuration to the Nagios host:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ ansible-playbook nagios.yml -t update_config
PLAY [nagios_hosts] ************************************************************
TASK [Gathering Facts] *********************************************************
ok: [vmX-gY.lab.workalaya.com]
TASK [Generate the nagios monitoring templates] ********************************
changed: [vmX-gY.lab.workalaya.com] => (item=routers.cfg)
changed: [vmX-gY.lab.workalaya.com] => (item=vms.cfg)
RUNNING HANDLER [verify config] ************************************************
changed: [vmX-gY.lab.workalaya.com]
RUNNING HANDLER [restart nagios3] **********************************************
changed: [vmX-gY.lab.workalaya.com]
PLAY RECAP *********************************************************************
vmX-gY.lab.workalaya.com : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
If you would like to use appropriate icons for your defined hosts in Nagios, this is where you do it. We have two types of devices:
There is a fairly large repository of icon images available for you to use located here:
/usr/share/nagios/htdocs/images/logos/
These were installed by default as dependent packages of the nagios3 package in Ubuntu. In some cases you can find model-specific icons for your hardware, but to keep things simple we will use the following icons for our hardware:
/usr/share/nagios/htdocs/images/logos/base/debian.*
/usr/share/nagios/htdocs/images/logos/cook/router.*
/usr/share/nagios/htdocs/images/logos/cook/switch.*
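Before pointing Nagios at an icon, it can help to confirm that the file is really there. The following is a small sketch, assuming Python 3 is available on the Nagios host; the helper name missing_icons is made up for this example, and the default directory is the one listed above.

```python
from pathlib import Path

# Directory named in the text above; pass a different one if yours differs.
LOGOS_DIR = Path("/usr/share/nagios/htdocs/images/logos")


def missing_icons(names, logos_dir=LOGOS_DIR):
    """Return the relative icon paths that do not exist under logos_dir."""
    base = Path(logos_dir)
    return [name for name in names if not (base / name).is_file()]
```

For example, missing_icons(["cook/router.png", "cook/router.gd2"]) returns an empty list when both router images are installed, and otherwise lists the ones that are absent.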
The next step is to edit the file templates/nagios/routers.cfg and tell Nagios which image to use to represent your devices.
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi templates/nagios/routers.cfg
Here is what an entry for your routers looks like (there is already an entry for debian-servers that will work as is). Note that the router model (7200) is not all that important. The image used represents a router in general.
define hostgroup {
hostgroup_name routers
alias Router Group
}
define hostextinfo {
hostgroup_name routers
icon_image cook/router.png
icon_image_alt Cisco Routers (7200)
vrml_image router.png
statusmap_image cook/router.gd2
}
define host {
use generic-host
host_name gw-rtr
alias LAB Transit Provider Router
address gw-rtr.lab.workalaya.net
hostgroups routers,ssh-servers
}
define host {
use generic-host
host_name rtr1-g{{class_group}}
alias Group {{class_group}} Router
address rtr1-g{{class_group}}.lab.workalaya.net
hostgroups routers,ssh-servers
parents gw-rtr
}
Then push the configuration to the Nagios host:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ ansible-playbook nagios.yml -t update_config
PLAY [nagios_hosts] ************************************************************
TASK [Gathering Facts] *********************************************************
ok: [vmX-gY.lab.workalaya.com]
TASK [Generate the nagios monitoring templates] ********************************
changed: [vmX-gY.lab.workalaya.com] => (item=routers.cfg)
ok: [vmX-gY.lab.workalaya.com] => (item=vms.cfg)
RUNNING HANDLER [verify config] ************************************************
changed: [vmX-gY.lab.workalaya.com]
RUNNING HANDLER [restart nagios3] **********************************************
changed: [vmX-gY.lab.workalaya.com]
PLAY RECAP *********************************************************************
vmX-gY.lab.workalaya.com : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Take a look at the Status Map in the web interface (Map link on the left). It should be much nicer, with real icons instead of question marks for most items.
The idea is to create a service group for the four servers in your group. A service group treats its member services as one combined service, which is considered down if any of the services in the group is down.
In this case we'll group together SSH and HTTP. In real life you might group MySQL, IMAP, SMTP, HTTP and your MTA (postfix, mail, exim) if those services were all required to deliver a mail interface to your users.
We start by editing the file:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi templates/nagios/servicegroups.cfg
The service group looks like this. The template fills in your group number from the class_group variable (1 for group 1):
define servicegroup {
servicegroup_name group{{class_group}}-ssh-http
alias Group {{class_group}} SSH and Web
members vm1-g{{class_group}},SSH,vm1-g{{class_group}},HTTP,vm2-g{{class_group}},SSH,vm2-g{{class_group}},HTTP, \
vm3-g{{class_group}},SSH,vm3-g{{class_group}},HTTP,srv1-g{{class_group}},SSH,srv1-g{{class_group}},HTTP
}
We used "\" at the end of the line to continue the members list onto the next line. Without it you will see errors.
Note that "SSH" and "HTTP" need to be uppercase as this is how the service_description is written in the file /etc/nagios3/conf.d/services_nagios2.cfg
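The members value is a flat, comma-separated list of host,service pairs, which is easy to get wrong by hand. As a rough illustration, the Python sketch below builds the same list for one group and splits it back into pairs. The function names and the group number are assumptions for the example; the real file is still produced by the Ansible template.

```python
def servicegroup_members(group, services=("SSH", "HTTP")):
    """Build the members value: host,service,host,service,..."""
    hosts = [f"vm{i}-g{group}" for i in range(1, 4)] + [f"srv1-g{group}"]
    return ",".join(f"{host},{service}" for host in hosts for service in services)


def parse_members(members):
    """Split a members value back into (host, service) pairs."""
    fields = [field.strip() for field in members.split(",")]
    if len(fields) % 2:
        raise ValueError("members must be a list of host,service pairs")
    return list(zip(fields[0::2], fields[1::2]))


if __name__ == "__main__":
    value = servicegroup_members(1)  # hypothetical group number
    print(value)
    for host, service in parse_members(value):
        print(f"{host}: {service}")
```

An odd number of fields means a host or a service description is missing, which is the kind of mistake the configuration check will also reject.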
Update the Ansible playbook nagios.yml to include servicegroups.cfg:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi nagios.yml
- hosts: nagios_hosts
  become: true
  tasks:
    - name: ensure package cache is up to date
      apt: update_cache=yes cache_valid_time=3600
      tags: install
    - name: install Nagios Version 3
      package:
        name: "{{ item }}"
        state: present
      with_items:
        - nagios3
        - nagios3-doc
      tags: install
    - name: Check nagios Users
      stat:
        path: /etc/nagios3/htpasswd.users
      ignore_errors: true
      register: nagios_user_pwfile_exists
      tags: configure
    - name: Create empty password file
      command: touch /etc/nagios3/htpasswd.users
      args:
        creates: /etc/nagios3/htpasswd.users
      # the registered stat result is a dictionary, so test its stat.exists field
      when: not nagios_user_pwfile_exists.stat.exists
      tags: configure
    - name: Create nagios admin user
      htpasswd:
        path: /etc/nagios3/htpasswd.users
        name: nagiosadmin
        password: "{{ class_password }}"
        state: present
      ignore_errors: true
      tags: configure
    - name: Generate the nagios monitoring templates
      template:
        src: ./templates/nagios/{{ item }}
        dest: /etc/nagios3/conf.d
        backup: yes
      with_items:
        - routers.cfg
        - vms.cfg
        - servicegroups.cfg
      tags: update_config
      notify: verify config
  handlers:
    - name: verify config
      shell: nagios3 -v /etc/nagios3/nagios.cfg
      notify: restart nagios3
    - name: restart nagios3
      service: name=nagios3 state=restarted
Save your changes, verify your work and push changes to Nagios host using ansible.
(venv) vmX-gY@ansible-host:~/ansible-playbook$ ansible-playbook nagios.yml -t update_config
PLAY [nagios_hosts] ************************************************************
TASK [Gathering Facts] *********************************************************
ok: [vmX-gY.lab.workalaya.com]
TASK [Generate the nagios monitoring templates] ********************************
ok: [vmX-gY.lab.workalaya.com] => (item=routers.cfg)
ok: [vmX-gY.lab.workalaya.com] => (item=vms.cfg)
changed: [vmX-gY.lab.workalaya.com] => (item=servicegroups.cfg)
RUNNING HANDLER [verify config] ************************************************
changed: [vmX-gY.lab.workalaya.com]
RUNNING HANDLER [restart nagios3] **********************************************
changed: [vmX-gY.lab.workalaya.com]
PLAY RECAP *********************************************************************
vmX-gY.lab.workalaya.com : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Now if you click on the Service Groups menu item in the Nagios web interface you should see this information grouped together.
You will edit the file /etc/nagios3/cgi.cfg to give read-only guest user access to the Nagios web interface.
By default Nagios is configured to give full r/w access via the Nagios web interface to the user nagiosadmin. You can change the name of this user, add other users, change how you authenticate users, what users have access to what resources and more via the cgi.cfg file.
First, update your nagios.yml ansible playbook file to create a "guest" user and password in the htpasswd.users file.
You can use any password you want (or none). A password of "guest" is not a bad choice if you plan for this to be a r/o account.
Next, update your nagios.yml ansible playbook file to update the file "/etc/nagios3/cgi.cfg" and tell Nagios to allow the "guest" user some access to information via the web interface.
Edit the nagios.yml Ansible playbook file:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi nagios.yml
The contents of the nagios.yml playbook should look like this:
- hosts: nagios_hosts
  become: true
  tasks:
    - name: ensure package cache is up to date
      apt: update_cache=yes cache_valid_time=3600
      tags: install
    - name: install Nagios Version 3
      package:
        name: "{{ item }}"
        state: present
      with_items:
        - nagios3
        - nagios3-doc
      tags: install
    - name: Check nagios Users
      stat:
        path: /etc/nagios3/htpasswd.users
      ignore_errors: true
      register: nagios_user_pwfile_exists
      tags: configure
    - name: Create empty password file
      command: touch /etc/nagios3/htpasswd.users
      args:
        creates: /etc/nagios3/htpasswd.users
      # the registered stat result is a dictionary, so test its stat.exists field
      when: not nagios_user_pwfile_exists.stat.exists
      tags: configure
    - name: Create nagios admin user
      htpasswd:
        path: /etc/nagios3/htpasswd.users
        name: nagiosadmin
        password: "{{ class_password }}"
        state: present
      ignore_errors: true
      tags: configure
    - name: Create nagios guest user
      htpasswd:
        path: /etc/nagios3/htpasswd.users
        name: "{{ item.username }}"
        password: "{{ item.password }}"
        state: present
      ignore_errors: true
      with_items:
        - { username: 'guest', password: 'guest' }
      tags: add_guest
    - name: Configure nagios.cgi to allow guest access
      lineinfile:
        dest: "/etc/nagios3/cgi.cfg"
        regexp: "^{{ item.property | regex_escape() }}="
        line: "{{ item.property }}={{ item.value }}"
      with_items:
        - { property: 'authorized_for_system_information', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_configuration_information', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_all_services', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_all_hosts', value: 'nagiosadmin,guest' }
      tags: add_guest
      notify: verify config
    - name: Generate the nagios monitoring templates
      template:
        src: ./templates/nagios/{{ item }}
        dest: /etc/nagios3/conf.d
        backup: yes
      with_items:
        - routers.cfg
        - vms.cfg
        - servicegroups.cfg
      tags: update_config
      notify: verify config
  handlers:
    - name: verify config
      shell: nagios3 -v /etc/nagios3/nagios.cfg
      notify: restart nagios3
    - name: restart nagios3
      service: name=nagios3 state=restarted
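The lineinfile task in this playbook rewrites each authorized_for_* setting in cgi.cfg. As a rough model of what it does with the regexp and line options used here, the Python sketch below replaces the last matching line, or appends the line when nothing matches. It is a simplification written for illustration, not Ansible's implementation, and the sample cfg lines are made up.

```python
import re


def lineinfile(lines, regexp, line):
    """Return a copy of lines with the last regexp match replaced by line,
    or with line appended when nothing matches."""
    pattern = re.compile(regexp)
    matches = [i for i, text in enumerate(lines) if pattern.search(text)]
    result = list(lines)
    if matches:
        result[matches[-1]] = line
    else:
        result.append(line)
    return result


def set_property(lines, prop, value):
    """Mirror the playbook's regexp and line for one property=value pair."""
    # re.escape plays the role of the regex_escape filter in the playbook.
    return lineinfile(lines, "^" + re.escape(prop) + "=", f"{prop}={value}")


if __name__ == "__main__":
    cfg = ["authorized_for_all_hosts=nagiosadmin"]  # made-up sample content
    print(set_property(cfg, "authorized_for_all_hosts", "nagiosadmin,guest"))
```

Running the same change a second time leaves the lines untouched, which is why a repeated playbook run would report "ok" rather than "changed" for these items.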
Save your changes, verify your work and push changes to Nagios host using ansible.
(venv) vmX-gY@ansible-host:~/ansible-playbook$ ansible-playbook nagios.yml -t add_guest
PLAY [nagios_hosts] ************************************************************
TASK [Gathering Facts] *********************************************************
ok: [vmX-gY.lab.workalaya.com]
TASK [Create nagios guest user] ************************************************
changed: [vmX-gY.lab.workalaya.com] => (item={'username': 'guest', 'password': 'guest'})
TASK [Configure nagios.cgi to allow guest access] **********************************
changed: [vmX-gY.lab.workalaya.com] => (item={'property': 'authorized_for_system_information', 'value': 'nagiosadmin,guest'})
changed: [vmX-gY.lab.workalaya.com] => (item={'property': 'authorized_for_configuration_information', 'value': 'nagiosadmin,guest'})
changed: [vmX-gY.lab.workalaya.com] => (item={'property': 'authorized_for_all_services', 'value': 'nagiosadmin,guest'})
changed: [vmX-gY.lab.workalaya.com] => (item={'property': 'authorized_for_all_hosts', 'value': 'nagiosadmin,guest'})
RUNNING HANDLER [verify config] ************************************************
changed: [vmX-gY.lab.workalaya.com]
RUNNING HANDLER [restart nagios3] **********************************************
changed: [vmX-gY.lab.workalaya.com]
PLAY RECAP *********************************************************************
vmX-gY.lab.workalaya.com : ok=5 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
To check that you can log in as the "guest" user, clear the cookies in your web browser or open a different web browser if you have one. The web interface itself will look the same. The difference is that a number of actions available via the web interface (forcing a service/host check, scheduling checks, comments, etc.) will not work for the guest user.
Enabling external commands is required in order to allow users to "Acknowledge" problems with hosts and services in the web interface. The default file permissions are deliberately restrictive and prevent the web interface from updating Nagios, so you need to make them slightly more permissive.
Next, update your nagios.yml Ansible playbook file to set check_external_commands in "/etc/nagios3/nagios.cfg", to change the directory permissions, and to make the changes permanent.
Edit the nagios.yml Ansible playbook file:
(venv) vmX-gY@ansible-host:~/ansible-playbook$ vi nagios.yml
The contents of the nagios.yml playbook should look like this:
- hosts: nagios_hosts
  become: true
  tasks:
    - name: ensure package cache is up to date
      apt: update_cache=yes cache_valid_time=3600
      tags: install
    - name: install Nagios Version 3
      package:
        name: "{{ item }}"
        state: present
      with_items:
        - nagios3
        - nagios3-doc
      tags: install
    - name: Check nagios Users
      stat:
        path: /etc/nagios3/htpasswd.users
      ignore_errors: true
      register: nagios_user_pwfile_exists
      tags: configure
    - name: Create empty password file
      command: touch /etc/nagios3/htpasswd.users
      args:
        creates: /etc/nagios3/htpasswd.users
      # the registered stat result is a dictionary, so test its stat.exists field
      when: not nagios_user_pwfile_exists.stat.exists
      tags: configure
    - name: Create nagios admin user
      htpasswd:
        path: /etc/nagios3/htpasswd.users
        name: nagiosadmin
        password: "{{ class_password }}"
        state: present
      ignore_errors: true
      tags: configure
    - name: Create nagios guest user
      htpasswd:
        path: /etc/nagios3/htpasswd.users
        name: "{{ item.username }}"
        password: "{{ item.password }}"
        state: present
      ignore_errors: true
      with_items:
        - { username: 'guest', password: 'guest' }
      tags: add_guest
    - name: Configure nagios.cgi to allow guest access
      lineinfile:
        dest: "/etc/nagios3/cgi.cfg"
        regexp: "^{{ item.property | regex_escape() }}="
        line: "{{ item.property }}={{ item.value }}"
      with_items:
        - { property: 'authorized_for_system_information', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_configuration_information', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_all_services', value: 'nagiosadmin,guest' }
        - { property: 'authorized_for_all_hosts', value: 'nagiosadmin,guest' }
      tags: add_guest
      notify: verify config
    - name: Update nagios.cfg to Enable External commands
      lineinfile:
        dest: "/etc/nagios3/nagios.cfg"
        regexp: "^{{ item.property | regex_escape() }}="
        line: "{{ item.property }}={{ item.value }}"
      with_items:
        - { property: 'check_external_commands', value: '1' }
      register: update_directory_permission
      tags: external_command
      notify: verify config
    - name: change directory permissions
      shell: "dpkg-statoverride --update --add {{ item.user }} {{ item.group }} {{ item.permission }} {{ item.dir }}"
      with_items:
        - { user: 'nagios', group: 'www-data', permission: '2710', dir: '/var/lib/nagios3/rw' }
        - { user: 'nagios', group: 'nagios', permission: '751', dir: '/var/lib/nagios3' }
      # only runs when nagios.cfg was just changed, so existing overrides are not added twice
      when: update_directory_permission.changed
      tags: external_command
      notify: restart nagios3
    - name: Generate the nagios monitoring templates
      template:
        src: ./templates/nagios/{{ item }}
        dest: /etc/nagios3/conf.d
        backup: yes
      with_items:
        - routers.cfg
        - vms.cfg
        - servicegroups.cfg
      tags: update_config
      notify: verify config
  handlers:
    - name: verify config
      shell: nagios3 -v /etc/nagios3/nagios.cfg
      notify: restart nagios3
    - name: restart nagios3
      service: name=nagios3 state=restarted
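The two octal modes passed to dpkg-statoverride are easier to read in the form ls -l would show them. The short Python sketch below decodes them with the standard stat module; the helper names are made up for this example.

```python
import stat


def describe_mode(octal_mode):
    """Render an octal directory mode such as '2710' the way ls -l shows it."""
    return stat.filemode(stat.S_IFDIR | int(octal_mode, 8))


def has_setgid(octal_mode):
    """True when the mode includes the setgid bit (the leading 2 in 2710)."""
    return bool(int(octal_mode, 8) & stat.S_ISGID)


if __name__ == "__main__":
    for mode, path in (("2710", "/var/lib/nagios3/rw"), ("751", "/var/lib/nagios3")):
        print(describe_mode(mode), path)
```

Mode 2710 gives the owner (nagios) full access and lets members of the www-data group enter the directory without listing it, while the setgid bit makes files created there inherit the directory's group. Mode 751 lets other users pass through /var/lib/nagios3 to reach the rw directory.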
Save your changes, verify your work and push changes to Nagios host using ansible.
(venv) vmX-gY@ansible-host:~/ansible-playbook$ ansible-playbook nagios.yml -t external_command
PLAY [nagios_hosts] ************************************************************
TASK [Gathering Facts] *********************************************************
ok: [vmX-gY.lab.workalaya.com]
TASK [Update nagios.cfg to Enable External commands] ***************************
changed: [vmX-gY.lab.workalaya.com] => (item={'property': 'check_external_commands', 'value': '1'})
TASK [change directory permissions] ********************************************
changed: [vmX-gY.lab.workalaya.com] => (item={'user': 'nagios', 'group': 'www-data', 'permission': '2710', 'dir': '/var/lib/nagios3/rw'})
changed: [vmX-gY.lab.workalaya.com] => (item={'user': 'nagios', 'group': 'nagios', 'permission': '751', 'dir': '/var/lib/nagios3'})
RUNNING HANDLER [verify config] ************************************************
changed: [vmX-gY.lab.workalaya.com]
RUNNING HANDLER [restart nagios3] **********************************************
changed: [vmX-gY.lab.workalaya.com]
PLAY RECAP *********************************************************************
vmX-gY.lab.workalaya.com : ok=5 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Once this is done, go to 'Problems' > 'Services (Unhandled)' and find a service in the red (critical) or yellow (warning) state. Click on the service name. Then under "Service commands" click on "Acknowledge this service problem".
The problem should disappear from the list of unhandled problems.