System Monitoring

Lately I’ve been experimenting with various system monitoring, charting, and alerting solutions.

Some of the systems I’ve experimented with include:

  • Graphdat – Great upcoming product that provides an experience like no other.
  • Cacti – My current solution for monitoring internal SNMP systems.
  • Munin – Looked nice, but out-of-the-box SNMP support wasn’t so successful.
  • Graphene on top of Graphite – Looks absolutely beautiful, but you essentially have to write (or find) your own data collectors for everything.
  • MRTG – My previous solution, and great for monitoring basic SNMP counters. It could no longer handle the size of my network storage, hence the search for a replacement.
  • New Relic – A little slow, and had some issues with dashboards appearing and disappearing. There’s currently no way to customize hostnames, which is annoying for anyone with even mild OCD.

So far, I must say that the offering from Graphdat is simply amazing. Check it out for yourself below.

Adding SELinux policies for your apps

I recently had all sorts of nightmares trying to configure Cacti to talk to HAProxy via an SNMP Perl script. It turns out the problem wasn’t the normal chmod fun, wrong paths, or anything like that. Instead, it was SELinux (Security-Enhanced Linux), a feature of some Linux distros.
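
If you suspect SELinux is getting in the way, checking the enforcement mode and the audit log is a cheap first step. What follows is only a minimal sketch: the function names selinux_mode and count_avc_denials are made up for illustration, and it assumes denials are logged by auditd to /var/log/audit/audit.log as "avc: denied" entries, which may not match every setup.

```shell
# Print the current SELinux mode (Enforcing, Permissive or Disabled),
# or "unavailable" when the getenforce tool is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unavailable"
    fi
}

# Count the lines in an audit log that record an AVC denial.
# $1 is the log path; the default location is an assumption.
count_avc_denials() {
    log=${1:-/var/log/audit/audit.log}
    grep -c 'avc: *denied' "$log"
}
```

Running selinux_mode, then count_avc_denials before and after the failing command, gives a rough signal: if the mode is Enforcing and the count goes up each time the command runs, SELinux is a likely suspect.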

The most confusing part was that all the errors I was getting pointed me at commands and configs that I’d already checked.

e.g.

snmpbulkwalk -c public -v2c 127.0.0.1 1.3.6.1.4.1.29385
# SNMPv2-SMI::enterprises.29385 = No Such Object available on this agent at this OID
 
perl /etc/snmp/haproxy.pl
# Warning: no access control information configured.
# It's unlikely this agent can serve any useful purpose in this state.
# Run "snmpconf -g basic_setup" to help you configure the Haproxy.conf file for this agent.

Eventually, after 8 hours of scouring the interwebs, I sent an email off to the developer asking for help. Of course, I managed to discover the problem less than an hour after sending said email.

Regardless, at the end of the day, the following was the magic bullet I used to fix my problem:

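setenforce Permissive                  # log denials instead of blocking them
rm /var/log/audit/audit.log            # start from an empty audit log
service auditd restart                 # have auditd recreate the log
[yourCommand]                          # run the command that was failing
cat /var/log/audit/audit.log | audit2allow -M [filename]   # build a policy module from the logged denials
semodule -i [filename].pp              # install the generated module
setenforce Enforcing                   # turn enforcement back on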
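
The same steps can be wrapped in a small function so they are harder to get out of order. This is a sketch under assumptions, not a tested tool: build_selinux_module and the RUN variable are hypothetical names, and it uses audit2allow's -i option to read the log file directly instead of piping it through cat. Setting RUN=echo prints the commands instead of running them, which is a safe way to preview what would happen.

```shell
# RUN is a hypothetical dry-run hook: leave it empty to execute the
# commands (as root), or set RUN=echo to only print them.
RUN=${RUN:-}

# Usage: build_selinux_module <module-name> <command> [args...]
# Mirrors the manual steps: permissive mode, fresh audit log, run the
# command, turn its denials into a module, install it, enforce again.
build_selinux_module() {
    module=$1
    shift
    $RUN setenforce Permissive
    $RUN rm -f /var/log/audit/audit.log
    $RUN service auditd restart
    $RUN "$@"
    $RUN audit2allow -i /var/log/audit/audit.log -M "$module"
    $RUN semodule -i "$module.pp"
    # No "set -e" here, so enforcement is restored even if a step fails.
    $RUN setenforce Enforcing
}
```

For example, with RUN=echo set, build_selinux_module mymodule perl /etc/snmp/haproxy.pl lists the seven commands in order without touching the system. It is worth reading the generated .te file before installing the module, since it allows everything that was denied while the command ran.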

Enabling Secure Sockets Layer (SSL) on Team Foundation Server 2010

I recently attempted to enable HTTPS on my Team Foundation Server.

Easy, right? Just add an HTTPS binding to the TFS website.

Actually, it’s a bit trickier than that. TFS 2010 likes to use your local machine name for internal communications by default, so to get the web UI to play nicely you have two options:

  1. Add an HTTPS binding for the machine name to the TFS website too (e.g. https://tfsbox)
  2. Add the TFS binding to the web.config (e.g. https://tfs.mydomain.com)

After I got #1 working, I pursued the ideal solution, which was option #2.

To do this, follow these steps:

  1. Open “Application Tier/Web Access/Web/web.config” in the TFS directory
  2. Find the <tfServers> block
  3. Add in a <clear /> tag to prevent any incorrect or old bindings from being included
  4. Add in your server like so: <add name="https://tfs.mydomain.com/tfs" />

When you’re done, you should have something that looks like this:

<tfServers>
    <clear />
    <add name="https://tfs.mydomain.com/tfs" />
</tfServers>

Without this, you’ll keep on getting the error “TFS30063: You are not authorized to access ‘tfs.mydomain.com’” and similar errors everywhere from the Web UI, all the way through to the Change URL screen in the Team Foundation Server Administration Console.

IMPORTANT NOTE: You’ll notice that there was already a commented-out sample server in the web.config. It’s important that you either ignore this line or correct it, as it will be missing the trailing /tfs that is required to get everything working.
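
Since a missing /tfs is easy to overlook, a quick scan of the file can catch it. The snippet below is a rough sketch that assumes a POSIX shell is available (for example Git Bash on the server, or a copy of the file on a Linux box), that the block is named <tfServers> as in the example above, and that each <add> entry sits on its own line. find_bad_tfs_bindings is a made-up name.

```shell
# List <add name="..."> entries inside the <tfServers> block whose URL
# does not end in /tfs. Lines containing an XML comment marker are
# skipped, so the commented-out sample entry is not reported.
# $1 is the path to the web.config to check.
find_bad_tfs_bindings() {
    sed -n '/<tfServers>/,/<\/tfServers>/p' "$1" |
        grep '<add name="' |
        grep -v '<!--' |
        grep -v '/tfs" */>'
}
```

No output means every active entry ends in /tfs; any line printed is an entry worth a second look.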