Wednesday, September 14, 2016

OK, Now About UPS's and the Raspberry Pi

I have had a UPS for my NAS for a while now, but this season's storms caused me to seriously think about getting another one for my set of Raspberry Pi's. Every time the wind got over 20 mph, the power would go out, thunderstorms would kill the power, and the occasional dust devil in the right place would cause those short power failures that just mess things up. A couple of the power failures were those messy ones where the power dropped to almost nothing, then came back, went away again, and finally just died for a couple of hours.

I wasn't concerned about spikes; I have a bunch of protectors around for that, but it gets annoying fixing the various software problems power failures make on the Pi. So, I got a second UPS to install where I have my Pi collection. I bought another APC BE550G:

This is big enough to handle the short outages, and has enough guts to run the various things I may hook up. I chose this one because it's the same as the one on my NAS that runs it and several other little computers up in the attic. Why have two different devices?

Yes, it's an expensive solution, but by the time I bought a power supply board and the circuitry necessary for battery backup on the Pi, I would have spent just as much and still needed more for the other little computers. I already know how to make one of these, so it would be just grunt work and not much fun.

The NAS up in the attic has really cool power fail software, so I went looking for something similar to run on the Pi and found a tool called apcupsd. This is a daemon that runs all the time monitoring the USB port on the UPS. When it finds a problem, it can send email letting me know, as well as messages to anyone logged into the machine. It will also send a power-down command to the UPS, shutting off the power when a timer runs out.
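To check what the daemon sees from a script, the `apcaccess status` command that ships with apcupsd prints one `KEY : value` pair per line. Here is a minimal sketch for pulling a single field out of that output. The function name `ups_field` is made up for illustration, and it assumes the output keeps that `KEY : value` layout.

```sh
#!/bin/sh
# ups_field KEY: read `apcaccess status`-style text on stdin and print
# the value for KEY. (ups_field is a hypothetical helper name.)
ups_field() {
    awk -v key="$1" '
        {
            # Split on the first colon only; some values contain colons.
            i = index($0, ":")
            if (i == 0) next
            k = substr($0, 1, i - 1)
            v = substr($0, i + 1)
            # Trim surrounding whitespace from both key and value.
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}

# Example (needs a running apcupsd):
#   apcaccess status | ups_field STATUS
```

Splitting on the first colon only matters because fields like the date carry colons of their own.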

This power down is important. When you halt a Pi, it just halts; there's no power down, and that means you have to unplug it and plug it back in to get it to come back online. So, for unattended operation, you have to somehow cycle the power to get the Pi back to working. I have to halt it because, if I don't, it will screw up the file system, or even worse, corrupt the SD card so I have to restore it from backup. Been there, done that, too many times.

If I could get this all working correctly, it would go a long ways to solving some of my SD card problems over long periods of time. I dug in and installed, configured, and tested apcupsd. It worked really well on the first try after I configured it. There are literally thousands of sites out there that tell you how to install and configure this software, so I won't repeat it here, but there was one problem: it wouldn't turn the power off on the UPS.

Digging into the configuration did nothing for me; it looked like I got it right, so I started testing pieces of it. Apcupsd allows the use of scripts that I could modify a bit and test the power down functions. When I tried them, they worked and actually turned the power off. This left me confused; if they worked, why didn't the power turn off? I looked through the syslog and found nothing to help in troubleshooting this problem, so I dug into the shutdown procedure of the Pi running under Jessie. Man, what a complex mess that is.

See, systemctl sets run level zero, which causes the scripts in /etc/rc0.d to be executed in order, and the last one is 'halt'. This seems reasonable, but the shutoff code for the UPS is in the halt script. By then, everything else is already done ... including syslogd, which is the thing that logs all the data for the shutdown. No, I couldn't change it to write to a file since the file systems are all remounted read only by that time. There was no way to tell what the heck wasn't happening.

I never managed to solve that problem, so I decided to send the turn off signal to the UPS earlier in the halt steps since the UPS gives me a grace period of over a minute of power after the signal; plenty of time to finish the halt process. To this end, I created a little script to add to the shutdown process that would sense the power failure from apcupsd and send the signal using the scripts that already come with the software. I couldn't get the script to be run by systemctl.

Yes, I spent hours looking on the web for what was preventing it from being used, and found other folk that had the exact same problem. There were exactly zero actual solutions. Lots of talk about run levels, LSB's, running update-rc.d, keeping certain little things out of the script, but nothing definitive that would work. I kept reducing the stuff in the script until I had almost nothing in there and it still wouldn't get run during the shutdown and halt process. What I finally wound up with before I gave up on this idea was:

#! /bin/sh
# Provides:          davetesting
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop:      0
# Short-Description: Dave Testing
# Description:       Just testing this thing

echo "Dave Testing in aaa"

echo "Dave inside aaa before the file thing"
echo $1
echo "Dave inside aaa got a $1"

I put my name in there so I could find it in the syslog, but it didn't matter. The script simply wasn't getting run. What was most annoying is that all of the similar questions I saw on the web were simply unresolved. The other folk apparently just gave up and did something different. That's the same tactic I took.

I saw that the script to stop the apcupsd daemon was actually running, so I put a little test code in there to see what would happen. The test code was executed and did exactly what it was supposed to do. Here was my hook into getting it working since the 'right way' didn't work and my attempt at a separate script didn't either.

The way apcupsd works is to sense a power failure signal from the UPS, then create a flag file to be read later. After the daemon is stopped, it should be called again after the file systems are mounted read only, with a parameter telling it not to run as a daemon but to send the power down signal to the UPS instead. This little action is controlled by the presence of the flag file. I took that code from the init.d script for halt and put it in the script that starts and stops apcupsd. Of course, that means the file systems are not read only yet, which leaves the possibility of a problem, but since the UPS gives me over a minute of grace, there won't be any problem; the Pi will have long since halted.

Then when the power comes back up, the UPS waits a little bit, restores power to the Pi and then the normal boot process gets going. During the boot process the flag file is removed so everything is back to normal waiting for the next storm.
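The flag file logic is simple enough to show on its own. This is only a sketch of the idea, not the code apcupsd ships: the function names `kill_power_if_flagged` and `clear_flag` are made up for illustration, and the flag path and command are passed in so it can be tried against a scratch directory.

```sh
#!/bin/sh
# kill_power_if_flagged FLAG_FILE COMMAND [ARGS...]
# Runs COMMAND only when FLAG_FILE exists, i.e. when a power failure has
# been recorded. Returns COMMAND's exit status if it ran, or 1 when there
# is no flag file. (Hypothetical helper name.)
kill_power_if_flagged() {
    flag=$1
    shift
    if [ -f "$flag" ]; then
        "$@"
    else
        return 1
    fi
}

# clear_flag FLAG_FILE: remove the flag, as happens during boot once
# the power is back. (Hypothetical helper name.)
clear_flag() {
    rm -f "$1"
}

# Example, using the paths from this post:
#   kill_power_if_flagged /etc/apcupsd/powerfail /etc/init.d/ups-monitor poweroff
```

Because the command only runs when the flag exists, an ordinary reboot with no power failure never touches the UPS.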

Here's the init script that I came up with:


#!/bin/sh

### BEGIN INIT INFO
# Provides:             apcupsd
# Required-Start:       $remote_fs $syslog
# Required-Stop:        $remote_fs $syslog
# Should-Start:         $local_fs
# Should-Stop:          $local_fs
# Default-Start:        2 3 4 5
# Default-Stop:         0 1 6
# Short-Description:    Starts apcupsd daemon
# Description:          apcupsd provides UPS power management for APC products.
### END INIT INFO

NAME=apcupsd
DAEMON=/sbin/$NAME
DESC="UPS power management"
CONFIG=/etc/default/$NAME
APCACCESS=/sbin/apcaccess

test -x $DAEMON || exit 0
test -e $CONFIG || exit 0

set -e

. $CONFIG

# Debug output, tagged with my name so I can find it in syslog
echo "Dave is in ups code messing around"
echo "Dave found $1 "
if [ -f /etc/apcupsd/powerfail ]; then
        echo "Dave Found powerfail"
else
        echo "Dave No powerfail Found"
fi

if [ "x$ISCONFIGURED" != "xyes" ] ;
then
        echo "Please check your configuration ISCONFIGURED in /etc/default/apcupsd"
        exit 0
fi

case "$1" in
        start)
                echo -n "Starting $DESC: "

                # Clear the power failure flag on a normal boot
                rm -f /etc/apcupsd/powerfail

                if [ "`pidof apcupsd`" = "" ]
                then
                        start-stop-daemon --start --quiet --exec $DAEMON
                        echo "$NAME."
                else
                        echo ""
                        echo "A copy of the daemon is still running.  If you just stopped it,"
                        echo "please wait about 5 seconds for it to shut down."
                        exit 0
                fi
                ;;

        stop)
                echo -n "Stopping $DESC: "
                # pidfile name is cut off in this listing
                start-stop-daemon --stop --oknodo --pidfile /var/run/ || echo "Not Running."
                rm -f /var/run/
                echo "$NAME."
                # My change: if apcupsd flagged a power failure, tell the UPS
                # to cut power now instead of waiting for the halt script
                if [ -f /etc/apcupsd/powerfail ]; then
                        echo "Dave Doing  powerfail"
                        /etc/init.d/ups-monitor poweroff
                fi
                sync; sync;
                ;;

        restart|force-reload)
                $0 stop
                sleep 10
                $0 start
                ;;

        status)
                #/sbin/apcaccess status
                $APCACCESS status
                ;;

        *)
                N=/etc/init.d/$NAME
                echo "Usage: $N {start|stop|restart|force-reload}" >&2
                exit 1
                ;;
esac

exit 0

It's called apcupsd (duh) and, after running update-rc.d to create the init links, it shows up in rc0.d as K01apcupsd. You'll have to scroll down a bit to find what I changed. I put my name in there so I could easily find the debug output I added when prowling around in syslog. Needless to say, with booting it and killing it a hundred times, syslog got really large, so finding a line in there was a problem. The flag file is called 'powerfail' and if it's there, I cause the UPS to shut down.
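A quick way to confirm that update-rc.d really created a kill link is to look in the runlevel directory for a `K` entry with the script's name. This is a small sketch for that check. The function name `kill_links` is made up for illustration, and it assumes the usual `K` plus two digits naming.

```sh
#!/bin/sh
# kill_links RC_DIR NAME: print any K?? links for NAME found in RC_DIR.
# (kill_links is a hypothetical helper name.)
kill_links() {
    for link in "$1"/K[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so make sure the entry exists.
        if [ -e "$link" ] || [ -L "$link" ]; then
            basename "$link"
        fi
    done
}

# Example:
#   kill_links /etc/rc0.d apcupsd
```

If it prints nothing, there is no kill link for that name in the directory.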

No, it isn't the 'correct' way to do this, but since I just couldn't find enough information to do it 'correctly', ... I hacked it.

I'm going to install apcupsd on my other Pi's because they can run as slaves to the one I already did. Apcupsd has a super cool feature in that it can talk to other instances of itself on other machines and cause them to halt as well. Since my UPS up in the attic runs for 45 minutes after a power failure, I'll set this machine to run for 40 minutes, then tell the other ones to shut down. At 42 minutes, I kill the power to the UPS, and everything should be just fine.
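The timing above boils down to a little arithmetic. Here is a minimal sketch, assuming the 40 minute wait gets set through apcupsd's TIMEOUT directive, which takes seconds. I haven't settled on which directive to use, and the helper names are made up for illustration.

```sh
#!/bin/sh
# Helper names here are hypothetical, for illustration only.

# minutes_to_seconds MINUTES: convert minutes to seconds.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# timeout_line MINUTES: print a TIMEOUT line for apcupsd.conf
# (assumption: the wait is configured with the TIMEOUT directive).
timeout_line() {
    echo "TIMEOUT $(minutes_to_seconds "$1")"
}

# margin_seconds SHUTDOWN_MIN KILL_MIN: seconds between telling the
# machines to halt and cutting power at the UPS.
margin_seconds() {
    echo $(( ($2 - $1) * 60 ))
}
```

With the numbers above, `timeout_line 40` prints `TIMEOUT 2400` and `margin_seconds 40 42` prints `120`, so the other machines get two minutes to halt before the power goes away.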

I'm left with a nagging lack of understanding of why the scripts don't work as they come from Raspberry Pi and apcupsd, and why I couldn't get my own version of an init.d script to work, but my particular problem is solved. If I ever figure it out, or someone points it out, I'll update this to reflect what really 'should' be done.

Now to implement the networked shutdown for the other machines and see what doesn't make sense there.
