Search Results

Keyword: ‘slackware lilo’

Re-installing lilo from a Slackware boot CD

September 29th, 2009 6 comments

So you broke lilo. Well done.

Insert your Slackware install DVD or CD1 and boot with defaults.
Once booted:

mkdir /foo
mount /dev/sda1 /foo
mount --bind /proc /foo/proc
mount --bind /sys /foo/sys
mount --bind /dev /foo/dev
chroot /foo
vi /etc/lilo.conf
lilo
exit
reboot

where /dev/sda1 is your installed / partition. Adjust as necessary.
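
If your /boot lives on a separate partition (a hypothetical /dev/sda2 here), mount it inside the chroot as well before running lilo, otherwise lilo won't find the kernel images it references:

mount /dev/sda2 /foo/boot # only if /boot is a separate partition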

Categories: Linux, Slackware

Slackware 14.1 :: New initrd + LUKS Keyboard Issues

November 11th, 2013 1 comment

Hey hey. So Slackware{,64}-14.1 is out. Woo!

Ok, celebration over; time to work on the minor issues that arise post-release.

Issue #1: You upgrade your LUKS-encrypted system with a USB keyboard to 14.1 and *BOOM* goes your keyboard input at boot time. But that can’t be right: you were careful(!) You made sure to update your initrd and re-ran lilo. You even double-checked that you were using the right {e,x,u,o}hci-hcd controller module, but it still doesn’t work. You scream and pull your hair out because re-plugging the keyboard makes it show up in the log buffer, so it MUST work, but still nada.

Yep. Been there.

The problem is simply that there’s a new module to worry about: hid_generic. Yes, your keyboard’s USB driver loads properly, but the initrd no longer contains enough code to bind it as an input device. So just add hid_generic to your initrd, re-run lilo and relax:

This assumes you have your root and swap partitions as logical volumes inside a single volume group within a LUKS-encrypted partition.

#!/bin/bash

# Set parameters
KERNEL_VERSION=$(uname -r) # e.g. 3.10.17 for Slackware 14.1
ROOT_FS='ext4' # Your root filesystem
LVM_PART='/dev/sda2' # The partition containing your LUKS/LVM setup
VOLGROUP='myvg' # The LVM VG containing your encrypted / and swap partitions
KEYMAP='uk' # Your console keyboard map, e.g. uk for en_GB

# Make the initrd
mkinitrd -c -k "${KERNEL_VERSION}" -m "${ROOT_FS}":jbd2:mbcache:ehci-pci:ehci-hcd:xhci-hcd:uhci-hcd:ohci-hcd:usbhid:hid_generic:hid -f "${ROOT_FS}" -r /dev/"${VOLGROUP}"/root -C "${LVM_PART}" -L -l "${KEYMAP}" -h /dev/"${VOLGROUP}"/swap -o /boot/initrd.gz

# Re-run lilo
lilo -v
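
As a sanity check on that module list, Slackware also ships a helper script that prints a suggested mkinitrd command line for a given kernel; comparing its output with the command above is a quick way to spot anything missing. The kernel image path below is just an example:

/usr/share/mkinitrd/mkinitrd_command_generator.sh /boot/vmlinuz-generic-3.10.17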

Migrating Slackware to New Hardware

December 3rd, 2009 1 comment

Sometimes it is necessary to retain a Slackware installation, but change the hardware it runs on:

  • Migrating to or from virtual hardware (VirtualBox, VMware etc)
  • Duplicating an installation across multiple new servers
  • Using temporary hardware to set up a new OS for a server to minimise downtime
  • Plain old hardware failure
  • etc.

Under many other operating systems, especially Windows, this can be painful and perhaps not even worth doing. As usual, it is easy with Slackware.

In the simplest case (default install, default kernel, desktop environment) there’s literally nothing you need to do. Just put the disk or image in the new hardware and boot.

If you are using a custom kernel (as you should be) then you will need to create a new kernel and update lilo either before or after you do the migration. If you have any sense of self-preservation, your lilo config will include the default “huge” kernel, so the minimum you need to do is boot the huge kernel on the new hardware; you can go about making a new custom kernel later.

The bit that will catch you out:
There is one thing that might make you stumble: you moved the installation, booted up and all is well.. but the network isn’t working. You run `ifconfig -a` and find your two network interfaces are now called eth2 and eth3 and neither is configured.. “What’s going on?”, you ask. The answer is udev.

udev knows that your two new network interfaces are different to your old ones because they have different MAC addresses and decides you might not want to have them configured the same; perhaps you are going to swap in the old cards later and have four, so it reserves the previously used “ethX” labels in case it sees those cards again.

Since you are migrating to new hardware, you want it to forget about your previous network interfaces and re-use the labels with your new hardware. Head to /etc/udev/rules.d and find the file called 70-persistent-net.rules. Take a look at it.. it should look something like this:

# This file was automatically generated by the //lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:f3:7c:75:31", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:f3:7c:75:32", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:f3:7c:75:b7", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"

# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:f3:7c:75:b8", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"

See how it has an entry for both the old interfaces and the new ones? That’s what’s meant by persistent net rules. If you moved to a new machine again, it would add the two new interfaces as eth4 and eth5.

You have two choices for fixing the situation:

  1. Delete the file. It will be re-created on next boot and start again from eth0.
  2. Manually modify the file to reflect the configuration you want.

Note: There is also a 70-persistent-cd.rules file that it’s worth keeping an eye on during hardware migration, but usually it’s only the net rules that actually cause people problems.
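
A minimal sketch of option 1, assuming the old interfaces really are gone for good:

# Remove the persistent rules; udev regenerates the file on the next boot
# and numbers the interfaces it actually finds from eth0 upwards.
rm /etc/udev/rules.d/70-persistent-net.rules
reboot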

High-Availability Storage with Slackware, DRBD & Pacemaker

September 29th, 2009 13 comments

The Problem

The current storage system at work is unmanageably insane. Storage is split across a ton of different machines, mostly Solaris 7 & 9 with one Solaris 10 machine. There’s one hardware RAID5, two software RAID5s and several almost randomly thrown together stripes and mirrors, all spread very widely and accessed through samba & NFS, sometimes using autofs and NIS, and all through what is effectively a single top-level directory pointing at everything. It took a whole whiteboard just to describe the RAID5 layouts.. the others are still somewhat undocumented. Also, a major problem is that we are running out of space. When Sun quoted over £4,000 to expand our available space by 500GB, I knew it was time to do something serious.

The Solution

Why, migrate to Slackware of course!

The Details

Primary Goal: Low-Cost Redundancy

Previously all disaster-mitigation rested on expensive Sun hardware and software support contracts that didn’t even cover the network sufficiently. If something died, it would be down until Sun helped to fix it with engineer(s) and replacement parts. So we were spending insane amounts of money, getting very little in return, and still carrying a lot of downtime risk. What I aimed for in the replacement was to have redundancy built into everything. The idea being that if something failed, redundant equipment would take over and the failed part would be replaced under warranty for as long as the warranty lasted, or replaced at cost after warranty expiration. Because the parts would all be normal PC components rather than expensive proprietary Sun equipment, replacement costs are easy to accept.

Secondary Goal: Simplification

The old system is a mess. Even the longest-serving members of staff have trouble finding where things are stored, and duplication of data is a big problem. The new system has to be simple. In terms of storage, simple means a single logical storage structure: a single root directory under which is a well-thought-out hierarchy of directories, each with its own clear purpose. This would not only make it easier for the users, but for me as well. Administration of everything from authentication to backups would become a breeze.

The Disks

SCSI vs SATA

Most of the old equipment used SCSI disks, and it was the cost of Sun’s proprietary U320 SCSI disks that generated the £4,000 quote for 0.5TB of extra storage. With 1TB enterprise-class SATA disks now available for £100 each, it was a no-brainer. It doesn’t matter how many fail over the life of the system, it’s still not going to be worth running SCSI disks. For the same price Sun asked for 0.5TB I could buy 40TB of raw SATA storage. The only concern would be performance. With a little analysis, it is easy to see that the performance of a new SATA system would be more than sufficient. The old system had a single hardware-RAID enclosure of 0.5TB running U320 Sun SCSI disks that provided good I/O performance, but the rest of the system couldn’t even compare with it. The second best systems were running software RAID5 on SPARC machines with only a gig of RAM each and minimal processing speed; needless to say, the third and fourth best machines didn’t even approach that. Decision made. If the staff were used to the awful performance of the previous systems, there’s no way I could do worse with a well-designed SATA-based system.

UPDATE 20100427:
bonnie++ proved me right. The new system is between 3 and 10 times faster than the old hardware RAID5 and up to 20 times faster than the old software RAID5s, depending on the test.

RAID

Since this is a storage system, redundancy revolves around how the data is stored and protected. I did a huge amount of research into storage systems and redundancy, looking at real-world examples and case studies including everything from the expensive proprietary solutions to the cheapest open solutions. RAID is central to just about every single one of them (our old system included). If the data matters, you absolutely have to have some kind of on-line redundancy otherwise a single disk failing could put you back to your tape backups which may or may not have succeeded. The question is.. what RAID level?

RAID0 was never a consideration. There’s no redundancy.

RAID1 is too primitive for our needs. The use of space is inefficient and you need to put logical volume management on top if you want volumes any larger than your physical block devices.

RAID5 is what has been in place here for some time, and for small data sets it is a reasonable option. But for the amount of data we are now talking about handling, I consider RAID5 to be tantamount to professional negligence. If one disk dies, the time spent rebuilding the array is long enough that a second disk is too likely to die, irrecoverably trashing the whole array. This actually happened a few years ago and, due to an extreme clusterf*ck by all involved, left the company without one of its critical storage servers for 5 whole months.

With these more primitive RAID levels discounted I started to look at the serious contenders: RAID6 and RAID10.

RAID6 is a good solution to a simple problem. RAID5 fails if you lose two disks in rapid succession which is a likely occurrence. RAID6 adds a second parity disk so that you may lose two and still be okay. Unfortunately this second parity calculation means it’s worse on performance than RAID5. NetApp have a good solution to the performance problem with their proprietary RAID6 implementation which they call RAID-DP (double parity). But have you seen their price list?! It’s horrifying.

RAID10 is pretty good for redundancy but just doesn’t sit right with me. I could understand mirroring a stripe, but standard RAID10 is striping mirrors; it just feels wrong. There are too many cases where losing the wrong disks together will cause data loss. Where RAID5 and RAID6 don’t really care which disk is which, RAID10 does.

So, RAID levels 6 and 10 are close to what I want, but neither ticks all the boxes. More than that, neither will handle the loss of three disks. It may be unlikely, but paranoia is never a bad thing when you’re dealing with critical data and you have to at least be able to answer the question if only for your disaster recovery plan: what do I do if I lose three disks? If the answer is panic, then your design is flawed. In the above cases the answer is just revert to backups. Well, call me paranoid, but I don’t trust tape backups. They are a last resort as far as I am concerned; an afterthought if you will. They should not form part of your primary plan, they are there for when absolutely everything else fails.

So, if neither 6 or 10 will do.. what do you do? It took me a while to come up with the answer, but I did come up with it: RAID61.

RAID61 (or RAID 6+1) is what I am calling my solution. As the name suggests, it is a combination of RAID levels 6 and 1. Instead of doing it the RAID10 way and mirroring each member disk of a RAID6 array (which would be difficult to implement and risky, depending on exactly which disks die), this is the other way around: two complete RAID6 arrays, each a mirror of the other. This allows for the loss of any 5 disks at once. And if they are the right disks (RAID10 thinking), you can lose every disk in one RAID6 array and still two more from the other with no data loss and no service interruption. In terms of performance, you can expect the same performance as RAID6. It’s not quite as good as RAID10 performance for a small number of disks, but it’s definitely acceptable and could possibly beat RAID10 as the number of disks increases.

You may have noticed that the space efficiency isn’t brilliant: (N+2)*2, where N is the number of data disks needed to provide the accessible storage space. Some people may not consider that viable for their environment, but when you really look at the details, you are looking at almost the same space-efficiency as RAID10, yet with an order of magnitude more redundancy. It’s a judgement call which you go with, but there’s no question which one I prefer given the arguments above; especially since I’m talking about a SATA environment where raw disk cost is so low anyway.
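
Purely as an illustration of the layout (the production build described below mirrors two hardware RAID6 arrays across servers with DRBD rather than using mdadm), a single-box software sketch of RAID6+1 might look like this; the device names are hypothetical:

# Two sets of six hypothetical disks: sdb-sdg and sdh-sdm
mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[h-m]
# Mirror one RAID6 array over the other -- the "+1"
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1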

The Complete Hardware Setup

You might be wondering how I intend to implement all this. Well, this is where the design really shines.

  • Two completely independent SuperMicro servers with Xeon processors and an empty CPU slot for later instant-upgradability.
  • Slackware64-13.0 Operating System
  • 3ware 9650SE-12ML hardware RAID
    • OS installed on hardware RAID1 using Western Digital Velociraptor hard disks
    • Storage on RAID6 (256K stripes) using 1TB disks from Western Digital and Seagate, all sourced from different vendors to ensure different production batches
  • RAID6 synchronised over a dedicated gigabit network connection using DRBD
  • Pacemaker handling fail-over management.
  • Triple-redundant PSUs on the SuperMicro chassis
  • PSUs cross-plugged into two APC 2200VA UPSes so both UPSes support both machines.

The Resulting Redundancy

Let me just walk you through the redundancy this RAID61 set-up provides:

X Dies: Result (Data Loss, Performance Loss, Service Interruption)

  • 1 Disk: Best server primary during rebuild (None, None, None)
  • 2 Disks (same array): Best server primary during rebuild (None, None, None)
  • 2 Disks (diff arrays): Best server primary during rebuild (None, Slight, None)
  • 3+ Disks (same array): Best server primary, rebuild/resync failed (None, None, None)
  • 3 Disks (diff arrays): Both rebuild, best array primary (None, Slight, None)
  • 4 Disks (1+x): Best server primary & rebuild, rebuild/resync failed (None, Slight, None)
  • 4 Disks (2+2): Both rebuild, best is primary (None, Worse, None)
  • 5+ Disks (2+x): Best server primary & rebuild, rebuild/resync failed (None, Worse, None)
  • 6+ Disks (3+x): Data loss. Manually reconstruct & revert to backup (None, Complete, Yes)
  • 1 RAID controller: Best server primary. Manual intervention (None, None, None)
  • 1 Motherboard: Best server primary. Manual intervention (None, None, None)
  • <5 PSU modules: Best server primary. Manual intervention (None, None, None)
  • Other core hardware: Best server primary. Manual intervention (None, None, None)
  • 1 UPS: Replace UPS. Neither server affected (None, None, None)
  • Power outage (brown or black): Both servers supported by 2xUPS. Auto-shutdown at critical battery (None, None, Only during extended outage)
  • Armageddon / The Rapture / Alien Invasion: Pray (Yes, Complete, Yes)

The Implementation

This is the difficult part: actually doing it.

Hardware

Thankfully I have a friendly server vendor that’s good at getting whatever you ask for however you ask for it for a very respectable price. So sourcing the hardware was not too difficult. It cost a bit extra to get the disks from lots of different sources because of shipping prices, but that was expected and done successfully. Absolutely everything, UPSes included, for a shade over £7,000. Absolutely awesome. Especially considering that the same solution from NetApp would cost in excess of £25,000 (approximately, based on real quotes) and would probably require a Windows server in the middle too.

Worth noting that the 9650SE-12ML RAID cards were on the firmware from 9.5.1.1 and so couldn’t do 256K striping as it’s a relatively new feature, but they provide a downloadable ISO boot disk that lets you update the firmware quickly and easily.

Once booted I spent a little time playing with the BIOS and setting up the RAID configuration identically on both machines which was such a beautifully simple experience in comparison with other RAID BIOSes I have dealt with in the past. I really like 3ware and am very sad they’ve been bought out by LSI (whose equipment I have sadly had to deal with before).

Because the storage arrays are 3TB in size (5x1TB RAID6 with 1 Hot Spare), a standard MBR wouldn’t do the job, so I had to discover the wonder of GUID Partition Tables (GPT) and the fact that none of the software I use supports it. cfdisk, fdisk and sfdisk all fall over and die at the prospect of a GPT. For a while I found myself stuck with GNU Parted which I really hate, but did manage to find an fdisk clone for GPT.

Software

I used PXE with NFS to boot and install Slackware64-13.0 onto one of the servers. Immediately built a custom kernel exactly as I would on any other machine. Then sbopkg added the extra few bits like htop, nload, lshw, nagios-nrpe (my own slackbuild submitted to SBo), nagios-plugins (mine too) and hddtemp (yep.. that too). A few preference modifications to the setup later and it was ready for the software specific to this setup: APC PowerChute, 3ware 3DM2, DRBD & Pacemaker.

APC PowerChute

As an application I hate PowerChute. It’s written in Java (ugh) and gives you no flexibility whatsoever, but they provide an installer and an init script and it seems to work reasonably well so it is at least acceptable. The default install location is in /opt and moving the provided init script to /etc/rc.d/rc.PBEAgent and calling it from rc.local finished the job.

3ware 3DM2

3DM2 is very similar to PowerChute in that you have to use the provided Java installer, which is hateful but works in most cases. It didn’t in mine, of course. For some reason, even using the CLI mode, the installer would take literally 20 or 30 minutes to jump from one screen to the next. I have no idea why, and I couldn’t replicate the behaviour on my 32-bit desktop machine. After hours of looking into the problem I simply gave up and waited for the installer to complete, which at least it did. It took all afternoon, but it worked. And just for good measure I have taken a copy of the installed files so I can re-install in seconds at a later date if necessary. I had considered doing this from a 32-bit Slackware machine, but I think the installer may do some environment-specific configuration so I didn’t tempt fate.

With the provided init script moved to /etc/rc.d/rc.3dm2 and called from rc.local that was another one dealt with.

DRBD

I hadn’t looked at the DRBD installation process prior to doing it and discovered the choice of building a kernel module or patching the kernel source. It doesn’t seem there’s all that much difference between the two options, but since I like hacking the kernel and already had a custom kernel that included the 3ware-9xxx driver it seemed logical to patch the kernel and recompile; especially since kernel compilation only takes 7 minutes on this machine (-j13).

Update: Even though they still provide the capability to create a kernel patch, the kernel patching instructions were removed from the DRBD site less than two weeks ago. All the evidence I can find points to the fact that they do not want you building it as anything but a module, but I can’t for the life of me work out why. I believe that DRBD is about to officially enter the Linux kernel, where you will have the choice of building it as a module or building it into the kernel, so why they have some requirement to build it as a module I don’t know. Even most of the tools you can run to monitor DRBD will complain that the kernel module is not loaded if it is compiled-in.

To compile-in (v8.3.4):

cd /usr/src
tar -xvf ~/drbd-8.3.4.tar.gz
cd drbd-8.3.4
make clean
make KDIR=/usr/src/linux kernel-patch
cd /usr/src/linux
patch -p1 < /usr/src/drbd-8.3.4/patch-linux-drbd-8.3.4

Then either manually modify your .config file to add:

CONFIG_BLK_DEV_DRBD=y

or just `make menuconfig` and go to Device Drivers -> Block Devices, highlight “DRBD Distributed Replicated Block Device support” and press “y”. DO NOT enable DRBD tracing. You don’t need it and, last time I checked, it could cause major instability.

Then compile your kernel, install your modules, update /boot, run lilo and reboot.

You will also need to install the userland tools for DRBD:

cd /usr/src/drbd-8.3.4
make install-tools

At this point I set about starting and testing DRBD. I set up an XFS filesystem on the 3TB storage array, leaving space at the end for safety and for the DRBD internal metadata. Exactly how you do this is up to you, as your system will be different to mine and it’s very important that you understand what you’re doing before you even start. I was a little worried to start with because the initial synchronisation was estimating completion in approximately 3 months’ time(!!). It didn’t take long to discover I hadn’t set a sync rate and so it was being limited by the default. I changed it to 110M and 20 hours later the initial sync was complete and the mirror fully functioning.
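
For reference, a minimal sketch of the kind of resource definition involved, in DRBD 8.3 syntax. The resource name, disk paths and addresses are hypothetical; the syncer rate is the 110M mentioned above:

resource r0 {
  protocol C;
  syncer {
    rate 110M; # without this the default rate cripples the initial sync
  }
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda3; # partition on the hardware RAID6 array
    address   10.0.0.1:7788; # dedicated replication link
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}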

UPDATE 20091109:
I have decided to give up, bend to the will of Linbit and do a module installation. Primarily because, since they’ve made some modifications to the source for v8.3.6, it’s become very much easier to make a SlackBuild out of it.

You can find my SlackBuild for v8.3.6 on SlackBuilds.Org and also here.

UPDATE 20100308:
DRBD v8.3.7 is out and has also entered the 2.6.33 kernel which is now in Slackware{,64}-current. The SlackBuild has been split into two: drbd-kernel and drbd-tools. If you are using a 2.6.33 or later kernel you only need the drbd-tools package. With an earlier kernel you need both.

Details and downloads: http://blog.tpa.me.uk/2010/03/04/drbd-8-3-7-slackbuilds/

Also submitted to SlackBuilds.Org where they should be available soon.

Pacemaker

Oh My God!

Setting up the Pacemaker stack is the hardest thing I have yet had to do in my professional career. It’s insanely complicated. It used to be reasonably simple: you’d set up Heartbeat and that was it. Now, you have to install a minimum of four different components, each one completely unstable and barely documented. Your only other choice is to use the older Heartbeat stuff, which is already way past its sell-by date. What makes it even harder to understand initially is that the homepage you need for all of this is the Cluster Labs site, which concentrates on Pacemaker (whether it’s on Heartbeat or OpenAIS/Corosync), not the Linux-HA site, which concentrates on Heartbeat. Most people who are vaguely aware of previous Linux high-availability set-ups know the system as “Heartbeat”, Heartbeat being the communication core of the system which would then have Pacemaker on top for resource management. The new implementation is known as “Pacemaker”, but with the Heartbeat components replaced by OpenAIS. However, OpenAIS has been split into two projects: OpenAIS and Corosync, Corosync basically being the guts of the round-robin communication protocol and OpenAIS being some extra gubbins on top. It’s a ridiculous and insane mess and I’m confusing myself just trying to describe it.

Soo.. yeah it’s a mess.

Having said that, I shall continue to describe what I’ve done to get to a working setup. Bear in mind that each of the steps I’ve taken represents days or even weeks of playing around: clawing at brick walls trying to get information, compiling, recompiling, re-recompiling, upgrading, restarting, finding bugs, reporting bugs, upgrading around bugs, starting from scratch, modifying code to fit Slackware and rewriting configuration files so many times I can’t even remember where I started.

Note: It would appear that the intention of the developers is to only ever distribute the software via distribution specific packages, which is basically just RHEL, SUSE & Debian (the primary maintainers are all SUSE staff) and because they work exceedingly closely with the distributions and the package releases for them, they couldn’t give a toss that it’s insane when approached from a source-installation point of view.

UPDATE 05/11/2009
As of right now, the latest Pacemaker tip (and therefore the 1.0.6 “stable” release) will not compile on Slackware64 because of a hard-coded reference to /usr/lib in configure.ac. This has been reported, but not fixed in mercurial yet. If you need this patch, drop a comment on this page. I’m not actually expecting anyone to need the patch before the tip gets fixed.

Installation Process:

  1. Install Cluster Glue
  2. Install Cluster Resource Agents
  3. Install Corosync
  4. Install OpenAIS
  5. Install Pacemaker

In all of that, only OpenAIS and Corosync have normal release versions as you would expect, with minor revisions for features and bugfixes. Cluster Glue, Cluster Resource Agents and Pacemaker all live in a mercurial repository with absolutely no meaningful release tags. The latest version of Pacemaker, for example, is 1.0.5; however, that means nothing, as the officially tagged 1.0.5 is over two months old and very, very broken; not even suitable for a test environment. The only way to proceed is to use the mercurial “tip” (HEAD to you and me), which is tagged simply with a hex string. The same is true for the base Cluster packages. It’s hit or miss whether you’re going to get something that works, but it’s your only choice.

In order to ease my pain, I have put significant effort into making SlackBuilds for all of these components which are based on mercurial hex versions. The versions used in the SlackBuilds are what I’m currently running in production, but I’d recommend updating the versions to the latest tip before running them.

Note: These packages are tagged as if they have come from SlackBuilds.Org (SBo) as I hope to submit them there when I’ve set the files up exactly as they want them and I can’t be bothered to re-tag them just to site them here. Also, they are currently set up for x86_64 and a default of 13 jobs during make. Edit the scripts and adjust to your needs.

  1. Cluster Glue
  2. Cluster Resource Agents
  3. Corosync
  4. OpenAIS
  5. Pacemaker

Install in that order as each one depends on the previous one.

UPDATE 20100422:
The Pacemaker stack has finally been approved by SlackBuilds.Org so you may find the latest versions of these SlackBuilds there. I recommend sbopkg to get it all installed nice and simply.

That will get you an installed Linux-HA software stack, but you haven’t even started yet. You have to configure everything. There’s so much to learn and so much to configure I’m not going to go through it all, but I am going to give you some notes on the most important bits:

System Boot
rc.local

if [ -x /etc/rc.d/rc.logd ]; then
    /etc/rc.d/rc.logd start
fi

if [ -x /etc/rc.d/rc.corosync ]; then
    /etc/rc.d/rc.corosync start
fi

rc.local_shutdown

if [ -x /etc/rc.d/rc.corosync ]; then
    /etc/rc.d/rc.corosync stop
fi

if [ -x /etc/rc.d/rc.logd ]; then
    /etc/rc.d/rc.logd stop
fi

Enable logd & corosync startup

chmod a+x /etc/rc.d/rc.{logd,corosync}

  • Ignore rc.ldirectord and rc.openais
  • OpenAIS might have its own startup script and its own call to “aisexec”, but all it really does is start corosync with some flowers around it
  • ldirectord is one of those things you don’t need unless you know you need it
  • Just ignore them both unless you know you want/need them
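
For completeness, a minimal sketch of the corosync.conf that drives all of this, as it looked in the corosync 1.x era with Pacemaker loaded as a plugin. The bind address and multicast settings are hypothetical and must match your dedicated cluster link:

# /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0 # network of the dedicated cluster link
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

logging {
    to_syslog: yes
}

service {
    # Load Pacemaker as a Corosync plugin (the version 0 plugin API)
    name: pacemaker
    ver: 0
}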

NFS Resources
Normally, any service you want to manage with Pacemaker should not be started by your normal system init scripts and should be left to the Resource Agent. However, the Heartbeat OCF Resource Agent for NFS (ocf:heartbeat:nfsserver) actually delegates the grunt work to your own init script. In Slackware, if rc.nfsd is executable (so that the OCF RA can run it), it would normally be called at system start-up too. If you want to manage NFS via Pacemaker, you need to edit rc.M, rc.K and rc.6 in /etc/rc.d to comment out the calls to /etc/rc.d/rc.nfsd and then make it executable (chmod a+x /etc/rc.d/rc.nfsd).

Unfortunately it’s not quite that simple either. OCF Resource Agents are required to return very precise exit codes and they expect precise exit codes from init scripts. They also require a monitor() or status() option in order to function correctly. To that end I’ve had to do something quite horrible to the rc.nfsd script to make it work. Effectively I’ve added a status() routine, but in order to make sure it’s LSB compliant for the OCF RA, I’ve stolen a pre-compiled checkproc binary from a 64-bit SUSE machine as it produces the exact return codes the RA expects. Here is my modified script: rc.nfsd

Yeah, I don’t like using SUSE code any more than you do, but SUSE wrote most of the HA code and they wrote checkproc and they’re designed to both conform to the exact same standards, so live with it.
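
If borrowing a binary feels too dirty, the gist of the required status() semantics can be approximated with pgrep; this is only a sketch of the idea (not the author’s actual rc.nfsd modification), and the LSB exit codes are the important part:

# Hypothetical status() routine for rc.nfsd: 0 = running, 3 = not running
nfsd_status() {
  if pgrep -x nfsd >/dev/null 2>&1 && pgrep -x rpc.mountd >/dev/null 2>&1; then
    echo "NFS server daemons are running."
    return 0
  else
    echo "NFS server daemons are not running."
    return 3
  fi
}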

It’s not over. Oh no.

The nfsserver RA also has calls to mktemp in order to do its stuff. But whoever wrote it hardcoded the mktemp calls as /sbin/mktemp. This is not where mktemp is in Slackware; it’s in /usr/sbin/mktemp. Here’s a copy of the RA with the modifications made: /usr/lib/ocf/resource.d/zordrak/nfsserver

Notice it’s from /usr/lib/ocf/resource.d/zordrak, not /usr/lib/ocf/resource.d/heartbeat. The best way to handle local modifications to OCF RAs is to make your own directory for the modified copies, then call them as such from the Pacemaker configuration. For example, this OCF is called from my Pacemaker config as ocf:zordrak:nfsserver instead of ocf:heartbeat:nfsserver. If I upgrade in the future, I don’t have to worry about my copy getting overwritten during the upgrade.

Samba Resources
Samba isn’t as hard. You just need to `chmod a-x /etc/rc.d/rc.samba` to stop it from being started and stopped by the master init scripts, and then use this OCF RA (which is not provided with the code): /usr/lib/ocf/resource.d/zordrak/samba

Pacemaker Configuration
Be VERY careful with quotes when configuring Pacemaker. If you are editing the XML directly (whether using `crm configure edit`, or using cibadmin to export/import) then all parameter values should be enclosed in quotes. If, however, you are modifying parameters using the crm configure (live) command line or similar, then don’t use quotes. I am assured this very bad quote-handling will get cleaned up in coming commits, but I can’t be sure if or when. If you have a problem, check whether quoting has caused it. It is for this reason I didn’t use a symbol in a STONITH IPMI reset option: depending on how you modify the config, it might not even be possible to pass the symbol, because you can’t quote or escape it, but you can’t get it past the shell either. Obviously there are ways and means of achieving whatever you want to do, but this is just a warning to be very careful with configuration quotes.

My CIB
Here I give you a sanitised version of my CIB to give you an idea of the configuration I have set up and how it looks.
There is a Master/Slave resource called ms-store_drbd which handles the master-slave configuration of the DRBD resource (store_drbd). There is then a group called store_serv which is dependent upon the DRBD resource and can only run on the same machine as DRBD, and only once a DRBD node has become primary. The store_serv group consists of the filesystem on the DRBD device, a bind mount for sharing NFS state data, an IP address, NFS services and Samba services:

node node1
node node2
primitive nfs_fs ocf:zordrak:Filesystem \
        params device="/mnt/store/nfs" directory="/var/lib/nfs" options="bind" fstype="none" \
        meta target-role="Started"
primitive nfsd ocf:zordrak:nfsserver \
        params nfs_init_script="/etc/rc.d/rc.nfsd" nfs_notify_cmd="/usr/sbin/sm-notify" nfs_shared_infodir="/var/lib/nfs" nfs_ip="1.2.3.4" \
        meta target-role="Started"
primitive samba ocf:zordrak:samba \
        params smbd_enabled="1" nmbd_enabled="1" winbindd_enabled="0" smbd_bin="/usr/sbin/smbd" nmbd_bin="/usr/sbin/nmbd" smbd_pidfile="/var/run/smbd.pid" nmbd_pidfile="/var/run/nmbd.pid" testparm_bin="/usr/bin/testparm" samba_config="/etc/samba/smb.conf" \
        meta target-role="Started"
primitive st-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="1.2.3.9" userid="admin" passwd="password" interface="lanplus" \
        meta target-role="Started"
primitive st-node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="1.2.3.10" userid="admin" passwd="password" interface="lanplus" \
        meta target-role="Started"
primitive store_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s" \
        meta target-role="Started"
primitive store_fs ocf:zordrak:Filesystem \
        params device="/dev/drbd0" directory="/mnt/store" fstype="xfs" \
        meta target-role="Started"
primitive store_ip ocf:zordrak:IPaddr2 \
        params ip="1.2.3.4" nic="eth0" cidr_netmask="24" \
        meta target-role="Started"
group store_serv store_fs nfs_fs store_ip nfsd samba \
        meta target-role="Started"
ms ms-store_drbd store_drbd \
        meta clone-max="2" notify="true" globally-unique="false" target-role="Started" master-max="1" master-node-max="1" clone-node-max="1"
location cli-prefer-store_serv store_serv \
        rule $id="cli-prefer-rule-store_serv" inf: #uname eq node1
location l-st-node1 st-node1 -inf: node1
location l-st-node2 st-node2 -inf: node2
colocation store_serv-on-store_drbd inf: store_serv ms-store_drbd:Master
order store_serv-after-store_drbd inf: ms-store_drbd:promote store_serv:start
property $id="cib-bootstrap-options" \
        expected-quorum-votes="2" \
        stonith-action="poweroff" \
        no-quorum-policy="ignore" \
        dc-version="1.0.5-b3bf89d0a6f62160dc7456b2afa7ed523d1b49e8" \
        cluster-infrastructure="openais"

Compiling your own kernel in Slackware Linux in 10 easy steps…

August 31st, 2009 33 comments
  • This guide assumes you are logged in as root.
  • This guide is for re-compiling a Slack kernel rather than upgrading it.
  • If compiling the kernel as root worries you, compile as a non-root user.
  • DO NOT compile a pre-release kernel as root.
  • Slackware ships with stable, tested kernels so don’t worry.
  • I personally follow this process as root.
  • I make no guarantees of any kind.

1. Clean your kernel source

cd /usr/src/linux
make mrproper

2. Use the provided generic kernel as a template

cp /boot/config-generic-smp-$(uname -r) /usr/src/linux/.config

3. Modify the config

This is the hardest part. But it’s hard like a three month old banana is hard.
You can choose one of three tools for the job: config, menuconfig or xconfig. If you choose anything other than menuconfig you are insane and will be shot when the new world order comes to fruition.

cd /usr/src/linux # in case you left for some daft reason
make menuconfig

You will now be presented with the kernel configuration menu.

a. Change your LocalVersion

You could overwrite the old generic kernel if you wanted to. Don’t. Re-label your kernel so that it’s called something else.

  • Select “General Setup”
  • Press Enter
  • Press the down cursor key once
  • Press Enter
  • Type the “-” key and then a word that pleases you. I usually use the hostname of the machine.
  • Press Enter
  • Press the right cursor key once
  • Press Enter

Voilà, you’re back at the first screen. From this point on I am going to assume that that was sufficient practice for you to handle modifying a kernel option on your own, and will become less patronising.

b. Select your processor family

This isn’t technically required, but to build your own kernel and not have it optimised for your processor is insane. So do it.

  • Select “Processor type and features”
  • Select “Processor family”
  • Select the right processor family for the processor in the machine on which the kernel will be running

If you don’t know which processor family is right for your processor, follow this link.

c. While you’re there, how much RAM have you got? (32-bit Slackware only)

If you have 3.5GB RAM or less, skip to the next step. If you have more than 3.5GB RAM, while in the “Processor type and features” menu, go down to “High Memory Support” and change it to 64GB. Don’t argue, just do it.

d. Compile-in your root file system

I’m going to assume you know what file-system you have installed on your root (/) partition. If you don’t know then it’s ext3 if you’re on Slackware 12, or ext4 on Slackware 13. Go back to the main menu and head down into the “File systems” submenu. Find your file system in the list and once it’s highlighted, press “y” to ensure that it’s compiled-in [*] rather than compiled as a module [m] or not compiled at all [ ]. There are more options for your file system, but the defaults will suit you just fine.
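
If you’re genuinely unsure what your root filesystem is, a quick one-liner (just a convenience, not part of the original guide) will tell you:

awk '$2 == "/" { print $3 }' /proc/mounts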

e. Compile-in the storage controller for your main hard disk

This is probably the hardest step of the entire process. It’s not month old banana, more like week old pear. The reason it can be difficult is you need to find out what driver is required for your disk controller. Some are easy, some aren’t. For example, an Asus motherboard with an nForce chipset has six or so primary SATA ports. If your disk is plugged into one of these, the option you want is:

  • “Device Drivers” -> “Serial ATA (prod) and Parallel ATA (experimental) drivers” -> “NVIDIA SATA Support”

If you have an Intel motherboard then almost certainly all you’ll need is:

  • “Device Drivers” -> “Serial ATA (prod) and Parallel ATA (experimental) drivers” -> “Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support”

In any case, it is up to you to be sure what you need. If you get it wrong, it’s not the end of the world, you just reboot into the working kernel and try again, but if you want to save yourself hassle, get it right first time. Once you know which one you need.. make sure it is selected for compile-in [*]. If you’re feeling experimental, you can then change any other controller that is currently compiled-in to compile as a module [m] by highlighting it and pressing “m”. This will make your kernel image smaller and your system leaner and faster as you wont have unnecessary code clogging up your kernel, but it won’t make your system any less functional as you can always load the code as a module if you need it.
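
One way to take the guesswork out of it, assuming your pciutils is new enough to support the -k flag: list each PCI device alongside the kernel driver currently bound to it, and look at the entry for your SATA/IDE controller.

lspci -k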

That’s it.. it’s all downhill from here.

4. Compiling the kernel

Exit the config utility and save the kernel config when prompted.

Copy the config right now. Since you’ll copy it later anyway, do it now so you have a backup in case you do anything stupid like a `make mrproper` in the kernel source directory.

cp .config /boot/config-generic-smp-$(uname -r)-<LocalVersion>

where <LocalVersion> is the string you added in the kernel config LocalVersion option.

Time to make your kernel:

make -j7

This will compile your kernel and, on a reasonable system, take 20-ish minutes. The -j7 tells make how many jobs to run in parallel. On a single-core, single-processor system, reduce this to -j3. On a reasonably average system use -j5 or -j7. On a Xeon E5520 single-processor system (8 cores) use -j11. On a twin-processor, quad-core Xeon system use as many as you want.
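
If you would rather not guess, you can derive the job count from the CPU count; getconf is part of glibc so this works even where nproc isn’t available. The “+1” is just a common rule of thumb, not something from the original guide:

make -j$(( $(getconf _NPROCESSORS_ONLN) + 1 ))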

Go make a cup of coffee while it compiles..

Go on.. I’ll wait…

Ok, so that’s it, you’ve compiled your kernel.

5. Install the kernel modules

This bit’s really hard:

make modules_install

Told you.

6. Install the kernel, config & System.map

For 32-bit Slackware (note that on kernels from 2.6.24 onwards, after the i386/x86_64 merge, the 32-bit image also lives under arch/x86/boot/bzImage):

cp arch/i386/boot/bzImage /boot/vmlinuz-generic-smp-$(uname -r)-<LocalVersion>

or if you’re running Slackware64:

cp arch/x86/boot/bzImage /boot/vmlinuz-generic-smp-$(uname -r)-<LocalVersion>

System.map:

cp System.map /boot/System.map-generic-smp-$(uname -r)-<LocalVersion>

config (I know you already did it, but just in case):

cp .config /boot/config-generic-smp-$(uname -r)-<LocalVersion>

7. Update your symlinks

cd /boot
rm System.map vmlinuz config
ln -s System.map-generic-smp-$(uname -r)-<LocalVersion> System.map
ln -s config-generic-smp-$(uname -r)-<LocalVersion> config
ln -s vmlinuz-generic-smp-$(uname -r)-<LocalVersion> vmlinuz

8. Update lilo

vi /etc/lilo.conf

You will already have a section like this to boot your new kernel:

image = /boot/vmlinuz
root = /dev/sda1
label = Linux
read-only

although you may want to change the label to reflect that it’s your new custom kernel.

You also need to add a section for the huge kernel as a just-in-case. This will allow you to boot straight back into your system if your generic kernel is borked. You need to check your /boot directory for the correct filename; it will look much like the example below, only with the specific kernel version spelled out in the name (lilo.conf will not expand $(uname -r) for you).

image = /boot/vmlinuz-huge-smp-$(uname -r)
root = /dev/sda1
label = Slack-HugeSMP
read-only

9. Run lilo

lilo

10. Reboot

Cross your fingers, because if all is well, you will now reboot into your own custom kernel. No initrd, no bloat, just Pat’s generic kernel with a performance tweak and the drivers you need to boot with.

Tell me that wasn’t simple(!)

This guide is purely for the idiot-proof 10-step simple-rebuild process. If you want to upgrade to a new version, or start with a clean config, or do more aggressive kernel tweaking, or play with on-boot modules, or make an initrd (shudder) then you should head over to alienBOB’s guide on the SlackBook Wiki: http://alien.slackbook.org/dokuwiki/doku.php?id=linux:kernelbuilding


The Evil of InitRD

August 30th, 2009 No comments

I understand that there are rare situations in which an initrd can be useful. For example, when hardware is constantly being swapped out, or when you absolutely must have an identical kernel image, or if you are using LUKS for your / partition. However, in general an initrd is, in my opinion, a completely pointless level of complexity that you are better off without.

I’m currently managing in excess of 15 completely different Slackware installs across lots of different hardware and not one of them is running a “huge” kernel or an initrd. If a module is absolutely essential to your system’s ability to boot, why in hell haven’t you compiled it into your kernel? It’s like having the starter motor in your car disconnected from the battery, but having a relay on its own mini-battery connect it up when you turn the ignition.. why would you do it? What’s the point? The car will not start without it, so why is it not hard-wired into the system? It’s the same for the kernel. If the system won’t start without it, compile it into the kernel. Don’t play about with an injection system you don’t even need.

Ok, so your root device is inside RAID & LVM on a GPT partition. So, compile in mdadm support for your RAID level, LVM support and GPT partition support; job done.
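
For reference, in .config terms the options in question look something like the fragment below; this is only a sketch, and the exact selection depends on your RAID level and layout:

CONFIG_BLK_DEV_MD=y         # in-kernel MD (software RAID) support
CONFIG_MD_RAID456=y         # or CONFIG_MD_RAID1=y, to match your RAID level
CONFIG_BLK_DEV_DM=y         # device-mapper, which LVM sits on
CONFIG_PARTITION_ADVANCED=y
CONFIG_EFI_PARTITION=y      # GPT partition table support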

For the less experienced, it’s also a good way to learn the basics of kernel compilation without needing to do much more than follow a standard process. All you need is the config from the generic kernel you were going to boot anyway, plus knowledge of which filesystem your root device uses and which storage controller your disk sits behind. Go into the kernel config, add them in [*], make, make modules_install, move and symlink the new kernel, update lilo, reboot. Once you’ve done it twice, it becomes so routine and easy you’ll wonder why you haven’t been doing it forever.

Then, once you’re more familiar with the process, you can start removing other parts of the kernel you don’t need, especially hardware controllers for hardware you don’t have, with each step making your kernel smaller and your system leaner and faster.

Have kernel, will compile.

Post followed-up by:
Compiling your own Slackware kernel

Categories: General, Kernel, Linux, Rant