VCS: LLT delayed or lost heartbeat errors
When using Veritas Cluster Server (VCS), the following error messages from your system logs can indicate a problem with the cluster heartbeat interconnects:
Dec 12 15:26:20 serverb llt: [ID 194859 kern.notice] LLT:10019: delayed hb 935350 ticks from 1 link 0 (qfe:0)
Dec 12 15:26:20 serverb llt: [ID 761530 kern.notice] LLT:10023: lost 18706 hb seq 3448194 from 1 link 0 (qfe:0)
Dec 12 15:26:20 serverb llt: [ID 194859 kern.notice] LLT:10019: delayed hb 935350 ticks from 1 link 1 (eri:0)
Dec 12 15:26:20 serverb llt: [ID 761530 kern.notice] LLT:10023: lost 18706 hb seq 3448194 from 1 link 1 (eri:0)
These messages can appear when you run two LLT links over the same physical network. That is poor design, since it introduces a single point of failure, but there are situations where you have two physical connections into your cluster servers that end up on the same VLAN. If you are sure your interconnects are working properly and the errors stem from this shared network, you should be able to solve the problem by changing the /etc/llttab file on all cluster members.
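Before changing anything, it is worth confirming how LLT itself sees the links; lltstat is a standard VCS utility for this (output format varies by version):
# lltstat -nvv | more
This lists each cluster node with the state of every link; if the interconnects themselves are healthy, both links should show UP for all peers.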
By default, on Solaris, your /etc/llttab file will look something like this:
set-node servera
set-cluster 1
link eri0 /dev/eri:0 - ether - -
link qfe0 /dev/qfe:0 - ether - -
link-lowpri ce0 /dev/ce:0 - ether - -
The second-to-last field for each link is the SAP, the Ethernet type used for the LLT link. When specified as "-" it defaults to 0xCAFE. Two LLT links for the same cluster cannot share a SAP ID on the same physical broadcast domain; if they do, you may see the error messages above. Assuming this is your problem (e.g., your eri0 and qfe0 links run over the same broadcast domain), you can work around it by changing your /etc/llttab file to the following:
set-node servera
set-cluster 1
link eri0 /dev/eri:0 - ether 0xCAFE -
link qfe0 /dev/qfe:0 - ether 0xCAFF -
link-lowpri ce0 /dev/ce:0 - ether - -
This tells LLT to use a different SAP for each of the two links. The change must be made on every cluster member, and each node must then be rebooted or have LLT restarted.
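As a rough sketch, cycling LLT on one node at a time might look like the following, assuming a basic two-node cluster with no I/O fencing (the -n2 seed count is an assumption; adjust it to your member count):
# hastop -local
# gabconfig -U
# lltconfig -U
# lltconfig -c
# gabconfig -c -n2
# hastart
Afterwards, lltstat -nvv should show both links up and heartbeating again.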
Saturday, September 5, 2009
Performing maintenance when booted from cdrom
Introduction
If the system has to be booted from a cdrom or from the network ("boot cdrom" or "boot net") in order to perform maintenance, the operator needs to adjust for the existence of a mirrored operating system. Because these alternate boot environments do not include the drivers necessary for DiskSuite, they cannot be used to operate on state database replicas and DiskSuite metadevices. This raises the subtle issues addressed below.
The administrator is typically under pressure while performing this kind of maintenance. Because simple mistakes at this stage can render the system unusable, it is important that the process be well documented and tested before it is used in production. Fortunately, the process is simpler than the equivalent for Veritas Volume Manager, because there are no "de-encapsulation" issues to address.
Booting from cdrom with DiskSuite mirrored devices
In the example below, the server pegasus has two internal disks (c0t0d0 and c0t1d0) under DiskSuite control. The operating system is mirrored between the two devices, with slices five and six on each disk used for state database replicas. Assume that the administrator has forgotten the root password on this server and needs to boot from cdrom in order to edit the shadow file.
Insert the Solaris operating system CD into the cdrom drive and boot from it into single-user mode:
ok boot cdrom -s
Initializing Memory
Rebooting with command: boot cdrom -s
Boot device: /pci@1f,4000/scsi@3/disk@6,0:f File and args: -s
SunOS Release 5.8 Version Generic_108528-07 64-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Configuring /dev and /devices
Using RPC Bootparams for network configuration information.
Skipping interface hme0
INIT: SINGLE USER MODE
#
Fsck and mount the root disk's "/" partition in order to edit the /etc/shadow file:
# fsck -y /dev/rdsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a
Remove the encrypted password from the /a/etc/shadow file:
# TERM=vt100; export TERM
# vi /a/etc/shadow
For example, if the entry for the root user looks like the following:
root:NqfAn3tWOy2Ro:6445::::::
Change it so that it looks as follows:
root::6445::::::
Comment out the rootdev entry from the /a/etc/system file:
# vi /a/etc/system
For example, change the line:
rootdev:/pseudo/md@0:0,0,blk
to
* rootdev:/pseudo/md@0:0,0,blk
Update the /a/etc/vfstab file so that it references simple disk slices instead of DiskSuite metadevices. Note that you only have to change those entries that correspond to operating system slices.
For example, one would change the following /a/etc/vfstab file:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d1 - - swap - no -
/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no logging
/dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 no logging
swap - /tmp tmpfs - yes -
to:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c0t0d0s1 - - swap - no -
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no logging
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no logging
swap - /tmp tmpfs - yes -
Unmount the root filesystem, fsck it, and then send a break (Stop-A) from the console to return to the ok prompt:
# cd /; umount /a; fsck -y /dev/rdsk/c0t0d0s0
(press Stop-A)
ok
Boot from c0t0d0 into single user mode. It is important to boot just to single user mode so that DiskSuite does not start automatically:
ok boot -sw
When prompted for the root password, just press the ENTER key. Once at the shell prompt, clear the metadevices that referenced the filesystems that you updated in the /a/etc/vfstab file above:
# metaclear -f -r d0 d1 d4
Now would be an appropriate time to create a new password for the root account via the passwd root command.
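For example (a sketch; prompt wording varies slightly by release, and the typed password is not echoed):
# passwd root
New Password:
Re-enter new Password:
passwd (SYSTEM): passwd successfully changed for root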
Exit from single user mode and continue the boot process.
# exit
The system should now boot cleanly, but it is running on a single disk and would not survive the failure of that disk. To restore operating system redundancy, follow the procedure for mirroring the operating system.
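As a minimal sketch, assuming the same metadevice names used earlier in this example (d0/d10/d20 for root; the swap and /var mirrors d1 and d4 follow the same pattern), re-mirroring the root slice would look something like:
# metainit -f d10 1 1 c0t0d0s0
# metainit d20 1 1 c0t1d0s0
# metainit d0 -m d10
# metaroot d0
# reboot
Then, once the system is back up:
# metattach d0 d20
The metaroot command puts the rootdev line back into /etc/system and the metadevice entry back into /etc/vfstab; metattach starts the resync onto the second submirror.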
Friday, September 4, 2009
Replacing a failed bootdisk
In the following example, the host has a failed bootdisk (c0t0d0). Fortunately, the system is using DiskSuite, with a mirror at c0t1d0. The following sequence of steps can be used to restore the system to full redundancy.
System fails to boot
When the system attempts to boot, it fails to find a valid device at the boot-device alias "disk" and falls back to booting from the network:
screen not found.
Can't open input device.
Keyboard not present. Using ttya for input and output.
Sun Ultra 30 UPA/PCI (UltraSPARC-II 296MHz), No Keyboard
OpenBoot 3.27, 512 MB memory installed, Serial #9377973.
Ethernet address 8:0:20:8f:18:b5, Host ID: 808f18b5.
Initializing Memory
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
...
Boot from mirror
At this point, the administrator realizes that the boot disk has failed, and queries the device aliases to find the one corresponding to the DiskSuite mirror:
ok devalias
sds-mirror /pci@1f,4000/scsi@3/disk@1,0
sds-root /pci@1f,4000/scsi@3/disk@0,0
net /pci@1f,4000/network@1,1
disk /pci@1f,4000/scsi@3/disk@0,0
cdrom /pci@1f,4000/scsi@3/disk@6,0:f
...
The administrator then boots the system from the mirror device "sds-mirror":
ok boot sds-mirror
The system starts booting off of sds-mirror. However, because there are only two of the original four state database replicas available, a quorum is not achieved. The system requires manual intervention to remove the two failed state database replicas:
Note: starting with DiskSuite 4.2.1, an optional /etc/system parameter allows DiskSuite to boot with just 50% of the state database replicas online. For example, if one of the two boot disks were to fail, just two of the four state database replicas would be available. Without this parameter (or with older versions of DiskSuite), the system complains of "insufficient state database replicas", and manual intervention is required on bootup, as the transcript below shows. To enable the "50% boot" behaviour with DiskSuite 4.2.1, execute the following command:
# echo "set md:mirrored_root_flag=1" >> /etc/system
Boot device: /pci@1f,4000/scsi@3/disk@1,0 File and args:
SunOS Release 5.8 Version Generic_108528-07 64-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
WARNING: md: d10: /dev/dsk/c0t0d0s0 needs maintenance
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
configuring IPv4 interfaces: hme0.
Hostname: pegasus
metainit: pegasus: stale databases
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
Type control-d to proceed with normal startup,
(or give root password for system maintenance): ******
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Oct 17 19:11:29 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
# metadb -i
flags first blk block count
M p unknown unknown /dev/dsk/c0t0d0s5
M p unknown unknown /dev/dsk/c0t0d0s6
a m p lu 16 1034 /dev/dsk/c0t1d0s5
a p l 16 1034 /dev/dsk/c0t1d0s6
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
# metadb -d c0t0d0s5 c0t0d0s6
metadb: pegasus: /etc/lvm/mddb.cf.new: Read-only file system
# metadb -i
flags first blk block count
a m p lu 16 1034 /dev/dsk/c0t1d0s5
a p l 16 1034 /dev/dsk/c0t1d0s6
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
# reboot -- sds-mirror
Check extent of failures
Once the reboot is complete, the administrator logs into the system and checks the status of the DiskSuite metadevices. Not only have the state database replicas on c0t0d0 failed; all of the DiskSuite metadevices previously located on that device need to be replaced. Clearly the disk has failed completely.
pegasus console login: root
Password: ******
Oct 17 19:14:03 pegasus login: ROOT LOGIN /dev/console
Last login: Thu Oct 17 19:02:42 from rambler.wakefie
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
# metastat
d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 13423200 blocks
d10: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c0t0d0s0
Size: 13423200 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Maintenance
d20: Submirror of d0
State: Okay
Size: 13423200 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t1d0s0 0 No Okay
d1: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d21
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2100000 blocks
d11: Submirror of d1
State: Needs maintenance
Invoke: metareplace d1 c0t0d0s1
Size: 2100000 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s1 0 No Maintenance
d21: Submirror of d1
State: Okay
Size: 2100000 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t1d0s1 0 No Okay
d4: Mirror
Submirror 0: d14
State: Needs maintenance
Submirror 1: d24
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2100000 blocks
d14: Submirror of d4
State: Needs maintenance
Invoke: metareplace d4 c0t0d0s4
Size: 2100000 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s4 0 No Maintenance
d24: Submirror of d4
State: Okay
Size: 2100000 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t1d0s4 0 No Okay
Replace failed disk and restore redundancy
The administrator replaces the failed disk with a new disk of the same geometry. Depending on the system model, the disk replacement may require that the system be powered down. The replacement disk is then partitioned identically to the mirror, given a boot block, and new state database replicas are created on it. Finally, the metareplace command copies the data from the mirror to the replacement disk, restoring redundancy to the system.
# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
fmthard: New volume table of contents now in place.
# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
# metadb -f -a /dev/dsk/c0t0d0s5
# metadb -f -a /dev/dsk/c0t0d0s6
# metadb -i
flags first blk block count
a u 16 1034 /dev/dsk/c0t0d0s5
a u 16 1034 /dev/dsk/c0t0d0s6
a m p luo 16 1034 /dev/dsk/c0t1d0s5
a p luo 16 1034 /dev/dsk/c0t1d0s6
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
# metareplace -e d0 c0t0d0s0
d0: device c0t0d0s0 is enabled
# metareplace -e d1 c0t0d0s1
d1: device c0t0d0s1 is enabled
# metareplace -e d4 c0t0d0s4
d4: device c0t0d0s4 is enabled
Once the resync process is complete, operating system redundancy has been restored.
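The resync of a large slice can take a while; one hedged way to watch it (metastat prints a percentage while a submirror is resyncing, and the number shown here is illustrative):
# metastat d0 | grep -i "resync"
    Resync in progress: 15 % done
Repeat for d1 and d4, or run metastat with no arguments and confirm that every submirror eventually reports Okay.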
Thursday, September 3, 2009
Configuring NEW LUNs:
# format < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b2fca,0
       1. c1t1d0
          /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c506b39cf,0
Specify disk (enter its number):
# cfgadm -o show_FCP_dev -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             fc-private   connected    configured   unknown
c1::2100000c506b2fca,0         disk         connected    configured   unknown
c1::2100000c506b39cf,0         disk         connected    configured   unknown
c3                             fc-fabric    connected    unconfigured unknown
c3::50060482ccaae5a3,61        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,62        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,63        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,64        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,65        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,66        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,67        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,68        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,69        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,70        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,71        disk         connected    unconfigured unknown
c3::50060482ccaae5a3,72        disk         connected    unconfigured unknown
c4                             fc           connected    unconfigured unknown
c5                             fc-fabric    connected    unconfigured unknown
c5::50060482ccaae5bc,61        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,62        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,63        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,64        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,65        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,66        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,67        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,68        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,69        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,70        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,71        disk         connected    unconfigured unknown
c5::50060482ccaae5bc,72        disk         connected    unconfigured unknown
c6                             fc           connected    unconfigured
# cfgadm -c configure c3
Nov 16 17:32:25 spdma501 last message repeated 54 times
Nov 16 17:32:26 spdma501 scsi: WARNING: /pci@8,700000/SUNW,qlc@2/fp@0,0/ssd@w50060482ccaae5a3,48 (ssd2):
Nov 16 17:32:26 spdma501 corrupt label - wrong magic number
Nov 16 17:32:26 spdma501 scsi: WARNING: /pci@8,700000/SUNW,qlc@2/fp@0,0/ssd@w50060482ccaae5a3,47 (ssd3):
Nov 16 17:32:26 spdma501 corrupt label - wrong magic number
Nov 16 17:32:26 spdma501 scsi: WARNING: /pci@8,700000/SUNW,qlc@2/fp@0,0/ssd@w50060482ccaae5a3,46 (ssd4):

spdma501:# format < /dev/null
Searching for disks...
Nov 16 17:33:04 spdma501 last message repeated 1 time
Nov 16 17:33:07 spdma501 scsi: WARNING: /pci@8,700000/SUNW,qlc@2/fp@0,0/ssd@w50060482ccaae5a3,48 (ssd2):
Nov 16 17:33:07 spdma501 corrupt label - wrong magic number
done

c3t50060482CCAAE5A3d61: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d62: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d63: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d64: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d65: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d66: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d67: configured with capacity of 17.04GB
c3t50060482CCAAE5A3d68: configured with capacity of 17.04GB

IF YOU DON'T SEE THE NEW LUNS IN FORMAT, RUN devfsadm !!!!
# /usr/sbin/devfsadm

Label the new disks !!!!
The /tmp/format.cmd file holds the single format subcommand label, which format -s -f then runs non-interactively against every newly configured disk (the disk list comes from the "configured with capacity" lines that format prints at startup):
# cd /tmp
# cat /tmp/format.cmd
label
# for disk in `format < /dev/null 2> /dev/null | grep "^c" | cut -d: -f1`
do
format -s -f /tmp/format.cmd $disk
echo "labeled $disk ....."
done
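As a quick hedged sanity check once the loop finishes, print the VTOC of one of the LUNs configured above (device name taken from the cfgadm example; prtvtoc fails with a geometry error if the disk is still unlabeled):
# prtvtoc /dev/rdsk/c3t50060482CCAAE5A3d61s2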