Unix articles: June 2011

Tuesday, June 28, 2011

How to Install Oracle Explorer Manually

Use the following procedure to install Oracle Explorer after you have downloaded the latest installer, as described in How to Download Oracle Explorer.

--------------------------------------------------------------------------------
Note –
Oracle Explorer must be installed in the global zone if you are installing it on the Solaris 10 Operating System (Solaris OS). In Solaris 10, the pkgadd command includes a -G flag that restricts installation to the current zone; when run from the global zone, the package is installed in the global zone only.

--------------------------------------------------------------------------------

If a version of Oracle Explorer is installed on the host, remove the SUNWexplo and SUNWexplu packages before installing the new Oracle Explorer package.

Become superuser.

Type the following command at the prompt:

pkgrm SUNWexplo

If the SUNWexplu package is also installed, type the following command at the prompt:

pkgrm SUNWexplu
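
The removal steps above can be sketched as a small helper that prints a pkgrm command only for the packages that are actually present. This is a minimal sketch and not part of the documented procedure: the function names pkg_installed and explorer_remove_cmds are hypothetical, and the check assumes the Solaris pkginfo utility with its -q (quiet, exit status only) option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the named package is installed.
# Assumes Solaris pkginfo, whose -q option reports only through its exit status.
pkg_installed() {
    pkginfo -q "$1" 2>/dev/null
}

# Print (do not run) the pkgrm commands for the Explorer packages that are present.
explorer_remove_cmds() {
    for pkg in SUNWexplo SUNWexplu; do
        if pkg_installed "$pkg"; then
            echo "pkgrm $pkg"
        fi
    done
}
```

Calling explorer_remove_cmds as superuser lists the commands to review before you run them by hand.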

--------------------------------------------------------------------------------
Note –
Removing the current SUNWexplo and SUNWexplu packages saves the Oracle Explorer defaults file.

In Oracle Explorer 3.6.2 and earlier versions, the defaults file is explorer_install_dir/etc/default/explorer.

In Oracle Explorer 4.0 and later versions, the defaults file is /etc/opt/SUNWexplo/default/explorer.

You can save the defaults file and use it as input when you run the explorer -g command to create or update the defaults file. During installation of Oracle Explorer version 4.0 or later, this file is moved from explorer_install_dir/etc/default/explorer to /etc/opt/SUNWexplo/default/explorer. The contents of the defaults file are displayed as the default responses when you run the explorer -g command.

--------------------------------------------------------------------------------
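
The version rule in the note above can be expressed as a short lookup. This is only a sketch: explorer_defaults_path is a hypothetical function name, the install directory is passed in by the caller because the note does not fix it, and the comparison looks at the major version number only.

```shell
#!/bin/sh
# Hypothetical helper: print the defaults file location for a given
# Oracle Explorer version ($1) and installation directory ($2).
explorer_defaults_path() {
    version=$1
    install_dir=$2
    major=${version%%.*}          # "3.6.2" -> "3", "4.0" -> "4"
    if [ "$major" -ge 4 ]; then
        echo "/etc/opt/SUNWexplo/default/explorer"
    else
        echo "$install_dir/etc/default/explorer"
    fi
}
```

For example, explorer_defaults_path 3.6.2 /opt/SUNWexplo prints a path under the install directory (the /opt/SUNWexplo value here is an assumed example), while any 4.x version prints the /etc/opt location.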

The output directory of the most recent Oracle Explorer run is saved in the explorer_install_dir/output directory.

Extract Oracle Explorer from the Services Tools Bundle (STB) using the -ext option.

To obtain the STB installer options, type ./install_stb.sh -help

Uncompress and untar the Explorer_.tar.Z file:

cd /var/tmp/stb/extract/Explorer

Decide which of the following commands you should use to untar the file:

If you do not have zcat installed, type:

uncompress Explorer_.tar.Z
tar xvf Explorer_.tar

If you have zcat installed, type:

zcat Explorer_.tar.Z | tar xvf -
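
The choice between the two extraction methods can be sketched as follows. The helper names have_tool and extract_cmds are hypothetical, and the function only prints the commands so that you can review them first. The archive name is passed in unchanged.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the named tool is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print the extraction commands for a .tar.Z archive ($1),
# preferring the single zcat pipeline when zcat is available.
extract_cmds() {
    archive=$1
    if have_tool zcat; then
        echo "zcat $archive | tar xvf -"
    else
        echo "uncompress $archive"
        echo "tar xvf ${archive%.Z}"
    fi
}
```

Running extract_cmds Explorer_.tar.Z from the extraction directory prints either the one-line pipeline or the two-step fallback.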

--------------------------------------------------------------------------------
Note –
If you want to use Explorer from an alternate path, proceed to step 2 in How to Use Explorer from an Alternate Path.

--------------------------------------------------------------------------------

To install Explorer and create directories called SUNWexplo and SUNWexplu, type the following command at the prompt as superuser:

pkgadd -d . SUNWexplo SUNWexplu

--------------------------------------------------------------------------------
Note –
If this is an NFS installation that will support clients running Solaris 7 or older, use the following command:

echo "EXP_NFS_DEPLOY=1" > response
pkgadd -d . -r response SUNWexplo SUNWexplu

Thursday, June 23, 2011

Solaris JDK Installation (64-bit)

Installing the 64-bit JDK for the Solaris operating system on SPARC, x64, and EM64T platforms is a two-step process. These steps can be performed in either order, but you must install both sets of bundles for a given platform:

On SPARC processors: Install solaris-sparc (32-bit) with solaris-sparcv9 (64-bit)
On x64/EM64T processors: Install solaris-i586 (32-bit) with solaris-x64 (64-bit)

1. Install the 32-bit JDK using the 32-bit JDK Installation Notes.
2. Install the supplemental files for 64-bit support using the following 64-bit installation instructions.

64-Bit Installation Instructions
As with the installation of the 32-bit JDK, the 64-bit supplemental support is available in two installation formats; use the same installation format as you used for the installation of the 32-bit JDK.

Self-extracting binary - See the Installation of Self-Extracting Binary below
Solaris packages - See the Installation of Solaris Packages below

Note: For any text on this page containing the notation <version>, you must substitute the appropriate JDK update version number (such as "_01") for the notation.

For example, if you are installing update 1.6.0_01, the following command:

chmod +x jdk-6<version>-solaris-sparcv9.sh

would become:

chmod +x jdk-6_01-solaris-sparcv9.sh

Note: The jre-1_6_0 -solaris-x64.sh installer provides support for all processors that support the AMD 64-bit extensions to the Intel x86 architecture, including EM64T.

Installation of Self-Extracting Binary
Follow these instructions to add 64-bit support to the JDK which has already been installed using the self-extracting binary. If you want to install Solaris packages comprising the JDK, see Installation of Solaris Packages.

1. Download the self-extracting binary and check the file size to ensure that you have downloaded the full, uncorrupted software bundle.

You can download to any directory you choose; it does not have to be the directory where you want to install the JDK.
Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.
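
The size comparison described above can be scripted. This is a minimal sketch: size_matches is a hypothetical function name, and the expected byte count is whatever the download page shows for your bundle.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when file $1 is exactly $2 bytes long.
size_matches() {
    file=$1
    expected=$2
    # Arithmetic expansion strips the padding some wc implementations add.
    actual=$(( $(wc -c < "$file") ))
    [ "$actual" -eq "$expected" ]
}
```

A typical call would be size_matches downloaded_file 12345 && echo "size OK", where both the file name and the number are placeholders for your own values.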

2. Make sure that execute permissions are set on the self-extracting binary:

On SPARC processors:
chmod +x jdk-6<version>-solaris-sparcv9.sh
On x64/EM64T processors:
chmod +x jdk-6<version>-solaris-x64.sh

3. Change directory to the same directory from where you ran the self-extracting binary for the 32-bit install.

This is the directory that contains the jdk1.6.0 directory of the 32-bit JRE. The next step installs the JDK into the current directory.

4. Run the self-extracting binary.

Execute the downloaded file, prepending the path to it. For example, if the downloaded file is in the current directory, prepend it with "./" (necessary if "." is not in the PATH environment variable):

On SPARC processors:
./jdk-6<version>-solaris-sparcv9.sh

On x64/EM64T processors:
./jdk-6<version>-solaris-x64.sh

The binary code license is displayed, and you are prompted to agree to its terms.

The supplemental files for 64-bit support are installed in directories named for the machine architecture model, which are added at several locations within the same jdk1.6.0 directory where the 32-bit JDK was installed. For example, on SPARC processors the 64-bit Java VM library file (libjvm.so) is stored in jre/lib/sparcv9/server, whereas the version for x64/EM64T is stored in jre/lib/amd64/server.

Installation of Solaris Packages
Use these instructions if you want to use the pkgadd utility to install 64-bit support for the JDK.

1. Download and check the file size to ensure that you have downloaded the full, uncorrupted software bundle.

It's best to create a new directory to save the download bundle to, as the next step will extract several directories and files into this directory. The directory can be anywhere you choose.
Before you download the file, notice its byte size provided on the download page on the web site. Once the download has completed, compare that file size to the size of the downloaded file to make sure they are equal.

2. Extract the contents of the compressed tar file:

On SPARC processors:
zcat jdk-6<version>-solaris-sparcv9.tar.Z | tar -xf -

On x64/EM64T processors:
zcat jdk-6<version>-solaris-x64.tar.Z | tar -xf -

This creates several directories (SUNWj6rtx, SUNWj6dvx, and SUNWj6dmx) in the current directory, which contain 64-bit support for the JDK.

3. Become root by running su and entering the super-user password.

4. Uninstall any earlier installation of 64-bit packages for this version of the JDK.

If your machine has an earlier 64-bit version of the JDK installed in the default location (/usr/jdk/jdk1.6.0), you must remove it before installing a later 64-bit version at that location.
You can skip this step if you intend to install the 64-bit version of the JDK in a non-default location. For more details, see Selecting the Default Java Platform.

To uninstall the Solaris packages for the JDK, remove them by running:

On all processors:
pkgrm SUNWj6rtx SUNWj6dvx SUNWj6dmx

5. Run the pkgadd command to install the packages.

On all processors:
pkgadd -d . SUNWj6rtx SUNWj6dvx SUNWj6dmx

This command installs the files for 64-bit support into the JDK installation at /usr/jdk/jdk1.6.0.

6. Delete the tar files and extracted SUNW* directories.

7. Exit the root shell. No need to reboot.

Tuesday, June 21, 2011

Sun Enterprise[TM]10000 / Sun Fire[TM] 12K/15K/E20K/E25K servers: Dynamic Reconfiguration (DR) Cheat Sheets

Goal
Dynamic Reconfiguration (DR) has seen a variety of changes over the past years.

Below is a quick guide that can be used to help set up and use DR in the Sun Enterprise 10000 (E10K) and Sun Fire 12K/15K/E20K/E25K server environments.
Solution
Steps to Follow
Sun Enterprise 10000 (E10K)

The method by which DR is enabled differs according to the Solaris[TM] Operating System (OS) release. This applies to all versions of DR.

For Solaris 2.5.1 OS, DR is enabled by setting the OpenBoot PROM (OBP) parameter (dr-max-mem) to any non-zero number via 'setenv' or 'eeprom'. See the following examples.
ok setenv dr-max-mem 1
or
# eeprom dr-max-mem=1

NOTE: If 'dr-max-mem' is set to 0, DR attach/detach is DISABLED. If 'dr-max-mem' is set to anything other than 0 (non-zero), DR attach/detach is ENABLED. This value denotes the maximum memory configuration permitted for the domain after all DR attaches have been completed. For example, a value of 16384 would allow for a maximum of 16GB of memory. However, be careful not to set this variable too high, as it unnecessarily enlarges the kernel and wastes memory that might be better used elsewhere.
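
The example in the note (16384 for 16GB) implies that the value is expressed in megabytes. Under that assumption, a small sketch can convert a target memory ceiling in gigabytes to the value to set. The function name dr_max_mem_for_gb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a memory ceiling in GB ($1) to a
# dr-max-mem value, assuming the parameter is expressed in MB
# (consistent with 16384 -> 16GB in the note above).
dr_max_mem_for_gb() {
    echo $(( $1 * 1024 ))
}

# Example: print the eeprom command for an 8GB ceiling (review before running).
echo "eeprom dr-max-mem=$(dr_max_mem_for_gb 8)"
```

Keep the warning above in mind: choose the smallest ceiling that covers the boards you expect to attach.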

For Solaris 2.6 OS (similar to 2.5.1), DR is enabled by setting the OBP parameter (dr-max-mem) to any non-zero number via 'setenv' or 'eeprom'. See the following examples.
ok setenv dr-max-mem 1
or
# eeprom dr-max-mem=1

NOTE: If 'dr-max-mem' is set to 0, DR attach/detach is DISABLED. If 'dr-max-mem' is set to anything other than 0 (non-zero), DR attach/detach is ENABLED. If the value is specifically set to 2, the number of DR kernel pages at boot time is 5 times larger than the normal value. Be aware that in environments with large configurations (for example, terabytes of storage), it is possible to exhaust the kernel resources before the system becomes fully active. Review Bug ID 4218687 for details.

For Solaris 7 through 10 OS, DR is enabled with an entry (kernel_cage_enable) in the /etc/system file. When this variable is set to 1, DR is enabled; when it is set to 0, DR is disabled. The 'dr-max-mem' OBP parameter is obsolete in Solaris 7 through 10. The following is an example entry in the /etc/system file to enable DR. Lines that begin with '*' are comments, so the 'set' directive must be on its own line:
* DR enabled
set kernel_cage_enable=1
* DR entry complete
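
Because a 'set' directive that follows '*' on the same line is treated as part of a comment in /etc/system, it can be worth checking that the entry is really active. The following is a minimal sketch: cage_enabled is a hypothetical function name, and it assumes a grep that supports -E (on Solaris that is typically /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when file $1 contains an uncommented
# "set kernel_cage_enable=1" line. Assumes a POSIX grep with -E support.
cage_enabled() {
    grep -E '^[[:space:]]*set[[:space:]]+kernel_cage_enable[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1" >/dev/null
}
```

For example, cage_enabled /etc/system && echo "DR cage enabled" reports only entries that the kernel will actually read.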
There are three versions of DR that can be used on an E10K platform:
Legacy DR (DR) - This was the initial release of DR, seen in SSP 3.1 through SSP 3.3. Each DR operation consisted of a 3 step manual process.
1. To add a board (ex. SB6):
ssp:domain% dr
dr> init_attach 6
dr> drshow 6 obp (to verify board inventory)
dr> complete_attach 6
dr> exit

2. To remove a board (ex. SB6):
NOTE: Stop edd so that no recordstops can occur during a detach DR operation.
If a recordstop occurs during a DR operation, the domain will have to be STOPPED!
Therefore, stop 'edd' with the 'edd_cmd' command before the DR operation and restart it after DR is finished:
ssp% edd_cmd -x stop
ssp:domain% dr
dr> drain 6
dr> drshow 6 IO (determine if there is active I/O on board being detached)
dr> complete_detach 6
dr> reconfig
dr> exit
Restart edd again:
ssp% edd_cmd -x start

Automated DR (ADR) - Introduced in SSP 3.3, ADR had a new command structure that would allow users to use DR in scripts to 'automate' the process so each DR operation is completed by one command instead of three, as in the previous release.

New Generation DR (ngdr) - Introduced with the Sun Fire 12K/15K and backported to the E10K in SSP 3.4 and SSP 3.5 running Solaris 8 and Solaris 9 OS. This new command structure allows for remote DR capabilities as well.
These automated methods may be used for DR operations:
1. addboard -d [-f] [-q] {-b board_number | SB}
2. moveboard -d [-f] [-q] {-b board_number | SB}
3. deleteboard -d [-f] [-q] {-b board_number | SB}

Adding a board (ex. SB6):
ssp% addboard -b 6 -d domain_name -r 2 -t 600
where (-b) is SB#, (-d) is domain name, (-r) is # of retries, (-t) timeout

Removing a board (ex. SB6):
ssp% deleteboard -b 6 -r 2 -t 900

Moving a board (ex. SB6):
ssp% moveboard -b 6 -d domain_name -r 2 -t 900

If any real-time (RT) processes are running on a domain, they prevent a DR operation from completing. If DR complains about them, these processes must be stopped before DR can work properly. Use the command:
ssp% ps -eo class | grep RT
to check for RT processes before deciding which PIDs (process IDs) to kill, if necessary. Be aware of which RT processes are running and what their exact function is. Be sure to understand any adverse effects that may arise if these processes are killed manually.
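
Since 'ps -eo class' prints only the scheduling class column, a variant that also prints the PID and command name makes it easier to see which processes are involved. This is a sketch under assumptions: rt_pids is a hypothetical function name, and the input is expected to come from 'ps -e -o pid,class,comm' run on the domain.

```shell
#!/bin/sh
# Hypothetical filter: read "pid class command" lines on stdin and
# print the PIDs whose scheduling class is RT.
rt_pids() {
    awk '$2 == "RT" { print $1 }'
}

# Usage on the domain (not executed here):
#   ps -e -o pid,class,comm | rt_pids
```

Review what each listed process does before stopping it.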
________________________________________
Sun Fire 12K/15K/E20K/E25K Servers
________________________________________

Syntax for SMS (System Management Services) 1.x DR commands from the SC (System Controller):
1. addboard -d [-q] [-f]
2. moveboard -d [-q] [-f]
3. deleteboard [-q] [-f]
Examples:
sms> addboard -d A SB10
sms> moveboard -d B SB7
sms> deleteboard SB0

If running 'rcfgadm' (Remote configadm) commands from the SC, the usage may be as follows:
sc0:sms-user:> rcfgadm -d [-f] [-v] -c
function - assign | unassign, configure | unconfigure, or connect | disconnect
APIDs - can be either logical or physical, and are either static or dynamic.
PHYSICAL EXAMPLES:
/devices/pseudo/dr@0:IO4
/devices/pseudo/dr@0:IO6
/devices/pseudo/dr@0:IO14
/devices/pseudo/dr@0:SB4
/devices/pseudo/dr@0:SB6

LOGICAL EXAMPLES:
IO4, IO6, IO14, SB4, SB6

STATIC AP TYPES:
HPCI, CPU, MCPU, pci-pci/hp

DYNAMIC AP TYPES:
cpu, mem, io

Examples:
sc0:sms-user:> rcfgadm -d a -f -c configure SB6
sc0:sms-user:> rcfgadm -d a -c unconfigure IO14
sc0:sms-user:> rcfgadm -d a -c configure SB6
sc0:sms-user:> rcfgadm -d a -c configure pcisch3:e06B1slot2 <--DR an I/O component (See Below)

Breakdown of specific I/O card to DR:
Example from above:
sc0:sms-user:> rcfgadm -d a -c configure pcisch3:e06B1slot2

pcisch<#>: This represents the pcisch device instance number. The example shows that the device being configured is pcisch3, the third instance of a pcisch device for this domain. Before configuring a new device instance, run grep pcisch /etc/path_to_inst on the domain to confirm which instances of the device are currently configured. Choose the next available instance to configure into the domain.

e<#>: This indicates the Expander Board location of this device. The example shows that this is e06, indicating the device is located on Expander 06.

B1: Indicates a slot1-type board.
NOTE: The board type will always be B1 on a Sun Fire[TM] 12K/15K/E20K/E25K for the I/O devices, because a slot1 board is the only type of board where these devices can be installed.

slot<#>: Indicates the cassette slot number (1-4) in which the device is located on a slot1 board. The example above shows slot2.
This is the bottom left cassette slot on the I/O board.
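
The breakdown above can be sketched as a small parser that splits an I/O attachment point of this form into its fields. The function name parse_io_apid is hypothetical, and the pattern covers only the pcisch<#>:e<#>B<#>slot<#> form shown in the example.

```shell
#!/bin/sh
# Hypothetical helper: split an attachment point such as
# pcisch3:e06B1slot2 into instance, expander, board type, and slot.
# Prints nothing if the argument does not match that form.
parse_io_apid() {
    echo "$1" | sed -n 's/^pcisch\([0-9][0-9]*\):e\([0-9][0-9]*\)\(B[0-9]\)slot\([0-9]\)$/instance=\1 expander=\2 board=\3 slot=\4/p'
}
```

For the example attachment point, parse_io_apid pcisch3:e06B1slot2 prints instance=3 expander=06 board=B1 slot=2.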

See Technical Instruction Document: 1017493.1 for a diagram.

Useful information gathering commands:
rcfgadm -d a       lists all attachment points except dynamic points.
rcfgadm -d a -al   lists all current configurable hardware information (including dynamic).
rcfgadm -d a -avl  lists all current configurable hardware in verbose mode.

If the 'cfgadm' (configadm) command is used on the domain:
cfgadm [-f] [-v] -c
This command uses the same syntax rules and examples as shown above for 'rcfgadm'. The difference is that 'cfgadm' is executed on the domain itself, not from the SC as 'rcfgadm' is, so no '-d' option is required for 'cfgadm'.

http://download.oracle.com/docs/cd/E19065-01/servers.10k/816-3627-10/index.html

Sun Fire[TM] 12K/15K/E20K/E25K servers: System Controller, Solaris[TM] OS installation, and Solaris[TM] Volume Manager (SVM) Configuration

Goal
The following procedure can be used to configure a Solaris[TM] installation and Solaris[TM] Volume Manager (SLVM) configuration on the System Controller when the System Controller (SC) disk must be completely rebuilt due to a failure.
Solution
How to configure Sun Fire[TM] 12K/15K/E20K/E25K servers:
1. As part of the Solaris installation, partition the disk at c0t2d0 according to the following partition table:
Part  Size         Use
0     8.00 GB      /
1     4.00 GB      swap
4     32.00 MB     SLVM DB
5     32.00 MB     SLVM DB
7     Avail. Free  /export/install
Note: Your previous installation of Solaris[TM] may have a 2 GB, 4 GB, or 4.5 GB swap space. All are supported, but a 4 GB swap slice is recommended for new installations.

2. After the Solaris installation is complete, make sure the partition map on c0t3d0 is the same as c0t2d0. One method:
Run format
Select c0t2d0
Choose (p)artitions and name the map SCDISK
(q)uit the partition menu.
Choose (d)isks and select c0t3d0.
Choose (s)elect, pick SCDISK from the list presented and (l)abel the disk.
Exit format.
Or:
Run this command as root:
prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2

3. Install the SDS/SVM packages. (Note that SVM is bundled with Solaris 9 and higher)

4. Add the following to the end of the /etc/lvm/md.tab file.
mddb01 -c 3 c0t2d0s4
mddb02 -c 3 c0t2d0s5
mddb03 -c 3 c0t3d0s4
mddb04 -c 3 c0t3d0s5
d10 -m d11
d11 1 1 /dev/dsk/c0t2d0s0
d12 1 1 /dev/dsk/c0t3d0s0
d20 -m d21
d21 1 1 /dev/dsk/c0t2d0s1
d22 1 1 /dev/dsk/c0t3d0s1
d30 -m d31
d31 1 1 /dev/dsk/c0t2d0s7
d32 1 1 /dev/dsk/c0t3d0s7
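
The mirror entries above follow a regular naming scheme: mirror dN0 has submirror dN1 on c0t2d0 and submirror dN2 on c0t3d0, on the same slice number. As an illustration only, a sketch that prints the three md.tab lines for one mirror (mirror_entries is a hypothetical function name):

```shell
#!/bin/sh
# Hypothetical helper: print the md.tab lines for mirror d<N>0.
#   $1 = N (1, 2, or 3 in this article), $2 = slice number on both disks
mirror_entries() {
    n=$1
    slice=$2
    echo "d${n}0 -m d${n}1"
    echo "d${n}1 1 1 /dev/dsk/c0t2d0s${slice}"
    echo "d${n}2 1 1 /dev/dsk/c0t3d0s${slice}"
}

# Reproduce the root, swap, and /export/install entries shown above.
mirror_entries 1 0
mirror_entries 2 1
mirror_entries 3 7
```

Each mirror is defined with only its first submirror; the second submirror is attached later, after the reboot.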

5. Initialize the meta-databases with:
# metadb -a -f mddb01
# metadb -a mddb02
# metadb -a mddb03
# metadb -a mddb04

6. Remove the swap device.
# swap -d /dev/dsk/c0t2d0s1

7. Initialize the disk in the mirror and the mirror itself.
# metainit d21
# metainit d20

8. Add the new swap device and modify the /etc/vfstab file.
# swap -a /dev/md/dsk/d20
vfstab entry:
/dev/md/dsk/d20 - - swap - no -

9. Unmount /export/install and initialize its disk and mirror.
# umount /export/install
# metainit d31
# metainit d30

10. Change /export/install's entry in /etc/vfstab to:
/dev/md/dsk/d30 /dev/md/rdsk/d30 /export/install ufs 2 yes logging

11. Perform a newfs on /export/install and mount it.
# newfs -i 8192 -m 1 /dev/md/rdsk/d30
# mount /export/install

12. Prepare the disk and mirror for root.
# metainit -f d11
# metainit d10
# metaroot d10

13. REBOOT THE SYSTEM CONTROLLER.

14. Finally, synchronize the mirrors.
# metainit d12
# metattach d10 d12
# metainit d22
# metattach d20 d22
# metainit d32
# metattach d30 d32
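
The six commands in step 14 repeat the same pattern for each mirror, which can be sketched as a loop that prints them. This is an illustration only: attach_cmds is a hypothetical function name, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: for each mirror number N given as an argument,
# print the commands that initialize submirror d<N>2 and attach it to d<N>0.
attach_cmds() {
    for n in "$@"; do
        echo "metainit d${n}2"
        echo "metattach d${n}0 d${n}2"
    done
}

attach_cmds 1 2 3
```

After the submirrors are attached, the metastat command can be used to watch the state of each mirror while it resynchronizes.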

Solaris IP Multipathing made easy


I've recently set up a bunch of machines for IP multipathing (i.e. recent to this article - Nov 28, 2001) by following the Sun blueprint paper. I thought I would share a simpler step-by-step approach.

1. Get 2 network interface cards in your machine (some machines, like the Netra T1 series, have 2 built in). It is not required that they be the same type (e.g. a Sun SF280 would have an eri0 internal and an hme in a PCI slot), but it is important that they have the same speed capability.

2. Obtain 4 IP addresses in the same local LAN (or VLAN) segment. In multipathing there are 2 fixed (or private) addresses and 2 floating (or public) addresses. I refer to the 2 fixed addresses as internal; one is assigned directly to each hardware interface. The 2 floating addresses are the external ones. If one of the NICs detects link failure, the address tied to that NIC fails over to the working NIC. When the NIC comes back up, the address fails back to its original home. Determine right now which will be your internal IPs and which will be your external. I recommend keeping the same convention for all multipathed machines, no matter what convention you choose. Here are two typical conventions:

i. The first 2 IPs in the series are fixed and the second 2 are floating.

ii. The odd IPs are fixed and the even IPs are floating (or vice versa)

3. Edit /etc/hosts with your 4 IPs. Example:

298.178.99.137 host-int0
298.178.99.138 host-int1
298.178.99.139 host-ext0 host-dummy
298.178.99.140 host-ext1 host.eng.auburn.edu

In this example, the first two IPs are fixed (internal) to the NICs, and the second two are floating. The last one is the one we tie to the machine name for programs that might have licensing restrictions tied to particular hostnames. (Always tie the hostname to one of the public/external/failover NICs.)

8. Configure network interfaces.

At the beginning you'll have one network interface (the secondary) that is unconfigured, and another that would initially look something like this:

hme0: flags=1000843 mtu 1500 index 2
        inet 298.178.99.141 netmask fffffff0 broadcast 298.178.99.143
        ether 8:0:20:ff:5b:e2

You need to configure the secondary interface and make it have a unique ether address that is persistent across reboots. I like to take the address of the hme0 (or eri0 or whatever) card and add 1 to the last octet.

# eeprom 'local-mac-address?=true'

# /sbin/ifconfig hme1 plumb

# /sbin/ifconfig hme1 ether 8:0:20:ff:5b:e3
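
The "add 1 to the last octet" convention can be sketched as a small helper. The function name next_mac is hypothetical, and the sketch assumes the colon-separated hexadecimal format that ifconfig prints. If the last octet is ff, it wraps to 0 without carrying into the previous octet, so check the result by hand in that case.

```shell
#!/bin/sh
# Hypothetical helper: print the MAC address $1 with its last octet
# incremented by one (wraps from ff to 0, no carry).
next_mac() {
    prefix=${1%:*}      # everything before the last colon
    last=${1##*:}       # last octet, hexadecimal
    printf '%s:%x\n' "$prefix" "$(( (0x$last + 1) & 0xff ))"
}

next_mac 8:0:20:ff:5b:e2
```

For the hme0 address shown above this prints 8:0:20:ff:5b:e3, the value used in the ifconfig command for hme1.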

9. Set up hostname.* files.

You can pretty much copy these two files as is and just modify them slightly to fit your naming conventions, in the same way that you set up the /etc/hosts file above.

/etc/hostname.hme0

host-int0 netmask + broadcast + group production deprecated -failover up \
addif host-ext0 netmask + broadcast + failover up

/etc/hostname.hme1

host-int1 netmask + broadcast + group production deprecated -failover up \
addif host-ext1 netmask + broadcast + failover up

10. Adjust failover detection timeouts

/etc/default/mpathd has a default failure detection time of 10000 (milliseconds). This means that it should take at most 10 seconds to detect a failure and successfully fail over an interface. I like to configure this to 2500. In my experience with IP multipathing, lower values result in lots of syslog messages complaining that the number is too low. If you change this file, you will have to restart mpathd. Now is as good a time as any to either restart mpathd or start it for the first time if it is not already running.
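
The timeout change can be sketched as a non-destructive edit that prints the modified file so you can compare it before replacing the original. This rests on assumptions: set_failure_detection_time is a hypothetical function name, and the setting is assumed to be stored as a FAILURE_DETECTION_TIME=<ms> line, as in the stock file; check your copy before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of mpathd config file $1 with
# the failure detection time changed to $2 milliseconds.
# Assumes the setting appears as a line "FAILURE_DETECTION_TIME=<ms>".
set_failure_detection_time() {
    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$2/" "$1"
}

# Usage (not executed here):
#   set_failure_detection_time /etc/default/mpathd 2500 > /tmp/mpathd.new
#   diff /etc/default/mpathd /tmp/mpathd.new
```

Once the new file is in place, restart mpathd as described above so that it picks up the new value.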

11. If you use a default router, it must be pingable at all times from both interfaces. mpathd will ping your default router at regular intervals. If you do not use a default router, then you need to run the router discovery daemon /usr/sbin/in.rdisc. This daemon should start automatically at boot time under the appropriate circumstances, but doesn't always (see the Sun blueprint article for a more thorough discussion). It is helpful to have a helper file to automatically start it if it is not already running. You can use this one if you like. Save it as /etc/rc2.d/S70rdisc and make a link in /etc/init.d
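
The author's helper file is not reproduced on this page. As a rough idea of what such a helper might look like, here is a minimal sketch written for this article, not the original script. The names rdisc_running and start_rdisc_if_missing are hypothetical, pgrep is assumed to be available, and in.rdisc is started without options, so check the in.rdisc man page for the options your setup needs.

```shell
#!/bin/sh
# Sketch of an rc-style helper; NOT the original author's script.
RDISC=${RDISC:-/usr/sbin/in.rdisc}

# Hypothetical check: succeeds when an in.rdisc process already exists.
rdisc_running() {
    pgrep -x in.rdisc >/dev/null 2>&1
}

# Start the router discovery daemon only if it is not already running.
start_rdisc_if_missing() {
    if rdisc_running; then
        echo "in.rdisc already running"
    else
        echo "starting $RDISC"
        $RDISC
    fi
}

# An rc script would call the function from its "start" case, e.g.:
#   case "$1" in start) start_rdisc_if_missing ;; esac
```

The check-then-start shape keeps the script safe to run more than once, which matters for anything placed in /etc/rc2.d.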

When do you want to use which? It boils down to the same choices on a non multipathed host. Do you have one router on your lan or do you have multiple? If you have only one (or a pair using HSRP or other failover protocol), then you can use a default route. If you have more than one router, then you want to use in.rdisc much as you would use routed in a non multipathed host setup. Make sure you have router discovery announcements enabled on your routers in this situation.

TIP

Plug each physical interface into a separate switch to make effective use of multipathing. After that there are several ways you can configure your high availability. You can plug each switch into 2 routers and use HSRP to do router failover. In this case, having the Sun use a default route would be fine. Or, you could have each switch singly connected to a specific router on the same lan, and run in.rdisc on the sun to detect these interfaces and perform failover. A typical configuration is illustrated at right.

12. Make it active

This is the easy part. Copy and paste your /etc/hostname.hme* files to ifconfig commands as below:

# /sbin/ifconfig hme0 host-int0 netmask + broadcast + group production deprecated -failover up \
addif host-ext0 netmask + broadcast + failover up

# /sbin/ifconfig hme1 host-int1 netmask + broadcast + group production deprecated -failover up \
addif host-ext1 netmask + broadcast + failover up
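
The copy-and-paste step can be sketched as a helper that builds the ifconfig command from a hostname.<interface> file. This is an illustration only: ifconfig_cmd_from_hostname_file is a hypothetical function name, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the ifconfig command equivalent to the
# contents of a hostname.<interface> file ($1).
ifconfig_cmd_from_hostname_file() {
    file=$1
    ifname=${file##*/hostname.}     # e.g. /etc/hostname.hme0 -> hme0
    # Join continuation lines: drop trailing backslashes, turn newlines
    # into spaces, then squeeze repeated spaces and trim the end.
    args=$(sed 's/\\$//' "$file" | tr '\n' ' ' | sed 's/  */ /g; s/ $//')
    echo "/sbin/ifconfig $ifname $args"
}

# Usage (not executed here):
#   ifconfig_cmd_from_hostname_file /etc/hostname.hme0
```

Printing the command first gives you a chance to check the interface name and addresses before bringing the group up.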

13. Troubleshooting

Occasionally you will see messages like this in your syslog files:

Nov 29 16:02:10 host.eng.auburn.edu in.mpathd[32]: [ID 398532 daemon.error]
Cannot meet requested failure detection time of 2500 ms on (inet eri0) new
failure detection time is 5922 ms

Nov 29 16:12:29 host.eng.auburn.edu in.mpathd[32]: [ID 122137 daemon.error]
Improved failure detection time 3644 ms

Nov 29 16:12:29 host.eng.auburn.edu in.mpathd[32]: [ID 122137 daemon.error]
Improved failure detection time 2500 ms

I find that they are largely ignorable. Failover still works.

There is a known issue with Solaris 8 IPMP where both interfaces can fail under high load if a particular patch is not installed. A reboot will not fix the situation; you must install patch 108528-15 (or later).

When you have a failure event of some kind, you'll see a message like this:

Nov 21 23:03:58 host.eng.auburn.edu in.mpathd[266]: [ID 832587 daemon.error]
Successfully failed over from NIC eri1 to NIC eri0

When it comes back, you'll see one like this:

Nov 23 15:25:00 host.eng.auburn.edu in.mpathd[266]: [ID 620804 daemon.error]
Successfully failed back to NIC eri0

If you see one like this, it's time to run to the switch closet:

Nov 23 15:23:56 host.eng.auburn.edu in.mpathd[266]: [ID 168056 daemon.error]
All Interfaces in group production have failed
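
The three message types above can be told apart with simple pattern matching, for example when scanning syslog for multipathing events. This is a sketch only: classify_mpathd_msg is a hypothetical function name, and the patterns are taken from the sample messages in this article, so other releases may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: classify one in.mpathd message line ($1) using
# the wording of the sample messages shown above.
classify_mpathd_msg() {
    case "$1" in
        *"Successfully failed over"*) echo "failover" ;;
        *"Successfully failed back"*) echo "failback" ;;
        *"have failed"*)              echo "group-down" ;;
        *)                            echo "other" ;;
    esac
}

classify_mpathd_msg "All Interfaces in group production have failed"
```

Only the last category needs immediate attention; the first two show the group doing its job.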

________________________________________

Take the opportunity to test it out. Unplug one of your Cat5+ cables and watch failover work. Run a continuous ping to the machine. It's rather nice.

________________________________________

Failover with 1 public IP

Now that you know how to set up resilient balancing links, you might be interested in how to set up a group with only 1 public, failover interface.

The advantages of this are

1. Easier debugging - With the previous situation, you would have to snoop on both interfaces and correlate the traffic. With only 1 interface, you snoop in one place and see all traffic.

2. Easier firewalling - With 2 public interfaces, the traffic could initiate from either, possibly making firewalling a bit difficult since the source traffic could change from one IP to the other mid-session.

3. 1 fewer IP consumed.

The following configuration has been tested and submitted by Eric Krohn:

Primary Interface

# cat /etc/hostname.hme0

DUMMY1 netmask + broadcast + \
group production deprecated -failover up \
addif REALNAME netmask + broadcast + failover up

Standby Interface

# cat /etc/hostname.hme1

DUMMY2 netmask + broadcast + \
group production deprecated -failover standby up

/etc/hosts file

# cat /etc/hosts

#

# Internet host table

#

127.0.0.1 localhost

192.168.10.10 REALNAME loghost

192.168.10.11 DUMMY1

192.168.10.12 DUMMY2

#

What does this do? It sets up two dummy (private) IP addresses that are fixed to the interfaces. It sets up a failover group named production. It adds an IP REALNAME to the group and marks it as the failover IP that will be migrated, and hme1 is set as the standby interface. In most situations, hme0 will be used to transmit and receive packets. In the case of failure (interface, switch, cable, router, etc), the IP for REALNAME will migrate to hme1 interface. When hme0 recovers, the IP will migrate back.

Sunday, June 19, 2011

Upgrading System Disk Firmware without Downtime

Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms
***Checked for relevance on 11-May-2011***
Goal
This procedure describes how to upgrade system disk firmware when the disks are mirrored under LVM.
Solution
This procedure was performed on a system controller of a Sun Fire[TM] E20K Server. There are three metamirrors: d10, d20 and d30, each consisting of submirrors named d[1,2,3][1,2]. For example, d11 and d12 are submirrors of d10.

PREPARATION:

1. Use the format command to identify the disks, then search http://support.oracle.com to locate the correct patch to download.

In this example, the disks are 72 GB drives. Each required the patch numbered 116370-XX.

2. Once the patch is located, it should be downloaded to the system where the patch will be applied.

3. Change to the patch directory and work from there. (This is suggested for the convenience of the author rather than for technical reasons.)

4. Determine the metadevices available:

(to be brief only the first metadevice will be shown)

root@ssc1-aopc # metastat
d30: Mirror
    Submirror 1: d32
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 122223936 blocks (58 GB)

d32: Submirror of d30
    State: Okay
    Size: 122223936 blocks (58 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t3d0s7   0            No     Okay   Yes

d31: Submirror of d30
    State: Okay
    Size: 122223936 blocks (58 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t2d0s7   0            No     Okay   Yes

Note: The system is using two physical devices, c0t3d0 and c0t2d0. Also, the other metadevices, d10 and d20, are not displayed.

5. Either of the sub-mirrors may be detached from each mirror, provided that all detached sub-mirrors reside on the same physical device, for example c0t2d0.

root@ssc1-aopc # metadetach d30 d31
root@ssc1-aopc # metadetach d20 d21
root@ssc1-aopc # metadetach d10 d11

This frees the sub-mirrors d11, d21, and d31 from their respective mirrors d10, d20, and d30.
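
The detach commands follow the same d<N>0/d<N>1 naming pattern for every mirror, so they can be generated rather than typed. This sketch assumes that the intended command is metadetach, the Solaris Volume Manager command for detaching a submirror. The function name detach_cmds is hypothetical, and the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print the metadetach commands that free
# submirror d<N><side> from mirror d<N>0.
#   $1 = submirror suffix (1 for the c0t2d0 side in this article)
#   remaining arguments = mirror numbers N
detach_cmds() {
    side=$1
    shift
    for n in "$@"; do
        echo "metadetach d${n}0 d${n}${side}"
    done
}

detach_cmds 1 3 2 1
```

Using one suffix for the whole run helps ensure that every detached sub-mirror sits on the same physical disk.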

6. The final preparation is to remove any metadb's from the system disk slices. As seen below, metadb's exist on slices 4 and 5 of the disk to be patched: c0t2d0.

root@ssc1-aopc # metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t2d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t2d0s4
     a    p  luo        16400           8192            /dev/dsk/c0t2d0s4
     a    p  luo        16              8192            /dev/dsk/c0t2d0s5
     a    p  luo        8208            8192            /dev/dsk/c0t2d0s5
     a    p  luo        16400           8192            /dev/dsk/c0t2d0s5
     a    p  luo        16              8192            /dev/dsk/c0t3d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t3d0s4
     a    p  luo        16400           8192            /dev/dsk/c0t3d0s4
     a    p  luo        16              8192            /dev/dsk/c0t3d0s5
     a    p  luo        8208            8192            /dev/dsk/c0t3d0s5
     a    p  luo        16400           8192            /dev/dsk/c0t3d0s5
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

So, the metadb's must be removed:

root@ssc1-aopc # metadb -d /dev/dsk/c0t2d0s4
root@ssc1-aopc # metadb -d /dev/dsk/c0t2d0s5
root@ssc1-aopc # metadb -i flags first blk block count
a p luo 16 8192 /dev/dsk/c0t3d0s4
a p luo 8208 8192 /dev/dsk/c0t3d0s4
a p luo 16400 8192 /dev/dsk/c0t3d0s4
a p luo 16 8192 /dev/dsk/c0t3d0s5
a p luo 8208 8192 /dev/dsk/c0t3d0s5
a p luo 16400 8192 /dev/dsk/c0t3d0s5
r - replica does not have device relocation information
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors

Now the preparations are complete and we are ready to continue patching the system disk c0t2d0.
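Before starting the download utility, it can help to confirm that no replicas remain on the disk to be patched. The following is only a sketch and not part of the original procedure: it assumes the metadb output format shown above (device path in the last column), and replicas_per_disk is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: count state database replicas per disk.
# Reads metadb output on stdin; assumes the device path is the last field.
replicas_per_disk() {
    awk '$NF ~ /^\/dev\/dsk\// {
             d = $NF
             sub(/^\/dev\/dsk\//, "", d)   # strip the directory
             sub(/s[0-9]+$/, "", d)        # strip the slice suffix
             n[d]++
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Assumed usage on a live system:
#   metadb | replicas_per_disk
```

If c0t2d0 still shows up in the result, a replica is left on that disk and the download utility is likely to refuse to open it.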

PATCHING THE DISK:

1. Run the following command:

root@ssc1-aopc # pwd
/116370-13

2. Execute the download utility:

root@ssc1-aopc # ./download
Firmware Download Utility, V4.3.1
************************** WARNING **************************
NO OTHER ACTIVITY IS ALLOWED DURING FIRMWARE UPGRADE!!! No other programs
including any volume manager (e.g. Veritas, SDS, or Vold) should be running.
Other host systems sharing any I/O bus with this host
must either be offline or disconnected. Any interruption (e.g. power loss)
during upgrade can result in damage to devices being upgraded. Any disk
to be upgraded should first have its data backed up.
***************************************************************
Searching for devices...

DISK DEVICES
Device Rev Product
c0t2d0-c0t3d0: 0507 ST373307L -- SUN72G
Total Devices: 2

****************
Although two devices are shown as requiring this patch, only c0t2d0 will be upgraded.

HELP will list the commands available in the utility.
****************

Enter command: help

Examples:
program - program all devices
program disk - program all disk devices
c1 - program disk devices on c1
c1t2d0 - program disk c1t2d0
c1t2d0-c2t3d0 - program disks c1t2d0-c2t3d0
c1t2d0- - program disks from c1t2d0
rmt/0 - program tape rmt/0
inquiry - display device inquiry data
list disk - display only disk devices
list ? - display help for list command
quit - exit

Commands:
program [options] - program devices
list [options] - list devices
inquiry [options] - list device inquiry data
exclude [device] - exclude devices
include [device] - include devices
mode select [options] - reset all mode page values

Device/Options:
disk - disk devices all - all devices
cd - cd devices ata - ATA devices
tape - tape devices scsi - FC-AL/SCSI devices
? - command help rsm - A1000/3000 devices

*************
We wish to exclude the system disk that is currently running the system: c0t3d0s0. Recall that all slices of c0t2d0 were detached.
*************

Enter command: exclude c0t3d0
c0t3d0: device excluded

*************
Update the firmware on the drive
*************

Enter command: program
Upgrading devices...
c0t2d0s4: Open failed, you must be the sole user! <--------- if you see this, check for more metadbs.

If the error does not occur, you should see:

c0t2d0: Successful download
c0: recovery delay, 45 sec.

**************
Listing the drives remaining to be updated
**************

Enter command: list

DISK DEVICES
Device Rev Product
c0t3d0: ST373307L -- SUN72G
Total Devices: 1

Enter command: quit

*************
At this point the firmware on c0t2d0 has been upgraded. The metadbs must be recreated, and the sub-mirrors d11, d21 and d31 must be reattached and allowed to resync. The procedure may then be safely repeated for the drive c0t3d0 while running on c0t2d0.
**************

RECREATING THE METADBs

1. Enter the commands:

root@ssc1-aopc # metadb -c 3 -a /dev/dsk/c0t2d0s4
root@ssc1-aopc # metadb -c 3 -a /dev/dsk/c0t2d0s5

To verify that the metadbs have been created, enter:

root@ssc1-aopc # metadb -i
flags first blk block count
a u 16 8192 /dev/dsk/c0t2d0s4
a u 8208 8192 /dev/dsk/c0t2d0s4
a u 16400 8192 /dev/dsk/c0t2d0s4
a u 16 8192 /dev/dsk/c0t2d0s5
a u 8208 8192 /dev/dsk/c0t2d0s5
a u 16400 8192 /dev/dsk/c0t2d0s5
a p luo 16 8192 /dev/dsk/c0t3d0s4
a p luo 8208 8192 /dev/dsk/c0t3d0s4
a p luo 16400 8192 /dev/dsk/c0t3d0s4
a p luo 16 8192 /dev/dsk/c0t3d0s5
a p luo 8208 8192 /dev/dsk/c0t3d0s5
a p luo 16400 8192 /dev/dsk/c0t3d0s5
r - replica does not have device relocation information
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors

REATTACHING THE SUB-MIRRORS

1. Enter the following commands:

root@ssc1-aopc # metattach d10 d11
d10: submirror d11 is attached
root@ssc1-aopc # metattach d20 d21
d20: submirror d21 is attached
root@ssc1-aopc # metattach d30 d31
d30: submirror d31 is attached

2. Verify that the sub-mirrors are resyncing.

root@ssc1-aopc # metastat
Feb 9 15:02:56 ssc1-aopc scman: NOTICE: scman1 Link up
d30: Mirror
Submirror 0: d31
State: Resyncing
Submirror 1: d32
State: Okay
Resync in progress: 0 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 122223936 blocks (58 GB)

d31: Submirror of d30
State: Resyncing
Size: 122223936 blocks (58 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t2d0s7 0 No Okay Yes

d32: Submirror of d30
State: Okay
Size: 122223936 blocks (58 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t3d0s7 0 No Okay Yes

*************
Again, the displays for d20 and d10 are not shown.
*************
To periodically check if the syncing has completed, run the command:

root@ssc1-aopc # metastat|grep %

Resync in progress: 0 % done
Resync in progress: 23 % done
Resync in progress: 6 % done

When nothing is returned, the resync of all sub-mirrors has completed. You may then repeat this procedure to upgrade the remaining system disk.
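The periodic check can also be left to a small loop. This is a sketch rather than part of the original procedure: resync_pending and wait_for_resync are hypothetical names, and the 60-second default interval is an arbitrary assumption.

```sh
#!/bin/sh
# Succeeds (exit 0) while metastat output on stdin still reports a resync.
resync_pending() {
    grep 'Resync in progress' >/dev/null
}

# Poll metastat until no sub-mirror reports a resync.
# First argument: seconds between checks (assumed default: 60).
wait_for_resync() {
    interval=${1:-60}
    while metastat | resync_pending; do
        sleep "$interval"
    done
    echo "resync complete"
}
```

Note that the loop also ends if metastat itself fails to run, so it is worth glancing at the full metastat output once before moving on to the second disk.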

IMPORTANT!!! When you rerun the ./download command to upgrade the remaining disk, you must use the command include c0t3d0 (in this example) in order to see and program the disk.

Friday, June 17, 2011

Mirrored root disk on Solaris

1. Partition the first disk
# format c0t0d0
Use the partition tool (=> "p , p "!) to set up the slices. We assume the following slice layout afterwards:
# Tag Flag Cylinders Size Blocks
- ---------- ---- ------------- -------- --------------------
0 root wm 0 - 812 400.15MB (813/0/0) 819504
1 swap wu 813 - 1333 256.43MB (521/0/0) 525168
2 backup wm 0 - 17659 8.49GB (17660/0/0) 17801280
3 unassigned wm 1334 - 1354 10.34MB (21/0/0) 21168
4 var wm 1355 - 8522 3.45GB (7168/0/0) 7225344
5 usr wm 8523 - 14764 3.00GB (6242/0/0) 6291936
6 unassigned wm 14765 - 16845 1.00GB (2081/0/0) 2097648
7 home wm 16846 - 17659 400.15MB (813/0/0) 819504
2. Copy the partition table of the first disk to its future mirror disk
# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
3. Create at least two state database replicas on each disk
# metadb -a -f -c 2 c0t0d0s3 c0t1d0s3
Check the state of all replicas with metadb:
# metadb
Notes:
A state database replica contains configuration and state information about the meta devices. Make sure that always at least 50% of the replicas are active!

4. Create the root slice mirror and its first submirror
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d30 -m d10
Run metaroot to prepare /etc/vfstab and /etc/system (do this only for the root slice!):
# metaroot d30
5. Create the swap slice mirror and its first submirror
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d31 -m d11
6. Create the var slice mirror and its first submirror
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d34 -m d14
7. Create the usr slice mirror and its first submirror
# metainit -f d15 1 1 c0t0d0s5
# metainit -f d25 1 1 c0t1d0s5
# metainit d35 -m d15
8. Create the unassigned slice mirror and its first submirror
# metainit -f d16 1 1 c0t0d0s6
# metainit -f d26 1 1 c0t1d0s6
# metainit d36 -m d16
9. Create the home slice mirror and its first submirror
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d37 -m d17
10. Edit /etc/vfstab to mount all mirrors after boot, including mirrored swap

/etc/vfstab before changes:
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c0t0d0s1 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /usr ufs 1 no ro,logging
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no nosuid,logging
/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /home ufs 2 yes nosuid,logging
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -
/etc/vfstab after changes:
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d31 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/md/dsk/d35 /dev/md/rdsk/d35 /usr ufs 1 no ro,logging
/dev/md/dsk/d34 /dev/md/rdsk/d34 /var ufs 1 no nosuid,logging
/dev/md/dsk/d37 /dev/md/rdsk/d37 /home ufs 2 yes nosuid,logging
/dev/md/dsk/d36 /dev/md/rdsk/d36 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -
Notes:
The entry for the root device (/) has already been altered by the metaroot command we executed before.

11. Reboot the system
# lockfs -fa && init 6
12. Attach the second submirrors to all mirrors
# metattach d30 d20
# metattach d31 d21
# metattach d34 d24
# metattach d35 d25
# metattach d36 d26
# metattach d37 d27
Notes:
This will finally cause the data from the boot disk to be synchronized with the mirror drive.
You can use metastat to track the mirroring progress.

13. Change the crash dump device to the swap metadevice
# dumpadm -d `swap -l | tail -1 | awk '{print $1}'`
14. Make the mirror disk bootable
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
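
Steps 4 to 9 repeat the same three metainit commands for each slice, so they can be generated rather than typed. The sketch below only prints the commands. print_mirror_cmds is a hypothetical helper, and it assumes the numbering used above (d1N and d2N for the submirrors, d3N for the mirror, where N is the slice number).

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the metainit commands for one slice.
# d1N = submirror on the first disk, d2N = submirror on the second disk,
# d3N = the one-way mirror built on d1N.
print_mirror_cmds() {
    disk1=$1
    disk2=$2
    slice=$3
    echo "metainit -f d1$slice 1 1 ${disk1}s$slice"
    echo "metainit -f d2$slice 1 1 ${disk2}s$slice"
    echo "metainit d3$slice -m d1$slice"
}

# Slices from the example layout: root, swap, var, usr, unassigned, home.
for s in 0 1 4 5 6 7; do
    print_mirror_cmds c0t0d0 c0t1d0 "$s"
done
```

Review the printed commands before running them. metaroot is still needed for the root mirror only, and the second submirrors are attached after the reboot as in step 12.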

Solaris Tips

Mostly Solaris 10

Stopping console break
edit /etc/default/kbd

and uncomment the KEYBOARD_ABORT=disable line

or add that line if it is missing

then activate with kbd -i

Shutting down a zone
do not do a zoneadm -z zonename halt - it's nasty

zlogin zonename init 0

Starting a zone
zoneadm -z zonename boot

Zone status
zoneadm list -cv

Adding filesystems to a zone
from the GZ
root@pbigz00 [root] # zonecfg -z pbidb01
zonecfg:pbidb01> add fs
zonecfg:pbidb01:fs> set dir=/optware/oracle
zonecfg:pbidb01:fs> set special=/dev/md/dsk/d52
zonecfg:pbidb01:fs> set raw=/dev/md/rdsk/d52
zonecfg:pbidb01:fs> set type=ufs
zonecfg:pbidb01:fs> add options [nodevices,logging]
zonecfg:pbidb01:fs> end
zonecfg:pbidb01> verify
zonecfg:pbidb01> commit

/optware/oracle will be pulled into the zone
note you cannot mount the filesystem in the global zone at the same time

Removing a filesystem from a zone
zonecfg:pbidb01> remove fs dir=/optware

Dumping a zone config to a text file
zonecfg -z pbidb01 info
zonename: pbidb01
zonepath: /export/zones/pbidb01
autoboot: true
pool:
limitpriv:
fs:
dir: /optware/oracle
special: /dev/md/dsk/d52
raw: /dev/md/rdsk/d52
type: ufs
options: [nodevices,logging]
fs:
dir: /oradata/ICPB1
special: /dev/md/dsk/d53
raw: /dev/md/rdsk/d53
type: ufs
options: [nodevices,logging]
fs:
dir: /oradata/BIEMEA
special: /dev/md/dsk/d55
raw: /dev/md/rdsk/d55
type: ufs
options: [nodevices,logging]
fs:
dir: /oradata/POEUI
special: /dev/md/dsk/d56
raw: /dev/md/rdsk/d56
type: ufs
options: [nodevices,logging]
fs:
dir: /oradata/POEUI3
special: /dev/md/dsk/d57
raw: /dev/md/rdsk/d57
type: ufs
options: [nodevices,logging]
fs:
dir: /oradata/exp
special: /dev/md/dsk/d54
raw: /dev/md/rdsk/d54
type: ufs
options: [nodevices,logging]
net:
address: 1.2.3.4
physical: eri0
device:
match: /dev/tlock
rctl:
name: zone.cpu-shares
value: (priv=privileged,limit=20,action=none)
attr:
name: comment
type: string
value: "Zone pbidb01 created by mkzone"

Login into zones
ssh as normal

or if you cannot
as root :
zlogin zonename

will give you root on the zone

Console login to a zone
zlogin -C zonename

Run a command on a zone
zlogin zonename "command"

Growing a zone exported filesystem (sds)
shut down the zone
double check the metadevice that the file system lives on in the global zone
check for space on the parent device

metastat d50

d50: Soft Partition
Device: d100
State: Okay
Size: 125829120 blocks (60 GB)
Extent Start Block Block count
0 20384 125829120

d100: Mirror
Submirror 0: d101
State: Okay
Submirror 1: d102
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 1432373760 blocks (683 GB)

now we know it's part of d100

check the unallocated space of d100

metarecover -v -n /dev/md/rdsk/d100 -p
Verifying on-disk structures on d100.
The following extent headers were found on d100.
Name Seq# Type Offset Length
d50 0 ALLOC 20383 125829121
d51 0 ALLOC 125849535 125829121
d52 0 ALLOC 251678687 41943041
d53 0 ALLOC 293621759 387973121
d54 0 ALLOC 681594911 104857601
d55 0 ALLOC 786452543 136314881
d56 0 ALLOC 922767455 272629761
d56 1 ALLOC 1363169439 68132865
d57 0 ALLOC 1195397247 167772161
NONE 0 END 1432373759 1
NONE 0 FREE 125829121 125829121
there are 125829121 blocks of free space! (60 GB)
now increase the metadevice
metattach d50 60g
metastat d50

d50: Soft Partition
Device: d100
State: Okay
Size: 251658242 blocks (120 GB)
Extent Start Block Block count
0 20384 125829121
1 125829121 125829121

now growfs the softpartition

growfs /dev/md/rdsk/d50
(don't panic, it just looks like a newfs!)

test mount on global zone , umount and restart zone with new volume size
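
The free-space check can be done without mental arithmetic. This is a sketch with hypothetical helper names (blocks_to_gb, free_blocks). It assumes 512-byte blocks and the metarecover column layout shown above (Name, Seq#, Type, Offset, Length).

```sh
#!/bin/sh
# Convert a count of 512-byte blocks to whole gigabytes (rounded down).
# 2097152 = 1024 * 1024 * 1024 / 512
blocks_to_gb() {
    echo $(( $1 / 2097152 ))
}

# Sum the Length column of all FREE extents in metarecover output (stdin).
free_blocks() {
    awk '$3 == "FREE" { sum += $5 } END { printf "%.0f\n", sum }'
}

# Assumed usage on a live system:
#   blocks_to_gb "`metarecover -v -n /dev/md/rdsk/d100 -p | free_blocks`"
```

With the figures from the listing above, blocks_to_gb 125829121 gives 60, which matches the 60g passed to metattach.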

Metasets
Normal meta commands will function as expected but need to be called with -s setname to operate on a metaset
Eg: metastat -s setname

Create metaset (you can have 4 hosts)

metaset -s setname -a -h hostname

add disks to metaset

metaset -s setname -a ….

At this point normal SDS commands can be used to create volumes on the metasets

Eg: metainit -s setname dXXX 1 1 cXtXdX cYtYdY

Failover
Failover from one node to another is done by releasing the set on one host and taking it on the other
On host 1
metaset -s setname -r

On host 2
metaset -s setname -t

Dtrace stuff

Dtrace - watch new processes and print full args
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

New command options under Sol 10
prstat -Z - show zone info
df -h - show df in more human readable form

Enable Resource Pools
/usr/sbin/svcadm enable svc:/system/pools:default

Display current pool settings
poolcfg -dc info

system default
string system.comment
int system.version 1
boolean system.bind-default true

pool pool_default
int pool.sys_id 0
boolean pool.active true
boolean pool.default true
int pool.importance 1
string pool.comment
pset pset_default

pset pset_default
int pset.sys_id -1
boolean pset.default true
uint pset.min 1
uint pset.max 65536
string pset.units population
uint pset.load 93
uint pset.size 4
string pset.comment

cpu
int cpu.sys_id 1
string cpu.comment
string cpu.status on-line

cpu
int cpu.sys_id 0
string cpu.comment
string cpu.status on-line

cpu
int cpu.sys_id 3
string cpu.comment
string cpu.status on-line

cpu
int cpu.sys_id 2
string cpu.comment
string cpu.status on-line

Reset Pools to default
/usr/sbin/pooladm -x

Hack Hostid on the fly

get the target host id :

/usr/bin/adb -w -k /dev/ksyms /dev/mem

hw_serial /D
hw_serial+4 /D
hw_serial+8 /D

/usr/bin/adb -w -k /dev/ksyms /dev/mem << EOF
hw_serial /W 32312539
hw_serial+4 /W 30355635
hw_serial+8 /W 39344400
EOF

defeats flexlm ! and others !

E450 disk to slot map
prtdiag -v shows this

Front Status Panel:
-------------------
Keyswitch position is in On mode.

System LED Status: POWER GENERAL ERROR ACTIVITY
[ ON] [OFF] [ ON]
DISK ERROR THERMAL ERROR POWER SUPPLY ERROR
[OFF] [OFF] [OFF]

Disk LED Status: OK = GREEN ERROR = YELLOW
DISK 18: [EMPTY] DISK 19: [EMPTY]
DISK 16: [EMPTY] DISK 17: [EMPTY]
DISK 14: [EMPTY] DISK 15: [EMPTY]
DISK 12: [EMPTY] DISK 13: [EMPTY]
DISK 10: [EMPTY] DISK 11: [EMPTY]
DISK 8: [EMPTY] DISK 9: [EMPTY]
DISK 6: [EMPTY] DISK 7: [EMPTY]
DISK 4: [EMPTY] DISK 5: [EMPTY]
DISK 2: [OK] DISK 3: [OK]
DISK 0: [OK] DISK 1: [OK]

But you know you have at least 12 internal disks in your e450 !!!!

Sun E450 owners manual
At the ok prompt enter the following command:

setenv disk-led-assoc 0 x y

where:
x is an integer between 1 and 10 identifying the PCI slot number where the lower UltraSCSI controller is installed
y is an integer between 1 and 10 identifying the PCI slot number where the upper UltraSCSI controller is installed

For example, if the controller cards are installed in PCI slots 5 and 7, enter the following:

setenv disk-led-assoc 0 5 7

then reset system for changes to take effect

Sendmail - without daemon
echo "MODE=" > /etc/default/sendmail

then bounce sendmail , you will now notice it's got no -bd in the args

now we need to alter the submit.cf , best to use the macros lest it get clobbered with a patch

perl -p -i*.bak -e 's/127.0.0.1/mailhost/g' /usr/lib/mail/cf/submit.mc
(assuming mailhost is the name of your smtp server)

Compile the new submit.cf file
cd /usr/lib/mail/cf
/usr/ccs/bin/m4 ../m4/cf.m4 submit.mc > submit.cf
Copy this new submit.cf file into place
cp /usr/lib/mail/cf/submit.cf /etc/mail/submit.cf

Remember every time you apply a sendmail patch on this machine, rebuild the submit.cf file.
- caveat -
only tested on Solaris 9 ;) .... YMMV

New (Since 10U2) networking commands

## show interface , link and status
# dladm show-dev
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: unknown speed: 0 Mbps duplex: unknown
e1000g2 link: unknown speed: 0 Mbps duplex: unknown
e1000g3 link: unknown speed: 0 Mbps duplex: unknown

# dladm show-link
e1000g0 type: non-vlan mtu: 1500 device: e1000g0
e1000g1 type: non-vlan mtu: 1500 device: e1000g1
e1000g2 type: non-vlan mtu: 1500 device: e1000g2
e1000g3 type: non-vlan mtu: 1500 device: e1000g3

Link aggregation with solaris 10
prior to Sol10 U2? you had to buy suntrunking to trunk interfaces

unplumb the e1000g0 we used to build the system
# ifconfig e1000g0 unplumb
also the ipv6 version if used
# ifconfig e1000g0 inet6 unplumb

# dladm create-aggr -d e1000g0 -d e1000g1 1
# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:21:28:f:be:7a (auto)
device address speed duplex link state
e1000g0 0:21:28:f:bb:7a 1000 Mbps full up standby
e1000g1 0:21:28:f:bb:7b 1000 Mbps full up standby
# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
lo0: flags=2002000849 mtu 8252 index 1
inet6 ::1/128

now plumb the aggr interface

# ifconfig aggr1 plumb
# ifconfig aggr1 192.168.1.2 netmask 255.255.255.0

setup a hostname.aggr1 file to ensure the interfaces are setup on boot

Convert SDxx to cxtxdxsx paths
annoying when you have many disks and the san guys add more and you're hit with a deluge of these messages:

Sep 10 13:22:48 blue Corrupt label; wrong magic number

Sep 10 13:22:48 blue scsi: [ID 107833 kern.warning] WARNING: /pci@0/pci@0/pci@8/pci@0/pci@a/SUNW,qlc@0,1/sd@0,b (sd223):

Sep 10 13:22:48 blue Corrupt label; wrong magic number

Sep 10 13:22:48 blue scsi: [ID 107833 kern.warning] WARNING: /pci@0/pci@0/pci@8/pci@0/pci@a/SUNW,qlc@0,1/sd@0,b (sd223):

the following script helps map the SDxx instance to scsi hardware location

#!/bin/sh
#
# @(#) wd 1.1 99/03/09
#
# Used to convert sd instance numbers (eg. sd20) to SCSI hardware location
# (eg. c1t5d0s0) or the other way around. Can also list all such devices
# by using the -all option.
#
# Credits:
# hw2inst() function based on "phylog" script by Nick Hindley
# inst2hw() function based on "whatdev" script from Sun Microsystems
#
# Revision History
# 1.1 1999-03-09 Mike van der Velden
# Original Version. Converts sdxx to cxdxtxsx and back.
# Optionally lists all such devices.
#
# Feedback, bug fixes, enhancements? Send to mvanderv@yahoo.com
#

inst2hw ()
{
# Convert the sd instance number into the SCSI hardware location.
# Grabbed from the "phylog" script by Nick Hindley, 1998-08-04

DEVNAME=$1
TYPE=`echo $DEVNAME | cut -c 1-2`
NUM=`echo $DEVNAME | cut -c3-255`

DEVPATH=`sed 's/"//g' /etc/path_to_inst | \
nawk -v type=$TYPE -v num=$NUM \
'{if (($2==num) && ($3==type)) print $1;}'`

if [ -z "$DEVPATH" ]; then
echo "No such device $DEVNAME"
exit 2
fi

# now get all the devices out of /dev.
# No way that I know of to map this back.

for p in /dev/dsk /dev/rdsk /dev/rmt /dev/osa/dev/dsk /dev/osa/dev/rdsk;
do
if [ -d $p ]; then
DEV=`ls -l $p | \
nawk -v device=$DEVPATH \
'{if ($NF ~ device) {print $(NF - 2);exit;}}'`
if [ ! -z "$DEV" ]; then
# still need work on the st/rmt devices, which currently
# prints out as a simple number, not as, say rmt/0.
echo $DEV
break
fi
fi
done
}

hw2inst ()
{
# Convert the SCSI hardware location into the sd instance number.
# From the script "whatdev" from the Solaris 2.X on Sun Hardware
# Answerbook (or http://docs.sun.com)

devname=$1

for p in /dev /dev/osa/dev/dsk /dev/osa/dev/rdsk /dev/dsk /dev/rdsk /dev/rmt;
do
if [ -h $p/$devname ]; then
DEVPATH=$p/$devname
break
fi
done

if [ -z "$DEVPATH" ]; then
echo "No such device $devname"
exit 2
fi

# print out the drive name - st0 or sd0 - given the /dev entry
# first get something like "/iommu/.../.../sd@0,0"
DEV=`/bin/ls -l $DEVPATH | \
nawk '{ n = split($11, a, "/"); split(a[n],b,":"); \
for(i = 4; i < n; i++) printf("/%s",a[i]); \
printf("/%s\n", b[1]) }'`

if [ ! -z "$DEV" ]; then
# get the instance number and concatenate with the "sd"
nawk -v dev=$DEV \
'$1 ~ dev { n = split(dev, a, "/"); split(a[n], b, "@"); \
printf("%s%s\n", b[1], $2) }' /etc/path_to_inst
fi
}

###############################################################
#
# MAIN
#

USAGE="$0 | -all"

# "verbose" is an unadvertised option, useful for debugging
if [ "$1" = "-v" ]; then
set -x
shift
fi

if [ -z "$1" ]; then
echo "Usage: $USAGE"
exit 1
fi

case $1 in

s*) # make sure slice number is *not* part of the name
DEVNAME=`echo $1 | sed "s/\(s[dt][0-9]*\)[a-h]$/\1/"`
inst2hw $DEVNAME
;;

c*) # make sure slice number *is* part of the name
DEVNAME=`echo $1 | sed "s/\(c[0-9]t[0-9]d[0-9]\)$/\1s0/"`
hw2inst $DEVNAME
;;

-all) if [ -d /dev/osa ]; then
PREFIX="/dev/osa"
fi
for d in ${PREFIX}/dev/rdsk/c?t?d?s0; do
DEVNAME=`basename $d`
printf "%s --- " $DEVNAME
hw2inst $DEVNAME
done
;;

esac

exit 0

Stopping Break on a SUN
Power-switch key method:
On Enterprise-type Suns, the power switch has four positions: off, on, diagnostic and secure. With the power switch in the secure position, the system ignores BREAKs generated by keyboard reconnect, serial terminal loss, Stop-a or a serial terminal BREAK key.

Command method:
In /etc/default/kbd, add the variable KEYBOARD_ABORT=disable then use the command kbd -i
which reads /etc/default/kbd and disables keyboard abort.
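
The command method can be scripted so that it is safe to run more than once. The following is a sketch only: set_keyboard_abort_disable is a hypothetical helper, and it assumes that any existing KEYBOARD_ABORT line (commented out or not) may be dropped and replaced by a single active one.

```sh
#!/bin/sh
# Hypothetical helper: make sure the given file (default /etc/default/kbd)
# ends up with exactly one active KEYBOARD_ABORT=disable line.
set_keyboard_abort_disable() {
    kbd_file=${1:-/etc/default/kbd}
    tmp="$kbd_file.new.$$"
    # Drop any existing KEYBOARD_ABORT line, commented out or not.
    grep -v '^#*KEYBOARD_ABORT=' "$kbd_file" > "$tmp"
    echo 'KEYBOARD_ABORT=disable' >> "$tmp"
    # Overwrite in place so the original owner and permissions are kept.
    cat "$tmp" > "$kbd_file" && rm -f "$tmp"
}

# Assumed usage as root:
#   set_keyboard_abort_disable && kbd -i
```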

Playing With Tapes

cfgadm -al
cfgadm -c configure c3
cfgadm -o show_FCP_dev -al
cfgadm -o show_SCSI_LUN -al
cfgadm -o unusable_FCP_dev -c unconfigure c4::2101001b32b4eb6c
cfgadm -c unconfigure c4::2101001b32b4eb6c
cfgadm -alo show_FCP_dev

luxadm -e port
luxadm -e forcelip /dev/cfg/c4
luxadm -e dump_map /dev/cfg/c4
devfsadm -C

# luxadm probe
# luxadm qlgc >>show Sun/Qlogic HBA’s

Sun Volume Manager

SVM / DS / Solstice Disk Suite or what ever sun call the other disk manager technology today :)

Replacing a failed disk - (old one still online and readable)
capture the vtoc from the disk:

prtvtoc /dev/rdsk/c4t0d0s2 > savedisk.vtoc

now check if it's part of the db:

metadb

metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c2t0d0s7
a p luo 16 1034 /dev/dsk/c3t2d0s7
a p luo 16 1034 /dev/dsk/c3t3d0s7
a p luo 16 1034 /dev/dsk/c4t0d0s7
a p luo 16 1034 /dev/dsk/c4t1d0s7
a p luo 16 1034 /dev/dsk/c4t2d0s7
a p luo 16 1034 /dev/dsk/c4t3d0s7

ok it is so lets delete it:

metadb -d /dev/rdsk/c4t0d0s6

replace failed mech

import vtoc from saved file :

cat savedisk.vtoc|fmthard -s - /dev/rdsk/c4t0d0s2

recreate metadb (this option is for 2 copies on slice 6 , may not be required if you have more than 3 disks in the metadb - you can get around this with

metadb -a -c 2 /dev/dsk/c4t0d0s6

find the degraded mirror:

metastat|grep 'metareplace'
Invoke: metareplace d5 c4t2d0s0

Now replace and resilver the mirrors:

metareplace -e d5 c4t2d0s0

Disk Replacement of dead mech

mostly the same as before but you need to force it

rebuild the vtoc (naturally you will have saved all the vtocs of the disks to a safe place ;-) )

yank the disk out of the metadb

metadb -f -d /dev/rdsk/cXtYdZs6

metareplace -f d4 c4t3d0s0

Avoid database replica issues (more of value on root mirrors)

echo "set md:mirrored_root_flag=1" >> /etc/system

Corrupt boot block
boot from the mirror and do (or cd/net)

/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/cXtYdZs0

Bogus Svm needs maintenance printed on metastat output

Sometimes if the svm sync has not been started correctly you will see

metastat d50
d50: Mirror
Submirror 0: d51
State: Needs maintenance
Submirror 1: d52
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8405376 blocks (4.0 GB)

d51: Submirror of d50
State: Needs maintenance
Invoke: metareplace d50 c1t2d0s0
Size: 8405376 blocks (4.0 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t2d0s0 0 No Maintenance Yes

but iostat shows up clean !!

(you can use the format command to do a non destructive test if you like)

once tested or trusted ;-)

simply run svm sync - /etc/rc.local for older svm or /etc/rc2.d/svm.sync for newer versions

followed by

metareplace -e d50 cXtXdXsX to resync the mirrors.

metastat d50
d50: Mirror
Submirror 0: d51
State: Okay
Submirror 1: d52
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 8405376 blocks (4.0 GB)
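
To see at a glance which components metastat wants repaired, the "Invoke:" hints can be pulled out of its output. This is a sketch, with suggested_repairs as a hypothetical helper name. It relies only on the "Invoke:" lines shown in the listings above.

```sh
#!/bin/sh
# Print the repair command metastat suggests for each component that
# needs maintenance. Reads metastat output on stdin.
suggested_repairs() {
    sed -n 's/^.*Invoke: *//p'
}

# Assumed usage on a live system:
#   metastat | suggested_repairs
```

An empty result means metastat is not suggesting any repair. The printed commands are hints only, so add -e yourself when re-enabling the same slice as described above.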

http://dlc.sun.com/osol/docs/content/LOGVOLMGRADMIN/tasks-state-db-replicas-11.html#troubleshoottasks-31036

5120's & LDOMs

New Stuff from sun with T5120 T5140's

Power on via the ilom

start /SYS

then switch to the console

start /SP/console

Default ilom password
root , changeme

Changing the ilom default password
set /SP/users/root password

Enter new Password: *******

Reset machine back to factory defaults
with a running machine
ldm set-spconfig factory-default
or from the service processor
bootmode config="factory-default"

Enable/ Disable SSH
set /SP/services/ssh state=[enabled|disabled]

Display the active ILOM sessions
show /SP/sessions

Display information about commands
show /SP/cli/commands

Add a local user
create /SP/users/bob password=password role=administrator|operator

Delete a local user
delete /SP/users/fred

change the ip address to static

cd /SP/network
set pendingipdiscovery=static
set pendingipaddress=xxx.xxx.xxx.xxx
set pendingipnetmask=yyy.yyy.yyy.yyy
set pendingipgateway=zzz.zzz.zzz.zzz

show to verify settings

set commitpending=true

Enable web interface
set /SP/services/http state=enabled

Reset Service Processor
reset /SP

Boot System
start /SYS
(reset /SYS - for a vulgar system reset)

How do i break into the obp on a 5120

set /HOST send_break_action=break
start /SP/console
r)eboot, o)k prompt, h)alt?
o

- why , every domain gets its own obp

the older revision procedure is to send a break , select r to reboot , then send another break to get to the obp for that ldom

one other way
set /HOST/bootmode script="setenv auto-boot? false"
then init 0 or reset /SYS

Jumpstart from the control domain ldom

so you have built your jet/jumpstart server and your first ldom host , and try to boot net install

but all you get is arp timeouts

you even snoop your interfaces looking for packets ... but still nothing!

ok here is the fix

(my example uses two interfaces)

first you need to use other interfaces than your e1000g0 g1 etc

remove your vsw interfaces from primary control domain (if configured)

then examine the real mac addresses for your interfaces

-bash-3.00# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000

e1000g0: flags=1000843 mtu 1500 index 2
inet xx.yy.zz.40 netmask fffff000 broadcast x.x.x.x
ether xx:xx:xx:xx:be:7b

e1000g1: flags=1000843 mtu 1500 index 3
inet zz.yy.zz.40 netmask fffff000 broadcast x.x.x.x
ether xx:xx:xx:xx:be:7a

now unplumb both interfaces

-bash-3.00# ifconfig e1000g0 unplumb

-bash-3.00# ifconfig e1000g1 unplumb

now configure the virtual switch interfaces

-bash-3.00# ldm add-vsw mac-addr=xx:xx:xx:xx:be:7b net-dev=e1000g0 vsw0 primary
-bash-3.00# ldm add-vsw mac-addr=xx:xx:xx:xx:be:7a net-dev=e1000g1 vsw1 primary

now ifconfig vsw0 and vsw1

-bash-3.00# ifconfig vsw0 plumb

-bash-3.00# ifconfig vsw1 plumb

and ifconfig your interfaces as you had them before

-bash-3.00# ifconfig vsw0 up xx.yy.zz.40 netmask + broadcast +

-bash-3.00# ifconfig vsw1 up zz.yy.zz.40 netmask + broadcast +

now ifconfig will show :

-bash-3.00# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000

vsw0: flags=1000843 mtu 1500 index 2
inet xx.yy.zz.40 netmask fffff000 broadcast x.x.x.x
ether xx:xx:xx:xx:be:7b

vsw1: flags=1000843 mtu 1500 index 3
inet zz.yy.zz.40 netmask fffff000 broadcast x.x.x.x
ether xx:xx:xx:xx:be:7a

rename your interface files from the old e1000g names so your changes survive a reboot

-bash-3.00# mv /etc/hostname.e1000g0 /etc/hostname.vsw0

-bash-3.00# mv /etc/hostname.e1000g1 /etc/hostname.vsw1

you should now be able to jumpstart from your control domain

these steps are not required if your jumpstart is remote from the ldom host

Set up the hardware mirror (t2000 & t5120)
think really carefully before you do this ... if you dont leave a slice for liveupgrade you will not be able to split the mirror for patching or os upgrade

(best done from cdrom/net boot as you get into the well known catch 22 problem)

/usr/sbin/raidctl [-f] -c cXtXdX cYtYdY

# raidctl
Controller: 1
Disk: 0.0.0
Disk: 0.1.0
# format < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@0/pci@0/pci@2/scsi@0/sd@0,0
1. c1t1d0
/pci@0/pci@0/pci@2/scsi@0/sd@1,0
Specify disk (enter its number):
# raidctl -c c1t0d0 c1t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 1 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c1t0d0 is created successfully!
# raidctl
Controller: 1
Volume:c1t0d0
Disk: 0.0.0
Disk: 0.1.0

List Ldoms
# ./ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-c- SP 32 8064M 0.0% 3m

Setup ldom default service

# /opt/SUNWldm/bin/ldm add-vds primary-vds0 primary
# /opt/SUNWldm/bin/ldm add-vcc port-range=5000-5100 primary-vcc0 primary
# /opt/SUNWldm/bin/ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
# /opt/SUNWldm/bin/ldm set-mau 1 primary
# /opt/SUNWldm/bin/ldm set-vcpu 4 primary

List ldom services

# /opt/SUNWldm/bin/ldm list-services primary
VCC
NAME PORT-RANGE
primary-vcc0 5000-5100

VSW
NAME MAC NET-DEV DEVICE MODE
primary-vsw0 00:14:4f:fb:37:32 e1000g0 switch@0

VDS
NAME VOLUME OPTIONS DEVICE
primary-vds0

Listing ldom sp config

# /opt/SUNWldm/bin/ldm list-spconfig
factory-default [current]

Saving sp config
# /opt/SUNWldm/bin/ldm add-spconfig initial
# /opt/SUNWldm/bin/ldm list-spconfig
factory-default [current]
initial [next]

(remember to reboot if you have made changes)

List ldom service bindings
bash-3.00# ./ldm list-bindings primary
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv SP 4 1G 0.0% 3m

MAC
00:21:28:ZZ:YY:XX

VCPU
VID PID UTIL STRAND
0 0 0.0% 100%
1 1 0.0% 100%
2 2 0.0% 100%
3 3 0.0% 100%

MAU
ID CPUSET
0 (0, 1, 2, 3, 4, 5, 6, 7)

MEMORY
RA PA SIZE
0x8000000 0x8000000 1G

VARIABLES
auto-boot?=false
boot-device=/pci@0/pci@0/pci@2/scsi@0/disk@0,0:a disk net
keyboard-layout=US-English

IO
DEVICE PSEUDONYM OPTIONS
pci@0 pci
niu@80 niu

VCC
NAME PORT-RANGE
primary-vcc0 5000-5100

VSW
NAME MAC NET-DEV DEVICE MODE
primary-vsw0 00:14:4f:fb:37:32 e1000g0 switch@0

VDS
NAME VOLUME OPTIONS DEVICE
primary-vds0

VCONS
NAME SERVICE PORT
SP

Example simple Ldom
bash-3.00# ldm add-domain secondary
bash-3.00# ldm add-vcpu 12 secondary
bash-3.00# ldm add-memory 1G secondary
bash-3.00# ldm add-vnet vnet1 primary-vsw0 secondary
bash-3.00# ldm add-vdsdev /dev/zvol/rdsk/domains/secondary vol1@primary-vds0
bash-3.00# ldm add-vdisk vdisk1 vol1@primary-vds0 secondary
bash-3.00# ldm set-variable auto-boot\?=false secondary
bash-3.00# ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 secondary
bash-3.00# ldm bind-domain secondary
bash-3.00# ldm start-domain secondary

LDom secondary started
bash-3.00# telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "secondary" in group "secondary" ....
Press ~? for control options ..

{0} ok banner

SPARC Enterprise T5120, No Keyboard
Copyright 2008 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.28.0, 1024 MB memory available, Serial #66796533.
Ethernet address 0:14:4f:fb:3b:f5, Host ID: 83fb3bf5.
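
The domain-creation commands above can be collected into a small script for reuse. This is a minimal sketch, assuming the same service names as in the example (primary-vsw0, primary-vds0). The function name plan_ldom, its arguments, and the per-domain volume name are hypothetical. It only prints the ldm commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# plan_ldom is a hypothetical helper: it prints (does not run) the ldm
# commands from the example above for a given guest domain.
# Usage: plan_ldom <domain> <vcpus> <memory> <backend-device>
plan_ldom() {
    dom=$1; vcpus=$2; mem=$3; backend=$4
    # Assumption: one volume per domain, named <domain>-vol, so that volume
    # names stay unique on primary-vds0.
    cat <<EOF
ldm add-domain $dom
ldm add-vcpu $vcpus $dom
ldm add-memory $mem $dom
ldm add-vnet vnet1 primary-vsw0 $dom
ldm add-vdsdev $backend $dom-vol@primary-vds0
ldm add-vdisk vdisk1 $dom-vol@primary-vds0 $dom
ldm set-variable auto-boot\?=false $dom
ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 $dom
ldm bind-domain $dom
ldm start-domain $dom
EOF
}

# Print the plan for the "secondary" domain used in this post.
plan_ldom secondary 12 1G /dev/zvol/rdsk/domains/secondary
```

Printing instead of executing keeps the sketch safe to try on any machine; pipe the output to sh on the control domain once it looks right.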

List physical cdrom / cdr drives
-bash-3.00# cdrw -l
Looking for CD devices...
Node Connected Device Device type
----------------------+--------------------------------+-----------------
/dev/rdsk/c0t0d0s2 | TSSTcorp CD/DVDW TS-T632A SR03 | CD Reader/Writer

Set up an iso image as a vdisk (to boot from)
-bash-3.00# ldm add-vdsdev /var/tmp/sol-10-u5-ga-sparc-dvd.iso iso@primary-vds0
-bash-3.00# ldm add-vdisk iso iso@primary-vds0 secondary

Yes, it will run a SPARC Linux system :)
This was a quick test using the iso tip above and booting from the cdrom.

~ # cat /proc/cpuinfo
cpu : UltraSparc T2 (Niagara2)
fpu : UltraSparc T2 integrated FPU
prom : OBP 4.28.0 2008/01/22 21:10
type : sun4v
ncpus probed : 12
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 00000000457646c0
MMU Type : Hypervisor (sun4v)

The ILOM/SP is Linux!

U-Boot 1.1.1 (Apr 3 2008 - 19:06:21)

CPU: MPC885ZPnn at 133 MHz: 8 kB I-Cache 8 kB D-Cache FEC present
Board: SPARC885
Watchdog enabled
I2C: ready
DRAM:
trying 128 MBytes
trying 64 MBytes
(64 MB SDRAM) 64 MB
Memory Tests: DA A1 A2 00 FF 55 AA T2 T3 T4
POST memory PASSED
FLASH: 32 MB
In: serial
Out: serial
Err: serial
Net: FEC ETHERNET
POST i2c c d 18 20 23 2a 2d 2e 30 40 43 46 51 53 54 56 68 69 6a 6b 70 71 PASSED
POST cpu PASSED
POST ethernet PASSED
Booting linux in 5 seconds...
## Booting image at fe080000 ...
Image Name: Linux-2.4.22
Image Type: PowerPC Linux Kernel Image (gzip compressed)
Data Size: 814987 Bytes = 795.9 kB
Load Address: 00000000
Entry Point: 00000000
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK

Solaris Patching (SPARC) - non liveupgrade

Prep work - back up the entire system:
flar, ufsdump, networker, etc.

I find flar the fastest, and it gives you an option for complete recovery in the event of a total abort.

A good practice is to copy your /etc/system file somewhere safe. I would also grab the metastat -p, metastat, metadb and eeprom outputs and keep them off the machine.

metastat -p |mailx -s "meta db for target x" me@company

Repeat for any metasets!

And while we are at it, grab the /etc/lvm directory too!

Verify Quorum flag in /etc/system
set md:mirrored_root_flag=1

Back up the metadbs

metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s3
a p luo 8208 8192 /dev/dsk/c1t0d0s3
a p luo 16 8192 /dev/dsk/c1t0d0s4
a p luo 8208 8192 /dev/dsk/c1t0d0s4
a p luo 16 8192 /dev/dsk/c1t1d0s4
a p luo 8208 8192 /dev/dsk/c1t1d0s4

dd if=/dev/rdsk/c1t0d0s3 of=//metadb.c1t0d0s3 bs=2048k
dd if=/dev/rdsk/c1t0d0s4 of=//metadb.c1t0d0s4 bs=2048k
dd if=/dev/rdsk/c1t1d0s4 of=//metadb.c1t1d0s4 bs=2048k
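
Typing one dd per slice by hand is easy to get wrong, so the list can be derived from the metadb output itself. This is a minimal sketch: the function name metadb_backup_cmds and the backup directory /var/tmp/backup are hypothetical, and it only prints the dd commands. It needs a POSIX shell (bash or ksh, not the Solaris 10 /bin/sh), and on Solaris nawk may be needed in place of awk.

```shell
#!/bin/sh
# metadb_backup_cmds is a hypothetical helper: it reads `metadb` output on
# stdin and prints one dd command per unique slice that holds replicas.
# $1 is the backup directory (an assumption; keep it off the mirrored disks).
metadb_backup_cmds() {
    backup_dir=$1
    # The device path is the last field of every replica line; the header
    # line does not end in a /dev/dsk/ path, so it is skipped.
    awk '$NF ~ /^\/dev\/dsk\// { print $NF }' | sort -u |
    while read -r dev; do
        slice=${dev##*/}    # strip the directory, e.g. c1t0d0s3
        echo "dd if=/dev/rdsk/$slice of=$backup_dir/metadb.$slice bs=2048k"
    done
}

# Sample input copied from the metadb listing above.
metadb_backup_cmds /var/tmp/backup <<'EOF'
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s3
a p luo 8208 8192 /dev/dsk/c1t0d0s3
a p luo 16 8192 /dev/dsk/c1t0d0s4
a p luo 8208 8192 /dev/dsk/c1t0d0s4
a p luo 16 8192 /dev/dsk/c1t1d0s4
a p luo 8208 8192 /dev/dsk/c1t1d0s4
EOF
```

On a live system the input would come from `metadb | metadb_backup_cmds <dir>`; each slice is listed once even though it carries two replicas.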

DETACH ROOT MIRROR DISK

Check which is the root metadevice: df -k /
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d10 66419890 8134395 57621297 13% /

# metastat -p d10

and note the second submirror (d12 in this example)

# metadetach d10 d12
d10: submirror d12 is detached

Verify with metastat that the detach was successful.
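
The check can be scripted against the one-line `metastat -p` form, where a mirror is printed as its name, `-m`, and then its submirrors. This is a minimal sketch: the function name submirror_attached is hypothetical, and the first submirror name d11 in the sample input is an assumption (only d10 and d12 appear in this post). On Solaris, use nawk if awk rejects -v.

```shell
#!/bin/sh
# submirror_attached is a hypothetical helper: it reads `metastat -p <mirror>`
# output on stdin and succeeds (exit 0) if the named submirror is still
# listed on the mirror's "-m" line.
submirror_attached() {
    mirror=$1; sub=$2
    awk -v m="$mirror" -v s="$sub" '
        $1 == m && $2 == "-m" {
            for (i = 3; i <= NF; i++) if ($i == s) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Illustrative input; on a live system use: metastat -p d10 | submirror_attached d10 d12
if echo "d10 -m d11 1" | submirror_attached d10 d12; then
    echo "d12 is still attached to d10"
else
    echo "d12 is detached from d10"
fi
```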

remove disk from metadb

# metadb -d /dev/dsk/c1t1d0s7

You are now ready to patch. I find a local copy of the patches works best, as Solaris 10 had some issues with patching across NFS!

Patching is Successful

resilver the submirror

# metattach d10 d12
Add disk back into metadb

# metadb -a -c3 /dev/dsk/c1t1d0s7

The resilver will take a while; you can monitor it with metastat.
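
While a submirror is resyncing, metastat reports a "Resync in progress: NN % done" line for the mirror. This is a minimal sketch for pulling that number out; the function name resync_percent is hypothetical and the sample input is illustrative, not captured from this system.

```shell
#!/bin/sh
# resync_percent is a hypothetical helper: it reads `metastat <mirror>` output
# on stdin and prints the percentage from a "Resync in progress" line.
# It prints nothing when no resync is running.
resync_percent() {
    sed -n 's/.*Resync in progress: *\([0-9][0-9]*\) *% done.*/\1/p'
}

# Illustrative input; on a live system use: metastat d10 | resync_percent
printf '%s\n' \
    'd10: Mirror' \
    '    Submirror 1: d12' \
    '      State: Resyncing' \
    '    Resync in progress: 42 % done' | resync_percent
```

Because the helper prints nothing once the resync has finished, a loop such as `while [ -n "$(metastat d10 | resync_percent)" ]; do sleep 60; done` can be used to wait for completion.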

DISASTER - Patching has failed, total recall required

(you need to back out the patches; patchrm them one by one)

Boot using the corrupt/patched environment. If the patched environment is not bootable, boot from net or cdrom.
Now mount the root slice of the old mirror disk and replace the rootdev entry in its etc/system file (remember to edit the etc/system on the mounted disk, not the one in the environment you booted from).

vi system
The details are from your saved eeprom outputs!

The details below can be seen in the step 3 output (add whatever comes after the disk id, e.g. 100,blk).
example :
rootdev:/pseudo/md@0:0,100,blk
change to:
rootdev:/ssm@0,0/pci@18,600000/scsi@2/disk@1,0:a,100,blk
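
If you prefer not to edit the file in vi, the same change can be made with sed. This is a minimal sketch: the function name set_rootdev is hypothetical and the device path is the example one from above. It reads the file on stdin and writes the result to stdout, so the original stays untouched until you copy the result into place.

```shell
#!/bin/sh
# set_rootdev is a hypothetical helper: it reads an etc/system file on stdin
# and rewrites its active rootdev line to the device string given as $1.
# Comment lines (starting with "*") and all other settings pass through.
set_rootdev() {
    newdev=$1
    # "|" is the sed delimiter because device paths contain "/" and ",".
    sed "s|^rootdev:.*|rootdev:$newdev|"
}

# Example using the values from this post.
printf '%s\n' \
    'set md:mirrored_root_flag=1' \
    'rootdev:/pseudo/md@0:0,100,blk' |
    set_rootdev '/ssm@0,0/pci@18,600000/scsi@2/disk@1,0:a,100,blk'
```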

Revert your vfstab to its pre-SVM state

Replace disk entries of / with the new disk (comment out the original and make a copy with the following)

/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no logging
change to
/dev/dsk/c1t1d0s0 /dev/rdsk/c1t1d0s0 / ufs 1 no logging
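
The comment-out-and-copy edit can also be done mechanically. This is a minimal sketch: the function name unmirror_root_entry is hypothetical, it reads a vfstab on stdin and prints the edited version, and it only touches the entry mounted on /. On Solaris, use nawk if awk rejects -v.

```shell
#!/bin/sh
# unmirror_root_entry is a hypothetical helper: for the vfstab entry that
# mounts <metadevice> on /, it keeps the original line as a comment and
# adds a copy pointing at the physical slice. Other lines pass through.
# Usage: unmirror_root_entry <metadevice> <slice> < vfstab
unmirror_root_entry() {
    awk -v md="$1" -v slice="$2" '
        $1 == ("/dev/md/dsk/" md) && $3 == "/" {
            print "#" $0                 # comment out the original entry
            $1 = "/dev/dsk/" slice       # block device
            $2 = "/dev/rdsk/" slice      # raw device
            print
            next
        }
        { print }
    '
}

# Example using the entry from this post.
echo '/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no logging' |
    unmirror_root_entry d10 c1t1d0s0
```

Assigning to a field makes awk rebuild the line with single spaces between fields, which vfstab accepts.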

lockfs -fa;sync;sync;reboot

Now boot from your mirror disk.

If there are any remaining metadevices, remove them.

The system will now be in an SVM-less, pre-patch state.

Rebuild the SVM state from scratch, using the now new boot disk as the primary mirror drive.

Recover Metadb from backup

if all copies of the metadb have a problem, use dd to restore the dd backup.

Example:

dd if=/opt/sun/sdsbackup/metadb.c0t0d0s7.dd of=/dev/rdsk/c0t0d0s7 bs=2048k

Jumpstarting Zones

PREP

Zone host:
mkdir /usr/lib/jetzone/
ln -s /net//opt/SUNWjet/Products/zones/smf/jetzonefinish.xml /usr/lib/jetzone/jetzonefinish.xml
(or copy it)

Create /etc/default/jetzone containing:
JET_SERVER="i"
JET_CONFIG="/opt/SUNWjet"
JET_ZONEDIR="/zones"

cp /net//opt/SUNWjet/Products/zones/jetzone /usr/local/bin
chmod +x /usr/local/bin/jetzone

Configure Zone on JET server

Jumpstart server :
make_zone_template -f -T SUS_ZONE
make_client -f

BUILDING
jetzone -F bluecat
JumpStart Enterprise Toolkit Zone creation: bluecat
Cleaning out zone bluecat
Uninstall zone bluecat
zoneadm: zone 'bluecat': is already in state 'configured'.
Cleaning up zone area
Mounting JET configuration area on /tmp/jet.8791
Loading zone bluecat configuration data
Forcing permissions to 700 for /zones/bluecat
Building zone bluecat configuration
Add network (aggr1/192.168.1.38/255.255.255.0) to zone
Creating zone bluecat
Installing zone bluecat
Preparing to install zone .
Creating list of files to copy from the global zone.
Copying <3135> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <585> packages on the zone.
Initialized <585> packages on zone.
Zone is initialized.
Installation of <2> packages was skipped.
The file contains a log of the zone installation.
Populating zone bluecat sysidcfg
Disable NFSv4 prompt
Adding JET hook to zone bluecat
cp: cannot access /usr/lib/jetzone/jetzonefinish.xml
Booting zone bluecat
To monitor the installation of the zone, please run the command
zlogin -C bluecat

[Connected to zone 'bluecat' console]
142/142
Reading ZFS config: done.
Creating new rsa public/private host key pair
Creating new dsa public/private host key pair
Configuring network interface addresses: aggr1.

rebooting system due to change(s) in /etc/default/init

[NOTICE: Zone rebooting]

SunOS Release 5.10 Version Generic_142909-17 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
Hostname: bluecat
Reading ZFS config: done.
Starting JumpStart Enterprise Toolkit helper script
Mounting JET filesystem
Starting JET finish script
Installation of bluecat at 12:28 on 07-Dec-2010
Loading JumpStart Server variables
JumpStart Enterprise Toolkit version 4.8
Loading Client configuration file
Creating directory: /var/opt/sun/jet/post_install
Creating directory: /var/opt/sun/jet/Utils
Dec 7 12:28:18 bluecat sendmail[22716]: My unqualified host name (bluecat) unknown; sleeping for retry
Creating directory: /var/opt/sun/jet/config
Creating directory: /var/opt/sun/jet/js_media/patch
Creating directory: /var/opt/sun/jet/js_media/pkg
Dec 7 12:28:18 bluecat sendmail[22755]: My unqualified host name (bluecat) unknown; sleeping for retry
Copying file Clients/bluecat/sysidcfg to //var/opt/sun/jet/config/sysidcfg
Copying file Clients/bluecat/host.config to //var/opt/sun/jet/config/host.config
Copying file Utils/solaris/releaseinfo to //var/opt/sun/jet/config/releaseinfo
Copying functions to //var/opt/sun/jet/Utils/lib
Copying file Clients/bluecat/module_hints to //var/opt/sun/jet/config/module_hints
NFS Mounting Media Directories
Mounting nfs://192.168.1.86/export/install/patches on //var/opt/sun/jet/js_media/patch
Mounting nfs://192.168.1.86/export/install/pkgs on //var/opt/sun/jet/js_media/pkg
BASE_CONFIG: Running base_config install script....
BASE_CONFIG: Product base_config started
BASE_CONFIG: Trying to use external matrix cmd /tmp/jet/Products/zones/empty-matrix
BASE_CONFIG: External matrix cmd returned non-zero status or no entries
BASE_CONFIG: Unable to locate package.matrix file... exiting.
BASE_CONFIG: No HW specific packages for platform SUNW,SPARC-Enterprise-T5220
BASE_CONFIG: Trying to use external matrix cmd /tmp/jet/Products/zones/empty-matrix
BASE_CONFIG: External matrix cmd returned non-zero status or no entries
BASE_CONFIG: Unable to locate patch.matrix file... exiting.
BASE_CONFIG: No HW specific patches for platform SUNW,SPARC-Enterprise-T5220
BASE_CONFIG: Set root password
BASE_CONFIG: Setting netmask for primary interface
BASE_CONFIG: Add netmask 192.168.1.0 / 255.255.255.0
BASE_CONFIG: Disabling power management
BASE_CONFIG: Creating directory: /var/opt/sun/jet/post_install/n-post
BASE_CONFIG: Register postinstall script 'setupdumpdevice' for boot n
BASE_CONFIG: Creating directory: /var/opt/sun/jet/post_install/n
BASE_CONFIG: Register postinstall script 'run_sshkeygen' for boot n
BASE_CONFIG: Setting system terminal type to vt100
BASE_CONFIG: Register postinstall script 'console' for boot n
BASE_CONFIG: Setting NFSv4 domain
BASE_CONFIG: Creating directory: /var/opt/sun/jet/system.add
BASE_CONFIG: Product base_config finished
BASE_CONFIG: Running base_config install script....
BASE_CONFIG: Product base_config started
BASE_CONFIG: Trying to use external matrix cmd /tmp/jet/Products/zones/empty-matrix
BASE_CONFIG: External matrix cmd returned non-zero status or no entries
BASE_CONFIG: Unable to locate package.matrix file... exiting.
BASE_CONFIG: No HW specific packages for platform SUNW,SPARC-Enterprise-T5220
BASE_CONFIG: Trying to use external matrix cmd /tmp/jet/Products/zones/empty-matrix
BASE_CONFIG: External matrix cmd returned non-zero status or no entries
BASE_CONFIG: Unable to locate patch.matrix file... exiting.
BASE_CONFIG: No HW specific patches for platform SUNW,SPARC-Enterprise-T5220
BASE_CONFIG: Set root password
BASE_CONFIG: Setting netmask for primary interface
BASE_CONFIG: Disabling power management
BASE_CONFIG: Register postinstall script 'setupdumpdevice' for boot n
BASE_CONFIG: Register postinstall script 'run_sshkeygen' for boot n
BASE_CONFIG: Setting system terminal type to vt100
BASE_CONFIG: Register postinstall script 'console' for boot n
BASE_CONFIG: Setting NFSv4 domain
BASE_CONFIG: Product base_config finished
CUSTOM: Running custom install script....
CUSTOM: Copying file Clients/bluecat/../common.files/ntp.conf to //etc/inet/ntp.conf
CUSTOM: Copying file Clients/bluecat/../common.files/etc_profile to //etc/profile
CUSTOM: Copying file Clients/bluecat/../common.files/ITC_jet_post to //post_install/ITC_jet_post
CUSTOM: Copying file Clients/bluecat/../common.files/deploy_foo_solaris to //post_install/deploy_altiris_solaris
CUSTOM: Copying file Clients/bluecat/../common.files/syslog.conf to //etc/syslog.conf
CUSTOM: Copying file Clients/bluecat/../common.files/jfdi to //post_install/jfdi
CUSTOM: Register postinstall script 'ITC_jet_post' for boot n
----------------------------------------------------------
Product modules processed, finish up installation tasks
Creating directory: /var/opt/sun/jet/system.add/updated
Copying file etc/jumpstart.conf to //var/opt/sun/jet/config/jumpstart.conf
Copying file Utils/smf/jetjump.xml to //var/svc/manifest/site/jetjump.xml
Copying file Utils/S99jumpstart to //var/opt/sun/jet/post_install/S99jumpstart
NFS Unmounting Media Directories
Unmounting /var/opt/sun/jet/js_media/pkg
Unmounting /var/opt/sun/jet/js_media/patch
Make a link to finish log...
Updating boot-archive
/: not a boot archive based Solaris instance
Disable & delete SMF tag svc:/site/jetzonefinish

bluecat console login: Dec 7 12:28:28 bluecat reboot: rebooted by LOGIN
Dec 7 12:28:28 bluecat syslogd: going down on signal 15
Dec 7 12:28:28 /usr/lib/snmp/snmpdx: received signal 15

[NOTICE: Zone rebooting]

SunOS Release 5.10 Version Generic_142909-17 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
Hostname: bluecat
Loading smf(5) service descriptions: 1/1
Reading ZFS config: done.
JumpStart (/var/opt/sun/jet/post_install/S99jumpstart) started @ Tue Dec 7 12:28:59 MET 2010
Loading JumpStart Server variables
Loading Client configuration file
No more reboots required
Running additional install files for reboot n
NFS Mounting Media Directories
Mounting nfs://192.168.1.86/export/install/patches on /var/opt/sun/jet/js_media/patch
Dec 7 12:28:59 bluecat sendmail[24360]: My unqualified host name (bluecat) unknown; sleeping for retry
Dec 7 12:28:59 bluecat sendmail[24364]: My unqualified host name (bluecat) unknown; sleeping for retry
Mounting nfs://192.168.1.86/export/install/pkgs on /var/opt/sun/jet/js_media/pkg
BASE_CONFIG: Running 001.base_config.001.run_sshkeygen
BASE_CONFIG: Running 001.base_config.002.console
BASE_CONFIG: Setting terminal type to vt100
BASE_CONFIG: Running 001.base_config.003.run_sshkeygen
BASE_CONFIG: Running 001.base_config.004.console
BASE_CONFIG: Setting terminal type to vt100
CUSTOM: Running 002.custom.001.ITC_jet_post
Dec 7 12:29:31 bluecat sendmail[25179]: My unqualified host name (bluecat) unknown; sleeping for retry
Dec 7 12:29:59 bluecat sendmail[24360]: unable to qualify my own domain name (bluecat) -- using short name
Dec 7 12:29:59 bluecat sendmail[24364]: unable to qualify my own domain name (bluecat) -- using short name
Dec 7 12:30:14 bluecat sshd[25284]: Failed keyboard-interactive for root from 10.128.101.58 port 56954 ssh2
Dec 7 12:30:28 bluecat login: ROOT LOGIN /dev/pts/4
Dec 7 12:30:31 bluecat sendmail[25179]: unable to qualify my own domain name (bluecat) -- using short name
Dec 7 12:31:12 bluecat ntpdate[25718]: can't find host ntp1.ch.sus.local
Dec 7 12:31:12 bluecat ntpdate[25718]: can't find host ntp2.ch.sus.local
Dec 7 12:31:12 bluecat ntpdate[25718]: can't find host ntp3.ch.sus.local
Dec 7 12:31:12 bluecat ntpdate[25718]: can't find host ntp4.ch.sus.local
Dec 7 12:31:12 bluecat ntpdate[25718]: no servers can be used, exiting
Dec 7 12:31:12 bluecat xntpd[25720]: sched_setscheduler(): Not owner
Dec 7 12:31:12 bluecat xntpd[25720]: loop_config: ntp_adjtime() failed: Not owner
Dec 7 12:31:12 bluecat last message repeated 1 time
Dec 7 12:31:14 bluecat xntpd[25721]: couldn't resolve `ntp1.ch.sus.local', giving up on it
Dec 7 12:31:14 bluecat xntpd[25721]: couldn't resolve `ntp2.ch.sus.local', giving up on it
Dec 7 12:31:14 bluecat xntpd[25721]: couldn't resolve `ntp3.ch.sus.local', giving up on it
Dec 7 12:31:14 bluecat xntpd[25721]: couldn't resolve `ntp4.SUS', giving up on it

BASE_CONFIG: Running 001.base_config.001.setupdumpdevice
BASE_CONFIG: Running 001.base_config.002.setupdumpdevice
NFS Unmounting Media Directories
Unmounting /var/opt/sun/jet/js_media/pkg
Unmounting /var/opt/sun/jet/js_media/patch
Disable & delete SMF tag svc:/site/jetjump

bluecat console login: JumpStart is complete @ Tue Dec 7 12:32:44 MET 2010