JumpStart is a really cool way to install Solaris on SUNs. If you do it right, you can set it up so that the entire installation and all the local modifications you want made to your machines are done by the JumpStart procedure. To install Solaris on a Workstation you just type boot net - install on the Workstation's console and walk away. Anywhere from one to two hours later (depending mostly on network bandwidth and the speed of the Workstation) the machine is completely ready to use.
JumpStart relies on principles of "Diskless Workstation" operation to boot a specially configured version of Solaris from an NFS server (the Boot Server). The startup scripts the machine processes as a result of booting that image of Solaris then guide the machine through the JumpStart procedure. It will contact the JumpStart Server and follow a well-defined sequence of events that allows you to set up the target configuration of the machine being JumpStart-ed (disk layout, Solaris packages that get installed, etc.). Parts of that sequence have "stubs" that you control completely: shell scripts you write get run at certain spots in the JumpStart procedure. You also provide "Machine Profiles" that specify what Solaris packages to install as well as the disk partition layout.
As part of the "Diskless Workstation Principles", the machine being JumpStarted will be provided its IP address and name (configured in /etc/ethers and DNS). It will then find out where to mount its root filesystem from (via NFS), plus a few extra things that get configured into /etc/bootparams by the add_install_client script mentioned later.
The JumpStarting Workstation will now mount its root filesystem through NFS and load the Solaris kernel. The startup scripts in the root filesystem then take over. Now the machine will look to the JumpStart Server to get some very basic information known as its sysidcfg (see sysidcfg(4) for what can be put here and for the syntax, which changes from release to release). Unlike with Solaris-7, so far one sysidcfg file appears to be suitable for all of our machines. Its contents currently are:
system_locale=en_US
timezone=US/Eastern
terminal=vt100
name_service=DNS {domain_name=cse.Buffalo.EDU name_server=128.205.32.12}
timeserver=128.205.32.12
network_interface=PRIMARY {netmask=255.255.255.0 protocol_ipv6=no}
security_policy=NONE
root_password="*****"
Note that this file is visible from a wide variety of places. DO NOT put the "real" root password in there! For our purposes that root password will not wind up being used for any "server" class machines being JumpStarted (they will be retaining their current /etc/{passwd,shadow} files), and all other machines will wind up having new administrative files rdist-ed to them by some central server relatively shortly after the machine finishes JumpStarting.
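A quick sanity check before putting a sysidcfg file in place can catch a keyword that got dropped while editing. The following is only a sketch: check_sysidcfg is a made-up name, and the keyword list is simply copied from the example file above. It does not know which keywords any particular Solaris release really requires.

```shell
#!/bin/sh
# Hypothetical helper: report any keyword from our example sysidcfg
# that is missing from the file given as $1.
check_sysidcfg() {
    file=$1
    missing=""
    for key in system_locale timezone terminal name_service \
               timeserver network_interface security_policy root_password
    do
        # A keyword counts as present if some line starts with "key=".
        grep "^${key}=" "$file" >/dev/null 2>&1 || missing="$missing $key"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Run it as check_sysidcfg /path/to/sysidcfg; it prints ok or the list of missing keywords.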
Once the machine gets this basic information, it will check through the file /export/jumpstart_8/rules.ok on the JumpStart server to identify a 'Begin Script', 'Machine Profile', and 'Finish Script'. The rules.ok file is not what you edit; it gets built from the text file rules by running the command ./check rules after you edit rules. There are a mind-boggling number of ways you can set it up for the machines to find their entry in this file, but we use a relatively simple subset of the possibilities. Note that the first match "wins", so put the more restrictive rules higher in the file than the less restrictive ones. For example, keep the rules that target one specific machine by name close to the top so that the machine matches its own rule first and does not "fall through" to a more generic "network based" rule. An example of a machine targeted by name would be:
hostname jeeves.cse.Buffalo.EDU \
&& totaldisk 0-999999 \
begin.client jeeves finish.client
We usually have totaldisk listed there just in case something is drastically wrong; we could probably drop it for the machines targeted by name. It does get used heavily to differentiate more generic workstations in labs. The last line specifies the 'Begin Script', 'Machine Profile', and 'Finish Script', respectively. An example of a machine targeted by network is:
network 128.205.51.0 \
&& karch sun4u \
begin.client n19_u5 finish.client
Here, in addition to the network number, the machine needs to be the Ultra architecture to match this rule. A Sparc-5, for example, would not match this rule. Note however that denali.cse.buffalo.edu would match the above rule and we want to have a different machine profile for denali so the rule for denali specifying it by name must appear above the more generic rule shown above.
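The "first match wins" behaviour is easy to get wrong, so here is a toy model of it. This is not how JumpStart parses rules.ok. It is a simplified stand-in that assumes one keyword per rule, and the profile name denali is hypothetical (the real profile name for that machine is not shown here).

```shell
#!/bin/sh
# Simplified stand-in for rule matching: each line is
# "<keyword> <value> <profile>". Real rules files support many more
# keywords and also name the begin and finish scripts.
RULES='hostname denali.cse.Buffalo.EDU denali
network 128.205.51.0 n19_u5'

# Print the profile of the first rule matching host $1 on network $2.
first_match() {
    echo "$RULES" | while read -r keyword value profile; do
        case $keyword in
            hostname) [ "$value" = "$1" ] || continue ;;
            network)  [ "$value" = "$2" ] || continue ;;
            *)        continue ;;
        esac
        echo "$profile"
        break          # first match wins, ignore the rest
    done
}

first_match denali.cse.Buffalo.EDU 128.205.51.0   # prints: denali
first_match other.cse.Buffalo.EDU 128.205.51.0    # prints: n19_u5
```

If the two lines in RULES were swapped, denali would match the network rule first and get the generic lab profile, which is exactly the mistake the ordering advice is meant to prevent.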
There are a few special lines at the top of the rules file. They are:
probe network # SI_NETWORK gets used heavily in finish_client
probe rootdisk # Used to try and find old files
These are needed to guarantee the shell variables SI_NETWORK and SI_ROOTDISK get set. They are used heavily in our scripts and are necessary.
Once the JumpStarting machine finds its rule, it has all the information it needs to handle the JumpStart. It will first run the 'Begin Script' as a shell script on the machine as it exists. We use this script to try to preserve information from a machine that is being JumpStarted in order to upgrade it from a previous version of Solaris. Then information from the 'Machine Profile' is used to partition the disk(s), which are then mounted as what will become the machine's filesystem tree under the directory /a (for example, if the machine is to have the filesystems /, /var, and /usr, the partitions will be mounted as /a for the root filesystem, /a/var for the /var filesystem, and /a/usr for the /usr filesystem). Information in the 'Machine Profile' also determines what Solaris packages need to be loaded. The packages will be loaded at this point using pkgadd -R /a.
The Machine Profile file may wind up being very short or quite long depending on what is needed. See the examples currently in /export/jumpstart_8. Using the one for the Ultra-5's in Norton-19 as an example, it starts off with what kind of JumpStart is being done:
install_type initial_install
system_type standalone
partitioning explicit
We then choose to install the "Software Cluster" SUNWCall:
cluster SUNWCall
We could just leave it at that, but there are various packages included in that Cluster that we don't want, so we specify a few to delete from the set that would have been installed. More packages than these are deleted, but a few examples are:
# Delete stuff that goes in /opt
package SUNWrtvcu delete
package SUNWebnfs delete
Last but not least the disk partition layout. Note we can specify filesystems that will be mounted by NFS in the machine's /etc/vfstab file after installation - these will not be mounted during the software installation phase.
filesys rootdisk.s0 free /
filesys rootdisk.s1 1024 swap
filesys everest:/util - /util ro
filesys everest:/opt - /opt ro
filesys everest:/var/mail - /var/mail rw,actimeo=0
filesys everest:/u0 - /u0 rw
filesys everest:/u1 - /u1 rw
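When several lab profiles differ only in one or two numbers, it can be handy to generate them. The sketch below prints a cut-down version of the profile pieces shown above, with the swap size as a parameter. make_profile is a made-up name, and the package deletions and NFS filesys lines are left out to keep it short.

```shell
#!/bin/sh
# Print a minimal profile. $1 is the swap size in MB
# (defaults to 1024, the value used in the example above).
make_profile() {
    swap_mb=${1:-1024}
    cat <<EOF
install_type    initial_install
system_type     standalone
partitioning    explicit
cluster         SUNWCall
filesys         rootdisk.s0 free /
filesys         rootdisk.s1 $swap_mb swap
EOF
}

make_profile 2048
```

The output would be redirected into a file under /export/jumpstart_8 and named in the rules file like any hand-written profile.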
There are other nifty things that can be done while defining the filesystems. Taking the one from jeeves.cse.buffalo.edu as one more example:
filesys rootdisk.s0 existing /
filesys rootdisk.s1 existing swap
filesys rootdisk.s3 existing /tmp
filesys rootdisk.s4 existing /var
filesys rootdisk.s6 existing /usr
filesys rootdisk.s7 existing /home preserve
filesys milo:/util - /util ro
filesys milo:/opt - /opt ro
Here we're telling JumpStart to use the existing partition layout of the disks instead of creating it from scratch - useful for server type machines we care more about than generic Lab Workstations. The "preserve" keyword tells JumpStart to leave the current contents of the partition intact; if it is not present, JumpStart will newfs the partition before doing the software install.
After doing the software installation, JumpStart will run the 'Finish Script', which drives all the local modifications to the machines. When that script finishes, the machine will be rebooted and, if you did everything right, the machine will be completely ready for use with all local modifications done. See the 'Local Modifications' section below for what we have the scripts involved in JumpStart do for us.
Start off by loading the Solaris-8 CDs onto the JumpStart server itself somewhere in /export. Our JumpStart server is milo.cse.buffalo.edu so you use the scripts SUN provides on the CDs to load the two Solaris-8 software CDs:
./setup_install_server /export/install/sparc_8
./add_to_install_server /export/install/sparc_8
The JumpStart server that results will be capable of letting clients on the same physical network JumpStart completely from that server so milo can handle all JumpStart issues for machines on the 128.205.32.0 network (which makes that a convenient network to do the necessary test/configuration stuff when rolling out a new version of Solaris...). You will find the add_install_client script mentioned below in the directory /export/install/sparc_8/Solaris_8/Tools on the JumpStart server instead of the path mentioned below.
Now the fun begins... The JumpStart configuration needs to be set up in /export/jumpstart_8 on the JumpStart Server. If you don't know what you're doing or don't have an existing setup to steal from, a test machine and an iterative process of adding and testing stuff winds up being educational. It can take a while to get it right the first time, but it saves you a lot of work later... Once it's set up right, only minor adjustments and upkeep are needed.
The setup_install_server script handles setting up boot servers as well. On a boot server (one per physical network; we pick the central fileserver/gateway for each net) with the first Solaris-8 CD loaded, run:
./setup_install_server -b /export/boot_8
This takes a REALLY long time. Once you have done it on one server it is usually quicker to copy the result from one server to the others. To do this, copy /export/boot_8. Then edit /etc/dfs/dfstab on the target server, adding the line:
share -F nfs -o ro,anon=0 /export/boot_8/Solaris_8/Tools/Boot
After running shareall, the target boot server should be all set.
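If you copy /export/boot_8 to several boot servers, it is easy to add the share line twice or forget it. A small helper along these lines could do the dfstab edit. This is a sketch only: add_share_line is a made-up name, and the -x and -F options assume a POSIX grep (on Solaris that may mean /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
#!/bin/sh
# Append the share line to a dfstab file only if it is not already there.
# $1 is the dfstab path (normally /etc/dfs/dfstab on the target server).
SHARE_LINE='share -F nfs -o ro,anon=0 /export/boot_8/Solaris_8/Tools/Boot'

add_share_line() {
    dfstab=$1
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -xF "$SHARE_LINE" "$dfstab" >/dev/null 2>&1; then
        echo "already present"
    else
        echo "$SHARE_LINE" >> "$dfstab"
        echo "added"
    fi
}
```

After add_share_line /etc/dfs/dfstab you would still run shareall by hand as described above.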
Before doing too much of this, be sure to read and understand the 'Local Modifications' section, below!!! It describes what we have the 'Begin Script' and 'Finish Script' set up to do, and details what stuff you may need to check on or modify or add to /export/jumpstart_8 for you to get the desired result when the machine finishes JumpStarting.
Setting up a JumpStart Client is a simple matter of running the add_install_client script in /export/boot_8/Solaris_8/Tools on the boot server for the network the client is on. If the machine you want to JumpStart is on the same network as the JumpStart server, you use the same script, but on the JumpStart server it is in /export/install/sparc_8/Solaris_8/Tools. Before doing that, enter the client's Ethernet address and hostname in /etc/ethers.
Using dnoces.cse.buffalo.edu as an example the command would be:
./add_install_client -c milo.cse.buffalo.edu:/export/jumpstart_8 \
-p milo.cse.buffalo.edu:/export/jumpstart_8 \
-s milo.cse.buffalo.edu:/export/install/sparc_8 \
dnoces.cse.Buffalo.EDU sun4u
Because that gets done so often and it's easy to make a typo in all that, there is usually a script available in /export/boot_8/Solaris_8/Tools named do_it_local which just needs the machine name and architecture as arguments.
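The real do_it_local is not reproduced here, but a wrapper of that kind might look roughly like the following. Everything in it beyond the add_install_client options shown above is an assumption, including the function name build_install_cmd. It only prints the command so you can look it over before running it.

```shell
#!/bin/sh
# Hypothetical sketch of a do_it_local-style wrapper.
# It prints the add_install_client command instead of running it.
JS_SERVER=milo.cse.buffalo.edu

build_install_cmd() {
    if [ $# -ne 2 ]; then
        echo "usage: build_install_cmd hostname architecture" >&2
        return 1
    fi
    # echo joins its arguments with single spaces.
    echo "./add_install_client" \
         "-c $JS_SERVER:/export/jumpstart_8" \
         "-p $JS_SERVER:/export/jumpstart_8" \
         "-s $JS_SERVER:/export/install/sparc_8" \
         "$1 $2"
}

build_install_cmd dnoces.cse.Buffalo.EDU sun4u
```

Keeping the server name in one variable means a new JumpStart server only requires a one-line change.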
Despite all Internet standards saying hostnames are case insensitive there is still a bug or two in SUN's support of network boots so be sure to use "Buffalo.EDU" in the hostname.
Now edit /etc/bootparams to change the hostname of the server for the client you just added, to guarantee it uses the right network interface of the server. For example, if the boot server is hadar, change the line of /etc/bootparams for the client so that the server name is hadar-35 instead of just hadar. At this point it shouldn't be necessary, but it doesn't hurt to run:
make in /var/yp
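The /etc/bootparams edit described above can be scripted with sed. The sketch below assumes the boot server only shows up in the root= entry of the client's line, and that the client name is followed by whitespace. Both are assumptions about the file layout, so check a real entry before trusting it. fix_boot_server is a made-up name.

```shell
#!/bin/sh
# Rewrite "root=<old>:" to "root=<new>:" on the line for one client.
# Arguments: bootparams-file client old-server new-server
# Note: dots in the client name are regex wildcards here, which is
# sloppy but harmless for a sketch.
fix_boot_server() {
    file=$1 client=$2 old=$3 new=$4
    sed "/^$client[[:space:]]/s/root=$old:/root=$new:/" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}
```

For the example in the text this would be called as fix_boot_server /etc/bootparams dnoces.cse.Buffalo.EDU hadar hadar-35. Lines for other clients are left alone.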
Now boot the client with:
boot net - install
... to start up the JumpStart-based installation.
Except for what filesystems will be mounted via NFS from a server in /etc/vfstab, all localization of the JumpStarted machines happens as a result of the 'Begin Script' and 'Finish Script'. So far, every JumpStartable machine is handled by the same scripts, though that is not necessary and if some particular machine winds up being too complicated, it might be necessary for it to have its own custom script(s). The scripts being used now are:
milo:/export/jumpstart_8/begin_client
milo:/export/jumpstart_8/finish_client
The begin_client script tries to preserve some information from the machine before it gets upgraded. This only works if the machine already has a usable Solaris installation on its disk; it cannot work for a brand new machine or for an existing machine that needed its disk drive(s) replaced, but begin_client is set up to fail gracefully in those cases. The things it will try to save are:
- /etc/passwd
- /etc/shadow
- /etc/group
- ssh keys (from /var/local/etc)
- data files from the calendar program (directory /var/spool/calendar)
begin_client is set up to check for a "root disk". If it is able to mount that, it checks for the above stuff on that root partition. It also checks to see if there is a /var partition listed in the /etc/vfstab file in the root partition, and will try mounting that to check there as well. If any files of interest are found, they get copied to /tmp. Note that the Administrative Data files may not wind up being copied back: the finish_client script only copies them back if the machine being JumpStarted is identified as a "server" (more on that later). That is pretty much it for begin_client.
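The "copy what you find, shrug off what you don't" part of begin_client can be sketched as below. This is not the actual script. It assumes the old root is already mounted somewhere, covers only the three administrative files, and uses the made-up name save_old_files.

```shell
#!/bin/sh
# Copy the files of interest from an old root mounted at $1 into $2.
# Missing files are skipped silently so a blank disk is not an error.
save_old_files() {
    old_root=$1
    dest=$2
    for f in etc/passwd etc/shadow etc/group; do
        if [ -f "$old_root/$f" ]; then
            # -p keeps modes and timestamps on the saved copy.
            cp -p "$old_root/$f" "$dest/" && echo "saved $f"
        fi
    done
    return 0   # never fail the JumpStart over a missing file
}
```

In the setup described above the destination would be /tmp, and the same function could be pointed at a separately mounted /var for the ssh keys and calendar data.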
finish_client is more complex. It starts off defining a few shell variables. It also depends on some shell variables provided by JumpStart. JumpStart always defines SI_CONFIG_DIR and for the current setup that winds up being /export/jumpstart_8. All pathnames given for the rest of this section will be relative to that directory. finish_client is also heavily dependent on the network number being defined to put the right files in place for various things. This variable is SI_NETWORK and we make sure it gets set properly by having the line:
probe network
... in the rules file.
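Since finish_client leans so heavily on these variables, a guard near the top of the script makes a missing probe line obvious right away. This is just a sketch, and require_vars is a made-up name that is not part of the existing scripts.

```shell
#!/bin/sh
# Fail early if any of the named shell variables is unset or empty.
require_vars() {
    for name in "$@"; do
        # Indirect lookup: put the value of the variable called $name
        # into $value (empty if it is unset).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "finish_client: $name is not set" >&2
            return 1
        fi
    done
}
```

A call such as require_vars SI_CONFIG_DIR SI_NETWORK SI_ROOTDISK || exit 1 would stop the script before it installs files for the wrong network.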
To take care of installing files to the JumpStarting machine's filesystems, several macros are defined to call the UCB version of install. The generic SYS-V install isn't as nice. Note this means all machine configurations will need to have the Solaris packages that install the BSD tools for finish_client to work.
Read through the script once or twice to see what it does. The rest of this section will just give a general overview of what is where, and what places you might need to add things for specific machines to get the results you want.
The directory local_tree contains all files and directories that get copied to every machine regardless of its status, what network it winds up on, etc. For example, it contains sudo support, has our modified /etc/inet/inetd.conf file, etc. If any of those files need to be changed on all our machines (e.g. we add some new network service handled by inetd), then we need to remember to change the file local_tree/etc/inet/inetd.conf. The entire contents of that directory will be copied to the root filesystem of the JumpStarting machine using tar, so you MUST make sure that files placed in local_tree have the right owner, group, permissions, etc. For Ken's convenience while upgrading servers manually, this whole tree will be copied to the servers as well, so make sure anything you add there is suitable for machines like armstrong, hadar, etc. as well. Files here should not be network dependent (e.g. /etc/resolv.conf is different on our different subnets).
The directory depend_trees contains machine-specific trees that will be copied to only that machine, again using tar. For example in depend_trees/jeeves.cse.Buffalo.EDU are files and directories needed because jeeves is our print server (extra configuration files, and /var/spool/lpd populated for all our printers). This directory will be checked and copied towards the end of finish_client.
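The usual way to copy a tree with tar is a pair of subshells joined by a pipe. The sketch below shows that idiom. Whether finish_client uses exactly this form is an assumption, and copy_tree is a made-up name.

```shell
#!/bin/sh
# Copy a tree with tar so modes and links (and, when run as root,
# ownership) are carried over.
# $1 = source tree (e.g. local_tree), $2 = target root (/a during JumpStart).
copy_tree() {
    src=$1
    dest=$2
    [ -d "$src" ] || return 0   # no tree for this machine, nothing to do
    (cd "$src" && tar cf - .) | (cd "$dest" && tar xpf -)
}
```

With this, the two copies described above would be copy_tree local_tree /a followed later by copy_tree depend_trees/jeeves.cse.Buffalo.EDU /a (relative to SI_CONFIG_DIR). The early return keeps machines without a depend_trees directory from causing an error.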
The directory depend_files is where most stuff goes that is either network-dependent (e.g. /etc/resolv.conf) or could wind up being useful on more than just one machine (e.g. support for being an XDM Server). The current set of directories there are:
- auto_direct : support for /etc/auto_direct which is network
  dependent (mostly where to get /scratch from)
- defaultrouter : per-network default router
- etc_system : customized /etc/system files
- gated.conf : per-network /usr/local/etc/gated.conf file
- hosts.deny : per-network /usr/local/etc/hosts.deny file which
  gets used to deny FTP access to NFS client machines
- named : Nameserver configuration files, these will only be set up
  if there is a hostname-specific resolv.conf (see below)
- ntp.conf : host-specific /etc/inet/ntp.conf file for our
  main time servers, all other machines get a default one from
  local_tree
- printcap : network-specific /usr/local/etc/printcap file
  that determines the default printer, the real list of printers
  is in /util/etc/printcap under our current LPRng based
  setup.
- resolv.conf : per-network /etc/resolv.conf files for generic
  machines not running nameservers plus per-machine versions for
  machines that need to be set up to run nameservers of their own,
  finish_client uses a per-machine version existing to indicate
  it needs to install the files to configure a nameserver
- sendmail.cf : per-network and per-machine versions of
  /etc/mail/sendmail.cf
- server : if a machine's name appears here as a file, that machine is
  identified as a server; as a result finish_client will do the
  following extra things:
    - touch /a/NEWBOOT_SERVER, which will trigger a few
      extra things in the script that runs the first
      time the machine boots (see below)
    - copy back the passwd, shadow, and group files if
      they were found by begin_client; if the shadow file
      was found, a flag is set so JumpStart won't try to
      set the root user's password as it finishes up
    - install an /etc/auto_master file that doesn't
      allow the /net map
- xdmserver : per-machine empty file for machines that need to be set
  up as XDM servers; support for it (startup scripts and a directory
  tree) is in the directory local_packages
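The "per-machine version wins over per-network version" lookup, and the marker-file test for servers, can be sketched as follows. The file naming used here (one file per hostname or per network number inside each directory) is an assumption about the layout, and pick_file and is_server are made-up names.

```shell
#!/bin/sh
# Pick the per-machine file if one exists, else the per-network one.
# Assumed layout (hypothetical): <dir>/<hostname> and <dir>/<network>.
pick_file() {
    dir=$1 host=$2 net=$3
    if [ -f "$dir/$host" ]; then
        echo "$dir/$host"
    elif [ -f "$dir/$net" ]; then
        echo "$dir/$net"
    else
        return 1       # nothing suitable, let the caller decide
    fi
}

# A machine counts as a server if a file with its name exists
# under the server directory of depend_files ($1).
is_server() {
    [ -f "$1/server/$2" ]
}
```

For example, pick_file depend_files/resolv.conf "$host" "$SI_NETWORK" would print the file to install, where $host stands for however the script learns the machine's name.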
Part of what gets copied to the JumpStarting machine from local_tree
is a local startup script local_tree/etc/init.d/_newboot, which
is hardlinked to local_tree/etc/rc2.d/S90_newboot. This script
checks to see if the file /NEWBOOT exists and exits immediately
if it does not. If that file does exist then this script runs, taking
care of things that can't be done until the machine has rebooted with
its "real" disk configuration and network identity. This script
will:
- log stuff to /var/sadm/system/logs/newboot.something
- check to see if /NEWBOOT_SERVER exists, turning off dtlogin
  if it does
- run /etc/init.d/_newboot.local if it exists
- build the manual windex files for /usr/man, /usr/openwin/man,
  /usr/dt/man
- do the fix to /tmp/.X11-unix for Xvnc
- install the system patches using Davin's patch support stuff
- remove /NEWBOOT and reboot
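The run-once behaviour of this script boils down to a flag-file guard. The sketch below shows just that skeleton. The ROOT prefix is a hypothetical addition so it can be tried in a scratch directory (on a real machine it would be empty), and the actual work is replaced by echo lines.

```shell
#!/bin/sh
# Skeleton of a run-once startup script driven by a flag file.
# $1 is a hypothetical root prefix, empty on a real machine.
run_newboot() {
    ROOT=$1
    # Normal boots: no flag file, so do nothing.
    [ -f "$ROOT/NEWBOOT" ] || { echo "skipped"; return 0; }

    if [ -f "$ROOT/NEWBOOT_SERVER" ]; then
        echo "server: would turn off dtlogin"
    fi
    # ... one-time work (windex, patches, and so on) would go here ...

    rm -f "$ROOT/NEWBOOT"      # make sure this never runs twice
    echo "done: would reboot"
}
```

Removing the flag before the reboot is what keeps the machine from looping through the one-time work on every boot.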
Ken is still nervous about JumpStarting the big central fileservers but has begun to use it to allow network-based manual installations. For this, the server still needs to be set up as an Install Client but you do not want it to even try and match any of the Rules. This is done by using:
./add_install_client \
-s milo.cse.buffalo.edu:/export/install/sparc_8 \
demo0.cse.Buffalo.EDU sun4u
With just that, a manual installation of Solaris can be done through the network.
The above scenario is used on some of the central fileservers that normally have a second ethernet interface attached to the 128.205.32.0 net and need to boot off that. For Ultra-60 machines with the SunSwift card in them, at the PROM monitor do:
nvalias net2 /pci@1f,2000/pci@1/SUNW,hme@0,1
boot net2 - install
(only do the nvalias command once). The equivalent for an Ultra-1 with a SunSwift in it is:
nvalias net2 /sbus@1f,0/SUNW,hme@2,8c00000
boot net2 - install
The same is likely to work on Ultra-2s but this has not been tested yet.