Patch-ID# 107926-04
Keywords: netra ft1800 osdog
Synopsis: Netra ft1800 6.7: OSdog patch
Date: Jul/26/99

Solaris Release: 2.6

SunOS release: 5.6

Unbundled Product: Netra ft1800

Unbundled Release: 6.7

Topic: Netra ft1800 OSdog patch

BugId's fixed with this patch:
        4187726 4188172 4193139 4197838 4200586 4200954 4201230 4203815
        4206240 4206317 4207341 4207878 4207904 4210739 4210761 4210792
        4211673 4211676 4212126 4212784 4213525 4214340 4214918 4216288
        4216795 4217258 4217265 4217279 4219764 4220735 4221113 4224818
        4227426 4227448 4228179 4228281 4229411 4230315 4231942 4234404
        4234421 4236872 4238320 4239178 4239299 4239336 4243212 4243236
        4243326 4243643 4243689 4243949 4244332 4245194 4245833 4245835
        4247294 4247319 4248470 4252069 4250437

Changes incorporated in this version:

Relevant Architectures: sparc
NOTE: sparc.sun4u

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch: 107369-17

Obsoleted by:

Files included with this patch:

platform/SUNW,Ultra-4FT/kernel/drv/u4ftdog
platform/SUNW,Ultra-4FT/kernel/drv/u4ftdog.conf
platform/SUNW,Ultra-4FT/kernel/genunix
platform/SUNW,Ultra-4FT/kernel/misc/swapgeneric
platform/SUNW,Ultra-4FT/kernel/unix
usr/platform/SUNW,Ultra-4FT/SUNWcms/etc/cmsdef.d/ft_core/ft_core.cmsdef
usr/platform/SUNW,Ultra-4FT/SUNWcms/etc/cmsdef.d/ft_core/ft_core.so
usr/platform/SUNW,Ultra-4FT/SUNWcms/lib/u4ftsplitd
usr/platform/SUNW,Ultra-4FT/lib/netra_ft_1800.flash.update-09
usr/platform/SUNW,Ultra-4FT/sbin/u4ftdogctl

Problem Description:

4187726 PROM test all-leds-on fails to illuminate system LEDs on both CAFs.
4188172 PROM fails OBP_vtt test, OBPVTS_math
4193139 sheffield system takes too long to boot up
4197838 4p cpusets sometimes don't see all processors.
4200586 nvram read from post menu fails when given count and increment
4200954 Exec POST from memory causes exception if run twice
4201230 Osdog in solaris always uses the 85 second timeout
4203815 inconsistent behaviour on 'init 5'
4206240 OBP should update eeprom bridge/rcp partnumbers when restoring default images
4206317 Osdog barked after "shutdown -i0 -g0 -y" followed by "reset-all"
4207341 NVRAM on sheffield system is not consistent with other Sun platforms
4207878 'Stack Underflow' after rcp repair
4207904 i2c buses not used correctly with motherboard powered off
4210739 prom corruption in post area not detected
4210761 osdog-get-timeout does not work in combined mode
4210792 Default NVRAM parameters are for a workstation.
4211673 osdog not disabled during driver offline
4211676 osdog reset not reported in ft system with power supplies
4212126 Obp does not provide functionality to clear boot-list eeprom property
4212784 system hangs with callb_delete blocked on mutex_enter
4213525 OBP stops if lock cable is not present.
4214340 u4ftdog source code should be lint and cstyle clean
4214918 osdog patting fails with 1 motherboard powered off
4216288 Inconsistency in CAF output for a-probe-mbus and b-probe-mbus tests
4216795 [ab]-caf-test fails serial and network tests
4217258 banner and suppress-banner (in nvramrc) work incorrectly on Sheffield
4217265 console-list helper functions do not check for initialised consoles
4217279 failure to access pci device nodes results in stack underflow
4219764 PCI class code for network device nodes is wrong
4220735 some pci slot maintenance bus words don't work.
4221113 [ab]-caf-test network test (net-selftest) jams 10Mb hub
4224818 failure to 'probe-all' results in osdog not being patted
4227426 Not all of C prom is checksummed.
4227448 OBP gives away ownership of slot 6 slots at reset.
4228179 motherboard osdog tests fail when primary tests secondary's motherboard
4228281 .onboard-versions word decodes UPA config register incorrectly
4229411 help for the 'netload' command has a typing error
4230315 mbrd_flags field (in sh_nvram.h) is hardcoded to zero by OBP
4231942 u4ftdogctl reports status to standard error
4234404 faulty rcp motherboard fpga images are not mended by cpuset B for mbrd B
4234421 obp does not recognise motherboard fpga download failures
4236872 enable/disable of osdog using u4ftdogctl should update /u4ft-options node
4238320 OBP crashes when initialising with its own motherboard broken
4239178 set-conf-osdog-a/b need replacing by set-conf-osdog
4239299 OSdog driver should comply with the osdog design document
4239336 tickint_clnt_add fix needs backporting to the netra ft1800 patch tree
4243212 OSdog barks when post memory tests are executed
4243236 POST memory test fails on system with two memory banks populated
4243326 Make deadman kernel work on MP Netra FT 1800 (Solaris 5.6)
4243643 POST occasionally fails (CPU Module 1 Error)processor 1 when diag-level=max
4243689 core dump lost after panic
4243949 level 12 interrupt lockup halt system
4244332 OBP calibrates clock interrupts incorrectly on 2P 512M systems
4245194 The Osdog did not trigger when a priority 10 hang was generated using causefault
4245833 Multiprocessor post sometimes fails with RED-state exception.
4245835 OSdog goes off in dtag init test if diag-level is max.
4247294 Remove debugging messages from u4ftdog
4247319 Need fix for 4128397 (kernel crash dump causes another panic in bio.c::getblk())
4248470 Patch 107926-02, README.107926-02 contains errors
4250437 OSDog barked during boot
4252069 OSdog triggers when debugger is entered as a result of a breakpoint

Patch Installation Instructions:
--------------------------------
Refer to the Install.info file within the patch for instructions on
using the generic 'installpatch' and 'backoutpatch' scripts provided
with each patch. Any other special or non-generic installation
instructions should be described below.

Special Install Instructions:
-----------------------------
None.

Non-generic install instructions
--------------------------------

OSDOG features delivered
------------------------

The word "osdog" is an abbreviation for "Operating System Watchdog".

The osdog features consist of hardware in the motherboards and
associated software in the firmware and operating system software.
The hardware consists essentially of a countdown timer that
automatically power-cycles the motherboard if the timer counts down to
zero. To stop this happening, the timer must be periodically reset
(this is known as "patting" the osdog).

The function of the osdog and related software is to increase the
system availability by automatically rebooting the system when it has
ceased to provide service. Currently a system is considered to have
ceased to provide service if:

 - the hardware has failed in such a way that the software either
   cannot run at all, or cannot access the necessary resources to pat
   the osdog timer hardware: in this case the hardware osdog will fire
   and automatically reset the system, which will then reboot;

 - the operating system software has failed in such a way that no
   useful operations are being performed (this can happen if the clock
   thread ceases to run, or if processes are no longer being
   scheduled): in this case the system will panic, and be rebooted.
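
As an illustration only (this is not the u4ftdog driver or firmware
logic), the countdown-and-pat behaviour of the timer described above
can be modelled in a few lines of shell. The function name osdog_sim
and the "tick" and "pat" event names are hypothetical, and a
POSIX-style shell with $(( )) arithmetic (for example ksh) is assumed.

```shell
# osdog_sim TIMEOUT EVENT...
# Toy model of a watchdog countdown timer (hypothetical, not the real osdog).
# Each "tick" decrements the counter; each "pat" reloads it to TIMEOUT.
# Prints "fired" as soon as the counter reaches zero, otherwise prints
# "running" once every event has been consumed.
osdog_sim() {
    timeout=$1
    shift
    remaining=$timeout
    for event in "$@"; do
        case $event in
            pat)  remaining=$timeout ;;
            tick) remaining=$((remaining - 1)) ;;
        esac
        if [ "$remaining" -le 0 ]; then
            echo "fired"
            return 1
        fi
    done
    echo "running"
    return 0
}
```

For example, "osdog_sim 3 tick tick pat tick tick" prints "running"
because the pat arrives before the counter expires, whereas
"osdog_sim 3 tick tick tick" prints "fired".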

The second failure detection algorithm is not selective - all it does
is check that the operating system appears to be scheduling processes,
without prioritising or distinguishing the processes in any way. There
is no way (so far) to arrange for the osdog to trigger if a specific
process or group of processes ceases to provide service.

The osdog is also used to ensure availability while the system is
booting: if the single cpuset that is booting suffers a hang, then its
osdog timer hardware will trigger, and the other cpuset will take over
as the booting cpuset (if there is no other cpuset, then the booting
cpuset gets to try again).

Introduction
------------

Part of this procedure is designed to upgrade CPUsets on systems that
are already installed with firmware up to the necessary level to run
release 6.7 - that is, CPUsets with version 17 or later PROMs.

If a CPUset does not have version 17 or later PROMs then it may be
necessary to bring it up to this level first: consult a field service
engineer to arrange this.

NB: the upgrade utilities cannot be run if the system was booted from
a read-only device, such as a CD-ROM or a network: the system must be
running from a writeable device such as a disk before this procedure
can be performed. In particular, if the system has just been fully
installed from CD-ROM, then it should be rebooted from disk before
performing this procedure.

The remainder of this section describes the following:

 . Special procedure for performing the patch install including:
     - instructions for the OBP/PROM upgrade utility.
 . CPUset integration instructions

Patch Installation Procedure
----------------------------

This procedure assumes you have access to the RCP port and/or the
ft1800's Console Alarms and Fans (CAF) module/FRU.

This procedure assumes that the CPUsets are already in sync. To
determine if the CPUsets are in sync, use the procedure described in
Note 3, Determining If In Sync.

 1. Use cmsconfig to disable the CPUset in A-CPU (i.e. CPU 0).

 2. Install the patch using the procedure described below in
    Note 1, Installing The Patch.

 3. Upgrade the PROM of the CPUset in B-CPU using the procedure
    described below in Note 2, Upgrading PROM.

    NOTE: This is a MANDATORY patch, therefore when asked for
    confirmation, enter 'yes'. Failure to upgrade will leave the
    system with an unsupported configuration.

 4. Use cmsconfig to enable the CPUset in A-CPU (i.e. CPU 0). Wait for
    the system to go into sync using the procedure described below in
    Note 3, Determining If In Sync.

 5. After the CPUsets have gone into sync, use cmsconfig to disable
    the CPUset in B-CPU (i.e. CPU 1).

 6. Upgrade the PROM of the CPUset in A-CPU using the procedure
    described below in Note 2, Upgrading PROM.

    NOTE: This is a MANDATORY patch, therefore when asked for
    confirmation, enter 'yes'. Failure to upgrade will leave the
    system with an unsupported configuration.

 7. Use cmsconfig to enable the CPUset in B-CPU (i.e. CPU 1). Wait for
    the system to go into sync using the procedure described below in
    Note 3, Determining If In Sync.

 8. Shut down the system to run level 0
    (i.e. shutdown -g0 -i0 -y).

 9. When the system has reached run level 0, disable auto boot
    (i.e. {0} ok setenv auto-boot? false).

10. Reset the system using the reset-all command
    (i.e. {0} ok reset-all).

    Note. The 'reset-all' command is required to prevent failure of
    the patch installation.

11. Allow the system to return to the PROM prompt. Enable the osdog
    (i.e. {0} ok h# 4f set-conf-osdog).

12. Re-enable auto boot
    (i.e. {0} ok setenv auto-boot? true).

13. Reset the system using the reset-all command
    (i.e. {0} ok reset-all).

    Note. The 'reset-all' command is required to prevent failure of
    the patch installation.

14. Allow the system to boot.

15. Use 'prtconf -V' to verify the system is running the latest PROM.
    prtconf -V should result in the following text being displayed:

        OBP 3.7.23.0 1999/06/18 14:29

Note 1 - Installing The Patch
-----------------------------

 1. Copy the compressed patch file onto the target system, into the
    following directory:

        /var/tmp/

 2. Uncompress, extract and install the patch files using one of the
    following two methods:

    (a) If the patch delivered is a compressed cpio file, i.e. file
        name of "<patch-id>.cpio.Z" format, use the following
        commands:

            cd /var/tmp/
            zcat <patch-id>.cpio.Z | cpio -iBVdmcu
            cd <patch-id>
            ./installpatch .

    (b) If the patch delivered is a compressed tar file, i.e. file
        name of "<patch-id>.tar.Z" format, use the following commands:

            cd /var/tmp/
            zcat <patch-id>.tar.Z | tar -xvf -
            cd <patch-id>
            ./installpatch .

    where <patch-id> is the identifier of the patch (e.g. 107926-04)

Note 2 - Upgrading PROM
-----------------------

 1. The flash update utility shipped with this patch is called:

        netra_ft_1800.flash.update-09

    and can be found in the directory:

        /usr/platform/SUNW,Ultra-4FT/lib

    The utility CANNOT be executed whilst running in sync.

 2. At the command line execute the upgrade utility:

        cd /usr/platform/SUNW,Ultra-4FT/lib
        ./netra_ft_1800.flash.update-09

    The update utility will verify the integrity of the binary
    delivered and will then report the version of the PROM currently
    running and the version available to be upgraded to.

 3. You will be asked if you wish to continue. Answer 'yes' to
    upgrade, 'no' to abort.

    Answering 'yes' will display a list of NVRAM variables for the
    user to note, and reset later if different from the default
    values. Note these for safety, though it should not be necessary
    to use them.

 4. You will be asked if you wish to continue. Answer 'yes' to
    upgrade, 'no' to abort.

    Answering 'yes' will perform the upgrade, and indicate the success
    or failure of the upgrade.

 5. If the update fails, note any error messages and contact your Sun
    Microsystems representative.

 6. You will be asked "Do you wish to halt the system now".
    Answer 'no'.

Note 3 - Determining If In Sync
-------------------------------

To determine if the CPUsets are in sync, the LEDs on the front panels
of the CPUsets should be examined. If the CPUsets are in sync the
small amber LEDs marked 'Diag' will flash together.

Alternatively use the following command to determine if running
in sync:

    /usr/platform/SUNW,Ultra-4FT/SUNWcms/lib/u4ftvmctl -c

which will report "CPUsets running combined" if running in sync.

CPUset Integration
------------------

During final CPUset integration there is a period in which the system
is not performing system services (i.e. "stop-dead time"). To reduce
this time it is necessary to alter the default setting using
cmsconfig.

In cmsconfig:

    Select the 'ft_core' sub-system,
    select pri_stop_time_msecs (menu item 4),
    select 200 (menu item 1),
    Quit from cmsconfig.

From now on the system will use the lower value to achieve the lowest
possible 'stop-dead' time during final CPUset integration.

Please note that although 200ms is selected, it is still not possible
to achieve that time. However, the value does play a significant role
in deciding at what point to start final integration.
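
The in-sync check from Note 3 lends itself to scripting when waiting
between steps of the installation procedure. A minimal sketch follows.
Only the u4ftvmctl path, the -c option and the "CPUsets running
combined" message come from this README; the function names, the
U4FTVMCTL override variable and the polling parameters are
hypothetical, the exact output format of u4ftvmctl is assumed, and a
POSIX-style shell with $(( )) arithmetic (for example ksh) is assumed.

```shell
# Path taken from Note 3. U4FTVMCTL is a hypothetical override so the
# functions can be exercised on a machine without the real utility.
U4FTVMCTL=${U4FTVMCTL:-/usr/platform/SUNW,Ultra-4FT/SUNWcms/lib/u4ftvmctl}

# in_sync: succeeds if the tool reports that the CPUsets are combined.
# stderr is merged in case the status is written there (an assumption).
in_sync() {
    "$U4FTVMCTL" -c 2>&1 | grep "CPUsets running combined" >/dev/null
}

# wait_for_sync TRIES INTERVAL
# Polls in_sync up to TRIES times, sleeping INTERVAL seconds between
# attempts. Prints "in sync" and returns 0 on success; prints
# "not in sync" and returns 1 if every attempt fails.
wait_for_sync() {
    tries=$1
    interval=$2
    while [ "$tries" -gt 0 ]; do
        if in_sync; then
            echo "in sync"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "not in sync"
    return 1
}
```

For example, "wait_for_sync 60 10" would poll up to 60 times at
10-second intervals; these values are arbitrary and are not a
statement about how long the system takes to go into sync.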