Objective: To clear ILOM FMA faults and to reset the SP.
Applies to: Engineered Systems (Exadata X2, X3) and other machines that use ILOM FMA.
Solution:
1. Log in to the node's ILOM over SSH as root; you will land at the SP prompt (->):
% ssh -l root <IP address of Service Processor>
2. List all known faults in the system.
Example:
-> show /SP/faultmgmt
3. Start the fault management shell to obtain detailed information about each fault:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell(y/n)? y
faultmgmtsp>
* Use the 'fmadm faulty' command to identify the faulty component/FRU.
4. Example of clearing a motherboard (/SYS/MB) fault:
Show the fault:
faultmgmtsp> fmadm faulty
The output lists the /SYS/MB fault along with its UUID:
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2012-12-15/21:53:29 68a0c563-e609-e8fb-9fae-c03c46867474 SPX86-8002-2J  Critical
Fault class : fault.chassis.domain.boot.power-off-unexpected
FRU         : /SYS/MB
              (Part Number: 511-1213-06)
              (Serial Number: 0328MSL-1106BA12XY)
Description : Power to server is not available due to a malfunctioning component detected by CPLD.
Use the above UUID to clear the fault.
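Picking the UUID out of this output by hand gets error-prone when several faults are listed. As a minimal sketch, assuming the layout shown in the sample output above (whitespace-separated Time, UUID, msgid and Severity columns, followed by "Label : value" detail lines), the Python below extracts each fault and builds the matching fmadm repair command. The names Fault, parse_fmadm_faulty and repair_commands are hypothetical helpers, not part of ILOM, and the script only parses text copied from the faultmgmtsp> shell; it does not talk to the SP.

```python
import re
from dataclasses import dataclass, field

# Summary row of `fmadm faulty`: <Time> <UUID> <msgid> <Severity>.
# Assumption: whitespace-separated columns, as in the sample output above.
SUMMARY_ROW = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+"
    r"(?P<severity>\S+)$",
    re.IGNORECASE,
)

# "Label : value" detail lines; only the first line of a wrapped value is kept.
DETAIL_LINE = re.compile(r"^\s*(?P<key>Fault class|FRU|Description)\s*:\s*(?P<value>.*\S)")


@dataclass
class Fault:
    """One fault as reported by `fmadm faulty` (hypothetical helper type)."""
    time: str
    uuid: str
    msgid: str
    severity: str
    details: dict = field(default_factory=dict)


def parse_fmadm_faulty(output: str) -> list[Fault]:
    """Return one Fault per summary row found in `fmadm faulty` output."""
    faults: list[Fault] = []
    for line in output.splitlines():
        row = SUMMARY_ROW.match(line.strip())
        if row:
            faults.append(Fault(**row.groupdict()))
            continue
        detail = DETAIL_LINE.match(line)
        if detail and faults:
            # Detail lines belong to the most recent summary row.
            faults[-1].details[detail["key"]] = detail["value"]
    return faults


def repair_commands(faults: list[Fault]) -> list[str]:
    """Build the `fmadm repair <UUID>` command for each parsed fault."""
    return [f"fmadm repair {f.uuid}" for f in faults]


if __name__ == "__main__":
    sample = """\
2012-12-15/21:53:29 68a0c563-e609-e8fb-9fae-c03c46867474 SPX86-8002-2J  Critical
Fault class : fault.chassis.domain.boot.power-off-unexpected
FRU         : /SYS/MB
"""
    for cmd in repair_commands(parse_fmadm_faulty(sample)):
        print(cmd)  # fmadm repair 68a0c563-e609-e8fb-9fae-c03c46867474
```

Header and dashed separator lines match neither pattern, so they are skipped; each printed command can then be entered at the faultmgmtsp> prompt.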
5. Clear the fault:
faultmgmtsp> fmadm repair 68a0c563-e609-e8fb-9fae-c03c46867474
Show the faults again, and repeat the repair until no faults are listed:
faultmgmtsp> fmadm faulty
Exit the fault management shell:
faultmgmtsp> exit
After the faults are cleared, continue by resetting the SP.
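The "repeat until no faults are listed" step is a simple loop: list the faults, repair every UUID shown, and list again until nothing is reported. The Python sketch below expresses that loop under one loud assumption: run is a hypothetical callable that sends a single command to the faultmgmtsp> shell and returns its text output. This article does not describe such an interface, so wiring it to a real SP session is left open.

```python
import re
from typing import Callable

# A UUID as it appears in the `fmadm faulty` summary row,
# e.g. 68a0c563-e609-e8fb-9fae-c03c46867474.
UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE)


def clear_all_faults(run: Callable[[str], str], max_rounds: int = 5) -> list[str]:
    """Repeat `fmadm faulty` / `fmadm repair <UUID>` until no faults remain.

    `run` is a HYPOTHETICAL callable that sends one command to the
    faultmgmtsp> shell and returns its text output; how it reaches the SP
    (for example, an SSH session) is not covered by this sketch.
    Returns the UUIDs that were repaired, in the order they were repaired.
    """
    repaired: list[str] = []
    for _ in range(max_rounds):
        # dict.fromkeys drops duplicate UUIDs while keeping their order.
        uuids = list(dict.fromkeys(UUID_RE.findall(run("fmadm faulty"))))
        if not uuids:
            return repaired  # fault list is empty: done
        for uuid in uuids:
            run(f"fmadm repair {uuid}")
            repaired.append(uuid)
    # A fault that is still listed after every round may mean the
    # underlying problem has not actually been fixed.
    raise RuntimeError(f"faults still listed after {max_rounds} rounds")
```

The max_rounds cap is a design choice for the sketch: a fault that is re-reported after every repair is not cleared by looping, so the function stops and raises instead of spinning forever.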
6. Reset the SP:
-> reset /SP
Legend:
ILOM - Integrated Lights Out Manager
FMA - Fault Management Architecture
SP - Service Processor
You are all set! All your faults are clear now.
Reference / Read More:
URL - http://docs.oracle.com/cd/E20815_01/html/E20894/gjuqk.html
Title - How to Clear Faults Using the Oracle ILOM Command-Line Interface
URL - http://docs.oracle.com/cd/E20689_01/html/E20695/z40000971312677.html
Title - Access the SP (Oracle ILOM)
URL - http://docs.oracle.com/cd/E20815_01/html/E20894/gjshy.html
Title - How to Reset the Oracle ILOM SP Using the Web Interface
URL - http://docs.oracle.com/cd/E20815_01/html/E20894/gjuqk.html
Title - How to Clear Faults Using the Oracle ILOM Command-Line Interface, in Sun Server X2-8 (formerly Sun Fire X4800 M2) Diagnostics Guide, Sun Server X2-8 (formerly Sun Fire X4800 M2) Documentation Library (also applies to Exadata)
URL - http://docs.oracle.com/cd/E23824_01/html/821-1451/gliqg.html
Title - Fault Management Overview, in Oracle Solaris Administration: Common Tasks, Oracle Solaris 11 Information Library (also applies to Exadata FMA)
URL - http://docs.oracle.com/cd/E23824_01/html/819-3196/fmaiofs.html
Title - Oracle Fault Management Architecture I/O Fault Services, in Writing Device Drivers, Oracle Solaris 11 Information Library (also applies to Exadata FMA)