Manual Chapter : Diagnostic agent overview

Applies To:



  • 1.5.0, 1.4.0, 1.2.0

Diagnostic agent overview

The Diagnostic Agent is the entity responsible for performing a diagnostic process. Internally, the agent maintains a component model that represents the system, and each component represents the health of one part of the system. The agent can load a diagnostic profile and execute it. A profile consists of a set of tasks that update the components.
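The relationship between the agent, its component model, and a profile of tasks can be sketched as follows. This is a minimal illustration, not the agent's actual implementation. It assumes that a profile is an ordered list of tasks and that each task is a function that updates components. The names DiagnosticAgent, load_profile, run, and check_drives, and the health labels "healthy" and "unhealthy", are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """Hypothetical, minimal component: a key and a health label."""
    key: str
    health: str = "healthy"  # hypothetical label for the initial health state


# Assumption: a task is a callable that updates components in the model.
Task = Callable[[Dict[str, Component]], None]


@dataclass
class DiagnosticAgent:
    components: Dict[str, Component] = field(default_factory=dict)
    profile: List[Task] = field(default_factory=list)

    def load_profile(self, tasks: List[Task]) -> None:
        # Assumption: a profile is modeled as an ordered list of tasks.
        self.profile = list(tasks)

    def run(self) -> None:
        # Execute each task in order; each task may update component health.
        for task in self.profile:
            task(self.components)


def check_drives(components: Dict[str, Component]) -> None:
    # Hypothetical task: pretend a drive check failed.
    components["blade/hardware/drives"].health = "unhealthy"


agent = DiagnosticAgent(
    components={"blade/hardware/drives": Component("blade/hardware/drives")}
)
agent.load_profile([check_drives])
agent.run()
print(agent.components["blade/hardware/drives"].health)  # unhealthy
```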
The Diagnostic Agent maintains the list of components that make up the platform. This list varies depending on whether the platform is an appliance, a blade, or a system controller, and each platform type defines its own set of hardware components. Components can be services, firmware, or hardware blocks. Components that represent services can be discovered and created dynamically.
Example components include:
  • NVMe Drives
  • FPGAs (ATSE & VQF)
  • LOP Application
  • Memory
  • CPU
  • Sensors
  • Platform HAL Service
  • FPGA Manager

Component overview

Components follow these guidelines, which allow multiple tasks to update component health:
  • A component container can have one or more components.
  • A component can have zero or more child components.
  • A component can have zero or more attributes.
  • Each hardware type has a fixed set of hardware components.
  • Each component has a Health and Severity value.
  • Each attribute has a Health and Severity value.
A component encapsulates some object in the system and includes these fields:
  • A unique key that identifies the component. The key is a lowercase name whose parts are separated by the Unix path separator, for example blade/hardware/drives. The root is the most generic part of the name, and the leaf is generally the most specific.
    • By convention, we have the following root component nodes:
      • controller/
      • blade/
      • chassis/
    • By convention, we have three major subject nodes:
      • hardware/
        • Components associated with the physical hardware
      • services/
        • Components associated with a running service
      • firmware/
        • Components associated with a firmware element
  • A user-friendly name or description
  • A health value
  • A severity value
  • Zero or more (0..N) attribute values
  • Zero or more (0..M) child components
  • A parent component
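Here is one way to model the fields listed above as a data structure. It is a sketch, not the agent's actual type. The class name Component, the add_child helper, and the default labels "healthy" and "info" are hypothetical. Children are stored by key, and each child keeps a reference back to its parent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Component:
    key: str                 # unique, lowercase, slash-separated, e.g. "blade/hardware/drives"
    name: str                # user-friendly name or description
    health: str = "healthy"  # hypothetical label for the initial health state
    severity: str = "info"   # hypothetical label for normal operation
    attributes: Dict[str, object] = field(default_factory=dict)     # 0..N attributes
    children: Dict[str, Component] = field(default_factory=dict)    # 0..M child components
    # Back-reference to the parent; excluded from repr/eq to avoid cycles.
    parent: Optional[Component] = field(default=None, repr=False, compare=False)

    def add_child(self, child: Component) -> Component:
        """Attach `child` under this component and set its parent link."""
        child.parent = self
        self.children[child.key] = child
        return child


blade = Component("blade", "Blade")  # a root component: no parent
hardware = blade.add_child(Component("blade/hardware", "Blade hardware"))
drives = hardware.add_child(Component("blade/hardware/drives", "NVMe drives"))
print(drives.parent.key)  # blade/hardware
```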
The component structure is hierarchical. Each component has at most one parent and zero or more child components. A root component has no parent. Examples of root components include blade, chassis, and controller.
By convention, the parts of a component key are separated by the Unix path separator, a forward slash (/).
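The key convention can be illustrated with a few helper functions. These are a sketch, and the names split_key, parent_key, and subject_of are hypothetical. The functions rely on the root nodes (controller, blade, chassis) and subject nodes (hardware, services, firmware) listed above. Rejecting uppercase keys and unknown roots is an assumption: the text describes these rules as conventions, not as enforced checks.

```python
from typing import List, Optional

ROOT_NODES = {"controller", "blade", "chassis"}
SUBJECT_NODES = {"hardware", "services", "firmware"}


def split_key(key: str) -> List[str]:
    """Split a component key into segments, from root (generic) to leaf (specific)."""
    if key != key.lower():
        raise ValueError(f"component keys are lowercase: {key!r}")
    parts = key.strip("/").split("/")
    if parts[0] not in ROOT_NODES:
        raise ValueError(f"unexpected root node: {parts[0]!r}")
    return parts


def parent_key(key: str) -> Optional[str]:
    """Return the key one level up, or None for a root component."""
    parts = split_key(key)
    return "/".join(parts[:-1]) if len(parts) > 1 else None


def subject_of(key: str) -> Optional[str]:
    """Return the subject node (hardware, services, firmware) if the key has one."""
    parts = split_key(key)
    if len(parts) > 1 and parts[1] in SUBJECT_NODES:
        return parts[1]
    return None


print(split_key("blade/hardware/drives"))   # ['blade', 'hardware', 'drives']
print(parent_key("blade/hardware/drives"))  # blade/hardware
print(subject_of("blade/hardware/drives"))  # hardware
print(parent_key("chassis"))                # None
```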

Component health overview

A component can have these health states:
  • The component is considered healthy. This is the initial health state.
  • The component is no longer healthy.
  • The health status does not apply to this component.
You can configure a component's health to impact the health of its parent component. For example, if the "nvme0n1" component is unhealthy, then the "blade/hardware/drives", "blade/hardware", and "blade" components also become unhealthy.
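Parent-health propagation can be sketched as a walk up the key path. This is a simplified illustration, not the agent's real logic. It stores health in a dictionary keyed by component key and represents the per-component setting as a hypothetical set called propagates. The key blade/hardware/drives/nvme0n1 is also a hypothetical placement for the "nvme0n1" drive.

```python
from typing import Dict, Set


def mark_unhealthy(health: Dict[str, str], key: str, propagates: Set[str]) -> None:
    """Mark `key` unhealthy, then walk up the key path for as long as the
    current component is configured to impact its parent's health."""
    health[key] = "unhealthy"
    while key in propagates and "/" in key:
        key = key.rsplit("/", 1)[0]  # parent key: drop the last path segment
        health[key] = "unhealthy"


keys = [
    "blade",
    "blade/hardware",
    "blade/hardware/drives",
    "blade/hardware/drives/nvme0n1",  # hypothetical key for the nvme0n1 drive
    "blade/services",
]
health = {k: "healthy" for k in keys}
propagates = set(keys)  # assume every component is configured to impact its parent
mark_unhealthy(health, "blade/hardware/drives/nvme0n1", propagates)
print(health["blade"], health["blade/services"])  # unhealthy healthy
```

Sibling subtrees such as blade/services are unaffected, because propagation only follows the failing component's chain of parents.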

Component severity levels and health

Each component has a severity value, which adds weight to its health status. Severity values follow the standard syslog severity levels, and the severity value also determines the health of the attribute or component. The levels, from most to least severe, are:
  • Emergency: The system is unusable. Action: Create RMA SO#.
  • Alert: A problem has occurred and must be remedied immediately. Action: Create support SR#.
  • Critical: A problem has occurred and must be remedied soon. Action: Create support SR#.
  • Error: A problem has occurred that needs attention. Action: Run foreground diagnostics (currently unavailable, but to be implemented in a future version).
  • Warning: A possible problem has occurred that needs attention.
  • Notice: A condition has occurred that might need attention. For example, voltage limits that are out of range do not fail and are instead marked as Notice.
  • Informational: The component is operating normally.
  • Debug: The component does not reflect health conditions.
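Syslog severities have standard numeric codes (RFC 5424: 0 = Emergency through 7 = Debug), so lower numbers are more severe. Ordering follows from that. The sketch below uses those codes to find the most severe value among several readings. The text does not say how severity is translated into health, so the is_unhealthy threshold (Error or worse) is purely hypothetical, as are the helper names and the INFO default for an empty input.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Standard syslog severity codes (RFC 5424): lower value = more severe.
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6  # "Informational"
    DEBUG = 7


def most_severe(severities: Iterable[Severity]) -> Severity:
    """Return the most severe value (lowest code); assume INFO if there are none."""
    return min(severities, default=Severity.INFO)


def is_unhealthy(severity: Severity, threshold: Severity = Severity.ERROR) -> bool:
    """Hypothetical rule: `threshold` or anything more severe counts as unhealthy."""
    return severity <= threshold


worst = most_severe([Severity.NOTICE, Severity.INFO, Severity.CRITICAL])
print(worst.name, is_unhealthy(worst))  # CRITICAL True
print(is_unhealthy(Severity.NOTICE))    # False
```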

Component attributes overview

An attribute of a component encapsulates a specific value at a moment in time. Each attribute includes:
  • A unique key that identifies the health attribute
    • Attributes are similar to the fields of a class or structure: each has a name and a single basic value type.
    • Attribute keys are all lowercase.
    • Attribute keys can combine generic and specific names, separated by a colon (:), for example switch:port:link-status.
  • A user-friendly name
  • An optional reference to a criteria object
  • A health status
  • A severity value
  • A current value
  • An updated-at time stamp
  • An updated count
  • Extra data
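The attribute fields listed above map naturally onto a small record type with an update method. This is a sketch only. The class name Attribute, the update and key_parts helpers, the default labels, and the use of a Unix timestamp for "updated-at" are assumptions made for illustration. Each call to update stores the new value, refreshes the time stamp, and increments the update count.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Attribute:
    key: str                        # lowercase, generic-to-specific parts joined by ":"
    name: str                       # user-friendly name
    criteria: Optional[Any] = None  # optional reference to a criteria object
    health: str = "healthy"         # hypothetical label
    severity: str = "info"          # hypothetical label
    value: Any = None               # current value
    updated_at: Optional[float] = None  # assumption: Unix timestamp of the last update
    updated_count: int = 0
    extra: Dict[str, Any] = field(default_factory=dict)  # extra data

    def __post_init__(self) -> None:
        if self.key != self.key.lower():
            raise ValueError(f"attribute keys are lowercase: {self.key!r}")

    @property
    def key_parts(self) -> List[str]:
        """Split the key into its generic-to-specific parts."""
        return self.key.split(":")

    def update(self, value: Any) -> None:
        # Record a new sample: the value, when it was taken, and the running count.
        self.value = value
        self.updated_at = time.time()
        self.updated_count += 1


link = Attribute("switch:port:link-status", "Switch port link status")
link.update("up")
link.update("down")
print(link.key_parts, link.value, link.updated_count)  # ['switch', 'port', 'link-status'] down 2
```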
Each attribute can be associated with a criteria object. A criteria object has one or more (1..M) limits, which can be applied to a data object. Each criteria limit includes:
  • A unique key, which describes the limit within the criteria.
  • A user-friendly message associated with the limit.
  • A string expression that can be applied to a data object.
  • The severity to assign to the attribute if the expression evaluates to "true".
The limits are evaluated in the order in which they are defined within the framework. As soon as one expression evaluates to "true", the evaluator stops processing the remaining limits.
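First-match evaluation of a criteria object can be sketched in a few lines. The text does not specify the expression language. This sketch therefore assumes a hypothetical "<field> <op> <number>" syntax, parsed with the standard operator module rather than passed to eval. The names Limit, matches, and evaluate, the temp_c field, and the temperature thresholds are all illustrative.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Comparison operators accepted by the hypothetical expression syntax.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


@dataclass
class Limit:
    key: str         # unique within its criteria object
    message: str     # user-friendly message
    expression: str  # hypothetical syntax: "<field> <op> <number>"
    severity: str    # severity assigned to the attribute when the expression is true


def matches(expression: str, data: Dict[str, Any]) -> bool:
    """Apply a "<field> <op> <number>" expression to a data object (a dict here)."""
    field_name, op, literal = expression.split()
    return OPS[op](data[field_name], float(literal))


def evaluate(limits: List[Limit], data: Dict[str, Any]) -> Optional[Limit]:
    """Evaluate limits in definition order and stop at the first one that is true."""
    for limit in limits:
        if matches(limit.expression, data):
            return limit
    return None


# Hypothetical temperature criteria: the most severe limit is listed first.
temperature = [
    Limit("too-hot", "Temperature far above the limit", "temp_c > 95", "critical"),
    Limit("warm", "Temperature above the limit", "temp_c > 85", "warning"),
]
hit = evaluate(temperature, {"temp_c": 90})
print(hit.key, hit.severity)  # warm warning
```

Because evaluation stops at the first true expression, the order of the limits matters. If the "warm" limit were listed first, a reading of 97 would stop there and never reach the critical limit.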