Monitoring Platform Health


Maintaining robust platform health is critical for reliable operations and optimal user experience. Our platform provides multiple tools for effective health monitoring, each catering to different needs and levels of detail. In this article, we’ll outline the three main ways to monitor platform health today and preview upcoming enhancements on our roadmap.

Installer Admin Console: Application Service State Monitoring

The installer admin console serves as your first line of insight into the platform’s operational status. Through its intuitive interface, administrators can:

  • Check application service states: View the status of core services, including containers, to quickly spot any service that is stopped, failing, or restarting abnormally.
  • Diagnose basic issues: The console presents clear indications of unhealthy or degraded services, allowing for rapid troubleshooting.
  • Check container logs
  • Analyze and generate support bundles

This method is ideal for administrators who need a quick, high-level overview of application components.

[Image: Cluster Status]

[Image: Support Bundle Analysis]


Platform Health User Interface: KPI-driven Insights

For a more detailed analysis, the dedicated platform health UI provides:

  • Key Performance Indicators (KPIs): Real-time metrics that reflect the health of critical flows, such as data ingestion and query processing.
  • Issue Highlighting: Visual cues and alerts make it easy to identify potential bottlenecks or failures in core platform operations.
  • Focused troubleshooting: By surfacing relevant KPIs, the UI helps you pinpoint the area (ingestion or query) where attention is needed.

This interface is designed for users who want to proactively manage platform performance and address issues before they impact end users. It is only available to admins.


Local Grafana Dashboards: Deep Dive with Service KPIs

For advanced monitoring and custom analysis, each deployment includes a local Grafana instance:

  • Linked from the health UI: Easy access from the platform health interface.
  • Default health dashboards: Pre-configured dashboards display service KPIs for a comprehensive view of platform health.
  • Powered by Prometheus: Grafana dashboards draw metrics from a local Prometheus service, enabling powerful visualization and historical analysis.

This approach is best for technical teams who need granular metrics and the ability to create custom visualizations and alerts. It is only available to admins.
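Because the dashboards read from a local Prometheus service, the same metrics can in principle be fetched programmatically through the standard Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not a documented integration: the `PROMETHEUS_URL` address and the helper names are assumptions for illustration, and whether the local Prometheus endpoint is reachable from outside the deployment depends on your environment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: address of the local Prometheus service. Replace with the
# value that applies to your deployment.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Convert an instant-query response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(
            f"Prometheus query failed: {payload.get('error', 'unknown error')}"
        )
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def query_prometheus(promql, base_url=PROMETHEUS_URL, timeout=10):
    """Run an instant query against Prometheus and return parsed samples."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

For example, `query_prometheus("up")` would return one entry per scrape target, with a value of 1.0 for targets Prometheus can reach and 0.0 for those it cannot. The names of the platform's own service KPI metrics are not listed here, so look them up in the default dashboards before querying them.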



Summary Table

Method                   | Purpose                   | Level of Detail   | Data Source          | Best For
-------------------------|---------------------------|-------------------|----------------------|----------------------
Installer Admin Console  | Service/container status  | Basic/overview    | Application services | Admins, support staff
Platform Health UI       | KPI-based issue detection | Intermediate      | Application KPIs     | Ops, engineering
Local Grafana Dashboards | In-depth KPI analysis     | Advanced/granular | Prometheus           | DevOps, SRE

Example Use Case:

If a user reports slow data queries, start with the platform health UI to check query flow KPIs. If issues are detected, follow links to the local Grafana dashboard for a deeper analysis of backend service performance.
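The escalation logic in this use case can be sketched in a few lines of code. This is an illustrative model only: the `KpiReading` type, the KPI names, and the threshold values are hypothetical and do not correspond to actual platform identifiers or limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiReading:
    flow: str         # "ingestion" or "query"
    name: str         # hypothetical KPI name, for illustration only
    value: float      # current reading
    threshold: float  # hypothetical limit; a value above it counts as degraded


def degraded_flows(readings):
    """Return the sorted names of flows with at least one KPI above its threshold."""
    return sorted({r.flow for r in readings if r.value > r.threshold})


def should_open_grafana(readings, reported_flow):
    """Escalate to the Grafana dashboards only if the reported flow looks degraded."""
    return reported_flow in degraded_flows(readings)


# Hypothetical readings as they might be read off the platform health UI.
readings = [
    KpiReading(flow="query", name="query_latency_seconds", value=4.2, threshold=2.0),
    KpiReading(flow="ingestion", name="ingestion_lag_seconds", value=10.0, threshold=60.0),
]
```

With these sample readings, `should_open_grafana(readings, "query")` is true, which matches the path described above: the query flow KPIs show a problem, so the next step is the linked Grafana dashboard.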


Roadmap: External Notifications for Platform Health

Looking ahead, we are developing support for notifications to external systems when platform health issues are detected. This will enable:

  • Automated incident response: Integrations with external monitoring, alerting, or ticketing systems.
  • Faster remediation: Immediate awareness for teams outside the platform’s native interfaces.

Stay tuned for updates as we enhance our platform health monitoring with seamless external notification capabilities.


© 2025 Cisco and/or its affiliates. All rights reserved.

For more information about trademarks, please visit: Cisco trademarks
For more information about legal terms, please visit: Cisco legal terms

For legal information about Accedian Skylight products, please visit: Accedian legal terms and trademarks