Monitoring Platform Health

Maintaining robust platform health is critical for reliable operations and optimal user experience. Our platform provides multiple tools for effective health monitoring, each catering to different needs and levels of detail. In this article, we’ll outline the three main ways to monitor platform health today and preview upcoming enhancements on our roadmap.

Installer Admin Console: Application Service State Monitoring

The installer admin console serves as your first line of insight into the platform’s operational status. Through its intuitive interface, administrators can:

Check application service states: View the status of core services, including containers, to quickly spot any service that is stopped, failing, or restarting abnormally.
Diagnose basic issues: The console presents clear indications of unhealthy or degraded services, allowing for rapid troubleshooting.
Check container logs
Generate and analyze support bundles

This method is ideal for administrators who need a quick, high-level overview of application components.
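The check that an administrator performs visually in the console can be sketched in a few lines. This is only an illustration of the "stopped, failing, or restarting abnormally" rule described above: the state names, the restart threshold, and the service names are assumptions, not values reported by the console.

```python
from dataclasses import dataclass

# Assumption: state names chosen for this sketch; the console may label them differently.
UNHEALTHY_STATES = {"stopped", "failing"}


@dataclass
class ServiceStatus:
    name: str
    state: str              # e.g. "running", "stopped", "failing"
    restart_count: int = 0  # restarts observed in the period being reviewed


def needs_attention(service, max_restarts=3):
    """Return True if the service is stopped, failing, or restarting abnormally."""
    if service.state in UNHEALTHY_STATES:
        return True
    # "Restarting abnormally" is modeled here as exceeding a hypothetical threshold.
    return service.restart_count > max_restarts


def triage(services, max_restarts=3):
    """Return the names of services that need attention, in their original order."""
    return [s.name for s in services if needs_attention(s, max_restarts)]


# Hypothetical service names, for illustration only.
services = [
    ServiceStatus("ingestion", "running"),
    ServiceStatus("query", "running", restart_count=7),
    ServiceStatus("storage", "stopped"),
]
print(triage(services))  # ['query', 'storage']
```

A service in the resulting list is a natural candidate for the next two console actions: checking its container logs and generating a support bundle.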

Cluster Status

Support Bundle Analysis

Platform Health User Interface: KPI-driven Insights

For a more detailed analysis, the dedicated platform health UI provides:

Key Performance Indicators (KPIs): Real-time metrics that reflect the health of critical flows, such as data ingestion and query processing.
Issue Highlighting: Visual cues and alerts make it easy to identify potential bottlenecks or failures in core platform operations.
Focused troubleshooting: By surfacing relevant KPIs, the UI helps you pinpoint the area (ingestion or query) where attention is needed.

This interface is designed for users who want to proactively manage platform performance and address issues before they impact end users. It is only available to admins.

Local Grafana Dashboards: Deep Dive with Service KPIs

For advanced monitoring and custom analysis, each deployment includes a local Grafana instance:

Linked from the health UI: Open the dashboards directly from the platform health interface.
Default health dashboards: Pre-configured dashboards display service KPIs for a comprehensive view of platform health.
Powered by Prometheus: Grafana dashboards draw metrics from a local Prometheus service, enabling powerful visualization and historical analysis.

This approach is best for technical teams who need granular metrics and the ability to create custom visualizations and alerts. It is only available to admins.
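Because the dashboards read from Prometheus, the same metrics can in principle be queried through the standard Prometheus HTTP API. The sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses an instant-vector response. The base URL, the `job` label value, and the sample response are assumptions for illustration; this article does not state how (or whether) the local Prometheus service is exposed in a given deployment.

```python
import json
from urllib.parse import urlencode


def build_instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Convert an instant-vector response body into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample carries its labels in "metric" and [timestamp, "value"] in "value".
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Assumption: Prometheus reachable at this address; adjust for your deployment.
url = build_instant_query_url("http://localhost:9090", "up == 0")
print(url)

# Hypothetical response body in the documented Prometheus format.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "query"}, "value": [1700000000, "0"]}
        ],
    },
})
for labels, value in parse_instant_vector(sample_body):
    print(labels.get("job"), value)
```

The `up` metric is generated by Prometheus for every scrape target, so `up == 0` lists targets that could not be scraped. Service-specific KPI metric names depend on the deployment and are not assumed here.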

Summary Table

Method	Purpose	Level of Detail	Data Source	Best For
Installer Admin Console	Service/container status	Basic/overview	Application services	Admins, support staff
Platform Health UI	KPI-based issue detection	Intermediate	Application KPIs	Ops, engineering
Local Grafana Dashboards	In-depth KPI analysis	Advanced/granular	Prometheus	DevOps, SRE

Example Use Case

If a user reports slow data queries, start with the platform health UI to check query flow KPIs. If issues are detected, follow links to the local Grafana dashboard for a deeper analysis of backend service performance.
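The escalation path in this use case can be written down as a small decision function. This is a sketch of the workflow only: the KPI names, values, and thresholds are hypothetical, and the fallback step (checking service states in the installer admin console) is an assumption rather than documented guidance.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    flow: str         # "ingestion" or "query"
    name: str         # hypothetical KPI name
    value: float
    threshold: float  # hypothetical upper limit for a healthy value


def flows_needing_attention(kpis):
    """Return flows with at least one KPI above its threshold, in first-seen order."""
    flagged = []
    for kpi in kpis:
        if kpi.value > kpi.threshold and kpi.flow not in flagged:
            flagged.append(kpi.flow)
    return flagged


def next_step(kpis, reported_flow):
    """Suggest where to look next for a problem reported on the given flow."""
    if reported_flow in flows_needing_attention(kpis):
        return f"Open the local Grafana dashboards for the {reported_flow} services"
    # Assumed fallback: the health UI shows no breach, so start from service states.
    return f"No {reported_flow} KPI breach; check service states in the installer admin console"


kpis = [
    Kpi("ingestion", "lag_seconds", 2.0, 30.0),
    Kpi("query", "p95_latency_seconds", 4.2, 1.5),
]
print(next_step(kpis, "query"))
```

With these sample values only the query flow exceeds its threshold, so a slow-query report leads to the Grafana dashboards, matching the sequence described above.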

Roadmap: External Notifications for Platform Health

Looking ahead, we are developing support for notifications to external systems when platform health issues are detected. This will enable:

Automated incident response: Integrations with external monitoring, alerting, or ticketing systems.
Faster remediation: Immediate awareness for teams outside the platform’s native interfaces.

Stay tuned for updates as we enhance our platform health monitoring with seamless external notification capabilities.

© 2025 Cisco and/or its affiliates. All rights reserved.
