Quiz: Observability, Monitoring, and Automated Discovery

Test your understanding of observability pillars, telemetry, OpenTelemetry, eBPF, automated discovery, dynamic topology, and configuration drift detection with these review questions.

1. How does observability differ fundamentally from traditional monitoring in its approach to understanding system behavior?

A. Observability replaces monitoring entirely by using machine learning to predict failures before they occur, eliminating the need for threshold-based alerting
B. Observability collects only structured metric data, while traditional monitoring collects unstructured logs that require manual parsing
C. Observability enables investigation of unknown failure modes by providing data to answer arbitrary questions about system state, while monitoring tracks predefined metrics and alerts on known failure conditions
D. Observability requires agents installed on every component, while traditional monitoring uses agentless network scanning that does not affect system performance

The correct answer is C. The fundamental distinction is between answering "Is there a problem?" (monitoring) and "Why is there a problem?" (observability). Monitoring tracks predefined thresholds, such as CPU above 80% or free disk space below 10%, and alerts when known failure conditions occur. Observability provides logs, metrics, and traces rich enough to investigate unexpected failures whose causes were not anticipated when the system was built. In complex distributed systems, where the number of possible failure modes grows combinatorially, this ability to answer arbitrary questions is essential.

Concept Tested: Observability / Monitoring
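To make the distinction concrete, here is a minimal Python sketch using hypothetical event fields and an illustrative 80% CPU threshold. The monitoring check can only answer the one question it was written for, while the observability-style query slices raw request events by whichever field an investigator picks at query time.

```python
from collections import Counter

CPU_THRESHOLD = 80.0  # monitoring: a threshold chosen before any incident happens


def monitoring_check(cpu_percent: float) -> bool:
    """Answers 'is there a problem?' only for the condition anticipated up front."""
    return cpu_percent > CPU_THRESHOLD


# Hypothetical wide request events, each carrying several fields.
events = [
    {"region": "us-east", "version": "v2", "status": 500},
    {"region": "us-east", "version": "v1", "status": 200},
    {"region": "eu-west", "version": "v2", "status": 500},
    {"region": "eu-west", "version": "v1", "status": 200},
]


def errors_by(field: str, records: list[dict]) -> Counter:
    """Observability-style query: group failed requests by any field chosen at query time."""
    return Counter(r[field] for r in records if r["status"] >= 500)


print(monitoring_check(45.0))        # CPU looks healthy, so no alert fires
print(errors_by("region", events))   # failures are split evenly across regions
print(errors_by("version", events))  # every failure is on v2
```

CPU stays under the threshold, so the monitoring check stays silent, yet grouping failed requests by `version` points straight at `v2`: a question nobody had to anticipate when the check was written.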

2. The three pillars of observability—logs, metrics, and traces—each answer a different class of question. Which pillar most directly reveals service-to-service dependencies in a distributed application?

A. Logs, because application log lines include the hostname of every service called during request processing
B. Metrics, because time-series graphs show correlated latency spikes that reveal which services affect each other
C. Traces, because distributed traces record the complete path a request travels across services, including each service-to-service call and its timing
D. Metrics, because error rate correlations between services mathematically prove dependency relationships

The correct answer is C. Distributed traces record the full journey of a request through a system—a trace spans from the initial API call through every downstream service invocation, database query, and external call required to complete the request. Each service-to-service call appears as a "span" in the trace with timing and metadata. This trace data directly reveals dependency relationships that can be automatically extracted and used to populate IT management graph edges, making traces the most direct source for service topology discovery.

Concept Tested: Telemetry / Observability
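Extracting dependency edges from trace data can be sketched as follows. This is a simplified model rather than the OpenTelemetry data model: each hypothetical `Span` carries its own ID, its parent's ID, and the name of the service that emitted it, and an edge is recorded whenever a child span belongs to a different service than its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str              # service that emitted the span
    name: str
    duration_ms: float


def dependency_edges(spans: list[Span]) -> set[tuple[str, str]]:
    """Derive (caller, callee) service edges from parent/child span links."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span.parent_id)
        # A parent from a different service means one service called another;
        # parent/child spans inside the same service are internal work.
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges


checkout_trace = [
    Span("a", None, "api-gateway", "GET /checkout", 310.0),
    Span("b", "a", "checkout", "create_order", 280.0),
    Span("c", "b", "checkout", "validate_cart", 15.0),
    Span("d", "b", "inventory", "reserve_items", 90.0),
    Span("e", "b", "payments", "charge_card", 150.0),
    Span("f", "e", "postgres", "INSERT payments", 20.0),
]

for caller, callee in sorted(dependency_edges(checkout_trace)):
    print(f"{caller} -> {callee}")
```

The `validate_cart` span produces no edge because it stays inside the `checkout` service; only cross-service hops become graph edges.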

3. OpenTelemetry has become the industry standard for observability instrumentation. What problem does it solve that motivated its creation?

A. OpenTelemetry provides a visual dashboard interface that replaces proprietary monitoring UIs from vendors like Datadog and New Relic
B. OpenTelemetry solves vendor lock-in by providing vendor-neutral APIs, SDKs, and data formats, so instrumented applications can export telemetry to any observability backend without code changes
C. OpenTelemetry provides automatic anomaly detection that identifies unusual patterns in telemetry data without manual threshold configuration
D. OpenTelemetry encrypts all telemetry data in transit and at rest, solving the security problem of sensitive operational data being transmitted in plaintext

The correct answer is B. Before OpenTelemetry, organizations faced vendor lock-in: instrumenting an application for Datadog meant using Datadog-specific SDKs, and switching to a different backend required re-instrumenting the entire application. OpenTelemetry defines a standard set of APIs and SDKs for generating traces, metrics, and logs, plus a standard protocol (OTLP) for transmitting them. The OpenTelemetry Collector can receive this data and export it to any backend that accepts OTLP or has a Collector exporter. Organizations can change observability vendors by changing exporter configuration rather than application code, and a single instrumented application can send data to multiple backends simultaneously.

Concept Tested: OpenTelemetry
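The decoupling OpenTelemetry provides can be illustrated with a plain-Python sketch of the pattern. The `SpanExporter`, `InMemoryBackend`, and `Collector` names are hypothetical and are not the OpenTelemetry API; they only show application code depending on a neutral interface while a routing layer fans spans out to any number of backends.

```python
from typing import Protocol


class SpanExporter(Protocol):
    """Hypothetical vendor-neutral exporter interface (not the OpenTelemetry API)."""

    def export(self, span: dict) -> None: ...


class InMemoryBackend:
    """Stand-in for one observability vendor's backend; it just stores spans."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[dict] = []

    def export(self, span: dict) -> None:
        self.received.append(span)


class Collector:
    """Routing layer: forwards every span to all configured exporters."""

    def __init__(self, exporters: list) -> None:
        self.exporters = exporters

    def export(self, span: dict) -> None:
        for exporter in self.exporters:
            exporter.export(span)


def handle_request(exporter: SpanExporter) -> None:
    """Application code: knows only the neutral interface, never a vendor."""
    exporter.export({"name": "GET /orders", "service.name": "orders"})


vendor_a = InMemoryBackend("vendor-a")
vendor_b = InMemoryBackend("vendor-b")
# Fan out to two backends at once; handle_request itself never changes.
handle_request(Collector([vendor_a, vendor_b]))
print(vendor_a.name, len(vendor_a.received), vendor_b.name, len(vendor_b.received))
```

Swapping or adding a vendor changes only the list passed to `Collector`; `handle_request` stays untouched, which is the property the answer describes.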

4. eBPF (Extended Berkeley Packet Filter) enables automated dependency discovery without requiring changes to application code. Which mechanism makes this possible?

A. eBPF installs a lightweight proxy sidecar container alongside each application that intercepts all network traffic before forwarding it to the destination
B. eBPF uses machine learning to infer application dependencies from historical network traffic patterns stored in monitoring databases
C. eBPF runs sandboxed programs in Linux kernel space that attach to kernel events, allowing observation of actual network connections and system calls as they happen without modifying applications
D. eBPF reads application configuration files and environment variables to discover which services are configured as dependencies in deployment manifests

The correct answer is C. eBPF's power comes from operating in kernel space, through which every network connection and system call must pass regardless of which application makes it. An eBPF program attached to the appropriate kernel hooks can observe connection establishment, socket operations, and DNS traffic without any cooperation from the application. This means even legacy applications with no observability instrumentation, third-party vendor agents, and background processes are visible. The discovered connections represent actual communication patterns, not documented or intended ones.

Concept Tested: eBPF / Extended Berkeley Packet Filter / Automated Discovery
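Real eBPF programs attach to kernel hooks and are typically loaded through tooling such as bcc or libbpf, which needs a Linux kernel and elevated privileges, so the sketch below covers only the user-space half. It assumes a hypothetical stream of connect events (process name, destination IP, destination port) and a hypothetical IP-to-service lookup, and aggregates them into observed dependency edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectEvent:
    """One outbound TCP connect observed in the kernel (hypothetical record shape)."""
    pid: int
    comm: str   # process name as the kernel reports it
    daddr: str  # destination IP address
    dport: int  # destination port


# Hypothetical IP-to-workload lookup, e.g. built from a cloud or Kubernetes inventory.
IP_TO_SERVICE = {"10.0.1.5": "orders-db", "10.0.2.7": "billing-api"}


def observed_dependencies(events, ip_to_service):
    """Aggregate raw connect events into (process, destination) edges with counts."""
    edges = {}
    for ev in events:
        # Unknown destinations stay visible as ip:port instead of being dropped.
        target = ip_to_service.get(ev.daddr, f"{ev.daddr}:{ev.dport}")
        key = (ev.comm, target)
        edges[key] = edges.get(key, 0) + 1
    return edges


connect_events = [
    ConnectEvent(4211, "legacy-report", "10.0.1.5", 5432),
    ConnectEvent(4211, "legacy-report", "10.0.1.5", 5432),
    ConnectEvent(5120, "cron-export", "203.0.113.9", 443),
]

for (source, target), count in observed_dependencies(connect_events, IP_TO_SERVICE).items():
    print(f"{source} -> {target} ({count} connections)")
```

Neither process needed instrumentation: the uninstrumented `legacy-report` job and the `cron-export` call to an unmapped external address both surface purely from observed connections.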

5. Automated discovery combines multiple complementary techniques rather than relying on a single method. Why is this multi-source approach essential for comprehensive graph accuracy?

A. Regulatory compliance frameworks require at least three independent discovery methods before infrastructure data can be used for compliance reporting
B. Different discovery methods have different blind spots (network scanning finds devices but misses application-layer dependencies, while traces reveal service calls but may miss infrastructure relationships), so combining sources achieves coverage no single method can
C. Multiple discovery sources generate redundant data that graph databases require for consistency validation before accepting new nodes and edges
D. Using multiple sources increases discovery throughput by distributing scanning tasks across agents, reducing the time required to complete a full inventory

The correct answer is B. Each discovery technique has inherent coverage gaps. Network SNMP/LLDP scanning discovers physical devices and their connections but knows nothing about which application processes run on them or how those processes depend on each other. OpenTelemetry traces reveal service-to-service calls but only for instrumented applications. eBPF discovers actual network connections but may not know the business purpose of each connection. Cloud APIs provide authoritative resource inventories but only for their own platform. Combining these sources produces a graph where each method's blind spots are covered by others, achieving the comprehensive accuracy that no single method can provide.

Concept Tested: Auto-Discovery / Automated Discovery
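A minimal sketch of multi-source merging, with hypothetical edge sets standing in for each discovery method: edges are unioned while recording which sources observed each one, and `unique_contribution` counts the edges that would disappear if a given source were removed.

```python
from collections import defaultdict

# Hypothetical edge sets reported by each discovery source.
SOURCES = {
    "lldp": {("switch-1", "host-a"), ("switch-1", "host-b")},
    "otel_traces": {("checkout", "payments")},
    "ebpf": {("checkout", "payments"), ("legacy-batch", "orders-db")},
    "cloud_api": {("orders-db", "vpc-prod")},
}


def merge_sources(sources):
    """Union all edges, remembering which sources observed each one."""
    provenance = defaultdict(set)
    for source, edges in sources.items():
        for edge in edges:
            provenance[edge].add(source)
    return dict(provenance)


def unique_contribution(provenance):
    """Count edges seen by exactly one source: what would be lost without it."""
    counts = defaultdict(int)
    for seen_by in provenance.values():
        if len(seen_by) == 1:
            counts[next(iter(seen_by))] += 1
    return dict(counts)


merged = merge_sources(SOURCES)
print(len(merged), "edges in the merged graph")
print(unique_contribution(merged))
```

In this toy data the LLDP, eBPF, and cloud-API sources each contribute edges no other source saw, while the trace-derived edge is corroborated by eBPF: the overlap-plus-gap pattern the answer describes.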

6. Network topology and service topology provide complementary but distinct views of IT infrastructure. Which scenario illustrates a case where service topology is critical but network topology alone is insufficient?

A. Determining which physical switches to replace during a data center hardware refresh project
B. Identifying which microservices will be impacted when an API gateway is updated to a new version
C. Planning cable routes for new server installations in a colocation facility
D. Verifying that redundant network paths exist between primary and backup data centers

The correct answer is B. API gateway version updates affect the microservices that call through that gateway—a software dependency relationship. Network topology shows physical connections between devices but cannot reveal which microservices route API calls through a specific API gateway instance, what authentication protocols they use, or what behavior changes in the new version might break. Service topology—discovered through distributed traces and service mesh telemetry—maps these application-layer dependencies that determine the blast radius of software changes. The other options (cable routing, hardware refresh, redundant paths) are inherently physical/network-layer questions where network topology is the appropriate view.

Concept Tested: Network Topology / Service Topology
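Blast-radius analysis on a service topology amounts to a reverse traversal: starting from the changed component, follow caller edges backward to find everything that depends on it. The sketch below uses a hypothetical set of caller-to-callee edges of the kind distributed traces would produce.

```python
from collections import deque

# Hypothetical service-topology edges (caller, callee) derived from traces.
CALLS = {
    ("web-frontend", "api-gateway"),
    ("mobile-bff", "api-gateway"),
    ("partner-sync", "api-gateway"),
    ("api-gateway", "orders"),
    ("reporting", "orders"),
}


def dependents(target, calls):
    """Return every service that reaches `target` through one or more calls."""
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, set()).add(caller)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers_of.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(target)  # a cycle could lead back to the target itself
    return seen


print(sorted(dependents("api-gateway", CALLS)))
```

`reporting` calls `orders` directly, so it falls outside the gateway's blast radius even though it shares a downstream dependency; network topology alone could not draw that line.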

7. Dynamic topology is contrasted with static dependency maps in the context of cloud-native infrastructure. What specific capability makes dynamic topology essential for Kubernetes environments?

A. Dynamic topology supports three-dimensional visualization of pod relationships that static maps cannot render in two dimensions
B. Kubernetes pod instances are created and destroyed constantly by autoscaling and rolling deployments, causing topology to change so quickly that static maps become stale within minutes; dynamic topology uses Kubernetes watch APIs to reflect these changes in near real-time
C. Dynamic topology encrypts dependency information using rotating keys that static map formats cannot support
D. Kubernetes requires dynamic topology because its network policy engine will reject static map configurations that do not match real-time pod state

The correct answer is B. Kubernetes environments are characterized by rapid, continuous change: autoscalers add and remove pod replicas based on load, rolling deployments replace old pod versions with new ones, and failed pods are automatically replaced. A static dependency map created at 9 AM may be significantly wrong by 9:15 AM. Dynamic topology addresses this by subscribing to the Kubernetes watch API, which streams notifications of pod creation, deletion, and state changes as they happen. The graph is updated automatically, typically within seconds of each change, so the topology stays accurate even in highly dynamic environments.

Concept Tested: Dynamic Topology
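The core of dynamic topology is applying a stream of change events to the graph instead of rebuilding it periodically. The event types `ADDED`, `MODIFIED`, and `DELETED` match those of the Kubernetes watch API, but the event dictionaries below are simplified, hypothetical stand-ins for pod objects. With the official Python client, events would instead come from `kubernetes.watch.Watch().stream(...)` and carry `V1Pod` objects rather than plain dicts.

```python
def apply_watch_event(topology: dict, event: dict) -> None:
    """Update an in-memory (namespace, pod) -> placement map from one watch event."""
    pod = event["object"]
    key = (pod["namespace"], pod["name"])
    if event["type"] == "DELETED":
        topology.pop(key, None)
    elif event["type"] in ("ADDED", "MODIFIED"):
        topology[key] = {"app": pod["app"], "node": pod.get("node")}
    # Other event types are ignored in this sketch.


def pods_for_app(topology: dict, app: str) -> list:
    """Current pod names backing an application, sorted for stable output."""
    return sorted(name for (_ns, name), info in topology.items() if info["app"] == app)


# Hypothetical simplified events; a rolling update replaces one old replica.
events = [
    {"type": "ADDED", "object": {"namespace": "shop", "name": "cart-7d9f-abc", "app": "cart", "node": "node-1"}},
    {"type": "ADDED", "object": {"namespace": "shop", "name": "cart-7d9f-def", "app": "cart", "node": "node-2"}},
    {"type": "ADDED", "object": {"namespace": "shop", "name": "cart-5c4b-ghi", "app": "cart", "node": "node-3"}},
    {"type": "DELETED", "object": {"namespace": "shop", "name": "cart-7d9f-abc", "app": "cart", "node": "node-1"}},
]

topology = {}
for ev in events:
    apply_watch_event(topology, ev)

print(pods_for_app(topology, "cart"))
```

Because each event is applied incrementally, the map reflects the deleted replica immediately, with no periodic full rescan.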

8. Configuration drift is described as particularly dangerous in IT environments. Which characteristic of configuration drift makes it more hazardous than other types of IT problems?

A. Configuration drift causes immediate, noticeable performance degradation that typically triggers monitoring alerts within seconds of the first deviation
B. Configuration drift creates security vulnerabilities, compliance violations, and operational inconsistencies that are invisible until an audit, security breach, or incident exposes them, allowing undetected risk to accumulate over time
C. Configuration drift is irreversible once it occurs, because the original baseline configuration cannot be restored without a complete system rebuild
D. Configuration drift only affects cloud infrastructure because on-premises systems use hardware-enforced configuration locking that prevents unauthorized changes

The correct answer is B. Drift's primary danger is its invisibility. A server missing a security patch, a firewall rule that was temporarily opened and never closed, or an application configured to point to a test database instead of production may operate normally for months—passing monitoring checks, serving requests, and appearing healthy—while creating a widening security or compliance exposure. The problem is discovered only when an attacker exploits the gap, when a compliance audit reveals the deviation, or when the misconfiguration triggers an incident under specific conditions. This delayed discovery means organizations are often operating with significantly more risk than they realize.

Concept Tested: Configuration Drift / Drift Detection

9. A drift detection system compares actual server configurations against baseline definitions stored in infrastructure-as-code (Terraform templates). A deviation is found: a production server has an additional open port (8443) not present in the Terraform template. How does IT management graph context enhance the response to this alert?

A. The graph identifies which Terraform module contains the port definition, enabling engineers to update the template to match the drifted configuration and close the alert
B. The graph traversal reveals which business services are exposed through this server, enabling risk prioritization: a port deviation on a server backing a revenue-critical business service demands a faster response than one on a low-priority internal tool
C. The graph generates a network diagram showing the physical path from the internet to the drifted port, enabling network engineers to trace the exact cable route for remediation
D. The graph calculates the age of the Terraform template, enabling engineers to determine whether the deviation predates the template or was introduced after it was created

The correct answer is B. Drift detection identifies that a deviation exists; IT management graph context determines how urgently it must be remediated. Not all servers carry equal business risk—a misconfigured port on a server backing a healthcare patient records application that touches 50,000 patient records and supports a Tier-1 HIPAA-covered business service is an emergency. The same deviation on an internal development tool used by three engineers may be resolved in the next sprint. Graph traversal from the drifted server upward through applications and business services, combining topology with business impact properties, provides the risk context that converts a raw alert into an appropriately prioritized action.

Concept Tested: Drift Detection / Configuration Drift
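A minimal sketch combining both steps, with every name and value hypothetical: a set difference finds ports open on a server but missing from the baseline, and an upward traversal from server to application to business service assigns the alert the most critical tier it touches. The default tier of 4 for servers that map to no business service is an arbitrary assumption.

```python
# Hypothetical baseline (ports declared in the Terraform configuration) vs. observed state.
BASELINE_PORTS = {"web-prod-03": {443}, "dev-tools-01": {22}}
OBSERVED_PORTS = {"web-prod-03": {443, 8443}, "dev-tools-01": {22, 8443}}

# Hypothetical "supports" edges pointing upward: server -> application -> business service.
SUPPORTS = {
    "web-prod-03": ["patient-portal"],
    "patient-portal": ["patient-records-service"],
    "dev-tools-01": ["build-dashboard"],
    "build-dashboard": ["internal-dev-tooling"],
}
# Hypothetical business-impact property on business-service nodes (1 = most critical).
TIER = {"patient-records-service": 1, "internal-dev-tooling": 3}
UNMAPPED_TIER = 4  # assumption: servers that support no known business service


def port_drift(server: str) -> set:
    """Ports open on the server that the baseline does not declare."""
    return OBSERVED_PORTS.get(server, set()) - BASELINE_PORTS.get(server, set())


def business_services(node: str) -> set:
    """Walk upward from a node and collect every tiered business service it supports."""
    found, stack, seen = set(), [node], {node}
    while stack:
        current = stack.pop()
        if current in TIER:
            found.add(current)
        for parent in SUPPORTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return found


def priority(server: str):
    """Most critical tier among affected business services, or None if no drift."""
    if not port_drift(server):
        return None
    tiers = [TIER[s] for s in business_services(server)]
    return min(tiers) if tiers else UNMAPPED_TIER


for server in sorted(OBSERVED_PORTS):
    print(server, sorted(port_drift(server)), "priority", priority(server))
```

Both servers show the same drifted port 8443, but the graph context ranks the one backing the patient-records service as the emergency.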

10. An organization achieves 98% graph accuracy for its IT management graph using automated discovery, compared to 40-50% accuracy with its previous manually maintained CMDB. Which combination of automated discovery techniques most directly explains this accuracy improvement?

A. Hiring more experienced CMDB administrators who apply stricter data entry validation and conduct more frequent manual audits of configuration item records
B. Switching from a relational database to a native graph database, which enforces referential integrity constraints that prevent inaccurate relationship records from being stored
C. Continuous observation of actual infrastructure behavior through eBPF network monitoring, OpenTelemetry traces, and cloud platform APIs, replacing documentation of intended configuration with evidence of actual current state
D. Implementing JSON Schema validation on all incoming CMDB records, which rejects entries that do not conform to the defined schema before they can be stored

The correct answer is C. The accuracy gap between manual and automated approaches stems from their fundamental data sources. Manual CMDBs document intended or remembered configuration—how someone thinks the system is configured, often weeks or months after the actual configuration was deployed. Automated discovery observes actual behavior: eBPF captures real network connections happening right now, OpenTelemetry traces record actual service calls, and cloud APIs return authoritative resource states. When a developer deploys a new microservice that immediately begins calling an undocumented database, automated discovery detects and records the relationship within seconds. No human intervention is needed, no ticket must be filed, and the graph reflects reality rather than documentation.

Concept Tested: Automated Discovery / Auto-Discovery
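The gap between documented and observed state can be sketched as a reconciliation of two edge sets. The `agreement` ratio is one possible accuracy measure chosen for illustration; the section does not define how the 98% and 40-50% figures were computed. An edge absent from the observed set may simply be rarely used during the observation window, so "stale" edges are candidates for review rather than automatic deletion.

```python
def reconcile(documented: set, observed: set) -> dict:
    """Split edges into confirmed, undocumented, and possibly stale sets."""
    return {
        "confirmed": documented & observed,
        "undocumented": observed - documented,  # real traffic the CMDB never recorded
        "stale": documented - observed,         # recorded but not seen during observation
    }


def agreement(documented: set, observed: set) -> float:
    """One possible accuracy measure (an assumption): shared edges / all edges."""
    union = documented | observed
    return len(documented & observed) / len(union) if union else 1.0


# Hypothetical edges: what the CMDB says vs. what discovery observed.
cmdb_edges = {("checkout", "payments"), ("checkout", "legacy-db")}
observed_edges = {("checkout", "payments"), ("checkout", "orders-db")}

result = reconcile(cmdb_edges, observed_edges)
print(result["undocumented"], result["stale"], round(agreement(cmdb_edges, observed_edges), 2))
```

The `("checkout", "orders-db")` edge is the situation the answer describes: a service calling a database nobody documented, surfaced by observation rather than by a ticket.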