SysFlow: Scalable System Telemetry for Improved Security Analytics
Abstract
In this talk, we introduce SysFlow, a new data representation for introspecting system behavior in support of scalable security, compliance, and performance analytics. SysFlow is a compact, open data format that lifts the representation of system activities into a flow-centric, object-relational mapping. This mapping records how applications interact with their environment, much as NetFlow summarizes network communications. Unlike NetFlow, however, which captures only network interactions, SysFlow links network behavior to process and file access information, providing richer context for analysis. This additional context enables deeper introspection into attack kill chains, yielding analyses with fewer false positives and higher detection rates than traditional network-based approaches. SysFlow supports both single-event and volumetric (aggregated) flow representations of process control flows, file interactions, and network communications. Compared with existing system telemetry sources, the format drastically reduces storage requirements, thereby enabling feature-rich analytics, process-level provenance tracking, and long-term data archival for threat hunting and forensics. We present a new open-source telemetry pipeline built atop SysFlow. The pipeline provides a set of reusable components and APIs that ease the deployment of telemetry probes for bare-host and container workload monitoring. They also ease the export of SysFlow records to S3-compatible object stores that feed distributed security analytics jobs based on Apache Spark. The analytics framework includes an extensible policy engine that ingests customizable security policies written in a declarative input language. In this language, practitioners define higher-order logic expressions that are checked against SysFlow records.
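To make the difference between single-event and volumetric representations concrete, here is a minimal Python sketch. It assumes a hypothetical, much-simplified record layout: the classes `NetEvent` and `NetFlow` and all of their fields are illustrative only, not the actual SysFlow schema. The sketch folds individual network operations into one flow record per (process, local endpoint, remote endpoint) tuple. This is in the spirit of how NetFlow summarizes packets into flows, while also keeping the process identity that NetFlow lacks.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical single-event record: one network operation by one process.
# Field names are illustrative assumptions, not the real SysFlow schema.
@dataclass(frozen=True)
class NetEvent:
    ts: int       # event timestamp (ns)
    pid: int      # process that performed the operation
    exe: str      # executable path of that process
    src: str      # "ip:port" of the local endpoint
    dst: str      # "ip:port" of the remote endpoint
    op: str       # "send" or "recv"; other operations are ignored here
    nbytes: int   # payload size of this operation

# Hypothetical volumetric flow record: many events folded into one summary.
@dataclass
class NetFlow:
    pid: int
    exe: str
    src: str
    dst: str
    start_ts: int
    end_ts: int
    num_send: int = 0
    num_recv: int = 0
    bytes_sent: int = 0
    bytes_recv: int = 0

FlowKey = Tuple[int, str, str]

def aggregate(events: Iterable[NetEvent]) -> List[NetFlow]:
    """Fold single events into one flow per (pid, src, dst) tuple."""
    flows: Dict[FlowKey, NetFlow] = {}
    for ev in events:
        key = (ev.pid, ev.src, ev.dst)
        flow = flows.get(key)
        if flow is None:
            flow = NetFlow(ev.pid, ev.exe, ev.src, ev.dst, ev.ts, ev.ts)
            flows[key] = flow
        # Widen the flow's time window to cover this event.
        flow.start_ts = min(flow.start_ts, ev.ts)
        flow.end_ts = max(flow.end_ts, ev.ts)
        # Accumulate per-direction operation and byte counters.
        if ev.op == "send":
            flow.num_send += 1
            flow.bytes_sent += ev.nbytes
        elif ev.op == "recv":
            flow.num_recv += 1
            flow.bytes_recv += ev.nbytes
    return list(flows.values())

if __name__ == "__main__":
    conn = ("10.0.0.5:51000", "203.0.113.7:443")
    events = [
        NetEvent(1, 42, "/usr/bin/curl", *conn, "send", 517),
        NetEvent(2, 42, "/usr/bin/curl", *conn, "recv", 4096),
        NetEvent(3, 42, "/usr/bin/curl", *conn, "recv", 1200),
    ]
    for flow in aggregate(events):
        print(flow)
```

Three single-event records collapse into one flow record here. This is the kind of summarization that lets a flow-centric format store far less than raw per-event telemetry, while still attributing traffic to a specific process.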
This allows practitioners to define security and compliance policies and deploy them on a scalable, out-of-the-box analysis toolchain. Extensible programmatic APIs remain available for implementing custom analytics algorithms. As a result, the pipeline lets researchers and analysts focus on developing and sharing analytics rather than on building telemetry support infrastructure. The SysFlow probe is optimized for minimal performance overhead and requires neither program instrumentation nor system call interposition for data collection, so it has negligible impact on monitored workloads. The implementation has been validated under multiple stress test profiles. We will demonstrate use cases including the identification of advanced persistent threats, security vulnerabilities, performance bottlenecks, and service outages.
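The abstract does not show the policy engine's declarative language. The following Python sketch is only a rough illustration of what "higher-order logic expressions checked against records" can mean. The helpers `field_eq`, `field_in`, `field_startswith`, `all_of`, `any_of`, `negate`, and `check` are hypothetical and are not the SysFlow policy engine API. The dictionary keys in the records (`event`, `parent.exe`, `proc.exe`, `proc.cmdline`) are likewise assumptions. Each helper returns a predicate, and the combinators take predicates and return new ones.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]

# Hypothetical predicate constructors; each returns a function over records.
def field_eq(field: str, value: Any) -> Predicate:
    return lambda r: r.get(field) == value

def field_in(field: str, values: Iterable[Any]) -> Predicate:
    allowed = frozenset(values)
    return lambda r: r.get(field) in allowed

def field_startswith(field: str, prefix: str) -> Predicate:
    return lambda r: str(r.get(field, "")).startswith(prefix)

# Higher-order combinators: they take predicates and return a new predicate.
def all_of(*preds: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in preds)

def negate(pred: Predicate) -> Predicate:
    return lambda r: not pred(r)

def check(rule: Predicate, records: Iterable[Record]) -> List[Record]:
    """Return the records that match the rule (i.e., policy hits)."""
    return [r for r in records if rule(r)]

# Example rule: a shell spawned by a web server, a common sign of remote
# code execution, excluding a (hypothetical) allowlisted health-check script.
WEB_SERVERS = ["/usr/sbin/nginx", "/usr/sbin/apache2"]
SHELLS = ["/bin/sh", "/bin/bash"]

shell_from_webserver = all_of(
    field_eq("event", "exec"),
    field_in("parent.exe", WEB_SERVERS),
    field_in("proc.exe", SHELLS),
    negate(field_startswith("proc.cmdline", "/bin/sh /opt/healthcheck")),
)
```

Because each rule is an ordinary function built from small, separately testable parts, new conditions can be added without touching the evaluation loop. This loosely mirrors the split the abstract describes between declarative policies and programmatic APIs for custom analytics.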