The series_pearson_correlation function calculates the Pearson correlation coefficient between two numeric dynamic arrays (series). The coefficient measures the linear relationship between the two series and ranges from -1 to 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no linear correlation.

Use series_pearson_correlation when you need to measure the strength and direction of the linear relationship between two time-series datasets. This is particularly useful for identifying related metrics, validating hypotheses about system behavior, and finding candidate leading indicators of performance issues. Correlation alone does not establish causation, so treat a strong coefficient as a prompt for further investigation rather than as proof of a causal link.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
In Splunk SPL, you would typically need to export data and use external statistical tools to calculate correlation. In APL, series_pearson_correlation provides built-in correlation analysis for array data.
... | stats list(metric1) as m1, list(metric2) as m2 by group
... (manual correlation calculation or external tool)
In SQL, correlation functions exist but typically operate on row-based data. In APL, series_pearson_correlation works directly on array columns, making time-series correlation analysis more straightforward.
SELECT CORR(metric1, metric2) AS correlation
FROM measurements
GROUP BY group_id;

Usage

Syntax

series_pearson_correlation(series1, series2)

Parameters

Parameter  Type     Description
series1    dynamic  A dynamic array of numeric values.
series2    dynamic  A dynamic array of numeric values.

Returns

A numeric value between -1 and 1 representing the Pearson correlation coefficient:
  • 1: Perfect positive linear correlation
  • 0: No linear correlation
  • -1: Perfect negative linear correlation
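
To make the return value concrete, the following Python sketch computes the standard Pearson formula (the sum of products of deviations divided by the square root of the product of the summed squared deviations). It illustrates the math only and is not Axiom's implementation. The helper name pearson is hypothetical, and raising an error for unequal lengths or constant series is an assumption of this sketch, not documented behavior of series_pearson_correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    # Assumption of this sketch: reject inputs the formula cannot handle.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


rising = [1, 2, 3, 4, 5]
print(pearson(rising, [2, 4, 6, 8, 10]))  # series move together: close to 1
print(pearson(rising, [10, 8, 6, 4, 2]))  # series move oppositely: close to -1
```

Because the deviations are divided by their own spread, the result does not depend on the units or scale of either series, only on how closely they move together.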

Use case examples

Log analysis

In log analysis, you can use series_pearson_correlation to identify relationships between request durations across different geographic regions, which helps you understand whether performance issues are correlated.

Query

['sample-http-logs']
| extend city1 = iff(['geo.city'] == 'Tokyo', req_duration_ms, 0)
| extend city2 = iff(['geo.city'] == 'Nagasaki', req_duration_ms, 0)
| summarize tokyo_times = make_list(city1), nagasaki_times = make_list(city2)
| extend correlation = series_pearson_correlation(tokyo_times, nagasaki_times)
| project correlation

Output

correlation
0.87
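
This query calculates the correlation between the request-duration series built for Tokyo and Nagasaki, with the aim of revealing whether performance issues in one region tend to coincide with issues in the other. Note that each row contributes 0 to the list of the city it doesn't match, so the two lists are aligned by row rather than by time. For a time-aligned comparison, aggregate each city's durations into the same time bins before building the lists.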
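
In the query above, iff writes 0 into a city's column for every row that belongs to another city. The Python sketch below uses made-up durations to show, under that assumption, how zero-filled, row-aligned lists can produce a very different coefficient than lists aligned by interval. The data, the helper name pearson, and the per-interval pairing are all hypothetical and only illustrate the effect.

```python
from statistics import fmean, pstdev


def pearson(xs, ys):
    # Population covariance divided by the product of population standard deviations.
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical log rows: (city, request duration in ms), two rows per interval.
rows = [
    ("Tokyo", 100), ("Nagasaki", 110),
    ("Tokyo", 200), ("Nagasaki", 210),
    ("Tokyo", 300), ("Nagasaki", 320),
]

# Row-aligned lists with 0 for the non-matching city, as iff(..., 0) produces.
tokyo_zero_filled = [d if c == "Tokyo" else 0 for c, d in rows]
nagasaki_zero_filled = [d if c == "Nagasaki" else 0 for c, d in rows]

# Interval-aligned lists: one value per city per interval.
tokyo_by_interval = [d for c, d in rows if c == "Tokyo"]
nagasaki_by_interval = [d for c, d in rows if c == "Nagasaki"]

print(pearson(tokyo_zero_filled, nagasaki_zero_filled))  # negative: zeros dominate
print(pearson(tokyo_by_interval, nagasaki_by_interval))  # strongly positive
```

In the zero-filled lists, one series is always 0 when the other is non-zero, which pushes the coefficient negative regardless of how the underlying durations relate.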

List of related functions

  • series_magnitude: Calculates the magnitude of a series. Use when you need vector length instead of correlation.
  • series_stats: Returns summary statistics for a single series. Use when you need values such as the mean and variance of each series separately.
  • series_subtract: Performs element-wise subtraction. Often used to compute deviations before correlation analysis.
  • series_multiply: Performs element-wise multiplication. Use for weighted combinations instead of correlation.