![](https://crypto4nerd.com/wp-content/uploads/2024/04/1vwPAkCkxlDX5TuIo5XVkdw.png)
In the ever-expanding field of data science, the extraction of meaningful patterns from time series data is a pivotal task, particularly in sectors reliant on temporal analytics, such as healthcare, finance, and environmental monitoring. This research work simplifies the intricacies of time series data into digestible timelines, setting the stage for the extraction of logical rules through the Temporal APRIORI algorithm.
Historically, algorithms designed to extract rules from time series data were often heavily dependent on domain-specific knowledge and tailor-made solutions. Breaking away from these constraints, the team’s approach adopts a domain-independent temporal abstraction method. This innovation enables the transformation of multivariate time series into timelines, which can subsequently be mined for logical temporal rules using an adaptation of the renowned APRIORI algorithm, a classic technique in machine learning known for its efficacy in uncovering association rules in large datasets.
Temporal APRIORI is an extension of the classic APRIORI algorithm, which is widely used for mining frequent itemsets and association rules in transactional databases. The classic APRIORI targets datasets where the transaction order does not matter; it is concerned with the co-occurrence of items within transactions, not with the sequence or timing of those transactions. Temporal APRIORI, on the other hand, is designed to handle data where the timing and order of events are significant: namely, time series data.
Here’s a detailed look at how Temporal APRIORI works:
Step 1: Data Preparation
Time series data is prepared by creating a timeline, which involves abstracting raw data into intervals or segments with certain labels that describe the state or the change in state of a variable over time. This is done through statistical analysis and the application of domain-specific thresholds.
Imagine we have a multivariate time series data from a group of patients with heart disease. The dataset includes daily recordings of several biometrics such as heart rate, blood pressure, cholesterol levels, and body weight.
Using temporal abstraction, we transform the daily recordings into a timeline. Each biometric is abstracted into intervals with labels such as “high,” “normal,” or “low,” based on the statistical measures for each day:
- An interval with a high average heart rate is labeled HighHR.
- An interval with a normal blood pressure reading is labeled NormalBP.
- An interval where cholesterol levels are elevated is labeled HighChol.
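As a rough sketch of this abstraction step, the snippet below labels daily readings relative to the series mean plus or minus one standard deviation and merges consecutive days with the same label into intervals. The `abstract_series` helper, the one-sigma thresholds, and the sample heart-rate values are illustrative assumptions, not the paper's exact procedure:

```python
from statistics import mean, stdev

def abstract_series(values, name):
    """Label each reading relative to the series mean +/- one standard
    deviation, then merge consecutive days with the same label into
    (label, start_day, end_day) intervals."""
    mu, sigma = mean(values), stdev(values)
    labels = []
    for v in values:
        if v > mu + sigma:
            labels.append(f"High{name}")
        elif v < mu - sigma:
            labels.append(f"Low{name}")
        else:
            labels.append(f"Normal{name}")
    # merge consecutive equal labels into intervals
    intervals = []
    for day, label in enumerate(labels):
        if intervals and intervals[-1][0] == label:
            intervals[-1] = (label, intervals[-1][1], day)
        else:
            intervals.append((label, day, day))
    return intervals

hr = [72, 75, 74, 110, 115, 112, 73, 70]   # hypothetical daily heart rates
print(abstract_series(hr, "HR"))
# -> [('NormalHR', 0, 2), ('HighHR', 3, 5), ('NormalHR', 6, 7)]
```

The merge step is what turns point-wise labels into the interval-based timeline that the later mining steps operate on.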
The temporal relations between these intervals can be described using Allen's interval relations, as shown in the figure below:
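For reference, the seven basic Allen relations (the remaining six are their inverses) can be sketched as a small classifier over `(start, end)` pairs; the `allen_relation` helper and the tuple encoding of intervals are illustrative assumptions:

```python
def allen_relation(a, b):
    """Return the Allen interval relation of interval a relative to
    interval b. Intervals are (start, end) pairs with start < end.
    Covers the seven basic relations; the other six are inverses."""
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s1 == s2 and e1 < e2:
        return "starts"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 > s2 and e1 == e2:
        return "finishes"
    if s1 == s2 and e1 == e2:
        return "equals"
    return "inverse relation"  # b relates to a by one of the above

print(allen_relation((0, 3), (3, 6)))   # -> meets
print(allen_relation((1, 4), (2, 6)))   # -> overlaps
```

For example, a `HighHR` interval that ends exactly when a `HighChol` interval begins would stand in the "meets" relation.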
Step 2: Discovering Frequent Patterns
Temporal APRIORI then searches for patterns across these timelines that occur more frequently than a user-defined threshold, known as the support. These patterns consist of ordered sets of interval-based events. For example, in a stock market dataset, a frequent pattern might be that a dip in one stock is often followed by a rise in another within a certain time frame.
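A minimal sketch of this support-based counting, restricted to ordered pairs of events (a level-2 APRIORI-style pass) for brevity; the `frequent_pairs` helper and the stock-event labels are assumptions made for illustration:

```python
from itertools import combinations

def frequent_pairs(timelines, min_support):
    """Count ordered label pairs (a occurs before b) across timelines and
    keep those whose support (fraction of timelines containing the
    pattern) meets the user-defined threshold."""
    counts = {}
    for timeline in timelines:
        seen = set()  # count each pattern once per timeline
        for i, j in combinations(range(len(timeline)), 2):
            seen.add((timeline[i], timeline[j]))
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(timelines)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

timelines = [
    ["DipA", "RiseB", "RiseC"],
    ["DipA", "RiseB"],
    ["RiseC", "DipA", "RiseB"],
]
print(frequent_pairs(timelines, min_support=1.0))
# -> {('DipA', 'RiseB'): 1.0}
```

The full algorithm extends this level-wise: frequent patterns of length k are combined into candidates of length k+1, pruning any candidate with an infrequent sub-pattern.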
Step 3: Rule Generation
Once these frequent patterns are identified, Temporal APRIORI generates association rules. These rules predict the occurrence of an event based on the presence of another event and the temporal relationship between them. A rule might look like this:
“If stock A decreases and within two days stock B has not increased, then stock C will increase within the next three days.”
Step 4: Rule Evaluation
Each generated rule is evaluated based on two criteria:
Confidence: This measures the conditional probability that the consequent (outcome) of a rule will occur given the antecedent (condition).
Support: This measures how often a rule is applicable in the dataset as a whole.
Step 5: Rule Pruning
Rules that don’t meet the minimum confidence or support thresholds are discarded. This pruning step is crucial for reducing the number of potential rules to a subset of the most predictive and significant ones.
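Steps 3 through 5 can be sketched together: generate candidate rules from ordered event pairs, score each by support and confidence, and prune the weak ones. The `generate_rules` helper, the thresholds, and the toy stock events are assumptions for illustration, not the paper's implementation:

```python
def generate_rules(timelines, min_support=0.5, min_confidence=0.6):
    """From ordered pairs (a before b) found in the timelines, build rules
    a -> b, score them by support and confidence, and prune weak ones."""
    n = len(timelines)
    pair_count, item_count = {}, {}
    for tl in timelines:
        for item in set(tl):
            item_count[item] = item_count.get(item, 0) + 1
        seen = set()
        for i in range(len(tl)):
            for j in range(i + 1, len(tl)):
                seen.add((tl[i], tl[j]))
        for p in seen:
            pair_count[p] = pair_count.get(p, 0) + 1
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n                   # rule applicability overall
        confidence = c / item_count[a]    # P(b after a | a occurred)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

timelines = [["DipA", "RiseC"], ["DipA", "RiseC"], ["DipA", "Flat"], ["RiseC"]]
for a, b, s, conf in generate_rules(timelines):
    print(f"{a} -> {b}  support={s:.2f} confidence={conf:.2f}")
# -> DipA -> RiseC  support=0.50 confidence=0.67
```

Here `DipA -> Flat` is pruned because its support (0.25) falls below the threshold, even though it was generated as a candidate.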
Algorithm Process:
1. Initialization: A set T is initialized to hold the abstracted timelines.
2. Iteration over series: The algorithm loops over each time series Fi(t), for i from 1 to n, where n is the number of series to be abstracted.
3. Abstraction function: Each time series Fi(t) is passed through an abstraction function Abs, which takes the series Fi(t), the degree of derivative z, the number of labels l, and the displacement k.
4. Creation of abstracted timeline: The Abs function processes the series and produces an abstracted timeline Ti, which is added to the set T.
5. Output: The algorithm returns the set T, which contains the abstracted timelines for all the time series data.
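The loop above can be sketched as follows. The `abs_fn` body is a deliberately simplified stand-in (point-wise labeling against μ ± kσ, with z = 0 meaning raw values and the derivative case omitted), not the paper's actual Abs:

```python
from statistics import mean, stdev

def abs_fn(F, z, l, k):
    """Simplified Abs: label each point of F relative to mu +/- k*sigma.
    Assumes l = 3 labels; z (derivative degree) is unused here, i.e.
    z = 0 means the raw series is abstracted directly."""
    mu, sigma = mean(F), stdev(F)
    return ["high" if v > mu + k * sigma
            else "low" if v < mu - k * sigma
            else "medium"
            for v in F]

def abstract_all(series_list, z=0, l=3, k=0.5):
    """The algorithm process above: apply Abs to each series F_i and
    collect the abstracted timelines T_i into T."""
    T = []                          # holds the abstracted timelines
    for F_i in series_list:         # iterate over the n series
        T_i = abs_fn(F_i, z, l, k)  # abstract this series
        T.append(T_i)
    return T

T = abstract_all([[1, 2, 3, 10], [5, 5, 5, 5]])
print(T[0])  # -> ['low', 'medium', 'medium', 'high']
```

A constant series falls entirely in the "medium" band, since its standard deviation is zero.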
Abstraction Function Details:
The function Abs creates an abstracted representation of each time series. It does this by applying a statistical abstraction based on the mean μ and the standard deviation σ, parameterized by the number of labels l and the displacement k. For each interval within the time series, it assigns a label based on where that interval's mean μ_xy falls relative to the overall mean μ_j and the standard deviation σ_j of the j-th variable.
Example:
Imagine a time series that captures the daily temperature readings of a region over a month. Suppose we choose z = 0 (the raw data, no derivative), l = 3 labels (low, medium, high), and a displacement k of 0.5. The algorithm would proceed as follows:
- Calculate the mean and standard deviation for the entire series.
- Segment the series into intervals (perhaps days or weeks).
- Calculate the mean for each interval.
- Assign a label to each interval based on how its calculated mean μ_xy compares to the series' overall mean μ and standard deviation σ, according to the labeling conditions.
- The output could be a sequence such as “low-medium-high-low”, representing a simplified view of the temperature changes over the month.
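The temperature example above can be sketched directly. The `label_intervals` helper, the weekly segment length, and the synthetic temperature values are assumptions for illustration:

```python
from statistics import mean, stdev

def label_intervals(series, k=0.5, seg_len=7):
    """Segment the series into weekly intervals, compute each interval's
    mean mu_xy, and label it low/medium/high by comparing to the overall
    mean mu and standard deviation sigma with displacement k."""
    mu, sigma = mean(series), stdev(series)
    labels = []
    for start in range(0, len(series), seg_len):
        mu_xy = mean(series[start:start + seg_len])
        if mu_xy > mu + k * sigma:
            labels.append("high")
        elif mu_xy < mu - k * sigma:
            labels.append("low")
        else:
            labels.append("medium")
    return "-".join(labels)

# four synthetic weeks of daily temperatures: cool, mild, hot, cool
temps = [10] * 7 + [15] * 7 + [25] * 7 + [11] * 7
print(label_intervals(temps))  # -> low-medium-high-low
```

The output matches the simplified "low-medium-high-low" view of the month described above.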
- Sciavicco, G., Stan, I.E. and Vaccari, A., 2019, May. Towards a general method for logical rule extraction from time series. In International Work-Conference on the Interplay Between Natural and Artificial Computation (pp. 3–12). Cham: Springer International Publishing.