Introduction

What is tspy?

tspy is the Python wrapper to the Apache Spark-powered time-series library.

The library is provided as part of IBM Analytics Engine and can be used via IBM Watson Studio.

To load the library, we run:

import tspy

However, we may also need to load other modules.

What is a time series: STS and MTS?

A time series is a sequence of data values measured at successive, though not necessarily regular, points in time. A time point is called a timestamp, and its combination with the associated sequence of data values is called an observation.

We can have one or many data columns at each time point.

\[ \begin{align}\begin{aligned}\begin{split}\begin{array}{cc} timestamp & data1 \\ t1 & o1_1 \\ t2 & o1_2 \\ t3 & o1_3 \\ ... \\ tn & o1_n \\ \end{array}\end{split}\\\begin{split}\begin{array}{ccc} timestamp & data1 & data2 \\ t1 & o1_1 & o2_1 \\ t2 & o1_2 & o2_2 \\ t3 & o1_3 & o2_3 \\ ... \\ tn & o1_n & o2_n \\ \end{array}\end{split}\end{aligned}\end{align} \]

In general, you time-series data can be organized as a single time-series (STS) object, as given above, or a multiple time-series (MTS) object, as given below. In MTS, each time-series belongs to a particular grouping value

\[\begin{split}\begin{array}{cccc} group & timestamp & data1 & data2 \\ g1 & t1 & o1_1 & o2_1 \\ g1 & t2 & o1_2 & o2_2 \\ g1 & t3 & o1_3 & o2_3 \\ g1 & ... \\ g1 & tn & o1_n & o2_n \\ g2 & t'1 & o1'_1 & o2'_1 \\ g2 & t'2 & o1'_2 & o2'_2 \\ g2 & t'3 & o1'_3 & o2'_3 \\ g2 & ... \\ g2 & t'n & o1'_n & o2'_n \\ \end{array}\end{split}\]

What is a timestamp in tspy?

A timestamp (or time-tick) can be, but must not necessarily be, associated with a time reference system (TRS), which defines the granularity of each timetick and the start time.

The time is stored as a long type. In the simplest scenario, a timestamp is just an integer value, and can be inferred from the index of the data

import tspy
values = [1.0, 2.0, 4.0]
x = tspy.time_series(values)
x
TimeStamp: 0     Value: 1.0
TimeStamp: 1     Value: 2.0
TimeStamp: 2     Value: 4.0

To make the long value human-readable, it needs to be mapped to a time-reference-system (TRS).

TRS is a local, regional or global system used to identify time. A time reference system defines a specific projection for forward and reverse mapping between timestamp and its numeric representation. A common example that most of us are familiar with is UTC time, which maps a timestamp (Jan 1, 2019 12am midnight GMT) into a 64-bit integer value (1546300800000) that captures the number of milliseconds that have elapsed since Jan 1, 1970 12am (midnight) GMT. Generally speaking, the timestamp value is better suited for human readability, while the numeric representation is better suited for machine processing.

What is an observation in tspy?

An observation is a combination of a timestamp and a value which can be any, e.g. numeric value, categorical value, or an array of numeric/categorical values.

In Python, a numeric value is represented in the computer using one of the following types:

  • built-in: int, float

  • np.ndarray, pd.dataframe: int32, int64, float64

In tspy, an observation is of type Observation. However, you don’t create it directly. Instead, an observation is created using observation() API.

import tspy
# simple timestamp ~ an int
x = tspy.observation(1, 1.0)

What is an observation collection in tspy?

An observation collection is a sequence of observation, with certain properties. It is described in class ObservationCollection. However, we don’t create it directly from the class. Instead, an observation collection is created using tspy.observations().

import tspy
observations = tspy.observations(
   tspy.observation(1, 1.0),
   tspy.observation(2, 2.0),
   tspy.observation(3, 3.0),
   tspy.observation(4, 4.0)
)

Another option is to use the single-time-series builder tspy.builder() to create a single-time-series object, from which we can extract the observation collections using result() API.

import tspy
ts_builder = tspy.ts_builder()
ts_builder.add(tspy.observation(1,1))
ts_builder.add(tspy.observation(2,2))
ts_builder.add(tspy.observation(1,3))
observations = ts_builder.result()

What is a segment in tspy?

A segment is an observation collection with:

  • extra information: start time and end time [the start/end time needs not equal to the first/last timestamp]

  • observations are sorted in order.

It is represented by Segment class. Generally, we don’t create a segment directly and we don’t store an individual segment separately. Instead, a segment can be created by segmenting or windowing a TimeSeries object or MultiTimeSeries object, which returns a new type such as SegmentTimeSeries and SegmentMultiTimeSeries.

# .segment(window_size, offset)
seg_ts = ts.segment(3,2)
  • segment by silence:

# N/A
seg_ts = ts.segment_by(lambda x: x % 2 == 0)

seg_ts = ts.segment_by_time(3, 3)

seg_ts = ts.segment_by_anchor(lambda d: d%2 == 0, 1, 1)

What is a (single) time-series (STS) in tspy?

It is represented by TimeSeries class. To create a time-series, however, we use through the builder which accepts data in different forms. Eventually, the data is converted to an internal representation for STS and MTS.

import tspy
values = [1.0, 2.0, 4.0]
x = tspy.builder.time_series(values)
x

The example belows shows how to create a simple STS where each index denotes a day after the start time of 1990-01-01 (TRS).

import tspy
import datetime
granularity = datetime.timedelta(days=1)
start_time = datetime.datetime(1990, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone.utc)
x = tspy.time_series([1, 2, 3], granularity=granularity, start_time=start_time)

REF: builders.time_series.time_series()

What is a multi-time-series (MTS) in tspy?

It is represented by MultiTimeSeries class. To create a time-series, however, we use through the builder tspy.builders.multi_time_series.

tspy accepts data in different forms which can be converted to an internal representation for STS and MTS.

data = np.array([['', 'letters', 'timestamp', "numbers"],
      ['', "a", 1, 27],
      ['', "b", 3, 4],
      ['', "a", 5, 17],
      ['', "a", 3, 7],
      ['', "b", 2, 45]
     ])
df = pd.DataFrame(data=data[1:, 1:],
       columns=data[0, 1:]).astype(dtype={'letters': 'object', 'timestamp': 'int64', 'numbers': 'float64'})
x =  tspy.multi_time_series(df, ts_column='timestamp')

REF: builders.multi_time_series.multi_time_series()

What is a segment time-series (SegTS) in tspy?

A segment time-series, represented by either SegmentTimeSeries or SegmentMultiTimeSeries class, is a special form of time-series, as a result of segmenting the STS/MTS object.