tspy package

Entry point to time-series data-structure creation functions

Use

  • time-series related data builders, e.g.

    function                    description
    tspy.observation()          create one observation
    tspy.observations()         create one observation collection
    tspy.record()               create one record
    tspy.time_series()          create one time-series
    tspy.multi_time_series()    create one multi-time-series
    tspy.builder()              create a time-series builder

  • tspy.stream_time_series module : APIs for connecting stream data into time-series form

  • tspy.stream_multi_time_series module : APIs for connecting stream data into multi-time-series form

  • tspy.functions module : APIs for performing different time-series related operations, e.g. reduce, transforms, segment

  • tspy.models module : APIs for loading/creating a time-series-based model

  • tspy.forecasters module : APIs for different forecasting models

  • tspy.exceptions module :

  • tspy.ml module : APIs for different machine-learning methods

tspy.builder()

create a time-series builder

Returns
TSBuilder

a new time-series builder

tspy.multi_time_series(*args, **kwargs)

creates a multi-time-series object from data in one of the following forms:

  1. dict

  2. dataframe

  3. time-series-reader

  4. observation collection

Parameters
data : dict or pandas.DataFrame

the input data. When data is a pandas DataFrame, there are two use-cases:

  1. group rows by the values of a given column [e.g. a temperature time-series for each of several locations (keys)],

  2. turn each column into its own time-series [a single timestamp and multiple metrics, e.g. temperature and humidity columns]

key_column : string

(use only when data is a pandas DataFrame, use-case (1)) name of the column containing the key; rows sharing a key value are grouped into a single time-series. IMPORTANT: key_column and key_columns are mutually exclusive.

key_columns : list, optional

(use only when data is a pandas DataFrame, use-case (2)) columns to use in multi-time-series creation (default is all columns), i.e. each column is turned into its own time-series component. IMPORTANT: key_column and key_columns are mutually exclusive.

ts_column : string, optional

(use only when data is a pandas DataFrame) name of the column containing time-ticks (default: the time-tick is the row's index in the DataFrame)

value_column : list or string, optional

(use only when data is a pandas DataFrame, use-case (1)) name(s) of the column(s) containing values (default is all columns)

granularity : datetime.timedelta, optional

the granularity used by the time-series time-reference system (TRS) (default is None if no start_time is given, otherwise 1ms)

start_time : datetime, optional

the starting date-time of the time-series (default is None if no granularity is given, otherwise 1970-01-01 UTC)
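
The interplay between the granularity and start_time defaults can be read as a small decision rule. The sketch below is plain Python, not tspy code: `resolve_trs` is a hypothetical helper that only restates the defaults documented above (no time-reference system when neither argument is given, 1ms and 1970-01-01 UTC as the respective fallbacks).

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MILLISECOND = datetime.timedelta(milliseconds=1)


def resolve_trs(granularity=None, start_time=None):
    """Hypothetical helper: apply the documented defaults.

    Returns None when no time-reference system is requested, otherwise a
    (granularity, start_time) pair with any missing value filled in.
    """
    if granularity is None and start_time is None:
        return None  # neither given: plain integer time-ticks
    if granularity is None:
        granularity = ONE_MILLISECOND  # start_time given without granularity
    if start_time is None:
        start_time = EPOCH  # granularity given without start_time
    return granularity, start_time


no_trs = resolve_trs()
only_start = resolve_trs(start_time=datetime.datetime(1990, 7, 6))
only_granularity = resolve_trs(granularity=datetime.timedelta(days=1))
```

Under this reading, passing either argument alone is enough to switch the time-series from integer time-ticks to date-time timestamps.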

Returns
MultiTimeSeries

a new multi-time-series

Raises
ValueError

if the input arguments are invalid, e.g. an unsupported data type

Examples

create a dict with observation-collection values

>>> import tspy
>>> my_dict = {"ts1": tspy.time_series([1,2,3]).collect(), "ts2": tspy.time_series([4,5,6]).collect()}
>>> my_dict
{'ts1': [(0,1),(1,2),(2,3)], 'ts2': [(0,4),(1,5),(2,6)]}

create a multi-time-series from dict without a time-reference-system

>>> mts = tspy.multi_time_series(my_dict)
>>> mts
ts2 time series
------------------------------
TimeStamp: 0     Value: 4
TimeStamp: 1     Value: 5
TimeStamp: 2     Value: 6
ts1 time series
------------------------------
TimeStamp: 0     Value: 1
TimeStamp: 1     Value: 2
TimeStamp: 2     Value: 3

  • create a simple df with a single index

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([['', 'letters', 'timestamp', "numbers"],
...                  ['', "a", 1, 27],
...                  ['', "b", 3, 4],
...                  ['', "a", 5, 17],
...                  ['', "a", 3, 7],
...                  ['', "b", 2, 45]
...                 ])
>>> df = pd.DataFrame(data=data[1:, 1:],
...                   columns=data[0, 1:]).astype(dtype={'letters': 'object', 'timestamp': 'int64', 'numbers': 'float64'})
>>> df
  letters  timestamp  numbers
0       a          1     27.0
1       b          3      4.0
2       a          5     17.0
3       a          3      7.0
4       b          2     45.0

create a multi-time-series from a df using instants format

>>> mts = tspy.multi_time_series(df, ts_column='timestamp')
>>> mts
numbers time series
------------------------------
TimeStamp: 1     Value: 27.0
TimeStamp: 2     Value: 45.0
TimeStamp: 3     Value: 4.0
TimeStamp: 3     Value: 7.0
TimeStamp: 5     Value: 17.0
letters time series
------------------------------
TimeStamp: 1     Value: a
TimeStamp: 2     Value: b
TimeStamp: 3     Value: b
TimeStamp: 3     Value: a
TimeStamp: 5     Value: a

create a multi-time-series from a df using observations format where the key is letters

>>> mts = tspy.multi_time_series(df, key_column="letters", ts_column='timestamp')
>>> mts
a time series
------------------------------
TimeStamp: 1     Value: {numbers=27.0}
TimeStamp: 3     Value: {numbers=7.0}
TimeStamp: 5     Value: {numbers=17.0}
b time series
------------------------------
TimeStamp: 2     Value: {numbers=45.0}
TimeStamp: 3     Value: {numbers=4.0}
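
The two DataFrame use-cases can be pictured with plain Python. The sketch below is not tspy code: `rows`, `group_by_key` and `split_by_column` are hypothetical names, and the rows mirror the example DataFrame above. It only illustrates how the same rows lead to the two outputs shown: one series per key value (use-case 1) versus one series per column (use-case 2).

```python
from collections import defaultdict

# Rows mirroring the example DataFrame: (letters, timestamp, numbers).
rows = [
    ("a", 1, 27.0),
    ("b", 3, 4.0),
    ("a", 5, 17.0),
    ("a", 3, 7.0),
    ("b", 2, 45.0),
]


def group_by_key(rows):
    """Use-case (1): one series per key value, ordered by timestamp."""
    series = defaultdict(list)
    for letters, timestamp, numbers in rows:
        series[letters].append((timestamp, {"numbers": numbers}))
    return {key: sorted(obs, key=lambda o: o[0]) for key, obs in series.items()}


def split_by_column(rows):
    """Use-case (2): one series per column, ordered by timestamp."""
    # sorted() is stable, so rows sharing a timestamp keep their input order.
    ordered = sorted(rows, key=lambda r: r[1])
    return {
        "letters": [(ts, letters) for letters, ts, _ in ordered],
        "numbers": [(ts, numbers) for _, ts, numbers in ordered],
    }


by_key = group_by_key(rows)
by_column = split_by_column(rows)
```

Here `by_key` has the entries "a" and "b", matching the key_column="letters" output, while `by_column` has the entries "letters" and "numbers", matching the output obtained with only ts_column set.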

tspy.observation(time_tick, value)

create an observation

Parameters
time_tick : int

the observation's time-tick

value : any

the observation's value

Returns
Observation

a new observation

tspy.observations(*varargs)

create an observation collection

Parameters
observations : varargs

either empty or a variable number of observations

Returns
ObservationCollection

a new observation-collection
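
As a mental model, an observation is a (time-tick, value) pair and an observation collection is a sequence of such pairs. The sketch below is a stand-in written in plain Python, not the tspy classes: the `Observation` tuple and the `observations` function are hypothetical, and the ordering by time-tick is an assumption based on the printed outputs elsewhere in this section.

```python
from typing import Any, NamedTuple


class Observation(NamedTuple):
    """Hypothetical stand-in for a tspy observation: a time-tick plus a value."""
    time_tick: int
    value: Any


def observations(*varargs: Observation) -> list:
    """Collect zero or more observations, ordered by time-tick.

    The ordering is an assumption of this sketch; sorted() is stable, so
    observations sharing a time-tick keep their input order.
    """
    return sorted(varargs, key=lambda obs: obs.time_tick)


empty = observations()
collection = observations(
    Observation(3, "c"),
    Observation(1, "a"),
    Observation(3, "d"),
)
```

Calling the function with no arguments yields an empty collection, which corresponds to the "either empty" case in the parameter description.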

tspy.record(**kwargs)

create a record type (similar to dict)

Parameters
kwargs : named args

key/value arguments

Returns
record

a dict-like structure that is handled for high performance in time-series

tspy.time_series(*args, **kwargs)

creates a single-time-series object from data in one of the following forms:

  1. list

  2. dataframe

  3. time-series-reader

  4. observation collection

Parameters
data : list or pandas.DataFrame or TimeSeriesReader or ObservationCollection

ts_func : func, optional (default None)

(use only when data is a list) if given, the function used to combine duplicate time-ticks (default is to not combine them)

granularity : datetime.timedelta, optional

the granularity used by the time-series time-reference system (TRS) (default is None if no start_time is given, otherwise 1ms)

start_time : datetime, optional

the starting date-time of the time-series (default is None if no granularity is given, otherwise 1970-01-01 UTC)

ts_column : string, optional

(use only when data is a pandas DataFrame) name of the column from which timestamps are read (default: timestamps are based on the record index)

value_column : string or list, optional

(use only when data is a pandas DataFrame) name(s) of the column(s) from which values are read (default: the value is built from all columns)

Returns
TimeSeries

a new time-series

Examples

  • create a simple pandas dataframe

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([['', 'key', 'timestamp', "value"],
...                  ['', "a", 1, 27],
...                  ['', "b", 3, 4],
...                  ['', "a", 5, 17],
...                  ['', "a", 3, 7],
...                  ['', "b", 2, 45]
...                 ])
>>> df = pd.DataFrame(data=data[1:, 1:],
...                   index=data[1:, 0],
...                   columns=data[0, 1:]).astype(dtype={'key': 'object', 'timestamp': 'int64', 'value': 'float64'})
>>> df
key  timestamp  value
  a          1   27.0
  b          3    4.0
  a          5   17.0
  a          3    7.0
  b          2   45.0

create a time-series from a dataframe specifying a timestamp and value column

>>> ts = tspy.time_series(df, ts_column="timestamp", value_column="value")
>>> ts
TimeStamp: 1     Value: 27.0
TimeStamp: 2     Value: 45.0
TimeStamp: 3     Value: 4.0
TimeStamp: 3     Value: 7.0
TimeStamp: 5     Value: 17.0

create a time-series from a dataframe specifying only a timestamp column; all other columns are used and stored together as a single dictionary value

>>> ts = tspy.time_series(df, ts_column="timestamp")
>>> ts
TimeStamp: 1     Value: {value=27.0, key=a}
TimeStamp: 2     Value: {value=45.0, key=b}
TimeStamp: 3     Value: {value=4.0, key=b}
TimeStamp: 3     Value: {value=7.0, key=a}
TimeStamp: 5     Value: {value=17.0, key=a}

create a time-series from a dataframe specifying no timestamp or value column

>>> ts = tspy.time_series(df)
>>> ts
TimeStamp: 0     Value: {value=27.0, key=a, timestamp=1}
TimeStamp: 1     Value: {value=4.0, key=b, timestamp=3}
TimeStamp: 2     Value: {value=17.0, key=a, timestamp=5}
TimeStamp: 3     Value: {value=7.0, key=a, timestamp=3}
TimeStamp: 4     Value: {value=45.0, key=b, timestamp=2}

create a time-series from a dataframe specifying a timestamp column and using a time-reference-system

>>> import datetime
>>> start_time = datetime.datetime(1990, 7, 6)
>>> granularity = datetime.timedelta(weeks=1)
>>> ts = tspy.time_series(df, ts_column="timestamp", granularity=granularity, start_time=start_time)
>>> ts
TimeStamp: 1990-07-13T00:00Z     Value: {value=27.0, key=a}
TimeStamp: 1990-07-20T00:00Z     Value: {value=45.0, key=b}
TimeStamp: 1990-07-27T00:00Z     Value: {value=4.0, key=b}
TimeStamp: 1990-07-27T00:00Z     Value: {value=7.0, key=a}
TimeStamp: 1990-08-10T00:00Z     Value: {value=17.0, key=a}
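
The timestamps printed above are consistent with a simple linear mapping from integer time-ticks to date-times, start_time + tick * granularity. The sketch below checks that reading with the standard library only. `tick_to_datetime` is a hypothetical helper, not a tspy function, and time zones are ignored for brevity.

```python
import datetime


def tick_to_datetime(tick, start_time, granularity):
    """Map an integer time-tick to a date-time: start_time + tick * granularity."""
    return start_time + tick * granularity


start_time = datetime.datetime(1990, 7, 6)
granularity = datetime.timedelta(weeks=1)

# Time-ticks taken from the "timestamp" column of the example DataFrame, sorted.
ticks = [1, 2, 3, 3, 5]
dates = [tick_to_datetime(t, start_time, granularity).date().isoformat() for t in ticks]
```

With a one-week granularity, tick 1 lands on 1990-07-13 and tick 5 on 1990-08-10, matching the first and last rows of the output.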

  • create a time-series from a list of values

>>> ts = tspy.time_series([0, 1])
>>> ts
TimeStamp: 0     Value: 0
TimeStamp: 1     Value: 1

create a time-series from a list of values with a time-reference system

>>> import datetime
>>> granularity = datetime.timedelta(days=1)
>>> start_time = datetime.datetime(1990,7,6)
>>> ts = tspy.time_series([0, 1], granularity=granularity, start_time=start_time)
>>> ts
TimeStamp: 1990-07-06T00:00Z     Value: 0
TimeStamp: 1990-07-07T00:00Z     Value: 1

  • create a collection of observations

>>> import tspy
>>> observations = tspy.builder().add(tspy.observation(0,0)).add(tspy.observation(1,1)).result()
>>> observations
[(0,0),(1,1)]

create a time-series from observations

>>> ts = tspy.time_series(observations)
>>> ts
TimeStamp: 0     Value: 0
TimeStamp: 1     Value: 1

create a time-series from observations with a time-reference system

>>> import datetime
>>> granularity = datetime.timedelta(days=1)
>>> start_time = datetime.datetime(1990,7,6)
>>> ts = tspy.time_series(observations, granularity=granularity, start_time=start_time)
>>> ts
TimeStamp: 1990-07-06T00:00Z     Value: 0
TimeStamp: 1990-07-07T00:00Z     Value: 1