ML package

Submodules

ML.Stream_Learn module

class ML.Stream_Learn.Stream_Learn(data_train, data_out, train_func, predict_func, min_window_size, max_window_size, step_size, num_features, filter_func=None, all_func=None)

Stream framework for machine learning.

This class supports machine learning on streaming data using PSTREAMS. Given data for training and for prediction, along with functions to train and predict, this class outputs a Stream of predictions. Both batch and continual learning are supported.

Parameters:

data_train : Stream or numpy.ndarray or other

An object containing the data to train on. If data_train is a Stream, it contains tuples of values, where each tuple represents a row of data and must have at least num_features values. The Stream can also contain non-tuples, provided filter_func is used to extract tuples in the correct format. If data_train is a numpy array, the array must have at least num_features columns. Any values or columns beyond the first num_features correspond to the output y data. If data_train is neither a Stream nor a numpy array, the data is not split into x and y.
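The x/y split described above can be sketched with NumPy slicing. `split_rows` is a hypothetical helper written only to illustrate the documented convention (first num_features columns are x, the rest are y); it is not part of the ML package.

```python
import numpy as np

def split_rows(data, num_features):
    """Hypothetical helper: first num_features columns are x, the rest are y."""
    data = np.asarray(data)
    x = data[:, :num_features]
    y = data[:, num_features:]  # has 0 columns for unsupervised data
    return x, y

# Three rows, two features, one output column.
data = np.array([[1.0, 2.0, 5.0],
                 [3.0, 4.0, 11.0],
                 [5.0, 6.0, 17.0]])
x, y = split_rows(data, num_features=2)
print(x.shape, y.shape)  # (3, 2) (3, 1)

# The same convention applied to a single Stream row (a tuple).
row = (1.0, 2.0, 5.0)
x_row, y_row = row[:2], row[2:]  # (1.0, 2.0) and (5.0,)
```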

data_out : Stream

A Stream object containing data to generate predictions on. The Stream contains tuples of values where each tuple represents a row of data and must have at least num_features values.

train_func : function

A function that trains a model. It takes x data, y data, a model object, and a window_state tuple, and returns a trained model object.

In the case of data_train as a Stream, this function has the signature (numpy.ndarray numpy.ndarray Object tuple) -> (Object). The first parameter x has dimensions i x num_features, where min_window_size <= i <= max_window_size. The second parameter y has dimensions i x num_outputs, where num_outputs is the number of y outputs per input. For example, num_outputs is 1 for a single scalar output, and 0 for unsupervised learning.

In the case of data_train as a numpy array, this function has the signature (numpy.ndarray numpy.ndarray Object tuple) -> (Object). The first parameter x has dimensions N x num_features, where N is the total number of training examples. The second parameter y has dimensions N x num_outputs, with num_outputs as defined above.

If data_train is neither a Stream nor a numpy array, the function has the signature (Object None Object tuple) -> (Object), and the first parameter is data_train itself.

The third parameter is a model object defined by this function. The fourth parameter is a window_state tuple with the values (current_window_size, steady_state, reset, step_size, max_window_size), where current_window_size is the number of points in the window, steady_state is a boolean indicating whether the window has reached max_window_size, and reset is a boolean that can be set to True to reset the window.
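As an illustration, here is a minimal train_func sketch, assuming the four-argument form (x, y, model, window_state) described above. The function name `train_linear` and the choice of ordinary least squares with an intercept are hypothetical; any model type can be returned.

```python
import numpy as np

def train_linear(x, y, model, window_state):
    """Hypothetical train_func: ordinary least squares with an intercept.

    x: (i, num_features) array; y: (i, num_outputs) array.
    model: previous model object (ignored here; the fit is redone from scratch).
    window_state: (current_window_size, steady_state, reset, step_size,
    max_window_size); accepted for compatibility but unused by this model.
    """
    # Append a column of ones so the last weight row is the intercept.
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(x_aug, y, rcond=None)
    return weights  # shape (num_features + 1, num_outputs)

# Fit y = 2x + 1 on a window of four points.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * x + 1
w = train_linear(x, y, None, (4, False, False, 1, 10))
```

Refitting from scratch on every window keeps the sketch simple; an incremental learner could instead update the model object it receives.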

predict_func : function

A function that takes two tuples, which together correspond to one row of data, and a model, and returns the prediction output. This function has the signature (tuple tuple Object) -> (Object). The first tuple x has num_features values and the second tuple y has num_outputs values, where num_outputs is the number of y outputs per input. For unsupervised learning, y is empty.
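A matching predict_func can be sketched as follows, assuming the model is a weight matrix with a trailing intercept row (as a least-squares train_func might return). The name `predict_linear` is hypothetical.

```python
import numpy as np

def predict_linear(x, y, model):
    """Hypothetical predict_func for a linear model with intercept.

    x: tuple of num_features values; y: tuple of num_outputs values
    (unused here, and empty for unsupervised data);
    model: weights of shape (num_features + 1, num_outputs).
    """
    x_aug = np.append(np.asarray(x, dtype=float), 1.0)  # add intercept term
    return x_aug @ model  # one prediction per output

weights = np.array([[2.0], [1.0]])  # y = 2x + 1
print(predict_linear((4.0,), (), weights))  # [9.]
```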

min_window_size : int

An int specifying the minimum size of the window to train on for continual learning. This will be ignored for batch learning.

max_window_size : int

An int specifying the maximum size of the window to train on for continual learning. This will be ignored for batch learning.

step_size : int

An int specifying the number of tuples to move the window by for continual learning. This will be ignored for batch learning.
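The interaction of min_window_size, max_window_size, and step_size can be pictured with a small simulation. This is a sketch under an assumed policy, not the framework's code: the window starts at min_window_size points, grows by step_size until it reaches max_window_size (steady state), and then slides forward by step_size. `window_schedule` is a hypothetical name.

```python
def window_schedule(num_points, min_window_size, max_window_size, step_size):
    """Hypothetical list of (start, end) index ranges, end exclusive.

    Assumed policy: grow from min_window_size by step_size until the window
    holds max_window_size points, then slide forward by step_size.
    """
    windows = []
    end = min_window_size
    while end <= num_points:
        size = min(end, max_window_size)  # window never exceeds the maximum
        windows.append((end - size, end))
        end += step_size
    return windows

print(window_schedule(8, 2, 4, 2))  # [(0, 2), (0, 4), (2, 6), (4, 8)]
```

Every window in the schedule has between min_window_size and max_window_size points, which matches the bound on i given for train_func.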

num_features : int

An int that describes the number of features in the data.

filter_func : function, optional

A function that filters data for training. It takes x data, y data, and a model object, and returns a tuple of the form (boolean, tuple). The boolean indicates whether the data is to be trained on (True) or is an outlier (False). The second value is the row of data as a tuple in the correct format, as described for data_train. If data_train is a Stream that contains tuples, this function has the signature (tuple tuple Object) -> (tuple). The first tuple x has num_features values and the second tuple y has num_outputs values, where num_outputs is the number of y outputs per input. The third parameter is a model defined by train_func. If data_train is a Stream that does not contain tuples, this function has the signature (Object None Object) -> (tuple), where the first parameter has the same type as the values in data_train.
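For a Stream of tuples, a filter_func might reject rows whose residual under the current model is too large. The sketch below is hypothetical: it assumes a linear weight matrix with a trailing intercept row, an arbitrary threshold of 3.0, and that the model is None before the first training step (in which case every row is accepted).

```python
import numpy as np

def reject_outliers(x, y, model, threshold=3.0):
    """Hypothetical filter_func returning (keep, row).

    keep is False when the largest absolute residual of a linear model
    (weights with a trailing intercept row) exceeds threshold.
    """
    row = tuple(x) + tuple(y)
    if model is None:  # assumption: no model yet, so accept every row
        return (True, row)
    prediction = np.append(np.asarray(x, dtype=float), 1.0) @ model
    residual = np.abs(np.asarray(y, dtype=float) - prediction).max()
    return (bool(residual <= threshold), row)

weights = np.array([[2.0], [1.0]])  # y = 2x + 1
print(reject_outliers((1.0,), (3.0,), weights))   # (True, (1.0, 3.0))
print(reject_outliers((1.0,), (10.0,), weights))  # (False, (1.0, 10.0))
```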

all_func : function, optional

A function that processes the data for purposes such as visualization. It takes x data, y data, a model object, a state object, and a window_state tuple, and returns an updated state object. This function has the signature (numpy.ndarray numpy.ndarray Object Object tuple) -> (Object). The first numpy array x has dimensions i x num_features, where min_window_size <= i <= max_window_size. The second numpy array y has dimensions i x num_outputs, where num_outputs is the number of y outputs per input. The third parameter is the model object defined by train_func. The fourth parameter is a state object defined by this function. The fifth parameter is a window_state tuple with the values described for train_func.
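An all_func that logs training error per window, for later plotting, could look like the sketch below. The name `track_error`, the use of mean squared error, and the assumption that the state starts as None are all hypothetical choices for illustration.

```python
import numpy as np

def track_error(x, y, model, state, window_state):
    """Hypothetical all_func: records (window size, mean squared error).

    Assumes state is None on the first call; a list is created then and
    returned as the updated state on every call.
    """
    history = [] if state is None else state
    current_window_size = window_state[0]
    # Linear model with a trailing intercept row in the weight matrix.
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])
    mse = float(np.mean((x_aug @ model - y) ** 2))
    history.append((current_window_size, mse))
    return history

x = np.array([[0.0], [1.0]])
y = 2 * x + 1
model = np.array([[2.0], [1.0]])
state = track_error(x, y, model, None, (2, False, False, 1, 10))
state = track_error(x, y + 1.0, model, state, (2, False, False, 1, 10))
print(state)  # [(2, 0.0), (2, 1.0)]
```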

Methods

reset() Resets the training window to min_window_size.
run() Runs the framework and returns a Stream of outputs.
reset()

Resets the training window to min_window_size.

This method resets the training window to min_window_size. After resetting, the window contains the last min_window_size points of the training Stream. For example, if max_window_size is 100, min_window_size is 2, and the window contains points 1 through 100, then after resetting the window contains points 99 and 100.

Notes

If reset() is called before the window has reached max_window_size, the window will continue increasing in size until it reaches max_window_size. Then, the window will reset to min_window_size.
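The deferred behaviour described in the Notes can be modelled with a bounded deque. `WindowModel` is a hypothetical stand-in, not the framework's implementation; it assumes that a reset requested while the window is already full takes effect immediately, and otherwise waits until the window reaches max_window_size, then keeps the last min_window_size points.

```python
from collections import deque

class WindowModel:
    """Hypothetical model of reset(): a deferred cut to min_window_size."""

    def __init__(self, min_window_size, max_window_size):
        self.min_window_size = min_window_size
        self.max_window_size = max_window_size
        self.window = deque(maxlen=max_window_size)
        self.reset_pending = False

    def reset(self):
        self.reset_pending = True
        self._maybe_reset()  # applies at once if the window is already full

    def add(self, point):
        self.window.append(point)
        self._maybe_reset()

    def _maybe_reset(self):
        # Only cut the window once it has grown to max_window_size.
        if self.reset_pending and len(self.window) == self.max_window_size:
            kept = list(self.window)[-self.min_window_size:]
            self.window = deque(kept, maxlen=self.max_window_size)
            self.reset_pending = False

w = WindowModel(min_window_size=2, max_window_size=5)
for p in (1, 2, 3):
    w.add(p)
w.reset()            # requested early: window is still [1, 2, 3]
w.add(4)             # keeps growing: [1, 2, 3, 4]
w.add(5)             # reaches 5 points, then resets
print(list(w.window))  # [4, 5]
```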

run()

Runs the framework and returns a Stream of outputs.

Returns:

y_predict : Stream

A Stream containing outputs as returned by predict_func.
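For batch learning, what run() produces can be approximated without PSTREAMS: train once on the whole array, then apply predict_func to each incoming row. In the sketch below a Python list stands in for the output Stream, and `run_batch`, `train_mean`, and `predict_mean` are hypothetical names. The window_state values passed in batch mode are an assumption, since the documentation does not specify them.

```python
import numpy as np

def run_batch(data_train, rows_out, train_func, predict_func, num_features):
    """Hypothetical batch-mode analogue of run(); returns a list, not a Stream."""
    data_train = np.asarray(data_train, dtype=float)
    x, y = data_train[:, :num_features], data_train[:, num_features:]
    n = len(data_train)
    window_state = (n, True, False, 0, n)  # assumed values for batch mode
    model = train_func(x, y, None, window_state)
    # Split each output row into its x and y tuples before predicting.
    return [predict_func(row[:num_features], row[num_features:], model)
            for row in rows_out]

def train_mean(x, y, model, window_state):
    return y.mean(axis=0)  # model is the per-output mean of y

def predict_mean(x, y, model):
    return model  # predicts the training mean for every row

outputs = run_batch([[1.0, 2.0], [3.0, 4.0]], [(5.0,), (6.0,)],
                    train_mean, predict_mean, num_features=1)
print(outputs)  # [array([3.]), array([3.])]
```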

Module contents