PlaneTree Tree classifier using (hyper)planes – UGen or language-side
// Server-side (for use in synths):
PlaneTree.kr(treebuf, in, gate)
// Language-side:
PlaneTree.classify(point, treedata)
Given a specially-formatted dataset representing a hyperplane-based binary tree classifier, this unit classifies incoming data points. It can handle any dimensionality of data (the dimensionality of the incoming data points needs to match that of the tree) and can branch to arbitrary depths including some branches being deeper than others.
- treebuf - a multichannel Buffer holding data in the special tree representation. The required data format is described near the bottom of this document.
- in - the input signal to be classified (typically a multichannel signal).
- gate - on or off. While gate>0, classification is performed; otherwise the previous value is held constant.
Example
This simple example splits 2D data into four classes:
(
// Three nodes in this data. First is always the root.
// So the first is applied to decide whether to branch down to the second or third.
// Then those nodes decide whether to classify the datum as 1/2/3/4.
~treedata = [
/* xoff,yoff, xvec,yvec, lisl, lidx, risl, ridx */
[0.5, 0.5, -0.7, 0.4, 0, 1, 0, 2],
[0.3, 0.7, 0.4, 0.7, 1, 1, 1, 2],
[0.5, 0.2, -0.8,-0.8, 1, 3, 1, 4]
];
)
// Now to demonstrate language-side classification, we'll iterate over a grid and classify:
(
(0, 0.1 .. 1).collect{|y|
(0, 0.1 .. 1).collect{|x|
PlaneTree.classify([x,y], ~treedata)
}
}.do(_.postln);""
)
// Server-side: run this then move the mouse left/right/up/down - classification should follow same pattern.
s.boot;
~treebuf = Buffer.loadCollection(s, ~treedata.flat, ~treedata[0].size);
(
x = {
PlaneTree.kr(~treebuf, [MouseX.kr(0,1), MouseY.kr(0,1)]).poll
}.play
)
x.free;
Data format
Each node in the tree is represented by a multichannel frame in a buffer (or in the language-side representation, by an array in an array). The first node must be the root node and is where classification begins. Each node consists of data defining the hyperplane used for splitting, and then flags indicating whether the split data should recurse further to a different node or whether a numeric classification result is to be returned. More explicitly, for a D-dimensional tree, each node consists of:
[
D numbers - the offset vector, subtracted from the incoming data point
D numbers - representing the normal vector, multiplied by the incoming data point (after subtraction) for classification
0 or 1 - if 1, the "left" split is a leaf and that branch terminates
1 number - if the "left" split is a leaf, the number to return; ELSE the frame index of the node to branch to
0 or 1 - if 1, the "right" split is a leaf and that branch terminates
1 number - if the "right" split is a leaf, the number to return; ELSE the frame index of the node to branch to
]
For dimensionality D the data must therefore have 2D + 4 channels.
Branching must always be "forwards" through the buffer (frame indexes for branching must always be greater than the current frame index) to avoid infinite loops.
Branching doesn't have to proceed to the same depth at each branch. Also the number returned is not checked, so although it will typically be a unique number for each leaf, unusually-shaped classes could be represented by more than one leaf returning the same id.
(c) 2009 Dan Stowell