This project serves both as an intellectual exercise to understand the very low-level details of deep learning and as a sandbox to test crazy ideas that might be harder to try in more mainstream toolkits! For practical purposes, you should probably look at more mainstream toolkits like TensorFlow or PyTorch.
This project started as a framework to implement Random Hinge Forest, which is detailed in this arXiv draft:
https://arxiv.org/abs/1802.03882
For benchmark experiments in this repository, Random Hinge Forest serves both as a standalone learning machine and as a non-linearity for consecutive layers. So you will not find a conventional activation function in this neural network toolkit (at least as of this revision, but it wouldn't be hard to add!).
NOTE: You can find a PyTorch port of RandomHingeForest here:
https://github.com/nslay/HingeTreeForTorch
Bleak has been developed and/or tested in the following environments
- Windows 10, Visual Studio 2017, OpenBLAS 0.3.6, CUDA 10.2, cuDNN 7.6.5, LMDB 0.9.70, ITK 4.13
- Uses Windows Subsystem for Linux for experiments.
- GeForce GTX 980
- FreeBSD 12.1-STABLE, clang-10.0.0, OpenBLAS 0.3.9, LMDB 0.9.70, ITK 4.13
- No GPU support on this Unix-like operating system. I don't have a spare computer to test on Linux!
To build bleak, you will need the following dependencies
- A C++14 compiler (GCC, Clang or Visual Studio 2017 or later)
- cmake 3.10 or later (ccmake recommended on Unix-like systems)
First clone this repository and its submodules
git clone https://github.com/nslay/bleak
cd bleak
git submodule init
git submodule update
Create a separate empty folder (call it build) and run:
mkdir build
cd build
ccmake /path/to/bleak
Press 'c' to configure, select desired build options and modules (press 'c' again for any changes) and then finally press 'g' to generate the Makefiles to build bleak.
NOTE: Bleak should build and run on Unix-like systems (I occasionally compile and run it on FreeBSD). That said, the experiment shell scripts were written for Windows Subsystem for Linux. So some script modification is likely needed to run experiments on actual Unix-like systems.
Run cmake-gui
and set the source code and build folders, for example C:/Work/Source/bleak and C:/Work/Build/bleak respectively.
Press "Configure", select the desired build options and modules (press "Configure" for any changes) and then finally press "Generate". You can also press "Open Project" to launch Visual Studio automatically.
NOTE: Make sure to select the "Release" build mode in Visual Studio.
- bleakUseOpenMP -- Try to enable OpenMP support in the compiler (if available).
- bleakUseCUDA -- Try to enable CUDA support (if available).
- bleakBLASType -- "slowblas" (default, built into bleak and very slow!) or "openblas" (OpenBLAS).
- bleakCommon -- A required module that is essentially the glue of all of bleak (Graph, Vertex, Array, BLAS wrappers, parsers, databases, etc...) and some optimizers (SGD, AdaGrad, Adam) and some basic Vertices (InnerProduct, BatchNormalization, SoftmaxLoss, etc...).
- bleakImage -- Gemm-based convolution and pooling.
- bleakTrees -- Random hinge forest, ferns, convolutional hinge trees and ferns, feature selection and annealing.
- bleakITK -- ITK 1D/2D/3D image loader Vertex (supports PNG/JPEG, DICOM, MetaIO, Nifti, etc...). Requires ITK 4+.
- bleakCudnn -- cuDNN-based convolution and pooling. Requires cuDNN.
In bleak, neural network computation is implemented as a directed graph. Vertices implement the forward/backward operations and have names, properties, and named inputs and outputs. This enables searching for vertices by name, assigning values to named properties, and querying inputs and outputs by name. Edges store the tensor inputs and outputs and their gradients. Vertices uniquely own the Edges for their outputs and are assigned Edges for their inputs. Graphs in bleak can be constructed/modified in C++ or read from a .sad file.
A .sad file follows this general format. Sections denoted with [] are optional.
- [Variable Declarations]
- [Subgraph Declarations]
- Vertex Declarations
- [Connection Declarations]
Whitespace is ignored and all declarations are terminated with a semicolon (;) (except for includes). A file can be included at any time with an include statement. For example
include "Config.sad"
This included file is treated as if its content were copied and pasted in place of the include. The included file by itself need not be a valid graph.
Comments are preceded by the octothorpe symbol (#). For example
# This is a comment.
They may occur anywhere outside of a string value.
Variables are declared as a key value pair. For example
batchSize = 16;
learningRateMultiplier=1.0;
imageList = "alcoholicTrainList.txt";
And they may be overwritten by subsequent declarations. For example
include "Config.sad"
batchSize=32; # Override config file
Variables in .sad files support a small collection of basic types:
- integer
- float
- boolean (true/false)
- string ("value")
- integer vector ([8, 3, 256, 256])
- float vector ([0.5, 1.0])
Variables are referenced in the same fashion as shell variables (with '$') and may be used in simple mathematical expressions if they are float or integer types. The available mathematical operators include +, -, *, /, % (modulo), and ^ or ** (both exponentiation). Resulting types follow the behavior of the C/C++ programming languages. For example, 1/2 results in 0 while 1.0/2 results in 0.5. The addition operator (+) may also be used to concatenate strings. Here are some examples
# This expression results in an integer (features3Width is an integer)
pool1Width = ($features3Width - 2)/2 + 1;
# This concatenates two strings
imageList=$dataRoot + "/SMNI_CMI_TRAIN/alcoholicTrainList.txt";
# Variables and expressions can even be used inside of vectors
size = [ $numTrees, 2^$treeDepth - 1 ];
There are currently no built-in functions like min/max/exp or any syntax to reference vector components.
Subgraphs are declared immediately after variables (if any). They recursively define graphs which follow the structure mentioned above with some additional mechanisms to facilitate communicating properties and setting up connections. This topic will be covered in detail in section Subgraphs after vertex declarations and connection declarations are covered.
After variables and subgraphs are declared (if any), vertices are declared. Vertices have a type name, named properties and a unique name that refers to that instance of the vertex. They are declared as follows
VertexType {
propertyName=propertyValue;
propertyName2=propertyValue2;
# And so forth...
} uniqueVertexName;
If a vertex requires no properties, one may simply declare
VertexType uniqueVertexName;
Vertex types are either provided by modules (compiled into bleak) or are instances of subgraphs (discussed in Subgraphs). Some examples of vertices will be described later.
Vertex properties are used to communicate runtime settings to the Vertex. This may be information about the size of a convolution kernel or the stride or dilation of a convolution operation. Importantly, Vertex properties are not variables: they may not reference themselves, and a property the Vertex does not declare cannot be assigned. Variables and expressions may be used in Vertex properties (which is the whole intention of variables!). For example
numTrees=100;
treeDepth = 7;
applyWeightDecay = false;
Parameters {
size = [ $numTrees, 2^$treeDepth - 1 ];
learnable=true;
initType="uniform";
applyWeightDecay=$applyWeightDecay;
b = 3;
a = -$b; # ERROR: Properties are not variables.
giraffe = "Not a property"; # ERROR: giraffe is not a Parameters property.
} thresholds;
Vertex properties afford a bit of flexibility in value types. Many types of values are implicitly convertible. For example
Parameters {
size = 10; # Integer convertible to one component integer vector [ 10 ].
learnable = 1; # Integer convertible to boolean.
a="-3.0"; # String representation of a float is convertible to a float.
b=[ 3 ]; # One component integer vector is convertible to a float.
} tensor;
Any type is convertible to a string and any string is (possibly) convertible to any type. Other implicit conversions are provided below.
- integer -> float
- integer -> boolean
- integer -> integer vector
- integer -> float vector
- float -> boolean
- float -> float vector
- boolean -> integer
- boolean -> float
- boolean -> integer vector
- boolean -> float vector
- integer vector -> integer (only if the vector has 1 component)
- integer vector -> float (only if the vector has 1 component)
- float vector -> float (only if the vector has 1 component)
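As a rough illustration, the conversion rules above can be modeled as a small lookup. This is a hypothetical Python sketch only (bleak's actual implementation is in C++, and the type tags used here are invented for illustration); string parsing is omitted.

```python
def convert(value, src, dst):
    """Model the implicit property conversions listed above.

    Type tags: "int", "float", "bool", "ivec", "fvec", "string".
    Raises TypeError when no implicit conversion exists.
    """
    if src == dst:
        return value
    if dst == "string":
        return str(value)                       # any type -> string
    if src == "int":
        if dst == "float": return float(value)
        if dst == "bool":  return value != 0
        if dst == "ivec":  return [value]
        if dst == "fvec":  return [float(value)]
    if src == "float":
        if dst == "bool":  return value != 0.0
        if dst == "fvec":  return [value]
    if src == "bool":
        if dst == "int":   return int(value)
        if dst == "float": return float(value)
        if dst == "ivec":  return [int(value)]
        if dst == "fvec":  return [float(value)]
    if src == "ivec" and len(value) == 1:       # only 1-component vectors
        if dst == "int":   return value[0]
        if dst == "float": return float(value[0])
    if src == "fvec" and len(value) == 1:
        if dst == "float": return value[0]
    raise TypeError(f"No implicit conversion from {src} to {dst}")
```

For example, `convert(10, "int", "ivec")` yields `[10]`, mirroring the `size = 10;` example above, while a 2-component vector cannot convert to an integer.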
How vertices are compiled into bleak and given named properties and named inputs/outputs will be discussed in section Implementing your own Vertex in C++.
After all vertices have been declared, they can be connected by using their unique name and a named input or output. A connection takes one of two possible forms
VertexType1 source;
VertexType2 target;
source.outputName -> target.inputName;
target.inputName <- source.outputName;
Like properties, named inputs and outputs are compiled into bleak. This detail will be discussed in section Implementing your own Vertex in C++.
The declarative nature of this .sad graph syntax can be cumbersome, especially since neural networks tend to have repeated structure (e.g. lots of layers of convolution). Subgraphs attempt to reduce the pain of defining neural network architectures by enabling an author to define a standalone repeated component. A subgraph recursively defines a graph with the same structure and syntax as described in all sections following section Basic Graph Syntax. They are wrapped in a subgraph directive of the form
subgraph NameOfSubgraph {
# Graph as described in all sections leading up to this example!
};
Where "NameOfSubgraph" behaves like a type of vertex that can be declared. The variables section of a graph defines the properties of a subgraph. External connections to the subgraph can be communicated through the this keyword, which refers to the instance of the subgraph itself. To better understand why this is immensely helpful, imagine the InnerProduct operation (i.e. fully connected layer). The InnerProduct includes learnable weights and bias which are used to calculate W*X + T, where X is the $batchSize set of $numInputs-dimensional vectors, W is the $numOutputs set of weights, and T is the $numOutputs set of biases. So a subgraph incorporating all of these elements might look like the following
subgraph SGInnerProduct {
# Properties with default values
numInputs=10;
numOutputs=100;
Parameters {
size = [ $numOutputs, $numInputs ];
initType="gaussian";
learnable = true;
mu=0.0;
sigma=1.0;
} weights;
Parameters {
size = [ $numOutputs ];
learnable=true;
} bias;
InnerProduct innerProduct;
weights.outData -> innerProduct.inWeights;
bias.outData -> innerProduct.inBias;
# Set up external connections
this.inData -> innerProduct.inData;
innerProduct.outData -> this.outData;
};
NOTE: This is a simplified explanation of InnerProduct in bleak. It can handle more than 2D tensors!
Notice that the input and output names can be arbitrarily chosen by the subgraph author through this.name. Now, I can use SGInnerProduct as a kind of vertex type. For example, I might define a logistic regressor training graph for the iris data set as follows
batchSize=16;
numFeatures = 4;
numClasses = 3;
# We can hide the subgraph declaration in another file!
include "SGInnerProduct.sad"
# Incrementally read a prepared CSV file and wrap around
CsvReader {
batchSize=$batchSize;
csvFileName="train.csv";
labelColumn=4;
shuffle=true;
} csv;
SGInnerProduct {
numInputs=$numFeatures;
numOutputs=$numClasses;
} inner;
SoftmaxLoss loss;
csv.outData -> inner.inData;
inner.outData -> loss.inData;
csv.outLabels -> loss.inLabels;
While this is a simple example, it should be clear that subgraphs can considerably reduce the burden of defining graphs with repeated structures. An author need not explicitly declare Parameters for every single operation.
One other nicety of subgraphs is that they can be used to embed a neural network architecture into training, validation, testing and production graphs without modifying the original architecture. An author need only write the architecture as a subgraph in its own standalone .sad file. Then each task-specific graph can include and use the architecture without modification. The simple iris model might instead be defined as
subgraph SGModel {
numFeatures=4;
numClasses=3;
include "SGInnerProduct.sad"
SGInnerProduct {
numInputs=$numFeatures;
numOutputs=$numClasses;
} inner;
this.inData -> inner.inData;
inner.outData -> this.outData;
};
Then the training and production graphs might look like the following. First, the training graph:
# These might be better in a config file (Config.sad?)
batchSize=16;
numFeatures=4;
numClasses=3;
include "SGModel.sad"
# Incrementally read a prepared CSV file and wrap around
CsvReader {
batchSize=$batchSize;
csvFileName="train.csv";
labelColumn=4;
shuffle=true;
} csv;
SGModel {
numFeatures=$numFeatures;
numClasses=$numClasses;
} graph;
SoftmaxLoss loss;
csv.outData -> graph.inData;
graph.outData -> loss.inData;
csv.outLabels -> loss.inLabels;
And the production graph:
numFeatures=4;
numClasses=3;
include "SGModel.sad"
# Input placeholder for C++ code... batch size = 1
Input {
size = [ 1, $numFeatures ];
} input;
SGModel {
numFeatures=$numFeatures;
numClasses=$numClasses;
} graph;
Softmax output;
input.outData -> graph.inData;
graph.outData -> output.inData;
Since the variables of subgraphs work like vertex properties, these variables must be resolved in advance to determine their type. This can lead to some confusing behavior best illustrated in this example
subgraph SGModel {
inputWidth = 100;
kernelWidth = 5;
stride = 1;
dilate = 1;
padding = 0;
outputWidth = ($inputWidth - $kernelWidth - ($kernelWidth - 1)*($dilate - 1) + 2*$padding)/$stride + 1;
# The rest of the subgraph below...
};
SGModel {
inputWidth=128;
stride=2;
kernelWidth=3;
padding = 1;
# What do you suppose outputWidth equals? It's still 96 even though the author intended it to be 64!
} graph;
# The rest of the graph below
The variable outputWidth in SGModel is immediately resolved to its default value of 96 regardless of what the author intended! It's important to note that only the variable declarations in subgraphs are initially resolved. Only after a subgraph is declared as a vertex and is assigned its properties do all other expressions with variables become resolved in that instance of the subgraph. Worse yet, the variable outputWidth is exposed as a property, allowing an author to mistakenly assign it an incorrect value! For example
SGModel {
inputWidth=128;
stride=2;
kernelWidth=3;
padding = 1;
outputWidth=128; # This would be true if stride=1... this is incorrect!
} graph;
To solve both problems, bleak supports private variables that are excluded from vertex properties (so their type need not be known) while also delaying their resolution until after the vertex declaration and property assignments. A private variable is declared with the private keyword. We can fix the first example by making outputWidth private
subgraph SGModel {
inputWidth = 100;
kernelWidth = 5;
stride = 1;
dilate = 1;
padding = 0;
private outputWidth = ($inputWidth - $kernelWidth - ($kernelWidth - 1)*($dilate - 1) + 2*$padding)/$stride + 1;
# The rest of the subgraph below...
};
SGModel {
inputWidth=128;
stride=2;
kernelWidth=3;
padding = 1;
# What do you suppose outputWidth equals? Now it's 64!
# outputWidth=128; # ERROR: Not a property.
} graph;
# The rest of the graph below
And this gives the intended behavior. However, there are some rules for the use of private variables and these are given below
- Global variables cannot reference private variables in expressions. Private variables may, however, reference global variables.
- Private variables cannot be redeclared as global (without the private qualifier) nor can global variables be redeclared as private.
- While normal variables are global to all the contained subgraphs, private variables are only local to their immediate graph.
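To make the early-vs-late resolution concrete, here is a hypothetical Python model of the mechanism (not bleak's implementation; the function names are invented), reproducing the 96 vs. 64 outcome from the examples above:

```python
# Model of subgraph variable resolution: plain variables resolve immediately
# from their defaults (and become overridable properties), while private
# variables resolve only after the instance's property assignments.

def instantiate(defaults, private_exprs, overrides):
    """Resolve a subgraph instance's variables."""
    env = dict(defaults)   # plain variables, already resolved up front
    env.update(overrides)  # property assignments on the instance
    # Private variables are evaluated only now, seeing overridden values.
    for name, expr in private_exprs.items():
        env[name] = expr(env)
    return env

defaults = {"inputWidth": 100, "kernelWidth": 5, "stride": 1,
            "dilate": 1, "padding": 0}

def output_width(v):
    # Integer division (//) mimics the C-like 1/2 == 0 semantics of .sad.
    return ((v["inputWidth"] - v["kernelWidth"]
             - (v["kernelWidth"] - 1) * (v["dilate"] - 1)
             + 2 * v["padding"]) // v["stride"] + 1)

# Non-private outputWidth: resolved immediately from defaults -> always 96.
early = output_width(defaults)

# Private outputWidth: resolved after the overrides apply -> 64, as intended.
late = instantiate(defaults, {"outputWidth": output_width},
                   {"inputWidth": 128, "stride": 2,
                    "kernelWidth": 3, "padding": 1})["outputWidth"]
```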
When a graph is constructed and initialized, a plan is created for the order of evaluation of the vertices. The plan is simply a topologically sorted list of the vertices so that all vertices can run in order with data dependencies implicitly satisfied. A forward pass in the network is simply a for-loop over the plan (invoking Forward()). A backward pass in the network is a for-loop over the plan in reverse (invoking Backward()). Conceptually, the plan evaluates root vertices first and ends with leaf vertices.
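A minimal sketch of this plan-and-loop structure (hypothetical Python, not bleak's C++ API; the Vertex class here is an invented stand-in):

```python
from collections import deque

class Vertex:
    """Minimal stand-in for a bleak Vertex (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.inputs = []  # source vertices this vertex depends on

    def Forward(self):
        pass

    def Backward(self):
        pass

def make_plan(vertices):
    """Topologically sort vertices so each runs after its dependencies."""
    indegree = {v: len(v.inputs) for v in vertices}
    consumers = {v: [] for v in vertices}
    for v in vertices:
        for src in v.inputs:
            consumers[src].append(v)
    queue = deque(v for v in vertices if indegree[v] == 0)  # root vertices
    plan = []
    while queue:
        v = queue.popleft()
        plan.append(v)
        for c in consumers[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return plan

def forward_pass(plan):
    for v in plan:             # roots first, leaves last
        v.Forward()

def backward_pass(plan):
    for v in reversed(plan):   # leaves first, roots last
        v.Backward()
```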
A Vertex in a bleak Graph is considered a root vertex if it has no input edges with a source Vertex. For example, a root vertex may simply have no inputs, its inputs may be unconnected, or custom code may provide an input edge with no source vertex. A leaf Vertex is any vertex whose output edges do not feed a target Vertex that itself produces output (and there may be more than one such vertex in a graph). For example, a leaf vertex may simply have no outputs, it may have an output edge that is unconnected to any target Vertex, or its target vertices may all have no outputs. Vertices with no outputs are often used as operational monitors of sorts (e.g. reporting ROC/AUC, accuracy, averages, etc...). However, loss function Vertices, which usually connect to such monitor Vertices, need to be treated as leaf vertices too!
When optimizing a Graph, the Optimizer is initially tasked with identifying two types of Edges
- Learnable Edges
- Loss function Edges
NOTE: As a reminder, an Edge stores the input/output tensors and its corresponding gradient.
An Edge is considered learnable if it is both the output of a root vertex and has a non-empty gradient tensor. The output edge of Parameters is usually a learnable edge (though this may be optionally suppressed). An Edge is considered a loss function edge if it is the output of a leaf Vertex and has a single element tensor (i.e. a single real value).
The Optimizer can query Graphs for root and leaf vertices and determine which Edges contain the learnable model parameters and which Edges contain loss function outputs (there may be more than one of each). Importantly, multiple dangling loss function edges are implicitly treated as if they are summed together. The Optimizer can then use a Forward pass to calculate the loss function and a subsequent Backward pass to calculate the gradients (stored in the learnable edges) that it uses to update the learnable parameters.
IMPORTANT: Loss function vertices are responsible for seeding their output gradient with 1 when they are a leaf Vertex.
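Putting the pieces together, the optimization loop described above might be sketched as follows (hypothetical Python; Edge, the graph accessors and sgd_step are invented stand-ins for bleak's C++ types):

```python
class Edge:
    """Stand-in for a bleak Edge: a tensor plus its gradient."""
    def __init__(self, data, gradient=None):
        self.data = data                 # tensor values (list of floats)
        self.gradient = gradient or []   # same shape, empty if not learnable

def find_learnable_edges(graph):
    # Learnable: output of a root vertex with a non-empty gradient tensor.
    return [e for v in graph.roots() for e in v.output_edges if e.gradient]

def find_loss_edges(graph):
    # Loss: output of a leaf vertex holding a single real value.
    return [e for v in graph.leaves() for e in v.output_edges
            if len(e.data) == 1]

def sgd_step(graph, learning_rate=0.01):
    graph.forward()   # compute losses (multiple losses act as if summed)
    graph.backward()  # loss vertices seed their output gradient with 1
    for edge in find_learnable_edges(graph):
        for i, g in enumerate(edge.gradient):
            edge.data[i] -= learning_rate * g
```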
TODO
TODO
TODO
TODO