


Naive Bayes Classifier

 Some General Remarks 
The general architecture of the MIP follows a Master/Worker paradigm in which many Workers, operating in multiple medical centers, are coordinated by one Master. Only Workers are allowed to access the anonymized data in each medical center; the Master sees only aggregate data, derived from the full data and sent to it by the Workers.
As a consequence, every algorithm has to be refactored into a form that fits this model.
In general, this means two things.

On the one hand, isolating the parts of the algorithm that operate on the full data and implementing them in procedures that run on Workers.
On the other hand, identifying the parts of the algorithm that need to see the aggregates from all Workers and implementing them in procedures that run on the Master.

Our naming convention is that procedures that run on Workers are called local, whereas those that run on the Master are called global.
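To make the local/global split concrete, here is a minimal sketch for Gaussian likelihood terms. It is an illustration only, not the MIP implementation: the function names `local_step` and `global_step`, the dictionary layout, and the use of NumPy are all assumptions. The local step reduces the full data to per-class counts, sums and sums of squares; the global step only ever sees those aggregates.

```python
import numpy as np

def local_step(X, y):
    """Hypothetical local procedure (runs on a Worker).

    Reduces the full local data to per-class aggregates, so that no
    patient-level row leaves the medical center.
    """
    stats = {}
    for c in np.unique(y).tolist():
        Xc = X[y == c]
        stats[c] = {
            "n": Xc.shape[0],                 # number of patients in class c
            "sum": Xc.sum(axis=0),            # per-attribute sum
            "sum_sq": (Xc ** 2).sum(axis=0),  # per-attribute sum of squares
        }
    return stats

def global_step(all_stats):
    """Hypothetical global procedure (runs on the Master).

    Merges the aggregates sent by all Workers and derives class priors,
    per-attribute means and (population) variances.
    """
    merged = {}
    for stats in all_stats:
        for c, s in stats.items():
            m = merged.setdefault(c, {"n": 0, "sum": 0.0, "sum_sq": 0.0})
            m["n"] += s["n"]
            m["sum"] = m["sum"] + s["sum"]
            m["sum_sq"] = m["sum_sq"] + s["sum_sq"]

    total = sum(m["n"] for m in merged.values())
    model = {}
    for c, m in merged.items():
        mean = m["sum"] / m["n"]
        var = m["sum_sq"] / m["n"] - mean ** 2  # E[x^2] - (E[x])^2
        model[c] = {"prior": m["n"] / total, "mean": mean, "var": var}
    return model
```

Because counts, sums and sums of squares are additive across Workers, the merged parameters agree with what a single pass over the pooled data would give, up to floating-point error. The `E[x^2] - (E[x])^2` form can lose precision when the variance is small relative to the mean, so a production version would likely use a more stable merge.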
 Notation 
Each local dataset D^(l), where l=1,...,L and L is the number of medical centers, is represented as a matrix of size n x p, where n is the number of points (patients) and p is the number of attributes. The elements of this matrix can be either continuous or discrete (categorical).
In each local dataset, the independent attributes are denoted as a matrix X^(l) and the dependent variable is denoted as a vector y^(l). x_(ij)^(l) is the value of the j^(th) attribute for the i^(th) patient in the l^(th) hospital, while x_(j)^(l) denotes the vector of the j^(th) attribute in the l^(th) hospital. For categorical attributes, we use the notation C_m ∈ {C₁, C₂, ..., C_M} for their domain.
 Algorithm Description 
In the Naive Bayes algorithm, the attributes of X can be both categorical and continuous, while y is always categorical. Once we have the likelihood terms from the training procedure, we can compute the maximum a posteriori (MAP) class of a new query datapoint q: it is the class C_m that maximizes P(C_m) · ∏_j P(q_j | C_m), where the product runs over the attributes of q.
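The MAP computation for a query point can be sketched as follows. This is a hedged illustration rather than the project's code: `predict_map`, `gaussian_log_pdf` and the nested-dictionary layout of the likelihood terms are hypothetical. The sketch works in log space to avoid underflow and assumes every stored probability is strictly positive (for example, after smoothing) and every variance is greater than zero.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log-density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict_map(q, priors, categorical, gaussian):
    """Return the class with the highest posterior for query point q.

    q           : dict, attribute name -> value
    priors      : dict, class -> P(class)
    categorical : dict, class -> attribute -> value -> P(value | class)
    gaussian    : dict, class -> attribute -> (mean, variance)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # log P(c) + sum_j log P(q_j | c); the evidence term is the same
        # for every class and can be dropped from the argmax.
        score = math.log(prior)
        for attr, table in categorical.get(c, {}).items():
            score += math.log(table[q[attr]])
        for attr, (mean, var) in gaussian.get(c, {}).items():
            score += gaussian_log_pdf(q[attr], mean, var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

A usage example with made-up likelihood terms for two classes, one categorical attribute and one continuous attribute:

```python
priors = {"A": 0.5, "B": 0.5}
categorical = {
    "A": {"gender": {"M": 0.6, "F": 0.4}},
    "B": {"gender": {"M": 0.5, "F": 0.5}},
}
gaussian = {"A": {"age": (75.0, 25.0)}, "B": {"age": (65.0, 25.0)}}

label = predict_map({"gender": "M", "age": 74}, priors, categorical, gaussian)
```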


Algorithm Implementation

Categorical Naive Bayes with Cross-Validation
Gaussian Naive Bayes with Cross-Validation