webpack.export.js
Federated Algorithms User Guide
Intro
This guide will walk you through the process of writing federated algorithms using Exareme. Exareme's focus is primarily on machine learning and statistical algorithms, but the framework is so general as to allow any type of algorithm to be implemented, provided that the input data is scattered across multiple decentralized data sources.
System Overview
Exareme consists of multiple services distributed across remote nodes. There are multiple local nodes and a single global node. Local nodes host primary data sources, while the global node is responsible for receiving data from local nodes and perform global computations.
Getting started
Federated algorithms overview
In the highest level, a federated algorithm is composed of three ingredients. Local computations, global computations and the algorithm flow. A local computation takes places in each local node and usually produces some aggregate of the primary data found in the local node. A global computation takes place in a special node called the global node, and usually consilidates the local aggregates into a global aggregate. Finally, the algorithm flow is responsible for coordinating these computations and the data exchange between nodes.
Writing a federated algorithm
In order to write a federated algorithm we need to define local and global computations and write an algorithm flow that ties the computations together.
Let's break down the steps one by one. We'll begin by writing a simple algorithm for computing the mean of a single variable. This algorithm will be written as a simple python script, running on a single machine. The purpose of this exercise is to lustrate how an algorithm is decomposed into local and global steps, and how the flow coordinates these steps. Later, we'll add the necessary ingredients in order to be able to run the algorithm in an actual federated environment, using Exareme.
Since we will run the algorithm on a single machine we will represent the
federated data as a list of dataframes data: list[pandas.DataFrame].
Local computations
A local computation is python function taking local data and returning a dictionary of local aggregates. The example below demonstrates a local computation function that calculates the sum and count of some variable.
def mean_local(local_data):
# Compute sum and count
sx = local_data.sum()
n = len(local_data)
# Pack results into single dictionary which will
# be transferred to global node
results = {"sx": sx, "n": n}
return results
The result packs the two aggregates, sx and n into a dictionary. A separate
instance of this function will run on each local node, so sx and n are
different for each node and reflect each node's local data.
Global computations
A global computation is also a python function, and it usually takes the output
of a local computation as an input. The local computations, coming from
multiple nodes, produce a list of results, from which a global aggregate is
computed. We can then perform further computations, as in the current example
where the mean is computed by dividing the global sx with the global n.
def mean_global(local_results):
# Sum aggregates from all nodes
sx = sum(res["sx"] for res in local_results)
n = sum(res["n"] for res in local_results)
# Compute global mean
mean = sx / n
return mean