    Federated Algorithms User Guide

    Intro

    This guide walks you through the process of writing federated algorithms using Exareme. Exareme focuses primarily on machine learning and statistical algorithms, but the framework is general enough to allow any type of algorithm to be implemented, provided that the input data is scattered across multiple decentralized data sources.

    System Overview

    Exareme consists of multiple services distributed across remote nodes. There are multiple local nodes and a single global node. Local nodes host primary data sources, while the global node is responsible for receiving data from local nodes and performing global computations.

    Getting started

    Federated algorithms overview

    At the highest level, a federated algorithm is composed of three ingredients: local computations, global computations and the algorithm flow. A local computation takes place in each local node and usually produces some aggregate of the primary data found in that node. A global computation takes place in a special node called the global node, and usually consolidates the local aggregates into a global aggregate. Finally, the algorithm flow is responsible for coordinating these computations and the data exchange between nodes.

    Writing a federated algorithm

    In order to write a federated algorithm we need to define local and global computations and write an algorithm flow that ties the computations together.

    Let's break down the steps one by one. We'll begin by writing a simple algorithm for computing the mean of a single variable. This algorithm will be written as a simple Python script, running on a single machine. The purpose of this exercise is to illustrate how an algorithm is decomposed into local and global steps, and how the flow coordinates these steps. Later, we'll add the necessary ingredients to run the algorithm in an actual federated environment, using Exareme.

    Since we will run the algorithm on a single machine, we will represent the federated data as a list of dataframes, data: list[pandas.DataFrame]. Each element of the list stands in for the data held by one local node.
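
As a minimal sketch of this representation, assuming pandas is installed, the snippet below builds one small dataframe per simulated local node. The column name x and all values are made up for illustration and are not part of Exareme.

```python
import pandas as pd

# Hypothetical toy data: each dataframe stands in for the primary data
# held by one local node. The column name "x" and the values are
# illustrative only.
data = [
    pd.DataFrame({"x": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"x": [4.0, 5.0]}),
    pd.DataFrame({"x": [6.0, 7.0, 8.0, 9.0]}),
]

# One list entry per simulated local node
n_nodes = len(data)
```

The dataframes deliberately have different lengths, since real local nodes rarely hold the same number of rows.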

    Local computations

    A local computation is a Python function taking local data and returning a dictionary of local aggregates. The example below demonstrates a local computation function that calculates the sum and count of some variable.

    def mean_local(local_data):
        # Compute sum and count
        sx = local_data.sum()
        n = len(local_data)
    
        # Pack results into single dictionary which will
        # be transferred to global node
        results = {"sx": sx, "n": n}
        return results

    The result packs the two aggregates, sx and n, into a dictionary. A separate instance of this function will run on each local node, so sx and n are different for each node and reflect that node's local data.
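
To see that each node produces its own aggregates, the sketch below calls the function on two toy datasets. It assumes pandas is installed. The names node_a, node_b and the column x are hypothetical. Passing a single column (a pandas Series) rather than the whole dataframe is a choice made here so that sx comes out as a scalar.

```python
import pandas as pd

def mean_local(local_data):
    # Same local computation as above: sum and count of the variable
    sx = local_data.sum()
    n = len(local_data)
    return {"sx": sx, "n": n}

# Hypothetical toy data for two simulated local nodes
node_a = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
node_b = pd.DataFrame({"x": [4.0, 5.0]})

# Each call sees only its own node's data
res_a = mean_local(node_a["x"])
res_b = mean_local(node_b["x"])

print(res_a)
print(res_b)
```

Here the first node reports a sum of 6.0 over 3 rows, while the second reports a sum of 9.0 over 2 rows.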

    Global computations

    A global computation is also a Python function, and it usually takes the output of a local computation as input. The local computations, coming from multiple nodes, produce a list of results, from which a global aggregate is computed. We can then perform further computations, as in the current example, where the mean is computed by dividing the global sx by the global n.

    def mean_global(local_results):
        # Sum aggregates from all nodes
        sx = sum(res["sx"] for res in local_results)
        n = sum(res["n"] for res in local_results)
    
        # Compute global mean
        mean = sx / n
    
        return mean
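
The sketch below illustrates why the local step sends sums and counts rather than local means. It feeds hand-written local results (hypothetical values, for two nodes of unequal size) to the global function and compares the outcome with a naive average of per-node means.

```python
def mean_global(local_results):
    # Same global computation as above
    sx = sum(res["sx"] for res in local_results)
    n = sum(res["n"] for res in local_results)
    return sx / n

# Hypothetical local aggregates from two nodes of unequal size
local_results = [{"sx": 6.0, "n": 3}, {"sx": 9.0, "n": 2}]

# Global mean from pooled sums and counts: 15.0 / 5
global_mean = mean_global(local_results)

# Naive alternative: average the per-node means, ignoring node sizes
local_means = [res["sx"] / res["n"] for res in local_results]
naive_mean = sum(local_means) / len(local_means)

print(global_mean)  # 3.0
print(naive_mean)   # 3.25
```

The pooled result, 3.0, is the mean of the five underlying values. The naive average, 3.25, gives the smaller node too much weight.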