
ReduceOperations

    Overview
    Reduce operations are those that scan through many objects, condensing some
    attribute into a reduced form. For example, we might use a reduce operation to
    compute statistics on Vm across neurons in a population in a model
    spread across multiple nodes. Another common use is to keep track of the
    maximum field dimension of a variable-size field, such as the number of
    synapses on a channel or an IntFire neuron.
    
    
    There are two modes of operation: 
    1. Through a regular ReduceMsg, originating from a ReduceFinfo on a regular
    object and terminating on any 'get' DestFinfo. ReduceFinfos are derived from
    SrcFinfos. They are templated on the type of the 'get' function, and on
    the type of the reduce class (for example, a triad of mean, variance and count).
    The ReduceFinfo constructor takes as an argument a 'digest' function. The
    job of the digest function is to take an argument of the reduce class
    (which holds the result of the entire reduction operation)
    and do something with it (such as saving values into the originating object).
    
    2. Through Shell::doSyncDataHandler. This takes the synced Elm and its
    FieldElement as Ids, and a string naming the field to be reduced, assumed to be an
    unsigned int. It creates a temporary ReduceMsg from the Shell to the Elm with 
    the field to be reduced. Here the digest function just takes the returned
    ReduceMax< uint > and puts the max value in Shell::maxIndex. It then posts 
    the ack. The calling Shell::doSyncDataHandler waits for the ack, and when it
    comes, it calls a 'set' function to put the returned value into the
    FieldDataHandler::fieldDimension_.
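
    For concreteness, here is a minimal stand-alone sketch of the digest step in
    mode 2. ReduceMaxU and ShellStandIn are stand-ins invented for this note, not
    the real MOOSE classes; the Eref argument and the ack call of the real digest
    function are only indicated in comments.

    	// Illustrative sketch, not the actual MOOSE code.
    	// ReduceMaxU stands in for ReduceMax< unsigned int >.
    	struct ReduceMaxU
    	{
    		unsigned int max_;                        // the reduced maximum
    		unsigned int max() const { return max_; }
    	};

    	struct ShellStandIn
    	{
    		unsigned int maxIndex_; // where doSyncDataHandler looks for the result

    		void digestReduceMax( const ReduceMaxU* arg )
    		{
    			// 'arg' holds the completed reduction: the maximum of the
    			// requested field over all threads and all nodes.
    			maxIndex_ = arg->max();
    			// The real function then posts the ack, so the blocked
    			// Shell::doSyncDataHandler can proceed and 'set'
    			// FieldDataHandler::fieldDimension_.
    		}
    	};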
    
    At some point I may want to embed the doSyncDataHandler into any of the
    'set' functions that invalidate fieldDimension. The problem is race conditions:
    a set function would call the doSync machinery, which internally makes its
    own calls to ack-protected functions such as 'set'. Must fix.
    
    
    
    The setup of the Reduce functionality is like this:
    
    - Create and define the ReduceFinfo.
    	ReduceFinfo< T, F, R >( const string& name, const string& doc,
    		void ( T::*digestFunc )( const Eref& er, const R* arg ) )
    	Here T is the type of the object that is the src of the ReduceMsg,
    	F is the type of the returned reduced field, and
    	R is the Reduce class.
    
    - Create and define the ReduceClass. This does two things:
    	- Hold the data being reduced
    	- Provide three functions, for primaryReduce, secondaryReduce, and
    	tertiaryReduce. We'll come to these in a little while. (A sketch of such
    	a class follows this list.)
    
    - 
    - 
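
    As a concrete illustration of the ReduceClass step, here is a stand-alone
    sketch of a reduce class for the mean/variance/count triad mentioned earlier.
    It is not the actual MOOSE ReduceBase hierarchy; the class name and the exact
    division of work between the three reduce functions are assumptions made for
    this note.

    	#include <cstddef>

    	// Illustrative analogue of a Reduce class: accumulates mean, variance
    	// and count. Not the real MOOSE ReduceBase/ReduceStats classes.
    	class ReduceStats
    	{
    	public:
    		ReduceStats() : sum_( 0.0 ), sumsq_( 0.0 ), count_( 0 ) {}

    		// primaryReduce: fold in one 'get' value from one target object
    		// on the current thread.
    		void primaryReduce( double x )
    		{
    			sum_ += x;
    			sumsq_ += x * x;
    			++count_;
    		}

    		// secondaryReduce: merge the partial result accumulated by
    		// another thread on this node.
    		void secondaryReduce( const ReduceStats& other )
    		{
    			sum_ += other.sum_;
    			sumsq_ += other.sumsq_;
    			count_ += other.count_;
    		}

    		// tertiaryReduce: merge a partial result received from another
    		// node, e.g. unpacked from an MPI_Allgather buffer.
    		void tertiaryReduce( const ReduceStats& other )
    		{
    			secondaryReduce( other );  // same arithmetic in this case
    		}

    		double mean() const { return count_ ? sum_ / count_ : 0.0; }
    		double variance() const
    		{
    			if ( count_ == 0 )
    				return 0.0;
    			double m = mean();
    			return sumsq_ / count_ - m * m;
    		}
    		std::size_t count() const { return count_; }

    	private:
    		double sum_;
    		double sumsq_;
    		std::size_t count_;
    	};

    With such a class, the ReduceFinfo of the first step would be declared
    roughly as ReduceFinfo< MyCell, double, ReduceStats >( "reduceVm", "doc",
    &MyCell::digestVmStats ), where MyCell and digestVmStats are hypothetical
    names for the originating class and its digest function.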
    
    
    When executing reduce operations from the Shell::doSyncDataHandler, this
    	is what happens:
    - Shell::doSyncDataHandler does some checking, then 
    	requestSync.send launches the request, and waits for ack
      	- Shell::handleSync handles the request on each node
      		- Creates a temporary ReduceMsg from Shell to target elm
    		- The ReduceMsg includes a pointer to a const ReduceFinfoBase*
    			which provides two functions:
    			- makeReduce, which makes the ReduceBase object
    			- digestReduce, which refers to the digestFunc of
    			the calling object.
    		- Sends a call with a zero arg on this Msg.
    	- Msg is handled by ReduceMsg::exec. 
    		- This extracts the fid, which points to a getOpFunc.
    		- It creates the derived ReduceBase object.
    		- It adds the ReduceBase object into the ReduceQ
    			- indexed by thread# so no data overwrites.
    		- It scans through all targets on current thread and uses the
    			derived virtual function for ReduceBase::primaryReduce
    			on each.
    	  Overall, for each thread, the 'get' values get reduced and stored
    	  into the ReduceBase derived class in the queue. There is such an
    	  object for each thread.
    
    	- Nasty scheduling ensues for clearing the ReduceQ.
    		- in Barrier 3, we call Clock::checkProcState
    		- this calls Qinfo::clearReduceQ
    		- This marches through each thread on each reduceQ entry
    			- Uses the ReduceBase entry from the zeroth thread
    				as a handle.
    			- Calls ReduceBase::secondaryReduce on each entry for 
    				each thread.
    			- This is done in an ugly way using 
    				findMatchingReduceEntry, could be cleaned up. 
    		- Calls ReduceBase::reduceNodes
    			- If MPI is running this does an instantaneous
    				MPI_Allgather with the contents of the 
    				ReduceBase data.
    			- Does ReduceBase::tertiaryReduce on the received data
    /// New version
    			- calls Element::setFieldDimension directly using ptr.
    /// End of new stuff
    			- returns isDataHere.
    			
    
    		- If reduceNodes returns true, calls ReduceBase::assignResult
    			- This calls the digestReduce function, which is
    				Shell::digestReduceMax
    		
    ////////////////////////////////////////////////////////////////////////
    // Old version
    		- Shell::digestReduceMax assigns maxIndex_ and sends ack
    	- ack allows doSyncDataHandler to proceed
    	- calls Field::set on all tgts to set "fieldDimension" to maxIndex_.
    ////////////////////////////////////////////////////////////////////////
    		- Should really do a direct ptr assignment within the
    		assignResult function; assume here that we want the assignment
    		to be reflected on all nodes.
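
    To summarize the clearReduceQ sequence in one place, here is a stand-alone
    sketch of what a single ReduceQ entry goes through, re-using the illustrative
    ReduceStats class from the earlier sketch. The real code works on per-thread
    ReduceBase pointers and hides the MPI exchange inside reduceNodes; the
    function name below is an assumption.

    	#include <cstddef>
    	#include <vector>

    	// Illustrative sketch of clearing one ReduceQ entry, not the real
    	// Qinfo::clearReduceQ. perThread[ i ] is the partial result built up
    	// by thread i during primaryReduce.
    	ReduceStats clearOneReduceEntry( std::vector< ReduceStats >& perThread )
    	{
    		// Use the zeroth thread's entry as the handle and fold the other
    		// threads' partials into it: secondaryReduce.
    		ReduceStats& handle = perThread[ 0 ];
    		for ( std::size_t i = 1; i < perThread.size(); ++i )
    			handle.secondaryReduce( perThread[ i ] );

    		// reduceNodes: with MPI running, the contents of 'handle' would be
    		// exchanged between nodes with MPI_Allgather, and tertiaryReduce
    		// would fold in each received block. Single-node case shown here.

    		// assignResult then hands the finished reduction to the digest
    		// function on the originating object (or, in the doSync case,
    		// writes fieldDimension directly).
    		return handle;
    	}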
    
    The current code is grotty on several fronts:
    	- The findMatchingReduceEntry stuff could be fixed using a more 
    		sensible indexing.
    	- Should use direct ptr assignment for fieldDimension within 
    		assignResult.
    	- I probably don't need to pass in both the FieldElement and its parent.
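
    A sketch of the direct pointer assignment proposed above: instead of the
    ack / Field::set round trip, assignResult would write the reduced maximum
    straight into the field Element. ElementStandIn and the setFieldDimension
    signature are assumptions for this note.

    	// Stand-in for the relevant part of an Element; illustrative only.
    	struct ElementStandIn
    	{
    		unsigned int fieldDimension_;
    		void setFieldDimension( unsigned int n ) { fieldDimension_ = n; }
    	};

    	// Since the reduced value is identical on all nodes after
    	// MPI_Allgather, the same assignment happens on every node.
    	void assignFieldDimensionDirect( ElementStandIn* fieldElm, unsigned int maxIndex )
    	{
    		fieldElm->setFieldDimension( maxIndex );
    	}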
    
    When executing reduce operations from messages, this is what happens:
    - ReduceFinfo.send launches the request. No args.
    	- The ReduceMsg::exec function (which is called per thread):
    		- This extracts the fid, which points to a getOpFunc.
    		- It creates the derived ReduceBase object.
    		- It adds the ReduceBase object into the ReduceQ
    			- indexed by thread# so no data overwrites.
    		- It scans through all targets on current thread and uses the
    			derived virtual function for ReduceBase::primaryReduce
    			on each.
    	- Now we go again to the scheduling system to clear ReduceQ.
    		- in Barrier 3, we call Clock::checkProcState
    		- this calls Qinfo::clearReduceQ
    		- This marches through each thread on each reduceQ entry
    			- Uses the ReduceBase entry from the zeroth thread
    				as a handle.
    			- Calls ReduceBase::secondaryReduce on each entry for 
    				each thread.
    			- This is done in an ugly way using 
    				findMatchingReduceEntry, could be cleaned up. 
    		- Calls ReduceBase::reduceNodes
    			- If MPI is running this does an instantaneous
    				MPI_Allgather with the contents of the 
    				ReduceBase data.
    			- Does ReduceBase::tertiaryReduce on the received data
    			- returns isDataHere.
    
    		- If reduceNodes returns true, calls ReduceBase::assignResult
    			on the originating Element.
    			- This calls the digestReduce function, which is
    			what was given to the ReduceFinfo when it was created.
    			- This does whatever field assignments are needed,
    			internally to the originating element which asked for
    			the field values.
    	Note that there are no acks; it all happens in Barrier 3.
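
    Finally, a sketch of the message-driven mode from the originating object's
    point of view, re-using the illustrative ReduceStats class from above. The
    class PopulationStats and its members are hypothetical; the digest function
    simply follows the digestFunc shape given in the setup section, and needs no
    acks because it runs in Barrier 3 after reduceNodes has made the result
    identical on every node.

    	#include <cstddef>

    	class Eref;  // placeholder declaration; the real Eref is a MOOSE type.

    	// Hypothetical originating object for the message-driven mode. The
    	// digest function handed to the ReduceFinfo at creation time stores
    	// the finished statistics back into the object.
    	class PopulationStats
    	{
    	public:
    		PopulationStats() : meanVm_( 0.0 ), varVm_( 0.0 ), numCells_( 0 ) {}

    		// Matches the digestFunc shape from the setup section:
    		//   void ( T::*digestFunc )( const Eref& er, const R* arg )
    		void digestVmStats( const Eref& er, const ReduceStats* arg )
    		{
    			meanVm_ = arg->mean();
    			varVm_ = arg->variance();
    			numCells_ = arg->count();
    		}

    	private:
    		double meanVm_;
    		double varVm_;
    		std::size_t numCells_;
    	};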