From 0d6c0a77880ae033b556a17ff48b8e5f2b8325fa Mon Sep 17 00:00:00 2001
From: Nora Abi Akar <nora.abiakar@gmail.com>
Date: Tue, 9 Mar 2021 15:37:58 +0100
Subject: [PATCH] Docs: Fix file formats (#1418)

* Remove parts of the C++ API which leaked into `doc/fileformat/neuroml.rst`
* Add links to the relevant C++ and Python API
---
 doc/cpp/morphology.rst     | 312 +++++++++++++++++++++++++++----------
 doc/fileformat/index.rst   |  14 --
 doc/fileformat/neuroml.rst | 159 +------------------
 doc/fileformat/swc.rst     |  17 +-
 doc/index.rst              |  11 +-
 doc/python/morphology.rst  |   4 +
 6 files changed, 264 insertions(+), 253 deletions(-)
 delete mode 100644 doc/fileformat/index.rst

diff --git a/doc/cpp/morphology.rst b/doc/cpp/morphology.rst
index cc148054..30fe87e9 100644
--- a/doc/cpp/morphology.rst
+++ b/doc/cpp/morphology.rst
@@ -118,86 +118,6 @@ by two stitches:
    cell.paint("\"soma\"", "hh");
 
 
-Supported morphology formats
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Arbor supports morphologies described using the SWC file format and the NeuroML file format.
-
-SWC
-"""
-
-Arbor supports reading morphologies described using the
-`SWC <http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html>`_ file format. And
-has three different interpretation of that format.
-
-A :cpp:func:`parse_swc()` function is used to parse the SWC file and generate a :cpp:type:`swc_data` object.
-This object contains a vector of :cpp:type:`swc_record` objects that represent the SWC samples, with a number of
-basic checks performed on them. The :cpp:type:`swc_data` object can then be used to generate a
-:cpp:type:`morphology` object using one of the following functions: (See the morphology concepts
-:ref:`page <morph-formats>` for more details).
-
-  * :cpp:func:`load_swc_arbor`
-  * :cpp:func:`load_swc_allen`
-  * :cpp:func:`load_swc_neuron`
-
-.. cpp:class:: swc_record
-
-   .. cpp:member:: int id
-
-      ID of the record
-
-   .. cpp:member:: int tag
-
-       Structure identifier (tag).
-
-   .. cpp:member:: double x
-
-      x coordinate in space.
-
-   .. cpp:member:: double y
-
-      y coordinate in space.
-
-   .. cpp:member:: double z
-
-      z coordinate in space.
-
-   .. cpp:member:: double r
-
-      Sample radius.
-
-   .. cpp:member:: int parent_id
-
-      Record parent's sample ID.
-
-.. cpp:class:: swc_data
-
-   .. cpp:member:: std::string metadata
-
-      Contains the comments of an SWC file.
-
-   .. cpp:member:: std::vector<swc_record> records
-
-      Stored the list of samples from an SWC file, after performing some checks.
-
-.. cpp:function:: swc_data parse_swc(std::istream&)
-
-   Returns an :cpp:type:`swc_data` object given an std::istream object.
-
-.. cpp:function:: morphology load_swc_arbor(const swc_data& data)
-
-   Returns a :cpp:type:`morphology` constructed according to Arbor's SWC specifications.
-
-.. cpp:function:: morphology load_swc_allen(const swc_data& data, bool no_gaps=false)
-
-   Returns a :cpp:type:`morphology` constructed according to the Allen Institute's SWC
-   specifications. By default, gaps in the morphology are allowed, this can be toggled
-   using the ``no_gaps`` argument.
-
-.. cpp:function:: morphology load_swc_neuron(const swc_data& data)
-
-   Returns a :cpp:type:`morphology` constructed according to NEURON's SWC specifications.
-
 .. _locsets-and-regions:
 
 Identifying sites and subsets of the morphology
@@ -413,3 +333,235 @@ given branch will be chosen to be the smallest number that ensures no
 CV will have an extent on the branch longer than ``max_extent`` micrometres.
 
 
+Supported morphology formats
+----------------------------
+
+Arbor supports morphologies described using the SWC file format and the NeuroML file format.
+
+.. _cppswc:
+
+SWC
+^^^
+
+Arbor supports reading morphologies described using the
+`SWC <http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html>`_ file format, and
+has three different interpretations of that format.
+
+A :cpp:func:`parse_swc()` function is used to parse the SWC file and generate a :cpp:type:`swc_data` object.
+This object contains a vector of :cpp:type:`swc_record` objects that represent the SWC samples, with a number of
+basic checks performed on them. The :cpp:type:`swc_data` object can then be used to generate a
+:cpp:type:`morphology` object using one of the following functions (see the morphology concepts
+:ref:`page <morph-formats>` for more details):
+
+  * :cpp:func:`load_swc_arbor`
+  * :cpp:func:`load_swc_allen`
+  * :cpp:func:`load_swc_neuron`
+
+.. cpp:class:: swc_record
+
+   .. cpp:member:: int id
+
+      ID of the record.
+
+   .. cpp:member:: int tag
+
+       Structure identifier (tag).
+
+   .. cpp:member:: double x
+
+      x coordinate in space.
+
+   .. cpp:member:: double y
+
+      y coordinate in space.
+
+   .. cpp:member:: double z
+
+      z coordinate in space.
+
+   .. cpp:member:: double r
+
+      Sample radius.
+
+   .. cpp:member:: int parent_id
+
+      Record parent's sample ID.
+
+.. cpp:class:: swc_data
+
+   .. cpp:member:: std::string metadata
+
+      Contains the comments of an SWC file.
+
+   .. cpp:member:: std::vector<swc_record> records
+
+      Stores the list of samples from an SWC file, after performing some checks.
+
+.. cpp:function:: swc_data parse_swc(std::istream&)
+
+   Returns an :cpp:type:`swc_data` object given an std::istream object.
+
+.. cpp:function:: morphology load_swc_arbor(const swc_data& data)
+
+   Returns a :cpp:type:`morphology` constructed according to Arbor's SWC specifications.
+
+.. cpp:function:: morphology load_swc_allen(const swc_data& data, bool no_gaps=false)
+
+   Returns a :cpp:type:`morphology` constructed according to the Allen Institute's SWC
+   specifications. By default, gaps in the morphology are allowed; this can be toggled
+   using the ``no_gaps`` argument.
+
+.. cpp:function:: morphology load_swc_neuron(const swc_data& data)
+
+   Returns a :cpp:type:`morphology` constructed according to NEURON's SWC specifications.
+
+.. _cppneuroml:
+
+NeuroML
+^^^^^^^
+
+Arbor offers limited support for models described in
+`NeuroML version 2 <https://neuroml.org/neuromlv2>`_.
+This is not built by default, but can be enabled by
+providing the ``-DARB_NEUROML=ON`` argument to CMake at
+configuration time (see :ref:`install-neuroml`). This will
+build the ``arborio`` library with NeuroML support.
+
+The ``arborio`` library uses `libxml2 <http://xmlsoft.org/>`_
+for XML parsing. Applications using NeuroML through ``arborio``
+will need to link against ``libxml2`` in addition, though this
+is performed implicitly within CMake projects that add ``arbor::arborio``
+as a link library.
+
+All classes and functions provided by the ``arborio`` library
+are provided in the ``arborio`` namespace.
+
+Libxml2 interface
+=================
+
+Libxml2 offers threadsafe XML parsing, but not by default. If
+the application uses NeuroML support from ``arborio`` in an
+unthreaded context, or has already explicitly initialized ``libxml2``,
+nothing more needs to be done. Otherwise, the ``libxml2`` function
+``xmlInitParser()`` must be called explicitly.
+
+``arborio`` provides a helper guard object for this purpose, defined
+in ``arborio/with_xml.hpp``:
+
+.. cpp:namespace:: arborio
+
+.. cpp:class:: with_xml
+
+   An RAII guard object that calls ``xmlInitParser()`` upon construction, and
+   ``xmlCleanupParser()`` upon destruction. The constructor takes no parameters.
+
+NeuroML2 morphology support
+===========================
+
+NeuroML documents are represented by the ``arborio::neuroml`` class,
+which in turn provides methods for the identification and translation
+of morphology data. ``neuroml`` objects are moveable and move-assignable, but not copyable.
+
+An implementation limitation restricts valid segment id values to
+those which can be represented by an ``unsigned long long`` value.
+
+.. cpp:class:: neuroml
+
+   .. cpp:function:: neuroml(std::string)
+
+   Build a NeuroML document representation from the supplied string.
+
+   .. cpp:function:: std::vector<std::string> cell_ids() const
+
+   Return the id of each ``<cell>`` element defined in the NeuroML document.
+
+   .. cpp:function:: std::vector<std::string> morphology_ids() const
+
+   Return the id of each top-level ``<morphology>`` element defined in the NeuroML document.
+
+   .. cpp:function:: std::optional<morphology_data> morphology(const std::string&) const
+
+   Return a representation of the top-level morphology with the supplied identifier, or
+   ``std::nullopt`` if no such morphology could be found. Parse errors or an inconsistent
+   representation will raise an exception derived from ``neuroml_exception``.
+
+   .. cpp:function:: std::optional<morphology_data> cell_morphology(const std::string&) const
+
+   Return a representation of the morphology associated with the cell with the supplied identifier,
+   or ``std::nullopt`` if the cell or its morphology could not be found. Parse errors or an
+   inconsistent representation will raise an exception derived from ``neuroml_exception``.
+
+The morphology representation contains the corresponding Arbor ``arb::morphology`` object,
+label dictionaries for regions corresponding to its segments and segment groups by name
+and id, and a map providing the explicit list of segments contained within each defined
+segment group.
+
+.. cpp:class:: morphology_data
+
+   .. cpp:member:: std::optional<std::string> cell_id
+
+   The id attribute of the cell that was used to find the morphology in the NeuroML document, if any.
+
+   .. cpp:member:: std::string id
+
+   The id attribute of the morphology.
+
+   .. cpp:member:: arb::morphology morphology
+
+   The corresponding Arbor morphology.
+
+   .. cpp:member:: arb::label_dict segments
+
+   A label dictionary with a region entry for each segment, keyed by the segment id (as a string).
+
+   .. cpp:member:: arb::label_dict named_segments
+
+   A label dictionary with a region entry for each name attribute given to one or more segments.
+   The region corresponds to the union of all segments sharing the same name attribute.
+
+   .. cpp:member:: arb::label_dict groups
+
+   A label dictionary with a region entry for each defined segment group
+
+   .. cpp:member:: std::unordered_map<std::string, std::vector<unsigned long long>> group_segments
+
+   A map from each segment group id to its corresponding collection of segments.
+
+
+Exceptions
+==========
+
+All NeuroML-specific exceptions are defined in ``arborio/arbornml.hpp``, and are
+derived from ``arborio::neuroml_exception`` which in turn is derived from ``std::runtime_error``.
+With the exception of the ``no_document`` exception, all contain an unsigned member ``line``
+which is intended to identify the problematic construct within the document.
+
+.. cpp:class:: xml_error: neuroml_exception
+
+   A generic XML error generated by the ``libxml2`` library.
+
+.. cpp:class:: no_document: neuroml_exception
+
+   A request was made on a :cpp:class:`neuroml` document without any content.
+
+.. cpp:class:: parse_error: neuroml_exception
+
+   Failure parsing an element or attribute in the NeuroML document. These
+   can be generated if the document does not conform to the NeuroML2 schema,
+   for example.
+
+.. cpp:class:: bad_segment: neuroml_exception
+
+   A ``<segment>`` element has an improper ``id`` attribute, refers to a non-existent
+   parent, is missing a required parent or proximal element, or otherwise is missing
+   a mandatory child element or has a malformed child element.
+
+.. cpp:class:: bad_segment_group: neuroml_exception
+
+   A ``<segmentGroup>`` element has a malformed child element or references
+   a non-existent segment group or segment.
+
+.. cpp:class:: cyclic_dependency: neuroml_exception
+
+   A segment or segment group ultimately refers to itself via ``parent``
+   or ``include`` elements respectively.
\ No newline at end of file
diff --git a/doc/fileformat/index.rst b/doc/fileformat/index.rst
deleted file mode 100644
index f92b352b..00000000
--- a/doc/fileformat/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-.. _format-overview:
-
-File formats
-============
-
-Arbor supports the following file formats.
-
-.. toctree::
-   :maxdepth: 1
-
-   swc
-   nmodl
-   neuroml
-
diff --git a/doc/fileformat/neuroml.rst b/doc/fileformat/neuroml.rst
index f027f978..9c3347af 100644
--- a/doc/fileformat/neuroml.rst
+++ b/doc/fileformat/neuroml.rst
@@ -1,46 +1,7 @@
 .. _formatneuroml:
 
-NeuroML support
-===============
-
-Arbor offers limited support for models described in
-`NeuroML version 2 <https://neuroml.org/neuromlv2>`_.
-This is not built by default, but can be enabled by
-providing the `-DARB_NEUROML=ON` argument to CMake at
-configuration time (see :ref:`install-neuroml`). This will
-build the ``arborio`` libray with neuroml support.
-
-The ``arborio`` library uses `libxml2 <http://xmlsoft.org/>`_
-for XML parsing. Applications using NeuroML through ``arborio``
-will need to link against ``libxml2`` in addition, though this
-is performed implicitly within CMake projects that add ``arbor::arborio``
-as a link library.
-
-All classes and functions provided by the ``arborio`` library
-are provided in the ``arborio`` namespace.
-
-Libxml2 interface
------------------
-
-Libxml2 offers threadsafe XML parsing, but not by default. If
-the application uses NeuromML support from ``arborio`` in an
-unthreaded context, or has already explicitly initialized ``libxml2``,
-nothing more needs to be done. Otherwise, the ``libxml2`` function
-``xmlInitParser()`` must be called explicitly.
-
-``arborio`` provides a helper guard object for this purpose, defined
-in ``arborio/with_xml.hpp``:
-
-.. cpp:namespace:: arborio
-
-.. cpp:class:: with_xml
-
-   An RAII guard object that calls ``xmlInitParser()`` upon construction, and
-   ``xmlCleanupParser()`` upon destruction. The constructor takes no parameters.
-
-
-NeuroML 2 morphology support
-----------------------------
+NeuroML2
+--------
 
 Arbor offers limited support for models described in `NeuroML version 2 <https://neuroml.org/neuromlv2>`_.
 This is not built by default (see :ref:`NeuroML support <install-neuroml>` for instructions on how
@@ -51,6 +12,9 @@ and present the encoded data to the user.  This is more than a simple a `segment
 
 NeuroML can encode in the same file multiple top-level morphologies, as well as cells:
 
+Example
+^^^^^^^
+
 .. code:: XML
 
    <neuroml xmlns="http://www.neuroml.org/schema/neuroml2">
@@ -78,115 +42,8 @@ The morphological data includes the actual morphology as well as the named segme
 For example, the above ``m1`` morphology has one named segment ``seg-0`` and one named group ``group-0`` that are
 both represented using Arbor's :ref:`region expressions <labels-expressions>`.
 
-C++
+API
 ^^^
 
-NeuroML documents are represented by the ``arborio::neuroml`` class,
-which in turn provides methods for the identification and translation
-of morphology data. ``neuroml`` objects are moveable and move-assignable, but not copyable.
-
-An implementation limitation restricts valid segment id values to
-those which can be represented by an ``unsigned long long`` value.
-
-.. cpp:class:: neuroml
-
-   .. cpp:function:: neuroml(std::string)
-
-   Build a NeuroML document representation from the supplied string.
-
-   .. cpp:function:: std::vector<std::string> cell_ids() const
-
-   Return the id of each ``<cell>`` element defined in the NeuroML document.
-
-   .. cpp:function:: std::vector<std::string> morphology_ids() const
-
-   Return the id of each top-level ``<morphology>`` element defined in the NeuroML document.
-
-   .. cpp:function:: std::optional<morphology_data> morphology(const std::string&) const
-
-   Return a representation of the top-level morphology with the supplied identifier, or
-   ``std::nullopt`` if no such morphology could be found. Parse errors or an inconsistent
-   representation will raise an exception derived from ``neuroml_exception``.
-
-   .. cpp:function:: std::optional<morphology_data> cell_morphology(const std::string&) const
-
-   Return a representation of the morphology associated with the cell with the supplied identifier,
-   or ``std::nullopt`` if the cell or its morphology could not be found. Parse errors or an
-   inconsistent representation will raise an exception derived from ``neuroml_exception``.
-
-The morphology representation contains the corresponding Arbor ``arb::morphology`` object,
-label dictionaries for regions corresponding to its segments and segment groups by name
-and id, and a map providing the explicit list of segments contained within each defined
-segment group.
-
-.. cpp:class:: morphology_data
-
-   .. cpp:member:: std::optional<std::string> cell_id
-
-   The id attribute of the cell that was used to find the morphology in the NeuroML document, if any.
-
-   .. cpp:member:: std::string id
-
-   The id attribute of the morphology.
-
-   .. cpp:member:: arb::morphology morphology
-
-   The corresponding Arbor morphology.
-
-   .. cpp:member:: arb::label_dict segments
-
-   A label dictionary with a region entry for each segment, keyed by the segment id (as a string).
-
-   .. cpp:member:: arb::label_dict named_segments
-
-   A label dictionary with a region entry for each name attribute given to one or more segments.
-   The region corresponds to the union of all segments sharing the same name attribute.
-
-   .. cpp:member:: arb::label_dict groups
-
-   A label dictionary with a region entry for each defined segment group
-
-   .. cpp:member:: std::unordered_map<std::string, std::vector<unsigned long long>> group_segments
-
-   A map from each segment group id to its corresponding collection of segments.
-
-
-Exceptions
-----------
-
-All NeuroML-specific exceptions are defined in ``arborio/arbornml.hpp``, and are
-derived from ``arborio::neuroml_exception`` which in turn is derived from ``std::runtime_error``.
-With the exception of the ``no_document`` exception, all contain an unsigned member ``line``
-which is intended to identify the problematic construct within the document.
-
-.. cpp:class:: xml_error: neuroml_exception
-
-   A generic XML error generated by the ``libxml2`` library.
-
-.. cpp:class:: no_document: neuroml_exception
-
-   A request was made on an :cpp:class:`neuroml` document without any content.
-
-.. cpp:class:: parse_error: neuroml_exception
-
-   Failure parsing an element or attribute in the NeuroML document. These
-   can be generated if the document does not confirm to the NeuroML2 schema,
-   for example.
-
-.. cpp:class:: bad_segment: neuroml_exception
-
-   A ``<segment>`` element has an improper ``id`` attribue, refers to a non-existent
-   parent, is missing a required parent or proximal element, or otherwise is missing
-   a mandatory child element or has a malformed child element.
-
-.. cpp:class:: bad_segment_group: neuroml_exception
-
-   A ``<segmentGroup>`` element has a malformed child element or references
-   a non-existent segment group or segment.
-
-.. cpp:class:: cyclic_dependency: neuroml_exception
-
-   A segment or segment group ultimately refers to itself via ``parent``
-   or ``include`` elements respectively.
-
-
+* :ref:`Python <pyneuroml>`
+* :ref:`C++ <cppneuroml>`
\ No newline at end of file
diff --git a/doc/fileformat/swc.rst b/doc/fileformat/swc.rst
index 09d3affa..b4d7048b 100644
--- a/doc/fileformat/swc.rst
+++ b/doc/fileformat/swc.rst
@@ -30,8 +30,8 @@ its parent and inherits the tag of the sample; and if more than 1 sample have th
 is interpreted as a fork point in the morphology, and acts as the proximal point to a new branch for each of its
 "child" samples. There a couple of exceptions to these rules which are listed below.
 
-Arbor interpretation:
-"""""""""""""""""""""
+Arbor interpretation
+""""""""""""""""""""
 In addition to the previously listed checks, the arbor interpretation explicitly disallows SWC files where the soma is
 described by a single sample. It constructs the soma from 2 or more samples, forming 1 or more segments. A *segment* is
 always constructed between a sample and its parent. This means that there are no gaps in the resulting morphology.
@@ -55,8 +55,8 @@ like this:
    :align: center
 
 
-Allen interpretation:
-"""""""""""""""""""""
+Allen interpretation
+""""""""""""""""""""
 In addition to the previously mentioned checks, the Allen interpretation expects a single-sample soma to be the first
 sample of the file and to be interpreted as a spherical soma. Arbor represents the spherical soma as a cylinder with
 length and diameter equal to the diameter of the sample representing the sphere.
@@ -71,8 +71,8 @@ or to the proximal end of the soma if they are axons or apical dendrites. Only a
 Finally the Allen institute interpretation of SWC files centres the morphology around the soma at the origin (0, 0, 0)
 and all samples are translated in space towards the origin.
 
-NEURON interpretation:
-""""""""""""""""""""""
+NEURON interpretation
+"""""""""""""""""""""
 The NEURON interpretation was obtained by experimenting with the ``Import3d_SWC_read`` function. We came up with the
 following set of rules that govern NEURON's SWC behavior and enforced them in arbor's NEURON-complaint SWC
 interpreter:
@@ -91,3 +91,8 @@ interpreter:
 * To create a segment with a certain tag, that is to be attached to the soma, we need at least 2 samples with that
   tag.
 
+API
+"""
+
+* :ref:`Python <pyswc>`
+* :ref:`C++ <cppswc>`
\ No newline at end of file
diff --git a/doc/index.rst b/doc/index.rst
index 7a55e36a..e61d4f3a 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -27,7 +27,7 @@ Documentation organisation
 
 * :ref:`tutorial` contains a few ready-made examples you can use to quickly get started using Arbor. In the tutorial descriptions we link to the relevant Arbor concepts.
 * :ref:`modelintro` describes the design and concepts used in Arbor. The breakdown of concepts is mirrored (as much as possible) in the :ref:`pyoverview` and :ref:`cppoverview`, so you can easily switch between languages and concepts.
-* The API section details our :ref:`pyoverview` and :ref:`cppoverview` API, as well as :ref:`supported file formats <format-overview>`. :ref:`internals-overview` describes Arbor code that is not user-facing; convenience classes, architecture abstractions, etc.
+* The API section details our :ref:`pyoverview` and :ref:`cppoverview` API. :ref:`internals-overview` describes Arbor code that is not user-facing; convenience classes, architecture abstractions, etc.
 * Contributions to Arbor are very welcome! Under :ref:`contribindex` describe conventions and procedures for all kinds of contributions.
 
 Citing Arbor
@@ -101,13 +101,20 @@ Arbor is an `eBrains project <https://ebrains.eu/service/arbor/>`_.
    concepts/spike_source_cell
    concepts/benchmark_cell
 
+.. toctree::
+   :caption: File formats:
+   :maxdepth: 1
+
+   fileformat/swc
+   fileformat/neuroml
+   fileformat/nmodl
+
 .. toctree::
    :caption: API reference:
    :maxdepth: 1
 
    python/index
    cpp/index
-   fileformat/index
    internals/index
 
 .. toctree::
diff --git a/doc/python/morphology.rst b/doc/python/morphology.rst
index fd18f618..8f58623d 100644
--- a/doc/python/morphology.rst
+++ b/doc/python/morphology.rst
@@ -315,6 +315,8 @@ Cable cell morphology
             :param int i: branch index
             :rtype: list
 
+.. _pyswc:
+
 .. py:function:: load_swc_arbor(filename)
 
     Loads the :class:`morphology` from an SWC file according to arbor's SWC specifications.
@@ -557,6 +559,8 @@ constitute part of the CV boundary point set.
     :param float max_etent: The maximum length for generated CVs.
     :param str domain: The region on which the policy is applied.
 
+.. _pyneuroml:
+
 .. py:class:: neuroml_morph_data
 
     A :class:`neuroml_morphology_data` object contains a representation of a morphology defined in
-- 
GitLab