Thanks to your awesome help with #12 (closed), I've been comparing some of our simulation runs with the 33fb5955558ba8bb15a3fdce49dfd914682ef3ea dataset and am having some issues, especially with the populations with low firing rates. To try and narrow down where this is coming from, I tried re-using the analysis code from the repository on the FEF spike data from the 33fb5955558ba8bb15a3fdce49dfd914682ef3ea dataset as follows:
```python
from correlation_toolbox import helper as ch
import numpy as np

# **NOTE** this is heavily based off the analysis code from the paper
def load(filename, duration_s):
    tmin = 500.
    subsample = 2000
    resolution = 1.
    spikes = np.load(filename)
    ids = np.unique(spikes[:, 0])
    dat = ch.sort_gdf_by_id(spikes, idmin=ids[0], idmax=ids[0] + subsample + 1000)
    bins, hist = ch.instantaneous_spike_count(dat[1], resolution, tmin=tmin,
                                              tmax=duration_s * 1000.0)
    rates = ch.strip_binned_spiketrains(hist)[:subsample]
    cc = np.corrcoef(rates)
    cc = np.extract(1 - np.eye(cc[0].size), cc)
    cc[np.where(np.isnan(cc))] = 0.
    return np.mean(cc)

# Values from the json file on gnode
dataset_fef_6e_corr = 0.0004061593782540619
dataset_fef_5i_corr = 0.0020706629333541817

duration_s = 10.5

# Population sizes
num_fef_5i = 3721
num_fef_6e = 16128

# Load data
nest_fef_5i_corr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-5I.npy", duration_s)
nest_fef_6e_corr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy", duration_s)

print("FEF 5I corr coeff - NEST:%f, Dataset:%f" % (nest_fef_5i_corr, dataset_fef_5i_corr))
print("FEF 6E corr coeff - NEST:%f, Dataset:%f" % (nest_fef_6e_corr, dataset_fef_6e_corr))
```
The output shows fairly significant differences from the values in the json file in the same directory:
Could it be the condition that neurons must have spiked at least once where this goes wrong, so that you are still including some silent neurons in the calculation?
Well spotted! rates ends up with fewer than 2000 neurons that spike after calling strip_binned_spiketrains. However, I don't quite understand how this ever worked, as the paper says that the ground-state simulations were run for 10.5 seconds, and the 10.5-second spike trains in the repository don't have enough data.
No, I don't think so. The paper states that for chi=1, the simulation duration was 10.5 s. I think the calculation started with 2000 neurons, from which the ones that spiked at least once were taken, leaving fewer than 2000 neurons in some cases.
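For reference, the surviving count can be checked directly; a minimal diagnostic sketch, reusing the helper calls and the FEF 6E file name from the snippet above:

```python
from correlation_toolbox import helper as ch
import numpy as np

subsample = 2000
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy")
ids = np.unique(spikes[:, 0])
dat = ch.sort_gdf_by_id(spikes, idmin=ids[0], idmax=ids[0] + subsample + 1000)
bins, hist = ch.instantaneous_spike_count(dat[1], 1., tmin=500., tmax=10.5 * 1000.)
# strip_binned_spiketrains drops the silent neurons, so comparing the lengths
# shows how many of the candidate neurons actually fired in the analysis window
print("spiking neurons:", len(ch.strip_binned_spiketrains(hist)), "of", len(hist))
```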
Ahh, OK. So how come our recalculation of the stats using the same code and (presumably) the same data doesn't match? I don't see any non-determinism in that code.
I think that @albada is right in saying that for some populations, there were probably fewer than 2000 neurons entering the calculation, simply because these populations are small in the first place and then have low rates.
For what it's worth, I wouldn't call these deviations significant. Keep in mind that these values are very low, 10^-3 - 10^-4, on a scale of 0 to 1. If your simulations do not produce exactly the same spikes, these deviations can easily occur, but they're not significant, in my opinion.
I think, unless you're using the exact same configuration for your compute system (MPI processes, threads) and the same NEST version (+ some other dependencies that influence the random numbers), it's unlikely that you can produce the same spikes.
Hey @mschmidt87 - thanks for looking at this. My concern is that, as a test, we're calculating these metrics from the published spiking data using the published code, and we don't get the published correlation coefficients.
I can reproduce your finding for FEF, 6E. I am getting the same value as you for the cross-correlation coefficient. I can also see a deviation for the LvR value (0.2683 from recalculating vs. 0.4178 in the json file). However, I can reproduce the population rates from the json files, which makes me conclude that the spike data is the correct one and I didn't use other data for the calculation of the json files.
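For what it's worth, a minimal rate cross-check could look like the following; this is a hypothetical sketch assuming the same file naming, the 500 ms warm-up, and the FEF 6E population size used in the snippets above (the published analysis may compute the rate differently):

```python
import numpy as np

# Mean population rate in spikes/s: spike count after the warm-up,
# divided by population size and analysis duration
spikes = np.load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy")
num_fef_6e = 16128
tmin, duration_s = 500., 10.5
n_spikes = np.sum(spikes[:, 1] >= tmin)
rate = n_spikes / (num_fef_6e * (duration_s - tmin / 1000.))
print("FEF 6E mean rate: %.4f spikes/s" % rate)
```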
I couldn't find the problem in the analysis code (I tried to change the part where we subsample etc., but no success). Since I produced the json files with the code that is in the repo and the file hasn't been modified, I suspect that this might be a version problem with the dependencies.
I am now using Python 3.8 and numpy 1.18.1 as well as the latest master of correlation_toolbox.
Unfortunately, I didn't record the exact dependencies at the time I produced the data in the data repo, so I can't investigate this in an easy manner.
Glad to hear you can reproduce it. Wouldn't it be possible for the spike data to have the same mean rates but different correlation and irregularity (in the extreme case you could generate spike trains from populations of Poisson sources with the same mean rates; see the toy sketch below)?
Nonetheless, I can try and investigate older versions today. correlation_toolbox doesn't look to have changed a huge amount, at least. The data was pushed on 8/1/2018 so, assuming you created it around then, I can try to bisect numpy versions.
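As a toy illustration of the extreme case mentioned above (identical mean rates but very different pairwise correlations), here is a purely illustrative sketch that is not part of the repository code:

```python
import numpy as np

# Two populations with the same mean rate but different pairwise correlations
rng = np.random.default_rng(1)
n_neurons, n_bins, p = 200, 10000, 0.005  # 1 ms bins, ~5 spikes/s

# Independent Poisson-like neurons: same rate, pairwise correlation ~ 0
indep = rng.random((n_neurons, n_bins)) < p

# Correlated population: in each bin a neuron either copies a shared "mother"
# train or draws independently, so the per-bin spike probability stays at p
mother = rng.random(n_bins) < p
copy_mask = rng.random((n_neurons, n_bins)) < 0.5
corr = np.where(copy_mask, mother, rng.random((n_neurons, n_bins)) < p)

for name, trains in (("independent", indep), ("correlated", corr)):
    cc = np.corrcoef(trains.astype(float))
    off_diag = np.extract(1 - np.eye(n_neurons), cc)
    print(name,
          "rate [spikes/s]: %.2f" % (trains.mean() * 1000.),
          "mean cc: %.4f" % np.nanmean(off_diag))
```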
Thanks for your tests. Of course, it is possible to produce the exact same rates with different second-order statistics, but achieving that with two different runs of the same simulation (with different RNG seeds), which is what I would have suspected, is extremely unlikely, i.e. it can be excluded.
I've done a little bit of code archaeology and found a change in the LvR calculation. If I calculate the LvR with and without this change:
```python
from correlation_toolbox import helper as ch
import numpy as np

# **NOTE** this is heavily based off the analysis code from the paper
def load(filename, duration_s, num, check):
    tmin = 500.
    subsample = 2000
    resolution = 1.
    tref = 2.0
    spikes = np.load(filename)

    # calc lvr
    i_min = np.searchsorted(spikes[:, 1], tmin)
    i_max = np.searchsorted(spikes[:, 1], duration_s * 1000.0)
    LvR = np.array([])
    data_array = spikes[i_min:i_max]
    for i in np.unique(data_array[:, 0]):
        intervals = np.diff(data_array[np.where(data_array[:, 0] == i)[0], 1])
        if intervals.size > 1:
            val = np.sum((1. - 4 * intervals[0:-1] * intervals[1:] /
                          (intervals[0:-1] + intervals[1:]) ** 2) *
                         (1 + 4 * tref / (intervals[0:-1] + intervals[1:])))
            LvR = np.append(LvR, val * 3 / (intervals.size - 1.))
        else:
            LvR = np.append(LvR, 0.0)

    # CHANGE HERE
    if check and len(LvR) < num:
        LvR = np.append(LvR, np.zeros(num - len(LvR)))
    return np.mean(LvR)

# Values from the json file on gnode
dataset_fef_6e_lvr = 0.4178813296671444
dataset_fef_5i_lvr = 0.9737456740769203

duration_s = 10.5

# Population sizes
num_fef_5i = 3721
num_fef_6e = 16128

# Load data
nest_fef_5i_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-5I.npy", duration_s, num_fef_5i, False)
nest_fef_6e_lvr = load("33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy", duration_s, num_fef_6e, False)

print("FEF 5I LvR - NEST:%f, Dataset:%f" % (nest_fef_5i_lvr, dataset_fef_5i_lvr))
print("FEF 6E LvR - NEST:%f, Dataset:%f" % (nest_fef_6e_lvr, dataset_fef_6e_lvr))
```
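For context, my reading of the loop above: per neuron it computes the revised local variation LvR (reference [82] of the paper, quoted further down) over the neuron's interspike intervals $I_i$ (with $n$ intervals in total) and a refractoriness constant $R =$ tref = 2 ms:

$$
\mathrm{LvR} = \frac{3}{n-1} \sum_{i=1}^{n-1} \left(1 - \frac{4\, I_i I_{i+1}}{(I_i + I_{i+1})^2}\right) \left(1 + \frac{4R}{I_i + I_{i+1}}\right)
$$

The `# CHANGE HERE` block then, if check is True, pads the per-neuron LvR values with zeros up to num before averaging.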
Running this with check=False gives values significantly closer to those in the published JSON. As this change was made after the original submission date, might the published LvR data have been calculated prior to this change?
I think you are probably right that this padding of the results with 0.0 values for silent neurons had not been used for the publication and the published data.
The manuscript says in the methods:
" Spike-train irregularity is quantified for each population by the revised local variation LvR [82] averaged over a subsample of 2000 neurons. The cross-correlation coefficient is
computed with bin width 1 ms on single-cell spike histograms of a subsample of 2000 neurons
per population with at least one emitted spike per neuron. Both measures are computed on the
entire population if it contains fewer than 2000 neurons."
This does not mention the padding, and your calculations indicate that at least the new results are closer to the published ones with check=False (in your code). Unfortunately, I can't recall why I added the padding at some point.
I guess it's a question of definition how silent neurons should be accounted for in the calculations, but I would now say that assigning them a value of 0 does not make much sense.
I think that @akorgor applied your code to more populations and can confirm your findings.
Thanks for looking into this some more @mschmidt87! It's still weird that the value for FEF 5I fits exactly but the one for FEF 6E doesn't, though - again, that seems unlikely to be caused by two different runs of the same simulation.
I found that if, in the code snippet above on the LvR calculation, check is chosen to be True and num is chosen to be len(np.unique(spikes[:, 0])), the results from the code and from gnode match up to a difference of ~1e-15 to 1e-16. Thus, for computing the average LvR value, only the neurons that spiked at least once during the simulation contribute a 0, not the completely silent neurons as the definition of num (the full population size) in the code snippet indicates. I could verify this for the areas FEF, V1, and V2 for the populations in all layers in the chi=1.0, 10.5 s simulation.
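In other words, something along these lines reproduces the gnode value (a hypothetical check reusing load() and the imports from the LvR snippet above):

```python
# Pad with zeros only up to the number of neurons that spiked at least once
# during the whole simulation, not up to the full population size
fname = "33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy"
num_spiking = len(np.unique(np.load(fname)[:, 0]))
nest_fef_6e_lvr = load(fname, 10.5, num_spiking, True)
print("recalculated: %.16f, gnode: %.16f" % (nest_fef_6e_lvr, 0.4178813296671444))
```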
For the correlation coefficient calculation, however, I have not been able to find the reason for the deviations so far. The gnode values are larger than the recalculated ones for the majority of the populations (15/24). The differences range from -0.00025 to +0.00025 and do not seem to depend on the total number of neurons in the population or on the number of spiking neurons. Except for FEF 6E, there are always more than subsample=2000 neurons left after ch.strip_binned_spiketrains(hist).
Since I found that this calculation of the correlation coefficients is in general sensitive to subsampling in terms of neurons and time, I tried setting tmin=0, tmax=max(np.unique(spikes[:, 1])), and subsample=3000, but none of these three variations yielded the gnode values. @albada @mschmidt87, I would be happy to run further tests if you have ideas where the deviations could come from.
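For anyone who wants to repeat these sensitivity tests, a parameterized variant of the correlation snippet from the top of the thread might look like this (a sketch under my own naming; corrcoef_sensitivity and the chosen variations are not from the repository):

```python
from correlation_toolbox import helper as ch
import numpy as np

def corrcoef_sensitivity(filename, tmin=500., tmax=10.5 * 1000.,
                         subsample=2000, resolution=1.):
    # Same steps as the original snippet, with the analysis window and
    # subsample size exposed as arguments
    spikes = np.load(filename)
    ids = np.unique(spikes[:, 0])
    dat = ch.sort_gdf_by_id(spikes, idmin=ids[0], idmax=ids[0] + subsample + 1000)
    bins, hist = ch.instantaneous_spike_count(dat[1], resolution, tmin=tmin, tmax=tmax)
    rates = ch.strip_binned_spiketrains(hist)[:subsample]
    cc = np.corrcoef(rates)
    cc = np.extract(1 - np.eye(cc[0].size), cc)
    cc[np.where(np.isnan(cc))] = 0.
    return np.mean(cc)

fname = "33fb5955558ba8bb15a3fdce49dfd914682ef3ea-spikes-FEF-6E.npy"
spikes = np.load(fname)
for kwargs in ({}, {"tmin": 0.}, {"tmax": np.max(spikes[:, 1])}, {"subsample": 3000}):
    print(kwargs, corrcoef_sensitivity(fname, **kwargs))
```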
Thank you @akorgor for that detective work (and obviously sorry for the misleading code and the confusion it created). I think that these choices are definitely debatable. However, since the deviations from the data shown in the paper are not too large, I would suggest simply editing the code in the repo such that the results match, and sticking to the algorithm used for the paper.