
MemoryError in unfolding for many bins #97

Open
fzeiser opened this issue Sep 4, 2018 · 2 comments
fzeiser (Contributor) commented Sep 4, 2018

I tried running the tutorial (Basic API Tutorial) with more bins. Above ~200 bins, I receive a MemoryError when I run iterative_unfold.

I tried another implementation of the iterative Bayesian unfolding algorithm, in RooUnfold, which easily handles more than 1000 bins -- of course taking some time. A difference there is that the systematic errors are not computed.

Now I see that creating a covariance matrix of type CovPP = np.zeros((cbins * ebins, cbins * ebins)) takes a lot of memory. Still, in multidimensional problems one quickly exceeds 100 bins. So I wonder if one could turn off the systematic error calculation with a keyword.
A more advanced version would be if one could (optionally) handle the covariance matrix through submatrices, e.g. Fox's algorithm, or use numpy.memmap to memory-map a file on disk. This would slow down the calculations, but slow may be better than impossible.
Alternatively, I have to look for a nice cluster with some hugemem compute nodes.
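For scale, a rough footprint estimate, plus a minimal sketch of the memmap idea (the file name and the tiny demo shape are mine, just for illustration):

```python
import numpy as np

# Footprint of the full covariance matrix at ~200 cause/effect bins (float64):
cbins = ebins = 200
n = cbins * ebins                               # 40,000 effective bins
print(f"full CovPP: {n * n * 8 / 1e9:.1f} GB")  # ~12.8 GB

# The memmap idea, with a deliberately tiny shape; a real run would use
# shape=(n, n) and trade RAM for disk I/O.
covpp = np.memmap('covpp.dat', dtype='float64', mode='w+', shape=(100, 100))
covpp[:] = 0.0    # behaves like an ordinary ndarray
covpp.flush()     # write changes back to the file on disk
```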

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-181-14949ba7442b> in <module>()
      8                                     efficiencies=efficiencies,
      9                                     efficiencies_err=efficiencies_err,
---> 10                                     callbacks=[Logger()])

/usr/local/lib/python3.5/dist-packages/pyunfold/unfold.py in iterative_unfold(data, data_err, response, response_err, efficiencies, efficiencies_err, prior, ts, ts_stopping, max_iter, cov_type, return_iterations, callbacks)
    160                               ts_func=ts_func,
    161                               max_iter=max_iter,
--> 162                               callbacks=callbacks)
    163 
    164     if return_iterations:

/usr/local/lib/python3.5/dist-packages/pyunfold/unfold.py in _unfold(prior, mixer, ts_func, max_iter, callbacks)
    205         status = {'unfolded': unfolded_n_c,
    206                   'stat_err': mixer.get_stat_err(),
--> 207                   'sys_err': mixer.get_MC_err(),
    208                   'num_iterations': iteration}
    209 

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in get_MC_err(self)
     57         """MC (Systematic) Errors
     58         """
---> 59         cvm = self.cov.getVc1()
     60         err = np.sqrt(cvm.diagonal())
     61         return err

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in getVc1(self)
    245         """
    246         # Get NObs covariance
--> 247         CovPP = self.getVcPP()
    248         # Get derivative
    249         dcdP = self.dcdP

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in getVcPP(self)
    217         ebins = self.ebins
    218 
--> 219         CovPP = np.zeros((cbins * ebins, cbins * ebins))
    220 
    221         # Poisson covariance matrix

MemoryError: 
jrbourbeau (Owner) commented

Thanks for reporting this MemoryError issue and potential fixes @fzeiser!

A more advanced version would be if one could (optionally) handle the covariance matrix through submatrices, e.g. Fox's algorithm, or use numpy.memmap to memory-map a file on disk

Do you have an interest in implementing this?

Perhaps @zhampel has some thoughts on this topic as well.

fzeiser (Contributor) commented Sep 12, 2018

Unfortunately I doubt that I have the skills to do that. But I had two questions/comments/ideas:

First, on the implementation in RooUnfold:

I tried to understand the implementation in RooUnfold, from which you derived the corrections to the error propagation, if I'm not mistaken. Did any of you have a closer look at the implementation? I think I was wrong the first time, when I stated that the systematic error due to the MC truths is not computed. I think it's not computed by default, but one can choose to do so.

I tried it out (with RooUnfold) by setting unfold.IncludeSystematics(1); from a brief look it's difficult to say whether it is "correct", but it then claims to (also) have calculated the covariance due to the unfolding matrix.

Now, if I understand it correctly, doesn't that implementation "only" need a matrix of shape (cbins, cbins * ebins)? Could that be? This would already reduce the size considerably.

 if (_dosys)    _dnCidPjk.ResizeTo(_nc,_ne*_nc);

https://github.com/hep-mirrors/RooUnfold/blob/fa7c56cb1a55da75c7879119cbb93b2862e9f480/src/RooUnfoldBayes.cxx#L179


 if (_dosys) {
   if (verbose()>=1) cout << "Calculating covariance due to unfolding matrix..." << endl;


   const TMatrixD& Eres= _res->Eresponse();
   TVectorD Vjk(_ne*_nc);           // vec(Var(j,k))
   for (Int_t j = 0 ; j < _ne ; j++) {
     Int_t j0= j*_nc;
     for (Int_t i = 0 ; i < _nc ; i++) {
       Double_t e= Eres(j,i);
       Vjk[j0+i]= e*e;
     }
   }


   if (_dosys!=2) {
     TMatrixD covres(_nc,_nc);
     ABAT (_dnCidPjk, Vjk, covres);
     _cov += covres;
   } else {
     _cov.ResizeTo (_nc, _nc);
     ABAT (_dnCidPjk, Vjk, _cov);
   }
 }

https://github.com/hep-mirrors/RooUnfold/blob/fa7c56cb1a55da75c7879119cbb93b2862e9f480/src/RooUnfoldBayes.cxx#L360-L382
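If the reduced (_nc, _ne*_nc) derivative matrix really is sufficient, the ABAT call above is just A · diag(v) · Aᵀ. A numpy sketch with tiny stand-in shapes (all names and sizes here are mine, not RooUnfold's):

```python
import numpy as np

def abat(A, v):
    # C = A @ diag(v) @ A.T, without materialising the big diagonal matrix
    return (A * v) @ A.T

nc, ne = 3, 4                                  # stand-ins for _nc and _ne
rng = np.random.default_rng(0)
dnCidPjk = rng.standard_normal((nc, ne * nc))  # reduced-shape derivative matrix
Vjk = rng.random(ne * nc)                      # vec of squared response errors
covres = abat(dnCidPjk, Vjk)                   # (nc, nc) covariance contribution

# Storage at 200 bins: the reduced matrix is smaller by a factor of ebins.
cbins = ebins = 200
full_bytes = (cbins * ebins) ** 2 * 8          # full CovPP: ~12.8 GB
reduced_bytes = cbins * (cbins * ebins) * 8    # (_nc, _ne*_nc) shape: 64 MB
```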

Computation time:

Now, when calculating the systematic errors with RooUnfold for a larger matrix (#effect bins = #cause bins = 700), this was still possible (so, no memory error), but it takes a considerable amount of CPU time: about 1-2 hours on this PC for 2 iterations. As I want to use the unfolding on a problem where I expect to unfold between 20 and 10,000 different spectra, I think it would be useful to be able to turn off the systematic error calculation, at least for a rough data analysis, with the possibility to get back to a more precise answer later.
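For context, a back-of-the-envelope flop count (my own rough estimate, assuming the propagation cost is dominated by the A · diag(v) · Aᵀ product with A of shape (nc, ne*nc)) makes an hours-long runtime plausible for a loop-based implementation:

```python
# Rough cost of one systematic-error propagation step at 700 x 700 bins:
nc = ne = 700
flops = 2 * nc * (ne * nc) * nc     # dominant dense product, multiply-adds
print(f"~{flops:.1e} flops per iteration")   # ~4.8e+11
```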
