
MemoryError in unfolding for many bins #97

Open
fzeiser opened this issue Sep 4, 2018 · 2 comments
fzeiser (Contributor) commented Sep 4, 2018

I tried running the tutorial (Basic API Tutorial) with more bins. Above ~200 bins, I receive a MemoryError when I run iterative_unfold.

I tried another implementation of the iterative Bayesian unfolding algorithm, in RooUnfold, which easily handles more than 1000 bins -- of course taking some time. A difference there is that the systematic errors are not computed.

Now I see that creating a covariance matrix of type CovPP = np.zeros((cbins * ebins, cbins * ebins)) takes a lot of memory. Still, in multidimensional problems one quickly exceeds 100 bins. So I wonder if one could turn off the systematic error calculation with a keyword.
A more advanced version would be if one could (optionally) handle the covariance matrix through submatrices, e.g. Fox's algorithm, or use numpy.memmap to memory-map a file on disk. This would slow down the calculations, but slow may be better than impossible.
Alternatively, I have to look for a nice cluster with some hugemem compute nodes.
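For scale, a rough footprint estimate, plus a minimal sketch of the memmap idea (the file name and the tiny demo shape are mine, just for illustration):

```python
import numpy as np

# Footprint of the full covariance matrix at ~200 cause/effect bins (float64):
cbins = ebins = 200
n = cbins * ebins                               # 40,000 effective bins
print(f"full CovPP: {n * n * 8 / 1e9:.1f} GB")  # ~12.8 GB

# The memmap idea, with a deliberately tiny shape; a real run would use
# shape=(n, n) and trade RAM for disk I/O.
covpp = np.memmap('covpp.dat', dtype='float64', mode='w+', shape=(100, 100))
covpp[:] = 0.0    # behaves like an ordinary ndarray
covpp.flush()     # write changes back to the file on disk
```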

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-181-14949ba7442b> in <module>()
      8                                     efficiencies=efficiencies,
      9                                     efficiencies_err=efficiencies_err,
---> 10                                     callbacks=[Logger()])

/usr/local/lib/python3.5/dist-packages/pyunfold/unfold.py in iterative_unfold(data, data_err, response, response_err, efficiencies, efficiencies_err, prior, ts, ts_stopping, max_iter, cov_type, return_iterations, callbacks)
    160                               ts_func=ts_func,
    161                               max_iter=max_iter,
--> 162                               callbacks=callbacks)
    163 
    164     if return_iterations:

/usr/local/lib/python3.5/dist-packages/pyunfold/unfold.py in _unfold(prior, mixer, ts_func, max_iter, callbacks)
    205         status = {'unfolded': unfolded_n_c,
    206                   'stat_err': mixer.get_stat_err(),
--> 207                   'sys_err': mixer.get_MC_err(),
    208                   'num_iterations': iteration}
    209 

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in get_MC_err(self)
     57         """MC (Systematic) Errors
     58         """
---> 59         cvm = self.cov.getVc1()
     60         err = np.sqrt(cvm.diagonal())
     61         return err

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in getVc1(self)
    245         """
    246         # Get NObs covariance
--> 247         CovPP = self.getVcPP()
    248         # Get derivative
    249         dcdP = self.dcdP

/usr/local/lib/python3.5/dist-packages/pyunfold/mix.py in getVcPP(self)
    217         ebins = self.ebins
    218 
--> 219         CovPP = np.zeros((cbins * ebins, cbins * ebins))
    220 
    221         # Poisson covariance matrix

MemoryError: 
jrbourbeau (Owner) commented

Thanks for reporting this MemoryError issue and potential fixes @fzeiser!

A more advanced version would be if one could (optionally) handle the covariance matrix through submatrices, e.g. Fox's algorithm, or use numpy.memmap to memory-map a file on disk

Do you have an interest in implementing this?

Perhaps @zhampel has some thoughts on this topic as well.

fzeiser (Contributor) commented Sep 12, 2018

Unfortunately I doubt that I have the skills to do that. But I had two questions/comments/ideas:

First, on the implementation in RooUnfold:

I tried to understand the implementation in RooUnfold, from which you derived the corrections to the error propagation, if I'm not mistaken. Did any of you have a closer look at the implementation? I think I was wrong the first time, when I stated that the systematic error due to the MC truths is not computed. I think it's not computed by default, but one can choose to do so.

I tried it out (with RooUnfold) by setting unfold.IncludeSystematics(1); from a brief look it's difficult to say whether it is "correct", but it then claims to (also) have calculated the covariance due to the unfolding matrix.

Now, if I understand it correctly, doesn't that implementation "only" need a matrix of shape (cbins, cbins * ebins)? Could that be? This would already reduce the size considerably.

 if (_dosys)    _dnCidPjk.ResizeTo(_nc,_ne*_nc);

https://github.com/hep-mirrors/RooUnfold/blob/fa7c56cb1a55da75c7879119cbb93b2862e9f480/src/RooUnfoldBayes.cxx#L179


 if (_dosys) {
   if (verbose()>=1) cout << "Calculating covariance due to unfolding matrix..." << endl;


   const TMatrixD& Eres= _res->Eresponse();
   TVectorD Vjk(_ne*_nc);           // vec(Var(j,k))
   for (Int_t j = 0 ; j < _ne ; j++) {
     Int_t j0= j*_nc;
     for (Int_t i = 0 ; i < _nc ; i++) {
       Double_t e= Eres(j,i);
       Vjk[j0+i]= e*e;
     }
   }


   if (_dosys!=2) {
     TMatrixD covres(_nc,_nc);
     ABAT (_dnCidPjk, Vjk, covres);
     _cov += covres;
   } else {
     _cov.ResizeTo (_nc, _nc);
     ABAT (_dnCidPjk, Vjk, _cov);
   }
 }

https://github.com/hep-mirrors/RooUnfold/blob/fa7c56cb1a55da75c7879119cbb93b2862e9f480/src/RooUnfoldBayes.cxx#L360-L382
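If the reduced (_nc, _ne*_nc) derivative matrix really is sufficient, the ABAT call above is just A · diag(v) · Aᵀ. A numpy sketch with tiny stand-in shapes (all names and sizes here are mine, not RooUnfold's):

```python
import numpy as np

def abat(A, v):
    # C = A @ diag(v) @ A.T, without materialising the big diagonal matrix
    return (A * v) @ A.T

nc, ne = 3, 4                                  # stand-ins for _nc and _ne
rng = np.random.default_rng(0)
dnCidPjk = rng.standard_normal((nc, ne * nc))  # reduced-shape derivative matrix
Vjk = rng.random(ne * nc)                      # vec of squared response errors
covres = abat(dnCidPjk, Vjk)                   # (nc, nc) covariance contribution

# Storage at 200 bins: the reduced matrix is smaller by a factor of ebins.
cbins = ebins = 200
full_bytes = (cbins * ebins) ** 2 * 8          # full CovPP: ~12.8 GB
reduced_bytes = cbins * (cbins * ebins) * 8    # (_nc, _ne*_nc) shape: 64 MB
```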

Computation time:

Now, when calculating the systematic errors with RooUnfold for a larger matrix (#effect bins = #cause bins = 700), this was still possible (so, no memory error), but it takes a considerable amount of CPU time: about 1-2 hours on this PC for 2 iterations. As I want to use the unfolding on a problem where I expect to unfold between 20 and 10,000 different spectra, I think it would be useful to be able to turn off the systematic error calculation, at least for a rough data analysis, with the possibility to get back to a more precise answer later.
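For context, a back-of-the-envelope flop count (my own rough estimate, assuming the propagation cost is dominated by the A · diag(v) · Aᵀ product with A of shape (nc, ne*nc)) makes an hours-long runtime plausible for a loop-based implementation:

```python
# Rough cost of one systematic-error propagation step at 700 x 700 bins:
nc = ne = 700
flops = 2 * nc * (ne * nc) * nc     # dominant dense product, multiply-adds
print(f"~{flops:.1e} flops per iteration")   # ~4.8e+11
```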
