-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] A new type-safe and casting-free wrapper for TClonesArray: TCAViewer #764
Conversation
Some comments:
then it would never happen for events with nHits==0. (FairTask does not provide a hook which is run at the start of an event, which would be the perfect time to reset stuff.) Otherwise, this is the third implementation of a wrapper for TCAs in our codebase:
All of these have upsides and downsides. Has anyone recently tried to see of ROOT can nowadays store |
Thanks for your feedback. These are very helpful.
For this, we should wait for ROOT 7. |
For the last point, no, there is no simple and clean solution I could come up with now. The ugly and easy way I could think of is to check the event number. Check whether the difference between the event number when the Get() is called last time and this time is 1 or not. If not, raise an error. But such risk should be pointed out during the compile time not the run time. I don't know how to check this in compile time. |
} | ||
|
||
template <typename... Args> | ||
void Emplace_back(Args&&... args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emplace_back
if you want to mimic std, or EmplaceBack
if you want to keep the style of the other names in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will follow emplace_back
.
{ | ||
public: | ||
friend TCAOutput_SafeGuard<DataType>; | ||
explicit Data(std::string branchName = "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure you want to provide a default here? If there is no usecase for it, I would also not provide getter and setter for mBranchName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, better write a default one. Setter is kind of needed especially when branch name is not available during inline initialisation.
|
||
namespace TCAViewer | ||
{ | ||
enum Mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that it is wise to have both input and output in the same class, as there is some complexity involved with if-fing between those modes. There seem to be two distinct classes in one here. One of which does not need the SafeGuard (if I saw that correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they are different classes with different methods. But I would say for users, they are just two kinds of data handling. One could be input or output. So in terms of usability, IMO, TCAVIewer<out>
and TCAViwer<in>
is better than TCAOutputViewer
and TCAInputViewer
.
Another reason is I'm still considering how to add another mode call "inout". With this mode, user can modify the data (read and write), which requires both functionalities from in mode and out mode.
}; | ||
|
||
template <typename Datatype> | ||
class TCAOutput_SafeGuard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get what this is for, but you do need a small comment on why it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, yeah, I should do that.
{ | ||
if (mSafeGuard) | ||
{ | ||
LOG(fatal) << "TCAViewer<out>: Get() function can only be called once in the scope. "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think LOG(fatal) aborts, but you might want to make this more explicit.
Do what the editor tells you to do: If you add a exception below the log, and the code analysis says it can't be reached, then you don't need it (or just a comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Error handling is my weakest spot in C++. Maybe I should throw an exception instead of using "LOG(fatal)"?
{ | ||
LOG(fatal) << "TCAViewer<out>: Get() function can only be called once in the scope. "; | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the log(fatal) acts as an "early exit", you don't need the else, and can put inner scope on one level with the return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, make sense
} | ||
} | ||
|
||
inline auto GetMCEntryNumber() -> int { return eventHeader->GetMCEntryNumber(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need that function, its just a wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
return is_success; | ||
} | ||
|
||
void LogMCEntryNumber() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log...
is not what this does. This is about ensuring the function was called, right? So what about EnsureFunctionCalled
or similar?
Then, you could pull it together with the safeguard, and pull out the if safeguard
plus theEnsureFunctionCalled
from OutputGet
into a EnsureFunctionCalledExactlyOnce
.
I don't understand why it is necessary, by the way. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[nodiscard]] auto OutputGet() -> TCAOutput_SafeGuard<DataType>
{
EnsureFunctionCalledExactlyOnce();
Clear();
return TCAOutput_SafeGuard<DataType>(this);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The reason is when user put this Get() (or OutputGet) function call inside an if-statement or there is a throw before the function call, Get() function won't be called for every event. And in some events, if function is not called, old data from last call can't be cleared and as the result they would be written down to files for the second time in the end of event loop. This is bad. (I also think clearing data cache in TClonesArray should be handled by FairRootManger, not by ourself.)
So I need to make sure the OutputGet() function must be called at least once to clear the old data. And I also don't want user to create multiple unnecessary output handlers for a single output data. The function should be called exactly once per event loop. Yeah, EnsureCalledOnce
is a better name.
Please, rebase the dev branch to update the CIs |
There are still some changes to I need to make. I will inform you when it's finished. |
NB: In #792, I have moved the core tasks for CALIFA (except CalifaPoint for simulation) to std::map and std::vector. So far, this seems to work. I think the preferable way to use TClonesArrays is "not at all". |
@klenze Yes, I also managed to store a std::vector using class MyClass : FairTask
{
std::vector<NeulandMappedData> data_;
}
FairRootManager::Instance()->RegisterAny("a name", &data_, true); But rather I have to do something ugly like this: class MyClass : FairTask
{
std::vector<NeulandMappedData> data_;
std::vector<NeulandMappedData>* dataPtr_;
// constructor:
{
dataPtr_ = &data_;
}
}
FairRootManager::Instance()->RegisterAny("a name", dataPtr_, true); It's so hard to have really nice things with FairRoot :( |
RegisterAny actually applies the & operator to its argument:
Kids, don't do this at home. InitObjectAs feels confusing as well, the comment says "Initializes and returns a default object for a branch", but from reading the code I would assume it actually first searches in its branches, then asks fSource if it happens to know an object with that name, and if that fails, it actually returns nullptr, which is what I would want? Have you tried using anything except for std::vector as a container? I tried std::map and it did not work to well for me. ROOT seems to serialize std::map<K,V> to an array of tuple<K,V> or something, and I did not succeed in getting it to turn it back into a map when reading. The documentation for serializing STL to *.root felt rather slim. Of course, it was probably a year ago that I tried it, might be better now. If std::map does not work, I guess one could write a custom class and just reconstruct the map during deserialization. |
@klenze Yes, I just tried writing and reading from std::map. It works pretty much the same as std::vector. Did you add the definition to the cint link file? Like: #pragma link C++ class map<int, R3BPaddleTamexMappedData2>+; And probably adding
Yes, thus if I just pass auto data = std::vector<MapData>{};
auto* ptr = &data;
tree->SetBranchAddress("name", &ptr); But anyway, we could throw the TCloneArray into trash bin from now. |
Interesting. did you read back the map from file with FairRoot or with plain ROOT? Do you know if ROOT performs type checking? (If it is based on TTree::SetAddress(void*), I don't see how it would do that). The manual states
This is confusing to me. Does that mean that for a std::list with five elements, SetAddress actually remembers five different heap locations, and if I later have four or six elements in that list, I explicitly need to update them? If so, this is totally not how I would use containers (e.g. query the STL object for its contents in each TTree::Fill). |
I close this PR because there is a better way to replace TClonesArray with STL containers. Please see #890. |
Motivation
Much has been discussed about castings during event loops. Ideally casting should not be used at all. The biggest reason why there are so many castings in our program is because all the input and output data are in the type
TClonesArray
, which neither has type-safety nor memory safety. Some people, like our former colleague, @janmayer, tried to create type-safety by making a wrapping class template, calledTCAConnector
. But unfortunately, in the actual implementation, data pointers still need to be copied and casted to the correct version.The new wrapper
TCAViewer
solves this problem by using an array-view class calledspan
. The idea of viewing an array is relatively new and was not even implemented until the latest standard C++20. With C++17, we can still fully utilize such a conception from a library called "Guideline Supported Library" (GSL).Basic idea
Unlike
std::array<>
,gsl::span
(orstd::span
in C++20) is very light weighted and does not own any data. It just requires the address of the first element and the size of the container. However, it has all features of modern C++ containers, such as iterators and range-based looping. Fortunately, the address of the first element ofTClonesArray
always stays the same throughout all events. Once such address value becomes known, aspan
object can be created per event to represent all the data fromTClonesArray
without the need to copy or cast.TCAViewer
(please check the source code in
TCAViewer.h
for the full details)Initialisation:
Return:
Note: Always keep in mind that the return in input mode is a span containing pointers to the data, NOT directly data.
Data types:
There are three important data types defined in TCAViewer:
TCAViewer::Mode
is an enumerator containing only two entries:TCAViewer::in
andTCAViewer::out
. It specifies whether the object from the classTCAViewer::Data
represents input data or output data. If you accidentally choose the wrong mode, an error will be shown during the compilation (see below).TCAViewer::Data
, as a class template, is the main part of TCAViewer. Its signature and constructor are shown below:The typename DataType is the type of the object, such as
R3BNeulandPoint
orTVector3
, which must be derived fromTObject
. mode shows whether you need an input data or an output data. The constructor parameterbranchName
is the name of the branch where the actual data is extracted fromTTree
. It's the same constructor parameter when you use type-unsafeFairRootManager::GetObject()
, such asTCAViewer::TCAOutput_SafeGuard
is a return type ofTCAViewer::Pointer<DataType, TCAViewer::out>::Get()
and should rarely appear if you have a good C++ programming habit (using auto as much as possible. ). Remember always needing to "clear" the TClonesArray or TCAOutputConnector at the end of the event loop? With TCAViewer, there is no such need as this is automatically done (thanks to an C++ idiom called "RAII").Member functions of TCAViewer::Data:
To be added...
UB?
To be continued...