-
❓ Questions and Help

Hi guys, thank you for open sourcing Theseus. :) At the moment, I am struggling to find the best way to pass the spline knots that need to be optimized to the error function. I tried several ways. Each reprojection residual depends on two camera poses, as I am parametrizing object points with inverse depth (i.e. I need the pose of the reference camera and of the current one in the objective). I wrote a SplineEstimator module. Each spline (one in SO3 and one in R3) has multiple knots in th.SO3 and th.Vector3.

```python
class SplineEstimator3D(nn.Module):
    def __init__(self, ...):  # signature shortened here
        self.so3_spline = SO3Spline(0, 0, dt_ns=dt_ns_so3, N=N, device=self.device)
        self.r3_spline = RDSpline(0, 0, dt_ns=dt_ns_r3, dim=3, N=N, device=self.device)
        ...
        # add optim variables --> all spline knots
        self.optim_vars = []
        for i in range(len(self.r3_spline.knots)):
            self.optim_vars.append(self.r3_spline.knots[i])
        for i in range(len(self.so3_spline.knots)):
            self.optim_vars.append(self.so3_spline.knots[i])
        ...
```

Then I take each view in the trajectory and iterate the observed object points to gather observations and reprojection residuals. To achieve this I am passing the index variable knot_ids as aux_vars to the error function. That helps me gather the relevant optim_vars from self.optim_vars. Maybe this is already wrong?!

```python
def add_rs_view(self, view, view_id, recon):
    with torch.no_grad():
        # iterate observations of that view
        tracks = view.TrackIds()
        aux_vars = []
        # pass knot ids to optimizer,
        # we unfold those in the cost function
        knot_ids = torch.zeros((1, len(tracks), self.N, 4)).int()
        knot_us = torch.zeros((1, len(tracks), 4)).int()
        ...
        for idx, t_id in enumerate(tracks):
            ref_view_id = recon.Track(t_id).ReferenceViewId()
            ref_view = recon.View(ref_view_id)
            img_obs_time_ns = time_util.S_TO_NS * (
                view.GetTimestamp() + self.line_delay.tensor[0] * view.GetFeature(t_id).point[1])
            img_ref_time_ns = time_util.S_TO_NS * (
                ref_view.GetTimestamp() + self.line_delay.tensor[0] * ref_view.GetFeature(t_id).point[1])
            u_so3_obs, s_so3_obs, suc1 = self._calc_time_so3(img_obs_time_ns)
            u_r3_obs, s_r3_obs, suc2 = self._calc_time_r3(img_obs_time_ns)
            u_so3_ref, s_so3_ref, suc3 = self._calc_time_so3(img_ref_time_ns)
            u_r3_ref, s_r3_ref, suc4 = self._calc_time_r3(img_ref_time_ns)
            # indices of the N knots involved in each of the 4 spline segments
            for i in range(self.so3_spline.N):
                knot_ids[0, idx, i, 0] = s_so3_ref + i
                knot_ids[0, idx, i, 1] = s_r3_ref + i
                knot_ids[0, idx, i, 2] = s_so3_obs + i
                knot_ids[0, idx, i, 3] = s_r3_obs + i
            knot_us[0, idx, 0] = u_so3_ref
            knot_us[0, idx, 1] = u_r3_ref
            knot_us[0, idx, 2] = u_so3_obs
            knot_us[0, idx, 3] = u_r3_obs

        aux_vars = [
            th.Variable(tensor=knot_ids.float(), name="knot_ids_" + str_cnt),
            th.Variable(tensor=knot_us.float(), name="knot_us_" + str_cnt),
            ...
        ]
        cost_function = th.AutoDiffCostFunction(
            self.optim_vars,
            self._rs_error,
            2 * len(tracks),
            aux_vars=aux_vars,
            name="rs_repro_cost_" + str_cnt,
            autograd_vectorize=True,
            autograd_mode=th.AutogradMode.LOOP_BATCH,
            autograd_strict=False)
        self.objective.add(cost_function)
        self.cnt_repro_err += 1
```

My error function then looks like the code below.

```python
def _rs_error(self, optim_vars, aux_vars):
    r3_knots = optim_vars[:len(optim_vars) // 2]
    so3_knots = optim_vars[len(optim_vars) // 2:]
    s = aux_vars[0].tensor[0]
    u = aux_vars[1].tensor[0]
    ...
    # knot indices, shape (obs, N)
    s_so3_ref = s[:, :, 0].int()
    s_r3_ref = s[:, :, 1].int()
    s_so3_obs = s[:, :, 2].int()
    s_r3_obs = s[:, :, 3].int()
    num_obs = s_so3_ref.shape[0]
    u_so3_ref = u[:, 0]
    u_r3_ref = u[:, 1]
    u_so3_obs = u[:, 2]
    u_r3_obs = u[:, 3]
    # gather the knots involved in each observation
    all_R_refs = torch.cat([so3_knots[idx].tensor[0] for idx in s_so3_ref.flatten()], 0).reshape(self.N, num_obs, 3, 3)
    all_p_refs = torch.cat([r3_knots[idx].tensor[0] for idx in s_r3_ref.flatten()], 0).reshape(self.N, num_obs, 3)
    all_R_obs = torch.cat([so3_knots[idx].tensor[0] for idx in s_so3_obs.flatten()], 0).reshape(self.N, num_obs, 3, 3)
    all_p_obs = torch.cat([r3_knots[idx].tensor[0] for idx in s_r3_obs.flatten()], 0).reshape(self.N, num_obs, 3)
    # evaluate reference rolling shutter pose
    R_w_i_ref = self.spline_helper.evaluate_lie_vec(
        all_R_refs, u_ld_ref_so3,
        self.inv_dt_so3.tensor, derivatives=0, num_meas=num_obs)[0]
    t_w_i_ref = self.spline_helper.evaluate_euclidean_vec(
        all_p_refs, u_ld_ref_r3,
        self.inv_dt_r3.tensor, derivatives=0, num_meas=num_obs)
    R_w_i_obs = self.spline_helper.evaluate_lie_vec(
        all_R_obs, u_ld_obs_so3,
        self.inv_dt_so3.tensor, derivatives=0, num_meas=num_obs)[0]
    t_w_i_obs = self.spline_helper.evaluate_euclidean_vec(
        all_p_obs, u_ld_obs_r3,
        self.inv_dt_r3.tensor, derivatives=0, num_meas=num_obs)
    ...
    repro_error = (x_camera[:, :2, 0] / x_camera[:, 2] - obs_obs).flatten().unsqueeze(0)
    return repro_error
```

Unfortunately, the optimization is extremely slow and is not really working either. Thanks already for reading!
-
Hi @urbste. Thanks for your interest in Theseus and for reaching out! I will try to set aside some time to look at this tomorrow. In the meantime, do you have any of this math written up or a reference that you can share? I'm not too familiar with this, and would love to understand the abstract problem better before I suggest something.
-
Hi. Thanks already for your answer. Building on this is:

Steffen
-
My spline implementation is based on this C++ code: https://gitlab.com/VladyslavUsenko/basalt-headers/-/tree/master/include/basalt/spline
-
I started looking at the code and the paper, but it's going to take me some time to fully digest :) I'll also check with the rest of the team to see if someone else is already familiar with this. In the meantime, I have some questions.

I downloaded your code and ran it, and noticed that all your cost functions are receiving the full set of optimization variables. Is this intentional? Typically, every cost is only a function of a subset of the optimization variables. Maybe this is the reason why you are doing the complex indexing that you mentioned above that breaks vmap? Is it possible to write your problem in a way that you have separate cost functions that only receive the subset of optimization variables that they need? For example, if I understand the code correctly, your aux vars never change (I don't see them being modified by your ...).

If this rewrite is possible, I strongly suggest doing so, because it may have quite a large effect on running time. In the current version torch's autograd needs to evaluate 458 jacobians for each cost function, and a lot of these are actually zeros (in one cost function I checked only 10 were nonzero). Thus a lot of computation seems to be wasted here. This would also make your cost function code much simpler, which should also speed up computation. Does this make sense? Let me know if anything here is unclear or if I'm misunderstanding something.

PS: I'm not sure I understood the first question in your original post; you can indeed pass vars of different types as optim vars (your code does so already with SO3 and R3). Also, it looks like you are using batch size = 1, so in this case using ...
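To make the "subset of optim vars" suggestion above concrete, here is a rough sketch of what I have in mind: one small cost function per observation, receiving only the 2 * N knots it actually uses. Note that `make_obs_cost`, the starting indices, and the 2-d residual are all made up for illustration; the real residual would call your spline evaluation instead.

```python
import theseus as th

N = 4  # spline order: each observation depends on N consecutive knots
so3_knots = [th.SO3.rand(1) for _ in range(10)]
r3_knots = [th.Point3.rand(1) for _ in range(10)]

def make_obs_cost(s_so3, s_r3, name):
    # Only the 2 * N knots this observation touches become optim vars,
    # so autograd only has to produce 2 * N jacobians for this cost.
    def err_fn(optim_vars, aux_vars):
        R_knots = optim_vars[:N]  # th.SO3 knots, tensors of shape (B, 3, 3)
        p_knots = optim_vars[N:]  # th.Point3 knots, tensors of shape (B, 3)
        # Dummy 2-d residual standing in for the real reprojection error.
        return R_knots[0].tensor[:, :2, 0] + p_knots[0].tensor[:, :2]

    return th.AutoDiffCostFunction(
        optim_vars=so3_knots[s_so3 : s_so3 + N] + r3_knots[s_r3 : s_r3 + N],
        err_fn=err_fn,
        dim=2,
        cost_weight=th.ScaleCostWeight(1.0),
        name=name,
    )

objective = th.Objective()
# One cost per observation; (s_so3, s_r3) are the starting knot indices.
for k, (s_so3, s_r3) in enumerate([(0, 0), (2, 1), (5, 4)]):
    objective.add(make_obs_cost(s_so3, s_r3, f"repro_{k}"))
```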
-
Also, another thing that should speed things up would be to use our sparse solvers. In the ... That being said, I still think that simplifying the objective formulation has more room for impact.
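For reference, switching to a sparse solver is roughly a one-line change when constructing the optimizer. This is only a sketch, assuming a recent Theseus version with the CHOLMOD extension available and that `objective` is the objective you already built; on GPU you could try `th.LUCudaSparseSolver` instead.

```python
import theseus as th

optimizer = th.LevenbergMarquardt(
    objective,  # the th.Objective built above
    linear_solver_cls=th.CholmodSparseSolver,  # sparse Cholesky on CPU
    max_iterations=20,
)
layer = th.TheseusLayer(optimizer)
layer.forward(optimizer_kwargs={"verbose": True})
```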
-
Hi @urbste, thanks for the interest! Some of the details in your application are a bit hard to parse from the implementation. Would it be possible for you to share a sketch of the factor graph visualization of your objective, and the equation number from the reference paper for the objective you are implementing? A pointer to your Ceres implementation would also be helpful. For now, some high-level responses to your questions in case they help:

1. The autodiff cost already allows this, where you can pass a sequence of ... (see the small example after this list).
2. This might be related to the point above; ...
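As a tiny illustration of the first point, mixing variable types in a single autodiff cost looks roughly like this; the rotate-a-point residual is just a placeholder, not part of your actual objective.

```python
import theseus as th

pose = th.SO3.rand(1)
point = th.Point3.rand(1)

def err_fn(optim_vars, aux_vars):
    # optim_vars come back in the same order they were passed below.
    R, p = optim_vars  # th.SO3, th.Point3
    return R.rotate(p).tensor  # (B, 3) placeholder residual

cost = th.AutoDiffCostFunction(
    optim_vars=[pose, point],
    err_fn=err_fn,
    dim=3,
    cost_weight=th.ScaleCostWeight(1.0),
    name="mixed_var_types_example",
)
```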
-
@urbste Following our conversation, I thought a bit more about this and came up with something like the mock code below. Instead of passing a function to `err_fn`, I pass a callable object that stores the indices internally, so that they don't have to be passed as aux vars. The code below works for me.

BTW, I suspect the part that vmap doesn't like is the tensor reshaping, which I think you should also be able to avoid by using `torch.stack` with the appropriate dimension. Let me know if the example below makes sense.

```python
import torch
import theseus as th

class MockReprErr:
    def __init__(self, so3_knot_idx, r3_knot_idx):
        self.so3_knot_idx = so3_knot_idx
        self.r3_knot_idx = r3_knot_idx

    def __call__(self, optim_vars, aux_vars):
        so3_knots_ = optim_vars[: len(optim_vars) // 2]  # each (B, 3, 3)
        r3_knots_ = optim_vars[len(optim_vars) // 2 :]  # each (B, 3)
        # Shapes will be (B, 4, 3, 3) and (B, 4, 3)
        so3_knots_tensor_ = torch.stack(
            [so3_knots_[i].tensor for i in self.so3_knot_idx], dim=1)
        r3_knots_tensor_ = torch.stack(
            [r3_knots_[i].tensor for i in self.r3_knot_idx], dim=1)
        dummy_err_1_ = th.SO3(tensor=so3_knots_tensor_[:, 1, ...]).local(th.SO3())  # (B, 3)
        dummy_err_2_ = th.Point3(tensor=r3_knots_tensor_[:, 1, ...]).local(th.Point3())  # (B, 3)
        total_dummy_err_ = torch.cat([dummy_err_1_, dummy_err_2_], dim=1)  # (B, 6)
        return total_dummy_err_

obj = th.Objective()
so3_knots = []
r3_knots = []
for i in range(5):
    so3_knots.append(th.SO3.rand(1))
    r3_knots.append(th.Point3.rand(1))
weight = th.ScaleCostWeight(1.0)
obj.add(
    th.AutoDiffCostFunction(
        optim_vars=so3_knots + r3_knots,
        err_fn=MockReprErr([0, 1, 2, 3], [0, 1, 2, 3]),
        dim=6,
        cost_weight=weight,
        autograd_mode="vmap",
        name="mock_repr_1",
    )
)
obj.add(
    th.AutoDiffCostFunction(
        optim_vars=so3_knots + r3_knots,
        err_fn=MockReprErr([1, 2, 3, 4], [0, 1, 2, 3]),
        dim=6,
        cost_weight=weight,
        autograd_mode="vmap",
        name="mock_repr_2",
    )
)
layer = th.TheseusLayer(th.LevenbergMarquardt(obj))
layer.forward(optimizer_kwargs={"verbose": True, "damping": 0.1})
```