This project implements PF-AAE, a framework for tracking the 3D pose of an object via particle filtering (PF) and augmented autoencoders (AAE). The filter iteratively estimates the posterior of the rotation matrix R_t given the RGB inputs y_{1:t}. The prediction step uses a noise model in SO(3), while the correction step uses a measurement model based on the AAE architecture. A novel resampling strategy, AAETL resampling, appears to improve tracking performance when the object pose changes abruptly. It relies on AAETL, an augmented autoencoder trained with a texture-less reconstruction objective. Tracking is carried out offline.
This work builds on the AugmentedAutoencoder repository, available here.
The actual implementation of PF-AAE is under NDA. This repository contains only the baseline code of the work, namely the AugmentedAutoencoder repository. The present readme is included to provide an overview of PF-AAE and its functionalities. Therefore, the reported demos cannot be reproduced and some files may not be available. Feel free to reach out for further details regarding the work.
- Installation
- Augmented Autoencoders
- PF-AAE architecture
- Run a demo
- Datasets
- Code structure
- Acknowledgments
- License
- References
- Install the code dependencies
pip install -r requirements.txt
- Pip installation of the code
pip install --user .
- Create the workspace (folder to collect AAE, AAETL, PF-AAE data)
export AE_WORKSPACE_PATH=/path/to/aae_workspace
mkdir $AE_WORKSPACE_PATH
ae_init_workspace
- Check the content of the workspace
└── aae_workspace
    ├── cfg
    │   ├── train_template_aae.cfg
    │   └── train_template_aae_tl.cfg
    ├── cfg_eval
    ├── experiments
    └── tmp_datasets
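The same check can be scripted. The following is a minimal sketch, assuming only the four sub-folders listed above; the helper names `workspace_from_env` and `missing_workspace_dirs` are hypothetical and not part of the repository.

```python
import os
from pathlib import Path

# Sub-folders expected after ae_init_workspace, taken from the tree above.
EXPECTED_DIRS = ("cfg", "cfg_eval", "experiments", "tmp_datasets")


def workspace_from_env():
    """Return the workspace path from AE_WORKSPACE_PATH, or None if unset."""
    return os.environ.get("AE_WORKSPACE_PATH")


def missing_workspace_dirs(workspace):
    """List the expected sub-folders that do not exist under `workspace`."""
    root = Path(workspace)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list returned by `missing_workspace_dirs(workspace_from_env())` would indicate that the workspace has the expected layout.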
Augmented autoencoders (AAEs) are convolutional autoencoders trained to reconstruct the view of an object from an augmented version of it fed as input. Thus, they deliver an implicit representation of rotations in their latent space. For further details, refer to the original readme of the AugmentedAutoencoder repository.
In this framework, it is possible to train two kinds of AAEs:
- AAE architecture: augmented autoencoder with textured reconstruction.
- AAETL architecture: augmented autoencoder with texture-less reconstruction.
The former is more discriminative, while the latter maps views of the object that look alike once textures are removed (i.e., views related by a geometric symmetry) to nearby points in the latent space.
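To illustrate what "nearby in the latent space" means, the sketch below ranks the entries of a toy codebook by cosine similarity to a query code. It is only an illustration: the function names and the 2-D codes are made up, and real latent codes have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two latent codes (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_codes(query, codebook, k):
    """Indices of the k codebook entries most similar to the query code."""
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: cosine_similarity(query, codebook[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy codebook: each row stands for the latent code of one rendered view.
codebook = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(nearest_codes([1.0, 0.05], codebook, k=2))
```

With an AAETL codebook, two views that differ only in texture would be expected to end up among each other's nearest neighbors, whereas an AAE codebook would tend to keep them apart.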
The image shows the AAE training procedure and the results obtained after 20000 training epochs.
The image shows the AAETL training procedure and the results obtained after 20000 training epochs.
The AAE architectures are defined via a .cfg configuration file. Examples can be found in auto_pose/ae/cfg or, after its initialization, in the workspace.
The configuration file must define the path to the 3D model of the object (MODEL_PATH) and a glob pattern matching the images used to augment the training input (BACKGROUND_IMAGES_GLOB). The datasets used for 3D models and background images are listed in the Datasets section.
[Paths]
MODEL_PATH: /path/to/my_3d_model.ply
BACKGROUND_IMAGES_GLOB: /path/to/background/images/*.jpg
To train either an AAE or an AAETL architecture, the TLESS_TARGET flag must be set as follows:
[Network]
TLESS_TARGET: False # for AAE training
TLESS_TARGET: True # for AAE_TL training
For further details about the configuration files, refer to the original readme of the AugmentedAutoencoder repository.
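As a quick sanity check before training, the fields above can be read back with Python's standard-library `configparser`. This is a sketch under the assumption that the file follows the INI-like syntax shown above; the repository's own loader may parse it differently, and `read_aae_config` is a hypothetical helper.

```python
import configparser


def read_aae_config(text):
    """Parse a .cfg string and return (model_path, background_glob, tless_target)."""
    # inline_comment_prefixes makes the parser ignore trailing "# ..." comments.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    model_path = parser.get("Paths", "MODEL_PATH")
    background_glob = parser.get("Paths", "BACKGROUND_IMAGES_GLOB")
    tless_target = parser.getboolean("Network", "TLESS_TARGET")
    return model_path, background_glob, tless_target


example = """
[Paths]
MODEL_PATH: /path/to/my_3d_model.ply
BACKGROUND_IMAGES_GLOB: /path/to/background/images/*.jpg

[Network]
TLESS_TARGET: True  # texture-less reconstruction target
"""

model_path, background_glob, tless_target = read_aae_config(example)
print(model_path, background_glob, tless_target)
```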
- Copy your configuration file my_autoencoder.cfg into the workspace
mkdir $AE_WORKSPACE_PATH/cfg/exp_group
cp path/to/your/my_autoencoder.cfg $AE_WORKSPACE_PATH/cfg/exp_group/my_autoencoder.cfg
- Train the architecture
ae_train exp_group/my_autoencoder
- Create the embedding (i.e., the codebook)
ae_embed exp_group/my_autoencoder
- Check the content of the workspace
└── aae_workspace
    ├── cfg
    │   └── exp_group
    │       └── my_autoencoder.cfg
    └── experiments
        └── exp_group
            └── my_autoencoder
                ├── checkpoints
                └── train_figures
PF-AAE is a particle filter that tracks the 3D pose of an object from a sequence of images. It iteratively estimates the posterior of the object rotation matrix R_t given the RGB inputs, or observations, y_{1:t}.
This framework builds on the implementation of a particle filter offered by the pfilter repository.
The image shows one iteration of PF-AAE. The prediction step moves the particles exploiting a noise model in SO(3) as state evolution model. The correction step builds the measurement model with an AAE encoder and its latent space. Namely, the rendered particles are compared with the observation via the cosine similarity. Then, a Gaussian kernel is applied as weighting function (not shown). As resampling procedure, it is possible to combine systematic resampling and AAETL resampling (cf. the subsection AAETL resampling).
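Since the actual implementation is not public, the prediction step can only be sketched here. The example below perturbs each particle (a 3x3 rotation matrix) with a small random rotation whose axis-angle vector is drawn from a zero-mean Gaussian. The noise distribution, the parameter `sigma`, and all function names are assumptions made for illustration; this is not necessarily how any of the implemented noise models works.

```python
import math
import random


def rotation_from_axis_angle(v):
    """Rodrigues' formula: rotation matrix for the axis-angle vector v (radians)."""
    theta = math.sqrt(sum(comp * comp for comp in v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (comp / theta for comp in v)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def predict(particles, sigma, rng):
    """Move every particle by a random rotation with Gaussian axis-angle noise."""
    moved = []
    for rot in particles:
        noise = [rng.gauss(0.0, sigma) for _ in range(3)]
        moved.append(matmul(rot, rotation_from_axis_angle(noise)))
    return moved


identity = rotation_from_axis_angle([0.0, 0.0, 0.0])
particles = predict([identity] * 5, sigma=0.1, rng=random.Random(0))
```

Composing with a rotation matrix, rather than adding noise to the matrix entries, keeps every particle a valid element of SO(3).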
The implemented noise models are norm, unif-norm, and predict. The weighting function has a parameter gamma that controls how discriminative the filter is. Resampling is performed when the effective number of particles falls below the threshold n_eff_threshold. For further details, refer to auto_pose/pf/pfilter_aae.py.
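The roles of gamma and n_eff_threshold can be sketched as follows. The exact kernel used by PF-AAE is not given in this readme, so the form exp(-gamma * (1 - s)^2) is an assumption, as is treating n_eff_threshold as a fraction of the particle count. The effective number of particles is the standard 1 / sum(w_i^2) estimate.

```python
import math


def gaussian_kernel_weights(similarities, gamma):
    """Map cosine similarities in [-1, 1] to normalized particle weights."""
    # Assumed kernel: exp(-gamma * (1 - s)^2); a larger gamma concentrates
    # the weight on the particles most similar to the observation.
    raw = [math.exp(-gamma * (1.0 - s) ** 2) for s in similarities]
    total = sum(raw)
    return [r / total for r in raw]


def effective_particles(weights):
    """Effective sample size 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def needs_resampling(weights, n_eff_threshold):
    """True when the effective fraction of particles drops below the threshold."""
    # Assumption: the threshold is a fraction of the number of particles.
    return effective_particles(weights) / len(weights) < n_eff_threshold


similarities = [0.95, 0.5, 0.2, 0.1]
for gamma in (1.0, 50.0):
    weights = gaussian_kernel_weights(similarities, gamma)
    print(gamma, effective_particles(weights), needs_resampling(weights, 0.5))
```

A small gamma spreads the weight over many particles, while a large gamma lets a few particles dominate, which lowers the effective number of particles and triggers resampling more often.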
With an AAETL architecture trained on the same object as the AAE architecture employed in the filter, it is possible to use AAETL resampling. At each iteration, the proportion aae_resampling_proportion of particles with the lowest weights is replaced with particles uniformly sampled from the aae_resampling_knn nearest neighbors of the MAP estimate in the AAETL codebook. The other particles are resampled according to the systematic resampling procedure.
The codename of this resampling procedure is aae-tl. For comparison, unif resampling is also implemented: it samples the replacement particles uniformly in SO(3) instead of from the AAETL codebook.
- Generate a sequence of views of the object
pf_generate_sequences exp_group/my_autoencoder \
# sequence parameters (see below)
For the available parameters, refer to auto_pose/pf/pf_generate_sequences.py and the section Run a Demo.
- Start tracking of the sequence
pf_tracking_sequences -aae exp_group/my_autoencoder \
# tracking parameters (see below)
For the available parameters, refer to auto_pose/pf/pf_tracking_sequences.py, auto_pose/pf/pfilter_aae.py, and the section Run a Demo.
- Check the results in the workspace
└── aae_workspace
    ├── experiments
    │   └── exp_group
    │       └── my_autoencoder
    │           ├── filtering
    │           │   └── sequence_name
    │           │       └── pf_tracking_name
    │           └── ...
    └── ...
In place of sequence_name and tracking_name, two strings will appear that identify the generated sequence and the tracking experiment, respectively, along with their parameters.
- Edit the first two lines of demo/cfg/aae/cracker.cfg and demo/cfg/aae_tl/cracker.cfg. MODEL_PATH must be the path to the YCB cracker_box model, available in demo/obj_000002.ply. BACKGROUND_IMAGES_GLOB must be a glob pattern matching the images of the Pascal VOC2012 dataset, available here.
[Paths]
MODEL_PATH: /path/to/obj_000002.ply
BACKGROUND_IMAGES_GLOB: /path/to/voc12/VOCdevkit/VOC2012/JPEGImages/*.jpg
- Copy the configuration files in demo/cfg into the workspace
cp -r demo/cfg $AE_WORKSPACE_PATH
- Training and embedding of the AAE architecture for the YCB cracker_box
ae_train aae/cracker # NB: ~8 hours with a K40 GPU
ae_embed aae/cracker
- Training and embedding of the AAETL architecture for the YCB cracker_box
ae_train aae_tl/cracker # NB: ~8 hours with a K40 GPU
ae_embed aae_tl/cracker
- Generate a sequence with a backflip of the object at 4.2 seconds. Then, run 3 tracking experiments on it: one with AAETL resampling, one with uniform resampling, and one without AAETL and uniform resampling.
cd demo
./pf_aae_example.sh
- Results in
$AE_WORKSPACE_PATH/experiments/aae/cracker/filtering
The following animation shows from left to right:
- the input sequence (generated)
- the output of the PF-AAE w/ AAETL resampling (rendered)
- the output of the PF-AAE w/ uniform resampling (rendered)
- the output of the PF-AAE w/o AAETL and uniform resampling (rendered)
The following figure compares the ground truth 3D poses of the object in the input sequence with the ones estimated by the 3 tracking experiments. Poses are expressed with their axis-angle representations.
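For reference, the axis-angle vector of a rotation matrix can be obtained as in the sketch below. It uses the standard trace and skew-symmetric-part formulas; the function names are illustrative, and the formula loses accuracy for rotation angles close to pi.

```python
import math


def rotation_angle(rot):
    """Rotation angle (radians) of a 3x3 rotation matrix, from its trace."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def axis_angle(rot):
    """Axis-angle vector (unit axis times angle); inaccurate near angle = pi."""
    theta = rotation_angle(rot)
    if theta < 1e-9:
        return [0.0, 0.0, 0.0]
    scale = theta / (2.0 * math.sin(theta))
    return [
        scale * (rot[2][1] - rot[1][2]),
        scale * (rot[0][2] - rot[2][0]),
        scale * (rot[1][0] - rot[0][1]),
    ]


# A rotation of 90 degrees about the z axis.
rot_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(axis_angle(rot_z))
```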
The following figures show the (rendered) particles of the filters when the backflip occurs. For each particle:
- bottom left: the cosine similarity with the observation y_t
- bottom right: the weight of the particle
The meanings of the colors of the borders are the following:
- black: particles obtained with the prediction step
- red: particles obtained with AAETL or uniform resampling
- blue: MAP estimate that comes from a particle obtained with the prediction step
- green: MAP estimate that comes from a particle obtained with AAETL or uniform resampling
Particles of PF-AAE w/ AAETL resampling when the backflip occurs (4.2 s):
Particles of PF-AAE w/ uniform resampling when the backflip occurs (4.2 s):
This work has been tested with the following two datasets:
- YCB_Video: used for the 3D models of the objects being tracked.
- Pascal VOC 2012: used for the augmentation of the input images during training.
We use Pyrender + EGL for object rendering. Unlike the original AugmentedAutoencoder code, this renderer supports 3D models with textures. Please make sure that the mesh vertices are expressed in meters before launching the training procedure.
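A rough way to spot a mesh that is not in meters is to look at the size of its bounding box. The sketch below is a heuristic only: the 5 m limit, the millimeter scale factor, and the function names are assumptions, and the vertices are expected to be already loaded as (x, y, z) tuples.

```python
def max_extent(vertices):
    """Largest side of the axis-aligned bounding box of (x, y, z) vertices."""
    return max(
        max(v[axis] for v in vertices) - min(v[axis] for v in vertices)
        for axis in range(3)
    )


def probably_not_meters(vertices, limit_m=5.0):
    """Heuristic: a tabletop object wider than limit_m is likely in mm or cm."""
    return max_extent(vertices) > limit_m


def to_meters(vertices, scale=0.001):
    """Rescale vertices, by default from millimeters to meters."""
    return [tuple(c * scale for c in v) for v in vertices]


# Two opposite corners of a box-shaped object, with made-up extents in millimeters.
corners_mm = [(0.0, 0.0, 0.0), (160.0, 60.0, 210.0)]
if probably_not_meters(corners_mm):
    corners_m = to_meters(corners_mm)
```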
The main changes from the original AugmentedAutoencoder code are in the following files:
├── auto_pose
│   ├── ae
│   │   ├── ae_latent_exploration.py
│   │   ├── ae_latent_study.py
│   │   ├── cfg
│   │   │   ├── train_template_aae.cfg
│   │   │   └── train_template_aae_tl.cfg
│   │   └── ...
│   ├── pf
│   │   ├── pf_generate_sequences.py
│   │   ├── pfilter_aae.py
│   │   ├── pfilter.py
│   │   ├── pf_tracking_sequences.py
│   │   └── utils.py
│   ├── renderer
│   │   └── renderer.py
│   └── ...
├── scripts
│   ├── ae_embedding
│   ├── ae_latent_exploration
│   ├── ae_latent_study
│   ├── ae_training
│   ├── pf_sequences
│   └── pf_tracking
├── setup.py
└── ...
We provide hereafter a brief overview of the code structure:
- auto_pose/ae contains some new code to train and study the AAETL architecture, along with the original AAE.
- auto_pose/pf contains the main code that implements the PF-AAE architecture.
- auto_pose/renderer contains the interface with Pyrender, which supports textured models.
- scripts contains some examples of shell scripts to configure and use the PF-AAE, AAE, AAETL architectures.
We refer to the code documentation for more details.
This project has been developed during my internship at the Istituto Italiano di Tecnologia (IIT), within the Humanoid Sensing and Perception group (HSP). I am sincerely thankful to my supervisors for all their support and suggestions to carry out the work.
This code is licensed under the MIT License; see the LICENSE file for more details.
[1] Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, and Rudolph Triebel, Implicit 3D Orientation Learning for 6D Object Detection from RGB Images, The European Conference on Computer Vision (ECCV), September 2018.
[2] Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Timothy Bretl, and Dieter Fox, PoseRBPF: A Rao-Blackwellized Particle Filter for 6-D Object Pose Tracking, 2019.
[3] Simo Särkkä, Bayesian Filtering and Smoothing, Cambridge University Press, USA, 2013.