<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description"
content="Gripper-Aware Grasping: End-Effector Shape Context for Cross-Gripper Generalization">
<meta name="author" content="Alina Sarmiento,
Anthony Simeonov,
Pulkit Agrawal">
<title>Gripper-Aware Grasping: End-Effector Shape Context for Cross-Gripper Generalization</title>
<!-- Bootstrap core CSS -->
<!--link href="bootstrap.min.css" rel="stylesheet"-->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"
integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<!-- Custom styles for this template -->
<link href="offcanvas.css" rel="stylesheet">
</head>
<body>
<div class="jumbotron jumbotron-fluid">
<div class="container"></div>
<h2>Gripper-Aware Grasping:<br> End-Effector Shape Context for Cross-Gripper Generalization</h2>
<h3>IROS IPPC 2023</h3>
<hr>
<p class="authors">
<a href="https://alinasarmiento.github.io/"> Alina Sarmiento</a>,
<a href="https://anthonysimeonov.github.io/"> Anthony Simeonov</a>,
<a href="http://people.csail.mit.edu/pulkitag/"> Pulkit Agrawal</a></br>
Massachusetts Institute of Technology</br>
</p>
<div class="btn-group" role="group" aria-label="Top menu">
<a class="btn btn-primary" href="https://arxiv.org/abs/2307.04751">Paper</a>
<a class="btn btn-primary" href="https://github.com/anthonysimeonov/rpdiff">Code</a>
</div>
</div>
</div>
<div class="container">
<div class="section">
<div class="vcontainer">
<iframe class='video' src="https://www.youtube.com/embed/x9noTl_aqu0" frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>
</div>
<hr>
<p>
In cluttered, unmodeled environments, many learned manipulation pipelines rely on inherent
knowledge of the robot and end-effector extents to predict or solve for feasible grasp poses
and motion plans. However, these models become specific to a single robot geometry and cannot effectively
generalize to other end-effectors with different feasible grasp distributions.
We present Gripper-Aware GraspNet, a learned pipeline for grasping and manipulating unknown objects in highly
occluded environments, conditioned on gripper geometry.
Our method builds on prior work in learned 6D grasp generation, which was limited to specific
gripper geometries, and can predict grasps that exploit a wide range of gripper extents. We
demonstrate cluttered tabletop picking from a single-view point cloud, showing in simulation that the
predicted grasps make use of the full extents of different end-effectors. We also observe a qualitative
improvement in grasp diversity when using different grippers in the real world.
</p>
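<p>
The sketch below is a minimal illustration of the core idea, not the authors' released code: a grasp-prediction
network that takes an explicit encoding of the gripper's own point cloud as an additional input, so the same
weights can be conditioned on different end-effector geometries. All class and function names are hypothetical.
</p>
<pre><code>
# Minimal sketch (hypothetical names, not the released codebase):
# condition a 6-DoF grasp predictor on an encoding of the gripper geometry.
import torch
import torch.nn as nn


class PointEncoder(nn.Module):
    """Encodes a (B, N, 3) point cloud into a fixed-size feature vector."""

    def __init__(self, out_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points):           # points: (B, N, 3)
        feats = self.mlp(points)         # per-point features: (B, N, out_dim)
        return feats.max(dim=1).values   # permutation-invariant pooling


class GripperConditionedGraspHead(nn.Module):
    """Predicts a grasp (translation + 6D rotation representation) and a
    success score from concatenated scene and gripper features."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.scene_enc = PointEncoder(feat_dim)
        self.gripper_enc = PointEncoder(feat_dim)   # stand-in for a shape-context encoding
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 + 6 + 1),              # xyz, rot6d, score
        )

    def forward(self, scene_pts, gripper_pts):
        z = torch.cat([self.scene_enc(scene_pts),
                       self.gripper_enc(gripper_pts)], dim=-1)
        out = self.head(z)
        return out[:, :3], out[:, 3:9], out[:, 9]


if __name__ == "__main__":
    model = GripperConditionedGraspHead()
    scene = torch.rand(2, 2048, 3)     # single-view scene point cloud
    gripper = torch.rand(2, 512, 3)    # points sampled on the gripper surface
    t, r6d, score = model(scene, gripper)
    print(t.shape, r6d.shape, score.shape)
</code></pre>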
</div>
<div class="section">
<h2>Grasp Generation On Different Grippers</h2>
<hr>
<p>
Grasp generation across various gripper extents.
</p>
<h3>Mug/Rack-multi</h3>
<div class="row justify-content-center">
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/mugrack_miscviz_k5_2_c.mp4" type="video/mp4">
</video>
</div>
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/mugrack_miscviz_k5_3_c.mp4" type="video/mp4">
</video>
</div>
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/mugrack_miscviz_k5_1_c.mp4" type="video/mp4">
</video>
</div>
</div>
<h3>Can/Cabinet</h3>
<div class="row justify-content-center">
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/cancabinet_miscviz_k5_2_c.mp4" type="video/mp4">
</video>
</div>
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/cancabinet_miscviz_k5_3_c.mp4" type="video/mp4">
</video>
</div>
<div class="col-md-4">
<video width="100%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/cancabinet_miscviz_k5_1_c.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="section">
<h2>Real-world Multi-modal Rearrangement via Pick-and-Place</h2>
<hr>
<p>
Rearrangement in the real world using the Franka Panda arm. Each task features scene
objects that offer multiple placement locations. RPDiff is used to produce a set of
candidate placements, and one of the predicted solutions is executed. Running several
executions in sequence shows the ability to find diverse solutions. Our
neural network is trained in simulation and deployed directly in the real world
(we do observe some performance gap due to the sim-to-real distribution shift).
</p>
<!-- <h3>Book/Bookshelf</h3> -->
<div class="row align-items-center">
<div class="col justify-content-center text-center">
<h3>Book/Bookshelf</h3>
<video width="60%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/book_bookshelf_seq1_8x_rdc_lout.mp4" type="video/mp4">
</video>
</div>
</div>
<!-- <h3>Mug/Rack-multi</h3> -->
<div class="row align-items-center">
<div class="col justify-content-center text-center">
<h3>Mug/Rack-multi</h3>
<video width="60%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/mug_rack_multi_seq1_8x_lout.mp4" type="video/mp4">
</video>
</div>
</div>
<!-- <h3>Can/Cabinet</h3> -->
<div class="row align-items-center">
<div class="col justify-content-center text-center">
<h3>Can/Cabinet</h3>
<video width="60%" playsinline="" autoplay="" loop="" preload="" muted="">
<source src="img/can_cabinet_seq1_10x_lout.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
<div class="section">
<h2>External Related Projects</h2>
<hr>
<p>
Check out other projects related to diffusion models, iterative prediction, and rearrangement.<br>
</p>
<div class='row vspace-top'>
<div class="col-sm-3">
<img src='img/external/structdiff.png' class='img-fluid'>
</div>
<div class="col">
<div class='paper-title'>
<a href="https://structdiffusion.github.io/">StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects</a>
</div>
<div>
Combines a diffusion model and an object-centric transformer to construct structures from
partial-view point clouds and high-level language goals, such as "set the table" and "make a line".
Using one multi-task model, this approach builds physically-valid structures without
step-by-step instructions.
</div>
</div>
</div>
</div>
<div class="section">
<h2>Paper</h2>
<hr>
<div>
<div class="list-group">
<a href="https://arxiv.org/abs/2307.04751"
class="list-group-item">
<img src="img/paper-thumb.png" style="width:100%; margin-right:-20px; margin-top:-10px;">
</a>
</div>
</div>
</div>
<div class="section">
<h2>Bibtex</h2>
<hr>
<div class="bibtexsection">
@article{simeonov2023rpdiff,
author = {Sarmiento, Alina
and Simeonov, Anthony
and Agrawal, Pulkit},
title = {Gripper-Aware Grasping: End-Effector Shape Context
for Cross-Gripper Generalization},
journal={arXiv preprint arXiv:2307.04751},
year={2023}
}
</div>
</div>
<hr>
<!-- <footer>
<h2>Acknowledgements</h2>
<p>
We would like to thank NVIDIA Seattle Robotics Lab members and the MIT Improbable AI Lab for their valuable feedback and support in developing this project.
In particular, we would like to acknowledge Idan Shenfeld, Anurag Ajay, and Antonia Bronars for helpful suggestions on improving the clarity of the draft.
This work was partly supported by Sony Research Awards and Amazon Research Awards. Anthony Simeonov is supported in part by the NSF Graduate Research Fellowship.
</p>
<p>Send feedback and questions to <a href="https://anthonysimeonov.github.io">Anthony Simeonov</a></p>
<div class="row justify-content-center">
<p>Website template recycled from <a href="https://www.vincentsitzmann.com/siren/">SIREN</a></p>
</div>
</footer> -->
</div>
<script src="https://code.jquery.com/jquery-3.5.1.slim.min.js"
integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj"
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.0/dist/umd/popper.min.js"
integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo"
crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"
integrity="sha384-OgVRvuATP1z7JjHLkuOU7Xw704+h835Lr+6QL9UvYjZE3Ipu6Tp75j7Bh/kR0JKI"
crossorigin="anonymous"></script>
</body>
</html>