Geometric transformations to pixel location incorrect #8704

Open
Trans-cending opened this issue Feb 23, 2025 · 0 comments
Trans-cending commented Feb 23, 2025

Hello, and thanks for the CARLA 0.9.15 open-source project. I am trying to create a custom dataset with CARLA, and for object detection I need to transform a vehicle's location from CARLA world coordinates to a pixel location in the RGB camera. My camera fov is set to 110 degrees. I followed the Geometric transformations tutorial and issue #56, but the result is incorrect for me. My understanding of the overall pipeline is sketched below, followed by the relevant parts of my code.
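As I read the tutorial, the steps are: transform the world point into camera space with the inverse camera transform, swap from UE4 axes to standard camera axes, multiply by the intrinsic matrix, and divide by depth. A minimal sketch of that understanding (the function name and arguments are mine, not the CARLA API):

    import numpy as np

    def world_point_to_pixel(world_to_camera, K, x, y, z):
        # World point -> camera space (homogeneous coordinates, w = 1 for a point)
        p_cam = world_to_camera @ np.array([x, y, z, 1.0])
        # UE4 axes (x, y, z) -> standard camera axes (y, -z, x)
        p_std = np.array([p_cam[1], -p_cam[2], p_cam[0]])
        # Project with the intrinsic matrix and normalize by depth
        u, v, w = K @ p_std
        return u / w, v / w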

1. I defined a new sensor type called 'VehicleLocation' that computes the transformed vehicle location every few ticks. It defines the intrinsic matrix self.camera_intrinsic and the world-to-camera transformation matrix self.camera_transform_matrix.

    elif sensor_type == "VehicleLocation":
        self.bs_camera = bs_camera
        # self.vehicle = attached  # Store the vehicle reference
        self.candidates_list = candidates_list
        camera_attributes = self.bs_camera.sensor.attributes
        self.image_width = float(camera_attributes['image_size_x'])   # 1720
        self.image_height = float(camera_attributes['image_size_y'])  # 1440
        fov = float(camera_attributes['fov'])  # fov = 110 degrees
        focal_length = self.image_width / (2.0 * np.tan(fov * np.pi / 360.0))
        focal_length_1 = self.image_height / (2.0 * np.tan(90 * np.pi / 360.0))
        self.camera_intrinsic = np.asarray([
            [focal_length, 0, self.image_width / 2],
            [0, focal_length_1, self.image_height / 2],
            [0, 0, 1]
        ])
        self.camera_transform_matrix = np.asarray(
            self.bs_camera.sensor.get_transform().get_inverse_matrix())
        self.world.on_tick(lambda timestamp: self.save_vehicle_location(timestamp, if_save=True))
        return attached
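For comparison, the projection matrix in the CARLA bounding-box tutorial derives a single focal length from the horizontal fov and uses it for both axes, rather than taking a separate vertical focal length from a 90-degree fov as I do above. A sketch of the tutorial's version (it assumes square pixels, so the vertical field of view follows from the aspect ratio):

    import numpy as np

    def build_projection_matrix(w, h, fov_deg):
        # Single focal length derived from the horizontal field of view
        focal = w / (2.0 * np.tan(fov_deg * np.pi / 360.0))
        K = np.identity(3)
        K[0, 0] = K[1, 1] = focal
        K[0, 2] = w / 2.0
        K[1, 2] = h / 2.0
        return K

I am not sure whether my separate focal_length_1 for the vertical axis is valid, so I include the tutorial version here for reference.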

2. The implementation of self.save_vehicle_location. object_camera_location is the location transformed from world to camera coordinates, and location_cam is the estimated pixel location.

    def save_vehicle_location(self, timestamp, if_save=False):
        if self.vehicle_destroyed:  # skip if the vehicle has been destroyed
            return

        self.tics_processing += 1
        if if_save and (self.tics_processing % self.save_period_vehicle_location == 0) and (
                self.tics_processing > self.wait_time):
            location_path = os.path.join(self.vehicle_location_path,
                                         f"location_{self.tics_processing // self.save_period_vehicle_location}.json")

            snapshot = self.world.get_snapshot()
            simulation_time = snapshot.timestamp.elapsed_seconds
            simulation_time_str = str(datetime.timedelta(seconds=simulation_time))
            info_len = len(self.candidates_list)

            location_world = np.zeros([info_len, 3])
            velocity_world = np.zeros([info_len, 3])
            location_cam = np.zeros([info_len, 2])
            velocity_cam = np.zeros([info_len, 2])
            label = np.zeros([info_len])

            for idx, candidate in enumerate(self.candidates_list):
                # There is only one vehicle in the list under the test scenario
                if idx == 0:
                    label[idx] = 1
                location = candidate.get_location()
                velocity = candidate.get_velocity()
                location_world[idx, 0:3] = [location.x, -location.y, location.z]
                velocity_world[idx, 0:3] = [velocity.x, -velocity.y, velocity.z]

                object_camera_location = np.dot(self.camera_transform_matrix, np.asarray(
                    [location.x, location.y, location.z, 1]))

                # Now we must change from UE4's coordinate system to a "standard" one:
                # (x, y, z) -> (y, -z, x)
                # and also drop the fourth (homogeneous) component
                point_camera = np.asarray([object_camera_location[1],
                                           -object_camera_location[2], object_camera_location[0]])

                pixel_location = np.dot(self.camera_intrinsic, point_camera)

                pixel_location[0] = pixel_location[0] / pixel_location[2]
                pixel_location[1] = pixel_location[1] / pixel_location[2]
                location_cam[idx, 0:2] = pixel_location[0:2]

                object_camera_velocity = np.dot(self.camera_transform_matrix, np.array(
                    (velocity.x, velocity.y, velocity.z, 1)))

                # Same axis change and removal of the fourth component for the velocity
                velocity_camera = np.asarray([object_camera_velocity[1],
                                              -object_camera_velocity[2], object_camera_velocity[0]])

                pixel_velocity = np.dot(self.camera_intrinsic, velocity_camera)

                pixel_velocity /= pixel_velocity[2]
                velocity_cam[idx, 0:2] = pixel_velocity[0:2]

            location_data = {
                "Simulation_Time": simulation_time_str,
                "Location_Sionna": location_world,
                "Velocity_Sionna": velocity_world,
                "Location_Camera": location_cam,
                "Velocity_Camera": velocity_cam
            }

            print('location_data', location_data)

            self.location_data = location_data
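One point I am unsure about in the code above: velocity is a direction rather than a point, so perhaps its homogeneous coordinate should be 0 (so the translation part of the 4x4 matrix is not applied), and perhaps the world-to-camera matrix should be re-read every tick in case the camera moves instead of being captured once at creation. A sketch of that variant (my assumption, not something the tutorial states):

    import numpy as np

    def velocity_to_camera_axes(camera_sensor, velocity):
        # Re-read the matrix each tick in case the camera has moved
        world_to_camera = np.asarray(camera_sensor.get_transform().get_inverse_matrix())
        # w = 0: a direction should not be affected by the translation column
        v = world_to_camera @ np.array([velocity.x, velocity.y, velocity.z, 0.0])
        # UE4 axes (x, y, z) -> standard camera axes (y, -z, x)
        return np.array([v[1], -v[2], v[0]])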

3. One captured image is shown below. The print output is:

location_data {'Simulation_Time': '3:17:32.315208', 'Location_Sionna': array([[234.48342896,  35.34738159,  -0.96628958]]), 'Velocity_Sionna': array([[-2.83438492, -0.24608247, -0.03728713]]), 'Location_Camera': array([[769.22414981, 702.88465035]]), 'Velocity_Camera': array([[807.71861529, 690.79130216]])}

(captured RGB camera image)

From the image, the true pixel location is about [1541, 780] instead of the computed [769.22, 702.89]. How can I compute the correct pixel location? Is there a mistake in my code? Thanks a lot!
