Implement RL model for basic SubjuGator tasks #1325

Open
danushsingla opened this issue Feb 9, 2025 · 3 comments
@danushsingla

What needs to change?

We need to implement a basic RL algorithm for SubjuGator using the ROS2 simulation so that the robot can automatically learn how to perform tasks.

How would this task be tested?

  1. Download Python dependencies
  2. Run a stable_baselines3 script that connects to the ROS2 simulation (a minimal sketch is shown below)
  3. Analyze stats about training/evaluation
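
A minimal sketch of what such a script could look like, assuming a future Gymnasium-style wrapper around the ROS2 simulation. `SubjugatorEnv` is named here purely for illustration and does not exist yet, so a built-in continuous-control environment stands in for it:

```python
# Minimal training/evaluation loop with stable_baselines3 (PPO).
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Once the ROS2 wrapper exists, swap this for: env = SubjugatorEnv()
env = gym.make("Pendulum-v1")  # stand-in continuous-control task

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Step 3: basic training/evaluation stats
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
model.save("ppo_subjugator_test")
```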
@danushsingla
Author

I met with Mohana, Will, Daniel, and Keith about our future steps.

I have concluded that we should use Stable Baselines3 and implement PPO through it; that will serve as the way we train and evaluate the model. The next goal is to read through the research papers outlined in Notion and decide what we should feed into the model.

I have outlined the following for Keith to provide. For now, this is the information we think we should be sending to the PPO algorithm:

Possible actions for the robot

- This can be anything the robot does, such as moving forward, backward, or sideways.
- If a movement is continuous (like different thrust levels for moving forward), please state that.

States for the robot

- I need information about what the robot is measuring.
- This can be something as simple as speed or the distance from a target.
- I essentially need to build the entire ROS2 world into a matrix of numbers.
- For these states, I also need the possible range of each value. If the robot has a top speed of 100 mph, state that along with its minimum value (which is negative if it can go backwards).
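
As a concrete illustration of how those actions, states, and ranges would end up in code, here is a rough Gymnasium environment skeleton; every number and dimension in it is a made-up placeholder until Keith supplies the real values:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SubjugatorEnv(gym.Env):
    """Placeholder Gymnasium wrapper around the ROS2 sim; all ranges are example values."""

    def __init__(self):
        # Actions: e.g. continuous thrust commands (surge, sway, heave, yaw),
        # each normalized to [-1, 1] (negative = reverse).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        # States: e.g. [speed (m/s), distance to target (m), heading error (rad)],
        # each with its own min/max -- this is why the ranges matter.
        self.observation_space = spaces.Box(
            low=np.array([-3.0, 0.0, -np.pi], dtype=np.float32),
            high=np.array([3.0, 50.0, np.pi], dtype=np.float32),
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(3, dtype=np.float32), {}

    def step(self, action):
        # The real implementation would publish the action to the ROS2 sim,
        # read sensors back, and compute a reward.
        obs = np.zeros(3, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```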

@danushsingla danushsingla self-assigned this Feb 9, 2025
@willzoo willzoo self-assigned this Feb 9, 2025
@mohana-pamidi mohana-pamidi self-assigned this Feb 9, 2025
@willzoo

willzoo commented Feb 10, 2025

Over this week, I met with Danush, Mohana, and others to discuss the idea of using reinforcement learning algorithms to train SubjuGator as an alternative to writing missions manually. As Danush mentioned in his comment, we decided on PPO as the training algorithm that would work best for us, as opposed to TRPO or GRPO. Although the PPO algorithm is abstracted away by Stable Baselines3, the best first step is to read through the research papers on the Notion so that we understand how it works; I started on the PPO paper this week. Also, this is just my input, but I think it might be best to start by integrating the model with something simple like the ROS2 turtlesim as a test run, before trying to integrate it with something as complex as SubjuGator.
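
To make the turtlesim idea concrete, here is a rough sketch of how a Gymnasium environment could publish velocity commands to turtlesim and read its pose back over ROS2. The goal position, reward, and reset handling are illustrative only, not a finished design:

```python
import numpy as np
import rclpy
import gymnasium as gym
from gymnasium import spaces
from geometry_msgs.msg import Twist
from turtlesim.msg import Pose

class TurtleSimEnv(gym.Env):
    """Toy env: drive the turtle toward an arbitrary goal point."""

    def __init__(self):
        rclpy.init()
        self.node = rclpy.create_node("turtlesim_rl_env")
        self.cmd_pub = self.node.create_publisher(Twist, "/turtle1/cmd_vel", 10)
        self.node.create_subscription(Pose, "/turtle1/pose", self._pose_cb, 10)
        self.pose = Pose()
        self.goal = np.array([8.0, 8.0], dtype=np.float32)  # arbitrary goal
        # Actions: [linear velocity, angular velocity]; states: [x, y] in the ~11x11 world.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=11.0, shape=(2,), dtype=np.float32)

    def _pose_cb(self, msg):
        self.pose = msg

    def step(self, action):
        msg = Twist()
        msg.linear.x = float(action[0])
        msg.angular.z = float(action[1])
        self.cmd_pub.publish(msg)
        rclpy.spin_once(self.node, timeout_sec=0.1)  # let the pose callback fire
        obs = np.array([self.pose.x, self.pose.y], dtype=np.float32)
        dist = float(np.linalg.norm(obs - self.goal))
        reward = -dist                      # closer to the goal = higher reward
        terminated = dist < 0.5
        return obs, reward, terminated, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # A real version would also call turtlesim's reset/teleport service here.
        return np.array([self.pose.x, self.pose.y], dtype=np.float32), {}
```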

@mohana-pamidi

During this week, I was also able to meet with the team and was introduced to the idea of using a modified version of the PPO reinforcement learning algorithm. To gain background on the algorithm and reinforcement learning in general, I began by reading the PPO research paper and saw how each of the requirements (robot states/behaviors) that Danush mentioned fits into and affects the algorithm. I was also able to understand the concept of the adaptive KL penalty coefficient, which is something we talked about implementing with SubjuGator. I agree with Will that we can test this with turtlesim first, since it will be easier to implement, and then migrate the software to the sub. Finally, I am not sure whether we have to build the model completely from scratch or whether there is open-source software we can build from and optimize, but if we plan to optimize it later, we would benefit from starting to think about which areas we could optimize.
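
For reference, the adaptive KL penalty from the PPO paper boils down to a small update rule on the penalty coefficient beta, sketched below with the paper's constants. Note that Stable Baselines3's PPO implements the clipped-surrogate variant (with an optional `target_kl` early-stopping check) rather than this penalty, so using the adaptive KL term would mean adding it ourselves:

```python
# Adaptive KL penalty update from the PPO paper (Schulman et al., 2017, Sec. 4).
# beta scales the KL penalty term in the objective; d_targ is the desired mean KL per update.
def update_kl_penalty(beta: float, mean_kl: float, d_targ: float) -> float:
    if mean_kl < d_targ / 1.5:
        beta /= 2.0   # policy barely moved -> weaken the penalty
    elif mean_kl > d_targ * 1.5:
        beta *= 2.0   # policy moved too far -> strengthen the penalty
    return beta
```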
