Does multimodal-maestro support Distributed Data Parallel (DDP) training? #42

Closed
1 task done
David-19940718 opened this issue Sep 14, 2024 · 4 comments
Labels
question Further information is requested

Comments

@David-19940718

Search before asking

  • I have searched the Multimodal Maestro issues and found no similar feature requests.

Question

Description

I'm interested in using maestro for a project that requires distributed training across multiple GPUs. I'd like to know if the project currently supports Distributed Data Parallel (DDP) training, which is a common approach for scaling deep learning models.

Questions

  1. Does maestro currently support DDP training?
  2. If yes, is there any documentation or examples showing how to set up and use DDP with this project?
  3. If no, are there any plans to implement DDP support in the future?

Additional Context

DDP training can significantly speed up the training process for large models or datasets by utilizing multiple GPUs efficiently. It would be a valuable feature for users working with resource-intensive multimodal models.
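
For reference, this is roughly what DDP looks like in plain PyTorch, independent of maestro. It's a minimal sketch with a toy linear model and random tensors standing in for the real multimodal model and dataset, just to illustrate the per-process setup, the `DistributedSampler` sharding, and the `torchrun` launch:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process it spawns
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # toy model and dataset stand in for the real multimodal model / dataset
    model = torch.nn.Linear(16, 2).to(local_rank)
    model = DDP(model, device_ids=[local_rank])   # gradients are synced across GPUs

    dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
    sampler = DistributedSampler(dataset)         # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                  # reshuffle shards every epoch
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                       # all-reduce happens here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<NUM_GPUS> train_ddp.py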

Environment

  • Operating System: Ubuntu 20.04
  • Python version: 3.10
  • CUDA version (if applicable): 12.1
  • GPU model: NVIDIA RTX 3090

Thank you for your time and consideration!

Additional

No response

@David-19940718 David-19940718 added the question Further information is requested label Sep 14, 2024
@David-19940718
Author

David-19940718 commented Sep 14, 2024

I’ve encountered another question. :(

When training Florence-2 with the default settings, the process appears to proceed without any issues. However, after training completes, the output generated_text is unexpected, for example: '</s><s>9 of spades</s>'.

Could this be a bug in the framework, or is it possible that the model did not converge properly?

import os
import supervision as sv

from maestro.trainer.common.data_loaders.datasets import JSONLDataset
from maestro.trainer.models.florence_2.checkpoints import load_model

# Load the fine-tuned checkpoint and the validation split of the dataset
data_location = "datasets/poker cards.v4i.florence2-od"
processor, model = load_model(model_id_or_path="training/florence-2/1/checkpoints/best")

save_location = "training/florence-2/1/results"
os.makedirs(save_location, exist_ok=True)

ds = JSONLDataset(
    jsonl_file_path=f"{data_location}/valid/annotations.jsonl",
    image_directory_path=f"{data_location}/valid/",
)

image, _ = ds[2]
text = ""  # prompt passed to the processor (left empty here)
task = ""  # task token used during post-processing (left empty here)

# Run generation on a single validation image
inputs = processor(text=text, images=image, return_tensors="pt").to("cuda")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(f"generated_text: {generated_text}")
response = processor.post_process_generation(generated_text, task=task, image_size=image.size)

# Convert the Florence-2 response to supervision detections and draw them
detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)

box_annotator = sv.BoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)

image = box_annotator.annotate(image, detections)
image = label_annotator.annotate(image, detections)
image.thumbnail((600, 600))

# Save the annotated image
save_path = os.path.join(save_location, "annotated_image.png")
image.save(save_path)
print(f"Annotated image saved to: {save_path}")

@David-19940718
Author

[attached screenshot]

@SkalskiP
Collaborator

Hi @David-19940718 👋🏻

At the moment, maestro does not yet support DDP, but we definitely have plans to add that support in the near future.

As for the </s><s>9 of spades</s> output format: this is expected. The Florence-2 model adds these tokens on its own; you need to strip them in postprocessing.
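
For example, one way to drop the sentence tokens before any downstream parsing (a minimal sketch, not maestro's built-in postprocessing):

import re

generated_text = "</s><s>9 of spades</s>"
# drop the <s> / </s> sentence tokens before any downstream parsing
cleaned = re.sub(r"</?s>", "", generated_text).strip()
print(cleaned)  # -> 9 of spades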

@David-19940718
Author

David-19940718 commented Sep 18, 2024

Thank you for your helpful response, @SkalskiP.

Regarding DDP support, I appreciate the update that there are plans to add this capability in the near future. That's great news and will be very useful.

As for the </s><s> tags in the output format, I've actually found that we need to explicitly set certain fields to get the expected results without those extra tags. Specifically, setting the following seems to resolve the issue:

text = "<OD>"
task = "<OD>"

This approach appears to prevent the Florence-2 model from adding those tags on its own. However, I'd be interested to hear if you have any insights on why this works or if there's a more ideal way to handle it.
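
For anyone hitting the same thing, this is roughly how the fix slots into the earlier snippet (processor, model, and image are the same objects as above):

text = "<OD>"   # object-detection task prompt
task = "<OD>"

inputs = processor(text=text, images=image, return_tensors="pt").to("cuda")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# with the task prompt set, post_process_generation returns parsed boxes and labels
response = processor.post_process_generation(generated_text, task=task, image_size=image.size)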

Thanks again for looking into this and providing such a detailed explanation. Your help is much appreciated!
