Questions about pretraining and fine tuning #7
Hi @kobrafarshidi, all of my answers refer to the notebook.
Regards,
Hi Mr Akarsh,
With gratitude,
Hi there,
And if there is any additional dataset, I have tried to make the function
Regards,
Hi again Mr Akarsh, thank you so much. I understand all of your guidance and will follow it. With gratitude,
Hi Mr @uakarsh,
Hi there,
Regards,
Hi Mr Akarsh, regarding the third question: do you mean that boxes, tokenized_words, idx are equivalent to masked_boxes, masked_tokenized_words, temp? Gratefully,
Hi, for the first question, I don't think you need to find anything: you can download the IDL dataset, use the pre-training code as a reference for extracting the OCR for the whole dataset, and then mask it. If you are focusing on the pre-training part, I think trying to match the fine-tuning and pre-training code would only create confusion. For the third question, what I was trying to say is that once you have extracted the features (i.e. from the
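To make the "extract the OCR, then mask it" step concrete, here is a minimal sketch of masking OCR tokens while keeping each token's bounding box aligned. It is not the repository's pre-training code: the function name mask_ocr_tokens, the mask token id, the 15% masking rate, and the choice to leave boxes unmasked are all assumptions for illustration, and it simplifies BERT-style masking by always substituting the mask token.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def mask_ocr_tokens(
    token_ids: Sequence[int],
    boxes: Sequence[Box],
    mask_token_id: int,
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[Box], List[int]]:
    """Hypothetical helper: replace a random subset of OCR tokens with a mask id.

    Returns (masked token ids, boxes, masked indices). Boxes are copied
    unchanged (an assumption), so every token keeps its 2D position and
    the model must predict the masked word from text and layout context.
    """
    if len(token_ids) != len(boxes):
        raise ValueError("every token needs exactly one bounding box")
    rng = random.Random(seed)  # seeded so the masking is reproducible
    masked_ids = list(token_ids)
    masked_idx: List[int] = []
    for i in range(len(masked_ids)):
        if rng.random() < mask_prob:
            masked_ids[i] = mask_token_id
            masked_idx.append(i)
    return masked_ids, list(boxes), masked_idx
```

For example, mask_ocr_tokens([5, 6, 7], [(0, 0, 1, 1)] * 3, mask_token_id=99, mask_prob=1.0) masks every position and returns [99, 99, 99] together with the indices [0, 1, 2]; the index list is what a loss function would use to score only the masked positions.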
Hi,
Hi, I am not able to open the link. But I think the essence would be to write a function that reads the bounding boxes and words for a given PDF and then passes them to the
Regards,
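A reading function of the kind described above could look like the following minimal sketch. It assumes the OCR for one page has already been loaded into a dictionary with hypothetical keys "width", "height" and "words" (each word carrying "text" and a pixel "bbox"); the real OCR format of the dataset may differ. It also assumes boxes are rescaled to a 0 to 1000 range, a convention used by LayoutLM-style models, which may not match what the downstream code expects.

```python
from typing import Dict, List, Tuple


def read_words_and_boxes(ocr_page: Dict, scale: int = 1000) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: turn one page of OCR output into parallel lists.

    Expects ocr_page = {"width": W, "height": H,
                        "words": [{"text": str, "bbox": [x0, y0, x1, y1]}, ...]}
    with bbox in pixels, and returns (words, boxes) with boxes rescaled to 0..scale.
    """
    width, height = ocr_page["width"], ocr_page["height"]
    words: List[str] = []
    boxes: List[List[int]] = []
    for item in ocr_page["words"]:
        text = item["text"].strip()
        if not text:  # drop empty OCR tokens so words and boxes stay aligned
            continue
        x0, y0, x1, y1 = item["bbox"]
        words.append(text)
        boxes.append([
            int(scale * x0 / width),
            int(scale * y0 / height),
            int(scale * x1 / width),
            int(scale * y1 / height),
        ])
    return words, boxes
```

With a 200 x 100 page containing the word "Invoice" at [20, 10, 60, 30], the function returns ["Invoice"] and [[100, 100, 300, 300]]; the two lists can then be handed to whatever tokenization step comes next.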
Hi Mr @uakarsh,
In that case, you may have to find a way to create a CSV file in which each row holds an image entry and the path to the OCR for that image ID. But I guess this is not the case with your dataset. So, is it possible for you to read the file (the one you mentioned in your previous reply in this thread) and access the
I am really not sure how to proceed without seeing a few samples. But what I understand is that you need some way to extract each image_id and its corresponding OCR from the dataset.
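If the images and their OCR files sit in separate folders, the CSV index suggested above could be built with a minimal sketch like this one. The folder layout, the .png and .json extensions, the column names, and the function name build_image_ocr_index are all assumptions; the key idea is only that an image and its OCR share the same file stem (the image ID).

```python
import csv
from pathlib import Path


def build_image_ocr_index(image_dir, ocr_dir, out_csv, image_ext=".png", ocr_ext=".json"):
    """Hypothetical helper: write one CSV row per image that has an OCR file.

    Assumes images/<image_id><image_ext> pairs with ocr/<image_id><ocr_ext>.
    Returns the number of rows written.
    """
    image_dir, ocr_dir = Path(image_dir), Path(ocr_dir)
    rows = []
    for image_path in sorted(image_dir.glob(f"*{image_ext}")):
        image_id = image_path.stem
        ocr_path = ocr_dir / f"{image_id}{ocr_ext}"
        if ocr_path.exists():  # skip images whose OCR is missing
            rows.append({
                "image_id": image_id,
                "image_path": str(image_path),
                "ocr_path": str(ocr_path),
            })
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image_id", "image_path", "ocr_path"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A dataset class can then read this CSV (for example with csv.DictReader) and load the image and OCR for one image_id per item, instead of searching the folders at training time.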
Hi Mr @uakarsh,
Hi, I guess the versions won't play a role, but here are a few things that could help. While constructing the PyTorch Lightning Trainer object, you can do the following:
and many other tricks. You can find all of these on the PyTorch Lightning Trainer documentation page; this is one of the reasons I use PyTorch Lightning. Hope it helps.
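The specific options from the reply above did not survive, so the following is only an assumed example of the kind of Trainer arguments usually meant: memory-saving and logging settings passed as keyword arguments. The argument names match the PyTorch Lightning 1.x Trainer, but the values are illustrative and should be checked against the docs for the installed version. The helper also shows the arithmetic behind gradient accumulation, which is why it lets a small GPU emulate a larger batch.

```python
# Illustrative keyword arguments for pytorch_lightning.Trainer (1.x names);
# the values are assumptions, not taken from the repository.
trainer_kwargs = {
    "max_epochs": 5,
    "precision": 16,               # mixed precision on GPU: lower memory use
    "accumulate_grad_batches": 4,  # sum gradients over 4 batches per optimizer step
    "gradient_clip_val": 1.0,      # clip gradients to this value
    "log_every_n_steps": 10,       # log metrics more often to watch progress
    "limit_train_batches": 0.1,    # quick sanity run on 10% of training batches
}


def effective_batch_size(per_step_batch: int, accumulate_grad_batches: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step under accumulation (and DDP)."""
    return per_step_batch * accumulate_grad_batches * num_devices
```

With these settings a per-step batch of 8 on one GPU behaves like a batch of 32 per optimizer step (effective_batch_size(8, 4) == 32). The dictionary would be passed as pl.Trainer(**trainer_kwargs) after import pytorch_lightning as pl.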
Hi,
Hi Mr @uakarsh,
What was the error when you ran on 2 GPUs?
Hi,
I think this link may be helpful: https://pytorch-lightning.readthedocs.io/en/1.4.0/advanced/multi_gpu.html Perhaps the ddp_spawn strategy is not suitable for the resources you are using.
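If ddp_spawn is the problem, switching to plain DDP is the usual alternative, but the argument names for that changed across PyTorch Lightning releases. The sketch below encodes my understanding of those changes (accelerator="ddp" before 1.5, the strategy argument from 1.5, accelerator="gpu" with devices from 1.6); treat the version boundaries as assumptions and confirm them in the multi-GPU docs for your installed version. The function name ddp_trainer_kwargs is hypothetical.

```python
def ddp_trainer_kwargs(pl_version: str, num_gpus: int = 2) -> dict:
    """Hypothetical helper: Trainer kwargs for script-based DDP on num_gpus GPUs.

    pl_version is a string such as pytorch_lightning.__version__ ("1.4.0").
    """
    major, minor = (int(part) for part in pl_version.split(".")[:2])
    if (major, minor) >= (1, 6):
        return {"accelerator": "gpu", "devices": num_gpus, "strategy": "ddp"}
    if (major, minor) >= (1, 5):
        # 1.5 introduced the `strategy` argument; `gpus` still selects devices
        return {"gpus": num_gpus, "strategy": "ddp"}
    # 1.4 and earlier selected the distributed mode through `accelerator`
    return {"gpus": num_gpus, "accelerator": "ddp"}
```

Usage would be pl.Trainer(**ddp_trainer_kwargs(pl.__version__)). Unlike ddp_spawn, plain DDP re-launches the training script once per GPU, so it is generally meant to be run as a script rather than from inside a Jupyter notebook.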
Hi, thank you so much for your response.
1, 2) I really don't know how to do it. If I'm not mistaken, the tensors in the source code are img, questions, answers, and tokenizers. I ran it with the following and got an error; I think the mistake is mine.
Hi,
Thanks for the great code.
First of all, I am sorry if my questions are very simple and basic.
While going through your code, I encountered an error and came up with some questions. I'd appreciate it if you could help me with them.
The error is in the following line, and I think this error is why I cannot see the training progress or the checkpoints. I would be very grateful for any guidance.