Training code seems to get stuck even for a very small set of input images #25
Comments
It sounds like the input pipeline might be running forever but not producing any data. Since it's not much data, can you show us exactly what your directory structure and files look like with "ls -R"?
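For anyone who cannot easily paste "ls -R" output, a rough Python equivalent is sketched below. It only counts the files in each directory under a root, so an empty or misplaced folder stands out. The helper name `summarize_tree` is hypothetical and is not part of this repository; it makes no assumption about which file layout the training code expects.

```python
import os


def summarize_tree(root):
    """Return {relative_dir: file_count} for every directory under root."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Key each directory by its path relative to root ("." is root itself).
        rel = os.path.relpath(dirpath, root)
        counts[rel] = len(filenames)
    return counts
```

Calling `summarize_tree("stereo-magnification")` and printing the sorted items would show, for instance, whether the images and train directories each contain the number of files you expect.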
@reyet
Best regards,
Hi @reyet, Hi @PuneetKohli, @olivertai, @lolz0r,
@indranilsinharoy did you find a solution for this? I am also having the same issue while training the network.
@bruce-wayne99 Unfortunately, I've not been able to solve it, and I have temporarily moved on to other things. If I were to try again in a different environment, I would use Ubuntu 18.x instead of Ubuntu 19.04 (not sure whether you are using the same OS) and also try a different CUDA version. Just some thoughts.
Sorry for not replying sooner. I did take a look at your directory structure, and it looked correct to me, so I'm afraid I don't know what is going wrong.
@reyet No problem at all. Thank you very much. I had guessed as much :-) Once I get back to it, I'll try some more things (mostly with the environment, I guess). If I do find the problem, I'll surely post it here.
@indranilsinharoy Thanks for the info. The issue was fixed after I changed the CUDA version to 9.0; I was using CUDA 10.0 before.
@bruce-wayne99 Thanks very much for posting your solution here. I hope it will help others who face similar problems. At least I know that is the first thing I must do!
@reyet, @bruce-wayne99 Please feel free to close the issue if you see fit.
Thanks @bruce-wayne99 for figuring that out!
@reyet @indranilsinharoy, I just went through the code briefly, and I think the version is not the issue. If you look at
@bruce-wayne99 Thanks very much. I'll check it out.
Hi @indranilsinharoy,
Thanks very much for sharing the code.
To do a quick test of the training code, I downloaded a few of the YouTube clips from the RealEstate10K dataset and placed the extracted frames in the stereo-magnification\images directory. The corresponding camera files are in the stereo-magnification\train directory.
However, when I try to execute train.py, the program doesn't proceed any further than the session.run() function (I think). I'm copy-pasting the log below (please note that I've removed some of the warning messages related to deprecated functions). I don't see any progress following the line INFO:tensorflow:parameter_count = 16892227 even after waiting for several (over 10) hours. Since I placed just a few (around 25) low-resolution images in the images directory, I was expecting the training to finish within a few hours.
My system's configuration is provided below:
OS: Ubuntu 19.04
Python: 2.7
Tensorflow version: 1.13.1
GPU information:
It would be great if you could provide some insight for solving this issue.
Thank you very much.
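One generic way to see where a silent hang like this is blocked is to have Python dump every thread's stack after a timeout. The sketch below assumes Python 3, where `faulthandler` is in the standard library; the Python 2.7 setup described above would need a backport. The wrapper name `run_with_hang_dump` is hypothetical, and nothing here is specific to TensorFlow or to this repository.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s, log_file=sys.stderr):
    """Run fn(); if it is still running after timeout_s seconds,
    write the stack of every thread to log_file (fn keeps running)."""
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the training entry point this way does not fix anything; it only shows which Python call the process is sitting in when the timeout fires, for example a blocking session.run().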