Yesterday I made my first attempt at training a model:
python object_detection/legacy/train.py --train_dir=CP --pipeline_config_path=faster_rcnn_inception_v2_coco.config
After a short time (10 to 20 seconds) I could no longer enter anything with the mouse or keyboard, and the GPU status display (nvidia-smi) froze. After a few minutes I reset the machine and checked the contents of CP: the directory was no longer empty. What I can see is that the hard drive is working constantly.
I ran the same command a second time, but let the process continue until morning. The CP directory had been updated (up to model.ckpt-491).
Now a few words to describe my configuration:
- CPU: i5
- RAM: 8 GB
- OS: Ubuntu 18.04
- GPU 1: GT 730, used for display
- GPU 2: GTX 1060
nvcc reports V9.0, and nvidia-smi gives:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   34C    P0    N/A /  N/A |    703MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   33C    P8     4W / 120W |      2MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
Initially I installed everything to work with only one GPU (the GT 730, since I did not have the second card at the time). Yesterday I received the new video card and, without my doing anything, it was recognized by nvidia-smi and used directly by TensorFlow, with no other modification.
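Since TensorFlow picked up the new card on its own, one thing I could try is restricting the training process to the GTX 1060 explicitly. This is only a minimal sketch: it assumes the GTX 1060 is index 1, as nvidia-smi lists it, and that the command is launched from the same directory as before. `build_train_env` and `run_training` are hypothetical helper names of mine; `CUDA_DEVICE_ORDER` and `CUDA_VISIBLE_DEVICES` are the standard CUDA environment variables.

```python
import os
import subprocess

# The same training command as above, split into arguments.
TRAIN_CMD = [
    "python", "object_detection/legacy/train.py",
    "--train_dir=CP",
    "--pipeline_config_path=faster_rcnn_inception_v2_coco.config",
]


def build_train_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # Number devices in PCI bus order so the index matches nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the chosen one from the training process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_training(gpu_index=1):
    """Launch training on one GPU (assumption: index 1 is the GTX 1060)."""
    return subprocess.run(TRAIN_CMD, env=build_train_env(gpu_index), check=True)
```

Calling `run_training(1)` would then start the same training run with only the second card visible to TensorFlow. I do not know yet whether this changes the freeze.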
Now my questions:
- Could the issue be that I did not install a driver for the new card (I do not use it for display)?
- Or could some setting in the config file be changed to avoid the issue? So far I have reduced the max size to 600*480 and lowered batch_size to 1.
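To make clear which settings I mean, here is a small sketch that reads them back from the pipeline config text. It assumes that "600*480" corresponds to `max_dimension: 600` and `min_dimension: 480` in the `keep_aspect_ratio_resizer` block, and that the first `batch_size` in the file is the one in `train_config`. The sample text is an illustrative excerpt, not my full config, and `read_resize_and_batch` is a hypothetical helper name.

```python
import re

# Illustrative excerpt only; the real config file contains many more fields.
SAMPLE_CONFIG = """
model {
  faster_rcnn {
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 480
        max_dimension: 600
      }
    }
  }
}
train_config {
  batch_size: 1
}
"""


def read_resize_and_batch(config_text):
    """Return the first min_dimension, max_dimension and batch_size found."""
    values = {}
    for key in ("min_dimension", "max_dimension", "batch_size"):
        # Match 'key: <integer>' and keep only the first occurrence.
        match = re.search(r"\b%s\s*:\s*(\d+)" % key, config_text)
        values[key] = int(match.group(1)) if match else None
    return values


print(read_resize_and_batch(SAMPLE_CONFIG))
```

Running the same function on the contents of faster_rcnn_inception_v2_coco.config should confirm that the edited values are the ones actually in the file.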
Thank you for your help,
Jean-Marie