CUDA out-of-memory problem
```
Traceback (most recent call last):
  File "train.py", line 546, in <module>
    main()
  File "train.py", line 429, in main
    y_pred = model(x, use_gt_durations=True)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 568, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/fastpitch/fastpitch/model.py", line 185, in forward
    dec_out, dec_mask = self.decoder(len_regulated, dec_lens)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/fastpitch/fastpitch/transformer.py", line 289, in forward
    out = layer(out, mask=mask)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/fastpitch/fastpitch/transformer.py", line 241, in forward
    output = self.dec_attn(dec_inp, attn_mask=~mask.squeeze(2))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/fastpitch/fastpitch/transformer.py", line 133, in forward
    return self._forward(inp, attn_mask)
  File "/workspace/fastpitch/fastpitch/transformer.py", line 178, in _forward
    output = self.layer_norm(residual + attn_out)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/wrap.py", line 57, in wrapper
    kwargs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/utils.py", line 81, in casted_args
    new_args.append(cast_fn(x))
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/utils.py", line 74, in maybe_float
    return x.float()
RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 10.76 GiB total capacity; 8.94 GiB already allocated; 65.44 MiB free; 9.57 GiB reserved in total by PyTorch)
```
Solution

The batch size was set too large, so a full batch could not fit into GPU memory at once. Lowering the batch size resolved the problem.
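If you want to keep the same effective batch size while using less GPU memory per step, gradient accumulation is a common alternative to simply lowering the batch size: run several small "micro-batches" and call `optimizer.step()` only after their gradients have accumulated. Below is a minimal sketch of the technique using a stand-in `nn.Linear` model and random data (both are illustrative placeholders, not part of FastPitch):

```python
import torch
import torch.nn as nn

# Stand-in model and data; in real training this would be the FastPitch model
# and batches from its DataLoader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4   # number of micro-batches per optimizer step
micro_batch = 8   # small enough to fit in GPU memory
w0 = model.weight.detach().clone()  # snapshot to show the step updated weights

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 10)
    y = torch.randn(micro_batch, 1)
    # Scale the loss so the accumulated gradient equals the average over the
    # full effective batch of accum_steps * micro_batch samples.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
optimizer.step()     # one update for the whole effective batch
optimizer.zero_grad()
```

This trades extra forward/backward passes for lower peak memory: only one micro-batch's activations live on the GPU at a time, while the optimizer still sees gradients equivalent to the larger batch.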