Any iterator supporting multi variable-length outputs in Pytorch? #2007
Comments
Hi,
Hi @JanuszL, thank you for your reply. When I run the code, I got this error:
In a batch, the data shapes are like np.array([[80 * 218], [80 * 156], [80 * 131], [80 * 109]]). It seems the iterator doesn't support variable-length data. I tried to use
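A common workaround for variable-length samples is to pad every spectrogram to the longest one in the batch so the batch becomes a single rectangular array. A minimal sketch, assuming mel spectrograms of shape (80, T) with the hypothetical lengths mentioned above:

```python
import numpy as np

# Hypothetical batch of mel spectrograms: 80 mel bins, varying time axes,
# mirroring the shapes above (80 x 218, 80 x 156, 80 x 131, 80 x 109).
batch = [np.random.rand(80, t).astype(np.float32) for t in (218, 156, 131, 109)]

# Zero-pad every sample along the time axis to the longest length, then
# stack into one (batch, 80, max_len) array the iterator can handle.
max_len = max(m.shape[1] for m in batch)
padded = np.stack(
    [np.pad(m, ((0, 0), (0, max_len - m.shape[1]))) for m in batch]
)
print(padded.shape)  # (4, 80, 218)
```

The true lengths can be kept in a separate integer array so the model can mask out the padding later.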
That is true, DALI doesn't support variable-length data in PyTorch. However, PaddlePaddle and MXNet (Gluon) have such support. TensorFlow supports this only on the CPU.
Another question is:
What's the reason for this error? @JanuszL
It seems that you are trying to pad a datatype that is not supported by the Pad operator.
@Approximetal Most of our operators don't support float64 data.
Hi @jantonguirao, I checked the data; it is float32, and the format is like this. The documentation says the parameter is
@Approximetal The error message you shared with us earlier points to the fact that the input to the Pad operator was in float64 format, which is also the default in NumPy:
vs
The other TypeError message is probably not coming from the Pad operator but from somewhere else. If you can share a reproducible code sample (with some sample data), we could analyze it and find the problem.
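The float64 default is easy to trip over: NumPy infers float64 from Python floats, so the data must be cast explicitly. A quick illustration:

```python
import numpy as np

# NumPy defaults to float64 for Python floats:
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.dtype)  # float64

# Cast explicitly to float32 before feeding the pipeline:
b = a.astype(np.float32)
print(b.dtype)  # float32
```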
Do you refer to:
error?
Not this one. I mean, I don't know why the pad function cannot be used, so I checked the documentation, and it said the data format should be a TensorList. So I used torch.from_numpy to convert the data and then got
I don't get it; torch.from_numpy creates a Torch tensor, and DALI cannot handle that as an input. Maybe you want to use the .numpy() method instead?
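In other words, the conversion should go in the opposite direction: from a torch tensor back to a numpy array. A minimal sketch (the tensor here is a hypothetical stand-in for the mel data):

```python
import numpy as np
import torch

# ExternalSource wants numpy arrays on the CPU, not torch tensors.
# If the data already lives in a torch CPU tensor, convert it back:
t = torch.zeros(80, 218, dtype=torch.float32)
a = t.numpy()  # shares memory with the tensor, no copy

print(type(a).__name__)  # ndarray
print(a.dtype)  # float32
```

Note that `.numpy()` only works on CPU tensors; a GPU tensor would need `.cpu()` first.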
Just want to confirm the expected input format?
Sure, it can be either a list of batch-size numpy arrays, where each array corresponds to one tensor, or one numpy array whose outermost dimension corresponds to the batch size.
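The two accepted forms side by side, as a small sketch (shapes are hypothetical):

```python
import numpy as np

batch_size = 4

# Form 1: a list of batch_size arrays, one array per sample.
as_list = [np.zeros((80, 100), dtype=np.float32) for _ in range(batch_size)]

# Form 2: one array whose outermost dimension is the batch size.
as_array = np.zeros((batch_size, 80, 100), dtype=np.float32)

print(len(as_list), as_array.shape[0])  # 4 4
```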
@jantonguirao I found the dtype of the id info is int64, I changed it to float32, and it works. Thanks~ |
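The fix described above, sketched with a hypothetical speaker-ID array (NumPy infers a platform integer type, typically int64 on Linux, from Python ints):

```python
import numpy as np

# Hypothetical speaker-ID labels; NumPy infers an integer dtype here.
speaker_id = np.array([3, 7, 1, 5])
print(speaker_id.dtype)

# Casting to float32 avoids the unsupported-dtype error mentioned above.
speaker_id = speaker_id.astype(np.float32)
print(speaker_id.dtype)  # float32
```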
Hi, I met a new error. The code is like: What does this error mean? How can I change the data format?
It looks like your NumPy array has strides and its memory is not contiguous. You can try copying the array and check whether the strides have changed; see, for example, this explanation for more info.
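Non-contiguous arrays typically come from views such as transposes or slices; copying lays the data out contiguously again. A small sketch:

```python
import numpy as np

a = np.zeros((80, 218), dtype=np.float32)
t = a.T  # transposing creates a non-contiguous view with swapped strides
print(t.flags['C_CONTIGUOUS'])  # False

# np.ascontiguousarray copies the data into a C-contiguous layout:
c = np.ascontiguousarray(t)
print(c.flags['C_CONTIGUOUS'])  # True
```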
I use |

Hi, I met issues when using DALI to replace the PyTorch dataloader. My task is to feed mel spectrograms with several labels into a training model. I first hoped to pass a Python class with labels (including IDs and strings) and GPU tensors into the pipeline, but it said ExternalSource accepts input only on the CPU (via numpy array), so I changed the data into several numpy arrays. Then I found the iterators (DALIGenericIterator and DALIClassificationIterator) support only one or two outputs in a single pipeline, but a batch in my model contains at least 6 (mel_inputs, input_lengths, mel_target, output_lengths, speaker_id, gate_padded). And each mel spectrogram has a different length.

I would like to ask how I can feed these data into a GPU-based training model. Do I need to write a custom function to support this? If yes, may I have some guidance?

Here is my code:
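As an illustration only (not the author's code, and not DALI API), here is one way the six per-batch outputs named above could be assembled as CPU numpy arrays before being handed to a pipeline; `make_batch`, the gate construction, and the use of `mel_inputs` as its own target are hypothetical stand-ins:

```python
import numpy as np

def make_batch(mels, speaker_ids):
    """Assemble six numpy outputs from variable-length mel spectrograms."""
    input_lengths = np.array([m.shape[1] for m in mels], dtype=np.int32)
    max_len = int(input_lengths.max())
    # Zero-pad each (80, T) spectrogram to (80, max_len), then stack.
    mel_inputs = np.stack(
        [np.pad(m, ((0, 0), (0, max_len - m.shape[1]))) for m in mels]
    ).astype(np.float32)
    # Gate target: 1.0 from each sample's true end onward (a common TTS
    # convention); purely illustrative here.
    gate_padded = np.zeros((len(mels), max_len), dtype=np.float32)
    for i, length in enumerate(input_lengths):
        gate_padded[i, length - 1:] = 1.0
    return (mel_inputs, input_lengths, mel_inputs.copy(), input_lengths.copy(),
            np.asarray(speaker_ids, dtype=np.float32), gate_padded)

mels = [np.random.rand(80, t).astype(np.float32) for t in (218, 156)]
outs = make_batch(mels, [0, 1])
print(len(outs))  # 6
```

Each of the six arrays is then a plain CPU numpy array of a supported dtype, which sidesteps the float64/int64 and torch-tensor issues discussed in the comments above.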