I had 77 pairs of sequences and sequence responses in MATLAB. I had two cell arrays, sequences and responses, of dimension 77×1. Each cell held a 10×10,000 array. I created options, layers, and hyperparameters, and executed
[net, info] = trainNetwork(sequences, responses, layers, options);
The network trained. Things worked.
I got more data, hundreds of pairs. I could still train the network, but I was rapidly approaching the memory limit on my HPC. I wanted to use datastores.
Script 1 (S1)
save('sequences.mat','sequences');
save('responses.mat','responses');
S2
AData = fileDatastore('sequences.mat','ReadFcn',@load);
BData = fileDatastore('responses.mat','ReadFcn',@load);
CData = combine(AData, BData);
… %stuff
[net, info] = trainNetwork(CData, layers, options);
Error using trainNetwork (line 184)
Invalid training data. Predictors must be a N-by-1 cell array of sequences, where N is the number of sequences. All sequences
must have the same feature dimension and at least one time step.
Error in S1 (line ##)
[net, info] = trainNetwork(CData, layers, options);
Or in English, it did not work.
Using preview, I got this:
ans =
1×2 cell array
{1×1 struct} {1×1 struct}
First, load returns a struct, not the variable itself. I needed a de-struct-ing function.
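A minimal sketch of such a de-struct-ing read function, assuming each file was saved with a known variable name (here `sequences`, as in S1). `loadVar` is a hypothetical name chosen for illustration; `getfield` just returns one field of the struct that `load` produces, and the data below is a small random stand-in for the real arrays.

```matlab
% Hypothetical helper: load one named variable from a .mat file and return
% the variable itself, not the struct that load() wraps it in.
loadVar = @(fname, varName) getfield(load(fname, varName), varName);

% Small synthetic stand-in for the real sequences cell array
sequences = {rand(10, 5); rand(10, 5)};
f = [tempname '.mat'];
save(f, 'sequences');

asStruct = load(f);                 % 1x1 struct with a field named "sequences"
asData   = loadVar(f, 'sequences'); % the 2x1 cell array itself
```

In a datastore this would be passed as `'ReadFcn', @(f) loadVar(f, 'sequences')`. With the single-file layout from S1, it only removes the struct layer: one read still returns the whole cell array.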
Now, I got this.
preview(CData)
ans =
1×2 cell array
{259×1 cell} {259×1 cell}
Second, combine creates another cell array, so I had a 1×2 cell array (CData), each cell holding a 259×1 cell array (everything in AData and BData, returned in a single read), and only inside those were the actual 10×10,000 arrays.
The solution to the latter was saving each cell's contents as its own file, with one variable per file, that variable being the 10×10,000 array, NOT A CELL.
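The difference between the two layouts shows up in what a single `read` returns. This is a sketch with synthetic data (three 10×50 random arrays standing in for the 10×10,000 ones); the temporary folder, the `allSequences.mat` name, and the zero-padded file names are assumptions for illustration only.

```matlab
% Synthetic stand-in data: 3 sequences of 10x50 (the real ones were 10x10,000)
dataDir = tempname;
mkdir(dataDir);
sequences = {rand(10, 50); rand(10, 50); rand(10, 50)};

% Layout 1 (like S1): one file holding the whole cell array
save(fullfile(dataDir, 'allSequences.mat'), 'sequences');
oneFile = fileDatastore(fullfile(dataDir, 'allSequences.mat'), ...
    'ReadFcn', @(f) getfield(load(f), 'sequences'));
whole = read(oneFile);   % 3x1 cell: one read returns every sequence at once

% Layout 2 (like S3): one file per sequence, each holding a plain numeric array
for n = 1:numel(sequences)
    sequence1 = sequences{n};
    save(fullfile(dataDir, sprintf('sequences%04d.mat', n)), 'sequence1');
end
perFile = fileDatastore(fullfile(dataDir, 'sequences*.mat'), ...
    'ReadFcn', @(f) getfield(load(f), 'sequence1'));
first = read(perFile);   % 10x50 double: one read returns one sequence
```

With layout 1 each "observation" is a cell of cells; with layout 2 it is the numeric sequence itself.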
S3
%file manip, mkdir, addpath, etc.
for n = 1:length(sequences)
    sequence1 = sequences{n,1};
    response1 = responses{n,1};
    save(strcat('sequences', string(n), '.mat'), 'sequence1');
    save(strcat('responses', string(n), '.mat'), 'response1');
end
AND THEN running S4
%file manip, preprocessing, etc.
getVarFromStruct = @(strct,varName) strct.(varName);
xds = fileDatastore("sequences*.mat","ReadFcn",@(fname) getVarFromStruct(load(fname),"sequence1"),"FileExtensions",".mat");
yds = fileDatastore("responses*.mat","ReadFcn",@(fname) getVarFromStruct(load(fname),"response1"),"FileExtensions",".mat");
%options, layers, hp, etc.
CData = combine(xds, yds);
[net, info] = trainNetwork(CData, layers, options);
And it worked.
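In the preview of CData above, each read produced two cell arrays OF CELLS. trainNetwork doesn't want cells of cells; for each observation it wants the data itself, a numeric 10×10,000 sequence paired with its response. So the extra layer of cells caused all those errors.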
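A quick check before a long training run is to read one observation from the combined datastore and confirm it is a 1×2 cell of numeric arrays, which matches the shape of the run that worked. This sketch uses synthetic 10×50 pairs in a temporary folder; the zero-padded file names are an assumption (not what S3 does) that simply makes file order match observation number in both datastores.

```matlab
% Synthetic stand-in data: 4 pairs of 10x50 arrays (the real ones were 10x10,000)
dataDir = tempname;
mkdir(dataDir);
numObs = 4;
for n = 1:numObs
    sequence1 = rand(10, 50);
    response1 = rand(10, 50);
    % Zero-padded names keep sequences and responses listed in the same order
    save(fullfile(dataDir, sprintf('sequences%04d.mat', n)), 'sequence1');
    save(fullfile(dataDir, sprintf('responses%04d.mat', n)), 'response1');
end

getVarFromStruct = @(strct, varName) strct.(varName);
xds = fileDatastore(fullfile(dataDir, 'sequences*.mat'), ...
    'ReadFcn', @(f) getVarFromStruct(load(f), 'sequence1'), 'FileExtensions', '.mat');
yds = fileDatastore(fullfile(dataDir, 'responses*.mat'), ...
    'ReadFcn', @(f) getVarFromStruct(load(f), 'response1'), 'FileExtensions', '.mat');
CData = combine(xds, yds);

obs = read(CData);   % one observation: {predictor, response}
reset(CData);        % rewind so training starts from the first file
```

If `obs{1}` turns out to be a cell rather than a numeric array, the extra layer of cells is still there.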
That took me weeks, and someone else had to explain it.