Hi,
I have a training dataset which is 67% of all my data. Then an evaluation dataset which is 33%.
They’ve been randomly shuffled. Somehow, there are some values in the evaluation dataset which didn’t appear in training, and that is causing the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[4] = 54 is not in [0, 53)
After some googling, this seems to be because not all of the vocabulary values appear in the training dataset: the vocabulary built from training has 53 entries, so valid indices are 0–52, and index 54 is out of range. I want to just extend the vocab size but I’m unsure how to do it.
The relevant lines of code would be these ones I think:
for feature_name in CATEGORICAL_COLUMNS:
    # gets a list of all unique values from the given feature column
    vocabulary = dftrain[feature_name].unique()
    feature_columns.append(
        tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
The vocabulary here is a list of values, not a scalar length, so I can’t simply add to it.
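(For anyone landing here with the same error: one approach, not from the original post, is the num_oov_buckets parameter of tf.feature_column.categorical_column_with_vocabulary_list, which reserves extra indices after the vocabulary for values never seen in training, e.g. categorical_column_with_vocabulary_list(feature_name, vocabulary, num_oov_buckets=1). The idea can be sketched in plain Python without TensorFlow:)

```python
def lookup(value, vocabulary, num_oov_buckets=1):
    """Map a value to an integer index.

    In-vocabulary values get their position in the list; unseen values
    hash into one of num_oov_buckets extra slots appended after the
    vocabulary, mimicking what num_oov_buckets does in TensorFlow's
    categorical_column_with_vocabulary_list.
    """
    if value in vocabulary:
        return vocabulary.index(value)
    # Out-of-vocabulary: deterministic bucket in [len(vocab), len(vocab) + num_oov_buckets)
    return len(vocabulary) + hash(value) % num_oov_buckets

vocab = ["a", "b", "c"]
print(lookup("b", vocab))  # 1 — in-vocabulary, index within the list
print(lookup("z", vocab))  # 3 — unseen, lands in the single OOV bucket
```

With num_oov_buckets=1 every unseen value maps to the same extra index, so the embedding/indicator size becomes len(vocabulary) + 1 and the "indices[i] = N is not in [0, M)" error goes away for evaluation-only values.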
Any ideas or more information required?
submitted by /u/Cwlrs