Sometimes the label for each example is a list of strings, something like ['text1', 'text2']. Say it's an image dataset, and each image is labeled with every object that appears in it.
Creating dataset
The TensorFlow documentation shows that we'll need to use tf.train.BytesList() for this.
import tensorflow as tf

def _bytes_feature(values):
    values = [v for v in values if v is not None]  # I added this
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
Here values is a list of byte strings, so string labels have to be encoded first, i.e., something like this.
def item_to_example(item):
    values = item['image_labels']  # list of dicts, each holding a 'label' string
    values = [d['label'].encode('utf-8') for d in values]  # BytesList needs bytes
    features = {'label': _bytes_feature(values)}
    example = tf.train.Example(features=tf.train.Features(feature=features))
    return example.SerializeToString()
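A quick way to check what ends up in the record is to decode the serialized bytes back into an Example proto. This is only a minimal sketch: `labels_to_example` is a hypothetical helper that takes plain strings rather than the item dict above, and the labels are made up.

```python
import tensorflow as tf


def labels_to_example(labels):
    """Serialize a list of Python strings as one bytes_list feature."""
    values = [s.encode('utf-8') for s in labels if s is not None]
    feature = tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    example = tf.train.Example(
        features=tf.train.Features(feature={'label': feature}))
    return example.SerializeToString()


serialized = labels_to_example(['cat', 'dog'])

# Decode the bytes back into an Example proto and read the labels out again.
decoded = tf.train.Example.FromString(serialized)
round_trip = [v.decode('utf-8')
              for v in decoded.features.feature['label'].bytes_list.value]
print(round_trip)
```

If the printed list matches what went in, the encoding side is fine and any later problem is in the parsing step.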
Training
Later, when constructing the tf.data.Dataset, we need a function that parses each example. Because the number of ground-truth labels varies from image to image in our example, we need to use tf.io.VarLenFeature.
def parse_example(example):
    parsed = tf.io.parse_single_example(
        example, {'label': tf.io.VarLenFeature(tf.string)}
    )
    return parsed['label']
Interestingly, this parsing returns not a list of strings or whatever, but a sparse tensor (tf.sparse.SparseTensor). You will probably want to convert it into a dense tensor.
dense_labels = tf.sparse.to_dense(sparse_tensor_labels, default_value=b'')
# now a 1-D tensor of length N in which each element is a tf.string
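To see the write, parse, and densify steps working together, here is a small round trip through a temporary TFRecord file. It is a sketch that assumes TensorFlow 2.x with eager execution (so the dataset can be iterated directly); `make_example`, `parse_to_dense`, the file name, and the labels are all made up for illustration.

```python
import os
import tempfile

import tensorflow as tf


def make_example(labels):
    """Build an Example with a variable-length list of string labels."""
    feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(
            value=[s.encode('utf-8') for s in labels]))
    return tf.train.Example(
        features=tf.train.Features(feature={'label': feature}))


def parse_to_dense(serialized):
    """Parse one record and convert the sparse labels to a dense tensor."""
    parsed = tf.io.parse_single_example(
        serialized, {'label': tf.io.VarLenFeature(tf.string)})
    return tf.sparse.to_dense(parsed['label'], default_value=b'')


# Write two records with a different number of labels each.
path = os.path.join(tempfile.mkdtemp(), 'labels.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for labels in (['cat'], ['dog', 'car', 'cat']):
        writer.write(make_example(labels).SerializeToString())

# Read them back; each element is a 1-D string tensor of its own length.
dataset = tf.data.TFRecordDataset(path).map(parse_to_dense)
results = [[v.decode('utf-8') for v in t.numpy()] for t in dataset]
print(results)
```

Each element keeps its own length here because nothing is batched yet.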
You might then use a lookup table to convert these strings to one-hot vectors.
# before data loading (tf.contrib exists only in TensorFlow 1.x)
all_the_labels = ['cat', 'dog', 'car', 'all the possible objects are', 'here']
table = tf.contrib.lookup.index_table_from_tensor(
    mapping=tf.constant(all_the_labels),
    num_oov_buckets=1,
    default_value=-1
)
depth = len(all_the_labels) + 1  # +1 for the out-of-vocabulary bucket

# during data loading..
def labels_to_n_hot(sparse_tensor_labels):
    dense_labels = tf.sparse.to_dense(sparse_tensor_labels, default_value=b'')
    indices = table.lookup(dense_labels)  # one index per label
    # if you want to sum up all the one-hot vectors into one N-hot vector,
    out = tf.reduce_sum(tf.one_hot(indices, depth=depth, dtype=tf.float32), axis=0)
    return tf.clip_by_value(out, 0, 1)  # just to be safer
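Note that tf.contrib was removed in TensorFlow 2.x. A rough equivalent there can be built with tf.lookup.StaticVocabularyTable. The following is a sketch under that assumption, not the original setup: the three-label vocabulary and the `labels_to_n_hot` name are made up for illustration.

```python
import tensorflow as tf

all_the_labels = ['cat', 'dog', 'car']

# Map each known label to its position; unknown labels share one extra bucket.
initializer = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(all_the_labels),
    values=tf.range(len(all_the_labels), dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets=1)
depth = len(all_the_labels) + 1  # +1 for the out-of-vocabulary bucket


def labels_to_n_hot(dense_labels):
    """Turn a 1-D string tensor of labels into a single N-hot vector."""
    indices = table.lookup(dense_labels)
    one_hots = tf.one_hot(indices, depth=depth, dtype=tf.float32)
    # Summing merges the one-hot rows; clipping keeps repeated labels at 1.
    return tf.clip_by_value(tf.reduce_sum(one_hots, axis=0), 0.0, 1.0)


n_hot = labels_to_n_hot(tf.constant(['cat', 'car', 'cat']))
print(n_hot.numpy())
```

One thing to watch for: any label missing from the vocabulary lands in the last slot, and that includes an empty string used as padding.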
It was pretty hard to Google all of these, so writing up here for me and you 🙂
Reference – this book – page 428 etc.