Embedding
The Embedding module uses RocksDB to store embedding values in key-value (KV) format. The key of a feature is an int64_t; the value is a list of floating-point numbers together with some metadata.
Key and Group
All features are discretized and represented by unique int64_t values. We use a group to represent features of the same type. Different groups can have different optimizers, initializers, and dimensions.
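The keying scheme can be sketched as follows. `feature_key` is a hypothetical helper (not part of damo) showing one common way to discretize a raw feature string into a stable int64 key; the group ids are illustrative:

```python
import hashlib

# Hypothetical discretization helper (not provided by damo): map a raw
# feature string to a stable non-negative int64 key. Collisions are
# possible but rare with 63 bits of hash.
def feature_key(raw: str) -> int:
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") & 0x7FFFFFFFFFFFFFFF

# Features of the same type share a group; each group can use its own
# optimizer, initializer, and dimension.
USER_GROUP, ITEM_GROUP = 0, 1

user_key = feature_key("user_id=10057")
item_key = feature_key("item_id=42")
```

Any stable hash works here; the only requirements from the document are that keys are int64 and unique per discretized feature.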
Value
struct MetaData {
  int group;            // feature group id
  int64_t key;          // feature key
  int64_t update_time;  // timestamp of the last update
  int64_t update_num;   // number of updates so far
  int dim;              // embedding dimension
  float data[];         // embedding weights (flexible array member)
};
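To make the layout concrete, here is a sketch of how such a record could be serialized and parsed in Python. The `<i4xqqqi` format string is an assumption: it supposes little-endian byte order and 4 padding bytes after `group` so the int64_t fields are 8-byte aligned, which depends on the compiler and platform:

```python
import struct

# Assumed byte layout: int group, 4 pad bytes, three int64_t fields,
# int dim, then dim float32 values. Verify against the actual build
# before relying on this.
HEADER = "<i4xqqqi"  # group, pad, key, update_time, update_num, dim

def pack_meta(group, key, update_time, update_num, data):
    header = struct.pack(HEADER, group, key, update_time, update_num, len(data))
    body = struct.pack(f"<{len(data)}f", *data)
    return header + body

def unpack_meta(buf):
    size = struct.calcsize(HEADER)
    group, key, t, n, dim = struct.unpack(HEADER, buf[:size])
    data = struct.unpack(f"<{dim}f", buf[size:size + 4 * dim])
    return group, key, t, n, list(data)
```

Because `data` is a flexible array member, the record length is the fixed header size plus `4 * dim` bytes of float32 weights.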
TTL
Features that have not been updated for a long time can be deleted by setting a TTL, which RocksDB supports natively. This reduces the size of the model.
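Conceptually, TTL expiry amounts to dropping records whose last update is older than the TTL. The sketch below illustrates this with a plain dictionary; in practice RocksDB performs the pruning itself during compaction, so no such user code is needed:

```python
# Conceptual illustration of TTL pruning (RocksDB does this internally):
# keep only entries whose update_time is within `ttl` seconds of `now`.
def prune(store: dict, ttl: int, now: float) -> dict:
    return {k: v for k, v in store.items() if now - v["update_time"] <= ttl}

store = {
    1: {"update_time": 100.0},  # stale: last updated 900 s ago
    2: {"update_time": 990.0},  # fresh: last updated 10 s ago
}
kept = prune(store, ttl=60, now=1000.0)
```

Here only key 2 survives, since key 1 was last updated longer ago than the TTL allows.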
Usage
How to Create an Embedding
The arguments are listed below:
storage: damo.PyStorage type
optimizer: damo.PyOptimizer type
initializer: damo.PyInitializer type
dimension: int type, dim of embedding
group: int type, in [0, 2^16), default: 0
import damo
storage = damo.PyStorage(...)
optimizer = damo.PyOptimizer(...)
initializer = damo.PyInitializer(...)
dimension = 16
group = 1
embedding = damo.PyEmbedding(storage, optimizer, initializer, dimension, group)
Member Functions of Embedding
Embedding has two member functions, neither of which returns a value:
lookup: pull weight from embedding
The arguments are listed below:
keys: numpy.ndarray type, one dimension, dtype MUST BE np.int64
weights: numpy.ndarray type, one dimension, dtype MUST BE np.float32; the result is stored in this buffer, so its length must be len(keys) * dimension
import numpy as np
# example
n = 8
keys = np.arange(1, n + 1, dtype=np.int64)
# array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64)
weight = np.zeros(n*dimension).astype(np.float32)
embedding.lookup(keys, weight)
# it is easy to extract each key's weight
tmp = weight.reshape((n, dimension))
weight_dict = {k: v for k,v in zip(keys, tmp)}
apply_gradients: push gradients to embedding
The arguments are listed below:
keys: same as in lookup, numpy.ndarray type, one dimension, dtype MUST BE np.int64
gradients: numpy.ndarray type, one dimension, dtype MUST BE np.float32; length must be len(keys) * dimension
import numpy as np
gradients = np.random.random(n*dimension).astype(np.float32)
embedding.apply_gradients(keys, gradients)
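Putting the two calls together, the shape contract can be checked with plain NumPy (no damo required; `n` and `dimension` match the examples above, and the squared-error loss is just an illustration):

```python
import numpy as np

n, dimension = 8, 16

# lookup fills a flat float32 buffer of length n * dimension
weight = np.zeros(n * dimension, dtype=np.float32)
w = weight.reshape(n, dimension)  # per-key view, one row per key

# toy squared-error loss against a zero target: dL/dw = 2 * w
grad = (2.0 * w).reshape(-1).astype(np.float32)
# grad is now flattened the same way lookup filled `weight`,
# which is the layout apply_gradients expects
```

The flattened `grad` would then be passed as `embedding.apply_gradients(keys, grad)`.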