Embedding

The Embedding module uses RocksDB to store embedding values in key-value format. The key of a feature is an int64_t; the value is a list of floating-point numbers plus some metadata (see MetaData below).

Key and Group

All features are discretized and represented by unique int64_t values. We use a group to represent features of the same type; different groups can have different optimizers, initializers, and dimensions.
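
As an illustration only (the hashing scheme below is an assumption, not damo's actual discretization), a raw feature string can be mapped to a non-negative int64 key like this:

import hashlib

def feature_to_key(raw_feature):
    # illustrative discretization: hash a raw feature string to a
    # non-negative int64 key (masking keeps the sign bit clear)
    digest = hashlib.md5(raw_feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") & 0x7FFFFFFFFFFFFFFF

key_city = feature_to_key("city=Beijing")  # e.g. assigned to group 0
key_tag = feature_to_key("tag=sports")     # e.g. assigned to group 1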

Value

struct MetaData {
  int group;            // group id of the feature
  int64_t key;          // feature key
  int64_t update_time;  // time of the last update
  int64_t update_num;   // number of updates so far
  int dim;              // embedding dimension
  float data[];         // embedding weights (flexible array member)
};
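
As a rough sketch of the layout (assuming a typical 64-bit platform and the usual sizeof(header) + dim * sizeof(float) allocation idiom; damo's actual serialization may differ), the per-key value size can be estimated like this:

import ctypes

class MetaDataHeader(ctypes.Structure):
    # mirror of MetaData without the flexible array member data[]
    _fields_ = [
        ("group", ctypes.c_int),
        ("key", ctypes.c_int64),
        ("update_time", ctypes.c_int64),
        ("update_num", ctypes.c_int64),
        ("dim", ctypes.c_int),
    ]

dim = 16
# approximate per-key value size: fixed header plus dim float32 weights
value_size = ctypes.sizeof(MetaDataHeader) + dim * ctypes.sizeof(ctypes.c_float)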

TTL

Features that have not been updated for a long time can be deleted by setting a TTL, a capability provided by RocksDB itself. This reduces the size of the model.
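
Conceptually, an entry expires when its last update is older than the TTL window. The check below is only an illustration; the actual expiry is performed by RocksDB during compaction, and TTL_SECONDS is a hypothetical setting:

import time

TTL_SECONDS = 30 * 24 * 3600  # hypothetical: drop features idle for 30 days

def is_expired(update_time, now=None):
    # conceptual check only: RocksDB performs the real expiry during
    # compaction, not user code
    if now is None:
        now = int(time.time())
    return now - update_time > TTL_SECONDS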

Usage

How to Create an Embedding

The arguments are listed below:

  1. storage: damo.PyStorage type

  2. optimizer: damo.PyOptimizer type

  3. initializer: damo.PyInitializer type

  4. dimension: int type, the dimension of the embedding

  5. group: int type, in [0, 2^16), default: 0

import damo

# storage, optimizer, and initializer take their own arguments,
# elided here
storage = damo.PyStorage(...)
optimizer = damo.PyOptimizer(...)
initializer = damo.PyInitializer(...)
dimension = 16
group = 1
embedding = damo.PyEmbedding(storage, optimizer, initializer, dimension, group)

Member Functions of Embedding

Embedding has two member functions, listed below; neither returns a value.

lookup: pull weights from the embedding

The arguments are listed below:

  1. keys: numpy.ndarray type, one dimension, dtype MUST BE np.int64

  2. weights: numpy.ndarray type, one dimension

    1. dtype MUST BE np.float32

    2. size == embedding_dimension * keys.shape[0]

    The looked-up weights are written into this preallocated array.

import numpy as np

# example
n = 8
keys = np.arange(1, n + 1, dtype=np.int64)
# array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64)

# preallocate the output buffer: dimension floats per key
weight = np.zeros(n * dimension, dtype=np.float32)

embedding.lookup(keys, weight)

# it is easy to extract each key's weight
tmp = weight.reshape((n, dimension))
weight_dict = {k: v for k, v in zip(keys, tmp)}

apply_gradients: push gradients to the embedding

The arguments are listed below:

  1. keys: same as lookup; numpy.ndarray type, one dimension, dtype MUST BE np.int64

  2. gradients: numpy.ndarray type, one dimension

    1. dtype MUST BE np.float32

    2. size == embedding_dimension * keys.shape[0]

import numpy as np

# random gradients, for illustration; n, dimension, keys, and embedding
# come from the lookup example above
gradients = np.random.random(n * dimension).astype(np.float32)

embedding.apply_gradients(keys, gradients)
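
Putting the two calls together, one illustrative training step pulls the current weights, computes a gradient, and pushes it back. The toy objective below is a stand-in for a real loss, not part of damo:

import numpy as np

# pull the current weights for this batch of keys
weights = np.zeros(n * dimension, dtype=np.float32)
embedding.lookup(keys, weights)

# toy objective for illustration: drive every weight toward zero,
# so the gradient is simply the weight itself
gradients = weights.copy()

embedding.apply_gradients(keys, gradients)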