## Optimizer

When using an optimizer, configure the name item to indicate which optimizer to use, and then configure that optimizer's own parameters as described in the sections below. The following table lists the name of each optimizer; a minimal selection sketch follows the table.
| Optimizer | name |
| --- | --- |
| SGD | sgd |
| FTRL | ftrl |
| Adagrad | adagrad |
| Adam | adam |
| AdamW | adamw |
| Lion | lion |
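As a minimal illustration of selecting an optimizer by name, the sketch below assumes the damo.Parameters / damo.PyOptimizer pattern shown in the Example section at the end of this page; the chosen name (adam) is only a placeholder.

```python
import damo

# Sketch: pick an optimizer by its "name" value from the table above.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "adam")  # any of: sgd, ftrl, adagrad, adam, adamw, lion

# Keys that are not set explicitly use the defaults listed in the
# per-optimizer sections below.
opt = damo.PyOptimizer(optimizer_params)
```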
### SGD

SGD is configured with the following parameters:

- $\gamma$: learning rate, default: 1e-3, configure key: gamma
- $\lambda$: weight decay, default: 0, configure key: lambda
### FTRL

FTRL is configured with the following parameters (see the sketch after the list):

- $\gamma$: learning rate, default: 5e-3, configure key: gamma
- $\beta$: beta parameter, default: 0.0, configure key: beta
- $\lambda_1$: L1 regularization, default: 0.0, configure key: lambda1
- $\lambda_2$: L2 regularization, default: 0.0, configure key: lambda2
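A minimal sketch of an FTRL configuration, assuming the damo.Parameters / damo.PyOptimizer pattern from the Example section at the end of this page; each key is set to its documented default.

```python
import damo

# Sketch: FTRL configuration with the documented default values.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "ftrl")
optimizer_params.insert("gamma", 5e-3)    # learning rate
optimizer_params.insert("beta", 0.0)      # beta parameter
optimizer_params.insert("lambda1", 0.0)   # L1 regularization
optimizer_params.insert("lambda2", 0.0)   # L2 regularization

opt = damo.PyOptimizer(optimizer_params)
```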
### Adagrad

Adagrad is configured with the following parameters (see the sketch after the list):

- $\gamma$: learning rate, default: 1e-2, configure key: gamma
- $\lambda$: weight decay, default: 0.0, configure key: lambda
- $\eta$: learning rate decay, default: 0.0, configure key: eta
- $\epsilon$: minimum error term, default: 1e-10, configure key: epsilon
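Under the same assumptions as the FTRL sketch above, an Adagrad configuration with the documented defaults could look like this:

```python
import damo

# Sketch: Adagrad configuration with the documented default values.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "adagrad")
optimizer_params.insert("gamma", 1e-2)      # learning rate
optimizer_params.insert("lambda", 0.0)      # weight decay
optimizer_params.insert("eta", 0.0)         # learning rate decay
optimizer_params.insert("epsilon", 1e-10)   # minimum error term

opt = damo.PyOptimizer(optimizer_params)
```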
### Adam

Adam is configured with the following parameters (amsgrad is not supported; see the sketch after the list):

- $\gamma$: learning rate, default: 1e-3, configure key: gamma
- $\beta_1$: coefficient for the moving average of the gradient, default: 0.9, configure key: beta1
- $\beta_2$: coefficient for the moving average of the squared gradient, default: 0.999, configure key: beta2
- $\lambda$: weight decay rate, default: 0.0, configure key: lambda
- $\epsilon$: minimum error term, default: 1e-8, configure key: epsilon
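Under the same assumptions as the sketches above, an Adam configuration with the documented defaults:

```python
import damo

# Sketch: Adam configuration with the documented default values.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "adam")
optimizer_params.insert("gamma", 1e-3)     # learning rate
optimizer_params.insert("beta1", 0.9)      # moving average of the gradient
optimizer_params.insert("beta2", 0.999)    # moving average of the squared gradient
optimizer_params.insert("lambda", 0.0)     # weight decay rate
optimizer_params.insert("epsilon", 1e-8)   # minimum error term

opt = damo.PyOptimizer(optimizer_params)
```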
### AdamW

AdamW is configured with the following parameters (amsgrad is not supported; see the sketch after the list):

- $\gamma$: learning rate, default: 1e-3, configure key: gamma
- $\beta_1$: coefficient for the moving average of the gradient, default: 0.9, configure key: beta1
- $\beta_2$: coefficient for the moving average of the squared gradient, default: 0.999, configure key: beta2
- $\lambda$: weight decay rate, default: 1e-2, configure key: lambda
- $\epsilon$: minimum error term, default: 1e-8, configure key: epsilon
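Under the same assumptions, an AdamW configuration with the documented defaults (note the non-zero weight decay):

```python
import damo

# Sketch: AdamW configuration with the documented default values.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "adamw")
optimizer_params.insert("gamma", 1e-3)     # learning rate
optimizer_params.insert("beta1", 0.9)      # moving average of the gradient
optimizer_params.insert("beta2", 0.999)    # moving average of the squared gradient
optimizer_params.insert("lambda", 1e-2)    # weight decay rate
optimizer_params.insert("epsilon", 1e-8)   # minimum error term

opt = damo.PyOptimizer(optimizer_params)
```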
### Lion

Lion is configured with the following parameters (see the sketch after the list):

- $\eta$: learning rate, default: 3e-4, configure key: eta
- $\beta_1$: coefficient for the moving average of the gradient, default: 0.9, configure key: beta1
- $\beta_2$: coefficient for the moving average of the squared gradient, default: 0.99, configure key: beta2
- $\lambda$: weight decay, default: 0.01, configure key: lambda
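Under the same assumptions, a Lion configuration with the documented defaults (note that its learning rate key is eta rather than gamma):

```python
import damo

# Sketch: Lion configuration with the documented default values.
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "lion")
optimizer_params.insert("eta", 3e-4)      # learning rate
optimizer_params.insert("beta1", 0.9)     # moving average of the gradient
optimizer_params.insert("beta2", 0.99)    # moving average of the squared gradient
optimizer_params.insert("lambda", 0.01)   # weight decay

opt = damo.PyOptimizer(optimizer_params)
```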
### Example

```python
import damo
import numpy as np

# configure the learning rate scheduler
scheduler_params = damo.Parameters()
scheduler_params.insert("name", "")

# configure the optimizer
optimizer_params = damo.Parameters()
optimizer_params.insert("name", "sgd")
optimizer_params.insert("gamma", 0.001)   # learning rate
optimizer_params.insert("lambda", 0.0)    # weight decay

# no scheduler
opt1 = damo.PyOptimizer(optimizer_params)

# with a specific scheduler
opt2 = damo.PyOptimizer(optimizer_params, scheduler_params)

# apply one update: weights, gradients, and the current step
w = np.zeros(10, dtype=np.float32)
gs = np.random.random(10).astype(np.float32)
step = 0
opt1.call(w, gs, step)
```