Learning Rate Scheduler
When training deep neural networks, it is often useful to reduce the learning rate as training progresses. A learning rate schedule does this by adjusting the learning rate during training according to a pre-defined schedule.
When using a scheduler, you need to configure the `name` item to indicate which scheduler to use, and then configure that scheduler's parameters. The following table lists the name of each scheduler. If `name` is empty, no scheduler is used.
| scheduler | name |
| --- | --- |
| Exponential Decay | exponential_decay |
| Polynomial Decay | polynomial_decay |
| Natural Exponential Decay | nature_exponential_decay |
| Inverse Time Decay | inverse_time_decay |
| Cosine Decay | cosine_decay |
| Linear Cosine Decay | liner_cosine_decay |
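For example, a scheduler could be selected like this (a hypothetical sketch: the dict layout is an assumption, only the `name` key and the parameter names come from this document):

```python
# Hypothetical configuration sketch: the dict layout is an assumption,
# but "name" and the parameter keys follow this document.
scheduler_config = {
    "name": "exponential_decay",  # one of the names in the table above; empty means no scheduler
    "decay_steps": 10000.0,       # float type (see the Exponential Decay section below)
    "decay_rate": 0.96,           # float type
}
```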
Exponential Decay
$$learning\_rate \times decay\_rate^{global\_step / decay\_steps}$$
Configure the following parameters:
- decay_steps: float type
- decay_rate: float type
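As a minimal sketch of this formula in plain Python (illustrative only, not the library's implementation):

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate * decay_rate ^ (global_step / decay_steps)
    return learning_rate * decay_rate ** (global_step / decay_steps)

# e.g. with decay_rate=0.96 and decay_steps=1000, the LR shrinks ~4% every 1000 steps
print(exponential_decay(0.1, 1000, 1000.0, 0.96))  # ≈ 0.096
```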
Polynomial Decay
$$(learning\_rate - end\_learning\_rate) \times \left(1.0 - \frac{\min(global\_step,\, decay\_steps)}{decay\_steps}\right)^{decay\_rate} + end\_learning\_rate$$
Configure the following parameters:
- decay_steps: float type
- decay_rate: float type, default: 1e-3
- end_learning_rate: float type, default: 1.0
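A plain-Python sketch of the formula, with the defaults listed above (illustrative only):

```python
def polynomial_decay(learning_rate, global_step, decay_steps,
                     decay_rate=1e-3, end_learning_rate=1.0):
    # Decays from learning_rate toward end_learning_rate over decay_steps steps,
    # with decay_rate acting as the exponent of the polynomial.
    step = min(global_step, decay_steps)
    return ((learning_rate - end_learning_rate)
            * (1.0 - step / decay_steps) ** decay_rate
            + end_learning_rate)
```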
Natural Exponential Decay
$$learning\_rate \times e^{-decay\_rate \times global\_step / decay\_steps}$$
Configure the following parameters:
- decay_steps: float type
- decay_rate: float type
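Sketched in plain Python (illustrative only):

```python
import math

def nature_exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate * e^(-decay_rate * global_step / decay_steps)
    return learning_rate * math.exp(-decay_rate * global_step / decay_steps)
```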
Inverse Time Decay
$$\frac{learning\_rate}{1.0 + decay\_rate \times global\_step / decay\_steps}$$
Configure the following parameters:
- decay_steps: float type
- decay_rate: float type
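Sketched in plain Python (illustrative only):

```python
def inverse_time_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate / (1.0 + decay_rate * global_step / decay_steps)
    return learning_rate / (1.0 + decay_rate * global_step / decay_steps)
```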
Cosine Decay
$$learning\_rate \times 0.5 \times \left(1.0 + \cos\left(\pi \times \frac{global\_step}{decay\_steps}\right)\right)$$
Configure the following parameters:
- decay_steps: float type
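A plain-Python sketch of the formula (illustrative only):

```python
import math

def cosine_decay(learning_rate, global_step, decay_steps):
    # learning_rate * 0.5 * (1.0 + cos(pi * global_step / decay_steps))
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * global_step / decay_steps))

print(cosine_decay(0.1, 500, 1000.0))  # halfway through the decay: ≈ 0.05
```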
Linear Cosine Decay
$$linear\_decay = \frac{decay\_steps - \min(global\_step,\, decay\_steps)}{decay\_steps}$$
$$cos\_decay = 0.5 \times \left(1.0 + \cos\left(2\pi \times num\_periods \times \frac{\min(global\_step,\, decay\_steps)}{decay\_steps}\right)\right)$$
$$learning\_rate \times \left((\alpha + linear\_decay) \times cos\_decay + \beta\right)$$
Configure the following parameters:
- alpha: α, float type, default: 0.0
- beta: β, float type, default: 1e-3
- num_periods: float type, default: 0.5
- decay_steps: float type
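A plain-Python sketch combining the three formulas above (illustrative only; the function name is ours, the config name remains liner_cosine_decay):

```python
import math

def linear_cosine_decay(learning_rate, global_step, decay_steps,
                        alpha=0.0, beta=1e-3, num_periods=0.5):
    step = min(global_step, decay_steps)
    # linear ramp from 1.0 down to 0.0 over decay_steps
    linear_decay = (decay_steps - step) / decay_steps
    # cosine oscillation with num_periods full periods over decay_steps
    cos_decay = 0.5 * (1.0 + math.cos(2.0 * math.pi * num_periods * step / decay_steps))
    return learning_rate * ((alpha + linear_decay) * cos_decay + beta)

print(linear_cosine_decay(0.1, 1000, 1000.0))  # fully decayed: learning_rate * beta = 1e-4
```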