
Learning Rate Scheduler

When training deep neural networks, it is often useful to reduce the learning rate as training progresses. A learning rate schedule adjusts the learning rate during training by reducing it according to a pre-defined rule.

When using a scheduler, set the name item to indicate which scheduler to use, then configure that scheduler's parameters. The table below lists the name of each scheduler. If name is empty, no scheduler is used.

| scheduler | name |
| --- | --- |
| Exponential Decay | exponential_decay |
| Polynomial Decay | polynomial_decay |
| Natural Exponential Decay | nature_exponential_decay |
| Inverse Time Decay | inverse_time_decay |
| Cosine Decay | cosine_decay |
| Linear Cosine Decay | liner_cosine_decay |
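As a hypothetical illustration (the surrounding configuration syntax depends on the framework; only the key names come from this document), selecting a scheduler might look like:

```python
# Hypothetical scheduler configuration; the dict format is an assumption --
# only the key names follow the parameter names in this document.
scheduler_config = {
    "name": "exponential_decay",  # empty string means no scheduler is used
    "decay_steps": 10000.0,       # float type
    "decay_rate": 0.96,           # float type
}
```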

Exponential Decay

$$learning\_rate * decay\_rate^{\frac{global\_step}{decay\_steps}}$$

Configure the following parameters:

  1. decay_steps: float type

  2. decay_rate: float type
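The formula can be sketched in Python as follows (the function name and signature are illustrative, not part of any framework API):

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate * decay_rate ** (global_step / decay_steps)
    return learning_rate * decay_rate ** (global_step / decay_steps)
```

For example, with decay_rate of 0.5 the learning rate halves every decay_steps steps.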

Polynomial Decay

$$(learning\_rate - end\_learning\_rate) * decay\_rate^{1.0 - \frac{\min(global\_step,\ decay\_steps)}{decay\_steps}} + end\_learning\_rate$$

Configure the following parameters:

  1. decay_steps: float type

  2. decay_rate: float type, default: 1e-3

  3. end_learning_rate: float type, default: 1.0
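A minimal Python sketch of the formula above, with defaults taken from the parameter list (the function itself is illustrative, not a framework API):

```python
def polynomial_decay(learning_rate, global_step, decay_steps,
                     decay_rate=1e-3, end_learning_rate=1.0):
    # Fraction of the decay schedule completed, capped at 1.0.
    frac = min(global_step, decay_steps) / decay_steps
    return ((learning_rate - end_learning_rate)
            * decay_rate ** (1.0 - frac)
            + end_learning_rate)
```

Note that once global_step reaches decay_steps, the exponent becomes 0 and the result is learning_rate itself.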

Natural Exponential Decay

$$learning\_rate * e^{-decay\_rate * \frac{global\_step}{decay\_steps}}$$

Configure the following parameters:

  1. decay_steps: float type

  2. decay_rate: float type
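The formula can be sketched as (function name mirrors the scheduler's config name, but is illustrative):

```python
import math

def nature_exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate * exp(-decay_rate * global_step / decay_steps)
    return learning_rate * math.exp(-decay_rate * global_step / decay_steps)
```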

Inverse Time Decay

$$\frac{learning\_rate}{1.0 + decay\_rate * \frac{global\_step}{decay\_steps}}$$

Configure the following parameters:

  1. decay_steps: float type

  2. decay_rate: float type
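A Python sketch of the formula above (illustrative function, not a framework API):

```python
def inverse_time_decay(learning_rate, global_step, decay_steps, decay_rate):
    # learning_rate / (1.0 + decay_rate * global_step / decay_steps)
    return learning_rate / (1.0 + decay_rate * global_step / decay_steps)
```

With decay_rate of 1.0, the learning rate is halved after decay_steps steps.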

Cosine Decay

$$learning\_rate * 0.5 * (1.0 + \cos(\pi * \frac{global\_step}{decay\_steps}))$$

Configure the following parameters:

  1. decay_steps: float type
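The formula can be sketched as (illustrative function, not a framework API):

```python
import math

def cosine_decay(learning_rate, global_step, decay_steps):
    # learning_rate * 0.5 * (1.0 + cos(pi * global_step / decay_steps))
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * global_step / decay_steps))
```

The rate starts at learning_rate, follows a half cosine wave, and reaches 0 at decay_steps.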

Linear Cosine Decay

$$liner\_decay = \frac{decay\_steps - \min(global\_step, decay\_steps)}{decay\_steps}$$

$$cos\_decay = 0.5 * (1.0 + \cos(2\pi * num\_periods * \frac{\min(global\_step, decay\_steps)}{decay\_steps}))$$

$$learning\_rate * (\alpha + liner\_decay) * cos\_decay + \beta$$

Configure the following parameters:

  1. alpha: α\alpha, float type, default: 0.0

  2. beta: β\beta, float type, default: 1e-3

  3. num_periods: float type, default: 0.5

  4. decay_steps: float type
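Putting the three formulas together in Python (illustrative function, not a framework API; the +0.5 coefficient of cos_decay follows the standard linear cosine decay, and the defaults come from the parameter list above):

```python
import math

def liner_cosine_decay(learning_rate, global_step, decay_steps,
                       alpha=0.0, beta=1e-3, num_periods=0.5):
    step = min(global_step, decay_steps)
    # Linear ramp from 1.0 down to 0.0 over decay_steps.
    liner_decay = (decay_steps - step) / decay_steps
    # Cosine oscillation completing num_periods full periods over decay_steps.
    cos_decay = 0.5 * (1.0 + math.cos(2.0 * math.pi * num_periods
                                      * step / decay_steps))
    return learning_rate * (alpha + liner_decay) * cos_decay + beta
```

At step 0 the rate is learning_rate * (alpha + 1) + beta; beyond decay_steps both factors freeze, so the rate floors near beta.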