Adam adapts the learning rate for each parameter during training (using running estimates of the first and second moments of the gradients) and normally results in faster convergence. SGD uses a single, pre-set learning rate and usually converges more slowly. The same model trained with Adam generally performs better than the model trained with SGD.
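A minimal sketch of the difference in setup, assuming PyTorch and a hypothetical toy model:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical toy model for illustration

# Adam: keeps per-parameter adaptive step sizes based on running
# estimates of the gradients' first and second moments.
adam_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# SGD: applies one fixed global learning rate to every parameter
# (unless an external scheduler changes it during training).
sgd_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
```

Both optimizers are used the same way in the training loop (`opt.zero_grad()`, `loss.backward()`, `opt.step()`); only the update rule differs.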
Submitted by S. Jeng