Optimization For Neural Networks: Quest For Theoretical Understandings