Tumbling Robot Optimization Using A Learned Control Policy