A memory efficient deep recurrent Q-learning approach for autonomous wildfire surveillance