Learning Nash equilibria in zero-sum stochastic games via entropy-regularized policy approximation