Internet data extraction based on automatic regular expression inference