Geocoding MLS data
The data provided by REIL does not include latitude/longitude coordinates, so I have to geocode it myself. I already use Perl to download and build the data, so I wrote a Perl process to bulk geocode each of the addresses. I also designed a caching system so that each address only has to be looked up once.
I use the Google and Yahoo geocoders, which are free. Yahoo allows something like 5,000 requests per day and Google allows around 15,000. Even before I approached these limits I was blocked because I was sending too many requests per second. To solve this I placed a 'sleep(1)' line before each of the geocoder calls, ensuring that I don't call either geocoder more than once a second. Since I implemented this (and moved to a different IP address), it has worked fine.
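Here is a rough sketch of that throttling idea. It is a small variation on a bare sleep(1): it remembers when each geocoder was last called and only sleeps for whatever part of the second has not already passed. The names throttled_call, $MIN_INTERVAL and %last_call are made up for this example, and the geocoder itself is passed in as a code ref rather than being a real Google or Yahoo client.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);   # core module: sub-second time() and sleep()

# Minimum gap between two calls to the same geocoder, in seconds.
# One second matches the sleep(1) described above; tune as needed.
my $MIN_INTERVAL = 1.0;

# When each geocoder was last called, keyed by a service name such as
# 'google' or 'yahoo' (hypothetical keys, used only for illustration).
my %last_call;

# Call $geocoder->($address), first waiting until at least $MIN_INTERVAL
# seconds have passed since the previous call to the same $service.
sub throttled_call {
    my ($service, $geocoder, $address) = @_;

    my $wait = $MIN_INTERVAL - (time() - ($last_call{$service} // 0));
    sleep($wait) if $wait > 0;

    $last_call{$service} = time();
    return $geocoder->($address);
}
```

Usage would look something like throttled_call('google', \&my_google_lookup, $address), where my_google_lookup stands for whatever routine actually talks to the geocoder.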
My caching system uses two tables, GOOD and BAD. An address found in either one is answered from the cache instead of making a Google/Yahoo geocode call. I have about 20,000 addresses in my ‘GOOD’ cache right now. I will create a cron job to expire these entries after some period of time, perhaps a couple of weeks.
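To make the lookup order concrete, here is a minimal sketch of the cache logic. It uses two in-memory hashes in place of the real GOOD and BAD tables, and it assumes that BAD holds addresses the geocoder could not resolve. The names cached_geocode and expire_cache, the 'stored' timestamp field and the $lookup code ref are all invented for the example.

```perl
use strict;
use warnings;

# In-memory stand-ins for the GOOD and BAD cache tables (assumed layout).
my (%good, %bad);

# Return an array ref [lat, lng] for $address, or undef if it cannot be
# geocoded. $lookup is a code ref that performs the real geocoder call
# and returns [lat, lng] or undef.
sub cached_geocode {
    my ($address, $lookup) = @_;

    # Answer from either cache without touching the geocoder.
    return $good{$address}{coords} if exists $good{$address};
    return undef                   if exists $bad{$address};

    # Cache miss: do the real lookup once and remember the outcome.
    my $coords = $lookup->($address);
    if (defined $coords) {
        $good{$address} = { coords => $coords, stored => time() };
    }
    else {
        $bad{$address} = { stored => time() };
    }
    return $coords;
}

# Remove entries older than $max_age seconds from both caches and
# return how many were removed.
sub expire_cache {
    my ($max_age) = @_;
    my $cutoff  = time() - $max_age;
    my $removed = 0;
    for my $table (\%good, \%bad) {
        for my $address (keys %$table) {
            next if $table->{$address}{stored} >= $cutoff;
            delete $table->{$address};
            $removed++;
        }
    }
    return $removed;
}
```

A script run from cron could then call expire_cache(14 * 24 * 60 * 60) to drop anything older than two weeks. Remembering failures in BAD means a bad address costs one request against the daily limit instead of one per run.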