<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Joe Junkin &#187; MLS</title>
	<atom:link href="http://joe.junkin.com/category/mls/feed/" rel="self" type="application/rss+xml" />
	<link>http://joe.junkin.com</link>
	<description>Life as it happens</description>
	<lastBuildDate>Thu, 14 Jan 2010 23:34:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Geocoding solution</title>
		<link>http://joe.junkin.com/2007/12/16/geocoding-solution/</link>
		<comments>http://joe.junkin.com/2007/12/16/geocoding-solution/#comments</comments>
		<pubDate>Mon, 17 Dec 2007 03:55:15 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Geocode]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/12/16/geocoding-solution/</guid>
		<description><![CDATA[My Geocoding experience with Google and Yahoo became increasingly difficult and lengthy until I finally went looking for another solution. I found that solution with the Perl Module Geo::Coder::US. My final solution utilized Geo::Coder::US for most of the geocoding with Google and Yahoo helping on more difficult ones. The processing time for batch processing 20,000 real estate [...]]]></description>
			<content:encoded><![CDATA[<p>My Geocoding experience with Google and Yahoo became increasingly difficult and lengthy until I finally went looking for another solution. I found that solution with the Perl Module Geo::Coder::US. My final solution utilized Geo::Coder::US for most of the geocoding with Google and Yahoo helping on more difficult ones. The processing time for batch processing 20,000 real estate listings went from around 8 hours in the beginning down to about 1 minute in the end.</p>
<p><span id="more-34"></span>Batch geocoding 20,000+ real estate listings using the Google and Yahoo geocoders ended up taking about 8 hours to complete. At various points the process was cut off after too many requests, even after I placed a sleep(1) call before each transaction. I became very discouraged at the performance and went looking for another solution.</p>
<p>I investigated the Perl module Geo::Coder::US and found it was extremely easy to set up and configure. The lookup process is nice because the module massages the input to standardize things like &#8216;av&#8217; and &#8216;ave&#8217;. It takes the input as a single string, as Google and Yahoo do. The lookups are extremely fast. Without the cache enabled, Geo::Coder::US handled 80% of the lookups, Google 13% and Yahoo 7%. This process took about 18 minutes.</p>
<p>When the cache was enabled the process used the database for the majority of the lookups and the process took 1-3 minutes to complete.</p>
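<p>A minimal sketch of that lookup order, assuming each geocoder can be wrapped in a code ref that returns a latitude/longitude pair or nothing. The two geocoders here are hypothetical stand-ins with made-up coordinates, not the real Geo::Coder::US, Google or Yahoo interfaces, so the example runs without any of them installed.</p>

```perl
use strict;
use warnings;

# Try each geocoder in order and return the first successful result.
# Every geocoder is a [name, code ref] pair; the code ref returns
# (lat, lon) on success and an empty list on failure.
sub geocode_with_fallback {
    my ($address, @geocoders) = @_;
    for my $geocoder (@geocoders) {
        my ($name, $lookup) = @$geocoder;
        my @point = $lookup->($address);
        return { source => $name, lat => $point[0], lon => $point[1] }
            if @point == 2;
    }
    return;    # no geocoder could place this address
}

# Hypothetical stand-ins: a fast local lookup first, a slower web service last.
my %local_db = ('1 Main St, San Jose, CA' => [ 37.33, -121.89 ]);
my @chain = (
    [ 'offline', sub { my $hit = $local_db{ $_[0] }; return $hit ? @$hit : () } ],
    [ 'web',     sub { return (37.0, -122.0) } ],
);

my $result = geocode_with_fallback('1 Main St, San Jose, CA', @chain);
print "$result->{source}: $result->{lat}, $result->{lon}\n";
```

<p>Putting the cheapest geocoder first means the rate-limited web services only see the addresses the local lookup could not resolve.</p>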
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/12/16/geocoding-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Geocoding MLS data</title>
		<link>http://joe.junkin.com/2007/12/11/geocoding-mls-data/</link>
		<comments>http://joe.junkin.com/2007/12/11/geocoding-mls-data/#comments</comments>
		<pubDate>Tue, 11 Dec 2007 21:43:18 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Geocode]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[reil]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/12/11/geocoding-mls-data/</guid>
		<description><![CDATA[The data provided by REIL does not contain geocoding which means that I will have to do this myself. I am using Perl to download and build the data so I have designed a Perl process to bulk geocode each of the addresses. I designed a caching system so that I only need to do any one [...]]]></description>
			<content:encoded><![CDATA[<p>The data provided by REIL does not contain geocoding which means that I will have to do this myself. I am using Perl to download and build the data so I have designed a Perl process to bulk geocode each of the addresses. I designed a caching system so that I only need to do any one lookup one time.</p>
<p><span id="more-33"></span>I use the Google and Yahoo geocoders, which are free. Yahoo allows something like 5,000 requests per day and Google allows around 15,000. Even before I approached these limits I was blocked because I was sending too many requests per second. To solve this issue I placed a &#8216;sleep(1)&#8217; line before each of the geocoder calls, ensuring that I don&#8217;t call either geocoder more than once a second. Since I implemented this (and moved to a different IP), it seems to work fine.</p>
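<p>One way to express the once-a-second rule is a small wrapper that sleeps only for whatever is left of the interval, rather than a fixed sleep(1) before every call. This is a sketch using the core Time::HiRes module; the name throttled and the wrapped code ref are illustrative assumptions.</p>

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Wrap a code ref so that successive calls start at least
# $min_gap seconds apart.
sub throttled {
    my ($code, $min_gap) = @_;
    my $last_call = 0;
    return sub {
        my $wait = $last_call + $min_gap - time();
        sleep($wait) if $wait > 0;
        $last_call = time();
        return $code->(@_);
    };
}

# Hypothetical geocoder call, limited to one request per second.
my $geocode = throttled(sub { return "looked up $_[0]" }, 1);
print $geocode->('1 Main St'), "\n";
```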
<p>My caching system uses a GOOD and BAD cache table. Values found in one or the other are returned instead of doing a Google/Yahoo geocode call. I have about 20,000 addresses in my &#8216;GOOD&#8217; cache right now. I will create a cron job to expire these after some period of time, perhaps a couple of weeks.</p>
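<p>The GOOD/BAD cache logic can be sketched as follows. Two in-memory hashes stand in for the database tables, and the geocoder is passed in as a code ref; both are assumptions made to keep the example self-contained.</p>

```perl
use strict;
use warnings;

# In-memory stand-ins for the GOOD and BAD cache tables.
my (%good, %bad);

# Return a cached point, nothing for a known-bad address, or call the
# geocoder once and remember the outcome either way.
sub cached_geocode {
    my ($address, $geocoder) = @_;
    return $good{$address} if exists $good{$address};
    return                 if exists $bad{$address};
    my @point = $geocoder->($address);
    if (@point == 2) {
        $good{$address} = { lat => $point[0], lon => $point[1], cached_at => time() };
        return $good{$address};
    }
    $bad{$address} = { cached_at => time() };
    return;
}

# Drop GOOD entries older than $max_age seconds (what the cron job would do).
# Returns the number of entries that remain.
sub expire_good {
    my ($max_age) = @_;
    my $cutoff = time() - $max_age;
    delete @good{ grep { $cutoff > $good{$_}{cached_at} } keys %good };
    return scalar keys %good;
}
```

<p>Remembering failures matters as much as remembering hits: an address that cannot be geocoded would otherwise cost a web request on every run.</p>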
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/12/11/geocoding-mls-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Download continued</title>
		<link>http://joe.junkin.com/2007/11/22/download-continued/</link>
		<comments>http://joe.junkin.com/2007/11/22/download-continued/#comments</comments>
		<pubDate>Thu, 22 Nov 2007 21:30:55 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/12/11/download-continued/</guid>
		<description><![CDATA[I have been doing continued refinement on the REIL RETS download. I decided to structure the download so that each &#8216;Class&#8217; (residential, condo, etc.) is downloaded into a separate table, as is. After the download is complete I run updates every 15 minutes. At first I attempted to place data into tables that would be accessed [...]]]></description>
			<content:encoded><![CDATA[<p>I have been doing continued refinement on the REIL RETS download. I decided to structure the download so that each &#8216;Class&#8217; (residential, condo, etc.) is downloaded into a separate table, as is. After the download is complete I run updates every 15 minutes. At first I attempted to place data into tables that would be accessed directly by the application, but I decided to change that process and create a final build table.</p>
<p><span id="more-32"></span>I added a new step to the process: I create a &#8216;combo&#8217; table that combines all of the pertinent data from each of the classes/tables into one combined table. The reason for this is so a user can choose to search over any combination of the property types (i.e. residential and condos). Another benefit is that my combo table can be de-normalized (containing redundant lookup data) and contains only the data I need. There are a lot of extra empty columns in the dataset.</p>
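<p>As a rough sketch of the combo build, the statements below copy a shared set of columns from each class table into one table and tag every row with its class. The table and column names are hypothetical, and the real build also folds in lookup data that is not shown here.</p>

```perl
use strict;
use warnings;

# Build one INSERT ... SELECT per class table, copying only the shared
# columns into the combined table and tagging each row with its class.
sub combo_statements {
    my ($combo_table, $columns, %class_tables) = @_;
    my $cols = join ', ', @$columns;
    my @sql;
    for my $class (sort keys %class_tables) {
        push @sql, "INSERT INTO $combo_table (class, $cols) "
                 . "SELECT '${class}', $cols FROM $class_tables{$class}";
    }
    return @sql;
}

# Hypothetical table and column names.
print "$_;\n" for combo_statements(
    'combo', [ 'mls_id', 'price' ],
    RES => 'res_data', CON => 'con_data',
);
```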
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/11/22/download-continued/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Download REIL Property data from scratch</title>
		<link>http://joe.junkin.com/2007/10/05/download-property-data/</link>
		<comments>http://joe.junkin.com/2007/10/05/download-property-data/#comments</comments>
		<pubDate>Sat, 06 Oct 2007 02:48:19 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/10/05/download-property-data/</guid>
		<description><![CDATA[My first activity was to download the property data. REIL has a bunch of property &#8216;classes&#8217; such as RES (residential) and CON (Condominium). The classes will need to go into separate tables. So I designed my download process to take in a Resource (Property) and a Class (RES, CON, etc). Designing this way meant I [...]]]></description>
			<content:encoded><![CDATA[<p><span style="font-family: Georgia">My first activity was to download the property data. REIL has a bunch of property &#8216;classes&#8217; such as RES (residential) and CON (Condominium). The classes will need to go into separate tables. So I designed my download process to take in a Resource (Property) and a Class (RES, CON, etc). Designing this way meant I could use the same routines for all classes of data. </span></p>
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia"><span id="more-31"></span>The first step is to create the data table. At first I tried this manually, but later realized that I could just download the metadata and use this to create the table. So my first task was actually to download the metadata (for each class) and put it into a metadata table. A fringe benefit of this is that I now have information about each column of data such as datatype, long description, short description, length, etc. I will use the metadata to build the table now, but will also use it later for search forms and other system stuff. </span></p>
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">After downloading the metadata for a particular class, I iterate through each record and build a CREATE TABLE statement. An obvious benefit here is that any column additions/modifications on REIL&#8217;s end will be incorporated each time the dataset is rebuilt. At the end of this process I have a fresh new up-to-date table ready to load with data.</span></p>
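<p>A minimal sketch of that step, assuming each metadata record has been reduced to a hash with name, type and length keys. Those key names, the type mapping and the fallback to VARCHAR(255) are illustrative assumptions, not the actual REIL metadata layout.</p>

```perl
use strict;
use warnings;

# Assumed mapping from metadata data types to MySQL column types.
my %sql_type = (
    Int       => sub { 'INT' },
    Decimal   => sub { 'DECIMAL(14,2)' },
    Date      => sub { 'DATE' },
    DateTime  => sub { 'DATETIME' },
    Character => sub { 'VARCHAR(' . ($_[0] || 255) . ')' },
);

# Turn a list of metadata records into a CREATE TABLE statement.
# Unknown types fall back to a character column.
sub create_table_sql {
    my ($table, @fields) = @_;
    my @defs;
    for my $field (@fields) {
        my $to_sql = $sql_type{ $field->{type} } || $sql_type{Character};
        push @defs, sprintf '  %s %s', $field->{name}, $to_sql->($field->{length});
    }
    return "CREATE TABLE $table (\n" . join(",\n", @defs) . "\n)";
}

my $sql = create_table_sql(
    'res_temp',
    { name => 'ListPrice', type => 'Decimal',   length => 14 },
    { name => 'City',      type => 'Character', length => 40 },
);
print "$sql\n";
```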
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">There is a limit to the amount of data you can download in one batch. This limit becomes more constrained as you download more columns, and there are a lot of columns (177). You can use pagination, but the documentation mentions that modifications to the data between the time that the download begins and ends can cause issues. What happens if someone deletes or removes a property? This could screw up the pagination and something could be skipped. </span></p>
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">I chose to download all of the propertyIDs at once, which the system allows since it is only one field. I load these into an array and proceed to break up the fetches of complete property data into chunks of 1000 or so. This allows the download of all of the data using an initial snapshot.</span></p>
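<p>Breaking the ID snapshot into batches is a few lines with splice. This is a sketch only: the IDs are made up and the print stands in for the real fetch of full property records.</p>

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size, preserving order.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice @ids, 0, $size ] while @ids;
    return @chunks;
}

# Made-up IDs; a real run would use the downloaded snapshot and 1000 or so.
my @property_ids = (1001 .. 1007);
for my $batch (chunk_ids(3, @property_ids)) {
    # The full-record request for the IDs in @$batch would go here.
    print 'fetch: ', join(',', @$batch), "\n";
}
```

<p>Because the batches are cut from a fixed list of IDs, a listing added or removed on the server mid-download cannot shift the remaining batches the way it can with offset-based pagination.</p>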
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">I download the property data in COMPACT mode and load it into a TEMP table. This allows me to build the complete dataset and verify it before (quickly) moving it into production. When the load is complete and the data count is verified, a RENAME TABLE moves the data into production very quickly.</span></p>
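<p>The verify-then-swap step might look like the sketch below. It only builds the SQL; the table names and the _old suffix are assumptions, and it presumes the live table already exists. MySQL performs all renames listed in a single RENAME TABLE statement atomically, which is what makes the swap safe for readers.</p>

```perl
use strict;
use warnings;

# Statements that publish a freshly loaded table: drop the previous
# backup, then rename live to backup and temp to live in one statement.
sub swap_statements {
    my ($live, $temp) = @_;
    my $old = $live . '_old';
    return (
        "DROP TABLE IF EXISTS $old",
        "RENAME TABLE $live TO $old, $temp TO $live",
    );
}

# Only swap when the loaded row count matches the number of IDs requested.
sub verified_swap {
    my ($expected_rows, $loaded_rows, $live, $temp) = @_;
    return () unless $expected_rows == $loaded_rows;
    return swap_statements($live, $temp);
}

print "$_;\n" for verified_swap(20000, 20000, 'res', 'res_temp');
```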
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">The photos are next. I use a similar process for downloading the photos into a TEMP area. I download photo file names and other metadata into a Resource-Class specific table, and place the actual files in a hashed directory structure for quick lookup. The photo downloads are by far the longest part of the download process. I download the files to a TEMP directory and move the directory to production after validation.</span></p>
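<p>One common way to build a hashed directory structure is to derive the subdirectories from a digest of the property ID. The scheme below (two levels taken from the MD5 hex digest, and the file naming) is an assumption for illustration, not necessarily the layout I ended up with.</p>

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Spread photos over 256 * 256 subdirectories keyed by the first four hex
# digits of the MD5 of the property ID, so no single directory grows huge
# and the path can be recomputed from the ID alone.
sub photo_path {
    my ($root, $property_id, $index) = @_;
    my $hash = md5_hex($property_id);
    return File::Spec->catfile(
        $root, substr($hash, 0, 2), substr($hash, 2, 2),
        "${property_id}_$index.jpg",
    );
}

print photo_path('photos_temp', 12345, 1), "\n";
```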
<p><span style="font-family: Georgia"></span><span style="font-family: Georgia">I found that some files error out with a &#8220;No Object Found&#8221; error message. Investigating further I found that sometimes these files become available later. So, I search for the error message and write a record to a &#8216;photo error&#8217; table which includes the propertyID and attempt count. Later I run a cron job to attempt to re-download these failed files.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/10/05/download-property-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Downloading REIL Data myself!</title>
		<link>http://joe.junkin.com/2007/08/31/downloading-reil-data-from-scratch/</link>
		<comments>http://joe.junkin.com/2007/08/31/downloading-reil-data-from-scratch/#comments</comments>
		<pubDate>Sat, 01 Sep 2007 02:43:08 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/11/08/downloading-reil-data-from-scratch/</guid>
		<description><![CDATA[So, this is where I have ended up, writing the REIL data download from scratch. I am using Perl and imagine it would be difficult using something else. I want to be able to have one set of routines to download the data in whatever format I choose, and transfer the data to a MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>So, this is where I have ended up, writing the REIL data download from scratch. I am using Perl and imagine it would be difficult using something else. I want to be able to have one set of routines to download the data in whatever format I choose, and transfer the data to a MySQL database. Next I will need to download the images. I will need the data for all of the REIL property classes. Finally, I will need to have a routine that updates the data.</p>
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/08/31/downloading-reil-data-from-scratch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using VieleRETS to download data</title>
		<link>http://joe.junkin.com/2007/08/10/using-vielerets-to-download-data/</link>
		<comments>http://joe.junkin.com/2007/08/10/using-vielerets-to-download-data/#comments</comments>
		<pubDate>Sat, 11 Aug 2007 02:14:24 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/08/10/using-vielerets-to-download-data/</guid>
		<description><![CDATA[I spent a long time messing with VieleRETS to try and download data from REIL. The short answer is that I failed to get it to work. I tried to debug the PHP code but was unable to identify why it was failing. Had I needed to get this to work I would have joined [...]]]></description>
			<content:encoded><![CDATA[<p>I spent a long time messing with VieleRETS to try and download data from REIL. The short answer is that I failed to get it to work. I tried to debug the PHP code but was unable to identify why it was failing. Had I needed to get this to work I would have joined the mail-list group and tried to get some help from there, but I never got that far.</p>
<p><span id="more-28"></span>This is a very complete package that should have done exactly what I needed. It allows you to enter your login credentials and then connect to the RETS compatible MLS, download metadata, download data and keep it updated.</p>
<p>Unfortunately, I was unable to get this to work. The system is not very well documented at this point but certainly tries to do a lot. It did connect to REIL and downloaded some metadata.</p>
<p>My best guess is that it was trying to download property information without qualifying the &#8216;listing_status&#8217; parameter, and I could find no place where I could specify this information. REIL requires the listing status (or at least some constraint). The data queries failed when I tried to run the update saying there was no data.</p>
<p>I expect this package to get better quickly as it seems to have regular updates and is actively developed.</p>
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/08/10/using-vielerets-to-download-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Login to reil.com</title>
		<link>http://joe.junkin.com/2007/06/11/login-to-reilcom/</link>
		<comments>http://joe.junkin.com/2007/06/11/login-to-reilcom/#comments</comments>
		<pubDate>Mon, 11 Jun 2007 20:51:20 +0000</pubDate>
		<dc:creator>jjunkin</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[MLS]]></category>
		<category><![CDATA[Mapping]]></category>
		<category><![CDATA[Real Estate]]></category>

		<guid isPermaLink="false">http://joe.junkin.com/2007/06/11/login-to-reilcom/</guid>
		<description><![CDATA[REIL.com uses digest authentication. I am using perl to pull down the data so first I need to login to the reil.com server with proper authorization. I was unable to find a lot of documentation on how this process works. The usual method for exposing web services is via SOAP, but this service uses HTTP [...]]]></description>
			<content:encoded><![CDATA[<p>REIL.com uses digest authentication. I am using Perl to pull down the data, so first I need to log in to the reil.com server with proper authorization. I was unable to find much documentation on how this process works. The usual method for exposing web services is via SOAP, but this service uses HTTP Digest Authentication. I have not used it much, but it appears that the main benefit is not sending the password in clear text over a non-SSL connection. It is not as secure as SSL by any means but is better than a clear text transmission of the password.<br />
<span id="more-26"></span><br />
The process appeared daunting at first, due to the lack of examples and documentation. The process itself was clear: the client attempts to log in, the server rejects the request and demands authentication, and finally the client sends its credentials. Fields in the response header sent by the server are used in this last authentication step.</p>
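<p>To make that last step concrete, here is a sketch of how the response value is computed under RFC 2617 with the MD5 algorithm and qop set to auth. LWP does this internally, so this is for understanding only; the parameter names are my own, and the sample values are the well-known example from the RFC rather than anything from REIL.</p>

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compute the "response" field of an HTTP Digest Authorization header
# (RFC 2617, algorithm MD5, qop=auth). realm and nonce come from the
# server's challenge; nc and cnonce are chosen by the client.
sub digest_response {
    my (%p) = @_;
    my $ha1 = md5_hex(join ':', $p{username}, $p{realm}, $p{password});
    my $ha2 = md5_hex(join ':', $p{method}, $p{uri});
    return md5_hex(join ':', $ha1, $p{nonce}, $p{nc}, $p{cnonce}, 'auth', $ha2);
}

# Sample values from the example in RFC 2617.
print digest_response(
    username => 'Mufasa',
    realm    => 'testrealm@host.com',
    password => 'Circle Of Life',
    method   => 'GET',
    uri      => '/dir/index.html',
    nonce    => 'dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc       => '00000001',
    cnonce   => '0a4f113b',
), "\n";
```

<p>The password itself never crosses the wire; only hashes that mix it with the server-supplied nonce do, which is why a captured response cannot simply be replayed against a fresh challenge.</p>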
<p>What was not clear to me was how to get LWP::UserAgent to perform this feat. I could see that the mechanism was in LWP::Authen::Digest &#8211; authenticate, but I wasn&#8217;t sure how to access it. The docs mentioned overriding LWP::UserAgent&#8217;s get_basic_credentials sub to supply a password.</p>
<p><code><br />
#!/usr/bin/perl<br />
package RequestAgent;<br />
use strict;<br />
use LWP::UserAgent;<br />
our @ISA = qw(LWP::UserAgent);<br />
# Construct through the parent class so the object is blessed into RequestAgent.<br />
sub new {<br />
my ($class, @args) = @_;<br />
my $self = $class-&gt;SUPER::new(@args);<br />
$self-&gt;agent("lwp-request/$main::VERSION");<br />
return $self;<br />
}<br />
# LWP calls this when the server answers with an authentication challenge.<br />
sub get_basic_credentials {<br />
my ($self, $realm, $uri) = @_;<br />
return ('myUsername', 'myPassword');<br />
}</code></p>
<p>So once this module was created, I just used the following code and everything worked as it was supposed to.</p>
]]></content:encoded>
			<wfw:commentRss>http://joe.junkin.com/2007/06/11/login-to-reilcom/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
