[Tutorial]: How to optimize your dream home search with KNIME?

KNIME tutorial

Optimize real estate searches with Knime

 

Home hunting with KNIME: have you ever wondered how to make your home search more efficient, or how to survive the hunt for a bigger apartment in the age of coronavirus? Apartment hunting can be a wearying business. Are you wasting too much time on real estate websites, trying to find exactly what you want among so many options? Save time with a simple, intuitive method built with KNIME Analytics Platform: the listings come directly to you!

 

Home hunting with KNIME: Collect website information to optimize your dream home search

 

First, choose a website and search for your apartment as usual to generate a page of results that meet your requirements. Then, copy the page URL.

 

Knime real estate searches

 

Home hunting with KNIME: Identify the information you wish to keep

 

Real estate search criteria

 

I have decided to use the following details: Type of Housing, Number of Bedrooms, Rent, Charges (included or not), Area, Neighborhood, and Surface Area.

To do so, I need to copy the XPath expressions. In Google Chrome, on the search results page, right-click and select "Inspect". Then press CTRL+SHIFT+C to activate the element picker and click an item in a listing: the corresponding HTML code is highlighted in the panel on your right.

 

Real estate website

 

Then, locate the targeted information by expanding the nested elements in the highlighted code. Right-click the element, select "Copy", then "Copy full XPath", and paste it into an Excel file. In the adjacent columns, add the name of the item and the expected data type (to keep it simple, I will use "String" for every item).

And here is what an XPath expression looks like: "/html/body/div[2]/div/div/div[3]/div/div[2]/div[1]/div[2]/div[1]/div[1]/div"

One of the numeric indices in this path identifies the first ad on the page. Don't skip the next step: in KNIME, an additional step will vary this index so that you don't have to copy an XPath for every ad. Repeat the operation for all the items, then split the XPath column into two columns, XPath1 (everything before the ad index) and XPath2 (everything after it). You should get the following:

 

Xpath
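The split can be sketched in Python to make it concrete. This is only an illustration, not part of the KNIME workflow: the `split_xpath` helper is hypothetical, and which location step carries the ad index depends on the website, so choosing the ninth step (`div[1]`) of the example path above is an assumption.

```python
def split_xpath(xpath: str, step: int) -> tuple[str, str]:
    """Split an absolute XPath around the positional index of one location step.

    `step` is the 1-based position of the step (e.g. 'div[1]') whose index
    changes from one ad to the next. Returns (XPath1, XPath2) such that
    XPath1 + index + XPath2 rebuilds a full path.
    """
    steps = xpath.strip("/").split("/")
    name, bracket, _index = steps[step - 1].partition("[")
    if not bracket:
        raise ValueError(f"step {step} has no positional index: {steps[step - 1]!r}")
    xpath1 = "/" + "/".join(steps[: step - 1] + [name + "["])
    xpath2 = "]" + "".join("/" + s for s in steps[step:])
    return xpath1, xpath2


full = "/html/body/div[2]/div/div/div[3]/div/div[2]/div[1]/div[2]/div[1]/div[1]/div"
# Hypothetical assumption: the 9th step ('div[1]') is the one that identifies the ad.
xpath1, xpath2 = split_xpath(full, 9)
print(xpath1)  # /html/body/div[2]/div/div/div[3]/div/div[2]/div[
print(xpath2)  # ]/div[2]/div[1]/div[1]/div
```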

Home hunting with KNIME: Create your KNIME workflow

 

KNIME Workflow

 

Now you have all the information you need to start building your workflow!

  1. Create a table with the Table Creator node and paste the URL of the search results page into a column named URL.

For the next steps, we will use nodes from Palladian, a KNIME extension that you need to install first.

  2. Add the HTTP Retriever node; in its settings, select URL as the URL input and apply.
  3. Then add the HTML Parser node, select Result as the input, and apply.
  4. Next, create a second branch in your workflow: add a new Table Creator node and paste the data from the Excel file, including the XPath columns (remember to rename the columns!).
  5. Next, add the Counting Loop Start node and, in its dialog, set the number of loops to the number of listings on the page (here, 35).

  6. Add a String Manipulation node that concatenates XPath1, the current iteration number, and XPath2 into a new XPath0 column, as shown below:

 

Knime
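In Python terms, this String Manipulation step builds one full XPath per iteration. The sketch below uses a hypothetical `build_xpaths` helper. It assumes a 0-based loop counter, which is how KNIME's `currentIteration` flow variable counts, while XPath positions start at 1; the `first_index` offset is therefore an assumption to adjust to the index of the first ad in your own XPath.

```python
def build_xpaths(xpath1: str, xpath2: str, n_ads: int, first_index: int = 1) -> list[str]:
    """Rebuild one XPath0 per ad: XPath1 + ad index + XPath2."""
    # The loop counter i runs from 0 to n_ads - 1; first_index shifts it onto
    # XPath's 1-based positions (hypothetical default: the first ad is at 1).
    return [f"{xpath1}{i + first_index}{xpath2}" for i in range(n_ads)]


xpath1 = "/html/body/div[2]/div/div/div[3]/div/div[2]/div["
xpath2 = "]/div[2]/div[1]/div[1]/div"
paths = build_xpaths(xpath1, xpath2, n_ads=35)
print(len(paths))  # 35
print(paths[0])    # /html/body/div[2]/div/div/div[3]/div/div[2]/div[1]/div[2]/div[1]/div[1]/div
print(paths[-1])   # /html/body/div[2]/div/div/div[3]/div/div[2]/div[35]/div[2]/div[1]/div[1]/div
```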

 

  7. Then add another String Manipulation node to replace "/" with "/dns:" (this prefix is required by the XPath node in step 9).
  8. Add the Table Row To Variable Loop Start node.
  9. Then add the XPath node to connect the two branches of the workflow. In its dialog, on the Settings tab, add an XPath entry and apply it without modification. Then, on the Flow Variables tab, set newColumn0, xpath0, and returnType0 to the flow variables Name, XPath0, and Type, respectively. Apply.
  10. Then add the Loop End (Column Append) node.
  11. Finally, add the Loop End node.
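To see what the two nested loops and the XPath node compute, here is a plain-Python mimic of steps 7 to 11. It is a sketch under stated assumptions, not the KNIME implementation: it uses the standard library's `xml.etree.ElementTree` instead of Palladian, a tiny made-up XHTML page instead of a real listing page, and hypothetical item XPaths. Assuming the parsed page lives in the XHTML namespace (as in this sketch), it also shows why the `dns:` prefix matters: unprefixed step names such as `body` do not match namespaced elements.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def add_dns_prefix(xpath: str) -> str:
    """Step 7: qualify every location step by replacing '/' with '/dns:'."""
    return xpath.replace("/", "/dns:")


def scrape(root: ET.Element, items: list[dict], n_ads: int) -> list[dict]:
    """Return one row per ad (outer loop) with one column per item (inner loop)."""
    rows = []
    for i in range(n_ads):  # outer loop, like Counting Loop Start (0-based)
        row = {"Iteration": i}
        for item in items:  # inner loop, like Table Row To Variable Loop Start
            xpath0 = add_dns_prefix(f"{item['XPath1']}{i + 1}{item['XPath2']}")
            # ElementTree only evaluates paths relative to an element, so the
            # leading '/dns:html' step (the root itself) is dropped.
            relative = xpath0.split("/", 2)[2]
            node = root.find(relative, {"dns": XHTML_NS})
            row[item["Name"]] = node.text if node is not None else None
        rows.append(row)
    return rows


# Hypothetical, heavily simplified results page in the XHTML namespace.
page = f"""<html xmlns="{XHTML_NS}"><body>
<div><p>Apartment</p><span>1,200 EUR</span></div>
<div><p>House</p><span>1,850 EUR</span></div>
</body></html>"""

# Hypothetical item table (Name, XPath1, XPath2), like the second Table Creator.
items = [
    {"Name": "Type", "XPath1": "/html/body/div[", "XPath2": "]/p"},
    {"Name": "Rent", "XPath1": "/html/body/div[", "XPath2": "]/span"},
]

root = ET.fromstring(page)
for row in scrape(root, items, n_ads=2):
    print(row)
# {'Iteration': 0, 'Type': 'Apartment', 'Rent': '1,200 EUR'}
# {'Iteration': 1, 'Type': 'House', 'Rent': '1,850 EUR'}

# Without the prefix, nothing matches: the elements live in the XHTML namespace.
print(root.find("body/div[1]/p"))  # None
```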

As the results show, the extracted data is now in the table, but it is not really readable yet. We need to add a data cleaning stage to the workflow.

Home hunting with KNIME: Manage your data 

 

Home hunting with KNIME: Data cleaning

 

Cleaning Knime workflow data

 

  1. Start by adding a Column Filter node to keep only the following columns: Type, Number of rooms, Rent, Charges, Area, Neighborhood, Surface Square, Bedrooms, and Iteration.
  2. Then add the Unpivoting node. In its dialog, include Iteration in the Value columns list and exclude it from the Retained columns list.
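For readers unfamiliar with unpivoting, this generic Python sketch shows the wide-to-long reshaping it performs, using the RowIDs, ColumnNames, and ColumnValues column names that appear in the following steps. The `unpivot` helper, the `Row0`-style row IDs, and the column choice in the example are illustrative assumptions, not the exact configuration above.

```python
def unpivot(rows: list[dict], value_columns: list[str], retained_columns: list[str]) -> list[dict]:
    """Wide to long: one output row per (input row, value column) pair."""
    long_rows = []
    for index, row in enumerate(rows):
        for column in value_columns:
            long_row = {
                "RowIDs": f"Row{index}",  # assumed row-ID format
                "ColumnNames": column,
                "ColumnValues": row.get(column),
            }
            # Retained columns are copied unchanged onto every output row.
            long_row.update({c: row.get(c) for c in retained_columns})
            long_rows.append(long_row)
    return long_rows


ads = [
    {"Type": "Apartment", "Rent": "1,200 EUR", "Neighborhood": "Old Town"},
    {"Type": "House", "Rent": "1,850 EUR", "Neighborhood": "Riverside"},
]
for r in unpivot(ads, value_columns=["Type", "Rent"], retained_columns=["Neighborhood"]):
    print(r)
```

Two input rows with two value columns give four output rows.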

Starting from the Unpivoting node, build a parallel branch:
a. Add a Column Filter node that keeps the ColumnValues and Number of Rooms columns.
b. Then add a Rule Engine node to categorize your data in a new column, as follows:

 

Knime
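The actual rules appear only in the screenshot of the original post, so the ones below are hypothetical. This Python sketch illustrates how such a rule list categorizes values: rules are checked from top to bottom and the first match wins, with "Others" as the catch-all category (the name reused in the Column Aggregator step). In the following steps, the new column is referred to as prediction.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern, category), evaluated from top to bottom.
RULES = [
    (re.compile(r"\d+\s*(m2|m²|sq\s*m)", re.IGNORECASE), "Surface Area"),
    (re.compile(r"\d+\s*rooms?\b", re.IGNORECASE), "Number of rooms"),
]
FALLBACK = "Others"  # catch-all category when no rule matches


def categorize(value: Optional[str]) -> str:
    """Return the category of the first matching rule, else the fallback."""
    if value is None:  # missing value
        return FALLBACK
    for pattern, category in RULES:
        if pattern.search(value):
            return category
    return FALLBACK


for value in ["45 m²", "3 rooms", "2 rooms, 45 m2", "Balcony", None]:
    print(repr(value), "->", categorize(value))
```

The third value shows why rule order matters: it mentions both rooms and a surface, and the first matching rule decides.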

 

  3. Add a Pivoting node. In its dialog, on the Groups tab, put ColumnValues in the Group column(s) list. Then, on the Pivots tab, keep only the prediction column in the Pivot column(s) list. Finally, on the Manual Aggregation tab, add Number of rooms with the First aggregation method (don't forget to tick Missing) and apply.

Repeat step 3, replacing the Number of Rooms with Surface Area.
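What the Pivoting node computes can be sketched as a group-by on ColumnValues with one output column per prediction category, keeping the first Number of rooms value found; missing values are taken into account, mirroring the Missing option. The `pivot_first` helper and the toy rows are hypothetical, and KNIME's real output column names are simplified here to the category names.

```python
def pivot_first(rows: list[dict], group_col: str, pivot_col: str, agg_col: str) -> list[dict]:
    """One row per group value, one column per pivot value, 'First' aggregation."""
    pivot_values = list(dict.fromkeys(row[pivot_col] for row in rows))  # keeps order
    groups: dict = {}
    for row in rows:
        cells = groups.setdefault(row[group_col], {})
        # 'First': keep the first value seen for this cell, even if it is missing.
        if row[pivot_col] not in cells:
            cells[row[pivot_col]] = row.get(agg_col)
    return [
        {group_col: group, **{p: cells.get(p) for p in pivot_values}}
        for group, cells in groups.items()
    ]


rows = [
    {"ColumnValues": "Apartment", "prediction": "Number of rooms", "Number of rooms": "3 rooms"},
    {"ColumnValues": "Apartment", "prediction": "Others", "Number of rooms": None},
    {"ColumnValues": "Apartment", "prediction": "Others", "Number of rooms": "Balcony"},
    {"ColumnValues": "House", "prediction": "Others", "Number of rooms": "Garden"},
]
for r in pivot_first(rows, "ColumnValues", "prediction", "Number of rooms"):
    print(r)
# {'ColumnValues': 'Apartment', 'Number of rooms': '3 rooms', 'Others': None}
# {'ColumnValues': 'House', 'Number of rooms': None, 'Others': 'Garden'}
```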

  4. Using a Joiner node, join the two branches from step 3. In its dialog, on the Joiner Settings tab, set Row ID as the joining key. Then, on the Column Selection tab, keep all columns except ColumnValues and apply.
  5. Add a Column Aggregator node to concatenate the columns whose names include "Others" (from the two previous branches).

 

Column Aggregator Knime

 

Repeat the same operation for Surface Area and Number of Rooms.
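Steps 4 and 5 can be mirrored in Python on tables keyed by row ID: an inner join that drops ColumnValues, followed by the concatenation of every column whose name contains "Others". The helper names, the " (right)" suffix used to disambiguate duplicate column names, the ", " separator, and the "Others (all)" output column are all assumptions made for this sketch; KNIME's own naming and defaults may differ.

```python
def join_on_row_id(left: dict, right: dict, exclude: tuple = ("ColumnValues",)) -> dict:
    """Inner join of two {row_id: row} tables, dropping the excluded columns."""
    joined = {}
    for row_id, left_row in left.items():
        if row_id not in right:
            continue  # inner join: keep only row IDs present on both sides
        row = {k: v for k, v in left_row.items() if k not in exclude}
        for k, v in right[row_id].items():
            if k not in exclude:
                # Disambiguate a duplicate column name coming from the right table.
                row[f"{k} (right)" if k in row else k] = v
        joined[row_id] = row
    return joined


def aggregate_columns(row: dict, name_part: str = "Others",
                      new_col: str = "Others (all)", sep: str = ", ") -> dict:
    """Concatenate the non-missing values of every column whose name contains name_part."""
    values = [v for k, v in row.items() if name_part in k and v is not None]
    return {**row, new_col: sep.join(values)}


rooms = {"Row0": {"ColumnValues": "Apartment", "Number of rooms": "3 rooms", "Others": "Balcony"}}
surface = {"Row0": {"ColumnValues": "Apartment", "Surface Area": "45 m2", "Others": "3rd floor"}}

row = aggregate_columns(join_on_row_id(rooms, surface)["Row0"])
print(row["Others (all)"])  # Balcony, 3rd floor
```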

  6. In the main branch, after the Unpivoting node (end of step 2), add a Column Filter node to exclude RowIDs and ColumnNames.
  7. Add a Row Filter node to exclude the rows whose Type column has a missing value.
  8. We now have two distinct branches in the workflow; connect them with a Joiner node using Row ID as the join key. Keep one copy of each column and exclude both occurrences of ColumnValues.

 

Home hunting with KNIME: Data extraction

 

Data extraction in Knime

 

  1. Add a Column Resorter node to put the columns in the order you want.
  2. Add the Excel Writer (XLS) node to export the final extract.
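Python's standard library cannot write Excel files, so this sketch swaps the Excel Writer for a CSV export (a format Excel opens directly) and applies a fixed column order in place of the Column Resorter. The `write_extract` helper and the `COLUMN_ORDER` list are hypothetical.

```python
import csv
import io

# Hypothetical final column order, as you might set it in the Column Resorter.
COLUMN_ORDER = ["Type", "Rent", "Neighborhood"]


def write_extract(rows: list[dict], column_order: list[str], stream) -> None:
    """Write rows as CSV with columns in column_order; other columns are dropped."""
    writer = csv.DictWriter(stream, fieldnames=column_order, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


rows = [{"Iteration": 0, "Neighborhood": "Old Town", "Rent": "1,200 EUR", "Type": "Apartment"}]
buffer = io.StringIO()
write_extract(rows, COLUMN_ORDER, buffer)
print(buffer.getvalue())
# Type,Rent,Neighborhood
# Apartment,"1,200 EUR",Old Town
```

To write a real file instead, open it with `open("listings.csv", "w", newline="", encoding="utf-8")` (a hypothetical file name) and pass the file object as `stream`.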

 

Home hunting with KNIME: The final data 

 

Criteria for optimizing real estate searches

 

This is the end of your home hunting with KNIME!

The internet has already revolutionized house hunting; with the workflow we just built, you can now focus on your main options, with all the listings gathered in a single file. Enjoy your apartment hunting experience more with KNIME.

The same approach works for any kind of information you want to extract from the web. It's your turn to play: happy hunting!


 

Knime Home Hunting Demo

Author Profile

Sylvana AH-LAYE
Hello! Passionate about digital marketing, I work daily with the various departments at Mydral. In my publications, you will find content on the Big Data, BI, and AI sectors. SEO and e-reputation hold no secrets for me!
