It's hard to surprise anyone with artificial intelligence and machine learning nowadays. Even though the technology is young, models and algorithms are already capable of completing many tasks, from highly personalized customer service to sophisticated forecasting.
There are plenty of tasks in real estate where machine learning algorithms can come in handy. For example, you can use them to estimate how the price of a particular property will change in the future, which helps brokers understand the market and plan their strategy based on real data. We were lucky to work on exactly such a solution, and here is our experience of building a machine learning model for real estate price prediction.
Let us share some interesting statistics that show what real estate looks like right now so we can see the big picture:
All these statistics and trends play a part when you estimate and predict the price of a house. Now, let's move on to the recent machine learning case where they came in handy.
It's hard to estimate the price of a particular house and predict how it will behave in the future, even if the prediction covers only a short period of time. Too many factors can influence the final numbers. The two most influential of them would be:
We can see from the previous part that both of these can change drastically over time, and some measurements can be seasonal. Besides, there are many characteristics of the house itself that also count: the number of bedrooms, the building's age, the overall condition, the quality of the neighborhood, proximity to shops, schools, and entertainment, and many more.
Also, it's worth mentioning that a professional opinion on a price can take several forms:
Real estate professionals look for comparable homes in the area and define the value of a property based on how those houses behaved on the market. Comparable homes are chosen by size, number of rooms, style, and recent sales price.
A Broker Price Opinion (BPO) is another way for a person to get a professional opinion on a house. It is usually prepared by a professional broker who knows the local market. This approach is common for short sales, foreclosures, or for providing buyers and sellers with a listing price.
The amount of data a broker has to bear in mind while predicting a house price can get extremely large. The task can be tedious even for the most experienced specialists, and there is always a possibility of simple human error.
That is exactly why we started working on the project a client came to us with: to automate house price prediction and minimize the influence of possible human error. Our main goal was to build a highly accurate machine learning model that would predict a house's price one month ahead with an accuracy of around 85 to 90 percent.
Now, the main part: What did the development process look like?
Gathering data. Our first data source was the client itself. They provided us with several datasets; however, these weren't enough for model training. To solve this issue, we researched other sources that could provide real estate data. We used several publicly available sources of information related to the U.S. real estate market, as well as data on the country's economic conditions, in order to achieve a more representative dataset.
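To give a sense of what combining the client's data with public sources can look like, here is a minimal sketch in Python with pandas. All file names, column names, and join keys below are hypothetical, not the actual project schema.

```python
import pandas as pd

# Client-provided listings (hypothetical schema: property_id, zip_code, sale_date, price, ...)
listings = pd.read_csv("client_listings.csv")

# Public sources (hypothetical schemas): a regional price index and national economic indicators
market_index = pd.read_csv("regional_price_index.csv")   # zip_code, month, price_index
econ = pd.read_csv("us_economic_indicators.csv")          # month, mortgage_rate, unemployment

# Align everything on location and month, then merge into one training table
listings["month"] = pd.to_datetime(listings["sale_date"]).dt.to_period("M")
market_index["month"] = pd.to_datetime(market_index["month"]).dt.to_period("M")
econ["month"] = pd.to_datetime(econ["month"]).dt.to_period("M")

df = (listings
      .merge(market_index, on=["zip_code", "month"], how="left")
      .merge(econ, on="month", how="left"))
```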
Feature engineering. To predict the price, we chose the following features (a sketch of how such features might be prepared follows the list):
historical change in real estate price
property location
type of house
neighbors
presence or absence of a pool
other nontraditional variables
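Below is a rough illustration, not the project's actual code, of how features like these might be prepared for a gradient-boosted tree model; the column names are assumptions.

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn raw merged records into model-ready features (illustrative only)."""
    out = df.sort_values(["zip_code", "month"]).copy()

    # Historical change in real estate price: year-over-year change of a
    # regional price index, computed per location
    out["price_index_change_12m"] = (
        out.groupby("zip_code")["price_index"].pct_change(periods=12)
    )

    # Presence or absence of a pool as a 0/1 flag
    out["has_pool"] = out["has_pool"].fillna(False).astype(int)

    # Categorical features such as the type of house and the location can be
    # one-hot encoded for the tree model
    out = pd.get_dummies(out, columns=["house_type", "zip_code"], drop_first=True)

    return out
```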
Hyperparameter tuning. Hyperparameters guide the model during the learning process. They are external to the model: they are set before training, the model has no power to change them, and they are not part of the trained model itself. This step gave us a way to validate the model's results, control its behavior, and maximize performance.
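As an illustration, hyperparameter tuning for an XGBoost regressor can be done with a cross-validated search. The parameter ranges below are placeholders rather than the values used in the project, and X_train / y_train are assumed to come from the previous steps.

```python
import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

# Hyperparameters are set from outside the model; the search tries random
# combinations and keeps the one with the best cross-validated error.
param_distributions = {
    "n_estimators": [200, 400, 800],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.7, 0.85, 1.0],
    "colsample_bytree": [0.7, 0.85, 1.0],
}

search = RandomizedSearchCV(
    estimator=xgb.XGBRegressor(objective="reg:squarederror"),
    param_distributions=param_distributions,
    n_iter=25,
    scoring="neg_mean_absolute_error",
    cv=5,
    random_state=42,
)

# search.fit(X_train, y_train)        # X_train, y_train from the earlier steps
# best_model = search.best_estimator_
```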
Studying variables. Throughout model training, we continuously assessed and reevaluated the influence and relevance of each variable. This was meant to increase the accuracy of the final solution.
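One common way to assess each variable's influence is to inspect the importances reported by the fitted booster. This sketch assumes the best_model and X_train names from the previous step.

```python
import pandas as pd

def rank_features(model, feature_names):
    """Return features sorted by the fitted model's importance scores."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)

# ranked = rank_features(best_model, X_train.columns)
# print(ranked.head(15))  # strong candidates to keep; near-zero ones may be dropped
```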
Iterating. When we finished the first version of the model, we started the process anew to polish it and make sure its results were as accurate as possible.
Our main tool was XGBoost, an open-source gradient-boosted decision tree library for machine learning. We used it for the regression model. Other items on the list included:
This toolset has helped us reach the desired accuracy.
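For reference, a bare-bones version of such a regression model looks roughly like this; the specific parameter values are illustrative, and X_train / y_train / X_test are assumed from the earlier steps.

```python
import xgboost as xgb

# Gradient-boosted decision trees for regression on the engineered features
model = xgb.XGBRegressor(
    objective="reg:squarederror",  # standard squared-error regression objective
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
)

# model.fit(X_train, y_train)
# predicted_prices = model.predict(X_test)  # predicted prices one month ahead
```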
Nothing is perfect in this world, and neither was our machine learning development process. For the most part, it was quite predictable and smooth, but at one point we faced the challenge of underfitting. Since the initial dataset was small and not of the best quality, it was hard for the algorithm to find hidden trends and produce accurate results.
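A simple way to confirm a diagnosis like this (sketched here, not taken from the project) is to compare the error on the training data with the error on held-out data: when both are high, the model is underfitting.

```python
from sklearn.metrics import mean_absolute_error

def check_underfitting(model, X_train, y_train, X_valid, y_valid):
    """Both errors being high (and similar) points to underfitting."""
    train_mae = mean_absolute_error(y_train, model.predict(X_train))
    valid_mae = mean_absolute_error(y_valid, model.predict(X_valid))
    return {"train_mae": train_mae, "valid_mae": valid_mae}
```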
As we already mentioned, the solution was found in third-party data sources. The information we were able to find in public sources has helped us get back on track and train the model correctly.
The results were even better than we expected. The client expected an accuracy of around 85-90% compared to the real prices, and we were able to achieve 91%. Not that much of a difference, but taking into account the circumstances, we couldn't have expected a better outcome.
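The article does not specify how accuracy against real prices was measured; one common convention, assumed here purely for illustration, is 100% minus the mean absolute percentage error.

```python
import numpy as np

def price_accuracy(actual_prices, predicted_prices) -> float:
    """Accuracy as 100% minus the mean absolute percentage error (MAPE)."""
    actual = np.asarray(actual_prices, dtype=float)
    predicted = np.asarray(predicted_prices, dtype=float)
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    return 100.0 - mape

# Two predictions off by ~6.7% and ~4.4% give an accuracy of roughly 94.4%
print(price_accuracy([300_000, 450_000], [280_000, 470_000]))
```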
In a nutshell, yes, it was. The model was definitely not perfect, the initial data was not the best, and the underfitting issue played its part, but it's a good start. We were able to see that machine learning is a viable technology in real estate and that we can easily continue working on more cases. Besides, price prediction is not the only area to work with: it can also include home hunting and more.