Unveiling Anomalies in Property Data: A Summer Project with BigML
As data enthusiasts, we at BigML are always looking for new ways to explore and analyze data. Recently, we came across a blog post by Idealista that detailed some analysis of properties in Madrid, Barcelona, and Valencia in Spain, using data from 2018. Intrigued by this data and always up for a challenge, we decided to play around with it on our platform and see what interesting insights we could uncover.
The data provided in the repository included fields such as property ID, price, unit price, number of bedrooms, and more. The data wasn’t in a standard CSV format, so we extracted and cleaned it with R before uploading it to our platform. Once the data was ready, we created a dataset and an anomaly detector for each city to start our analysis.
Using BigML’s platform, we were able to easily create anomaly detectors that assigned an anomaly score to each property. These scores range from 0 to 1, with higher scores indicating more unusual properties. The anomalies differed from city to city, with some properties standing out for their luxurious amenities and others for different unusual characteristics.
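To give a feel for how a score between 0 and 1 can capture "unusualness", here is a minimal sketch of the isolation-forest idea, a common basis for this kind of detector. It is an illustration only, not the platform's implementation: the function names, the two fields (price and bedrooms), and the generated listings are all assumptions made up for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random field at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    field = rng.randrange(len(rows[0]))
    low = min(row[field] for row in rows)
    high = max(row[field] for row in rows)
    if low == high:
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[field] < split]
    right = [row for row in rows if row[field] >= split]
    return {
        "field": field,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, row, depth=0):
    """Depth at which a row lands, adjusted for unsplit rows left in the leaf."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if row[tree["field"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)

def anomaly_scores(rows, n_trees=100, seed=0):
    """Score each row: rows that are isolated in few splits score closer to 1."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(rows)))
    trees = [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(rows))
    scores = []
    for row in rows:
        mean_depth = sum(path_length(tree, row) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Made-up listings: (price in thousands of euros, bedrooms).
data_rng = random.Random(42)
properties = [(data_rng.gauss(300, 40), data_rng.randint(1, 4)) for _ in range(200)]
properties.append((4500.0, 12))  # one deliberately extreme listing

scores = anomaly_scores(properties)
print("score of the extreme listing:", round(scores[-1], 2))
print("score of an ordinary listing:", round(scores[0], 2))
```

The extreme listing is separated from the rest after very few random splits, so its average depth is small and its score is pushed toward 1, while ordinary listings need many more splits and end up with lower scores.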
One interesting aspect of our analysis was the distribution of anomalies throughout each city. By computing batch anomaly scores and creating histograms, we could see that anomalies were more common in certain neighborhoods. For example, in Barcelona, anomalies were clustered in the uptown area and along the seashore, which suggests these are areas with more luxurious properties.
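The histogram and per-neighborhood comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions: the neighborhood labels (`zone_a`, `zone_b`, `zone_c`), the scores, and the 0.6 cutoff for calling a property an anomaly are all made up for the example and are not taken from our results.

```python
from collections import Counter

# Made-up (neighborhood, anomaly score) pairs standing in for batch score output.
scored = [
    ("zone_a", 0.42), ("zone_a", 0.38),
    ("zone_b", 0.71), ("zone_b", 0.45), ("zone_b", 0.83),
    ("zone_c", 0.41), ("zone_c", 0.66), ("zone_c", 0.35),
]

def histogram(scores, bins=5):
    """Count scores in equal-width bins over the interval [0, 1]."""
    counts = [0] * bins
    for score in scores:
        counts[min(int(score * bins), bins - 1)] += 1
    return counts

THRESHOLD = 0.6  # assumed cutoff for treating a property as an anomaly

score_hist = histogram([score for _, score in scored])
anomalies_by_zone = Counter(zone for zone, score in scored if score >= THRESHOLD)
totals_by_zone = Counter(zone for zone, _ in scored)
anomaly_share = {
    zone: anomalies_by_zone[zone] / totals_by_zone[zone] for zone in totals_by_zone
}

# Text histogram of the score distribution.
for i, count in enumerate(score_hist):
    print(f"{i / 5:.1f}-{(i + 1) / 5:.1f} {'#' * count}")

# Neighborhoods ordered by their share of anomalous properties.
for zone, share in sorted(anomaly_share.items(), key=lambda kv: -kv[1]):
    print(zone, f"{share:.0%}")
```

Comparing the share of anomalies rather than the raw count keeps a neighborhood with many listings from looking unusual simply because it is large.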
To visualize this distribution, we created a simple app using Streamlit and Mapbox that displayed the anomalies on a map. This allowed us to see at a glance where anomalies were more prevalent in each city and how they were distributed geographically. The app provided a unique way to explore the data and uncover patterns that may not have been immediately obvious.
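The app's own code is not reproduced here, but one preparatory step for any map of this kind is turning scored properties into something a map layer can draw. The sketch below converts listings into GeoJSON, a format Mapbox can render as a layer. The field names (`id`, `lat`, `lon`, `score`), the sample points, and the 0.6 cutoff are hypothetical.

```python
import json

def to_geojson(listings, min_score=0.6):
    """Build a GeoJSON FeatureCollection from listings at or above min_score."""
    features = []
    for item in listings:
        if item["score"] < min_score:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [item["lon"], item["lat"]]},
            "properties": {"id": item["id"], "score": item["score"]},
        })
    return {"type": "FeatureCollection", "features": features}

# Made-up points with coordinates roughly in the Barcelona area.
listings = [
    {"id": "p1", "lat": 41.40, "lon": 2.13, "score": 0.72},
    {"id": "p2", "lat": 41.39, "lon": 2.17, "score": 0.41},
    {"id": "p3", "lat": 41.38, "lon": 2.19, "score": 0.65},
]

collection = to_geojson(listings)
print(json.dumps(collection, indent=2))
```

Carrying the score in each feature's `properties` leaves the styling decision, such as coloring or sizing points by score, to the map layer.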
Overall, this project was a fun and enlightening experience that showcased the power of anomaly detection in uncovering interesting insights from data. By bridging the gap between Machine Learning models and real-world applications, we were able to bring the data to life and gain a deeper understanding of the properties in each city. We hope that this analysis inspires others to explore their own datasets and uncover hidden anomalies that may lead to valuable insights.
If you’re curious to see the live app and explore the anomalies in Madrid, Barcelona, and Valencia, you can check it out here. Happy exploring!