How HouseData calculates property risk scores from official UK government data.
HouseData aggregates open data from official UK government sources and applies machine learning models to infer property-level risk scores. Every score is traceable to its underlying data source, and we publish our methodology so users can understand exactly how results are generated.
We do not guess or use proprietary black-box valuations. Every input data point on HouseData is sourced from a named, verifiable government or public dataset; risk scores are model outputs derived from those inputs.
Data sources: Environment Agency flood zones (Flood Map for Planning), historical flood event records, river and sea level monitoring stations, surface water flood risk maps.
Method: An XGBoost model combines property-level features (distance to watercourse, flood zone classification, elevation, historical flood frequency within 1 km) to produce a normalised risk score from 0 to 100. The model is validated against historical flood insurance claims data.
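One of the features named above, historical flood frequency within 1 km, can be sketched as a simple distance filter. This is a minimal illustration, not the production pipeline: the function names and the use of the haversine great-circle distance are assumptions, and the real feature may be computed with a spatial index or projected coordinates.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

LatLon = Tuple[float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flood_events_within(prop: LatLon, events: Iterable[LatLon], radius_m: float = 1000.0) -> int:
    """Count recorded flood events no further than radius_m from the property."""
    lat, lon = prop
    return sum(
        1
        for ev_lat, ev_lon in events
        if haversine_m(lat, lon, ev_lat, ev_lon) <= radius_m
    )
```

The resulting count would be one column in the feature row passed to the model, alongside distance to watercourse, flood zone classification, and elevation.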
Data sources: Local planning authority feeds (425+ councils), Planning Portal records, planning enforcement notices.
Method: We count and classify planning applications within configurable radii of each property. Applications are weighted by type (major/minor), recency, and outcome (approved/refused/pending). A Random Forest model produces a development pressure score indicating how actively the surrounding area is being developed.
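The weighting step can be sketched as follows. All numeric weights, the two-year half-life for recency, and the default radii are hypothetical values chosen for illustration; the source does not publish them here. The sketch only produces weighted counts per radius, which would then be inputs to the Random Forest model rather than the final score.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Sequence

# Hypothetical weights, for illustration only.
TYPE_WEIGHT = {"major": 3.0, "minor": 1.0}
OUTCOME_WEIGHT = {"approved": 1.0, "pending": 0.6, "refused": 0.3}
HALF_LIFE_DAYS = 730.0  # assumed: an application's weight halves every two years


@dataclass(frozen=True)
class Application:
    kind: str          # "major" or "minor"
    outcome: str       # "approved", "refused" or "pending"
    event_date: date   # decision date, or received date if still pending
    distance_m: float  # distance from the property


def application_weight(app: Application, as_of: date) -> float:
    """Combine type, outcome and exponential recency decay into one weight."""
    age_days = max(0, (as_of - app.event_date).days)
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return TYPE_WEIGHT[app.kind] * OUTCOME_WEIGHT[app.outcome] * recency


def weighted_counts(
    apps: Iterable[Application],
    as_of: date,
    radii_m: Sequence[float] = (250.0, 500.0, 1000.0),
) -> Dict[float, float]:
    """Sum application weights inside each radius (radii are cumulative)."""
    apps = list(apps)
    return {
        r: sum(application_weight(a, as_of) for a in apps if a.distance_m <= r)
        for r in radii_m
    }
```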
Data sources: Environment Agency pollution inventory, historical landfill sites register, contaminated land registers (where published by local authorities).
Method: Properties are scored based on proximity to known pollution sources, weighted by pollution type (category A, B, C) and operational status (active vs. closed/remediated). Scores are normalised to a 0–100 scale.
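A proximity-weighted score of this kind might look like the sketch below. The weights, the 2 km cut-off, the assumption that category A is the most severe, and the saturating exponential used to map the raw sum onto 0–100 are all illustrative choices, not the published normalisation.

```python
import math
from typing import Iterable, Tuple

# Hypothetical weights; assumes category A is the most severe.
CATEGORY_WEIGHT = {"A": 1.0, "B": 0.6, "C": 0.3}
STATUS_WEIGHT = {"active": 1.0, "closed": 0.4}
MAX_DISTANCE_M = 2000.0  # assumed: sources beyond this contribute nothing

Source = Tuple[str, str, float]  # (category, status, distance in metres)


def contamination_score(sources: Iterable[Source]) -> float:
    """Return a 0-100 score that rises with closer, more severe, active sources."""
    raw = 0.0
    for category, status, distance_m in sources:
        distance_m = max(0.0, distance_m)
        if distance_m >= MAX_DISTANCE_M:
            continue
        proximity = 1.0 - distance_m / MAX_DISTANCE_M  # 1 at the source, 0 at the cut-off
        raw += CATEGORY_WEIGHT[category] * STATUS_WEIGHT[status] * proximity
    # Saturating map: 0 when there are no sources, approaching 100 as raw grows.
    return 100.0 * (1.0 - math.exp(-raw))
```

A saturating map keeps the score bounded however many sources surround a property; a percentile rank across all scored properties would be an equally plausible way to normalise.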
Data sources: DLUHC Energy Performance of Buildings Register (domestic and non-domestic).
Method: We analyse time-series EPC data for each property to identify whether the rating has improved, deteriorated, or remained static. We also flag properties approaching certificate expiry and those where physical characteristics (floor area, heating system, insulation) have changed between assessments, which may indicate undisclosed alterations.
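The three checks described above (rating trend, approaching expiry, changed characteristics) can be sketched with plain records. The record fields, the 180-day warning window, and the 2 m² floor-area tolerance are hypothetical; the sketch assumes a certificate is valid for ten years from its lodgement date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence

BANDS = "ABCDEFG"          # A is the best EPC band, G the worst
EXPIRY_WARNING_DAYS = 180  # hypothetical warning window


@dataclass(frozen=True)
class EpcRecord:
    lodged: date
    band: str
    floor_area_m2: float
    heating: str


def rating_trend(records: Sequence[EpcRecord]) -> str:
    """Compare the earliest and latest certificate bands."""
    ordered = sorted(records, key=lambda r: r.lodged)
    first, last = BANDS.index(ordered[0].band), BANDS.index(ordered[-1].band)
    if last < first:
        return "improved"
    if last > first:
        return "deteriorated"
    return "static"


def expiry_date(record: EpcRecord) -> date:
    """Ten years after lodgement (29 February falls back to 28 February)."""
    d = record.lodged
    try:
        return d.replace(year=d.year + 10)
    except ValueError:
        return d.replace(year=d.year + 10, day=28)


def nearing_expiry(record: EpcRecord, today: date) -> bool:
    """True if the certificate is still valid but expires within the warning window."""
    days_left = (expiry_date(record) - today).days
    return 0 <= days_left <= EXPIRY_WARNING_DAYS


def changed_fields(older: EpcRecord, newer: EpcRecord, area_tolerance_m2: float = 2.0) -> List[str]:
    """List the physical characteristics that differ between two assessments."""
    changes = []
    if abs(newer.floor_area_m2 - older.floor_area_m2) > area_tolerance_m2:
        changes.append("floor_area")
    if newer.heating != older.heating:
        changes.append("heating")
    return changes
```

The tolerance on floor area is there because repeat surveys rarely measure identically; without it, small measurement differences would be reported as alterations.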
Data sources: EPC time-series data, planning permission records, building control sign-off records, Listed Building registers.
Method: The Consent Gap Score cross-references physical changes detected in EPC data (e.g. a new extension appearing, changed heating type, increased floor area) against planning and building control records. A gap between detected changes and corresponding permissions flags a potential consent issue for conveyancers.
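The cross-referencing step can be illustrated as a matching problem: each detected change needs a granted consent of the same kind, decided within a plausible window. The record types, the `kind` vocabulary, and the three-year lookback are hypothetical, and this sketch returns unmatched changes rather than a numeric score.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

LOOKBACK_DAYS = 3 * 365  # assumed: consent may be granted up to ~3 years before works


@dataclass(frozen=True)
class DetectedChange:
    kind: str       # e.g. "extension", "heating"
    earliest: date  # date of the last EPC without the change
    latest: date    # date of the first EPC showing the change


@dataclass(frozen=True)
class Consent:
    kind: str
    decided: date
    granted: bool


def has_matching_consent(change: DetectedChange, consents: Iterable[Consent]) -> bool:
    """True if a granted consent of the same kind falls inside the window."""
    window_start = change.earliest - timedelta(days=LOOKBACK_DAYS)
    return any(
        c.granted and c.kind == change.kind and window_start <= c.decided <= change.latest
        for c in consents
    )


def consent_gaps(changes: Iterable[DetectedChange], consents: Iterable[Consent]) -> List[DetectedChange]:
    """Return the detected changes with no corresponding consent on record."""
    consents = list(consents)
    return [ch for ch in changes if not has_matching_consent(ch, consents)]
```

A gap found this way is only a prompt for further enquiry: the work may have been permitted development, or the relevant register may be incomplete.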
All models are trained on labelled datasets of known outcomes and validated using stratified k-fold cross-validation with held-out test sets. We publish precision/recall metrics for each risk category and retrain models quarterly as new data becomes available.
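For readers unfamiliar with the terms, the sketch below shows stratified fold assignment and a precision/recall calculation using only the standard library. It illustrates the technique, not HouseData's training code; the function names are made up for this example, and the separate held-out test set mentioned above is not shown.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_folds(labels: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):  # sorted for reproducibility
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def precision_recall(y_true: Sequence, y_pred: Sequence, positive=1) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In each round, one fold serves as the validation set and the remaining folds as training data. Stratification matters for risk data because positive outcomes (a flood, a consent issue) are usually rare, and an unstratified split can leave a fold with almost none.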
HouseData exports property data in PDTF (Property Data Trust Framework) v3 format, the UK industry standard for structured property data exchange. Each export includes verified claims with data provenance, documenting exactly which official source each data point came from and when it was retrieved. This supports integration with conveyancing platforms and with the systems used by estate agents, lenders, and surveyors.
See our data sources page for a full list of every data source used, with links to the official provider. For questions about methodology, email hello@housedata.uk.