From billions of parameters to millions of documents, here’s how we’re building the future of property auction analytics—powered by HPC, LLMs, and predictive modeling.
Published on: April 24, 2025
Author: Renato Mutavdzic, Jr. AI Engineer
Context: The Real Estate Auction Problem
Real estate auctions are an opaque and fragmented landscape. Legal documents are dense, auction listings vary wildly in format, and market predictions are speculative at best. Investors and institutions often navigate this chaos manually.
BidSale, our AI-native platform at VendueTech, aims to solve this through automated document analysis, real-time price forecasting, and retrieval-augmented recommendation systems—powered by LLMs, deep learning, and massively parallel computing.
HPC: Not Just Nice-to-Have, But Mission-Critical
Training and fine-tuning models like LLaMA 2 and Falcon, as well as domain-specific LLMs for legal and real estate use, requires compute beyond what cloud or local resources can practically provide.
Enter EuroHPC’s MareNostrum AI Cluster, where we will run:
- Up to 16 NVIDIA A100/H100 GPUs per job
- 4 GPUs per node, with each job using up to 320 CPU cores
- Scratch space of 5 TiB, job memory up to 512 GiB
- 100 concurrent jobs, each running ~48 hours, with checkpoints every 6–12 hours
Monthly usage: approximately 2,500 node hours
Total dataset for training: approximately 2 TiB
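As a rough sanity check on these figures, the sketch below converts a job shape into node hours and counts how many complete jobs fit in the monthly allocation. It assumes node hours are charged as nodes occupied × wall-clock hours and that a 16-GPU job at 4 GPUs per node occupies 4 full nodes. The function names are illustrative and not part of any scheduler's API.

```python
import math

def nodes_per_job(gpus_per_job: int, gpus_per_node: int) -> int:
    """Whole nodes a job occupies (assumes jobs are allocated full nodes)."""
    return math.ceil(gpus_per_job / gpus_per_node)

def node_hours(gpus_per_job: int, gpus_per_node: int, wall_hours: float) -> float:
    """Node hours charged for one job: nodes occupied x wall-clock hours."""
    return nodes_per_job(gpus_per_job, gpus_per_node) * wall_hours

def jobs_in_budget(budget_node_hours: float, cost_per_job: float) -> int:
    """Number of complete jobs of a given cost that fit in a node-hour budget."""
    return int(budget_node_hours // cost_per_job)

# Figures from the list above: 16 GPUs/job, 4 GPUs/node, ~48 h/job, ~2,500 node hours/month.
full = node_hours(gpus_per_job=16, gpus_per_node=4, wall_hours=48)
single = node_hours(gpus_per_job=4, gpus_per_node=4, wall_hours=48)
print(nodes_per_job(16, 4), full, jobs_in_budget(2500, full))  # 4 192 13
print(single, jobs_in_budget(2500, single))                      # 48 52
```

Under these assumptions, the allocation covers about 13 full-size 48-hour runs per month, or about 52 single-node runs of the same length. The mix of full-size and single-node jobs therefore largely determines how many jobs the monthly budget supports.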
We’ll leverage this to:
- Fine-tune LLMs (10B+ parameters) on multilingual legal corpora
- Train predictive models (XGBoost gradient-boosted trees and PyTorch MLPs) on historical auction outcomes
- Run large-scale batch inference via vLLM, averaging over 1,000 docs/sec per GPU
The AI Stack: Modular, Multi-Modal, Multilingual
Natural Language Processing (25%)
- Hugging Face models fine-tuned with Accelerate and TRL
- Focused on clause extraction, auction summarization, and entity linking
Predictive Analytics (20%)
- Price trajectory models trained on public/private data
- Forecast winning bid ranges, price volatility, and ROI estimates
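The forecasting models themselves are not detailed here, but a simple baseline to benchmark them against is an empirical range built from comparable past auctions. The sketch below assumes we already have closing-price-to-starting-price ratios for similar properties (a hypothetical input). It uses the 10th and 90th percentiles of those ratios to bracket a new auction's winning bid, and their median as a point estimate.

```python
from statistics import median, quantiles

def winning_bid_range(starting_price: float, comparable_ratios: list[float],
                      low_pct: int = 10, high_pct: int = 90) -> tuple[float, float, float]:
    """Empirical (low, median, high) winning-bid estimate from comparable auctions.

    comparable_ratios: closing price / starting price for similar past auctions
    (hypothetical input; choosing the comparables is out of scope here).
    """
    if len(comparable_ratios) < 2:
        raise ValueError("need at least two comparable auctions")
    if not 1 <= low_pct < high_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= low_pct < high_pct <= 99")
    # With n=100, quantiles() returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(comparable_ratios, n=100, method="inclusive")
    return (starting_price * cuts[low_pct - 1],
            starting_price * median(comparable_ratios),
            starting_price * cuts[high_pct - 1])

# Hypothetical ratios for eight comparable auctions.
low, mid, high = winning_bid_range(120_000, [0.95, 1.02, 1.05, 1.10, 1.12, 1.18, 1.25, 1.31])
print(f"{low:,.0f} / {mid:,.0f} / {high:,.0f}")
```

Because this baseline ignores time trends and property features, it sets a floor that a trained price-trajectory model should beat, rather than serving as a substitute for one.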
Machine Learning (20%)
- XGBoost and ensemble meta-learners for detecting patterns in historical bidding data
- Auction clustering via unsupervised learning
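Auction clustering can take many forms. As one concrete illustration, the sketch below runs plain k-means (Lloyd's algorithm) on small numeric feature vectors and seeds it with farthest-first initialization, a simple deterministic alternative to k-means++. The example features (number of bids and closing-to-starting-price ratio) are hypothetical. A production pipeline would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
import math

Point = tuple[float, ...]

def farthest_first_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points: list[Point], k: int, max_iters: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps until
    assignments stop changing. Returns (label per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_init(points, k)
    labels: list[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids

# Hypothetical features per auction: (number of bids, closing/starting price ratio).
auctions = [(3, 1.02), (4, 1.05), (2, 0.98), (25, 1.60), (30, 1.75), (28, 1.70)]
labels, centroids = kmeans(auctions, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy example the bid count dominates the Euclidean distance. Real features would be standardized first so that no single column, such as price, drowns out the others.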
Computer Vision (25%)
- PyTorch-based classifiers for image quality scoring and anomaly detection in property visuals
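The production classifiers are PyTorch models, but a common non-learned signal for one aspect of image quality, blur, is the variance of the image's Laplacian. Sharp photos have strong edges and therefore a high-variance Laplacian response. The sketch below computes this in plain Python on a grayscale image given as a list of pixel rows. It is an illustrative baseline or input feature, not the classifier described above, and any "too blurry" threshold would need tuning on real listing photos.

```python
from statistics import pvariance

Image = list[list[float]]  # grayscale pixel rows, e.g. values in 0..255

def laplacian_response(img: Image) -> list[float]:
    """Convolve interior pixels with the 4-neighbour kernel [[0,1,0],[1,-4,1],[0,1,0]].
    Border pixels are skipped rather than padded."""
    h, w = len(img), len(img[0])
    return [img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
            for y in range(1, h - 1) for x in range(1, w - 1)]

def sharpness_score(img: Image) -> float:
    """Variance of the Laplacian: higher means more edge detail, lower suggests blur."""
    if len(img) < 3 or len(img[0]) < 3:
        raise ValueError("image must be at least 3x3 pixels")
    return pvariance(laplacian_response(img))

# Toy 5x5 images: a high-contrast checkerboard vs. a smooth left-to-right ramp.
checker = [[255.0 if (x + y) % 2 else 0.0 for x in range(5)] for y in range(5)]
ramp = [[10.0 * x for x in range(5)] for _ in range(5)]
print(sharpness_score(checker) > sharpness_score(ramp))  # True
```

With OpenCV, a close equivalent (differing only in border handling) is the widely used `cv2.Laplacian(gray, cv2.CV_64F).var()`.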
Data Mining & Pattern Recognition (10%)
- Spark jobs for parsing noisy and unstructured HTML/PDF data across multiple auction sites
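Listing pages differ per site, so each source typically needs its own extractor before records reach Spark. As a hedged illustration, the standard-library sketch below pulls label/value pairs from a hypothetical listing whose details sit in `<th>`/`<td>` table rows. The markup, field names, and the `ListingTableParser` class are assumptions. Real sites would need per-source rules or a library such as BeautifulSoup, and PDFs are not covered here at all.

```python
from html.parser import HTMLParser

class ListingTableParser(HTMLParser):
    """Collects {label: value} from <tr><th>label</th><td>value</td></tr> rows.
    Hypothetical layout; adapt per auction site."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &euro; arrives as "€"
        self.fields: dict[str, str] = {}
        self._cell: str | None = None   # "th" or "td" while inside a cell
        self._buf: list[str] = []
        self._label: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._buf = tag, []

    def handle_data(self, data):
        if self._cell:  # text inside nested tags such as <b> is kept too
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return  # ignore closing tags of nested or structural elements
        text = " ".join("".join(self._buf).split())  # collapse noisy whitespace
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._cell = None

def parse_listing(html: str) -> dict[str, str]:
    parser = ListingTableParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The resulting dicts can then be written out as rows (for example to Parquet) for the Spark jobs. Normalizing values such as a European-style `120.000,00` price would be a separate step.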
What Makes BidSale Unique
- Retrieval-Augmented Generation: Query-specific auction recommendations enhanced by FAISS-based similarity retrieval over a custom vector database
- GDPR-Compliant AI: All personal data is anonymized, encrypted, and processed lawfully with user opt-ins
- Cross-Border, Cross-Language: Trained to handle legal documents in Croatian, English, German, and more
- Actionable Outputs: Risk scores, predicted closing prices, and flagged legal anomalies, all delivered in under 5 seconds
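The retrieval step in a RAG pipeline reduces to nearest-neighbour search over embeddings. The sketch below shows exact top-k cosine similarity by brute force, which is conceptually what a flat FAISS inner-product index (`IndexFlatIP`) computes over L2-normalized vectors. The three-dimensional embeddings and listing IDs are hypothetical stand-ins for real model outputs.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    """Scale to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) top-k search by cosine similarity.
    A real index would normalize corpus vectors once, at indexing time."""
    q = normalize(query)
    scored = ((doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
              for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])

corpus = {  # hypothetical listing IDs with toy 3-d embeddings
    "listing-a": [1.0, 0.0, 0.0],
    "listing-b": [0.9, 0.1, 0.0],
    "listing-c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

Brute force is exact but scales linearly with corpus size. At millions of listings, approximate indexes (such as FAISS's IVF or HNSW variants) trade a little recall for much faster queries.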
FER TakeLab: Academic Precision at Production Scale
We’re not alone in this journey. Our academic partner TakeLab at the University of Zagreb’s Faculty of Electrical Engineering and Computing (FER) brings decades of research excellence in NLP, knowledge grounding, and language modeling.
Their role:
- Designing task-specific LLM evaluation metrics
- Validating fairness, interpretability, and generalizability
- Benchmarking models with BLEU, ROUGE, and custom legal QA tasks
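For reference, ROUGE-1 measures unigram overlap between a generated summary and a reference text. The minimal sketch below computes ROUGE-1 precision, recall, and F1 using only lowercasing and whitespace tokenization. Official implementations, such as Google's `rouge-score` package, add stemming and other tokenization options, so their scores will not match this sketch exactly.

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 via clipped unigram overlap (lowercased, whitespace-tokenized)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the min count per shared token
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(rouge1("the auction closed above the starting price",
             "the auction closed at the starting price"))
```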
This industry–academia synergy helps us scale fast without sacrificing rigor.
SRCE and the Role of National Infrastructure
In addition to EuroHPC, BidSale is supported by SRCE (University of Zagreb University Computing Centre), Croatia’s leading institution for national academic HPC infrastructure. SRCE provides foundational support with environment setup, containerized deployment compatibility, and compliance with national infrastructure requirements.
Moreover, we’ve applied for a CROBOHUB++ Innovation Voucher to further accelerate our work. This co-financing mechanism is intended to support high-potential AI innovations in Croatia and the EU, and we’re currently awaiting a final decision.
This effort is also a precursor to our broader vision: joining an EU AI Factory, a federated, production-grade, domain-specific platform for applied AI solutions in proptech, legal tech, and financial analytics. BidSale is only the beginning.
Performance Engineering & I/O Strategy
To mitigate common I/O bottlenecks in AI training:
- Checkpointing every 6–12 hours to scratch storage
- PyTorch DataLoaders with multiprocessing to minimize I/O stalls
- I/O offloading to GPFS parallel file systems where possible
- Data serialized as Parquet with Apache Arrow for Spark ingestion
This I/O design aims to keep data-hungry training loops fed, so GPUs spend their time computing rather than waiting on data.
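A time-based checkpoint policy like the 6–12 hour one above is simple to add to a training loop. The sketch below saves state once a configurable interval has elapsed. It writes atomically (to a temporary file in the same directory, then `os.replace`), so a job killed mid-write never leaves a truncated checkpoint on scratch. The `TimedCheckpointer` class, the JSON payload, and the path in the usage comment are hypothetical placeholders. A PyTorch job would save model and optimizer `state_dict()`s with `torch.save` instead.

```python
import json
import os
import tempfile
import time

class TimedCheckpointer:
    """Saves at most once per `interval_s` seconds, writing atomically."""

    def __init__(self, path: str, interval_s: float, clock=time.monotonic) -> None:
        self.path = path
        self.interval_s = interval_s
        self.clock = clock          # injectable for testing
        self.last_save = clock()

    def maybe_save(self, state: dict) -> bool:
        """Call every step; returns True if a checkpoint was written."""
        now = self.clock()
        if now - self.last_save < self.interval_s:
            return False
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())  # make sure bytes reach storage before the rename
            # Atomic on POSIX when source and destination share a file system.
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise
        self.last_save = now
        return True

# Usage inside a training loop (hypothetical path and state):
# ckpt = TimedCheckpointer("/scratch/<project>/ckpt.json", interval_s=6 * 3600)
# for step in range(total_steps):
#     ...  # one training step
#     ckpt.maybe_save({"step": step})
```

With `interval_s=6 * 3600`, a 48-hour job writes at most eight checkpoints, which bounds both scratch usage and the work lost to a preemption.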
Ethical AI: Fair, Transparent, Sustainable
- Users retain control: AI provides recommendations; users make final decisions
- Algorithmic fairness: Regular audits, demographically balanced training sets
- Environmental impact: Paperless workflows reduce CO₂ and physical resource strain
- Sustainability: Reuse of models across regions and scalable retraining via quantized checkpoints
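Quantized checkpoints cut storage and transfer costs. To illustrate the idea only, the sketch below applies symmetric per-tensor int8 quantization to a flat list of weights: each weight is stored as an 8-bit integer, and the whole tensor shares one float scale. Real LLM quantization schemes (per-channel or group-wise scales, GPTQ, AWQ, bitsandbytes) are more elaborate, and the scheme BidSale uses is not specified here.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8: scale = max|w| / 127, q = round(w / scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original weights."""
    return [v * scale for v in q]

q, scale = quantize_int8([0.5, -1.27, 0.0, 1.27])
print(q)  # [50, -127, 0, 127]
```

Each weight shrinks from 4 bytes in float32 to 1 byte, roughly a 4x reduction. The cost is a rounding error of at most half the scale per weight.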
Our Hypotheses (and Why They Matter)
We believe:
- LLMs can extract legal and financial insights with near-expert accuracy
- Predictive modeling can outperform heuristics in pricing and timing auctions
- AI-based personalization can give smaller investors a competitive edge
- Distributed HPC infrastructure is the only viable path to scalable auction intelligence
If confirmed, these hypotheses would move real estate from bureaucratic chaos to real-time, data-driven decision-making.
What’s Next
- Get accepted into the EU HPC program
- Train and deploy BidSale LLM v1.0
- Integrate Spark-optimized legal pipelines into platform
- Benchmark price forecasting models with real-world auction closings
- Release vLLM API for document summarization at scale
- Deliver first wave of insights to regulators, funds, and proptech partners
We’re not just building a product. We’re building the infrastructure for the next decade of real estate.
Read more at https://rnca.fccn.pt/en/marenostrum-5/
If you are an AI engineer, PhD researcher, algorithm developer, quant, back-end or front-end engineer, or work in another field, check out our open positions and apply.