Published: February 9, 2026

From Raw Public Data to Production-Grade AI Pipelines: Inside VendueTech’s R&D Collaboration with CROBOHUB++ and FER TakeLab

At VendueTech, our mission is to turn complex, fragmented public auction information into reliable, machine-readable intelligence that can be used safely, consistently, and at scale across Europe.

Public auction datasets are inherently challenging: they are heterogeneous, multilingual, and structurally inconsistent, and they change continuously over time. Making this data usable for analytics, decision-making, and downstream AI systems requires more than off-the-shelf tooling. It requires deep technical R&D, careful validation, and production-grade engineering.

To tackle this problem systematically, VendueTech partnered with EDIH CROBOHUB++ and TakeLab at FER (the University of Zagreb Faculty of Electrical Engineering and Computing), using the Test-Before-Invest (TBI) framework.

This collaboration allowed us to move from research hypotheses to deployed, production-ready AI infrastructure without taking on premature scaling risk.


Why Test-Before-Invest Is Critical for Serious AI Infrastructure

The Test-Before-Invest model enables technology companies to:

  • Experiment on real, production-like data
  • Validate architectural decisions under expert supervision
  • Stress-test models and pipelines before committing long-term resources
  • Reduce technical, operational, and financial risk

For VendueTech, TBI provided the structure needed to validate one of the hardest problems in applied AI today:

Consistent, schema-faithful extraction of structured information from complex, semi-structured public data—across jurisdictions, languages, and formats.


Phase 1: Understanding the Data Reality

The first phase focused on data characteristics and constraints rather than tooling.

Together with FER researchers, we analyzed:

  • Structural variability of auction datasets across sources and regions
  • Common failure modes of traditional extraction approaches
  • Requirements for repeatability, auditability, and downstream compatibility
  • Alignment with legal, ethical, and data-governance standards

Key insight
Any solution had to preserve semantic structure (key–value relationships, nested attributes, temporal fields) exactly as presented, without heuristic loss or uncontrolled normalization.

This immediately ruled out generic text-conversion pipelines.
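
To make the constraint concrete, here is a minimal sketch of a fidelity check that flags any extracted value that does not occur verbatim in the source text. The sample notice, the field names, and the helper functions are hypothetical illustrations, not taken from the TBI work.

```python
from typing import Any, Iterator


def iter_leaves(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        yield path, str(node)


def non_verbatim_fields(source_text: str, record: dict) -> list[str]:
    """Return the paths of extracted values that are not substrings of the source."""
    return [p for p, v in iter_leaves(record) if v not in source_text]


# Hypothetical source snippet and two candidate extractions.
SOURCE = "Lot: 12/A\nStarting price: 45.000,00 EUR\nAuction date: 03.02.2026"

faithful = {
    "lot": "12/A",
    "price": {"starting": "45.000,00 EUR"},
    "dates": ["03.02.2026"],
}
normalized = {
    "lot": "12/A",
    "price": {"starting": "45000.00"},  # number format silently rewritten
    "dates": ["2026-02-03"],            # date format silently rewritten
}

print(non_verbatim_fields(SOURCE, faithful))    # []
print(non_verbatim_fields(SOURCE, normalized))  # ['price.starting', 'dates[0]']
```

A check like this only catches rewritten values. It says nothing about missing fields or wrong key-value pairing, which need a comparison against a gold record.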

Venduetech - TBI report - TBI1 …


Phase 2: Why Generic Extractors Fail—and Domain-Tuned LLMs Succeed

The second phase evaluated whether modern AI models could reliably transform complex inputs into strictly structured outputs suitable for automated processing.

We tested multiple approaches to structured extraction and found consistent limitations:

  • Loss of field boundaries
  • Inconsistent formatting across runs
  • Poor handling of nested or contextual attributes
  • Inability to enforce strict output schemas
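
One of these limitations, inconsistent formatting across runs, is easy to quantify. The sketch below is an illustrative assumption about how such a measurement could look, not the evaluation used in the project. It calls an extractor repeatedly on the same document and reports how often the most common output was returned. Both extractors are hypothetical stand-ins.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency_rate(extract: Callable[[str], str], document: str, runs: int = 4) -> float:
    """Share of runs that returned the single most common output string."""
    counts = Counter(extract(document) for _ in range(runs))
    return counts.most_common(1)[0][1] / runs


def stable_extractor(document: str) -> str:
    """Stand-in for an extractor that always formats its output the same way."""
    return '{"lot": "12/A"}'


def make_flaky_extractor() -> Callable[[str], str]:
    """Stand-in for an extractor whose formatting drifts between runs."""
    variants = itertools.cycle(
        ['{"lot": "12/A"}', '{"lot":"12/A"}', "Lot: 12/A", '{"lot": "12/A"}']
    )
    return lambda document: next(variants)


print(consistency_rate(stable_extractor, "Lot: 12/A"))        # 1.0
print(consistency_rate(make_flaky_extractor(), "Lot: 12/A"))  # 0.5
```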

The Turning Point: Instruction-Tuned LLMs


[Figure: example input, showing only the beginning of the source document]

In collaboration with TakeLab, we designed and evaluated instruction-tuned large language models trained specifically to map domain-specific inputs into deterministic key-value outputs.

Two strategies were compared:

  • Prompt-based inference on cleaned representations
  • Instruction-tuning compact LLMs using QLoRA
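
For the second strategy, each training example pairs a source document with its gold key-value output. The sketch below shows one plausible way to build such records as JSON Lines. The instruction wording, the record layout (`instruction`, `input`, `output`), and the helper names are assumptions for illustration. The QLoRA step itself (a 4-bit quantized base model with low-rank adapters) is not shown.

```python
import json

# Hypothetical instruction text; the real prompt wording is not given in the source.
INSTRUCTION = (
    "Extract every field from the auction notice as JSON. "
    "Copy values exactly as written."
)


def build_example(source_text: str, target: dict) -> dict:
    """Pair one source document with its gold structured output."""
    return {
        "instruction": INSTRUCTION,
        "input": source_text,
        # Fixed separators and insertion-order keys, so identical targets
        # always serialize to byte-identical strings.
        "output": json.dumps(target, ensure_ascii=False, separators=(", ", ": ")),
    }


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines, one training record per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


example = build_example(
    "Lot: 12/A\nStarting price: 45.000,00 EUR",
    {"lot": "12/A", "price": {"starting": "45.000,00 EUR"}},
)
print(example["output"])
print(to_jsonl([example]))
```

Serializing the target in one fixed style matters here because the model learns the formatting along with the content.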

Outcome
Instruction-tuned models significantly outperformed prompt-only approaches, achieving near-deterministic output fidelity under exact-match evaluation—an essential property for production systems.
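
Exact-match evaluation can be sketched as follows. This version assumes that outputs are compared after parsing, so whitespace and key order do not count as differences, while any changed value or invalid JSON counts as a miss. Whether the project compared raw strings or parsed objects is not stated, and the sample predictions are made up.

```python
import json
from typing import Optional


def canonical(text: str) -> Optional[str]:
    """Re-serialize JSON in one fixed form; return None if the text is not valid JSON."""
    try:
        return json.dumps(json.loads(text), sort_keys=True, ensure_ascii=False)
    except json.JSONDecodeError:
        return None


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose canonical form equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    hits = 0
    for pred, ref in zip(predictions, references):
        pred_canon = canonical(pred)
        if pred_canon is not None and pred_canon == canonical(ref):
            hits += 1
    return hits / len(references)


GOLD = '{"lot": "12/A", "price": "45.000,00 EUR"}'
predictions = [
    '{"lot": "12/A", "price": "45.000,00 EUR"}',  # identical
    '{"price":"45.000,00 EUR","lot":"12/A"}',     # same content, other layout
    '{"lot": "12/A", "price": "45000.00"}',       # value rewritten
    "Lot: 12/A",                                  # not JSON at all
]
print(exact_match_rate(predictions, [GOLD] * 4))  # 0.5
```

Exact match is deliberately unforgiving: a record with one altered value scores the same as one that failed to parse, which matches how a downstream automated consumer would treat it.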

This phase validated a core architectural principle of VendueTech:

If output consistency matters, domain-specific fine-tuning beats generic prompting—every time.

Venduetech - TBI report - TBI2 …


Phase 3: Turning Research into Production Infrastructure

High extraction accuracy alone is insufficient without operational robustness.

The final TBI phase focused on deploying the solution in realistic conditions, emphasizing:

  • Model efficiency and resource optimization
  • Deterministic JSON outputs enforced by schema
  • Containerized deployment for reproducibility
  • API-driven integration with downstream systems
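
To illustrate what "enforced by schema" means in practice, the sketch below checks a parsed record against a strict schema that rejects missing fields, unexpected fields, and wrong types. It is a standard-library stand-in for a full JSON Schema validator, and the schema and field names are hypothetical.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type, or a nested schema dict.
AUCTION_SCHEMA = {
    "lot": str,
    "price": {"starting": str, "currency": str},
    "dates": list,
}


def schema_errors(value: Any, schema: Any, path: str = "$") -> list[str]:
    """Return a sorted list of violations; an empty list means the record is valid."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = [f"{path}.{k}: missing" for k in schema.keys() - value.keys()]
        errors += [f"{path}.{k}: unexpected" for k in value.keys() - schema.keys()]
        for key in schema.keys() & value.keys():
            errors += schema_errors(value[key], schema[key], f"{path}.{key}")
        return sorted(errors)
    if not isinstance(value, schema):
        return [f"{path}: expected {schema.__name__}"]
    return []


valid = {
    "lot": "12/A",
    "price": {"starting": "45.000,00", "currency": "EUR"},
    "dates": ["03.02.2026"],
}
invalid = {"lot": 12, "price": {"starting": "45.000,00"}, "note": "x"}

print(schema_errors(valid, AUCTION_SCHEMA))  # []
for error in schema_errors(invalid, AUCTION_SCHEMA):
    print(error)
```

Rejecting unexpected keys is the strict part: a permissive validator would let extra fields pass through unnoticed.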

The Production Pipeline

Working closely with TakeLab, we delivered a fully containerized AI service with:

  • Input normalization and preprocessing
  • A quantized, domain-tuned LLM (4-bit QLoRA)
  • Constrained decoding to enforce strict JSON schemas
  • Automatic validation and post-processing
  • High-throughput inference using vLLM
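
The stages listed above compose into a simple chain. The sketch below shows that shape with a stub in place of the served model, so it runs without a GPU. The stub, the required keys, and the function names are hypothetical. In the real service the model call would go to the vLLM-backed LLM with constrained decoding, which this sketch does not implement.

```python
import json
import unicodedata
from typing import Callable

REQUIRED_KEYS = {"lot", "starting_price"}  # hypothetical output contract


def normalize(raw: str) -> str:
    """Unify Unicode form and line endings, and drop trailing whitespace."""
    text = unicodedata.normalize("NFC", raw).replace("\r\n", "\n")
    return "\n".join(line.rstrip() for line in text.split("\n")).strip()


def validate(output: str) -> dict:
    """Parse the model output and reject anything that breaks the contract."""
    record = json.loads(output)
    if not isinstance(record, dict):
        raise ValueError("model output is not a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return record


def run_pipeline(raw: str, model: Callable[[str], str]) -> dict:
    """Normalize the input, call the model, then validate its output."""
    return validate(model(normalize(raw)))


def stub_model(text: str) -> str:
    """Stand-in for the LLM: reads 'Key: value' lines and emits JSON."""
    fields = dict(line.split(": ", 1) for line in text.split("\n") if ": " in line)
    return json.dumps(
        {"lot": fields.get("Lot"), "starting_price": fields.get("Starting price")}
    )


raw_notice = "Lot: 12/A  \r\nStarting price: 45.000,00 EUR\r\n"
print(run_pipeline(raw_notice, stub_model))
```

Keeping validation as a separate final stage means a malformed output fails loudly at the service boundary instead of reaching downstream systems.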

All components are encapsulated in Docker, enabling:

  • Predictable deployments
  • GPU-efficient scaling
  • Infrastructure portability across environments

The result is a system where complex public data can be transformed into production-ready, schema-valid JSON via a simple API, without sacrificing precision, control, or auditability.


Looking Ahead

The work delivered through the TBI collaboration forms the backbone of VendueTech’s broader AI roadmap.

Future directions include:

  • Cross-domain generalization
  • Multi-document reasoning
  • Temporal intelligence over auction lifecycles
  • High-performance model training and evaluation

This is how Bidsale®, VendueTech’s auction intelligence platform, is being built:
quietly, rigorously, and with infrastructure-level ambition.

The project is co-financed by the Croatian Agency for SMEs, Innovations and Investments (Hrvatska agencija za malo gospodarstvo, inovacije i investicije) through the Eurostars programme, with European Union funds from the Horizon Europe programme and funds from the Swiss-Croatian Cooperation Programme. Eurostars is part of the European Partnership on Innovative SMEs.