Published: February 9, 2026

From Raw Public Data to Production-Grade AI Pipelines: Inside VendueTech’s R&D Collaboration with CROBOHUB++ and FER TakeLab

At VendueTech, our mission is to turn complex, fragmented public auction information into reliable, machine-readable intelligence that can be used safely, consistently, and at scale across Europe.

Public auction datasets are inherently challenging: they are heterogeneous, multilingual, structurally inconsistent, and continuously evolving. Making this data usable for analytics, decision-making, and downstream AI systems requires more than off-the-shelf tooling; it requires deep technical R&D, careful validation, and production-grade engineering.

To tackle this problem systematically, VendueTech partnered with EDIH CROBOHUB++ and TakeLab at FER (the University of Zagreb Faculty of Electrical Engineering and Computing), using the Test-Before-Invest (TBI) framework.

This collaboration allowed us to move from research hypotheses to deployed, production-ready AI infrastructure—without premature scaling risk.

Why Test-Before-Invest Is Critical for Serious AI Infrastructure

The Test-Before-Invest model enables technology companies to:

Experiment on real, production-like data
Validate architectural decisions under expert supervision
Stress-test models and pipelines before committing long-term resources
Reduce technical, operational, and financial risk

For VendueTech, TBI provided the structure needed to validate one of the hardest problems in applied AI today:

Consistent, schema-faithful extraction of structured information from complex, semi-structured public data—across jurisdictions, languages, and formats.

Phase 1: Understanding the Data Reality

The first phase focused on data characteristics and constraints rather than tooling.

Together with FER researchers, we analyzed:

Structural variability of auction datasets across sources and regions
Common failure modes of traditional extraction approaches
Requirements for repeatability, auditability, and downstream compatibility
Alignment with legal, ethical, and data-governance standards

Key insight
Any solution had to preserve semantic structure (key–value relationships, nested attributes, temporal fields) exactly as presented, without heuristic loss or uncontrolled normalization.

This immediately ruled out generic text-conversion pipelines.
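
To make the structural requirement concrete, here is a minimal sketch of what a generic text conversion loses. The record, its field names, and the `to_plain_text` helper are hypothetical illustrations, not VendueTech's schema or tooling.

```python
import json

# Hypothetical auction record; field names are illustrative only.
record = {
    "lot": {"id": "A-17", "type": "apartment"},
    "price": {"amount": "120.000,00", "currency": "EUR"},
    "auction_date": "09.02.2026",
}


def to_plain_text(rec):
    """Generic text conversion: keeps leaf values, drops keys and nesting."""
    parts = []
    for value in rec.values():
        if isinstance(value, dict):
            parts.append(to_plain_text(value))
        else:
            parts.append(str(value))
    return " ".join(parts)


flat = to_plain_text(record)
structured = json.dumps(record, ensure_ascii=False, sort_keys=True)

# The structured form round-trips exactly, including the source's own
# number and date formatting. The flat string has no field boundaries left.
round_trip = json.loads(structured)
```

The flat string still contains every value, but nothing in it says which value belongs to which field, so downstream systems cannot rely on it.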

Venduetech - TBI report - TBI1 …

Phase 2: Why Generic Extractors Fail—and Domain-Tuned LLMs Succeed

The second phase evaluated whether modern AI models could reliably transform complex inputs into strictly structured outputs suitable for automated processing.

We tested multiple approaches to structured extraction and found consistent limitations:

Loss of field boundaries
Inconsistent formatting across runs
Poor handling of nested or contextual attributes
Inability to enforce strict output schemas
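
One of these limitations, inconsistent formatting across runs, can be measured with a very small check. The sketch below is an assumption about how such a check might look, not the evaluation used in the project; the sample outputs are invented for illustration.

```python
from collections import Counter


def run_consistency(outputs):
    """Fraction of runs that produced the most common output string."""
    if not outputs:
        raise ValueError("need at least one output")
    counts = Counter(outputs)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(outputs)


# Hypothetical outputs from three runs of an extractor on the same input.
runs = [
    '{"price": "120.000,00", "currency": "EUR"}',
    '{"price": "120.000,00", "currency": "EUR"}',
    "price: 120000 EUR",
]
score = run_consistency(runs)
```

A score of 1.0 means every run returned the same string; anything lower means at least one run drifted in content or format.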

The Turning Point: Instruction-Tuned LLMs

[Figure omitted: example input, showing only the beginning of the document]

In collaboration with TakeLab, we designed and evaluated instruction-tuned large language models trained specifically to map domain-specific inputs into deterministic, key-value outputs.

Two strategies were compared:

Prompt-based inference on cleaned representations
Instruction-tuning compact LLMs using QLoRA
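
For context on the second strategy: QLoRA keeps the base model's weights frozen in 4-bit precision and trains only small low-rank adapter matrices. The arithmetic below sketches why that is cheap. The layer size and rank are illustrative assumptions, not values from the project, and the byte count ignores the small overhead of quantization constants.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Adapter parameters for one weight matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def weight_bytes(num_params, bits):
    """Storage for num_params weights at the given bit width."""
    return num_params * bits // 8


# Illustrative sizes only (assumed, not taken from the source).
d_in, d_out, rank = 4096, 4096, 16

full_params = d_in * d_out
adapter_params = lora_trainable_params(d_in, d_out, rank)
trainable_fraction = adapter_params / full_params

base_bytes_16bit = weight_bytes(full_params, 16)
base_bytes_4bit = weight_bytes(full_params, 4)
```

With these assumed sizes, the adapter holds under 1% of the layer's parameters, and the frozen base layer takes a quarter of its 16-bit footprint.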

Outcome
Instruction-tuned models significantly outperformed prompt-only approaches, achieving near-deterministic output fidelity under exact-match evaluation—an essential property for production systems.
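
Exact-match evaluation counts a prediction as correct only if it equals the reference in full. The sketch below is one possible reading of that metric for JSON outputs. It compares canonical serializations so that key order does not matter, which is an assumption: the source does not say whether raw strings or parsed objects were compared.

```python
import json


def canonical(text):
    """Parse JSON and re-serialize with sorted keys; None if the text is not JSON."""
    try:
        return json.dumps(json.loads(text), sort_keys=True, ensure_ascii=False)
    except json.JSONDecodeError:
        return None


def exact_match_rate(predictions, references):
    """Share of predictions whose canonical form equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("need at least one example")
    hits = 0
    for pred, ref in zip(predictions, references):
        c_pred = canonical(pred)
        if c_pred is not None and c_pred == canonical(ref):
            hits += 1
    return hits / len(references)


# Hypothetical predictions and references.
preds = ['{"a": 1, "b": "x"}', '{"b": "x", "a": 2}', "not json"]
refs = ['{"b": "x", "a": 1}', '{"a": 1, "b": "x"}', '{"a": 1}']
rate = exact_match_rate(preds, refs)
```

The metric gives no partial credit: a single wrong value or an unparseable output scores zero for that example, which is why it is a demanding test of output fidelity.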

This phase validated a core architectural principle of VendueTech:

If output consistency matters, domain-specific fine-tuning beats generic prompting—every time.

Venduetech - TBI report - TBI2 …

Phase 3: Turning Research into Production Infrastructure

High extraction accuracy alone is insufficient without operational robustness.

The final TBI phase focused on deploying the solution in realistic conditions, emphasizing:

Model efficiency and resource optimization
Deterministic JSON outputs enforced by schema
Containerized deployment for reproducibility
API-driven integration with downstream systems
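
As an illustration of schema enforcement at the output boundary, the sketch below rejects any output with missing fields, extra fields, or wrong types. `SCHEMA` and its field names are hypothetical. A production service would more likely use a full JSON Schema validator; this only shows the shape of the check.

```python
import json

# Hypothetical flat schema: field name -> required Python type.
SCHEMA = {"lot_id": str, "price": str, "currency": str, "auction_date": str}


def validate(raw, schema=SCHEMA):
    """Return the parsed object if it matches the schema exactly, else raise ValueError."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be an object")
    missing = schema.keys() - obj.keys()
    extra = obj.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        if not isinstance(obj[key], expected):
            raise ValueError(f"{key}: expected {expected.__name__}")
    return obj


good = validate(
    '{"lot_id": "A-17", "price": "120.000,00", '
    '"currency": "EUR", "auction_date": "09.02.2026"}'
)
```

Failing loudly on any mismatch keeps malformed records out of downstream systems instead of letting them pass silently.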

The Production Pipeline

Working closely with TakeLab, we delivered a fully containerized AI service with:

Input normalization and preprocessing
A quantized, domain-tuned LLM (4-bit QLoRA)
Constrained decoding to enforce strict JSON schemas
Automatic validation and post-processing
High-throughput inference using vLLM
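
The idea behind constrained decoding can be shown with a toy example: at each step the decoder may only choose from tokens the schema allows, regardless of what the model would otherwise prefer. Everything here is a simplifying assumption. Real systems mask logits over a tokenizer vocabulary using a grammar or JSON schema, while this sketch uses whole strings as tokens and a stand-in scoring function.

```python
import json


def constrained_decode(score_fn, template):
    """Greedy decoding where each step picks only from an allowed token list.

    template: one list of allowed tokens per output position.
    score_fn(prefix, token) -> float stands in for the model's score.
    """
    output = []
    for allowed in template:
        # Masking step: tokens outside `allowed` are never considered.
        best = max(allowed, key=lambda tok: score_fn(output, tok))
        output.append(best)
    return "".join(output)


# Hypothetical template for {"currency": "<code>"} with a closed set of codes.
TEMPLATE = [
    ['{"currency": "'],
    ["EUR", "USD", "GBP"],
    ['"}'],
]


def toy_scores(prefix, token):
    """Stand-in model that would prefer the free-text answer 'euros'."""
    preferences = {"euros": 5.0, "EUR": 3.0, "USD": 1.0, "GBP": 0.5}
    return preferences.get(token, 0.0)


result = constrained_decode(toy_scores, TEMPLATE)
parsed = json.loads(result)
```

Because "euros" is never in the allowed set, the decoder cannot emit it, and the output is valid JSON by construction rather than by luck.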

All components are encapsulated in Docker, enabling:

Predictable deployments
GPU-efficient scaling
Infrastructure portability across environments

The result is a system where complex public data can be transformed into production-ready, schema-valid JSON via a simple API, without sacrificing precision, control, or auditability.

Looking Ahead

The work delivered through the TBI collaboration forms the backbone of VendueTech’s broader AI roadmap.

Future directions include:

Cross-domain generalization
Multi-document reasoning
Temporal intelligence over auction lifecycles
High-performance model training and evaluation

This is how Bidsale®, VendueTech’s auction intelligence platform, is being built: quietly, rigorously, and with infrastructure-level ambition.
