Location: Remote
Hours: Part time (10-20 hours/week)
Contract: Consulting agreement, equity-based
We are seeking a highly skilled Senior Data Engineer to play a pivotal role in developing scalable data pipelines, optimizing infrastructure, and enabling AI-driven innovation. A core focus of this role is building reliable, efficient systems to scrape and process data from diverse websites across multiple countries and languages. You will collaborate with cross-functional teams to create and maintain robust data systems that power VendueTech’s universal public data scraping and multimodal AI systems.
As a fully remote team, we seek a candidate with proven experience in remote work environments who can effectively manage project timelines and deadlines. Strong communication skills are essential for keeping stakeholders and team members aligned through digital channels while collaborating efficiently across time zones. Success in this role requires the ability to foster team cohesion, uphold project accountability, and stay well organized in a dynamic, remote setting.
Key Responsibilities:
Data Engineering & Infrastructure:
- Design, develop, and maintain scalable, high-performance ETL pipelines.
- Build and optimize systems for large-scale public data scraping from diverse websites, auction platforms, and other sources.
- Develop strategies to handle website restrictions (e.g., CAPTCHAs, rate limiting) while ensuring compliance with applicable regulations.
- Optimize data architecture for multimodal public datasets across multiple languages and countries.
- Implement and maintain databases, including vector databases, to support AI-driven retrieval and analysis.
Web Scraping:
- Develop and maintain automated web scraping tools and frameworks for extracting structured and unstructured data.
- Build and deploy scalable scraping solutions that can handle dynamic, high-volume websites.
- Monitor and adapt scraping processes to accommodate changing website structures and new requirements.
- Ensure data quality, accuracy, and integrity during the scraping and processing phases.
AI & Data System Integration:
- Collaborate with AI teams to build and enhance Retrieval-Augmented Generation (RAG) systems.
- Ensure seamless integration of structured and unstructured data for AI models.
- Contribute to the development of multimodal data pipelines for AI applications.
Collaboration & Stakeholder Management:
- Work closely with data scientists, analysts, and engineering teams to meet project goals.
- Support product teams by providing clean, reliable datasets for AI-powered development.
- Contribute to the preparation of EU R&D grant proposals by providing technical expertise and insights.
Innovation & Technology Adoption:
- Stay current with emerging technologies in data engineering, web scraping, LLMs, RAG systems, and big data.
- Evaluate and recommend new tools and technologies to enhance platform capabilities.
Technical Excellence:
- Uphold best practices in data engineering, including coding standards, testing, and documentation.
- Contribute to hands-on coding and to system architecture design.
- Optimize system performance for speed, reliability, and scalability.
Qualifications:
- 5+ years of experience in data engineering, web scraping, or a related role.
- Strong programming skills in Python, with experience using scraping tools such as Scrapy, BeautifulSoup, or Selenium, or comparable frameworks.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure).
- Proficiency in designing and maintaining ETL pipelines and workflows.
- Expertise in databases (SQL and NoSQL), data warehouses, and big data technologies (e.g., Spark, Hadoop).
- Experience with vector databases and systems for multimodal data.
- Deep understanding of web technologies (HTML, CSS, JavaScript) and scraping techniques.
- Familiarity with machine learning workflows and AI integration.
- Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
Why Join Us?
- Work on innovative, AI-driven projects with real-world impact.
- Collaborate with top researchers and industry experts.
- Flexible remote work environment with a global team.
- Opportunity to contribute to groundbreaking data-driven technologies.
VendueTech is an equal opportunity employer and does not discriminate on the basis of race, religion, gender identity or expression, national origin, age, disability, marital status, sexual orientation, or any other legally protected characteristics. We are committed to creating a diverse and inclusive workplace and encourage applications from all qualified individuals.