Datambit.com

AI Applied Data Scientist

100% Remote

Full Time

Role Overview

We are looking for an AI Applied Data Scientist to contribute to the development of high-quality, diverse, and ethically sourced datasets for training and evaluating generative AI models. You will work hands-on with large language models (LLMs), diffusion frameworks, and other generative architectures to build scalable pipelines for processing synthetic and real data.

This role suits a candidate with solid applied AI experience who is comfortable taking ownership of technical components, collaborating closely with senior researchers and engineers, and contributing to innovation in multi-modal dataset creation and governance.

This position can also be offered as an internship or part-time opportunity for candidates with strong research or technical backgrounds seeking to develop applied experience in generative AI and data science.

Key Responsibilities

Model Research and Evaluation

Research and evaluate open-source LLMs and generative models (e.g., diffusion models, audio synthesis, video generation frameworks) to identify suitable tools for multi-modal synthetic dataset creation.
Perform benchmarking and report findings on model performance, quality, and scalability.

Data Generation and Pipeline Development

Develop and maintain scalable data generation pipelines using GPU-accelerated environments (e.g., PyTorch, TensorFlow, CUDA) for large-scale dataset synthesis.
Support automation, testing, and optimisation of data generation workflows.

Prompt Engineering and Dataset Diversity

Design and refine prompts and conditioning strategies to ensure demographic, linguistic, and regional diversity in generated datasets.
Analyse outputs to identify and reduce representational bias.

Data Management and Compliance

Contribute to the architecture of secure and compliant data pipelines, following UK GDPR, ISO/IEC 27001, and internal governance standards.
Implement and maintain labelling, data cleaning, and validation workflows for both synthetic and real datasets.
Ethically source real-world data from open-licence repositories and verify data provenance and licence terms.

Documentation and Collaboration

Produce clear technical documentation describing dataset generation logic, configuration parameters, and data lineage.
Collaborate closely with AI researchers, ML engineers, and data governance specialists to align dataset design with model training objectives.
Contribute to internal discussions and experimentation on generative data quality and diversity.

Qualifications and Experience

Essential:

Bachelor’s or Master’s degree in Computer Science, AI, Data Science, or a related field.
Proven experience working with LLMs and generative AI models (e.g., Stable Diffusion, Mistral, Llama, or similar).
Proficiency in Python and common ML frameworks such as PyTorch, TensorFlow, or JAX.
Hands-on experience developing or maintaining GPU-accelerated pipelines for AI or data workflows.
Understanding of data governance and privacy requirements under UK GDPR.
Strong analytical, problem-solving, and documentation skills.

Desirable:

Experience handling multi-modal data (text, audio, image, video).
Familiarity with MLOps tools (Docker, Airflow, MLflow, or Kubernetes).
Understanding of data lineage tracking, bias mitigation, or fairness evaluation.
Awareness of ethical AI and responsible data sourcing principles.

What You’ll Gain

Experience working with state-of-the-art LLMs and generative models in a research-driven environment.
Opportunities to collaborate with leading AI researchers and contribute to multi-modal data innovation.
Training and mentoring in ethical data science, data governance, and scalable pipeline engineering.
Flexible or hybrid work options and a supportive, growth-oriented culture.

How to Apply

Please send your CV, portfolio or GitHub profile, and a short cover letter outlining your relevant experience to info@datambit.com.
