CURRENT OPENINGS

AI Applied Data Scientist

100% Remote
Full Time

Role Overview

We are looking for an AI Applied Data Scientist to contribute to the development of high-quality, diverse, and ethically sourced datasets for training and evaluating generative AI models. You will work hands-on with large language models (LLMs), diffusion frameworks, and other generative architectures to create scalable pipelines for synthetic and real data processing.

This role suits a candidate with solid applied AI experience who is comfortable taking ownership of technical components, collaborating closely with senior researchers and engineers, and contributing to innovation in multi-modal dataset creation and governance.

This position can also be offered as an internship or part-time opportunity for candidates with strong research or technical backgrounds seeking to develop applied experience in generative AI and data science.

Key Responsibilities

Model Research and Evaluation

  • Research and evaluate open-source LLMs and generative models (e.g., diffusion models, audio synthesis, video generation frameworks) to identify suitable tools for multi-modal synthetic dataset creation.
  • Benchmark candidate models and report findings on their performance, output quality, and scalability.

Data Generation and Pipeline Development

  • Develop and maintain scalable, GPU-accelerated data generation pipelines (e.g., using PyTorch, TensorFlow, and CUDA) for large-scale dataset synthesis.
  • Support automation, testing, and optimisation of data generation workflows.

Prompt Engineering and Dataset Diversity

  • Design and refine prompts and conditioning strategies to ensure demographic, linguistic, and regional diversity in generated datasets.
  • Analyse outputs to identify and reduce representational bias.

Data Management and Compliance

  • Contribute to the architecture of secure and compliant data pipelines, following UK GDPR, ISO/IEC 27001, and internal governance standards.
  • Implement and maintain labelling, data cleaning, and validation workflows for both synthetic and real datasets.
  • Ethically source real-world data from openly licensed repositories, verifying data provenance and licence terms.

Documentation and Collaboration

  • Produce clear technical documentation describing dataset generation logic, configuration parameters, and data lineage.
  • Collaborate closely with AI researchers, ML engineers, and data governance specialists to align dataset design with model training objectives.
  • Contribute to internal discussions and experimentation on generative data quality and diversity.

Qualifications and Experience

Essential:

  • Bachelor’s or Master’s degree in Computer Science, AI, Data Science, or a related field.
  • Proven experience working with LLMs and generative AI models (e.g., Stable Diffusion, Mistral, Llama, or similar).
  • Proficiency in Python and common ML frameworks such as PyTorch, TensorFlow, or JAX.
  • Hands-on experience developing or maintaining GPU-accelerated pipelines for AI or data workflows.
  • Understanding of data governance and privacy requirements under UK GDPR.
  • Strong analytical, problem-solving, and documentation skills.

Desirable:

  • Experience handling multi-modal data (text, audio, image, video).
  • Familiarity with MLOps tools (Docker, Airflow, MLflow, or Kubernetes).
  • Understanding of data lineage tracking, bias mitigation, or fairness evaluation.
  • Awareness of ethical AI and responsible data sourcing principles.

What You’ll Gain

  • Experience working with state-of-the-art LLMs and generative models in a research-driven environment.
  • Opportunities to collaborate with leading AI researchers and contribute to multi-modal data innovation.
  • Training and mentoring in ethical data science, data governance, and scalable pipeline engineering.
  • Flexible working arrangements and a supportive, growth-oriented culture.

How to Apply

Please send your CV, portfolio or GitHub profile, and a short cover letter outlining your relevant experience to info@datambit.com.
