Create high-quality datasets for fine-tuning and reinforcement learning.
TL;DR: Fine-tuning LLMs requires high-quality datasets. FiddleCube automagically generates fine-tuning datasets from your data.
User Data Source > Fine-tuning Datasets (FiddleCube) > Fine-tuning
Head over to fiddlecube.ai to get started!
Hi everyone, we are Neha and Kaushik. We're building FiddleCube to make high-quality datasets accessible to everyone.
🦸 Kaushik spent most of the last decade building tech at companies like Google, Uber, and LinkedIn.
🧑🏻 Neha has spent a similar amount of time as a dev at multiple startups, most recently at Uber.
🫶🏻 We met at Uber, eventually got married, and decided to build a startup together, following our passion for AI.
In the real world, LLMs need to be aligned to follow human instructions and to respond in a way that is helpful, accurate, and safe.
Fine-tuning and reinforcement learning with high-quality datasets have achieved remarkable results toward this end. However, creating these datasets takes significant time, manual effort, and money.
FiddleCube leverages a suite of AI models to create high-quality datasets for fine-tuning and reinforcement learning.
We create rich, diverse, high-quality datasets that produce better models from a smaller corpus of data.
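What does such a dataset look like? Here is a minimal sketch, assuming the common JSONL convention of instruction/response pairs; the records, file name, and schema are illustrative assumptions, not FiddleCube's actual output format:

```python
import json

# Illustrative only: JSONL files of instruction/response pairs are a common
# convention that most fine-tuning pipelines ingest directly. The examples
# and file name below are made up for this sketch.
examples = [
    {
        "instruction": "Summarize the refund policy in one sentence.",
        "response": "Customers can return any unused item within 30 days for a full refund.",
    },
    {
        "instruction": "Which payment methods do you accept?",
        "response": "We accept credit cards, PayPal, and bank transfers.",
    },
]

# Write one JSON object per line (JSONL).
with open("finetune_dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```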
Why fine-tune an LLM? A few of the use cases we see:

- Give the model a personality, voice, and tone. For example, you can create a safe Dora the Explorer / Peppa Pig model that speaks to children (see the first sketch after this list).
- For specific tasks like making API calls or generating code, fine-tuning has demonstrably produced better results. You can fine-tune an LLM on a corpus of code or API data to significantly improve its ability at these tasks (see the second sketch after this list).
- Fine-tuned LLMs can be much smaller than foundation models while matching them on a narrow task, letting you increase throughput while reducing latency and cost.
- LLMs perform poorly in domains that lack a sufficient corpus of high-quality data, such as vernacular languages. Fine-tuning on generated datasets has shown remarkable improvements over the state of the art in these cases.
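To make the persona use case concrete, here is a hypothetical training record in the chat-message format many fine-tuning APIs accept. The persona, messages, and file name are illustrative assumptions, not a FiddleCube output:

```python
import json

# Hypothetical persona-flavored training record (chat-message format).
# A system message pins the voice and tone; the assistant turn demonstrates it.
record = {
    "messages": [
        {
            "role": "system",
            "content": "You are a cheerful explorer who talks to young children "
                       "in short, friendly sentences and never uses unsafe language.",
        },
        {"role": "user", "content": "What do stars look like up close?"},
        {
            "role": "assistant",
            "content": "Stars are giant glowing balls of gas! They twinkle because "
                       "their light wiggles through the air on its way to your eyes.",
        },
    ]
}

# Append the record to a JSONL training file.
with open("persona_dataset.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```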
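And for the API-calling use case, a hypothetical record that teaches the model to map a natural-language request to a structured call; the endpoint name and argument schema are assumptions for illustration:

```python
import json

# Hypothetical training record for API-call generation: the target output is a
# structured call serialized as JSON, so the fine-tuned model learns to emit
# machine-parseable responses. The api name and arguments are made up.
record = {
    "instruction": "Get tomorrow's weather forecast for Paris.",
    "response": json.dumps(
        {"api": "get_weather", "arguments": {"city": "Paris", "date": "tomorrow"}}
    ),
}

print(json.dumps(record, indent=2))
```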
Are you fine-tuning any LLM, or looking to fine-tune Llama 2, MPT, or Falcon? We would love to know your use case. Drop a comment on what you are doing, or reach out to us privately!
Book a slot on our calendar 🗓️ or drop us a line using:
- Email 📧: kaushik@fiddlecube.ai
- Typeform 📝
and we will get back to you!