By iarbel

Exploring the Landscape of Domain-Specific AI: Part II

Updated: Nov 10, 2023

This is part 2 of our Domain-Specific AI series. In this post, we focus on model fine-tuning and on analyzing outputs in different settings. We show surprising results from compact open-source models that you can serve in production. Upcoming posts will dig into the details and provide step-by-step tutorials so you can build your own AI pipeline from scratch.

TL;DR

  • You can achieve high quality results for domain-specific tasks using AI with minimal setup effort!

  • Going open-source doesn't mean you have to compromise on quality. In fact, our experiment shows that a fine-tuned model with only 1.3B parameters performs on par with the commercial ChatGPT API, despite being much smaller.

  • Serving your own model is 10x-30x cheaper than using the ChatGPT API!

  • Open-source also means enhancing privacy and flexibility. However, if you need to prototype fast and iterate, using commercial LLMs can have its benefits.



In our previous blog post, we explored the dynamic AI landscape, specifically focusing on how businesses can leverage Large Language Models (LLMs) for domain-specific tasks. We compared the advantages of commercial models to open-source alternatives and offered a high-level overview of the AI field. Now, we're diving deeper into the subject by unraveling the concept of fine-tuning in AI. This next chapter will provide detailed model comparisons, including a cost analysis, to empower you in making informed decisions. Join us as we continue the discussion on this intriguing field.



Understanding Model Fine-Tuning

Fine-tuning is the process of customizing a pre-trained model to imbue it with additional capabilities, and it is pivotal in leveraging the immense potential of LLMs. It adapts a pre-existing model, previously trained on a vast corpus of data, to address specific tasks or domains with precision. By doing so, LLMs can be transformed into specialized tools capable of performing a wide range of tasks, from natural language understanding and generation to image recognition and recommendation systems.


Fine-tuning involves exposing the model to additional training data tailored to our goal, and it can be done at varying scales. For example, Code Llama is a collection of models fine-tuned from LLaMA-2 with the objective of completing code in popular programming languages. These models were fine-tuned on 500B tokens, which is considered a very large dataset, amounting to 25% of the original LLaMA-2 training corpus (2T tokens). On the other hand, this post will demonstrate a fine-tuning example that used only 700 training examples and still yielded high-quality results.


There are many ways to fine-tune LLMs, and each suits different purposes. You can also combine these techniques; the specific setup depends on your end goal. Three common methods include:

  • Supervised Fine-Tuning (SFT): Supervised fine-tuning involves training the LLM on task-specific data. The training process is similar to pre-training; however, the SFT dataset is geared toward the specific task we'd like the model to perform.

  • Reinforcement Learning from Human Feedback (RLHF): Reinforcement Learning from Human Feedback is a technique used to further fine-tune the model. In this approach, the model generates multiple alternative responses, which are then ranked and scored. The model is then fine-tuned based on this feedback to improve the quality and relevance of its generated text. The training process differs significantly from pre-training and may employ Reinforcement Learning (RL) algorithms, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), although this is not strictly required.

  • LLM Knowledge Distillation (KD): In LLM knowledge distillation, responses or rankings are generated by alternate LLMs, typically those recognized as cutting-edge and considerably large. This approach allows us to acquire valuable knowledge from top-performing models in specific tasks, all while utilizing smaller models, resulting in substantial reductions in computational expenses.

It's worth highlighting that, although we describe these approaches as distinct, they do intersect in certain aspects. RLHF, for instance, involves obtaining human preferences, yet the model may undergo autoregressive fine-tuning, as is typical in SFT. Furthermore, LLM knowledge distillation can be integrated with human preferences to optimize a model either through autoregressive methods or with the application of RL algorithms.
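To make the SFT objective concrete, here is a toy sketch of the shifted-label (next-token) loss that autoregressive fine-tuning optimizes. The tokens and logits below are made up for illustration; real SFT runs a pre-trained LLM (e.g. via Hugging Face transformers) over task-specific examples, but the loss has this exact shape:

```python
import numpy as np

# Toy sketch of the autoregressive objective used in SFT.
rng = np.random.default_rng(0)
vocab_size = 100
tokens = np.array([5, 17, 42, 8, 99])                 # one tokenized example
logits = rng.normal(size=(len(tokens), vocab_size))   # stand-in model outputs

def next_token_loss(logits, tokens):
    # Position i predicts token i+1 -- the same loss as pre-training,
    # just computed on the task-specific SFT dataset.
    shifted, targets = logits[:-1], tokens[1:]
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

loss = next_token_loss(logits, tokens)   # near log(vocab_size) for random logits
```

RLHF and knowledge distillation change where the training signal comes from (human preferences or a teacher model), but when applied autoregressively they plug into this same objective.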

 

Interested to learn more? Head over to our post on fine-tuning, which includes a complete hands-on guide to fine-tuning your model. In the meantime, at the end of this post you can check out product description responses from two fine-tuned versions of the OPT model.

 

By combining these fine-tuning approaches, businesses can achieve greater control over the open-source LLM's output, tailoring it to their specific content generation needs for e-commerce product descriptions or other tasks. Additionally, a well-executed fine-tuning pipeline can be based on smaller models, which means improved latency and lower operational costs at inference. And once such a pipeline is in place, it can serve a variety of tasks, such as sentence classification, named entity recognition (NER), summarization, text generation in different contexts (e-commerce, blogs, emails, etc.) and different voices, and much more. All you'll need is to input a high-quality training dataset, and let the magic of AI do the rest.


To sum up, open-source LLMs present an exciting opportunity for businesses seeking customizable text generation solutions. With Meta's LLaMA and other community-driven projects leading the way, the advantages of customization and flexibility are within reach. By leveraging fine-tuning techniques, businesses can unlock the true potential of open-source LLMs and create content that precisely aligns with their unique needs.


Comparing Text Generation Results: Commercial vs. Open-Source LLMs

In the preceding sections, we explored the landscape of commercial LLM APIs and the potential of open-source models for task-specific text generation. Now, we perform a detailed comparison of the results obtained from different LLMs for a specific task. Our focus is on generating e-commerce product descriptions, and we'll compare the outcomes from the ChatGPT API, an open-source LLM with 1.3B parameters, and another one with 350M parameters.


Pipeline Setup

In order to generate product descriptions (Amazon feature-bullets), we utilize three different models:

  • ChatGPT, using a 2-shot context

  • OPT-1.3B, fine-tuned

  • OPT-350M, fine-tuned

We prompted all models with the same inputs: we instructed the models about the task and provided a product title and technical details, in the spirit of the example shown in part I. We collected outputs from all models for 50 products and asked annotators to rank their preferred choices.
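To make the 2-shot setup concrete, here is a minimal sketch of how such a prompt can be assembled. The instruction wording and example products below are hypothetical; they are not our exact prompt:

```python
# Hypothetical 2-shot prompt assembly; instruction text and examples are made up.
INSTRUCTION = (
    "Write Amazon-style feature bullets for the product below, "
    "based on its title and technical details."
)

FEW_SHOT_EXAMPLES = [  # (product input, ideal feature bullets)
    ("Title: Acme Steel Water Bottle\nDetails: 750ml, vacuum insulated",
     "- KEEPS DRINKS COLD: Vacuum insulation holds temperature for 24 hours.\n"
     "- DURABLE STEEL BODY: Food-grade stainless steel resists dents."),
    ("Title: Nimbus LED Desk Lamp\nDetails: 3 color modes, USB-C",
     "- THREE COLOR MODES: Switch between warm, neutral, and cool light.\n"
     "- USB-C POWERED: Works with any modern charger or power bank."),
]

def build_prompt(product: str) -> str:
    parts = [INSTRUCTION]
    for example_input, example_output in FEW_SHOT_EXAMPLES:
        parts.append(f"{example_input}\nBullets:\n{example_output}")
    parts.append(f"{product}\nBullets:")   # the new product to describe
    return "\n\n".join(parts)

prompt = build_prompt(
    "Title: Drsn Universal Ceiling Mount\nDetails: height adjustable 11.4-22in"
)
```

The fine-tuned OPT models received the same product input, but without the few-shot examples, since the task is baked in through training.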

 

To learn more about this implementation, you can find a comprehensive, step-by-step guide on using ChatGPT for this task right here. Coming soon is a full walkthrough on fine-tuning an open-source model to enhance privacy and reduce operational expenses.

 

Results Comparison

Human annotators were presented with outputs of the three models, and were asked to rank them in comparison to one another. Three ranking objectives were considered:

  • Structure - required to follow Amazon guidelines.

  • Informativeness - how informative and helpful the description is.

  • Repetition - avoid repetitive descriptions.

We include "repetition" as an objective because we noticed that smaller models are more prone than larger ones to generating repetitive content. Even when the wording differs, they may repeat ideas or information in another form. Moreover, since only Supervised Fine-Tuning (SFT) was used to fine-tune the models, these evaluation results can later feed Reinforcement Learning from Human Feedback (RLHF) to further tune and improve the models. Below is a summary of the results:

Model     | % 1st Rank | % 2nd Rank | % 3rd Rank
----------|------------|------------|-----------
ChatGPT   | 45.13%     | 39.98%     | 13.89%
OPT-1.3B  | 41.53%     | 30.69%     | 27.78%
OPT-350M  | 13.34%     | 28.33%     | 58.33%

And there you have it - an open-source deployment that matches the performance of ChatGPT 🥳. We provide examples of the models' outputs at the end of this post.
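For reference, rank percentages like those in the table can be computed from raw annotator rankings in a few lines. The rankings below are made up for illustration, not our actual annotation data:

```python
from collections import Counter

# Made-up annotator data: for each product, models listed best-to-worst.
rankings = [
    ["chatgpt", "opt-1.3b", "opt-350m"],
    ["opt-1.3b", "chatgpt", "opt-350m"],
    ["chatgpt", "opt-350m", "opt-1.3b"],
    ["opt-1.3b", "chatgpt", "opt-350m"],
]

# rank_counts[model][position] = how often the model landed at that rank
rank_counts = {model: Counter() for model in rankings[0]}
for ranking in rankings:
    for position, model in enumerate(ranking, start=1):
        rank_counts[model][position] += 1

rank_pct = {
    model: {pos: 100 * count / len(rankings) for pos, count in counts.items()}
    for model, counts in rank_counts.items()
}
# e.g. with this toy data, "chatgpt" is ranked 1st in 50% of products
```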



How much can you save when using an open-source deployment?

At least 90% of the cost

How did we calculate this? Commercial LLMs are priced on a per-token basis, while serving your own model requires hosting it. Therefore, we calculated an average token throughput and divided the hourly cost of cloud GPU machines by it. Let's analyze the figures we observed when running the product description task presented in this post:

  • Instance type: GCP G2, which includes NVIDIA's new L4 GPU (24 GB VRAM)

  • Prompts / minute: 150

  • Tokens / prompt: 500 (~170 input, ~330 output)

  • Overall: 500 tokens * 150 prompts * 60 minutes = 4500K tokens / hr

How does this translate to a cost per token?

  • VM hourly cost: $0.7 (G2 with L4 GPU)

  • Cost per token: $0.7 / 4500K ≈ $0.00016 / 1K tokens

This is quite remarkable, considering that the ChatGPT API is priced at $0.002 / 1K tokens, making our open-source deployment more than 10x cheaper. Furthermore, recall that ChatGPT required 2-shot prompting for this task; otherwise, the outputs were subpar. Consequently, a single ChatGPT prompt must process roughly 3x the tokens. So, if we normalize costs, a fair comparison indicates that we save more than 30x on the cost of using a commercial LLM!


Of course, these numbers pertain to full VM utilization, which is practical when processing offline batches, such as managing a large e-commerce product catalog. However, even if we assume your model serves a SaaS platform and experiences idle periods, a modest 25% utilization translates to a cost of just $0.0006 / 1K tokens. This is still 10x cheaper than the normalized ChatGPT cost for this task. And when it comes to the SaaS setup, numerous vendors offer scalable production-serving solutions, allowing your machines to adapt to demand fluctuations automatically. Therefore, there's no need for your machines to remain idle for more than 75% of the time.
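The cost arithmetic above can be sketched in a few lines (all prices are the assumptions stated in this post):

```python
# Cost comparison sketch: self-hosted GPU vs. ChatGPT API, using the
# figures assumed in this post.
GPU_HOURLY_COST = 0.70       # GCP G2 instance with L4 GPU, hourly
PROMPTS_PER_MINUTE = 150
TOKENS_PER_PROMPT = 500      # ~170 input + ~330 output
CHATGPT_PER_1K = 0.002       # ChatGPT API price per 1K tokens

tokens_per_hour = TOKENS_PER_PROMPT * PROMPTS_PER_MINUTE * 60    # 4,500,000
self_hosted_per_1k = GPU_HOURLY_COST / (tokens_per_hour / 1000)  # ~$0.00016

raw_saving = CHATGPT_PER_1K / self_hosted_per_1k     # ~13x
normalized_saving = raw_saving * 3                   # 2-shot prompts cost ~3x tokens
at_25pct_utilization = self_hosted_per_1k / 0.25     # ~$0.0006 / 1K tokens
```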


Additional Qualitative Insights

In addition to the summarized results, we share some key insights that emerged during the annotation process.

  1. Commercial LLM (ChatGPT API):

    • Text Quality: The generated descriptions were coherent and contextually relevant, outlining important product information. However, outputs were generally shorter and lacked depth. They also tended to have more bullets than desired.

    • Customization: While ChatGPT API offered some level of customization through input prompts, it had certain limitations in tailoring responses precisely to specific brand voices or product categories.

    • Performance: The API throughput is limited, which can pose a problem when processing large batches. On the other hand, having a managed API is easier during setup.

  2. Open-Source LLM (OPT-1.3B Parameters):

    • Text Quality: The open-source model with 1.3B parameters delivered high-quality results, with descriptions that were coherent, engaging, and tailored to the given prompts. Some instances returned descriptions that tended to be conceptually repetitive.

    • Customization: This model exhibited a good level of customization potential. Fine-tuning it on a specific e-commerce dataset allowed us to generate product descriptions that seamlessly matched individual brand voices and unique product characteristics.

    • Performance: Despite its larger size compared to the 350M variant, the 1.3B-parameter open-source LLM showed commendable inference times, making it feasible for real-time applications. On a single consumer GPU, throughput was high, which translates to low operational costs.

  3. Open-Source LLM (OPT-350M Parameters):

    • Text Quality: The 350M-parameter open-source model produced satisfactory results, with descriptions that were mostly coherent and relevant. However, there were occasional instances of repetition, as well as minor grammatical errors or inconsistencies.

    • Customization: The model allowed for moderate customization, enabling users to fine-tune it for the e-commerce domain. However, due to its limited size, achieving highly specific customizations remained challenging.

    • Performance: The open-source LLM performed quickly, easily fitting large batch sizes on a single consumer GPU, and you can even choose to perform inference on a CPU. Compared to a commercial API, throughput is extremely high.


Conclusion

Based on the results obtained, the open-source LLM with 1.3B parameters emerged as a clear front-runner for the task of generating e-commerce product descriptions. Its superior text quality and high level of customization capability make it an excellent choice for businesses seeking tailored and engaging content.


We saw that open-source LLMs, particularly those trained on large corpora and ranked highly (check out the Open LLM Leaderboard from 🤗 Hugging Face), showcase remarkable potential in generating high-quality, customizable text for specific tasks like crafting e-commerce product descriptions. With benefits in operational costs, data privacy, and the ability to replicate training pipelines, open-source models provide an enticing alternative to commercial APIs. Embracing these open-source solutions empowers businesses with the freedom to harness cutting-edge language models, enabling them to create compelling content that resonates with their audience and reinforces their brand identity.


In upcoming posts, we'll dive into the technical work presented in this post, providing a comprehensive guide on how to power your product with a commercial API, or alternatively, how to fine-tune an open-source model for a domain-specific task.


 
 

Appendix - Additional Model Outputs

Product Title

Drsn Universal Ceiling Mount with Tray for Projector/Camera, No Hole Installation, Height Adjustable 11.4-22inches, in-Tube Cable Routing, Wall/Ceiling Mounting

OPT-1.3B Output

- UNIVERSAL MOUNTING: The tray fits most projectors and cameras, with the height adjustment from 11.4" to 22".

- HEIGHT ADJUSTMENT: The tray can be extended or retracted up to 6.5" to fit different sized equipment.

- PROJECTOR MOUNTING: The tray is made of high quality steel and has a weight capacity of 40 lbs.

- CABLE MANAGEMENT: The tray has a cable routing hole on the bottom and the extension pole is designed with a cable management hole at the top.

- USER FRIENDLY DESIGN: The tray has a removable and replaceable design that makes it easy to clean and maintain.

- PACKAGE INCLUDES: 1 x tray, 1 x installation kit, 1 x user manual, 1 x experienced tech support to help you resolve any issues.

OPT-350M Output

- UNIVERSAL MOUNTING DIMENSION: 11.4"x 12" x 3.2", Height Adjustable: 22 inches / 55 cm, Weight: 1.1 lbs (0.45 kg).

- EASY TO INSTALL: No hole installation, just screw the mounting plate to a wall or ceiling and place the projector on top of it.

- PACKAGE INCLUDES: Drsn universal projector mount with tray, instructions manual, experienced and friendly US based customer support available to assist 7 days a week.

- EASY TO CLEAN: The tray is made from durable ABS plastic, easy to clean and wipe down.

- HEIGHT ADJUSTMENT: The projector mount has a height adjustment range of 15.9 inches to 22 inches.

ChatGPT Output

- UNIVERSAL CEILING MOUNT: Compatible with projectors, cameras, and even iPhones.

- NO HOLE INSTALLATION: No need to drill holes in your ceiling or wall for installation, making it convenient and hassle-free.

- HEIGHT ADJUSTABLE: Adjustable height range from 11.4 to 22 inches to suit your specific needs and preferences.

- IN-TUBE CABLE ROUTING: Neatly route and hide cables inside the mount's tubes for a clean and organized look.

- VERSATILE MOUNTING OPTIONS: Can be installed on both ceilings and walls, providing flexibility in placement.

- STURDY AND DURABLE: Made from high-quality materials to ensure stability and long-lasting use.

- EASY INSTALLATION: Comes with all necessary installation hardware and instructions for a quick and easy setup.


Product Title

JoyReken 4-Port USB 3.0 Hub, FlyingVHUB Vertical Data USB Hub with 2 ft Extended Cable, for Mac, PC, Xbox One, PS4, PS5, iMac, Surface Pro, XPS, Laptop, Desktop, Flash Drive, Mobile HDD(Pink)

OPT-1.3B Output

- EASY TO INSTALL: This data hub comes with a USB extension cable that can be connected to your computer or laptop. You don't need to install any driver software or use complicated tools to get it working. It works out of the box right after plugging in.

- SUPER FAST DATA TRANSFERS: The USB 3.0 interface offers up to 5Gbps transfer speed which is 10 times faster than USB 2.0. It supports USB 2.0 and USB 1.1 as well. No matter what you want to do on your device, this USB hub will help you complete it quickly and smoothly.

- EXTENDED CABLE LENGTH: The extended USB cable measures 2 feet long. You can place the hub behind your monitor or other devices without worrying about its cable length. If you still find the cable too short, you can purchase another one separately.

- VERSATILE DEVICE CONNECTIONS: This data hub has four ports. Each port supports USB 3.0 connection. You can connect more than one device at the same time. The USB hub also supports USB flash drives, mobile hard drives, printers, keyboards, mice, headphones, speakers, and other peripherals.

- WIDE COMPATIBILITY: This USB hub is compatible with most devices including Windows PC/laptop, MacBook, iPhone, iPad, Android smartphone/tablet, Nintendo Switch, PlayStation 4/Pro/5, Xbox One/X, Chromebook, Chromecast, Kindle Fire tablets, etc.

OPT-350M Output

- ADVANTAGES OF THE JOYReken 4-PORT USB HUB: 1. Extends the reach of your desktop peripherals to connect up to 4 devices (up to 2x2TB) at once; 2. Plug and Play, no driver needed; 3. USB 3.0 port supports transfer speeds up to 5Gbps.

- UPGRADED TO A SAFER PLUG AND PLAY DESIGN: The vertical data hub has a secure and reliable plug and play design that requires no drivers or software installation. No more risk of losing your data because of a USB 3.0 cable getting lost or damaged.

- FLEXIBLE PLUG AND PLAY DESIGN: The USB 3.0 extension cable allows you to connect any USB 3.0 device such as flash drives, external hard drives, keyboards, mice, USB microphones, and more. The hub works with Windows 10/8/8.1/7/ Vista / XP, macOS Sierra, Chrome OS, Linux, Android, and other platforms.

- HIGH SPEED DATA TRANSMISSION: USB 3.0 provides high speed data transmission (up to 5Gbps) to provide faster data transfer rates between computers, keyboards, mouse, and external storage devices.

- PORTABLE & EASY TO USE: This USB 3.0 extension cable is designed to be lightweight and portable, making it easy to carry around your home, office, or anywhere else. The cable features a flexible, durable, and bendable design so you can easily extend its reach wherever you need it.

ChatGPT Output

- VERTICAL DESIGN: The vertical design of the JoyReken 4-Port USB 3.0 Hub allows for easy access and organization of your USB devices. It saves space on your desk and keeps your cables neat and tidy.

- EXPANDED CONNECTIVITY: With 4 USB 3.0 ports, this hub allows you to connect multiple devices such as keyboards, laptops, desktops, flash drives, and mobile HDDs to your computer or gaming console.

- HIGH-SPEED DATA TRANSFER: The USB 3.0 technology provides a data transfer rate of up to 5 Gbps, allowing you to transfer large files and stream HD videos quickly and efficiently.

- EXTENDED CABLE: The 2 ft extended cable provides flexibility in positioning the hub to suit your needs. You can easily connect it to your computer or gaming console without any restrictions.

- UNIVERSAL COMPATIBILITY: This USB hub is compatible with various devices including Mac, PC, Xbox One, PS4, PS5, iMac, Surface Pro, XPS, laptops, desktops, and more. It works with both USB and USB Type C interfaces, making it versatile and convenient to use.


Product Title

XNN Tablet 10 Inch, Android Tablets, Quad-Core Processor, 32GB ROM 128GB Expand,2+5MP Camera,IPS HD Touch Scree, 6000mah Battery, WiFi Bluetooth, 2023 Powerful Performance Tablet Computer(Black)

OPT-1.3B Output

- POWERFUL PERFORMANCE TABLET: XNN tablet is equipped with a powerful and stable high performance quad-core processor, 2GB RAM, 32GB ROM, and expandable up to 128GB via TF card. The powerful CPU allows the application to run smoothly and fast. The latest Android 11 system ensures that you can use your tablet comfortably and quickly.

- LARGE CAPACITY BATTERY: The 6000mAh battery of this 10 inch tablet provides up to 12 hours of reading, watching movies or playing games. It's perfect for long journeys, traveling, reading, surfing the web, and more.

- ANDROID 11 SYSTEM: This android tablet comes with the latest version of Android 11 system, which has been optimized for tablets and offers a smoother user experience. The system has a new design interface, new tools, and improved apps, making it easier than ever to download and install apps.

- 10 INCH TABLET WITH 5MP CAMERA: The 10 inch tablet features a 5MP rear camera and a 2MP front camera. It also has dual speakers, which can provide loud sound and make video calls clearer.

- GMS CERTIFICATION: This tablet is certified by Google Mobile Service (GMS), which ensures that the app works correctly and the system is updated regularly.

OPT-350M Output

- HIGH PERFORMANCE TABLET: The tablet is equipped with a powerful quad-core processor and Android 11.0 operating system. It can run various apps smoothly, such as Netflix, YouTube, Facebook, Twitter, etc.

- LARGE STORAGE AND LONG BATTERY LIFE: The android tablet comes with 2GB RAM, 32GB ROM, and has a 5000mAh battery capacity, which can last for about 8 hours.

- 10 INCH TOUCH SCREEN: The 10 inch tablet features a 1280*800 IPS HD display and a bright color reproduction. You can enjoy more colorful images and videos with the 10 inch tablet.

- LONG LASTING CONNECTIVITY: The wifi tablet uses 2.4G/5G dual band wireless technology to ensure fast connection speeds and stable connection. With this android tablet, you can easily connect to any network with your smartphone or tablet, such as Wifi, Bluetooth, USB, etc.

- GREAT COMPATIBILITY: This 10 inch tablet computer supports Bluetooth, 2.4G/5G dual band wireless technology, which is compatible with most Android phones and tablets, such as Google Pixel, Samsung Galaxy S8, HTC U11, iPad, etc.

ChatGPT Output

- POWERFUL PERFORMANCE: Equipped with a quad-core processor and 2GB RAM, this XNN tablet offers smooth and fast performance for multitasking, browsing the web, streaming videos, and playing games. It can handle your everyday tasks with ease.

- EXPANDABLE STORAGE: With 32GB of internal storage, you have plenty of space to store your photos, videos, and documents. If you need more storage, you can easily expand it up to 128GB using a microSD card (not included), ensuring you never run out of space.

- STUNNING VISUALS: The 10-inch IPS HD touch screen display with a resolution of 1280 x 800 pixels delivers vibrant colors and sharp details, providing an immersive viewing experience for movies, videos, and games. The screen is also great for reading ebooks and browsing the web.

- LONG-LASTING BATTERY: The 6000mAh battery ensures that you can use your tablet for up to 10 hours on a single charge, giving you the freedom to take it with you on the go without worrying about running out of power. Perfect for travel, school, or work.

- CAPTURE MEMORABLE MOMENTS: The tablet features a 5MP rear camera that allows you to capture high-quality photos and videos. Whether it's a special event or a candid moment, you can easily capture and share your memories with friends and family.

