Semantic Search and LLM¶
2 min intro¶
In the previous chapters, we explored using embeddings for semantic search and fine-tuning large language models. Now we're ready to combine these techniques in a practical and, dare I say, caffeinated way. Imagine a chatbot dedicated to troubleshooting coffee machines, those essential contraptions that keep developers fueled during marathon coding sessions. With countless coffee disasters to address, from "Why isn't my espresso shot coming out?" to "Help, my cappuccino is more like a latte!", we'll build a system that answers specific questions by retrieving relevant information from a curated set of private documents. Get ready to brew up a solution that's both strong and smart, keeping your coding energy as high as your coffee consumption!
Installing dependencies¶
The key dependencies for this project include:
sentence-transformers: offers a variety of pre-trained embedding models optimized for sentence similarity, enabling effective comparison and retrieval of text data.
chromadb: a vector database designed for efficiently storing and managing embeddings, facilitating quick lookups and scalable operations.
langchain: a library for building and managing language model workflows, integrating the various components seamlessly.
transformers: provides access to a wide range of pre-trained models and tools for NLP tasks, crucial for fine-tuning and leveraging large language models.
%%capture
!pip install --quiet chromadb langchain_community langchain-huggingface loguru pydantic sentence-transformers
!pip install --quiet unstructured_client
!pip install --quiet transformers accelerate bitsandbytes einops
!pip install --quiet gradio
Resources¶
To support our chatbot's troubleshooting capabilities, we'll be utilizing two key documents related to popular coffee machines:
Nespresso Inissia Manual: This document provides detailed instructions for the Nespresso Inissia model, including setup, operation, and maintenance procedures. It’s a valuable resource for understanding common issues and solutions specific to this machine.
Nespresso Essenza Mini User Manual: This manual covers the Nespresso Essenza Mini model, offering comprehensive guidance on its features, usage, and troubleshooting tips. It's essential for addressing problems and providing solutions for this particular machine.
These manuals will serve as the foundation for the knowledge base that our chatbot will leverage to assist users with coffee machine issues.
%%capture
!curl https://www.nespresso.com/shared_res/manuals/inissia/inissia_C_breville.pdf > sample_data/inissia_C_breville.pdf
!curl https://www.nespresso.com/shared_res/manuals/essenza-mini/2017/UM_NESPRESSO_ESSENZA_MINI_BREVILLE_PROD_WEB_2017_02_20.pdf > sample_data/essenza_mini_breville.pdf
RAG concept in a nutshell¶
Retrieval-Augmented Generation (RAG) is an architecture that combines retrieval-based and generation-based approaches to enhance natural language processing tasks. RAG integrates two main components:
Retriever: This module is responsible for fetching relevant documents from a large corpus based on the input query. It utilizes methods such as dense retrieval to identify and retrieve pertinent information that can help answer the query.
Generator: Once the relevant documents are retrieved, the generator takes these documents along with the original query and produces a coherent and contextually accurate response. The generation process involves leveraging pre-trained language models to synthesize information and generate human-like text.
In essence, RAG combines the strengths of both retrieval and generation to produce more accurate and contextually informed answers, enhancing the overall performance of language understanding and generation tasks.
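To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop. The function and variable names are illustrative placeholders; the rest of this chapter builds the real pipeline with LangChain components.
def answer_with_rag(query, retriever, llm):
    # 1. Retrieval: fetch the document chunks most relevant to the query.
    context_docs = retriever.invoke(query)
    context = "\n\n".join(doc.page_content for doc in context_docs)
    # 2. Generation: ask the language model to answer using only that context.
    prompt = f"Answer the question using this context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt)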
Data extraction and storage¶
Let's use the FREE Unstructured service¶
import os, json
import unstructured_client
from unstructured_client.models import operations, shared
from typing import List
from google.colab import userdata
elements = []
# Set to True to partition the PDFs through the hosted Unstructured API (requires an UNSTRUCTURED_API_KEY secret).
USE_UNSTRUCTURED_API = False
if USE_UNSTRUCTURED_API:
client = unstructured_client.UnstructuredClient(
api_key_auth=userdata.get("UNSTRUCTURED_API_KEY"),
server_url="https://api.unstructuredapp.io/general/v0/general"
)
for filename in ["sample_data/inissia_C_breville.pdf", "sample_data/essenza_mini_breville.pdf"]:
        # Loop through the downloaded PDFs and send each one to the Unstructured API for partitioning
with open(filename, "rb") as f:
data = f.read()
req = operations.PartitionRequest(
partition_parameters=shared.PartitionParameters(
files=shared.Files(
content=data,
file_name=filename,
),
# --- Other partition parameters ---
# Note: Defining `strategy`, `chunking_strategy`, and `output_format`
# parameters as strings is accepted, but will not pass strict type checking. It is
# advised to use the defined enum classes as shown below.
strategy=shared.Strategy.HI_RES,
languages=['eng'],
coordinates=True,
chunking_strategy=shared.ChunkingStrategy.BY_TITLE,
max_characters=1024,
# --- PDF partition parameters ---
split_pdf_page=True, # If True, splits the PDF file into smaller chunks of pages.
split_pdf_allow_failed=True, # If True, the partitioning continues even if some pages fail.
split_pdf_concurrency_level=15 # Set the number of concurrent request to the maximum value: 15.
),
)
try:
res = client.general.partition(request=req)
element_dicts = [element for element in res.elements]
json_elements = json.dumps(element_dicts, indent=2)
# Split filename in path, filename and extension
filename_split = os.path.splitext(filename)
filename_no_ext = filename_split[0].split('/')[-1]
            # Write the processed data to a local file with the same name but a .json extension
with open(f"/content/drive/MyDrive/{filename_no_ext}.json", "w") as file:
file.write(json_elements)
with open(f"{filename_no_ext}.json", "w") as file:
file.write(json_elements)
elements.extend(element_dicts)
except Exception as e:
print(e)
with open(f"all_elements.json", "w") as file:
file.write(json.dumps(elements, indent=2))
f"{len(elements)} chunks have been retrieved for those 2 documents"
At the time this notebook was written, the Unstructured service returned only a few chunks for these documents. Let's use a strategy based on PyMuPDF instead.
Alternate strategy¶
!pip install --quiet pymupdf
from typing import List
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
def process_docs(docs: List[str]) -> List[Document]:
    '''
    Consume a list of file names and apply preprocessing to get LangChain document chunks.
    '''
# preparing the doc splitter
text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=32)
# preparing the chunks
chunked_documents = list()
# reading file one by one
for doc in docs:
# loading the file with langchain loader
doc_loader = PyMuPDFLoader(doc)
# splitting the document in chunks
chunks = doc_loader.load_and_split(text_splitter)
        # adding these chunks to the returned list
chunked_documents.extend(chunks)
return chunked_documents
elements = process_docs(['sample_data/inissia_C_breville.pdf', 'sample_data/essenza_mini_breville.pdf'])
f"{len(elements)} chunks have been retrieved for those 2 documents"
Wrap the extracted elements into a list of LangChain documents¶
documents = []
# With the PyMuPDF strategy, the elements are already LangChain Documents;
# this loop simply re-wraps their content and metadata into a fresh list.
for element in elements:
    metadata = element.metadata
    documents.append(Document(page_content=element.page_content, metadata=metadata))
Create an embedding¶
from tqdm.autonotebook import tqdm, trange
from langchain_huggingface import HuggingFaceEmbeddings

# all-MiniLM-L6-v2 is a small, fast sentence-transformers model that maps text to 384-dimensional vectors.
embeddings_model_name = "all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
Embed elements and store them in ChromaDB store¶
from langchain.vectorstores import utils as chromautils
from langchain_community.vectorstores import Chroma
# ChromaDB doesn't support complex metadata, e.g. lists, so we drop it here.
# If you're using a different vector store, you may not need to do this
docs = chromautils.filter_complex_metadata(documents)
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="./db_1")
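As a sanity check (a minimal sketch; the query string is just an example), we can query the store directly before wiring it into a retriever:
# Return the 2 chunks whose embeddings are closest to the example query.
for doc in vectorstore.similarity_search("machine is leaking water", k=2):
    print(doc.metadata.get("page"), doc.page_content[:120])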
Define a retriever¶
#retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
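# MMR (maximal marginal relevance) fetches fetch_k candidates by similarity, then keeps the k chunks that are both relevant and mutually diverse.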
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3, "fetch_k": 5})
for k, v in vectorstore.get().items():
print(k, v)
# query the retriever with simple input
query = "cold coffee"
retrieved_docs = retriever.invoke(input=query)
# print the chunk matching the query
for i, doc in enumerate(retrieved_docs):
print('#'*30)
print(f'\n<<{i}>> on page {doc.metadata["page"]}: \n{doc.page_content}')
Define & Run the Chatbot¶
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
Define the LLM to be used¶
# Optional: flash attention can speed up inference on supported GPUs (the PyPI package is flash-attn).
!pip install --quiet flash-attn
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace, HuggingFacePipeline
model_name = "microsoft/Phi-3-mini-4k-instruct"
#model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
# model_name = "microsoft/Phi-3.5-mini-instruct"
# task shall be one of:
# - "text-generation",
# - "text2text-generation",
# - "summarization",
# - "translation"
# This class uses serverless API (hosting by HF)
# llm = HuggingFaceEndpoint(
# repo_id=model_name,
# task="text-generation",
# max_new_tokens=512,
# do_sample=False,
# repetition_penalty=1.03,
# )
device = 0 if torch.cuda.is_available() else -1
#
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
# Quantization is disabled here; reuse the BitsAndBytesConfig above if you want to load the model in 4-bit.
bnb_config = None
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Build the text-generation pipeline (renamed to avoid shadowing the imported `pipeline` function).
text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for top_k/temperature to take effect
    top_k=50,
    temperature=0.1
)
llm = HuggingFacePipeline(pipeline=text_gen_pipeline)
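# ChatHuggingFace wraps the pipeline so that chat messages are rendered with the model's chat template before generation.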
chat = ChatHuggingFace(llm=llm, verbose=True)
#del llm
import gc
gc.collect()
# clear GPU memory
import torch
torch.cuda.empty_cache()
Define the prompt¶
from langchain_core.globals import set_debug
set_debug(False)
# "You are a troubleshooting chatbot that talks like a pirate."
system_prompt = (
    "You are a troubleshooting chatbot.\n"
    "Your goal is to help the user solve the problem they have with their appliance.\n"
    "Use the following pieces of retrieved context to answer the question.\n"
    "If you don't know the answer, say that you don't know.\n"
    "Use three sentences maximum and keep the answer concise.\n"
    "\n\n"
    "{context}\n"
)
prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("human", "{input}"),
]
)
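# create_stuff_documents_chain "stuffs" the retrieved chunks into the {context} slot of the prompt;
# create_retrieval_chain wires the retriever output into that question-answering chain.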
question_answer_chain = create_stuff_documents_chain(chat, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
import os
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token
response = rag_chain.invoke({"input": "I have an Inissia coffee machine and my coffee is cold. What should I do?"})
response["answer"]
import gradio as gr
def add_message(history, message):
    # Only the text part of the multimodal input is used; uploaded files are ignored in this demo.
    if message["text"] is not None:
        history.append((message["text"], None))
    return history, gr.MultimodalTextbox(value=None, interactive=True)
import random
color_map = {
"harmful": "crimson",
"neutral": "gray",
"beneficial": "green",
}
def html_src(harm_level):
return f"""
<div style="display: flex; gap: 5px;">
<div style="background-color: {color_map[harm_level]}; padding: 2px; border-radius: 5px;">
{harm_level}
</div>
</div>
"""
def fake_bot_response(history):
response_type = random.choice(["text", "gallery", "image", "video", "audio", "html"])
if response_type == "gallery":
history[-1][1] = gr.Gallery(
[
"https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png",
"https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png",
]
)
elif response_type == "image":
history[-1][1] = gr.Image(
"https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png"
)
elif response_type == "video":
history[-1][1] = gr.Video(
"https://github.com/gradio-app/gradio/raw/main/demo/video_component/files/world.mp4"
)
elif response_type == "audio":
history[-1][1] = gr.Audio(
"https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav"
)
elif response_type == "html":
history[-1][1] = gr.HTML(
html_src(random.choice(["harmful", "neutral", "beneficial"]))
)
else:
history[-1][1] = "Cool!"
return history
def llm_response(history):
response = rag_chain.invoke({"input": history[-1][0]})
history[-1][1] = response["answer"]
return history
bot_response = llm_response
with gr.Blocks() as demo:
chatbot = gr.Chatbot(
[[None, "Hi, I'm your assistant. Ask me anything or upload a product manual to start."]],
elem_id="chatbot",
bubble_full_width=False,
)
with gr.Row():
chat_input = gr.MultimodalTextbox(
scale=4,
interactive=True,
placeholder="Enter message or upload file...",
show_label=False
)
clear_button = gr.ClearButton([chat_input, chatbot], value="Clear chat")
chat_msg = chat_input.submit(add_message, [chatbot, chat_input], [chatbot, chat_input])
bot_msg = chat_msg.then(bot_response, [chatbot], chatbot)
bot_msg.then(lambda: gr.MultimodalTextbox(interactive=True), None, [chat_input])
demo.launch(share=True)