
AI ASSISTANCE WITH QNA MODELS

Joe Esteves

3 weeks ago

53 views

Professional

English
#artificialintelligence #engineering #plaudere


Creating an artificial intelligence (AI) solution is a challenging task. It requires a clear understanding of the problem you want to solve and of how AI can help solve it. For instance, users of the Plaudere website need to learn how to find and use its features, including reading publications from other users. To improve user experience and productivity, a relevant business case for Plaudere is helping users navigate and use the website effectively: guiding them through its different sections and offering insights based on user-generated publications. However, as server power is limited, the goal is not to deploy a highly sophisticated generative AI solution, but rather to use available resources that work within our current infrastructure.

The Rise of Generative AI

The sudden appearance of generative artificial intelligence, with tools like ChatGPT, was a major breakthrough across all industries. For the first time, the potential of artificial intelligence as a tool to empower people was clearly demonstrated. The years since generative AI models such as ChatGPT, Gemini, Copilot, and DeepSeek emerged have been years of discovery, research, and testing for developers around the world, and these tools are challenging the way applications were built before. Organisations began developing solutions, often on top of existing Large Language Models (LLMs) from well-known providers who offer them as software as a service (SaaS). The vast knowledge of an LLM is combined with a specific business context to create focused AI assistants. These assistants can then connect to data workflows and business applications, empowering business and technical users alongside consumers and clients.

Creating software will not be the same now that generative AI has arrived. It is very likely that, in the future, all applications will include layers of AI to help users get more from their software. For this reason, at Plaudere, once the interface for creating spaces and posts was complete and streaming features had been added as an in-house experiment to build a custom streaming experience within our infrastructure limits (article link here), the next challenge was to add an AI layer to the website. This involved several experiments exploring available resources, particularly open-source ones.

The Challenge of Building Your Own LLM

At Plaudere, our first experiment involved understanding how an LLM is created and the resources this type of software requires. Imagine you have a huge amount of text written by people in a specific language. This collection of text, called a corpus, is fed into a program that breaks the text into words or their basic parts. The program then pays close attention to how these words relate to each other, producing vectors of these relationships, which show how close different words are in a complex conceptual space, where each dimension represents a specific aspect of their meaning. These dimensions effectively represent concepts: a concept provides context and explains why, for instance, two words belong together under one concept yet have no relationship under another.
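As a toy illustration of this idea (and nothing like how production models are actually trained), the sketch below builds word vectors from raw co-occurrence counts over a tiny invented corpus and compares them with cosine similarity; the corpus, window size, and word choices are made up for the example.

```python
from collections import Counter, defaultdict
import math

# A tiny invented corpus; real models train on billions of words.
corpus = (
    "users read posts on the website . "
    "users write posts on the website . "
    "the model reads text and answers questions . "
    "the model answers questions about posts ."
)

tokens = corpus.split()
window = 2  # how many neighbouring tokens count as "context"

# Count how often each word appears near each other word.
cooc = defaultdict(Counter)
for i, word in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if i != j:
            cooc[word][tokens[j]] += 1

vocab = sorted(set(tokens))

def vector(word):
    """A word's 'meaning' as its co-occurrence counts over the vocabulary."""
    return [cooc[word][w] for w in vocab]

def cosine(a, b):
    """How close two word vectors point in the shared conceptual space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "read" and "write" occur in near-identical contexts, so their vectors
# end up far more similar than those of "read" and "questions".
similar = cosine(vector("read"), vector("write"))
dissimilar = cosine(vector("read"), vector("questions"))
```

Real embeddings are learned, dense, and vastly higher-dimensional, but the intuition is the same: words that appear in similar contexts end up close together in the vector space.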

Training large, foundational transformer models (an AI architecture particularly good at identifying complex relationships between words) from scratch typically happens on powerful servers, and the process can take days or even weeks. Training stops when the model reaches a specific performance target, often measured by its validation performance or accuracy. The trained model is then transferred to a responding system, allowing you to use a held-out part of your original corpus to test it, asking questions and getting answers.
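The stopping criterion described above can be sketched as a simple early-stopping loop. The validation scores below are simulated numbers invented for the example, not measurements from a real training run.

```python
def train_until_target(val_scores, target=0.9, patience=3):
    """Stop at the first epoch whose validation score reaches `target`,
    or once the score has not improved for `patience` epochs in a row."""
    best, stale = float("-inf"), 0
    for epoch, score in enumerate(val_scores, start=1):
        if score >= target:
            return epoch, score  # performance target reached
        if score > best:
            best, stale = score, 0  # still improving
        else:
            stale += 1
            if stale >= patience:
                return epoch, best  # plateaued: stop early
    return len(val_scores), best

# Simulated validation accuracy after each training epoch.
history = [0.52, 0.61, 0.68, 0.74, 0.79, 0.83, 0.86, 0.88, 0.91]
epoch, accuracy = train_until_target(history)
```

With this history the loop stops at the ninth epoch, the first one whose validation accuracy meets the target.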

If you try to do this yourself on a typical personal computer, even running it for a day or so, your model's responses will likely repeat words or produce nonsense sentences. You could improve this by feeding it a larger corpus, parallelising the training across multiple computers in a network, and increasing the training hours. After some research, it is clear that the amount of time and computing power needed to get an LLM to perform well is simply out of reach for most developers on a limited budget.

Using Existing LLMs: A New Path

Since creating an LLM from scratch is out of reach for most of us, a new objective would be to take an existing LLM and install it on a server or local computer so it can run and serve your users. If you explore the current landscape of open-source AI models, you can indeed download and run an LLM, or even a tiny Language Model (LM), from platforms like Hugging Face, such as GPT-2 or Facebook's BlenderBot, among others. However, as expected, these are not lightweight. Even if optimised or quantised (made smaller for efficiency), and even if they fit your local infrastructure, they still need a lot of resources to load and run. There can also be many incompatibilities, for instance between models built for Python and those in ONNX (Open Neural Network Exchange) format. Adapting a model to your infrastructure, making sure it fits your available resources and runs within their limits, requires a great deal of patience.

Considering Plaudere's challenge of limited server power, using even the tiniest LLM available can pose a challenge both in terms of the model's answer quality and the significant GPU and RAM resources needed to run it within Node.js. Therefore, it became clear that it would make more sense to separate the LLM infrastructure from the current website, making it a type of API (Application Programming Interface) that could be called as an individual endpoint to offer the LLM service. However, since the objective is to keep the website as streamlined as possible for now, these types of experiments were set aside, and other solutions were considered.

Older AI Models: A Practical Approach for Plaudere

Looking back at earlier Artificial Intelligence (AI), before the rise of generative AI models, there were widely used models based on Natural Language Processing (NLP). These models used neural networks to understand word relationships. They did not use the advanced transformer architectures we have today, but they offered decent solutions. Other models, such as classifiers, can recognise sentences with specific tags; such a model then attempts to read new texts and assign them to existing tags.

At Plaudere, various experiments with non-generative AI models were conducted to see if they could meet our initial goal: use the existing server capabilities and provide reasonable performance when answering user requests about the website and its posts. One model was particularly interesting to explore. It is based on DistilBert (repository link here), pre-trained in English, 'uncased' (meaning it ignores capital letters) and quantised (made smaller for efficiency) for optimisation. This model can find an answer within a given context text, predicting where the answer starts and stops and extracting the relevant part of the text. This approach became the foundation of Plaudere's first AI-powered website module: a question-and-answer (QnA) model, pre-prompted with content from the website to help users navigate it.

  • DistilBert is a smaller, faster, and more efficient version of the powerful BERT transformer model, designed to perform similar language understanding tasks with fewer computational resources.
  • BERT (Bidirectional Encoder Representations from Transformers) is a groundbreaking AI model developed by Google. It processes text by understanding the context of words from both directions (left-to-right and right-to-left simultaneously), allowing it to grasp the full meaning of a sentence much more effectively than previous models.
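To make the extractive idea concrete without downloading the real model, here is a toy stand-in: instead of DistilBert's learned start/end token predictions, it simply returns the context sentence that shares the most words with the question. The example context, question, and stop-word list are invented for illustration.

```python
import re

def extract_answer(question, context):
    """Toy stand-in for extractive QnA: pick the context sentence sharing
    the most words with the question. A real DistilBert QnA model instead
    predicts the exact start and end tokens of the answer span."""
    stop = {"the", "a", "an", "is", "are", "what", "which", "who",
            "how", "do", "does", "of", "in", "on"}
    q_words = set(re.findall(r"\w+", question.lower())) - stop
    # Split the context into sentences on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    def score(sentence):
        words = set(re.findall(r"\w+", sentence.lower()))
        return len(words & q_words)
    return max(sentences, key=score)

# Invented example context, standing in for real website content.
context = ("Plaudere lets users create spaces and posts. "
           "The QnA module answers questions about the website. "
           "It is powered by a quantised DistilBert model.")
answer = extract_answer("Which quantised model is used?", context)
```

The real model scores candidate spans inside sentences rather than whole sentences, but the principle is the same: the answer is extracted from the supplied context, not generated from scratch.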

As the website also supports Spanish, another DistilBert model, pre-trained in Spanish and also uncased, was found (repository link here). However, for use with the Xenova Transformers package (repository link here), which loads AI models from Hugging Face, this Spanish model needed to be quantised and in ONNX format. It was therefore quantised and converted to ONNX using Python scripts, and then published to Hugging Face (repository link here) so that other developers can use it.

If you give a QnA model a large context text, more than about 500 words, it can start losing accuracy. For this reason, the npm package Fuse (repository link here) was used: it pre-filters the context text with search algorithms, increasing the efficiency of the QnA model.
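Fuse itself is a JavaScript package, so the following is only a language-neutral sketch of the pre-filtering idea, not Fuse's actual API: score each context sentence against the query, keep the few most relevant ones, and hand that shortened context to the QnA model. The example context and query are invented.

```python
import re

def prefilter_context(query, context, top_k=3):
    """Fuse-style pre-filtering (illustrative stand-in, not the Fuse API):
    keep only the `top_k` context sentences most relevant to the query, so
    the QnA model receives a context well under its effective word limit."""
    q_words = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    # Rank sentences by how many query words they contain.
    ranked = sorted(
        sentences,
        key=lambda s: len(set(re.findall(r"\w+", s.lower())) & q_words),
        reverse=True,
    )
    kept = ranked[:top_k]
    # Preserve the original sentence order so the context stays readable.
    return " ".join(s for s in sentences if s in kept)

# Invented long context; only the relevant sentences should survive.
long_context = ("Plaudere is a publishing website. Users write posts in spaces. "
                "The weather service is unrelated filler. "
                "Posts can be written in English or Spanish. "
                "The QnA assistant answers questions about posts.")
short_context = prefilter_context(
    "What languages can posts be written in?", long_context, top_k=2)
```

Fuse additionally does fuzzy (typo-tolerant) matching and configurable field weighting, which a simple word-overlap score like this does not capture.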

It is clear that this combined approach, using pre-filtering search and a QnA model, can be greatly improved. However, for practical purposes, it offers decent answers, even with some delays in responding. This functionality is quite beneficial: it helps users avoid needing prior experience with the website, allowing them to start using the website faster. Furthermore, it assists them in understanding posts from writers on Plaudere without reading every line of the publication.

Conclusion

This project is in its early stages but demonstrates the potential of lightweight AI models to enhance user experience with limited infrastructure. Testing in local and production environments has shown promising results, and ongoing improvements will continue to leverage the AI revolution. Have you begun experimenting with AI models like these? Just begin and keep improving.

Joe Esteves

Plaudere © 2025
