Leveraging Generative AI for Internal Excellence: A 24-hour Hackathon Experience

Discover more about our colleagues' experience participating in ShipIT, a 24-hour hackathon at Accesa, which offered an opportunity to leverage generative AI for internal solutions.

Philipp Farnschlader (AI Engineer) and Cristian Chiriac (Frontend Software Engineer) have crafted this article, which was previously featured in the April 2024 issue of Today Software Magazine.


In a world of constant change and innovation, hackathons offer an extraordinary opportunity to test skills, collaborate with colleagues, and build something new. Recently, we had the privilege of participating in ShipIT, a 24-hour hackathon at Accesa, this time focused on implementing internal solutions based on the possibilities that generative AI brings to the table. Specifically, our team set out to develop a chatbot that centralises knowledge throughout the company and retrieves relevant information for all Accesa-internal questions using a technique called retrieval-augmented generation (RAG) and the OpenAI API.

But let’s start at the beginning. From our daily work and discussions with colleagues, we know that finding relevant documents and information within the company often creates significant overhead. To solve this, we envisioned a solution where employees can easily ask for relevant information stored within the company’s policies. Say, for example, you would like to know whether you are allowed to bring your dog to work. This information certainly resides somewhere, but finding it would take considerable time, or a ping to a coworker whom you might not even have identified yet. Also think of questions about specific processes and their stages, information buried deep in some company policy, or who to align with on a given topic. This information exists, if only you knew where to look for it. With a chatbot at hand, retrieving relevant information becomes much faster, especially when the chatbot can reference the documents from which it gets its content.

In the following, we will give you a brief overview of the solution architecture, followed by a more in-depth look into the front and backend technologies used in our project and how they contributed to creating an efficient and user-friendly solution. Finally, we round off with a short elaboration on our team’s hackathon experience and our solution’s success.


Accesa GPT 1.1 Team; From left to right: Vlad Mihalea, Diana Talos, Irina Fizesan, Philipp Farnschlader, Simona Lucut, Alpar Kocsis, Mihai Gherasim, Cristian Chiriac

Architectural Overview

We decided to use a generative AI technique called retrieval-augmented generation (RAG) to accomplish our goals. In essence, RAG systems can hand new knowledge to a large language model (LLM) without retraining or fine-tuning it. Instead of changing anything about the LLM, the idea is to pass it relevant context that contains the knowledge it needs to formulate an appropriate answer.

The diagram shows the application’s architecture. The user flow starts at the frontend, a simple chat interface built with Material Design components and styled to match the company’s internal communications page. When a user submits a question in the frontend, it is handed over to a Python backend. The backend first checks whether this is the first query or whether there has been a previous interaction, so that it can retain the conversation history. Next, the user query is used to check for existing knowledge within a vector database based on similarity. If relevant knowledge is found, the user query is handed to the LLM (GPT-4) together with the previous chat history and the retrieved knowledge. The LLM’s response is then sent back to the frontend, which displays it in a chat-like fashion. Next, we will look in more depth at what exactly happens technology-wise and why we chose this tech stack.


Created by the authors: AI Chatbot Architecture Overview

Chatbot Interface

In our attempt to develop an efficient application that meets users’ needs, we chose UI technologies that promise a good user experience, focusing mainly on simplicity, familiarity of the design, and reusability for potential feature expansion later on. First in our choice was Next.js, renowned for its server-side rendering and static site generation. With its robust capabilities, Next.js lays a well-built foundation for efficient web applications. Teamed up with React, the industry-leading user interface library, we crafted reusable components, ensuring an intuitive user journey. To align our chatbot’s design with our internal communications ecosystem, we turned to Material-UI for its rich collection of built-in components and easy adaptability to the aesthetics of our existing company tool landscape. We used Tailwind CSS, a useful and flexible CSS framework, to add a bit of personality to our application; Tailwind let us create an attractive visual appearance using predefined classes for quick and efficient styling. Finally, to add an extra layer of interaction to the user experience, we incorporated Typed.js, a dynamic JavaScript library that injects real-time text animations. We chose this library to give users a pleasant experience of communicating with our chatbot and to make the interaction feel as realistic as possible, simulating the presence of a real person behind the screen willing to help.

Chatbot Backend

The selection of Python as the primary backend language was motivated by its ease of learning and robust support for AI development, particularly through libraries like LangChain, which simplifies the integration with OpenAI's Large Language Models (LLMs). This choice was crucial in developing an AI-powered application capable of extracting and utilising company-specific information from internal documents.

To ground the AI’s answers in facts and data specific to Accesa, the first step was to design a pipeline that takes in real documents and transforms them into a usable format. We started out with documents from SharePoint, which we stored in our backend to prove the concept; a next increment would be to integrate SharePoint directly to gain access to all kinds of data. These documents were parsed and divided into fixed-length chunks with some overlap to maintain semantic continuity. Each chunk was annotated with the document’s name and the URL of its original storage location, which later helps track the source of the information used by the LLM.
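The chunking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the team’s actual code: the chunk size, overlap, and field names are our assumptions.

```python
# Minimal sketch of the ingestion chunking step: split a document's text into
# fixed-length, overlapping chunks and annotate each with its source metadata.
# Chunk size, overlap, and field names are illustrative, not the actual values.

def chunk_document(text, doc_name, url, chunk_size=1000, overlap=200):
    """Split `text` into overlapping chunks tagged with document name and URL."""
    chunks = []
    step = chunk_size - overlap  # advance less than a full chunk to create overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append({"text": piece, "source": doc_name, "url": url})
    return chunks

parts = chunk_document("A" * 2500, "pet-policy.pdf", "https://example.org/pet-policy")
```

The overlap ensures that a sentence cut at a chunk boundary still appears whole in the neighbouring chunk, which is what preserves the semantic continuity mentioned above.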

The chunks are then converted into vectors by calling OpenAI’s embeddings API, which takes any string of words (i.e., a chunk) and embeds its meaning into a high-dimensional numerical vector. Each vector is then stored in a vector database, for which we used Meta’s open-source FAISS library. This vector database now holds the vectors corresponding to all chunks and can be queried by similarity.
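To make the retrieval idea concrete, here is a toy stand-in for the vector store. Plain Python cosine similarity replaces the OpenAI embeddings and FAISS used in the real system, and all vectors and texts are invented for illustration.

```python
import math

# Toy stand-in for the vector-store flow: keep (vector, chunk) pairs and return
# the chunks whose vectors are most similar to a query vector. In the real
# system the vectors come from OpenAI's embeddings API and FAISS does the search.

class TinyVectorStore:
    def __init__(self):
        self.entries = []  # list of (vector, chunk) pairs

    def add(self, vector, chunk):
        self.entries.append((vector, chunk))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query, k=2):
        # Sort stored entries by similarity to the query, highest first.
        scored = sorted(self.entries, key=lambda e: self._cosine(query, e[0]), reverse=True)
        return [chunk for _, chunk in scored[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "Dogs are allowed on Fridays.")
store.add([0.0, 1.0], "Parking is free for employees.")
top = store.search([0.9, 0.1], k=1)  # a query vector close to the first entry
```

The same interface, add vectors once, then search by similarity per query, is what the FAISS index provides at scale.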

When a user question comes in through the frontend, it is received by the backend and transformed into a vector representation, once more using OpenAI’s embeddings API. This vector is then compared to all the vectors in the vector database using cosine similarity via FAISS. The idea is to retrieve exactly those documents that are most similar to the input question. If vectors are sufficiently similar to the user query, the corresponding chunks are handed to the LLM as context.
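The "sufficiently similar" filter can be sketched as a simple threshold over the similarity scores; the scores and the cutoff value below are invented for illustration.

```python
# Sketch of the retrieval-threshold step: only chunks whose similarity to the
# query clears a cutoff are joined into the context handed to the LLM.
# The 0.75 threshold and the scores are illustrative, not the team's values.

def build_context(scored_chunks, threshold=0.75):
    """Keep chunks scored at or above `threshold` and join them into one string."""
    relevant = [text for score, text in scored_chunks if score >= threshold]
    return "\n\n".join(relevant)

scored = [
    (0.91, "Dogs are allowed in the office on Fridays."),
    (0.62, "Parking is free for employees."),
]
context = build_context(scored)  # only the first chunk clears the threshold
```

Choosing the cutoff is a trade-off: too low and irrelevant chunks dilute the prompt, too high and the LLM may receive no context at all.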

Alongside the context and the input query, the prompt also contains a system message, which plays a critical role in shaping the LLM’s response: it guides the model on how to answer and format its output, and explicitly instructs it to base its responses on the provided context. Crafting this message, or “prompt engineering”, emerged as a crucial aspect, with the team discovering that overly restrictive instructions could hinder the LLM’s ability to provide answers.

The system message also includes directives for the LLM to cite the sources it used from the context in a specific JSON format, so that the LLM’s answer can be sent straight to the frontend, which parses the JSON and displays the referenced PDFs accordingly. To achieve this, we used a technique called one-shot prompting, so that the LLM understands the format we would like to obtain.
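A one-shot prompt of this kind can be sketched as follows: the system message embeds a single worked example in the desired JSON shape. The field names and wording are our assumptions, not the team’s actual schema.

```python
import json

# Sketch of a one-shot system message: one example response shows the LLM the
# exact JSON shape we expect (an answer plus cited sources). The schema and
# example content are illustrative.

EXAMPLE_ANSWER = json.dumps({
    "answer": "Dogs are allowed in the office on Fridays.",
    "sources": [{"document": "pet-policy.pdf", "url": "https://example.org/pet-policy"}],
})

SYSTEM_MESSAGE = (
    "Answer only from the provided context. "
    "Respond with JSON containing an 'answer' string and a 'sources' list.\n"
    "Example question: Can I bring my dog to work?\n"
    f"Example response: {EXAMPLE_ANSWER}"
)

# The frontend can parse the model's reply with the same schema:
parsed = json.loads(EXAMPLE_ANSWER)
```

Because the example is a valid instance of the schema, the frontend can parse the model’s reply directly, at the cost of spending extra tokens on the example in every request.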

This method showed that while the LLM could adequately construct the JSON and accurately reference the sources it used, there was a trade-off in terms of token usage efficiency. Nonetheless, this approach was preferred for its simplicity and effectiveness in implementation.

Finally, the integration of LangChain allowed for the seamless assembly of the user’s query, the relevant context, and the system message into a single prompt. The LLM’s response, based on this prompt, was then delivered to the frontend. To support continuous conversation, each interaction, comprising the user’s query, the prompt, and the LLM’s response, is also saved in the backend. This chat history ensures that the context of previous interactions is preserved, enhancing the user’s experience with relevant and coherent follow-up responses.
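The assembly and history-keeping steps can be sketched without LangChain as a plain message list in the OpenAI chat format; the helper name and message wording below are illustrative.

```python
# Sketch of prompt assembly with chat history: the system message, prior turns,
# and the current question with its retrieved context form one message list.
# (The real project used LangChain for this; names here are illustrative.)

def build_messages(system_message, history, context, question):
    """Assemble the full message list sent to the chat model."""
    messages = [{"role": "system", "content": system_message}]
    messages.extend(history)  # prior turns keep the conversation coherent
    messages.append(
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    )
    return messages

history = []
msgs = build_messages(
    "Answer from the provided context only.",
    history,
    "Dogs are allowed in the office on Fridays.",
    "Can I bring my dog?",
)
# After the LLM replies, persist the turn so follow-up questions stay in context:
history.append(msgs[-1])
history.append({"role": "assistant", "content": "Yes, on Fridays."})
```

On the next question, the two saved turns are passed back in as `history`, which is how follow-ups such as "and what about cats?" remain interpretable.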


Accesa GPT 1.1 Team; From left to right: Cristian Chiriac, Diana Talos, Irina Fizesan, Alpar Kocsis, Simona Lucut, Philipp Farnschlader, Mihai Gherasim, Vlad Mihalea

The Success of a Common Vision

After close to 24 hours of intense collaboration, we successfully demoed our chatbot live to the jury and demonstrated the functioning of our solution based on multiple example queries. Our system answered the questions correctly, while accurately referencing the PDFs it used to generate its answers.

We also tested whether the chatbot would refuse to answer inputs that are not grounded in the provided context, which it did successfully. If one thing needs to be improved, it is latency: a production solution should look into hosting faster LLMs locally, as we used the relatively slow GPT-4 API.

Overall, the hackathon experience was intense, but the active and conscious involvement of the entire team made it unforgettable. This is not least due to the great organisation of the event itself, with clearly communicated goals, guidelines, timelines, and expectations. Across the board, within our team and, from what we’ve gathered, all other teams, the vibe was productive, eager, and, most importantly, a lot of fun. The demo of all the developed solutions was probably the most exciting part; seeing what our colleagues had built in such a short period of time and how they used generative AI for different solutions was simply outstanding.

With that said, on behalf of our team, thanks a lot to the organisers of this highly educational hackathon for the exposure to state-of-the-art technology, and to all the other teams for the amazing solutions they built in such a short time.