10/26/2025
Built at Gator Hack IV
What is the problem you are trying to solve? Who does it affect?
Imagine this: you're on vacation in a whole new country. You've got one afternoon left before heading back home, back to work. Having visited all of your planned spots, you're not sure what to do with the remaining time. You don't want to stay in the hotel room; that's boring!
Over your stay, you may be enthralled by different artworks, buildings, or dishes without knowing exactly what you're looking at. Seeing a particular landmark or painting in person, you may wish you knew more about it, or wish you could find more marvelous works like it.
Recently, you came across a piece of art unlike anything you've seen before, and there's not much information about it. You want to learn more about the artist and this unique style, but you're not sure where to start. If only there were a tool to help.
These are all issues that TravelBuddy AI seeks to solve.
What is your idea? How does it fix the problem?
Introducing TravelBuddy AI, an app that lets you take or upload a picture of artwork, food, or architecture and have it analyzed. TravelBuddy AI then gives you a brief description of what it is, along with a list of suggestions for similar items.
If you saw a beautiful piece of artwork earlier in the trip and snagged a photo, you can upload it to TravelBuddy AI and get suggestions for undiscovered gems that might interest you. Similarly, if you just had the best pasta of your life, TravelBuddy AI can recommend similar restaurants. If you passed by a breathtaking monument, TravelBuddy AI will find you other landmarks just as wonderful. No more wasted afternoons doomscrolling in a hotel room. TravelBuddy AI will find interesting things for you to do on that last afternoon before the magical vacation ends.
If you come across a piece of art foreign to your knowledge, TravelBuddy AI will help. It can analyze the piece and identify the artist, history, and style, and you can then use the suggestions it provides to learn more about the artist, the style, and more. This is especially useful because TravelBuddy AI can find details hard to spot with the naked eye, like the types of paint used. Exploring art has never been easier.
How do all the pieces fit together? Does your frontend make requests to your backend? Where does your database fit in?
The front end is built using HTML, CSS, and JavaScript. The backend is built primarily in Python with Flask. We use several external tools to bring our dream to reality: the Google Cloud Vision API, the Google Generative AI API, and the Google Places API, all tied together through a REST API. These tools work together to help spread art appreciation.
On a website hosted locally, the front end sends a photo or file to the Google Cloud Vision API, which determines what it is looking at: for example, sushi. This information is then passed to the back end, where details about the subject are found using Google's Gemini LLM. We also use Gemini to generate suggestions. For artwork, we base suggestions on the work's creator, era, style, and more; for food, on the cuisine, restaurant location, and ingredients; for architecture, on the building's era, style, purpose, and location. We use carefully engineered prompts to find the best recommendations possible, designed to get responses within 46 seconds on average.
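As a sketch of how the category-specific suggestion prompts described above might be assembled, here is a minimal helper. The function name, dictionary, and exact wording are our illustration, not the project's actual prompts; the criteria mirror the ones listed above.

```python
# Hypothetical sketch: build a category-specific suggestion prompt
# from the Vision API's label (e.g. "sushi"). The criteria per
# category mirror those described in the writeup.

CRITERIA = {
    "artwork": "creator, era, and style",
    "food": "cuisine, restaurant location, and ingredients",
    "architecture": "era, style, purpose, and location",
}

def build_suggestion_prompt(category: str, label: str, count: int = 5) -> str:
    """Return a prompt asking Gemini for `count` similar items."""
    criteria = CRITERIA[category]
    return (
        f"The subject has been identified as: {label}.\n"
        f"Suggest {count} similar {category} items, basing your "
        f"suggestions on the {criteria}. "
        "Reply with one suggestion per line and nothing else."
    )
```

The resulting string would then be sent to Gemini on the back end; keeping prompt construction in one pure function also makes it easy to swap in new wording during A/B testing.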
With these suggestions, we also use Gemini to find a Wikipedia link for each item, and the Google Places API to return a Google Maps link as necessary. The REST API lets everything run smoothly on mobile devices as well. All of our code is stored on GitHub.
What did you struggle with? How did you overcome it?
We needed to use a lot of external tools, along with just programming. We implemented numerous APIs, each of which we needed to figure out on a case-by-case basis. Beyond just learning to use APIs, the documentation for a lot of the newer ones was inconsistent, resulting in a lot of trial and error. We also ran into multiple issues with GitHub, which led us to create a new repository.
The Gemini model we have access to is also not the most up-to-date one. This meant it struggled with hallucinations, where the LLM outputs false information and presents it as true. We had to prompt the LLM carefully to avoid this. Additionally, it could not access any events from the past year, an issue we could not overcome using the free tier of the Gemini API, which has no grounding: a way for LLMs to verify their responses using the internet. Instead, we focused on the Flash version of Gemini, striving for the fastest responses possible without sacrificing accuracy.
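The careful prompting described above can be sketched as a wrapper that prepends explicit anti-hallucination instructions to every request. The wording and function name are illustrative, not the project's actual prompt.

```python
# Hypothetical sketch: guardrail instructions prepended to every
# Gemini request to discourage hallucinated answers.

GUARDRAILS = (
    "Only state facts you are confident about. "
    "If you do not know something, say 'unknown' rather than guessing. "
    "Do not reference events from the past year, since your training "
    "data may not cover them."
)

def guarded_prompt(question: str) -> str:
    """Prepend the guardrail instructions to the user's question."""
    return f"{GUARDRAILS}\n\n{question}"
```

Instructing the model to answer "unknown" instead of guessing trades coverage for reliability, which matters when the output is presented to a traveler as fact.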
Another issue is that Gemini took a long time to produce suggestions. We cannot change Gemini's internal inference process, so we optimized everything around it, improving the efficiency of our code and engineering our prompts to be faster. We A/B tested all prompts to find the fastest way of getting an accurate response.
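The A/B testing above can be sketched as a small timing harness. Here `model_call` stands in for the real Gemini request; the harness and its names are our illustration (it also assumes accuracy is checked separately, timing only).

```python
import time
from typing import Callable

def ab_test_prompts(prompts: list[str],
                    model_call: Callable[[str], str],
                    trials: int = 3) -> str:
    """Return the prompt variant with the lowest average response time."""
    best_prompt, best_avg = prompts[0], float("inf")
    for prompt in prompts:
        start = time.perf_counter()
        for _ in range(trials):
            model_call(prompt)  # real code would also validate the reply
        avg = (time.perf_counter() - start) / trials
        if avg < best_avg:
            best_prompt, best_avg = prompt, avg
    return best_prompt
```

Averaging over several trials matters because LLM API latency varies run to run; a single measurement can easily pick the wrong variant.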
What did you learn? What did you accomplish?
We are most proud of the fact that we made a full working product within the time constraint. TravelBuddy AI has its main functionality worked out, as well as architecture for future additions. It has a polished front-end, as well as a fairly organized back-end.
We learned how to work with and implement Google Vision API, Google Generative AI API, Google Places API, and REST API, most of which we had no prior experience working with. We learned how to A/B test prompts and what kinds of prompts an LLM responds best to.
What are the next steps for your project? How can you improve it?
The most imminent step is to upgrade to a better version of the Google Vision API. This should allow us to work with a larger variety of subjects, hopefully expanding our functionality to famous jewelry, cultural clothing, famous landscapes, and much more. Additionally, we will look to upgrade to a paid version of Google's Generative AI API, which should allow for grounding as well as overall improvements to response time and accuracy.
Then, we could also explore fine-tuning our own LLM to be especially good at recognizing art. Along similar lines, we could implement a multi-agent system, hopefully speeding up the suggestions function. We currently have one agent that handles all interactions with Gemini, but using Google's own ADK and A2A protocol may help performance.
Another idea is to find a way to host a website or app. We were unable to do this as a lot of the APIs require billing information, so we had to hide their keys from the GitHub repository. We were unable to figure out an acceptable solution to this, and so everything we did can only be accessed locally. It would be great if we could use a third-party, like Vercel, to host our service, provided we find a secure way to store the API keys.
Along with this, we can try to bring the project to market. There is genuine use for what we built, and we can explore what it takes to make and maintain a successful service.