HackathonParty

Framing the Problem

Millions of Americans are living with preventable diseases — heart disease, type 2 diabetes, obesity — all tied not just to diet, but to ignorance by design. While the FDA and USDA regulate what’s considered “safe,” countless ingredients — from EDTA to parabens — are widespread in food and personal care products — despite carrying documented risks of endocrine disruption, carcinogenicity, and long-term bioaccumulation.

Our team first noticed this contradiction when our Materials Engineering teammate mentioned working with EDTA in a lab and being explicitly told to handle it with extreme caution. Yet, that same compound appears casually in foods and personal care products. Charlotte even remembered spotting EDTA on her cracker box, a moment that made the abstract danger startlingly personal.

Importantly, this issue isn’t fairly distributed — recent epidemiological studies have linked certain ingredients used disproportionately in hair care products primarily marketed to Black women, such as hair relaxers, to elevated cancer risks, reflecting deep racial inequities in the impact of ingredient safety standards.

Consumers aren’t failing to read labels — the system is failing to make them understandable. Many marketing terms aren’t regulated, and “natural,” “fragrance,” or “safe” often deceptively conceal underlying dangers. Our mission is simple: decode these labels using AI-powered ingredient analysis, bridging the gap between regulatory language and real-world health data. We want people to know what they’re trusting with their bodies.

Idea Explanation

We built a first-of-its-kind platform that empowers consumers to instantly understand what’s in their products without having to individually search up each individual ingredient (which the consumer may struggle to even spell to look up!). For this reason, we created a website that allows users to simply either take a picture or upload an existing photo of an ingredient label and receive information about each ingredient. This information includes things like the ingredient's common uses, health safety information, environmental impact, and if it's edible or not. In addition to this, the website gives a health safety rating for each ingredient, and gives an overall rating for the health safety of the product as a whole after analyzing all the ingredients. By educating consumers about the potential risks associated with the ingredients in their products, it empowers them with informed consent, allowing them to choose what to put in and on their bodies wisely.

Implementation

Our project, Scanadillo, uses a Next.js frontend connected to a Flask + Python backend through REST API calls. In the frontend, the user is able to either upload or capture an image of an ingredient label, which is then sent to the backend after being converted into Base64 format for easier scanning. The backend uses OpenCV to preprocess the Base64 image through enhancing contrast, sharpening, denoising, etc. It then uses GPT-4o to extract text from image, cleaning up typos, and returning a sanitized list of ingredients. This list is sent to an analyze ingredient function which uses GPT-4o to give details on each ingredients including ratings which is all packaged into a json that is sent back to the front-end to display for the user in the ingredients tab. In the Chatbot tab, the user is able to answer questions that they might have about any specific ingredient(s) and it is powered by the same backend, using GPT-4 for analysis and output. Currently, the data is processed in memory without the use of a database, but in future versions, we plan to implement image history for each user for convenience.

Our platform is built using a React/Next.js frontend and a Flask + OpenAI backend, which communicate through REST API requests.

On the frontend, users can either upload or take a photo of a product’s ingredient label. This image is sent to the Flask backend, which then:

Preprocesses the image using OpenCV and Pillow to enhance text clarity.
Extracts the ingredient list using GPT-4o’s vision model for text recognition within the image.
Cleans and verifies the text, ensuring only valid ingredient names remain.
Analyzes each ingredient with the custom IngredientEngine, which generates a short blurb, chemical details, and a numeric Health Safety Score for each one.

The backend returns this structured data to the frontend, where the Ingredients page displays:

A list of detected ingredients with expandable details,
Individual safety scores, and
An overall product safety rating.

Users can also chat with an AI assistant on the Health Chat tab. The frontend sends their questions to the backend via /api/chat, where GPT-4o-mini answers using the analyzed ingredient data as context.

In summary, the frontend handles all user interaction and display, while the backend performs text extraction, AI-powered ingredient analysis analysis, and safety scoring, allowing the two to work seamlessly together through simple JSON-based API calls.

Challenges

Text Extraction Accuracy: Many ingredient labels are curved, glossy, or torn. Early versions of the model misread or split words like “Sodium Isostearate.” We overcame this by combining OpenCV preprocessing (adaptive thresholding + morphological repair) with multi-step GPT-4o + GPT-4o-mini verification and custom regex post-processing that merges fragments and preserves multi-word INCI names.
Maintaining Correct Ingredients Order: GPT sometimes re-sorted ingredients alphabetically. We fixed this by explicitly prompting the model to preserve sequence and by logging order mismatches to the Flask console for debugging.
Preventing Hallucinations: The LLM occasionally invented plausible but nonexistent chemicals. We mitigated this with conservative prompting (“do NOT add new ingredient stems”) and post-filters removing unverified terms that the model was not 100% confident in, such as the common hallucination “Sodium Laurate.”
The Importance of Structure & Speed: Early on, our API calls were unoptimized and slow. We learned that restructuring how data was sent between the frontend and backend could significantly improve response time.
Frontend Integration: Making the data flow seamlessly between the Next.js frontend and Flask backend required careful handling, especially for image uploads and API responses. We debugged by inspecting real-time output from Flask and matching it to React state changes.
Accomplishments

Achieved near-perfect ingredients label reconstruction from images, even for labels that are partially torn, have glare, or are blurry, after integrating the multi-tiered GPT + OpenCV pipeline.
Built a full-stack AI web app from scratch using modern frameworks (Next.js + Flask + OpenAI API).
Designed an elegant, user-friendly interactive interface with expandable ingredient cards, sortable ingredients information, and an friendly AI chatbot
Developed a robust backend capable of extracting, verifying, and semantically analyzing real-world label images in seconds
Learned deep skills in LLM prompt engineering, image preprocessing, REST API integration, and frontend–backend orchestration

Next Steps

Ingredient Database Integration: Build a database (e.g., PostgreSQL or Firestore) containing verified ingredient information from trusted sources (PubChem, CosIng, USDA FoodData Central). This would let the engine cross-check GPT-4o output for consistency and dramatically reduce hallucination risk.
Mobile App & Browser Extension: In the future, we’d like to expand Scanadillo into a full mobile app and a browser extension. During the hackathon, we focused on the web version because we thought mobile deployment would be too time-intensive for a single weekend. However, a native app and extension would make scanning products in stores or online effortless.
Save User History: Allow users to save scans, track the health ratings of the products they’ve used over time, and compare products easily.
Barcode Scanning: Add a secondary input path that queries product databases (like OpenFoodFacts) when a barcode is recognized.
Community Reporting & Verification: Let users flag incorrect AI summaries and contribute verified data to continuously improve model reliability.
Product Suggestions: Implement a web searching feature that allows the AI chatbot to recommend alternative products with safer ingredients to users.

Scanadillo

Curl up in safety: Let Scanadillo find the hidden dangers lurking in your label.

Framing the Problem

Idea Explanation

Implementation

Challenges

Accomplishments

Next Steps