Crafting experience...
10/26/2025

A project made by Jason Li, Emlynn Rossiya, Anilov Villanueva, Skylar Liu, and Owen Brooks (Engineers).

Built at Gator Hack IV.
In the modern era, tools like DALL-E, Midjourney, and Stable Diffusion have made it possible for anybody to generate stunning images in a matter of seconds. However, as we move forward with AI, it is important not to forget the true artistic inspirations underlying AI training models. Our project aims to bridge the gap between AI models and the original art used in their datasets by creating a system that can trace the origins and inspiration behind AI-generated images, helping artists, users, and platforms ensure fair recognition and transparency in digital creativity.
With the meteoric growth in artificial intelligence technology over the past few years, millions of images are being created and shared every day. It is estimated that 100 million images are generated by ChatGPT users on any given day. Many of these imitate the existing work of artists without giving credit to or compensating the original artists.
Though it has always been notoriously difficult for small artists to break through and make a living from their art, the uncredited use of their work in AI training datasets only makes it harder for them to distinguish themselves. Currently, there is no effective way for artists or platforms to verify the stylistic inspiration behind AI-generated images, or to protect artists' work.
The main groups affected by this issue are:
Artists: their works and styles are being replicated without their consent, leading to a loss of recognition and income.
Art communities and online platforms: they struggle to moderate and label AI-generated content.
At its core, our system allows users to upload AI-generated artworks and discover the original artists, styles, or influences that may have inspired the model’s output. By combining the power of CLIP (which links visual and textual concepts) with Convolutional Neural Networks (CNNs) (which analyze visual features and patterns), our application identifies both stylistic and conceptual similarities between AI-created works and real human art. This approach bridges the gap between AI creativity and human originality, promoting transparency, attribution, and respect for artistic sources in the age of generative art.
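To make the hybrid scoring concrete, here is a minimal sketch of how the CLIP and CNN similarity scores can be blended into a single ranking score; the alpha weight and function names are illustrative assumptions, not our exact tuned values.

```python
import numpy as np

def hybrid_similarity(clip_query, clip_db, cnn_query, cnn_db, alpha=0.6):
    """Blend CLIP (conceptual) and CNN (visual-detail) cosine similarities."""
    def cosine(query, matrix):
        query = query / np.linalg.norm(query)
        matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
        return matrix @ query  # one similarity score per database artwork

    # alpha weights conceptual (CLIP) similarity against visual-detail (CNN) similarity
    return alpha * cosine(clip_query, clip_db) + (1 - alpha) * cosine(cnn_query, cnn_db)
```

Ranking artworks by this blended score lets a piece that matches conceptually but differs in fine detail (or vice versa) still surface near the top of the results.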
The system consists of five primary components:
Embedding Pipeline – Generates embeddings for the entire art dataset using OpenAI's CLIP Vision Transformer (ViT-L/14) alongside a custom CNN trained to recognize artworks; a minimal embedding sketch follows this list. Original images and metadata are sourced from WikiArt (https://www.kaggle.com/datasets/steubk/wikiart).
Database Layer – Stores and indexes embeddings using PostgreSQL with pgvector, as well as cloud vector databases such as Pinecone.
API Backend – FastAPI handles inference, search queries, and hybrid similarity computation; a search-endpoint sketch also follows this list.
Frontend Interface – A React-based web application for interactive exploration.
Cloudflare CDN – Hosts and serves roughly 90,000 images for global access and performance.
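A minimal sketch of the embedding pipeline, assuming the Hugging Face transformers checkpoint openai/clip-vit-large-patch14 for CLIP ViT-L/14 and hypothetical image paths; the batched GPU inference here reflects the speed-up we describe in the struggles section below.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP ViT-L/14 from the Hugging Face hub; falls back to CPU if no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def embed_images(paths, batch_size=32):
    """Return unit-normalized CLIP image embeddings for a list of file paths."""
    batches = []
    for i in range(0, len(paths), batch_size):
        images = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=images, return_tensors="pt").to(device)
        feats = model.get_image_features(**inputs)
        feats = feats / feats.norm(dim=-1, keepdim=True)  # normalize for cosine search
        batches.append(feats.cpu())
    return torch.cat(batches)
```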
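And a minimal sketch of how the FastAPI backend can query the pgvector-backed database layer; the artworks table, connection string, and embed_upload helper are hypothetical placeholders standing in for our actual schema and preprocessing code.

```python
import psycopg
from fastapi import FastAPI, UploadFile
from pgvector.psycopg import register_vector

app = FastAPI()
conn = psycopg.connect("dbname=artdb")  # hypothetical connection string
register_vector(conn)  # lets pgvector accept numpy arrays as query parameters

@app.post("/search")
async def search(file: UploadFile, top_k: int = 5):
    # embed_upload is a hypothetical helper that returns a unit-normalized query vector
    query_vec = embed_upload(await file.read())
    rows = conn.execute(
        "SELECT artist, title, embedding <=> %s AS distance "
        "FROM artworks ORDER BY distance LIMIT %s",
        (query_vec, top_k),
    ).fetchall()
    # <=> is pgvector's cosine distance; 1 - distance gives a similarity score
    return [{"artist": a, "title": t, "similarity": 1 - d} for a, t, d in rows]
```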
What did you struggle with? How did you overcome it?
Generating the embeddings took a long time to run:
Another major issue we faced was how long it took to generate embeddings for our dataset. Since we were processing hundreds (and sometimes thousands) of high-resolution images, computation on CPU alone was extremely slow; even small batches took several minutes to complete. To solve this, we switched our setup to a GPU runtime, which allowed us to parallelize embedding generation and cut processing time dramatically. This optimization made it possible to index and query our dataset efficiently before the end of the hackathon.
Deciding which deep learning model, CNN or CLIP, to use for identifying a matching artist style:
Early in development, we struggled to decide whether to use CNNs (for visual feature extraction) or CLIP (for linking visual and semantic concepts). CNNs handled fine-grained texture and color well, but CLIP captured broader stylistic relationships.
After running multiple experiments, we decided to combine both models, using CLIP for conceptual matching and CNN for visual detail comparison. This hybrid setup gave us stronger and more consistent results.
The CNN model had a low confidence score when trying to find similar images in the database:
Initial CNN matches had low confidence due to inconsistent normalization and thresholds. We fixed this by unit-normalizing embeddings, tuning similarity cutoffs, and backing the search with FAISS over CLIP embeddings. Confidence and precision both improved.
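A minimal sketch of the normalization and FAISS setup, where the 0.25 similarity cutoff is a hypothetical threshold rather than our tuned value.

```python
import faiss
import numpy as np

def build_index(embeddings: np.ndarray) -> faiss.IndexFlatIP:
    embeddings = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(embeddings)  # unit-normalize so inner product equals cosine similarity
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index

def search(index, query: np.ndarray, top_k: int = 5, cutoff: float = 0.25):
    query = np.ascontiguousarray(query, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(query)
    scores, ids = index.search(query, top_k)
    # keep only matches above the confidence cutoff
    return [(int(i), float(s)) for i, s in zip(ids[0], scores[0]) if s >= cutoff]
```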
What did you learn? What did you accomplish?
Throughout this project, we deepened our understanding of deep learning architectures and their applications in visual analysis. We explored and implemented both Convolutional Neural Networks (CNNs) and CLIP, studying how their respective layers process visual and semantic information differently. This helped us understand how CNNs specialize in feature extraction (such as texture, color, and structure), while CLIP bridges vision and language by linking visual patterns to conceptual meaning.
We also learned how to preprocess large-scale image datasets, generate and index embeddings efficiently using FAISS, and optimize performance for real-time similarity search. Collaborating as a team taught us valuable skills in backend integration, model deployment, and frontend visualization using modern frameworks like Next.js and Tailwind CSS.
What are the next steps for your project? How can you improve it?
Add a feature for verified artists to upload their portfolios to the database
Display artists' names in the results and show examples of their artwork
Fine-tune the models to get more accurate matches
Add a more recent dataset covering more artists and their artwork
Deploy these systems alongside existing social media AI detection programs