10/26/2025
Built at Gator Hack IV
Framing the Problem
Current educational Q&A systems face three key challenges: inefficient retrieval, limited context windows, and wasted tokens. Traditional vector databases perform linear searches with O(N) complexity, requiring hundreds of comparisons per query. Large documents often exceed LLM context limits, forcing truncation or multi-pass processing. And retrieving large text chunks wastes memory when answers are usually short. These problems affect students who waste time searching materials, teachers who cannot scale help to large classes, and institutions that find current RAG systems too costly for widespread use.
Idea Explanation
SCOPE Trees (Selective Compression for Optical Progressive Exploration) is a hierarchical tree-based RAG system that organizes a document into four levels of progressively finer detail. Level 1 provides a 20× compressed gist, Level 2 divides the document into 10 compressed sections, Level 3 into 100 subsections, and Level 4 into 500 uncompressed excerpts. Using optical compression via DeepSeek-OCR’s vision-language model, text is rendered as images and encoded into compact vision tokens, achieving up to 10× compression with 97% accuracy.
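The four-level layout can be sketched as a simple tree build: 500 leaf excerpts folded into 100 subsections, 10 sections, and one gist. This is a hypothetical sketch; the node fields and grouping sizes are assumptions, and `compress` stands in for the optical compression step described above.

```python
from dataclasses import dataclass, field

@dataclass
class ScopeNode:
    level: int                     # 1 = gist, 2 = section, 3 = subsection, 4 = excerpt
    summary: str                   # compressed text, or the raw excerpt at level 4
    children: list = field(default_factory=list)

def compress(texts):
    """Placeholder for optical compression (DeepSeek-OCR in the real system)."""
    return " ".join(texts)[:200]

def build_tree(excerpts):
    """Fold 500 leaf excerpts into 100 subsections, 10 sections, and 1 gist."""
    nodes = [ScopeNode(4, t) for t in excerpts]
    for level, group in ((3, 5), (2, 10), (1, 10)):
        nodes = [
            ScopeNode(level, compress([c.summary for c in kids]), kids)
            for kids in (nodes[i:i + group] for i in range(0, len(nodes), group))
        ]
    return nodes[0]                # root: the gist
```

Note that the level counts imply 1 + 10 + 100 + 500 = 611 nodes per document, matching the 611 JSON files mentioned later.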
The system fixes inefficiencies by replacing linear searches with a hierarchical traversal. A navigation LLM decides which sections to explore, achieving O(log N) retrieval with only three navigation decisions instead of hundreds of comparisons. Optical compression drastically reduces token usage—256 vision tokens replace over 1,000 text tokens. The retrieval process is transparent, with real-time updates showing traversal progress, and it eliminates hallucinations by retrieving text before answering.
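The traversal above amounts to one navigation decision per internal level, gist to section to subsection to excerpt. The sketch below assumes a dict-based tree and substitutes a toy keyword-overlap scorer for the navigation LLM; the function names are illustrative, not SCOPE's actual API.

```python
def traverse(tree, question, choose_child):
    """Walk from the gist down to a level-4 excerpt, one decision per level."""
    node, path = tree, []
    while node["children"]:
        node = choose_child(question, node["children"])
        path.append(node["summary"])
    return node["summary"], path   # the uncompressed excerpt grounds the answer

def keyword_overlap(question, children):
    # Toy stand-in for the navigation LLM: pick the child whose compressed
    # summary shares the most words with the question.
    q = set(question.lower().split())
    return max(children, key=lambda c: len(q & set(c["summary"].lower().split())))
```

For a 4-level tree this makes exactly three decisions regardless of document length, which is where the O(log N) behavior comes from.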
Implementation
SCOPE includes five components:
Frontend: Flask web app for teachers (upload PDFs) and students (ask questions).
Backend: Flask server with Socket.IO handling API calls and real-time updates.
Database: SQLite storing documents, chat sessions, and logged questions.
SCOPE Core: The tree builder and traversal engine.
OCR Server: DeepSeek-OCR running on vLLM for compression and decompression.
The frontend communicates with the backend via HTTP and WebSocket. Teachers upload documents through /api/upload, triggering background indexing and progress notifications. Students send questions via WebSocket and receive live traversal updates and answers.
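A minimal sketch of the upload flow, assuming plain Flask; the real backend emits progress over Socket.IO and runs indexing as a background task, both replaced here by a `notify` logger so the example runs standalone. Only the `/api/upload` route name comes from the description above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
progress_log = []                  # stands in for Socket.IO progress events

def notify(event, payload):
    # The real backend emits these over a WebSocket to the frontend.
    progress_log.append((event, payload))

@app.route("/api/upload", methods=["POST"])
def upload():
    pdf_bytes = request.files["file"].read()
    index_document(pdf_bytes)      # SCOPE runs this in the background
    return jsonify({"status": "indexing", "bytes": len(pdf_bytes)})

def index_document(data):
    # Build the SCOPE tree level by level, reporting progress after each.
    for pct in (25, 50, 75, 100):
        notify("index_progress", {"percent": pct})
```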
The database contains:
Documents: PDF metadata and token size.
Chats: Conversation histories.
Questions: Logs with answers, metrics, and response times.
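The three tables could be laid out as below; this is a hypothetical schema with assumed column names, matching only the descriptions above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    filename    TEXT NOT NULL,
    token_count INTEGER                          -- PDF size in tokens
);
CREATE TABLE chats (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER REFERENCES documents(id),
    history     TEXT                             -- serialized conversation
);
CREATE TABLE questions (
    id          INTEGER PRIMARY KEY,
    chat_id     INTEGER REFERENCES chats(id),
    question    TEXT,
    answer      TEXT,
    response_ms REAL                             -- logged response time
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```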
DeepSeek-OCR compresses text by converting it to 2048×2048 PNG images and encoding them into 256 vision tokens. Decompression recovers the original text via the DeepSeek3B-MoE decoder. Testing on a 24-page document (11,379 tokens) achieved 10–15× effective compression across layers.
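The rendering side of that pipeline can be sketched with Pillow; the canvas size comes from the description above, while the font, margins, and layout are assumptions (the real pipeline would control these to keep OCR accuracy high).

```python
from PIL import Image, ImageDraw

def render_page(text: str, size: int = 2048) -> Image.Image:
    """Rasterize text onto the 2048x2048 canvas that the vision
    encoder turns into 256 vision tokens."""
    img = Image.new("RGB", (size, size), "white")
    ImageDraw.Draw(img).multiline_text((40, 40), text, fill="black")
    return img
```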
Comparison:
Standard vector DB: O(N), 1500–2500 tokens per query.
GraphRAG: O(E+V), higher cost and 2000–4000 tokens per query.
SCOPE Trees: O(log N), only three navigation decisions, higher transparency, and logarithmic scaling.
The system runs Flask + Socket.IO for concurrency, vLLM for optimized inference (70+ tokens/sec), and requires ~18 GB VRAM.
Challenges
Integration was difficult due to model speed and compatibility. The team migrated DeepSeek-OCR to vLLM for faster inference and CUDA optimization, resolving version and memory issues via software updates and quantization. Coordinating multiple services (OCR health checks, WebSocket sync, background indexing, and caching 611 JSON files per document) required custom threading, retries, and LRU caching.
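The per-node caching could look like the sketch below, using the standard library's `functools.lru_cache` to keep hot nodes in memory across queries. The file layout (one JSON file per node under a per-document directory) is an assumption based on the 611-files-per-document figure above.

```python
import functools
import json
import pathlib
import tempfile

TREE_DIR = pathlib.Path(tempfile.mkdtemp())   # stand-in for the node store

@functools.lru_cache(maxsize=128)
def load_node(doc_id: str, node_id: int) -> dict:
    """Read one of the ~611 per-document node files, caching hot nodes
    so repeated traversals skip disk I/O and JSON parsing."""
    return json.loads((TREE_DIR / doc_id / f"{node_id}.json").read_text())
```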
Accomplishments
The team confirmed optical compression’s viability and learned that vision tokens can efficiently encode text with minimal accuracy loss. Inspired by human memory structure, they applied hierarchical compression to mirror progressive recall. They built a working 4-level RAG with 611 nodes per document, dual-LLM navigation, real-time visualization, and zero hallucinations. Retrieval now requires only three navigation steps versus 500 in standard RAG.
Next Steps
Future work includes improving accuracy with confidence scoring and chain-of-thought prompting, upgrading document splitting using layout detection (PP-DocLayout), and optimizing OCR speed via batching, caching, and quantization. GPU utilization is only 66%, so enhancing parallelization could cut response times from 15 s to 5 s.
Benchmarking against Pinecone, Weaviate, GraphRAG, and LightRAG is planned using metrics like accuracy, precision, latency, cost, and scalability. SCOPE is expected to excel in conceptual Q&A and scale logarithmically with document size.
Conclusion
SCOPE Trees demonstrates that hierarchical organization and optical compression can revolutionize RAG efficiency. By combining DeepSeek-OCR’s 10× compression (97% accuracy) with logarithmic retrieval, it proves that vision-language models can act as text compressors for educational AI. Built in 48 hours for GatorHacks 2025, the Teacher Assistant Bot shows that future RAG systems should focus not on larger databases but on smarter, hierarchical designs that mimic human document navigation.