<aside>

We are looking for beta testers for our product and would love to have you join our beta program. Please contact us at [email protected]

</aside>

Technology White Paper

We introduce Onida, a groundbreaking, neuroscience-inspired architecture that rivals the high-level performance of top vision-language models (e.g., GPT-4o, CogVLM, LLaVA) while dramatically cutting computational costs. The name derives from the Japanese word for “demon,” reflecting the system’s relentless drive for highly efficient, cost-effective video processing. Unlike conventional transformer-based models, which demand massive datasets and powerful GPUs for training and inference, Onida takes inspiration from the human brain’s efficiency: the brain executes complex learning and reasoning on roughly 25 W of power. By comparison, training a large language model is roughly 1000× more expensive, and its inference is roughly 100× more costly.

Rather than scaling computational resources with diminishing returns, Onida redefines AI through biologically inspired design. Since vision has been the dominant sensory modality for millions of years—long before the evolution of language—our model is fundamentally rooted in visual perception. Additionally, Onida’s adaptable architecture enables seamless integration of other sensory modalities, including audio, olfaction, and touch, paving the way for a truly multimodal intelligence system.

<aside> 💡

This paradigm shift challenges the prevailing focus on large language models (LLMs) by demonstrating that brain-inspired designs that start with vision may offer more resource-efficient alternatives for advanced inference and learning tasks.

</aside>

Why Onida


Onida transforms video content management with:

  1. Multi-Video Conversational Interface – Effortlessly query entire video libraries using natural language, enabling precise and context-aware insights without the risk of AI hallucinations. This ensures reliable information retrieval for decision-making.
  2. Continuous Learning & Adaptation – Onida continuously refines its understanding of video content to align with evolving business needs. This dynamic learning process enhances accuracy while allowing seamless customization to meet specific industry or organizational requirements.
  3. Scalable & Cost-Effective Performance – Offering high-performance video analysis at a fraction of the cost of traditional solutions, Onida makes advanced video intelligence accessible, even outperforming many open-source vision-language models (VLMs) in both efficiency and affordability.

Background

Neuroscience suggests that much of how our brain works revolves around memory—it acts as an auto-associative system that constantly stores and recalls past experiences to understand and predict new events. In simple terms, when we encounter something new, our brain first “recognizes” it by comparing it with memories of previous similar experiences, and then uses those memories to make predictions about what might happen next.

Building on this auto-associative process, the brain not only recalls past experiences but also anticipates future events: it recognizes patterns in new information by comparing them with stored memories, then uses those patterns to forecast what comes next. By leveraging a similar memory-based approach, Onida generates high-quality inferences at a fraction of the computational cost required by traditional paradigms such as LLMs.
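To make the store-recognize-predict loop concrete, it can be sketched as a nearest-neighbour lookup over stored experience embeddings. This is a deliberately simplified toy model: the `AutoAssociativeMemory` class, its cosine-similarity recall, and the example events are illustrative assumptions, not Onida’s actual implementation.

```python
# Toy auto-associative memory: store (pattern, outcome) pairs, recognize a
# new observation by recalling the most similar stored pattern, and return
# that memory's outcome as the prediction. Illustrative only.
import numpy as np

class AutoAssociativeMemory:
    def __init__(self):
        self.patterns = []   # embeddings of past experiences
        self.outcomes = []   # what followed each experience

    def store(self, pattern, outcome):
        self.patterns.append(np.asarray(pattern, dtype=float))
        self.outcomes.append(outcome)

    def recall(self, query):
        """Predict by returning the outcome of the closest stored memory
        (cosine similarity)."""
        q = np.asarray(query, dtype=float)
        sims = [
            p @ q / (np.linalg.norm(p) * np.linalg.norm(q))
            for p in self.patterns
        ]
        return self.outcomes[int(np.argmax(sims))]

memory = AutoAssociativeMemory()
memory.store([1.0, 0.0, 0.0], "door opens -> person enters")
memory.store([0.0, 1.0, 0.0], "car slows -> car parks")

# A new observation near the first stored pattern recalls its prediction.
print(memory.recall([0.9, 0.1, 0.0]))  # → door opens -> person enters
```

The same idea scales to real embeddings (e.g., frame or clip features), where recall becomes an approximate nearest-neighbour search rather than a linear scan.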

<aside>

“Recursive Self-Improvement refers to the property of making improvements on one's own ability of making self-improvements. It is an approach to Artificial General Intelligence that allows a system to make adjustments to its own functionality resulting in improved performance. The system could then feedback on itself with each cycle reaching ever higher levels of intelligence resulting in either a hard or soft AI takeoff.” - Source

</aside>

Recursive Query Expansion (RQE)

Recursive Query Expansion (RQE) is a form of Recursive Self-Improvement that refines predictions through hypothesis testing. It systematically analyzes a graph of hypotheses and counter-hypotheses, combining their scores to reach a definitive classification. The process includes: