I’m thrilled to announce the release of SimpleRAG, my latest open-source project hosted on GitHub. SimpleRAG is an advanced Retrieval-Augmented Generation (RAG) pipeline designed for processing images, videos, and audio, with specialized capabilities for financial and trading chart analysis.
Whether you're a researcher, trader, or developer looking to explore multimedia data, SimpleRAG offers a powerful, extensible, and user-friendly solution for embedding, analyzing, and visualizing complex datasets.
SimpleRAG is a robust pipeline that processes multimedia data—images, videos, and audio—and extracts meaningful insights, particularly for financial trading charts. It leverages state-of-the-art models like CLIP, BLIP, Whisper, and Tesseract OCR to embed, transcribe, caption, and analyze data, storing everything in a vector database (ChromaDB) for fast querying and interactive visualization.
The project is built for rapid experimentation, extensibility, and deep inspection of data and model outputs, making it ideal for trading analytics, multimedia search, and research.
SimpleRAG is designed for maximum compatibility and ease of use. You can run the entire pipeline in a Docker container to ensure a clean environment with all dependencies pre-installed. Here’s how to set it up:
Build the Docker image:
docker build -t simplerag .
Run the container:
docker run -it --rm \
--network=host \
-v /Users/igorkomolov/Movies/RAG_Videos:/app/Movies/RAG_Videos \
-v /Users/igorkomolov/Pictures/RAG_Photos:/app/Pictures/RAG_Photos \
-v "$PWD/photo_db:/app/photo_db" \
-e RAG_VIDEO_FOLDER=/app/Movies/RAG_Videos \
-e RAG_PHOTO_FOLDER=/app/Pictures/RAG_Photos \
simplerag
Note that the host paths above point to directories on my system; change them to match yours.
Important: if you are on a Mac with Apple Silicon, do not use Docker. Run start.sh or install the dependencies yourself.
This command mounts your data folders, sets environment variables (RAG_VIDEO_FOLDER and RAG_PHOTO_FOLDER), and runs build_rag.py in a Python 3.12 environment. You can modify the Dockerfile or entrypoint for custom scripts or workflows. For manual setup instructions (without Docker), refer to start.sh or start.md in the repository.
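If you write your own script against the same folders, the two environment variables can be read in a few lines of Python. This is only a sketch of the idea, not the code in build_rag.py: the helper names (media_folder, list_media) and the extension sets are my assumptions for illustration, and the fallback paths are simply the container paths from the command above.

```python
import os
from pathlib import Path

# Hypothetical extension sets; the real pipeline may accept other formats.
VIDEO_EXTS = {".mp4", ".mov", ".mkv"}
PHOTO_EXTS = {".jpg", ".jpeg", ".png"}


def media_folder(var_name, default):
    """Return the folder named by an environment variable, or a fallback."""
    return Path(os.environ.get(var_name, default)).expanduser()


def list_media(folder, extensions):
    """Return sorted files under `folder` whose suffix is in `extensions`."""
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


if __name__ == "__main__":
    videos = media_folder("RAG_VIDEO_FOLDER", "/app/Movies/RAG_Videos")
    photos = media_folder("RAG_PHOTO_FOLDER", "/app/Pictures/RAG_Photos")
    print(f"{len(list_media(videos, VIDEO_EXTS))} videos in {videos}")
    print(f"{len(list_media(photos, PHOTO_EXTS))} photos in {photos}")
```

Reading the folders from the environment keeps the script identical inside and outside Docker; only the variable values change.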
In the CLI, the [d] option lets you select and delete models (collections) as needed.
Use visualize_metadata.py to inspect metadata, including captions, OCR text, timeframes, and candlestick data.
For 3D visualization, use visualize_metadata_3d.py (requires PyQt5, PyQtWebEngine, and Plotly).
To run SimpleRAG locally or extend its functionality, install the following dependencies:
Core Pipeline:
pip install pytesseract opencv-python blip clip whisper ffmpeg-python chromadb
GUI/3D Visualization:
pip install PyQt5 PyQtWebEngine plotly networkx
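Before running anything, it can help to check which of these packages Python can actually import. The snippet below is a small sketch using only the standard library; the function name missing_packages is mine, and the table maps pip names to import names only for packages where I am sure of the import name (blip and clip are left out because their import names depend on how they were installed). Keep in mind that pytesseract and ffmpeg-python are wrappers, so the tesseract and ffmpeg binaries must also be installed on the system.

```python
import shutil
from importlib.util import find_spec

# pip package name -> top-level import name.
# blip and clip are omitted on purpose: their import names vary by install method.
IMPORT_NAMES = {
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "whisper": "whisper",
    "ffmpeg-python": "ffmpeg",
    "chromadb": "chromadb",
    "PyQt5": "PyQt5",
    "plotly": "plotly",
    "networkx": "networkx",
}

# System binaries that the Python wrappers call under the hood.
BINARIES = ["tesseract", "ffmpeg"]


def missing_packages(import_names):
    """Return pip names whose import module cannot be found."""
    return sorted(
        pkg for pkg, module in import_names.items()
        if find_spec(module) is None
    )


def missing_binaries(names):
    """Return binaries that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python packages:", missing_packages(IMPORT_NAMES) or "none")
    print("Missing system binaries:", missing_binaries(BINARIES) or "none")
```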
Notes:
Pin PyQt5 if needed (PyQt5==5.15.9) and install system Qt libraries.
Usage:
Run python build_rag.py and select or create a model.
Use [d] in the CLI to delete a model if needed.
Inspect metadata with python visualize_metadata.py.
Explore the 3D view with python visualize_metadata_3d.py (GUI required).
SimpleRAG uses ChromaDB to store embeddings and metadata in the /vectordb directory. This enables fast, semantic searches and rich data inspection. I’m actively researching vector database optimizations to further enhance performance and scalability. Stay tuned for updates!
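To make "semantic search" concrete, here is a toy in-memory stand-in for what a vector database does when you query it: rank stored embeddings by similarity to a query embedding and return the attached metadata. This is not ChromaDB's API and not SimpleRAG's code. The record layout, the three-dimensional vectors, and the captions are invented for illustration; real CLIP embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, records, k=2):
    """Return the k records most similar to the query, best first."""
    scored = [(cosine_similarity(query, r["embedding"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(r["id"], r["metadata"]) for _, r in scored[:k]]


# Made-up records: id, embedding, and metadata such as a caption.
RECORDS = [
    {"id": "chart_001", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"caption": "candlestick chart, 1h timeframe"}},
    {"id": "clip_007", "embedding": [0.0, 0.2, 0.9],
     "metadata": {"caption": "audio transcript segment"}},
    {"id": "chart_002", "embedding": [0.8, 0.3, 0.1],
     "metadata": {"caption": "candlestick chart, 1d timeframe"}},
]

if __name__ == "__main__":
    for record_id, metadata in top_k([1.0, 0.0, 0.0], RECORDS):
        print(record_id, metadata["caption"])
```

A real vector database adds persistence and approximate indexes so the ranking stays fast with many more records, but the query model is the same: nearest embeddings first, metadata alongside.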
SimpleRAG is designed for developers, researchers, and traders who need a flexible, powerful tool for multimedia analysis. Its focus on financial chart analysis makes it particularly valuable for trading analytics, while its extensible architecture supports a wide range of use cases, from multimedia search to research prototyping. The interactive CLI and visualization tools make it easy to experiment and gain insights from your data.
Check out the SimpleRAG repository on GitHub to explore the code, contribute, or provide feedback. I’m excited to see how the community uses SimpleRAG for trading, research, and beyond. Feel free to reach out on X or via the repository’s issues page with questions or ideas!
Happy coding, analyzing, and