2025
Personal Project
imagetoinsight.com
Background
This project is an AI-powered web application that lets users upload an image and automatically receive a descriptive caption based on the objects detected in it. The system uses the Google Cloud Vision API to analyze the image and extract relevant labels (objects), which are then passed to Gemini AI (Google's generative AI model) to generate a natural-language caption. The app is built with FastAPI, packaged as a Docker container, and deployed on Google Cloud Run.
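The hand-off between the two services is where most of the design choices live. The sketch below shows one way to turn label-detection output into a caption prompt. It assumes each label arrives as a description plus a confidence score, mirroring the `description` and `score` fields of Vision API label annotations. The `Label` class, the `build_caption_prompt` function, the 0.7 threshold, and the prompt wording are all hypothetical illustrations, not the project's actual code. The resulting string is what would then be sent to Gemini.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    """Hypothetical stand-in for one label annotation: text plus a confidence score in [0, 1]."""
    description: str
    score: float


def build_caption_prompt(labels: List[Label], min_score: float = 0.7, max_labels: int = 10) -> str:
    """Build a caption prompt from confident, de-duplicated labels (illustrative wording)."""
    # Keep only labels at or above the threshold, most confident first.
    confident = sorted(
        (lbl for lbl in labels if lbl.score >= min_score),
        key=lambda lbl: lbl.score,
        reverse=True,
    )
    # Drop case-insensitive duplicates while preserving order.
    seen = set()
    names = []
    for lbl in confident:
        key = lbl.description.lower()
        if key not in seen:
            seen.add(key)
            names.append(lbl.description)
    names = names[:max_labels]
    if not names:
        raise ValueError("no labels met the confidence threshold")
    return (
        "Write one short, natural-sounding caption for a photo that contains: "
        + ", ".join(names)
        + "."
    )


labels = [Label("Dog", 0.97), Label("Grass", 0.91), Label("dog", 0.85), Label("Blur", 0.40)]
prompt = build_caption_prompt(labels)
print(prompt)
# prints: Write one short, natural-sounding caption for a photo that contains: Dog, Grass.
```

Filtering and de-duplicating before prompting keeps low-confidence noise (like "Blur" above) from leaking into the caption.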
The goal of this project is to show how combining multiple AI services (a vision API and a large language model) can turn a single image upload into an intelligent, user-friendly result.
Tech Stack
Python: Primary programming language for building the backend logic.
FastAPI: Lightweight and fast web framework for building APIs.
Google Cloud Vision API: Extracts labels/objects from uploaded images.
Gemini AI (Generative AI by Google): Generates human-like image captions from the detected labels.
Docker: Containerizes the app for consistent deployment and portability.
Google Cloud Run: Serverless platform used to deploy and host the Dockerized app.
HTML/CSS: For the frontend interface of the application.
Project Development Process
Ideation & Planning
Designed an AI-powered web app that generates image captions using Google Cloud Vision and Gemini AI.
Environment Setup
Set up a Python virtual environment and structured the FastAPI project with the templates/, static/, and uploads/ folders.
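The folder layout above can be created reproducibly with a few lines of `pathlib`. This is a minimal sketch: `scaffold_project` is a hypothetical helper, and only the three folder names come from the project description. The virtual environment itself would typically be created separately, for example with `python -m venv .venv`.

```python
import tempfile
from pathlib import Path
from typing import List

# Folder names from the project description.
PROJECT_DIRS = ("templates", "static", "uploads")


def scaffold_project(root: Path) -> List[Path]:
    """Create the project folders under root; safe to run more than once."""
    paths = []
    for name in PROJECT_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    made = scaffold_project(Path(tmp))
    scaffold_project(Path(tmp))  # second run is a no-op
    made_names = [p.name for p in made]
    all_dirs_exist = all(p.is_dir() for p in made)

print(made_names)  # ['templates', 'static', 'uploads']
```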
Backend Development
Built FastAPI routes that handle image uploads, run Google Cloud Vision label detection, and generate captions with Gemini.
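The upload route's checks can be sketched independently of the web framework. The helper below assumes the route has already read the request body into bytes (in FastAPI, `await file.read()` on an `UploadFile` returns them). The 10 MB limit, the JPEG/PNG-only policy, and the names `save_upload` and `detect_image_type` are illustrative assumptions rather than the project's code. The saved path would then be handed to label detection.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Optional

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap

# Leading "magic" bytes for the formats this sketch accepts.
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",        # JPEG
    b"\x89PNG\r\n\x1a\n": ".png",    # PNG
}


def detect_image_type(data: bytes) -> Optional[str]:
    """Return a file extension if data starts with a known image signature, else None."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def save_upload(data: bytes, upload_dir: Path) -> Path:
    """Validate raw upload bytes and store them under a random, collision-free name."""
    if not data:
        raise ValueError("empty upload")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    ext = detect_image_type(data)
    if ext is None:
        raise ValueError("unsupported image format")
    # Ignore the client-supplied filename, which may contain path separators.
    path = upload_dir / f"{uuid.uuid4().hex}{ext}"
    path.write_bytes(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_upload(b"\x89PNG\r\n\x1a\n" + bytes(16), Path(tmp))
    saved_suffix = saved.suffix
    try:
        save_upload(b"plain text, not an image", Path(tmp))
        rejected_text = False
    except ValueError:
        rejected_text = True

print(saved_suffix, rejected_text)  # .png True
```

Checking the file's leading bytes rather than its extension means a renamed text file is rejected before it ever reaches the Vision API.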
Frontend Design
Developed user-facing HTML templates using Jinja2 for image upload, results display, and caption logs.
Error Handling & Logging
Added logging and graceful error responses so that failures are reported clearly to the user and can be traced when debugging.
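One way to get both graceful responses and useful logs is to wrap each external call so that failures are logged with a full traceback while the user sees only a short message. This is a minimal sketch using Python's standard `logging` module. `run_step`, the logger name, and the message wording are hypothetical, and the two stand-in functions replace the real Vision and Gemini calls.

```python
import logging
from typing import Any, Callable, Optional, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("imagetoinsight")  # hypothetical logger name


def run_step(name: str, func: Callable[..., Any], *args: Any) -> Tuple[Any, Optional[str]]:
    """Run one pipeline step; on failure, log the traceback and return a user-facing message."""
    try:
        result = func(*args)
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("step %r failed", name)
        return None, f"Sorry, the {name} step failed. Please try again."
    logger.info("step %r succeeded", name)
    return result, None


# Stand-ins for the real Vision and Gemini calls.
def fake_label_detection(path: str) -> list:
    return ["Dog", "Grass"]


def failing_caption(labels: list) -> str:
    raise RuntimeError("simulated API error")


labels, label_error = run_step("label detection", fake_label_detection, "dog.jpg")
caption, caption_error = run_step("caption", failing_caption, labels)
print(caption_error)  # Sorry, the caption step failed. Please try again.
```

In the app, the returned message could be rendered in the results template or returned with an appropriate HTTP status, while the traceback stays in the server logs.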
Containerization with Docker
Wrote a Dockerfile and containerized the application for consistent deployment.
Deployment to Google Cloud
Pushed the container image to Artifact Registry and deployed it to Google Cloud Run, supplying configuration through environment variables.
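Reading configuration from environment variables keeps secrets out of the container image. Cloud Run tells the container which port to listen on through the `PORT` environment variable (8080 by default), so the server should bind to `0.0.0.0` on that port. The sketch below is illustrative only: `GEMINI_API_KEY` and `UPLOAD_DIR` are hypothetical variable names, not necessarily the ones this project uses.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    gemini_api_key: str = field(repr=False)  # keep the secret out of logs and reprs
    upload_dir: str = "uploads"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from environment variables (names other than PORT are hypothetical)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return Settings(
        port=int(env.get("PORT", "8080")),  # Cloud Run injects PORT; 8080 is its default
        gemini_api_key=api_key,
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
    )


# Passing a plain dict makes the function easy to test without touching os.environ.
settings = load_settings({"GEMINI_API_KEY": "test-key", "PORT": "9000"})
print(settings.port, settings.upload_dir)  # 9000 uploads
```

Failing fast when the key is missing surfaces a misconfigured deployment at startup rather than on the first user request.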
Testing & Troubleshooting
Resolved issues with API credentials, folder permissions, and container behavior during deployment.
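For folder-permission problems like the ones above, one defensive pattern is to probe the upload folder at startup and fall back to the system temporary directory if it cannot be written. This is a hypothetical helper (`writable_upload_dir`), not the project's actual fix. Note that on Cloud Run the container's filesystem is held in memory, so anything written there is lost when the instance shuts down.

```python
import tempfile
from pathlib import Path


def writable_upload_dir(preferred: str) -> Path:
    """Return preferred if it can be created and written to, else <system temp>/uploads."""
    candidates = [Path(preferred), Path(tempfile.gettempdir()) / "uploads"]
    for path in candidates:
        try:
            path.mkdir(parents=True, exist_ok=True)
            # Probe by actually writing a file (EAFP) instead of trusting a permission check.
            probe = path / ".write-test"
            probe.write_bytes(b"")
            probe.unlink()
            return path
        except OSError:
            continue  # try the next candidate
    raise RuntimeError("no writable upload directory found")


with tempfile.TemporaryDirectory() as tmp:
    normal = writable_upload_dir(str(Path(tmp) / "uploads"))
    normal_is_preferred = normal == Path(tmp) / "uploads"
    # A path nested under a regular file cannot be created, forcing the fallback.
    blocker = Path(tmp) / "not-a-dir"
    blocker.write_text("x")
    fallback = writable_upload_dir(str(blocker / "uploads"))

print(normal_is_preferred, fallback == Path(tempfile.gettempdir()) / "uploads")  # True True
```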
Final Touches
Finalized the UI styling, added a logs page, and documented every step for the presentation and screencast.
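A logs page needs somewhere to keep past captions. One simple option, assumed here rather than taken from the project, is an append-only JSON Lines file that the logs route reads newest-first and passes to a Jinja2 template. `append_caption_log`, `read_recent_captions`, and the record fields are hypothetical names chosen for this sketch.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def append_caption_log(log_file: Path, filename: str, labels: List[str], caption: str) -> None:
    """Append one caption record as a single JSON line."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "file": filename,
        "labels": labels,
        "caption": caption,
    }
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_recent_captions(log_file: Path, limit: int = 20) -> List[Dict]:
    """Return up to `limit` records, newest first; a missing log means no records."""
    if limit <= 0 or not log_file.exists():
        return []
    with log_file.open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return list(reversed(entries[-limit:]))


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "captions.jsonl"
    append_caption_log(log, "a.jpg", ["Dog", "Grass"], "A dog resting on the grass.")
    append_caption_log(log, "b.jpg", ["Cat", "Sofa"], "A cat curled up on a sofa.")
    recent = read_recent_captions(log, limit=1)

print(recent[0]["file"], len(recent))  # b.jpg 1
```

JSON Lines keeps each record on its own line, so appending never requires rewriting the file; a database would be the more durable choice if the log had to outlive the running instance.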
Results
The deployed application successfully accepts image uploads, uses Google Cloud Vision to detect objects, and generates meaningful, human-like captions with Gemini AI. It delivers fast, reliable results with a clean UI and is accessible via a public URL on Google Cloud Run.