From Image to Insight: FastAPI-Powered Captioning with Google Vision & Gemini

2025

Personal Project

imagetoinsight.com

Background

This project is an AI-powered web application that lets users upload an image and receive a descriptive caption based on the objects detected in it. The app sends the image to the Google Cloud Vision API, which returns labels for the detected objects; those labels are then passed to Gemini (Google's generative AI model), which writes a natural-language caption. The app is built with FastAPI, packaged as a Docker container, and deployed on Google Cloud Run.

The goal of this project is to demonstrate how combining multiple AI services (a vision API and an LLM) can produce intelligent, user-friendly output from minimal user input.

Tech Stack

Python: Primary programming language for building the backend logic.
FastAPI: Lightweight and fast web framework for building APIs.
Google Cloud Vision API: Extracts labels/objects from uploaded images.
Gemini AI (Generative AI by Google): Generates human-like image captions using labels.
Docker: Containerizes the app for consistent deployment and portability.
Google Cloud Run: Serverless platform used to deploy and host the Dockerized app.
HTML/CSS: For the frontend interface of the application.

Project Development Process

💡 Ideation & Planning

Designed an AI-powered web app that generates image captions using Google Cloud Vision and Gemini AI.

🧱 Environment Setup

Set up the FastAPI project with a virtual environment and the folders it needs: templates/, static/, and uploads/.
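
As a rough illustration of the folder layout, the sketch below creates the three directories at startup if they are missing. The helper name `ensure_project_dirs` and the idea of passing a base path are assumptions made for this example, not details taken from the project's code.

```python
from pathlib import Path

# Folder names come from the project description; the rest is illustrative.
PROJECT_DIRS = ("templates", "static", "uploads")


def ensure_project_dirs(base: Path) -> list[Path]:
    """Create the expected folders under `base` if missing and return their paths."""
    paths = []
    for name in PROJECT_DIRS:
        path = base / name
        # exist_ok=True makes the call safe to repeat on every startup.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_project_dirs(Path(__file__).parent)` early in the app module would be one way to guarantee that uploads/ exists before the first request arrives.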

⚙️ Backend Development

Built FastAPI routes to handle image uploads, run Google Vision label detection, and generate captions via Gemini.
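
The flow behind those routes can be sketched roughly as follows: detect labels, turn them into a prompt, and hand the prompt to a caption generator. This is a minimal sketch, not the project's actual code. The function names, the prompt wording, and the `max_labels` limit are all hypothetical, and the Gemini call is left as an injected `generate` callable because the source does not say which Gemini SDK or model is used.

```python
from typing import Callable, Sequence


def detect_labels(image_bytes: bytes, max_labels: int = 10) -> list[str]:
    """Return label descriptions from Cloud Vision label detection."""
    # Imported lazily so this module loads even without the Vision SDK installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.label_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")
    return [label.description for label in list(response.label_annotations)[:max_labels]]


def build_caption_prompt(labels: Sequence[str]) -> str:
    """Turn detected labels into a prompt (the wording here is an assumption)."""
    if not labels:
        raise ValueError("no labels detected; nothing to caption")
    return (
        "Write one natural, descriptive caption for an image that contains: "
        + ", ".join(labels)
        + "."
    )


def caption_image(
    image_bytes: bytes,
    generate: Callable[[str], str],
    detect: Callable[[bytes], Sequence[str]] = detect_labels,
) -> dict:
    """Run the pipeline: labels, then prompt, then caption."""
    labels = list(detect(image_bytes))
    caption = generate(build_caption_prompt(labels)).strip()
    return {"labels": labels, "caption": caption}
```

In a FastAPI route, `image_bytes` would typically come from `await file.read()` on an `UploadFile`, and `generate` would wrap whichever Gemini client call the app uses. Passing `detect` and `generate` as callables also keeps the pipeline testable without network access.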

🎨 Frontend Design

Developed user-facing HTML templates using Jinja2 for image upload, results display, and caption logs.

🧯 Error Handling & Logging

Implemented logging and graceful error responses to ensure smooth processing and debugging.
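
One way such handling could look is sketched below: the captioning step is wrapped so that any failure is logged with its traceback while the user receives a friendly message instead of a stack trace. The names `safe_caption` and `FALLBACK_MESSAGE`, the logger name, and the shape of the returned dictionary are assumptions for illustration only.

```python
import logging
from typing import Callable

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("captioning")

# Hypothetical user-facing message shown when captioning fails.
FALLBACK_MESSAGE = "Sorry, we could not generate a caption for this image."


def safe_caption(image_bytes: bytes, pipeline: Callable[[bytes], str]) -> dict:
    """Run `pipeline` and convert any failure into a user-friendly result."""
    try:
        caption = pipeline(image_bytes)
    except Exception:
        # logger.exception records the full traceback for later debugging.
        logger.exception("Captioning failed for a %d-byte upload", len(image_bytes))
        return {"ok": False, "caption": None, "error": FALLBACK_MESSAGE}
    logger.info("Generated caption for a %d-byte upload", len(image_bytes))
    return {"ok": True, "caption": caption, "error": None}
```

A route handler can then render either the caption or the error message from the returned dictionary, so API failures never surface as raw exceptions in the browser.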

🐳 Containerization with Docker

Wrote a Dockerfile and containerized the application for consistent deployment.

🚀 Deployment to Google Cloud

Pushed the container image to Artifact Registry and deployed it to Google Cloud Run, with configuration supplied through environment variables.
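
Cloud Run tells the container which port to listen on through the `PORT` environment variable (8080 by default), so the entrypoint needs to read it. The sketch below shows one way to do that. The `resolve_port` helper and the `"main:app"` import string are assumptions, since the project's module layout is not described.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8080) -> int:
    """Read the port from PORT, falling back to `default` if unset or invalid."""
    raw = env.get("PORT", "").strip()
    return int(raw) if raw.isdigit() else default


def main() -> None:
    # Imported lazily so resolve_port can be used without uvicorn installed.
    import uvicorn

    # "main:app" is a placeholder for wherever the FastAPI instance lives.
    # Binding to 0.0.0.0 makes the server reachable from outside the container.
    uvicorn.run("main:app", host="0.0.0.0", port=resolve_port())
```

The same function works locally, where `PORT` is usually unset and the server falls back to 8080.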

🔍 Testing & Troubleshooting

Resolved issues with API credentials, folder permissions, and container behavior during deployment.

🎁 Final Touches

Finalized the UI styling, added a logs page, and documented all steps for the presentation and screencast.

Results

The deployed application accepts image uploads, detects objects with Google Cloud Vision, and generates meaningful, human-like captions with Gemini. It has a clean UI and is accessible via a public URL on Google Cloud Run.
