Project Development Process

💡 Ideation & Planning

Designed an AI-powered web app that generates image captions using Google Cloud Vision and Gemini AI.

🧱 Environment Setup

Structured the project around FastAPI, with a virtual environment and the folders the app needs: templates/, static/, and uploads/.
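
The folder layout can be created at startup so a fresh checkout or a new container always has it. The following is a minimal sketch using only the standard library; the folder names come from the text, while the helper name `ensure_project_dirs` is hypothetical and not taken from the project.

```python
from pathlib import Path
from typing import List

# Folder names come from the text; everything else here is illustrative.
PROJECT_DIRS = ("templates", "static", "uploads")


def ensure_project_dirs(root: Path) -> List[Path]:
    """Create the expected folders under root if they are missing."""
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on every startup.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_project_dirs(Path.cwd())` before the app starts serving requests would guarantee that uploads/ exists before the first image arrives.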

⚙️ Backend Development

Built FastAPI routes that handle image uploads, run Google Cloud Vision label detection, and generate captions via Gemini.
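
The flow from uploaded image to caption can be sketched without any cloud SDK by passing the two external calls in as functions. This is an illustration under assumptions, not the project's actual code: `detect_labels` and `generate_text` are hypothetical stand-ins for the Vision and Gemini client calls, and the prompt wording is invented for the example.

```python
from typing import Callable, Dict, Sequence


def build_caption_prompt(labels: Sequence[str], max_labels: int = 5) -> str:
    """Turn detected labels into a text prompt for a captioning model."""
    # Drop blank labels and keep only the first few.
    top = [label.strip() for label in labels if label.strip()][:max_labels]
    if not top:
        return "Write a short, natural caption for an image."
    return (
        "Write a short, natural caption for an image that contains: "
        + ", ".join(top)
        + "."
    )


def caption_image(
    image_bytes: bytes,
    detect_labels: Callable[[bytes], Sequence[str]],  # stand-in for the Vision call
    generate_text: Callable[[str], str],  # stand-in for the Gemini call
) -> Dict[str, object]:
    """Run label detection, then caption generation, and return both."""
    labels = list(detect_labels(image_bytes))
    caption = generate_text(build_caption_prompt(labels)).strip()
    return {"labels": labels, "caption": caption}
```

Keeping the two service calls behind plain callables means a route handler only has to read the upload, call `caption_image`, and render the result, and the pipeline can be exercised in tests with fake functions.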

🎨 Frontend Design

Developed user-facing HTML templates using Jinja2 for image upload, results display, and caption logs.

🧯 Error Handling & Logging

Implemented logging and graceful error responses so that failed requests return a clear message to the user and leave a trace for debugging.
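
One way to combine the two ideas is a thin wrapper that logs the full traceback while returning only a generic message to the user. This is a sketch using the standard `logging` module; the logger name, the `safe_caption` helper, and the response shape are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("captioner")  # hypothetical logger name


def configure_logging(level: int = logging.INFO) -> None:
    """Send timestamped log records to the console."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def safe_caption(image_bytes, captioner) -> dict:
    """Call captioner and convert any failure into a friendly response."""
    if not image_bytes:
        logger.warning("Rejected empty upload")
        return {"ok": False, "error": "No image data was received."}
    try:
        return {"ok": True, "caption": captioner(image_bytes)}
    except Exception:
        # logger.exception records the traceback for debugging,
        # while the user only sees a generic message.
        logger.exception("Caption generation failed")
        return {
            "ok": False,
            "error": "Could not generate a caption. Please try again.",
        }
```

The internal error text never reaches the response, so details such as API failures stay in the logs rather than on the results page.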

🐳 Containerization with Docker

Wrote a Dockerfile and containerized the application for consistent deployment.

🚀 Deployment to Google Cloud

Pushed the container image to Artifact Registry and deployed it to Google Cloud Run, with configuration supplied through environment variables.
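
Reading configuration from environment variables might look like the sketch below. Cloud Run passes the listening port to the container in `PORT` (8080 by default); the other variable names, `GEMINI_API_KEY` and `UPLOAD_DIR`, are hypothetical and chosen only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    gemini_api_key: str
    upload_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from environment variables."""
    env = os.environ if env is None else env
    # Cloud Run sets PORT for the container; 8080 is its default.
    port = int(env.get("PORT", "8080"))
    # GEMINI_API_KEY and UPLOAD_DIR are hypothetical variable names.
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # Fail at startup rather than on the first caption request.
        raise RuntimeError("GEMINI_API_KEY is not set")
    return Settings(
        port=port,
        gemini_api_key=api_key,
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
    )
```

Accepting an optional mapping instead of reading `os.environ` directly keeps the function easy to test and makes a missing key show up at startup.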

🔍 Testing & Troubleshooting

Resolved issues with API credentials, folder permissions, and container behavior during deployment.
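
Credential and folder-permission problems of this kind can often be caught before the first request with a small startup check. The sketch below is an assumption about how such a check could look, not the procedure used in the project; the `preflight` helper is hypothetical. `GOOGLE_APPLICATION_CREDENTIALS` is the variable Google client libraries read to locate a service account key file.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(upload_dir: Path, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    # Only validate the key file when the variable is set: on Cloud Run the
    # service's attached identity can be used instead of a key file.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if cred_path and not Path(cred_path).is_file():
        problems.append(f"Credentials file not found: {cred_path}")

    # Prove the folder is writable by actually creating a temporary file in it.
    try:
        upload_dir.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=upload_dir):
            pass
    except OSError as exc:
        problems.append(f"Cannot write to {upload_dir}: {exc}")

    return problems
```

Logging the returned list at startup gives a single place to look when the container behaves differently from the local environment.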

🎁 Final Touches

Finalized UI styling, added a logs page, and documented all steps for the presentation and screencast.

Results

The deployed application accepts image uploads, uses Google Cloud Vision label detection to identify what each image contains, and generates meaningful, human-like captions with Gemini AI. It returns results quickly and reliably through a clean UI and is accessible via a public URL on Google Cloud Run.