Zero-Cost Grocery Automation: Comparing Swiggy and Blinkit with a Telegram Bot and Local LLM
Using MCP and Playwright browser automation
So here’s a thing that’s been bugging me.
Every week I order groceries. Swiggy Instamart or Blinkit or Zepto, whichever app I happen to open first. Add stuff, pay, done. Been doing this for over a year.
A few times, I checked the same list on both apps before hitting checkout and found that the grand totals differed by 5-20% (including bank / fintech payment coupons).
Everyone knows prices vary across apps. But actually comparing 15-20 items on two different platforms? Searching each one, noting prices, dealing with different pack sizes, and different brands? It’s too much effort.
But not doing it induces some FOMO. So, with AI tools now within reach, I decided to build something that does this for me without making the whole process tedious.
I want to share how I built it, not because the approach is particularly clever, but because the tools that made it possible are surprisingly accessible, and a few folks reached out to me to ask about the approach & tools.
What it actually does:
I send a grocery list to a Telegram bot from my phone: either a photo of my handwritten list (yes, I still write lists on paper, don’t judge) or just typed text. The bot reads it, goes to both Swiggy Instamart and Blinkit at the same time, searches for each item, picks the right product, adds everything to both carts, and sends me back a comparison: which platform is cheaper overall, and the fill rate for each.
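To make the comparison message concrete, here is a minimal sketch of how it could be assembled. The CartResult type, the compare function, and the definition of fill rate as "items added divided by items requested" are my assumptions for illustration, not the bot's actual code, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class CartResult:
    """Hypothetical per-platform summary produced after cart filling."""
    platform: str
    requested: int   # items on the grocery list
    added: int       # items actually found and added to the cart
    total: float     # cart total in rupees

    @property
    def fill_rate(self) -> float:
        # Assumed definition: share of requested items that made it into the cart.
        return self.added / self.requested if self.requested else 0.0


def compare(a: CartResult, b: CartResult) -> str:
    """Build the text summary the bot could send back."""
    cheaper = min((a, b), key=lambda r: r.total)
    lines = [
        f"{r.platform}: Rs {r.total:.2f}, fill rate {r.fill_rate:.0%}"
        for r in (a, b)
    ]
    lines.append(f"Cheaper overall: {cheaper.platform}")
    return "\n".join(lines)
```

For example, compare(CartResult("Swiggy Instamart", 15, 15, 1240.0), CartResult("Blinkit", 15, 12, 1180.5)) reports Blinkit as cheaper, but at an 80% fill rate. A lower total with a lower fill rate is not a like-for-like win, which is why both numbers are shown together.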
Before I wrote any code (well, before my AI agent did), I decided on one constraint: zero recurring costs. No servers. No paid APIs. No subscriptions. Nothing.
Because if we are being cheap with grocery expenses, then why not with the AI setup :)
This constraint ended up being the most interesting part. Every decision had to pass through one filter: can it run locally, for free, without depending on someone else’s infrastructure?
Why I picked each tool:
First, the architecture in summary
TELEGRAM
I needed to send the bot lists from my phone. Considered WhatsApp (I use it way more) as well. Went with Telegram because the bot API is free and lets you just poll, so no public server or webhook is needed. The bot sits on my laptop asking “got anything new for me?”.
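The polling loop can be sketched with nothing but the standard library. This is a minimal illustration, not the bot's actual code: getUpdates and its offset and timeout parameters are part of the Telegram Bot API, while the helper names and the injectable fetch argument are my own.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_get_updates_url(token, offset=None, timeout=30):
    """URL for one long-poll call; `timeout` is how long Telegram holds the request open."""
    params = {"timeout": timeout}
    if offset is not None:
        params["offset"] = offset
    return f"{API_ROOT}/bot{token}/getUpdates?{urllib.parse.urlencode(params)}"


def fetch_json(url, timeout=40):
    """Default fetcher: plain HTTPS GET, decoded as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_once(token, offset=None, fetch=fetch_json):
    """Ask Telegram for new updates and return (updates, next_offset)."""
    payload = fetch(build_get_updates_url(token, offset))
    updates = payload.get("result", []) if payload.get("ok") else []
    if updates:
        # Passing last update_id + 1 marks everything up to it as handled.
        offset = updates[-1]["update_id"] + 1
    return updates, offset
```

In a real loop you would call poll_once repeatedly, carry the returned offset into the next call, and hand each update's message to the list parser.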
OLLAMA + LLAMA 3.2 VISION (for reading handwritten lists)
When I send a photo of my grocery list, something needs to read it. Google Vision API would nail this in two seconds. But it costs per request.
Ollama lets you run open-source language models on your own machine. There’s a vision model: llama3.2-vision, 11 billion parameters, which can read handwritten text from photos.
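As a rough sketch of what the OCR request looks like: Ollama serves a local HTTP API (by default on port 11434), and its /api/generate endpoint accepts base64-encoded images alongside the prompt. The prompt wording and the parse_items helper below are my assumptions, not the bot's actual code.

```python
import base64
import re

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt; the real one would need tuning against actual handwriting.
OCR_PROMPT = "Read this handwritten grocery list. Output one item per line, with quantity."


def build_ocr_request(image_bytes, model="llama3.2-vision:11b"):
    """JSON body to POST to OLLAMA_URL for one photo."""
    return {
        "model": model,
        "prompt": OCR_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": True,  # stream tokens so partial output survives a stall
    }


def parse_items(text):
    """Turn the model's free-text answer into a clean list of items."""
    items = []
    for line in text.splitlines():
        # Drop leading bullets ("-", "*") or numbering ("1.", "2)").
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            items.append(cleaned)
    return items
```

Streaming matters here because a partial answer is still useful if the model stops responding partway through the list.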
It takes two minutes to process one photo. Sometimes it stalls midway. I added a 90-second hang detector that kills the process and tries to parse the partial output in that scenario.
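One way to build a hang detector like the one described above is to watch for silence rather than total runtime. The sketch below is an assumption about the approach, not the bot's actual code: it reads streamed chunks on a background thread and gives up if no new chunk arrives within stall_seconds, returning whatever text was collected so far. Actually killing the stuck model process is left out.

```python
import queue
import threading

_DONE = object()  # sentinel marking a clean end of stream


def read_with_stall_timeout(chunks, stall_seconds=90.0):
    """Collect text from an iterable of chunks.

    Returns (text, stalled). `stalled` is True if the stream went quiet
    for longer than `stall_seconds` before finishing.
    """
    q = queue.Queue()

    def pump():
        try:
            for chunk in chunks:
                q.put(chunk)
        finally:
            q.put(_DONE)

    # Daemon thread: a permanently hung stream will not block program exit.
    threading.Thread(target=pump, daemon=True).start()

    parts = []
    while True:
        try:
            item = q.get(timeout=stall_seconds)
        except queue.Empty:
            return "".join(parts), True   # stalled: hand back the partial output
        if item is _DONE:
            return "".join(parts), False  # finished normally
        parts.append(item)
```

The partial text can then go through the same item parser as a complete response; a list cut off at item 12 of 15 still saves most of the typing.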
PYTHON
The whole thing is one Python script. A file that polls Telegram, processes messages, and talks to the grocery platforms. It is mostly single-threaded, with two threads used only for parallel cart filling.
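The parallel cart filling step can be sketched with a small thread pool, one worker per platform. The function and parameter names here are hypothetical; each filler stands in for whatever code talks to one platform.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_carts_in_parallel(items, fillers):
    """Run each platform's cart filler on its own thread.

    `fillers` maps a platform name to a callable that takes the item list
    and returns that platform's result. Returns {platform: result}.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(fillers))) as pool:
        futures = {name: pool.submit(fn, items) for name, fn in fillers.items()}
        # result() blocks until that platform is done and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Threads fit this job because both fillers spend nearly all their time waiting on the network or a browser, so the total wait is roughly that of the slower platform rather than the sum of both.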
MCP (MODEL CONTEXT PROTOCOL)
Thankfully, I discovered that Swiggy has an MCP server. You hit an endpoint, it tells you what tools are available (search products, manage cart, checkout), and you call them with structured requests.
Blinkit has one too. Different approach though: Swiggy’s is a clean HTTP API (fast, about 20-25 seconds for ~15 items), while Blinkit’s spawns a headless Firefox browser under the hood and literally browses the website for you (slower, about 2 minutes, but it works).
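Under the hood, MCP messages are JSON-RPC 2.0: "tools/list" asks a server what it offers, and "tools/call" invokes a tool by name with structured arguments. Here is a minimal sketch of those two request bodies. The tool name search_products and its query argument are hypothetical; the real names come from each server's tools/list response, and a real session also starts with an initialize handshake that is omitted here.

```python
import itertools

_request_ids = itertools.count(1)  # JSON-RPC requests need unique ids


def list_tools_request():
    """Body of the request that asks an MCP server for its available tools."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list"}


def call_tool_request(name, arguments):
    """Body of the request that invokes one tool with structured arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical usage: the tool and argument names are illustrative only.
search = call_tool_request("search_products", {"query": "Toor Dal 500g"})
```

The same message shapes apply to both servers; what differs is what each server does after receiving the call.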
OAUTH (Swiggy login)
First time you run it, a browser pops up, you log into Swiggy, and the bot saves an auth token. Stores it locally, refreshes it when it expires.
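The login uses OAuth 2.0 with PKCE (it is in the stack list further down), so here is a sketch of the two pieces that are easy to get wrong: generating the PKCE verifier and challenge as defined in RFC 7636, and deciding when a saved token needs refreshing. The expires_at field and the 60-second safety margin are my assumptions about how a token might be stored locally, not Swiggy's actual format.

```python
import base64
import hashlib
import secrets
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier() -> str:
    """Random secret kept by the client; 32 bytes gives a 43-character string."""
    return _b64url(secrets.token_bytes(32))


def code_challenge_s256(verifier: str) -> str:
    """Challenge sent with the login request: BASE64URL(SHA256(verifier))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def needs_refresh(token: dict, now=None, skew_seconds=60) -> bool:
    """True if the saved token is expired or about to expire.

    Assumes the token was saved with an absolute `expires_at` Unix timestamp.
    """
    now = time.time() if now is None else now
    return now >= token["expires_at"] - skew_seconds
```

The verifier never leaves the laptop until the final token exchange, which is what lets a local script log in without holding a client secret.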
What didn’t work well and learnings:
Product matching was hard. You search “Toor Dal 500g” and get back twenty results. Half are 1kg bags. Some are the organic variant. Some are “imported” (triple the price). There’s a “diced” variant that nobody asked for.
I kept getting issues during testing. The bot would add a 5kg bag of rice when I wanted 1kg. It’d pick high-protein eggs at ₹300 instead of regular ones at ₹85.
So I built up a set of rules:
Reject anything whose weight differs from the requested weight by more than 30%. If I ask for 500g, don’t give me 1kg.
No imported, organic, dried, or diced variants unless I specifically say so. This single rule fixed most of the bad matches.
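The two rules above can be sketched as a single filter applied to every search result. This is a simplified illustration, not the bot's actual code: the weight parser only understands "g" and "kg", and the function names are mine.

```python
import re

EXCLUDED_WORDS = ("imported", "organic", "dried", "diced")
_WEIGHT = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", re.IGNORECASE)


def parse_grams(text):
    """Return the first weight found in the text, in grams, or None."""
    match = _WEIGHT.search(text)
    if not match:
        return None
    value = float(match.group(1))
    return value * 1000 if match.group(2).lower() == "kg" else value


def is_acceptable(query, product_name, tolerance=0.30):
    """Apply both rules: variant words first, then the weight tolerance."""
    q, name = query.lower(), product_name.lower()
    # Rule 2: skip special variants unless the query asked for them.
    for word in EXCLUDED_WORDS:
        if word in name and word not in q:
            return False
    # Rule 1: reject weights more than `tolerance` away from the request.
    wanted, offered = parse_grams(query), parse_grams(product_name)
    if wanted and offered and abs(offered - wanted) / wanted > tolerance:
        return False
    return True
```

With a query of “Toor Dal 500g”, a 500 g pack passes, a 1 kg pack fails the weight check (it is 100% over), and an organic 500 g pack fails the variant check. When either weight cannot be parsed, this sketch lets the product through rather than guessing.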
Next, for faster OCR, I tried a tiny vision model (Moondream). But it couldn’t read long lists of 15-20 items accurately.
Next, I really wanted to add Zepto as a third platform. No MCP server, no usable API, no reasonable way in. Had to let it go.
Meta learnings
Running LLMs locally is doable but humbling. On paper, “local LLM for OCR” sounds cool. In practice, it means watching your laptop’s fan spin up and waiting two minutes for every response. It works, but it is painful to develop against and requires patience.
Not everything needs to scale. This bot serves one household. Mine. It has no users page, no analytics dashboard, no Kubernetes cluster. It’s a Python script on a laptop that saves me time. Perhaps this is the future of apps - fully personalized agents that are non-scalable.
A user interface won’t be required for most such tasks in the near future. You will just delegate them to your personal agent.
The stack, if you’re curious
→ Python 3.13
→ Ollama with llama3.2-vision:11b (local OCR)
→ Playwright (Blinkit’s browser automation)
→ Telegram Bot API
→ MCP (Model Context Protocol)
→ OAuth 2.0 PKCE (Swiggy auth)
If any of this is interesting to you, or if you’ve built something similar, I’d genuinely love to hear about it. The whole reason I’m writing this up is that the pieces are all freely available: you just have to know they exist and be stubborn enough to wire them together.


