Why?
Recently, AI has been popping up everywhere, and seemingly every product is trying to integrate it: Google Search AI Overviews, Amazon product review summaries, and so on. So naturally, we have been thinking about ways to integrate LLMs into Culpa as well.
The most natural idea to us was providing professor summaries: some professors on Culpa have well over 100 student-written reviews. While this depth is great, it can also be overwhelming—especially during course registration, when students are comparing multiple professors on a tight deadline.
We wanted to help users get the overall picture faster, without having to read every single review.

Design Choices
When we thought about how to implement this, there were a few points to consider:
- Persistence vs Real-Time Generation: unlike most AI features (such as Google Search AI Overviews), Culpa has a limited number of professors, not an infinite space of search queries. This means we can generate overviews once and store them in our database, instead of generating them in real time on every visit to a professor's page. This matters because it avoids making an API call on every page load, which would be expensive and slow. If a professor's reviews change, we simply regenerate the overview once and store it again; since reviews change fairly infrequently, this is a sensible approach.
- Fine-Tuned vs General-Purpose Model: for more specialized applications, fine-tuning a model might be necessary, but for our use it doesn't make sense: (1) our use case is very general, so a general-purpose model is completely sufficient; (2) even if we wanted to fine-tune a model, we don't have enough data to do so. Fine-tuning would also seriously complicate our system and add the burden of standing up infrastructure to run fine-tuning jobs. So the choice here was very obvious.
- Inference Infrastructure: for us, using a third-party API was the best option. Our system is neither complex enough nor anywhere near high-traffic enough to justify running our own inference infrastructure.
- Model: we chose Gemini 2.0 Flash through the Gemini API, primarily because it has a very generous free tier (~10 requests per minute), which allowed us to build this system without incurring any cost. We also liked that it made it easy to control output length and formatting, and Flash was more than sufficient for our needs (no need for a more powerful reasoning model).
For the prompt, we wrote a clearly formatted prompt specifying the task, then fed all of a professor's reviews in as context. We experimented with a few variations until we found one that produced results we were happy with.
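As a rough illustration, the core of the generation step might look like the sketch below. It assumes the official `@google/generative-ai` SDK; the prompt wording, the `Review` shape, and the `generateOverview` helper are all hypothetical, not our production code:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Hypothetical shape of a review pulled from our database.
interface Review {
  text: string;
}

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });

// Build a task description, then append every review as context.
// The wording here is illustrative, not our actual prompt.
async function generateOverview(
  professorName: string,
  reviews: Review[]
): Promise<string> {
  const prompt = [
    `Summarize the following student reviews of Professor ${professorName}.`,
    `Write a short, neutral overview (2-3 sentences) of what students generally say.`,
    ``,
    ...reviews.map((r, i) => `Review ${i + 1}: ${r.text}`),
  ].join("\n");

  const result = await model.generateContent(prompt);
  return result.response.text();
}
```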
Queuing Requests to Stay Under Free Tier Limits
To run our system at no cost, we needed to ensure we never exceeded the free tier limits of the Gemini API.
Because we regenerate overviews whenever a professor's reviews change, the number of requests we make to the API varies over time. We therefore needed a way to queue requests and feed them to the API slowly enough to stay under the rate limit.
We opted for a simple queueing system backed by a database table. A more complex queueing system (like SQS) would be another option, but since our case is very simple, the extra complexity was not worth it.
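Concretely, the queue can be a single table. The sketch below assumes Postgres via the `pg` client; the table and column names (`overview_queue`, `professor_id`, `enqueued_at`) are hypothetical:

```typescript
import { Pool } from "pg";

const db = new Pool(); // connection settings come from environment variables

// One table is enough to act as the queue (schema is illustrative):
//
//   CREATE TABLE overview_queue (
//     professor_id INTEGER PRIMARY KEY,
//     enqueued_at  TIMESTAMPTZ NOT NULL DEFAULT now()
//   );
//
// Using professor_id as the primary key also deduplicates work: if a
// professor's reviews change twice before the worker runs, we still
// only generate one overview.
async function enqueueOverview(professorId: number): Promise<void> {
  await db.query(
    `INSERT INTO overview_queue (professor_id)
     VALUES ($1)
     ON CONFLICT (professor_id) DO NOTHING`,
    [professorId]
  );
}
```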
When a professor’s reviews are created or updated, we enqueue a request to generate an overview. A background TypeScript app checks the queue every few minutes and makes API calls if necessary. The app also has the logic to space out requests according to the rate limit.
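Here is a minimal sketch of that worker loop, reusing the hypothetical `db`, `overview_queue`, and `generateOverview` from the snippets above; the `professors` and `reviews` tables and the polling/spacing constants are likewise illustrative:

```typescript
const POLL_INTERVAL_MS = 5 * 60 * 1000; // check the queue every five minutes
const REQUEST_SPACING_MS = 7_000; // one call every 7s stays under ~10 req/min

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Drain the queue one item at a time, sleeping between Gemini calls so we
// never exceed the free-tier rate limit.
async function drainQueue(): Promise<void> {
  const { rows } = await db.query(
    `SELECT q.professor_id, p.name
       FROM overview_queue q
       JOIN professors p ON p.id = q.professor_id
      ORDER BY q.enqueued_at`
  );

  for (const { professor_id, name } of rows) {
    const { rows: reviews } = await db.query(
      `SELECT text FROM reviews WHERE professor_id = $1`,
      [professor_id]
    );
    const overview = await generateOverview(name, reviews);
    await db.query(`UPDATE professors SET ai_overview = $1 WHERE id = $2`, [
      overview,
      professor_id,
    ]);
    await db.query(`DELETE FROM overview_queue WHERE professor_id = $1`, [
      professor_id,
    ]);
    await sleep(REQUEST_SPACING_MS);
  }
}

// Run forever: drain whatever is queued, then wait for the next poll.
async function main(): Promise<void> {
  while (true) {
    await drainQueue();
    await sleep(POLL_INTERVAL_MS);
  }
}

main();
```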
This background app is very lightweight, and just runs on the same EC2 instance as our backend. Since we pay no extra infrastructure costs and stay within the free tier of the Gemini API, the total end cost of the AI Overview system is $0, which is pretty nice.
Conclusion
In the end, this feature was surprisingly simple to implement end-to-end:
- No additional infra beyond a background worker
- No runtime API cost thanks to Gemini’s free tier
- No extra complexity for page loads: summaries are just static DB content
Yet despite the simplicity, we believe the feature adds real value during course registration, and we hope students will find it useful!