InsightRed

Project’s GitHub Repo

About

InsightRed is an LLM-powered tool that extracts the latest comments from “Hot” posts in the Subreddits you choose and pinpoints users who may be interested in your project or product. It is a Reddit marketing tool meant to help you find the first users for your product or project. This project was built for the ANARCHY October 2023 Hackathon.

InsightRed’s Components

🧩 Collector

The Collector uses Reddit’s API to collect the latest posts, along with each post’s comments, from the given Subreddits. It then saves the collected data to a local SQLite database. The Python package praw simplifies working with the Reddit API, and SQLAlchemy handles the CRUD operations on the local SQLite database.
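The sketch below illustrates the Collector’s two steps: pulling “Hot” posts and their comments with praw, then persisting them without duplicates. It is a sketch under assumptions, not the project’s code. The standard-library `sqlite3` module stands in for SQLAlchemy so the example runs on its own, and the table layout, the `vectorized` flag, and the helper names (`fetch_hot_posts`, `save_posts`, `init_db`) are hypothetical. `fetch_hot_posts` expects an already-authenticated `praw.Reddit` instance.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema: the project's real SQLAlchemy models are not shown here.
# The `vectorized` flag lets a later step find comments not yet uploaded.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id        TEXT PRIMARY KEY,
    subreddit TEXT NOT NULL,
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS comments (
    id         TEXT PRIMARY KEY,
    post_id    TEXT NOT NULL REFERENCES posts(id),
    author     TEXT,
    body       TEXT NOT NULL,
    vectorized INTEGER NOT NULL DEFAULT 0
);
"""


@dataclass
class Comment:
    id: str
    author: Optional[str]  # None when the account was deleted
    body: str


@dataclass
class Post:
    id: str
    subreddit: str
    title: str
    body: str
    comments: List[Comment] = field(default_factory=list)


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def fetch_hot_posts(reddit, subreddit_name: str, limit: int = 10) -> List[Post]:
    """Collect "Hot" posts and their comments using an authenticated praw.Reddit."""
    posts = []
    for submission in reddit.subreddit(subreddit_name).hot(limit=limit):
        # limit=0 drops the "load more comments" stubs without fetching them,
        # so .list() yields only real Comment objects.
        submission.comments.replace_more(limit=0)
        comments = [
            Comment(c.id, c.author.name if c.author else None, c.body)
            for c in submission.comments.list()
        ]
        posts.append(Post(submission.id, subreddit_name, submission.title,
                          submission.selftext, comments))
    return posts


def save_posts(conn: sqlite3.Connection, posts: List[Post]) -> int:
    """Store posts and comments, skipping rows already saved; return new comment count."""
    new_comments = 0
    with conn:  # commit on success, roll back on error
        for p in posts:
            conn.execute(
                "INSERT OR IGNORE INTO posts (id, subreddit, title, body) VALUES (?, ?, ?, ?)",
                (p.id, p.subreddit, p.title, p.body),
            )
            for c in p.comments:
                cur = conn.execute(
                    "INSERT OR IGNORE INTO comments (id, post_id, author, body) VALUES (?, ?, ?, ?)",
                    (c.id, p.id, c.author, c.body),
                )
                new_comments += cur.rowcount  # 0 when the comment already existed
    return new_comments
```

Using Reddit’s own post and comment IDs as primary keys makes repeated collection runs idempotent: re-fetching the same “Hot” listing only adds comments that are new since the last run.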

🧩 Vectorizer

The Vectorizer checks the local SQLite database for comments that have not yet been saved to the vector database. For each of these comments, it creates an embedding of the post plus the comment using OpenAI’s “text-embedding-ada-002” model. This embedding is what gets indexed in the vector database, and some metadata, in the form of a JSON object, is created alongside it. The embedding and its metadata are then uploaded to the vector database, which in this case is Pinecone (cloud-based). After the upload, the local SQLite database is updated so the same data is not uploaded to Pinecone again. All of this is done with Pinecone’s Python client (pinecone-client) for CRUD operations on the vector database and LangChain for handling the embedding process.
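The Vectorizer’s loop (find pending comments, embed post plus comment, upload with metadata, then flag locally) can be sketched as follows. This is an illustration under assumptions, not the project’s implementation: the schema and the function name `vectorize_pending` are hypothetical, and the embedding and upload steps are passed in as plain functions so the sketch runs without OpenAI or Pinecone credentials.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical minimal schema mirroring the Collector's tables; the project's
# actual SQLAlchemy models may use different names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id TEXT PRIMARY KEY, subreddit TEXT, title TEXT, body TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id TEXT PRIMARY KEY, post_id TEXT, author TEXT, body TEXT,
    vectorized INTEGER NOT NULL DEFAULT 0
);
"""

EmbedFn = Callable[[List[str]], List[List[float]]]  # texts -> one vector each
Vector = Tuple[str, List[float], dict]              # (id, values, metadata)
UpsertFn = Callable[[List[Vector]], None]


def vectorize_pending(conn: sqlite3.Connection, embed: EmbedFn,
                      upsert: UpsertFn, batch_size: int = 100) -> int:
    """Embed each comment not yet uploaded, upsert it, then flag it locally."""
    rows = conn.execute(
        """SELECT c.id, c.author, c.body, p.subreddit, p.title, p.body
           FROM comments AS c JOIN posts AS p ON p.id = c.post_id
           WHERE c.vectorized = 0
           ORDER BY c.id"""
    ).fetchall()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Embed post and comment together so each vector carries the thread's context.
        texts = [f"{title}\n{post_body}\n\n{body}"
                 for _, _, body, _, title, post_body in batch]
        vectors = embed(texts)
        payload = [
            (cid, vec, {
                "author": author or "",  # Pinecone metadata values cannot be null
                "comment": body,
                "subreddit": subreddit,
                "post_title": title,
            })
            for (cid, author, body, subreddit, title, _), vec in zip(batch, vectors)
        ]
        upsert(payload)
        # Flag rows only after the upload succeeded, so a failed batch is retried next run.
        with conn:
            conn.executemany("UPDATE comments SET vectorized = 1 WHERE id = ?",
                             [(row[0],) for row in batch])
    return len(rows)
```

In the real tool, `embed` would plausibly be LangChain’s `OpenAIEmbeddings(model="text-embedding-ada-002").embed_documents` and `upsert` a small wrapper around a Pinecone index’s `upsert(vectors=...)`; that wiring is an assumption, since the repository’s exact calls are not reproduced here. Marking rows only after a successful upload is what prevents both re-uploads and silently skipped comments.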

🧩 Interface

The Interface is what the user interacts with; in this case, it is a CLI. The Interface implements Retrieval-Augmented Generation (RAG): the user provides a description of their product, a list of Subreddits to check, and some filters. Given this context, the Collector runs first, followed by the Vectorizer. Once both services finish processing, the product description is used to run a similarity search against the vector database. The top results and the product description are then fed into a prompt template, which produces the final prompt. That prompt is sent to OpenAI’s GPT-4 model, and the results are presented to the user. The results are a list of the Reddit comments that strongly suggest the commenting user(s) would be interested in the product, based on its description. This component builds on the Collector and Vectorizer components and uses Anarchy’s LLM-VM to handle querying OpenAI’s GPT-4 model.
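The retrieval half of the RAG flow (embed the product description, find the most similar comments, fill a prompt template) can be sketched as below. This is a hedged illustration: an in-memory cosine-similarity ranking stands in for the Pinecone query, and the template wording and function names (`top_k`, `build_prompt`, `rag_prompt`) are hypothetical. The final GPT-4 call through LLM-VM is deliberately left out.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Match = Tuple[str, float, dict]  # (comment id, similarity score, metadata)

# Hypothetical prompt wording; the project's actual template is not reproduced here.
PROMPT_TEMPLATE = """You help a founder find early users on Reddit.

Product description:
{product}

Candidate Reddit comments:
{comments}

List the IDs of the comments whose authors would likely be interested in this
product, with a one-sentence reason for each. Answer NONE if no comment fits."""


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], store: Dict[str, Tuple[List[float], dict]],
          k: int = 5) -> List[Match]:
    """Rank stored vectors against the query (in-memory stand-in for a Pinecone query)."""
    scored = [(cid, cosine(query, vec), meta) for cid, (vec, meta) in store.items()]
    scored.sort(key=lambda m: m[1], reverse=True)
    return scored[:k]


def build_prompt(product: str, matches: List[Match]) -> str:
    """Fill the template with the product description and the retrieved comments."""
    lines = [f"[{cid}] u/{meta['author']}: {meta['comment']}" for cid, _, meta in matches]
    return PROMPT_TEMPLATE.format(product=product, comments="\n".join(lines))


def rag_prompt(product: str, embed_query: Callable[[str], List[float]],
               store: Dict[str, Tuple[List[float], dict]], k: int = 5) -> str:
    """Retrieval step: embed the description, fetch the top matches, build the prompt."""
    return build_prompt(product, top_k(embed_query(product), store, k))
```

With Pinecone, `top_k` would be replaced by the index’s `query(vector=..., top_k=..., include_metadata=True)` call, and the string returned by `rag_prompt` would be what gets sent to GPT-4. Retrieving only the top matches keeps the prompt short enough for the model’s context window while still giving it the most relevant comments.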

Team Members

Notable Outside Credit

  • casta (Hacker News)
    • Providing the inspiration for this project through their HN post. Since their solution was not open-source, I was motivated to create an open-source version (this project).
  • ChatGPT (GPT-4)
    • Helping with development and speeding it up, which matters because I work full time.
    • Generating the project’s logo and YouTube thumbnail through its DALL-E 3 model.
  • James Briggs (YouTuber)
    • Explaining how to use the Reddit API.
    • Explaining and showing how to implement basic RAG in Python.

Sources