Claude Code transcript - page 7/9

This Firecrawl run succeeded but failed to log properly in the DB, likely because I thought the run had failed and commuted home, disconnecting from wifi. Run logs: Starting extract for query: "This is a G2 review page. Extract all customer/user reviews from https://www.g2.com/products/notion/reviews. For each review, extract: the full review text, the star ratin...... www.g2.com

Running code...

Running code... www.g2.com

Research completed.

Running code... Extracted: { "reviews": [ { "text": "Visually Appealing, Flexible, and Great for Team Documentation...", "rating": 5, "date": "2026-03-20", "author": "Alejandra H. (AH)" }, { "text": "Customizable Powerhouse with a Learning Curve...", "rating": 5, "date": "2026-03-13", "author": "Matthew I. (MI)" }, { "text": "Notion’s Flexibility and Customization Keep Everything Organized in One Place...", "rating": 5, "date": "2026-03-12", "author": "Ildiko M. (IM)" }, { "text": "A Powerful Knowledge Management and Organization Tool...", "rating": 4, "date": "2026-03-11", "author": "Alan R. (AR)" }, { "text": "The ultimate internal knowledge base and operations hub...", "rating": 4.5, "date": "2026-03-09", "author": "Afzal M. (AM)" }, { "text": "Centralized Visibility & Smarter Workflow Management...", "rating": 5, "date": "2026-03-05", "author": "Natasha B. (NB)" }, { "text": "Well‑Structured Information That’s Easy to Navigate...", "rating": 5, "date": "2026-03-17", "author": "Alejandra M. (AM)" }, { "text": "Versatile and Intuitive Tool for Solopreneurs...", "rating": 5, "date": "2026-03-05", "author": "Josh S. (JS)" }, { "text": "The Swiss Army Knife of Productivity in 2026...", "rating": 4.5, "date": "2026-03-03", "author": "Maxwell K. (MK)" }, { "text": "Integration, adaptability and efficiency with Notion's virtual work systems...", "rating": 5, "date": "2026-03-12", "author": "Lindsay R. (LR)" }, { "text": "Notion’s Flexible, Clean Workspace for Organizing Everything in One Place...", "rating": 5, "date": "2026-03-13", "author": "Anbuselvam S." }, { "text": "Notion: Powerful Yet Straightforward Note-Taking...", "rating": 5, "date": "2026-02-26", "author": "Levi N." }, { "text": "Highly Customizable Tool That Keeps Me Organized and Consistent...", "rating": 5, "date": "2026-02-23", "author": "Miguel S." }, { "text": "Valuable Visibility, Challenging Learning Curve...", "rating": 2, "date": "2026-02-13", "author": "Colby F." 
}, { "text": "Intuitive Interface and Superior Organization...", "rating": 5, "date": "2025-11-06", "author": "fabiele s." }, { "text": "Custom Database Views Make Notion a Powerful Hub for R&D Projects...", "rating": 4.5, "date": "2026-02-17", "author": "Kwesi R." }, { "text": "Time-Saving Templates, Powerful Publish-to-Web, and Truly Helpful AI...", "rating": 4.5, "date": "2026-02-17", "author": "David Serna M." }, { "text": "Intuitive, Collaborative, and Highly Customizable...", "rating": 5, "date": "2026-02-13", "author": "Andrew W." }, { "text": "Essential Knowledge Bank with Easy Setup...", "rating": 4, "date": "2026-02-13", "author": "Neelkanth L." }, { "text": "Effortless Team Coordination with Intuitive UI...", "rating": 5, "date": "2026-03-19", "author": "Sam C." }, { "text": "Versatile Workspace with Seamless Customization...", "rating": 5, "date": "2026-02-12", "author": "Samuel S." }, { "text": "Flexible and Easy to Use, but Organization Needs a Better Folder Structure...", "rating": 3.5, "date": "2026-02-16", "author": "Gavynn B." }, { "text": "Notion’s Flexible All‑in‑one Workspace with Powerful Databases...", "rating": 4.5, "date": "2026-02-15", "author": "arghya s." }, { "text": "Great content manager; terrible project management tool...", "rating": 3, "date": "2026-02-13", "author": "Allison C." }, { "text": "Notion 3.0 is a game change on top of a great product...", "rating": 5, "date": "2026-02-13", "author": "Abbie V." }, { "text": "All‑in‑one Organization Made Easy...", "rating": 5, "date": "2026-02-13", "author": "JASMINA D." }, { "text": "Innovative, Fast, and Indispensable for Management...", "rating": 5, "date": "2026-02-12", "author": "Amilker C." }, { "text": "Notion Call Summaries and Action Items Make Note‑taking Effortless...", "rating": 4, "date": "2026-02-13", "author": "Rita F." }, { "text": "Notion as Our All‑in‑one Workspace for Efficient Team Collaboration...", "rating": 5, "date": "2026-02-13", "author": "Alessandro T." 
}, { "text": "The Ultimate Hub for Team Alignment...", "rating": 5, "date": "2025-05-20", "author": "Jennifer R." }, { "text": "Flexible and Powerful: Notion AI Chat Finds Solutions Fast...", "rating": 4, "date": "2026-02-12", "author": "Mauro B." }, { "text": "Great experience and makes everything streamlined for me...", "rating": 4.5, "date": "2026-02-06", "author": "Lucia J." }, { "text": "Notion Just Works: One Place for Notes, Ideas, and Everything Else...", "rating": 5, "date": "2026-02-06", "author": "Chris M." }, { "text": "User‑friendly One‑stop Shop for Docs, Tasks, Projects, and Collaboration...", "rating": 5, "date": "2026-03-12", "author": "Liliya A." }, { "text": "Flexible All‑in‑one Workspace That Keeps Everything Centralized...", "rating": 4, "date": "2026-02-04", "author": "Ameer A." }, { "text": "Notion is the backbone of how I think, plan, and work...", "rating": 5, "date": "2023-04-04", "author": "Scott B." }, { "text": "Notion AI Delivers Instant, Cross‑platform Insights for CSMs...", "rating": 4.5, "date": "2026-03-05", "author": "Christopher M." }, { "text": "Notion as a Centralized Source of Truth for Calendars, Leads, and AI Meeting Notes...", "rating": 5, "date": "2026-02-02", "author": "Katrina P." }, { "text": "From Podcast Scripts to Gear Lists, Notion Is My All‑in‑one Production Hub...", "rating": 4.5, "date": "2026-02-02", "author": "Brent H." }, { "text": "Notion Review...", "rating": 4.5, "date": "2026-02-02", "author": "Vibhor J." }, { "text": "A Workspace That Grows at the Same Pace as Your Ideas...", "rating": 5, "date": "2026-02-15", "author": "Priyanka T." }, { "text": "Notion Keeps My Student Life Organized with Flexible, Customizable Workspaces...", "rating": 4.5, "date": "2026-01-30", "author": "Rohit R." }, { "text": "Adaptable, Notebook‑like Tool with Powerful Folders, Subfolders, and Calendar Mode...", "rating": 5, "date": "2026-01-30", "author": "gabriela c." 
}, { "text": "A comprehensive workspace that redefines how i organize everything digital...", "rating": 4.5, "date": "2026-01-29", "author": "Luca P." }, { "text": "Notion Replaced Evernote for Less—Plus Many More AI‑powered Extras...", "rating": 5, "date": "2026-01-29", "author": "Markkus R." }, { "text": "The ultimate ‘External Brain’ for juggling work, school, and a side hustle...", "rating": 4.5, "date": "2026-01-29", "author": "Erin O." }, { "text": "Easy Model Discovery, But Too Many Note‑finding Options Can Confuse New Users...", "rating": 3.5, "date": "2026-02-12", "author": "Denise G." }, { "text": "Notion - productivity and efficiency combined...", "rating": 5, "date": "2025-05-19", "author": "Akshaya A." }, { "text": "Flexible and Organized, Needs Better AI...", "rating": 4, "date": "2026-03-05", "author": "Ali A." }, { "text": "Notion has allowed me to gather almost everything in one place...", "rating": 5, "date": "2026-01-28", "author": "Louis S." }, { "text": "Versatile Databases, Diagrams, and Docs That Make Collaboration Easy...", "rating": 4.5, "date": "2026-01-26", "author": "Garry P." }, { "text": "Notion: Intuitive, AI‑enhanced and Flexible for Students...", "rating": 5, "date": "2026-01-25", "author": "Victoria L." }, { "text": "Seamless Daily Workflow with Notion’s Flexible, Intuitive Task Management...", "rating": 4, "date": "2026-01-23", "author": "Brahmatheja Reddy M." }, { "text": "The Ultimate Personal Knowledge Hub...", "rating": 4, "date": "2026-01-22", "author": "Andrew G." }, { "text": "Versatility and Powerful Centralization, But Needs Improvement on Mobile...", "rating": 5, "date": "2026-01-22", "author": "Jennyfer R." }, { "text": "Building a Second Brain: How Notion Reduces Information Overload...", "rating": 5, "date": "2026-01-22", "author": "Prashant K." }, { "text": "Notion: An Efficient Workspace for Managing Business Challenges...", "rating": 4, "date": "2026-01-28", "author": "Subham A." 
}, { "text": "A True All‑in‑one Workspace: Flexible Databases, Notes, and Powerful AI...", "rating": 5, "date": "2026-01-22", "author": "Diana N." }, { "text": "I like the ease with which I can organize my work...", "rating": 5, "date": "2026-03-18", "author": "José Francisco S." }, { "text": "I love how easy it is to use Notion...", "rating": 4.5, "date": "2026-02-25", "author": "Jordi A." }, { "text": "Stay organized as a solopreneur...", "rating": 5, "date": "2026-03-20", "author": "Brianne G." }, { "text": "Team information organized and centralized...", "rating": 4.5, "date": "2026-03-05", "author": "Carlos J." }, { "text": "Well organized and clearly designed for corporate use...", "rating": 5, "date": "2026-02-12", "author": "Akshat S." }, { "text": "The unparalleled modularity...", "rating": 5, "date": "2026-02-10", "author": "Brallan G." }, { "text": "Freedom to create...", "rating": 5, "date": "2026-02-16", "author": "Chris T." }, { "text": "Organize every aspect of my life...", "rating": 5, "date": "2026-03-14", "author": "Nick D." }, { "text": "Notion AI helps manage my entire business...", "rating": 5, "date": "2026-03-12", "author": "Amy E." }, { "text": "Highly customizable templates...", "rating": 4.5, "date": "2026-01-15", "author": "Alex H." }, { "text": "Bridging Planning and Execution...", "rating": 5, "date": "2026-01-15", "author": "Angus W." }, { "text": "Versatile, Visually Appealing...", "rating": 5, "date": "2023-04-04", "author": "Jason H." }, { "text": "Digital To‑do List...", "rating": 4, "date": "2026-01-14", "author": "Umesh K." }, { "text": "Excellent project management...", "rating": 5, "date": "2026-03-20", "author": "Axel H." }, { "text": "Flexible All‑in‑one Workspace...", "rating": 4, "date": "2026-03-09", "author": "Anubhav K." }, { "text": "Extremely Flexible Workspace...", "rating": 4.5, "date": "2026-01-13", "author": "Jefersson B." 
}, { "text": "Flexible All‑in‑one Workspace with Intuitive Design...", "rating": 4, "date": "2026-01-13", "author": "Ayush K." }, { "text": "Bets all in one collaboration tool for teams...", "rating": 5, "date": "2023-05-20", "author": "Lukas M." }, { "text": "Notion’s Easy, Intuitive UI...", "rating": 5, "date": "2026-03-14", "author": "Dexter K." }, { "text": "All‑in‑one Organization Made Effortless...", "rating": 4.5, "date": "2026-01-09", "author": "Satish M." }, { "text": "A Lifesaver for Note‑taking and Organization...", "rating": 5, "date": "2026-03-19", "author": "Michael P." }, { "text": "Essential for Team Collaboration...", "rating": 5, "date": "2026-03-19", "author": "Colin B." }, { "text": "Effortlessly Organizes and Enhances Collaboration...", "rating": 5, "date": "2026-03-18", "author": "Raymond L." }, { "text": "Notion: total flexibility to centralize...", "rating": 5, "date": "2026-03-18", "author": "GI (Verified User)" }, { "text": "Comprehensive Tool for Documentation and Notes...", "rating": 4.5, "date": "2026-03-18", "author": "Mohit S." }, { "text": "Intuitive Productivity Tool...", "rating": 5, "date": "2026-03-18", "author": "Seema P." }, { "text": "My go‑to Software EVERY time!...", "rating": 5, "date": "2025-05-20", "author": "Dr. Sandi E." }, { "text": "Customizable and Feature‑rich...", "rating": 3.5, "date": "2026-03-17", "author": "Matthew M." }, { "text": "Flexible Platform, Ideal for Team Collaboration...", "rating": 4.5, "date": "2026-03-17", "author": "Carlie A." }, { "text": "Streamlined Organization...", "rating": 4, "date": "2026-03-17", "author": "G (Verified User)" }, { "text": "Notion AI Makes a Powerful, Searchable Knowledge Hub...", "rating": 4, "date": "2026-03-17", "author": "UT (Verified User)" }, { "text": "Versatile Platform with Powerful AI...", "rating": 5, "date": "2026-03-16", "author": "David E." }, { "text": "Highly Customizable...", "rating": 4, "date": "2026-03-16", "author": "Case W." 
}, { "text": "A Powerful Tool That Can Easily Derail Real Productivity...", "rating": 2.5, "date": "2026-01-27", "author": "Vanina I." }, { "text": "Versatile But Overwhelming Organizational Tool...", "rating": 3.5, "date": "2026-03-19", "author": "Scott E." }, { "text": "Essential Tool for Remote Collaboration...", "rating": 5, "date": "2026-03-15", "author": "Kaylee H." }, { "text": "Easy Collaboration and Fast Project Setup...", "rating": 5, "date": "2026-03-11", "author": "Rosy G." }, { "text": "Excellent multi‑project project management app...", "rating": 4, "date": "2026-01-19", "author": "Daniel P." }, { "text": "Exceptional AI Integration and Intuitive UX/UI...", "rating": 5, "date": "2025-12-27", "author": "Sabrina Julia J." }, { "text": "Rich Features, Complex to Master...", "rating": 4, "date": "2026-03-13", "author": "Harsheeta G." }, { "text": "Built‑in AI is very helpful for summarising notes...", "rating": 5, "date": "2026-03-13", "author": "Emily U." }, { "text": "All‑in‑one, highly‑customisable workspace...", "rating": 4.5, "date": "2026-03-17", "author": "Kapil Y." }, { "text": "Proposal outlines and scope features...", "rating": 3.5, "date": "2026-03-12", "author": "Megha B." }, { "text": "AI feature simplifies tasks...", "rating": 4.5, "date": "2026-03-12", "author": "Brian K." }, { "text": "Clear task overviews, powerful databases...", "rating": 5, "date": "2026-03-12", "author": "Ryan B." }, { "text": "Customisable command‑center feature...", "rating": 5, "date": "2026-03-12", "author": "Dianna H." }, { "text": "Flexibility as an all‑in‑one workspace...", "rating": 4, "date": "2026-03-13", "author": "Alana Z." }, { "text": "Automatic meeting‑recording and note‑taking...", "rating": 4.5, "date": "2026-03-12", "author": "Nicole H." }, { "text": "Platform for tasks, projects, ideas...", "rating": 5, "date": "2026-03-12", "author": "Paul P." 
}, { "text": "File‑system organisation with word‑processor...", "rating": 5, "date": "2026-03-12", "author": "Mason R." }, { "text": "Perfect Note-Taking for ADHD with Outstanding AI Features...", "rating": 5, "date": "2026-03-12", "author": "Sable M." }, { "text": "Versatile Collaboration with Notion, But Steep Learning Curve...", "rating": 4, "date": "2026-03-12", "author": "Jayla J." }, { "text": "Innovative and User‑Friendly with Scope for Speed Improvement...", "rating": 4, "date": "2026-03-12", "author": "Cody F." }, { "text": "Versatile, Yet Complex to Master...", "rating": 4, "date": "2026-03-12", "author": "Justin M." }, { "text": "Revolutionized Our Strategic Documentation...", "rating": 4, "date": "2026-03-12", "author": "Coko S." }, { "text": "Customizable Platform with AI, Needs AI Refinements...", "rating": 3.5, "date": "2026-03-12", "author": "Andres D." }, { "text": "Intuitive and Versatile for Personal Productivity...", "rating": 5, "date": "2026-03-12", "author": "Asha Y." }, { "text": "Effortless Knowledge Management with Notion...", "rating": 5, "date": "2026-03-12", "author": "Anesa A." }, { "text": "All My Notes and Projects in One Place with Notion...", "rating": 5, "date": "2026-03-12", "author": "UF (Verified User)" }, { "text": "Organizational Powerhouse with Pricey AI...", "rating": 3, "date": "2026-03-12", "author": "Gavin L." } ] }

The other runs failed because they exceeded Firecrawl's limit of 500 credits for each of the 5 free daily agentic runs. Is there a way we can lower the number of pages the agent hits so we stay within the 500-credit limit?

Starting extract for query: "This is a G2 review page. Extract all customer/user reviews from https://www.g2.com/products/notion/reviews. For each review, extract: the full review text, the star ratin...... +3

Running code... www.g2.com

Running code...

This request reached the default 500-credit safety limit for free requests. Paid requests can set a higher limit via the maxCredits API parameter.

Good news — Firecrawl did extract 113 reviews successfully on the first run. The issue is the wifi disconnect meant the poll response never made it back to the app. And the subsequent runs burned through the 5 free daily runs.

Two things to fix:

  1. Lower credit usage — tell the agent to only scrape the first page (don't paginate), which keeps it well under 500 credits
  2. Lower maxCredits — cap at 250 to fail fast instead of burning the full 500 allowance
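The lost poll response can be sketched as follows. This is a minimal sketch, not the code in ingest.py: `fetch_status` is a hypothetical callable, and the "completed" / "failed" status strings are assumptions rather than confirmed Firecrawl values. The idea is that a dropped connection is retried until the deadline, and a timeout returns None so the caller can keep the job ID and check again later.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll a job until it reports a terminal status or the timeout elapses.

    fetch_status is hypothetical; the terminal status names are assumed.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        try:
            result = fetch_status()
        except OSError:
            # Network dropped (e.g. wifi disconnect). A real HTTP client
            # would raise its own exception type here. Retry until deadline.
            sleep(interval)
            continue
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(interval)
    return None  # timed out; the caller should keep the job ID and retry later
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.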
Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py",
  "offset": 135,
  "limit": 45
}
   135→
   136→FIRECRAWL_API_URL = "https://api.firecrawl.dev/v2"
   137→
   138→_AGENT_POLL_INTERVAL = 2   # seconds between status checks
   139→_AGENT_TIMEOUT = 300       # total seconds before giving up
   140→_AGENT_MAX_CREDITS = 500   # conservative cap for zero-cost constraint
   141→
   142→_REVIEW_SCHEMA = {
   143→    "type": "object",
   144→    "properties": {
   145→        "reviews": {
   146→            "type": "array",
   147→            "items": {
   148→                "type": "object",
   149→                "properties": {
   150→                    "text": {"type": "string", "description": "Full review text written by the customer"},
   151→                    "rating": {"type": ["number", "null"], "description": "Star rating from 1 to 5"},
   152→                    "date": {"type": ["string", "null"], "description": "Date posted in YYYY-MM-DD format"},
   153→                    "author": {"type": ["string", "null"], "description": "Reviewer's display name"},
   154→                },
   155→                "required": ["text"],
   156→            },
   157→        }
   158→    },
   159→    "required": ["reviews"],
   160→}
   161→
   162→
   163→def _build_agent_prompt(url: str, platform: str) -> str:
   164→    platform_hint = f"This is a {platform} review page. " if platform else ""
   165→    return (
   166→        f"{platform_hint}"
   167→        f"Extract all customer/user reviews from {url}. "
   168→        "For each review, extract: the full review text, the star rating as a number "
   169→        "from 1 to 5 (if shown), the date it was posted in YYYY-MM-DD format (if shown), "
   170→        "and the reviewer's name (if shown). "
   171→        "Only extract actual user-submitted reviews — not editorial content, product "
   172→        "descriptions, or page navigation text. If reviews span multiple pages, "
   173→        "navigate through all available pages to collect them."
   174→    )
   175→
   176→
   177→async def scrape_url(url: str, platform: str = "") -> list[Review]:
   178→    """Best-effort URL scraping via Firecrawl agent. Returns structured reviews."""
   179→    api_key = os.getenv("FIRECRAWL_API_KEY", "")
✏️ Edit ingest.py
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py
_AGENT_MAX_CREDITS = 500   # conservative cap for zero-cost constraint
+
_AGENT_MAX_CREDITS = 250   # stay well under the 500/run free-tier limit
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py has been updated successfully.
✏️ Edit ingest.py
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py
def _build_agent_prompt(url: str, platform: str) -> str:
    platform_hint = f"This is a {platform} review page. " if platform else ""
    return (
        f"{platform_hint}"
        f"Extract all customer/user reviews from {url}. "
        "For each review, extract: the full review text, the star rating as a number "
        "from 1 to 5 (if shown), the date it was posted in YYYY-MM-DD format (if shown), "
        "and the reviewer's name (if shown). "
        "Only extract actual user-submitted reviews — not editorial content, product "
        "descriptions, or page navigation text. If reviews span multiple pages, "
        "navigate through all available pages to collect them."
    )
+
def _build_agent_prompt(url: str, platform: str) -> str:
    platform_hint = f"This is a {platform} review page. " if platform else ""
    return (
        f"{platform_hint}"
        f"Extract customer/user reviews from {url}. "
        "For each review, extract: the full review text, the star rating as a number "
        "from 1 to 5 (if shown), the date it was posted in YYYY-MM-DD format (if shown), "
        "and the reviewer's name (if shown). "
        "Only extract actual user-submitted reviews — not editorial content, product "
        "descriptions, or page navigation text. "
        "IMPORTANT: Only scrape the first page of reviews. Do NOT click through to "
        "additional pages or follow pagination links. Extract what is visible on the "
        "initial page load only."
    )
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py has been updated successfully.

Two changes:

- maxCredits 500 → 250: fails fast instead of burning the full free-tier allowance
- Prompt now explicitly says first page only: no pagination, no following "next" links. The first run got 113 reviews from what looks like ~4 pages of G2 results; single-page should still get 10-25 reviews, which is plenty for a demo
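For illustration, the credit cap could be applied when the request body is assembled. This is a sketch under assumptions: only the `maxCredits` parameter name comes from the Firecrawl message quoted earlier; the `prompt` and `schema` keys and the helper name `build_agent_payload` are hypothetical, and the actual request code in ingest.py is not shown here.

```python
FREE_TIER_LIMIT = 500  # per-run safety limit quoted in the Firecrawl error


def build_agent_payload(prompt: str, schema: dict, max_credits: int = 250) -> dict:
    """Assemble a request body with a hard credit cap.

    "prompt" and "schema" are assumed key names; only "maxCredits" is
    taken from the error message in this transcript.
    """
    if max_credits <= 0:
        raise ValueError("max_credits must be positive")
    # Never ask for more than the free-tier limit, whatever the caller passes.
    capped = min(max_credits, FREE_TIER_LIMIT)
    return {"prompt": prompt, "schema": schema, "maxCredits": capped}
```

Clamping in one place means a misconfigured constant cannot silently exceed the free allowance.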

Btw, the one that DID run did so because it was hitting my free-tier credit limit (really hammered it too); it was the 6th run and thus not one of the 5 free daily runs.

Also, for the one that did work - it seems to not have extracted full review text.

Good catches. So the successful run was actually a paid credit run, and the review texts are truncated — just titles/summaries rather than the full review bodies. Let me update the prompt to be explicit about extracting the complete text.

✏️ Edit ingest.py
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py
def _build_agent_prompt(url: str, platform: str) -> str:
    platform_hint = f"This is a {platform} review page. " if platform else ""
    return (
        f"{platform_hint}"
        f"Extract customer/user reviews from {url}. "
        "For each review, extract: the full review text, the star rating as a number "
        "from 1 to 5 (if shown), the date it was posted in YYYY-MM-DD format (if shown), "
        "and the reviewer's name (if shown). "
        "Only extract actual user-submitted reviews — not editorial content, product "
        "descriptions, or page navigation text. "
        "IMPORTANT: Only scrape the first page of reviews. Do NOT click through to "
        "additional pages or follow pagination links. Extract what is visible on the "
        "initial page load only."
    )
+
def _build_agent_prompt(url: str, platform: str) -> str:
    platform_hint = f"This is a {platform} review page. " if platform else ""
    return (
        f"{platform_hint}"
        f"Extract customer/user reviews from {url}. "
        "For each review, extract: "
        "(1) the COMPLETE review text — the full body/content of what the reviewer "
        "wrote, not just the title or summary. Include both the review title and the "
        "full paragraph(s) of text. On G2 and similar sites, you may need to expand "
        "'Read more' or similar elements to get the full text. "
        "(2) the star rating as a number from 1 to 5 (if shown), "
        "(3) the date it was posted in YYYY-MM-DD format (if shown), "
        "(4) the reviewer's name (if shown). "
        "Only extract actual user-submitted reviews — not editorial content, product "
        "descriptions, or page navigation text. "
        "IMPORTANT: Only scrape the first page of reviews. Do NOT click through to "
        "additional pages or follow pagination links. Extract what is visible on the "
        "initial page load only."
    )
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py has been updated successfully.

Updated the prompt to explicitly ask for the complete review body, not just titles. Also calls out that it may need to expand "Read more" elements — G2 truncates reviews behind those.
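Since a prompt change alone does not guarantee full bodies, a cheap post-extraction check could flag results that still look like titles. This is a heuristic sketch, not part of ingest.py: the function names and the 20-word threshold are assumptions, and the trailing "..." test is based only on the truncated texts in the run output above.

```python
def looks_truncated(text: str, min_words: int = 20) -> bool:
    """Heuristic: flag text that is probably a title or clipped summary."""
    stripped = text.strip()
    if stripped.endswith(("...", "…")):
        return True
    return len(stripped.split()) < min_words


def truncated_share(reviews: list[dict]) -> float:
    """Return the fraction of reviews whose "text" field looks truncated."""
    if not reviews:
        return 0.0
    flagged = sum(1 for r in reviews if looks_truncated(r.get("text") or ""))
    return flagged / len(reviews)
```

A high share on a run would suggest the agent returned titles again and the result should not be indexed as-is.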

On the New Analysis modal, add a third tab, so the tabs are CSV Upload, URL Scrape, and Sample Data. I created a new folder called sample-data containing CSVs; you can just list them in a dropdown.

$ Bash
List sample data files
ls -la /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/sample-data/
amazon_sony_wh1000xm5_reviews.csv  16.0K
data.csv  573.6K
google_maps_bella_napoli_reviews.csv  13.2K

3 files, 0 dirs (3.csv)
$ Bash
Check CSV headers and structure
head -2 /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/sample-data/amazon_sony_wh1000xm5_reviews.csv
echo "---"
head -2 /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/sample-data/data.csv
echo "---"
head -2 /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/sample-data/google_maps_bella_napoli_reviews.csv
review_id,product_name,platform,rating,title,review_text,author,date,verified_purchase
1,Sony WH-1000XM5,Amazon,5,Best noise cancelling headphones I've ever owned,"I upgraded from the XM4 and the noise cancellation is noticeably better. The new design is lighter and more comfortable for long wearing sessions. Sound quality is phenomenal with deep bass and clear mids. Battery life easily gets me through a full work week. Worth every penny.",JohnT_audio,2024-11-15,Yes
---
,Customer Name,Review Title,Rating,Reviews
0,Rahul T,Overall I'll call it value for money although it has several limitations,4.0 out of 5 stars,"  It consumes around 10gb of storage in 24hours in 'HD' mode.Pros :Night mode recording.Clarity of camera.Multiple users can watch camera recording at same time.Two way communication.1080p HD recording.Can record videos without internet , requires internet connection after switching camera on - after that you can disconnect & recording will continue.ICons:It doesn't support AUTO 360° rotation. User has to manually adjust camera from application.If there is a power outage or by any chance you disconnect power source , camera would start recording again after you connect it to internet.You need to keep camera connected to a power source for 24hours - charging option would've been helpful here.Application sometimes sends motion alerts after delay of 30minutes(approx).Memory card of storage capacity greater than 64gb are not supported."
---
review_id,business_name,platform,rating,review_text,author,date,response_from_owner
1,Bella Napoli Italian Restaurant,Google Maps,5,"Absolutely the best Italian food outside of Italy. The homemade pasta is incredible — you can taste the difference. The carbonara was rich and perfectly seasoned. Service was attentive without being overbearing. We'll definitely be back.",SarahK,2024-12-01,Thank you Sarah! Our chef makes pasta fresh every morning. Hope to see you again soon!

Good — three CSVs with different structures. Now let me add the sample data tab, a backend endpoint to list/load them, and serve them through the existing upload flow.
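Because the three files use different headers, loading them through one flow needs some column mapping. How the existing upload flow handles this is not shown in this part of the transcript, so the following is only a sketch: the alias tuples are taken from the headers printed above, and the function names are hypothetical.

```python
import csv
import io
import re
from typing import Optional

# Header aliases taken from the three sample files shown above.
_TEXT_COLUMNS = ("review_text", "Reviews")
_RATING_COLUMNS = ("rating", "Rating")
_AUTHOR_COLUMNS = ("author", "Customer Name")


def _first_present(row: dict, names: tuple) -> Optional[str]:
    """Return the first non-empty value among the candidate columns."""
    for name in names:
        value = row.get(name)
        if value:
            return value.strip()
    return None


def parse_rating(raw: Optional[str]) -> Optional[float]:
    """Accept "5" as well as "4.0 out of 5 stars"; None if no number."""
    if not raw:
        return None
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


def normalise_rows(csv_text: str) -> list[dict]:
    """Map rows with differing headers onto text/rating/author/date."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = _first_present(row, _TEXT_COLUMNS)
        if not text:
            continue  # skip rows without a review body
        rows.append({
            "text": text,
            "rating": parse_rating(_first_present(row, _RATING_COLUMNS)),
            "author": _first_present(row, _AUTHOR_COLUMNS),
            "date": _first_present(row, ("date",)),
        })
    return rows
```

The output keys match the fields in `_REVIEW_SCHEMA`, so sample rows and scraped reviews would share one shape.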

Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html",
  "offset": 185,
  "limit": 50
}
   185→            <div class="chat-input-area">
   186→                <form id="chat-form" onsubmit="sendMessage(event)">
   187→                    <div class="input-row">
   188→                        <textarea id="chat-input" name="message" placeholder="Ask about the reviews..." rows="1" autofocus></textarea>
   189→                        <button type="submit" class="btn-send" id="send-btn">
   190→                            <svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
   191→                                <line x1="5" y1="12" x2="19" y2="12"/><polyline points="12 5 19 12 12 19"/>
   192→                            </svg>
   193→                        </button>
   194→                    </div>
   195→                </form>
   196→                <p class="input-hint">Analyses only the uploaded dataset. No external knowledge.</p>
   197→            </div>
   198→
   199→            {% else %}
   200→            <!-- Empty state -->
   201→            <div class="empty-state">
   202→                <svg width="48" height="48" viewBox="0 0 32 32" fill="none" class="empty-icon">
   203→                    <circle cx="16" cy="16" r="14" stroke="currentColor" stroke-width="1" opacity="0.3"/>
   204→                    <circle cx="16" cy="16" r="8" stroke="currentColor" stroke-width="1" opacity="0.5"/>
   205→                    <circle cx="16" cy="16" r="2.5" fill="currentColor" opacity="0.6"/>
   206→                </svg>
   207→                <h2>ReviewLens <span class="serif-accent">AI</span></h2>
   208→                <p>Upload customer reviews and explore them through a guardrailed Q&A interface.</p>
   209→                <button class="btn btn-primary" onclick="openModal()">
   210→                    <svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round">
   211→                        <line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/>
   212→                    </svg>
   213→                    New Analysis
   214→                </button>
   215→            </div>
   216→            {% endif %}
   217→        </main>
   218→    </div>
   219→
   220→    <!-- ── New Analysis Modal ───────────────────────────────────── -->
   221→    <div class="modal-backdrop" id="modal-backdrop" onclick="closeModal()"></div>
   222→    <div class="modal" id="analysis-modal">
   223→        <div class="modal-header">
   224→            <h2>New Analysis</h2>
   225→            <button class="modal-close" onclick="closeModal()">&times;</button>
   226→        </div>
   227→
   228→        <div class="modal-tabs">
   229→            <button class="modal-tab active" onclick="switchTab(this, 'tab-csv')">CSV Upload</button>
   230→            <button class="modal-tab" onclick="switchTab(this, 'tab-url')">URL Scrape <span class="badge">Best-effort</span></button>
   231→        </div>
   232→
   233→        <!-- CSV Tab -->
   234→        <div class="modal-tab-content active" id="tab-csv">
Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html",
  "offset": 280
}
   280→                    </div>
   281→                </div>
   282→
   283→                <button type="submit" class="btn btn-primary">
   284→                    <span class="btn-text">Analyse Reviews</span>
   285→                    <span class="btn-spinner" id="csv-spinner" style="display:none">
   286→                        <span class="spinner"></span> Indexing...
   287→                    </span>
   288→                </button>
   289→
   290→                <div id="csv-result"></div>
   291→            </form>
   292→        </div>
   293→
   294→        <!-- URL Tab -->
   295→        <div class="modal-tab-content" id="tab-url">
   296→            <form id="scrape-form"
   297→                  hx-post="/scrape"
   298→                  hx-target="#scrape-result"
   299→                  hx-indicator="#scrape-spinner">
   300→
   301→                <div class="form-group">
   302→                    <label for="scrape-product">Product / Entity</label>
   303→                    <input type="text" id="scrape-product" name="product_name" placeholder="e.g., AcmeCo Widget Pro">
   304→                </div>
   305→
   306→                <div class="form-group">
   307→                    <label for="scrape-platform">Platform</label>
   308→                    <select id="scrape-platform" name="platform">
   309→                        <option value="">Auto-detect</option>
   310→                        <option value="Amazon">Amazon</option>
   311→                        <option value="Google Maps">Google Maps</option>
   312→                        <option value="G2">G2</option>
   313→                        <option value="Capterra">Capterra</option>
   314→                        <option value="Trustpilot">Trustpilot</option>
   315→                        <option value="Yelp">Yelp</option>
   316→                    </select>
   317→                </div>
   318→
   319→                <div class="form-group">
   320→                    <label for="url">Review Page URL</label>
   321→                    <input type="url" id="url" name="url" placeholder="https://..." required>
   322→                </div>
   323→
   324→                <button type="submit" class="btn btn-secondary">
   325→                    <span class="btn-text">Scrape &amp; Analyse</span>
   326→                    <span class="btn-spinner" id="scrape-spinner" style="display:none">
   327→                        <span class="spinner"></span> Scraping...
   328→                    </span>
   329→                </button>
   330→
   331→                <div id="scrape-result"></div>
   332→            </form>
   333→        </div>
   334→    </div>
   335→
   336→    <script>
   337→        var SESSION_ID = "{{ active_id or '' }}";
   338→    </script>
   339→    <script src="/static/app.js"></script>
   340→</body>
   341→</html>
   342→

Now I'll add the sample data tab and backend endpoint. Need to: add the tab button, add the tab content with a dropdown, add a /sample POST route, and pass sample file list to the template.

✏️ Edit app.html
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html
        <div class="modal-tabs">
            <button class="modal-tab active" onclick="switchTab(this, 'tab-csv')">CSV Upload</button>
            <button class="modal-tab" onclick="switchTab(this, 'tab-url')">URL Scrape <span class="badge">Best-effort</span></button>
        </div>
+
        <div class="modal-tabs">
            <button class="modal-tab active" onclick="switchTab(this, 'tab-csv')">CSV Upload</button>
            <button class="modal-tab" onclick="switchTab(this, 'tab-url')">URL Scrape <span class="badge">Best-effort</span></button>
            <button class="modal-tab" onclick="switchTab(this, 'tab-sample')">Sample Data</button>
        </div>
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html has been updated successfully.
✏️ Edit app.html
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html
                <div id="scrape-result"></div>
            </form>
        </div>
    </div>
+
                <div id="scrape-result"></div>
            </form>
        </div>

        <!-- Sample Data Tab -->
        <div class="modal-tab-content" id="tab-sample">
            <p class="upload-desc" style="margin-bottom:1rem">Load a pre-built dataset to explore ReviewLens without needing your own data.</p>

            <form id="sample-form"
                  hx-post="/sample"
                  hx-target="#sample-result"
                  hx-indicator="#sample-spinner">

                <div class="form-group">
                    <label for="sample-file">Dataset</label>
                    <select id="sample-file" name="filename" required>
                        <option value="" disabled selected>Choose a sample dataset...</option>
                        {% for sf in sample_files %}
                        <option value="{{ sf.filename }}">{{ sf.label }}</option>
                        {% endfor %}
                    </select>
                </div>

                <button type="submit" class="btn btn-primary">
                    <span class="btn-text">Load Dataset</span>
                    <span class="btn-spinner" id="sample-spinner" style="display:none">
                        <span class="spinner"></span> Loading...
                    </span>
                </button>

                <div id="sample-result"></div>
            </form>
        </div>
    </div>
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html has been updated successfully.

Now add the backend — list sample files and the /sample POST route:

Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py",
  "offset": 62,
  "limit": 25
}
    62→
    63→# ── App shell (home + chat share the same layout) ───────────────────
    64→
    65→def _shell_context(request: Request, session=None):
    66→    """Build template context for the app shell."""
    67→    sessions = store.list_sessions()
    68→    ctx = {
    69→        "request": request,
    70→        "sessions": sessions,
    71→        "session": session,
    72→        "summary": session.summary if session else None,
    73→        "messages": session.messages if session else [],
    74→        "active_id": session.session_id if session else None,
    75→    }
    76→    return ctx
    77→
    78→
    79→@app.get("/", response_class=HTMLResponse)
    80→async def index(request: Request):
    81→    return templates.TemplateResponse("app.html", _shell_context(request))
    82→
    83→
    84→@app.get("/chat/{session_id}", response_class=HTMLResponse)
    85→async def chat_page(request: Request, session_id: str):
    86→    session = store.load_session(session_id)
✏️ Edit main.py
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py
# ── App shell (home + chat share the same layout) ───────────────────

def _shell_context(request: Request, session=None):
    """Build template context for the app shell."""
    sessions = store.list_sessions()
    ctx = {
        "request": request,
        "sessions": sessions,
        "session": session,
        "summary": session.summary if session else None,
        "messages": session.messages if session else [],
        "active_id": session.session_id if session else None,
    }
    return ctx
+
# ── Sample data discovery ────────────────────────────────────────────

SAMPLE_DIR = BASE_DIR / "sample-data"


def _list_sample_files() -> list[dict[str, str]]:
    """List available sample CSVs with human-readable labels."""
    if not SAMPLE_DIR.exists():
        return []
    files = []
    for f in sorted(SAMPLE_DIR.glob("*.csv")):
        label = f.stem.replace("_", " ").replace("-", " ").title()
        files.append({"filename": f.name, "label": label})
    return files


# ── App shell (home + chat share the same layout) ───────────────────

def _shell_context(request: Request, session=None):
    """Build template context for the app shell."""
    sessions = store.list_sessions()
    ctx = {
        "request": request,
        "sessions": sessions,
        "session": session,
        "summary": session.summary if session else None,
        "messages": session.messages if session else [],
        "active_id": session.session_id if session else None,
        "sample_files": _list_sample_files(),
    }
    return ctx
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py has been updated successfully.

Now add the /sample POST route. I'll put it right after the /upload route:

Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py",
  "offset": 140,
  "limit": 20
}
   140→        session_id=session_id,
   141→        product_name=summary.product_name,
   142→        platform=summary.platform,
   143→        summary=summary,
   144→        status="indexing",
   145→    )
   146→    store.save_session(session)
   147→    store.save_reviews_json(session_id, [r.model_dump(mode="json") for r in reviews])
   148→
   149→    indexed = vectordb.index_reviews(session_id, reviews)
   150→    summary.total_reviews = indexed
   151→    store.update_summary(session_id, summary)
   152→    store.set_status(session_id, "ready")
   153→
   154→    return HTMLResponse(
   155→        status_code=200,
   156→        content=f'<script>window.location.href="/chat/{session_id}";</script>',
   157→        headers={"HX-Redirect": f"/chat/{session_id}"},
   158→    )
   159→
Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py",
  "offset": 159,
  "limit": 5
}
   159→
   160→
   161→# ── URL Scrape ───────────────────────────────────────────────────────
   162→
   163→@app.post("/scrape")
✏️ Edit main.py
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py
# ── URL Scrape ───────────────────────────────────────────────────────

@app.post("/scrape")
+
# ── Sample Data ──────────────────────────────────────────────────────

@app.post("/sample")
async def load_sample(
    request: Request,
    filename: str = Form(...),
):
    # Sanitise: only allow filenames that exist in sample-data/
    path = SAMPLE_DIR / filename
    if not path.exists() or not path.suffix == ".csv" or ".." in filename:
        return templates.TemplateResponse(
            "partials/error.html",
            {"request": request, "error": "Sample file not found."},
            status_code=400,
        )

    content = path.read_bytes()
    # Derive product name and platform from the filename
    stem = path.stem.replace("_", " ").replace("-", " ")
    product_name = stem.title()
    platform = ""
    for plat in ("Amazon", "Google Maps", "G2", "Capterra", "Trustpilot", "Yelp"):
        if plat.lower().replace(" ", "_") in path.stem.lower() or plat.lower().replace(" ", "") in path.stem.lower():
            platform = plat
            break

    reviews = parse_csv(content, platform=platform, product_name=product_name)

    if not reviews:
        return templates.TemplateResponse(
            "partials/error.html",
            {"request": request, "error": "Could not parse reviews from this sample file."},
            status_code=400,
        )

    session_id = str(uuid.uuid4())
    summary = build_summary(reviews, source_type="csv", product_name=product_name, platform=platform)

    session = Session(
        session_id=session_id,
        product_name=summary.product_name,
        platform=summary.platform,
        summary=summary,
        status="indexing",
    )
    store.save_session(session)
    store.save_reviews_json(session_id, [r.model_dump(mode="json") for r in reviews])

    indexed = vectordb.index_reviews(session_id, reviews)
    summary.total_reviews = indexed
    store.update_summary(session_id, summary)
    store.set_status(session_id, "ready")

    return HTMLResponse(
        status_code=200,
        content=f'<script>window.location.href="/chat/{session_id}";</script>',
        headers={"HX-Redirect": f"/chat/{session_id}"},
    )


# ── URL Scrape ───────────────────────────────────────────────────────

@app.post("/scrape")
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/main.py has been updated successfully.
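
One caveat on the filename check in the route above: joining with `SAMPLE_DIR / filename` discards `SAMPLE_DIR` when `filename` is an absolute path, so a value such as `/tmp/x.csv` contains no `..` and would still pass if that file exists. A minimal sketch of a stricter check, assuming Python 3.9+ for `Path.is_relative_to`; `resolve_sample_path` is a hypothetical helper, not something that exists in the project:

```python
from pathlib import Path
from typing import Optional


def resolve_sample_path(sample_dir: Path, filename: str) -> Optional[Path]:
    """Return the resolved path of a sample CSV, or None if the name is not allowed.

    Hypothetical helper: accepts only a plain *.csv file located inside sample_dir.
    """
    # A bare filename has no directory components, so "../x.csv" and
    # "/etc/x.csv" are both rejected here.
    if Path(filename).name != filename:
        return None
    root = sample_dir.resolve()
    candidate = (root / filename).resolve()
    # Guard against symlinks that point outside the sample directory.
    if not candidate.is_relative_to(root):
        return None
    if candidate.suffix != ".csv" or not candidate.is_file():
        return None
    return candidate
```

In the route this could replace the three-part `if`, returning the same error partial whenever the helper yields `None`.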
$ Bash
Syntax check
source .venv/bin/activate && python3 -c "import ast; ast.parse(open('app/main.py').read()); print('OK')"
OK
$ Bash
Verify sample tab and dropdown populate
# Verify the sample tab renders and files are listed
curl -s http://localhost:8000/ | grep -o 'tab-sample' | head -1 && echo "Tab present"
curl -s http://localhost:8000/ | grep -o 'Amazon Sony Wh1000xm5 Reviews\|Data\|Google Maps Bella Napoli Reviews' | sort
tab-sample
Tab present
Data
Data
Data
Data
Data
Google Maps Bella Napoli Reviews

Rendering. Let me test loading a sample:

$ Bash
Test loading a sample dataset
curl -s -X POST http://localhost:8000/sample \
  -F "filename=google_maps_bella_napoli_reviews.csv" \
  -D - | head -6
HTTP/1.1 200 OK
date: Sun, 22 Mar 2026 14:09:19 GMT
server: uvicorn
hx-redirect: /chat/03ca6b39-4f85-4244-bf34-b52e3df331db
content-length: 83
content-type: text/html; charset=utf-8

Working. The "Amazon Sony" entry didn't show in the first grep because the pattern was case-sensitive and the rendered label uses different casing ("Wh1000Xm5", not "Wh1000xm5"). Let me check it's actually there:

$ Bash
Verify all three sample files in dropdown
curl -s http://localhost:8000/ | grep -i 'sony\|bella\|data.csv\|sample-file' | head -6
                            <span class="session-name">Google Maps Bella Napoli Reviews</span>
                    <label for="sample-file">Dataset</label>
                    <select id="sample-file" name="filename" required>
                        <option value="amazon_sony_wh1000xm5_reviews.csv">Amazon Sony Wh1000Xm5 Reviews</option>
                        <option value="data.csv">Data</option>
                        <option value="google_maps_bella_napoli_reviews.csv">Google Maps Bella Napoli Reviews</option>

All three are in the dropdown. Done: there is a new "Sample Data" tab in the modal with a dropdown of all CSVs from sample-data/. Selecting one and clicking "Load Dataset" parses, indexes, and redirects to the chat, just like a manual CSV upload.
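
The casing mismatch noted above comes from `str.title()`, which capitalises the first letter of every run of letters, including a run that follows a digit. A small sketch mirroring the label expression from `_list_sample_files`; `label_for` is a name used here only for illustration:

```python
from pathlib import Path


def label_for(filename: str) -> str:
    """Same expression as the label line in _list_sample_files."""
    stem = Path(filename).stem
    return stem.replace("_", " ").replace("-", " ").title()


print(label_for("amazon_sony_wh1000xm5_reviews.csv"))
# Amazon Sony Wh1000Xm5 Reviews  (capital X: it starts a new run of letters)
print(label_for("google_maps_bella_napoli_reviews.csv"))
# Google Maps Bella Napoli Reviews
```

A case-insensitive match (`grep -i`, as in the second check) sidesteps the difference.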

remove the "Best Effort" from the URL Scrape tab

✏️ Edit app.html
/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html
            <button class="modal-tab" onclick="switchTab(this, 'tab-url')">URL Scrape <span class="badge">Best-effort</span></button>
+
            <button class="modal-tab" onclick="switchTab(this, 'tab-url')">URL Scrape</button>
The file /Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/templates/app.html has been updated successfully.

Done.

The files include header rows of:

review_id,product_name,platform,rating,title,review_text,author,date,verified_purchase

review_id,product_name,platform,rating,title,review_text,reviewer_role,company_size,date,pros,cons

review_id,business_name,platform,rating,review_text,author,date,response_from_owner

The current csv system can handle them, correct?

Read
{
  "file_path": "/Users/shaynelarocque/Documents/GitHub/futuresight/reviewlens/app/ingest.py",
  "offset": 19,
  "limit": 30
}
    19→
    20→# ── CSV Ingestion (primary path) ────────────────────────────────────
    21→
    22→# Common column name variants we normalise to our schema.
    23→_COL_MAP: dict[str, list[str]] = {
    24→    "text": ["text", "review", "review_text", "content", "body", "comment", "review_body", "reviews", "feedback"],
    25→    "rating": ["rating", "score", "stars", "star_rating", "review_rating", "overall_rating"],
    26→    "date": ["date", "review_date", "created_at", "timestamp", "time", "posted_date", "review_time"],
    27→    "author": ["author", "reviewer", "user", "username", "reviewer_name", "name", "user_name"],
    28→    "platform": ["platform", "source", "site", "channel"],
    29→}
    30→
    31→
    32→def _normalise_col(name: str) -> str:
    33→    return re.sub(r"[^a-z0-9]", "", name.lower().strip())
    34→
    35→
    36→def _map_columns(headers: list[str]) -> dict[str, str]:
    37→    """Map CSV column names → our field names. Returns {our_field: csv_col}."""
    38→    mapping: dict[str, str] = {}
    39→    normalised = {_normalise_col(h): h for h in headers}
    40→
    41→    for field, aliases in _COL_MAP.items():
    42→        for alias in aliases:
    43→            norm = _normalise_col(alias)
    44→            if norm in normalised:
    45→                mapping[field] = normalised[norm]
    46→                break
    47→    return mapping
    48→

Checking each CSV against the alias map:

CSV 1: review_id,product_name,platform,rating,title,review_text,author,date,verified_purchase
- review_text → matches review_text in the text aliases ✓
- rating ✓, date ✓, author ✓, platform ✓
- title, verified_purchase, product_name, review_id → unmapped, go into metadata

CSV 2: review_id,product_name,platform,rating,title,review_text,reviewer_role,company_size,date,pros,cons
- review_text ✓, rating ✓, date ✓, platform ✓
- author → no match. There's no author/reviewer/user/name column. reviewer_role normalises to reviewerrole, which doesn't match reviewer (the alias check is an exact match after normalisation)
- pros, cons, title, etc. → metadata

CSV 3: review_id,business_name,platform,rating,review_text,author,date,response_from_owner
- review_text ✓, rating ✓, date ✓, author ✓, platform ✓

So CSV 2 will have no author field: that file has no author column, and reviewer_role holds a role such as "Manager", not a name. That's fine, since G2 reviews often don't have real names. The pros and cons columns will end up in metadata rather than being concatenated into the review text. That's a minor loss, since the review_text column in that file should have the main content.

Yes, all three will parse and index correctly. The only thing worth noting: title from CSVs 1 and 2 goes to metadata and is not prepended to the review text. If you want titles included in the searchable text, that would be a small change, but for now it works as-is.
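
If titles (and the G2-style pros/cons columns) should be searchable, one option is to fold them into the text before indexing. A minimal sketch, assuming each row is available as a plain dict keyed by CSV column name; `build_search_text` is a hypothetical helper, and how it would plug into `parse_csv` is not shown in this session:

```python
def build_search_text(row: dict[str, str], text_col: str = "review_text") -> str:
    """Combine title, body, pros and cons into one string for indexing.

    Hypothetical helper: the column names match the sample CSV headers,
    and missing or empty columns are skipped.
    """
    parts: list[str] = []
    title = (row.get("title") or "").strip()
    if title:
        parts.append(title)
    body = (row.get(text_col) or "").strip()
    if body:
        parts.append(body)
    for col in ("pros", "cons"):
        value = (row.get(col) or "").strip()
        if value:
            # Label the section so "Pros"/"Cons" stay distinguishable in search hits.
            parts.append(f"{col.title()}: {value}")
    return "\n".join(parts)


row = {"title": "Great for docs", "review_text": "We use it daily.", "pros": "Flexible", "cons": ""}
print(build_search_text(row))
```

Whether to fold pros and cons in is a judgment call: it widens what search can match, but it can also duplicate text that already appears in the review body.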