Agentic AI and NLP based sentiment for portfolio monitoring
- Brian MacSweeney
- Aug 12
- 8 min read
Brian MacSweeney & Quang Viet Tran [with guidance from Milind Sharma]***
*** Both authors are MSCF candidates at Carnegie Mellon
Introduction
In modern financial markets, news flow serves as a primary catalyst for stock price movements. Large one-day moves, driven by events like earnings surprises, regulatory developments, mergers and acquisitions (M&A), analyst upgrades/downgrades, or idiosyncratic shocks, require traders and portfolio managers to quickly answer: What caused this? This question is critical for intraday trading, portfolio rebalancing, risk attribution, and internal reporting. Historically, answering it involved manual analysis of news terminals, a process that is slow, error-prone, and unscalable across a broad equity universe. Keyword-based alert systems often produce noisy, low-precision results, further complicating the task.
This blog presents a scalable, end-to-end framework for attributing large stock price movements using agentic AI. Our Python-based pipeline integrates real-time price screening, multi-source news aggregation, large language model (LLM)-driven relevance scoring and summarization, and sentiment classification using FinBERT as well as LLMs. FinBERT is trained specifically on financial text, providing domain-optimized sentiment scoring that captures subtle tone shifts in market language. Designed to emulate the intuition of a seasoned analyst, the system delivers precise, low-latency explanations for stock price movements, often within seconds of market close. This pipeline was created using low- to no-cost resources as a proof of concept. By democratizing access to sophisticated financial analysis, it empowers practitioners, regardless of budget constraints, to scale insights across portfolios, making it a practical and reproducible tool for quantitative research, trading, and risk management.
System Architecture Overview
The pipeline is composed of six main stages, each of which replicates a part of the analyst's workflow when investigating large price swings:
Price Movement Detection: Identify stocks with abnormal daily returns.
Multi-Source News Retrieval: Aggregate relevant news from diverse sources.
Relevance Scoring with Gemini: Filter articles for company-specific relevance.
Article Summarization and Key Point Extraction: Condense news into actionable insights.
Sentiment Classification with FinBERT: Quantify the sentiment of news summaries.
Structured Output Generation and Reporting: Produce interpretable reports.
1. Price Signal Detection: Identifying Abnormal Returns
The pipeline begins with the union of the constituents of QMIT's QUMN and QLBO indices.
Using the yfinance (Yahoo Finance) library, we retrieve daily adjusted closing prices for the past three days and compute one-day percentage returns. Stocks with one-day absolute returns exceeding ±5% are flagged as significant. This threshold balances sensitivity to meaningful events (e.g., earnings, M&A rumors) with the need to filter out routine volatility. On average, 3–7 stocks are flagged daily, ensuring a manageable yet diverse set of candidates for analysis.
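A minimal sketch of this screen, assuming the yfinance library and a small placeholder ticker list standing in for the QUMN/QLBO union; the 5% cutoff mirrors the threshold described above.

```python
import yfinance as yf

# Placeholder universe; in the actual pipeline this is the union of QMIT's QUMN and QLBO constituents.
UNIVERSE = ["ARIS", "CROX", "UDMY"]
MOVE_THRESHOLD = 0.05  # flag absolute one-day returns of 5% or more

def flag_large_moves(tickers, threshold=MOVE_THRESHOLD):
    flagged = []
    for ticker in tickers:
        # Pull a few recent daily bars; auto_adjust folds dividends/splits into "Close".
        hist = yf.Ticker(ticker).history(period="5d", auto_adjust=True)
        if len(hist) < 2:
            continue
        one_day_return = hist["Close"].iloc[-1] / hist["Close"].iloc[-2] - 1.0
        if abs(one_day_return) >= threshold:
            flagged.append({
                "ticker": ticker,
                "date": hist.index[-1].date(),
                "pct_change": round(100 * one_day_return, 2),
            })
    return flagged

print(flag_large_moves(UNIVERSE))
```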
2. Multi-Source News Retrieval
Once the abnormal moves are flagged, the pipeline collects relevant news for each ticker-date combination. We use three major sources to maximize coverage and redundancy:
NewsAPI: Provides premium content from sources like Reuters, Bloomberg, The Wall Street Journal, CNBC, and Financial Times. We query using the company name or ticker, targeting articles within a ±1-day window of the price move (for relevance).
Google News RSS: Uses structured queries (e.g., "TICKER stock after:YYYY-MM-DD before:YYYY-MM-DD") to capture additional headlines, enhancing recall.
Yahoo Finance RSS: Supplies company-specific headlines, often with high relevance to stock movements.
Article headlines, URLs, and (where publicly accessible) full text are retrieved. Full article content is scraped using BeautifulSoup, with URLs normalized to remove query strings that may cause 404 errors. Headlines and URLs are deduplicated based on normalized text and domain paths to eliminate redundant or syndicated content, ensuring a clean dataset for analysis.
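A minimal sketch of the Google News RSS leg plus the URL normalization and deduplication steps, assuming the feedparser library; the query string follows the pattern shown above, and the article fields are simplified for illustration.

```python
import feedparser
from urllib.parse import quote_plus, urlsplit, urlunsplit

def google_news_headlines(ticker: str, start: str, end: str):
    # Structured query of the form "TICKER stock after:YYYY-MM-DD before:YYYY-MM-DD".
    query = f"{ticker} stock after:{start} before:{end}"
    url = f"https://news.google.com/rss/search?q={quote_plus(query)}"
    feed = feedparser.parse(url)
    return [{"title": entry.title, "link": entry.link} for entry in feed.entries]

def normalize_url(url: str) -> str:
    # Strip query strings and fragments that often cause 404s or duplicate entries.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def dedupe(articles):
    # Drop redundant or syndicated items that share a headline and normalized URL path.
    seen, unique = set(), []
    for article in articles:
        key = (article["title"].strip().lower(), normalize_url(article["link"]))
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```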
It’s critical to rely only on credible, reputable news sources to prevent contamination of the feed by the disinformation rampant on the internet.
3. Relevance Scoring with Gemini
Raw news feeds often include irrelevant content: industry-wide trends, broad market commentary, or passing mentions of a company with no material bearing on its stock price. To isolate genuinely explanatory articles, we employ the Gemini-2.5-Pro large language model to score the relevance of each article with appropriate prompts. A relevance score is extracted from Gemini's output. Articles scoring below a default threshold of 0.70 are discarded. If no articles meet that cutoff, the system lowers the threshold successively to 0.50 and 0.25. If no article crosses even the lowest bar, the system outputs: “No news articles found meeting relevancy score criteria.” This multi-tiered fallback ensures resilience while avoiding spurious attributions.
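A minimal sketch of the scoring-and-fallback logic, assuming the google-generativeai Python SDK; the prompt wording and the numeric-extraction step are illustrative assumptions, not the production prompt.

```python
import re
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.5-pro")

THRESHOLDS = [0.70, 0.50, 0.25]  # default cutoff with successive fallbacks

def score_relevance(ticker, move_pct, article_text):
    # Illustrative prompt: ask for a single 0-1 relevance number and parse it out.
    prompt = (
        f"{ticker} moved {move_pct:+.1f}% today. On a scale of 0 to 1, how relevant is the "
        f"following article to explaining that move? Reply with only a number.\n\n{article_text}"
    )
    response = model.generate_content(prompt)
    match = re.search(r"\d*\.?\d+", response.text)
    return float(match.group()) if match else 0.0

def filter_relevant(ticker, move_pct, articles):
    # Each article dict is assumed to carry its scraped text under the "text" key.
    scored = [(a, score_relevance(ticker, move_pct, a["text"])) for a in articles]
    for cutoff in THRESHOLDS:
        kept = [a for a, s in scored if s >= cutoff]
        if kept:
            return kept
    return []  # caller reports "No news articles found meeting relevancy score criteria."
```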
4. Structured Summarization of Key News
For the subset of high-relevance articles, we re-engage Gemini with a summarization task. The prompt instructs the model to extract up to three sentences that summarize the key drivers of the move, followed by a single narrative explanation of the likely cause and sentiment (positive, negative, or mixed). The key is to enumerate the 20 catalyst/news categories that we care about (a prompt sketch follows the list below):
1. Earnings Report
2. Guidance/Forecast
3. Merger & Acquisition (M&A)
4. Share Buyback
5. Analyst Upgrade/Downgrade
6. Dividend Changes
7. FDA/Clinical-Trial News
8. CEO/CFO Change
9. Stock Split Announcement
10. Index Rebalancing
11. Activist Investor News
12. Short Squeeze/Meme Hype (Wall Street Bets)
13. Insider Transactions
14. Litigation/Regulatory Action
15. Secondary Offering (Equity Issuance)
16. Cybersecurity Breach/Data Leak
17. Trade/Tariff News
18. Economic Data/Fed Announcements
19. Major Contract Wins or Product Launches
20. Patent/Licensing News
This stage of the process transforms a large corpus of unstructured text into a compact, human-readable summary suitable for internal notes, dashboards, or research briefings.
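A minimal sketch of how the summarization prompt might be assembled around those categories, reusing the Gemini model object from the relevance-scoring sketch; the wording and the build_summary_prompt helper are illustrative assumptions, not the exact production prompt.

```python
CATALYST_CATEGORIES = [
    "Earnings Report", "Guidance/Forecast", "Merger & Acquisition (M&A)", "Share Buyback",
    "Analyst Upgrade/Downgrade", "Dividend Changes", "FDA/Clinical-Trial News", "CEO/CFO Change",
    "Stock Split Announcement", "Index Rebalancing", "Activist Investor News",
    "Short Squeeze/Meme Hype", "Insider Transactions", "Litigation/Regulatory Action",
    "Secondary Offering (Equity Issuance)", "Cybersecurity Breach/Data Leak", "Trade/Tariff News",
    "Economic Data/Fed Announcements", "Major Contract Wins or Product Launches",
    "Patent/Licensing News",
]

def build_summary_prompt(ticker, move_pct, articles):
    # Concatenate the high-relevance article texts and ask for the structured summary.
    body = "\n\n".join(article["text"] for article in articles)
    categories = "; ".join(CATALYST_CATEGORIES)
    return (
        f"{ticker} moved {move_pct:+.1f}% today. Using the articles below:\n"
        f"1. Extract up to three sentences summarizing the key drivers of the move.\n"
        f"2. Classify the catalyst into one of: {categories}.\n"
        f"3. Give a one-paragraph explanation of the likely cause and overall sentiment "
        f"(positive, negative, or mixed).\n\n{body}"
    )

# summary = model.generate_content(build_summary_prompt("CROX", -29.24, relevant_articles)).text
```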
5. Sentiment Classification with FinBERT
Each Gemini-generated summary is then passed to FinBERT, a pre-trained transformer model fine-tuned for sentiment analysis in financial text. FinBERT outputs probabilities for three classes:
Positive
Negative
Neutral
We compute a sentiment score for the single-paragraph synopsis of all stories as the difference between positive and negative probabilities for each stock-date pair. This provides a normalized, interpretable measure of sentiment intensity. In most cases, the sentiment aligns with the direction of the stock move (e.g., highly positive scores on strong up days), providing additional signal validation. Formula for sentiment score:
Sentiment Score = P(positive) - P(negative)
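A minimal sketch of this scoring step, assuming the Hugging Face transformers library and the widely used ProsusAI/finbert checkpoint (the post does not pin a specific checkpoint, so that choice is an assumption):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

FINBERT_NAME = "ProsusAI/finbert"  # assumed checkpoint
finbert_tokenizer = AutoTokenizer.from_pretrained(FINBERT_NAME)
finbert_model = AutoModelForSequenceClassification.from_pretrained(FINBERT_NAME)

def sentiment_score(summary_paragraph: str) -> float:
    """Return P(positive) - P(negative) for the one-paragraph synopsis."""
    inputs = finbert_tokenizer(summary_paragraph, return_tensors="pt",
                               truncation=True, max_length=512)  # FinBERT's 512-token limit
    with torch.no_grad():
        logits = finbert_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze()
    label_probs = {finbert_model.config.id2label[i].lower(): probs[i].item()
                   for i in range(probs.shape[0])}
    return label_probs.get("positive", 0.0) - label_probs.get("negative", 0.0)
```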
6. Final Report Generation
All results are consolidated into a structured output table that includes the following fields for each stock and date:
Date: Date of the price movement.
Ticker: Stock ticker symbol.
Percent Change: One-day percentage return.
Number of Relevant Articles: Count of articles meeting the relevance threshold.
Sentiment Score: Sentiment score for the single paragraph synopsis.
Overall Sentiment Label: Classified as Positive (> 0.2), Negative (< -0.2), or Neutral otherwise.
Summary Paragraph: List of key news summaries.
Full LLM Explanation: Gemini’s narrative explanation.
The final output is suitable for internal reporting, post-trade review, or alerting systems. It is both interpretable and extensible—supporting applications in quant signal development, trader commentary, and risk diagnostics.
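A minimal sketch of assembling the output table, assuming pandas; the column names mirror the fields above and the label thresholds match those listed.

```python
import pandas as pd

def label_sentiment(score: float) -> str:
    # Thresholds from the report specification: > 0.2 Positive, < -0.2 Negative, else Neutral.
    if score > 0.2:
        return "Positive"
    if score < -0.2:
        return "Negative"
    return "Neutral"

def build_report(rows):
    # Each row is a dict produced by the earlier stages for one stock-date pair.
    df = pd.DataFrame(rows, columns=["Date", "Ticker", "Percent Change",
                                     "Number of Relevant Articles", "Sentiment Score",
                                     "Summary Paragraph", "Full LLM Explanation"])
    df["Overall Sentiment Label"] = df["Sentiment Score"].apply(label_sentiment)
    return df
```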
Preliminary Results and Observations
Testing on a sample of U.S. small- and mid-cap stocks on August 7, 2025, demonstrates the pipeline’s effectiveness:
Coverage: The system identified relevant articles for ~95% of large-move days under our ±5% threshold, with Yahoo Finance and Google News RSS providing robust coverage when premium sources were limited by paywalls.
Accuracy: LLM-generated summaries accurately pinpointed catalysts in cases with clear drivers (e.g., earnings announcements), validated by sentiment scores aligning with price direction (scores > 0.2 on up days, scores < -0.2 on down days). FinBERT sentiment on the paragraph synopsis is fairly reliable, but the far more parsimonious FinBERT model has clearly been superseded by GPT-5 and Gemini 2.5 Pro.
Null Cases: When no relevant news was found, the system correctly flagged ambiguous or technical price movements, mirroring a human analyst’s judgment.
Limitations: Paywalled articles reduced full-text availability, forcing reliance on headlines and descriptions. This occasionally limited Gemini’s summarization depth but did not significantly impair relevance scoring. It would not be a realistic constraint for a commercial production system with paid data access.
Case Study: ARIS +19.32% on August 7, 2025
Number of Relevant Articles Used: 5
Date: 2025-08-07
Ticker: ARIS (Aris Water Solutions)
Percent Change: +19.32%
Overall Sentiment Score: 0.9500 (Positive)
Top Headlines Summary:
Aris Water Solutions surged after Western Midstream Partners announced a definitive agreement to acquire the company. The $1.5 billion transaction was the key driver of the price increase. Acquisition news created a decisive positive sentiment, reflected in the immediate market reaction.
Analysis of Output: A major M&A announcement with clear, unambiguous implications drove strong positive sentiment and a near 20% gain; an excellent example of event-driven price action captured accurately by the LLM.
Case Study: CROX -29.24% on August 7, 2025
Number of Relevant Articles Used: 14
Date: 2025-08-07
Ticker: CROX (Crocs)
Percent Change: -29.24%
Overall Sentiment Score: -0.9000 (Negative)
Top Headlines Summary:
Crocs beat Q2 earnings and revenue estimates but issued weak forward guidance. The company forecasted a sales decline for the upcoming quarter and withheld its full-year outlook. Concerns about consumer demand and potential tariffs amplified the negative market reaction.
Analysis of Output: Despite an earnings beat, the sharp downward guidance shift generated strong negative sentiment, fully aligned with the steep sell-off. This is a textbook "guidance-over-earnings" market reaction, and a nuance the LLM was able to discern more readily than FinBERT.
Case Study: UDMY -5.43% on August 7, 2025
Number of Relevant Articles Used: 0
Date: 2025-08-07
Ticker: UDMY (Udemy)
Percent Change: -5.43%
Overall Sentiment Score: 0.0000 (Neutral)
Top Headlines Summary:
No articles met the relevancy cutoff for this ticker.
Analysis of Output: The absence of relevant news articles explains the neutral sentiment score despite the stock decline. This demonstrates the LLM's ability to avoid manufacturing sentiment where no substantive news context exists.
Discussion
The pipeline replicates an analyst’s workflow, identifying and explaining notable movers both during market hours and immediately after the close, with significant advantages:
Scalability: Processes hundreds of stocks daily, far exceeding manual capabilities.
Speed: Delivers insights within seconds, supporting real-time decision-making.
Precision: Combines LLM reasoning with transformer-based sentiment analysis for high-fidelity attributions.
Challenges include:
Ambiguity: In cases of no relevant news, price moves may stem from technical factors (e.g., short squeezes, rebalancing) not captured by news feeds.
Technological Vulnerability: Dependent on the reliability of the cloud provider, ISP, and external APIs.
We present a tabular comparison of GPT vs. FinBERT in Table 1 below to demonstrate that, to accurately capture the nuances of earnings calls and FOMC statements, it is important to ingest the full text via an LLM rather than the far smaller FinBERT (a finance-specialized BERT classifier). The attention mechanism of the LLM transformer architecture allows it to pick up on seemingly conflicting sentiment when weighing earnings vs. revenue beats/misses and forward guidance, with much greater accuracy and richer nuance, but at materially higher cost.
Table 1: FinBERT vs. LLM-based sentiment

| Dimension | FinBERT | LLM-based sentiment (e.g., GPT-4/ChatGPT) |
| --- | --- | --- |
| Model family & size | BERT-base encoder (≈110M params, 12 layers). Built for classification, no generative head. | Decoder-only or Mixture-of-Experts LLMs with 7B to 1T+ params; full text generation plus instruction tuning. |
| Training corpus | Continues BERT pre-training on financial news (Reuters TRC2) and fine-tunes on Financial PhraseBank sentences. | Trained first on massive general-web corpora; sentiment queries rely on zero-/few-shot prompting or light domain fine-tuning. Domain knowledge is incidental unless an extra financial adaptor is added. |
| Task formulation | Supervised 3-way classifier (positive/negative/neutral) with a softmax head; outputs probabilities directly. | Sentiment expressed via generated text (e.g., "POSITIVE, score = 0.87"). Requires prompt design or post-processing to map prose into numeric scores. |
| Compute & latency | Runs on a single CPU/GPU and accepts ≤512 tokens; deterministic inference. | High VRAM/TPU demand, longer context windows, higher latency and cost. |
| Adaptability | Must be re-trained to add classes or new jargon; architecture fixed. | New labels or dimensions (tone, policy stance, sarcasm) can be added with prompt engineering; no weight update needed. |
| Empirical accuracy | Long-standing baseline for finance; beats generic BERT on news and 10-K sentiment. | Recent studies show GPT-3.5/4 zero-shot beats FinBERT by ~35% F1 on forex-news sentiment, given carefully designed prompts. |
| Explainability | Attention heat-maps; limited to 512-token window. | Can return chain-of-thought justifications (if enabled), aiding auditability but also raising privacy risk. |
| Licensing / cost | Apache-2.0, free to deploy on-prem. | Mostly proprietary API access billed per-token; open-weight LLMs exist but still heavier than FinBERT. |
Conclusion and Future Work
This system provides a practical and scalable solution to a common workflow bottleneck in institutional asset management: determining the reason behind a stock’s large move. By combining structured data filtering, LLM-based reasoning, and sentiment modeling, it produces high-precision, low-latency attributions that align closely with how analysts and PMs think.
Future improvements may include:
Using web search APIs like Perplexity or Bing for broader news discovery
Incorporating SEC filings or earnings call transcripts for better attribution
Real-time deployment via cloud functions or trading dashboards
Extending the system to international stocks or ETFs
As markets become increasingly data-driven, such tools will be essential for bridging human expertise with computational scale, enabling faster and more informed financial decisions. This highlights the power of the Agentic-AI framework and the promise of deploying multi-agent systems to handle the multi-faceted issues above.