FMP
Jan 06, 2026
Financial markets react to news before numbers show up on charts. Earnings surprises, regulatory updates, analyst commentary, and macro headlines often shift sentiment first and price later. Traders and analysts who track only prices miss this early signal.
Manual news reading does not scale. Headline-based sentiment tools also fall short because they ignore context, tone, and company-specific language. Financial news carries nuance—words like guidance, downgrade, or headwinds change meaning based on how and where they appear.
In this article, we build an NLP-powered sentiment analyzer using the FMP News API as the data source. We fetch structured financial news, clean and normalize the text, apply an NLP sentiment model, and aggregate sentiment at the company level. The goal is simple: convert raw financial news into machine-readable sentiment signals that plug directly into quantitative workflows.
Sentiment in this pipeline acts as an input signal that complements price, volume, and fundamentals—it is not intended to function as a standalone trading decision.
Markets move on perception before confirmation. Price reacts after participants interpret news, not when the event actually occurs. News sentiment captures this interpretation layer early, often before fundamentals or technical indicators adjust.
Quantitative systems benefit from sentiment because it adds context to price and volume data. A price breakout backed by positive news sentiment carries more conviction than a breakout in isolation. Similarly, rising negative sentiment can warn of downside risk even when prices still look stable.
News sentiment also helps normalize information overload. Hundreds of articles may cover the same company in a single week. NLP-based sentiment analysis compresses this unstructured text into structured signals that models can consume consistently.
Instead of describing events, sentiment analysis captures market interpretation. This makes it useful for alpha generation, risk monitoring, and event-driven strategies.
The FMP News API provides structured, machine-readable financial news sourced from trusted publishers and market feeds. Instead of scraping headlines or parsing unstructured articles, you receive clean JSON responses designed for programmatic analysis.
Each news item includes essential metadata such as the company symbol, publication timestamp, title, and full article text. This structure makes it easy to filter news by ticker, date range, or relevance before passing the content into an NLP pipeline.
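The exact payload can vary by plan and endpoint, but a single item has roughly the shape below. The values here are invented for illustration; the field names are the ones the code later in this article relies on.

```python
# Illustrative shape of one item returned by the stock news endpoints
# (values are invented; field names match those used in the rest of this article)
sample_item = {
    "symbol": "AAPL",
    "publishedDate": "2026-01-06 14:32:00",
    "title": "Apple raises guidance ahead of product event",
    "site": "example.com",
    "url": "https://example.com/apple-guidance",
    "text": "Full article body goes here ...",
}
```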
For sentiment analysis, data quality matters more than volume. The FMP News API delivers consistently formatted text, which reduces preprocessing overhead and minimizes noise during tokenization and scoring. You spend less time cleaning data and more time extracting signal.
Because the API comes from Financial Modeling Prep, it integrates naturally with other core financial datasets such as stock prices, company fundamentals, and earnings data. This makes it straightforward to combine news sentiment with market movements and financial performance in a single workflow. You can access these related datasets through FMP's stable APIs.
A reliable sentiment system starts with a clear pipeline, not with a model. Each step should transform unstructured news into structured signals without losing financial context.
The pipeline begins by ingesting raw news from the FMP News API. At this stage, the system filters articles by symbol, time window, or relevance to avoid unnecessary noise. Clean inputs keep downstream processing efficient and predictable.
Next, the pipeline standardizes the text. It removes boilerplate content, normalizes casing, and preserves financially meaningful tokens such as tickers, percentages, and currency values. This step ensures the sentiment model sees consistent and comparable inputs.
The sentiment engine then processes the cleaned text and assigns scores or labels to each article. Instead of treating sentiment as a binary outcome, the pipeline keeps continuous scores to capture intensity and uncertainty.
Finally, the system aggregates article-level sentiment into company-level signals. These outputs serve as structured features that downstream models, screens, or research workflows can consume directly. The result is a clean, repeatable pipeline that converts financial news into actionable sentiment data.
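To make the flow concrete, here is a minimal sketch of how the stages compose end to end. The function names are the ones built in the sections that follow, so treat this as a roadmap for the rest of the article.

```python
def run_sentiment_pipeline(symbols, limit=50):
    """End-to-end sketch: ingest -> clean -> score -> aggregate."""
    news_df = fetch_stock_news_by_symbols(symbols, limit=limit)    # 1. ingest raw news
    clean_df = preprocess_news_df(news_df)                         # 2. standardize text
    scored_df = score_news_finbert(clean_df, text_col="doc_text")  # 3. article-level sentiment
    scored_df = add_news_date(scored_df)                           # 4a. daily bucket per article
    return daily_company_sentiment(scored_df)                      # 4b. company-level signal
```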
FMP gives you two simple ways to pull stock news: a symbol-filtered endpoint (/news/stock?symbols=...) for company-specific articles, and a latest-news stream (/news/stock-latest) for the broad market feed.
FMP uses a stable base URL and expects your API key as a query parameter.
To make these API calls, you'll need your own Financial Modeling Prep API key. You can generate one by creating an FMP account and selecting a plan that includes news data access.
Rate limits and per-request limits depend on your plan tier, which means parameters such as limit may behave differently across accounts. If you hit unexpected caps or truncated responses, it's usually due to plan-level constraints rather than an issue with the API itself.
You can review available plans and generate an API key here.
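If you do hit plan-level throttling in batch jobs, a light retry wrapper avoids hard failures. This is a minimal sketch that assumes throttling surfaces as an HTTP 429 (or transient 5xx) status; adapt it to whatever your plan actually returns.

```python
import time
import requests

def get_with_retry(url, params, max_retries=3, backoff=2.0, timeout=30):
    """Retry GET requests on throttling (429) or transient server errors (5xx)."""
    resp = None
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, timeout=timeout)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(backoff * (attempt + 1))  # simple linear backoff
            continue
        resp.raise_for_status()
        return resp
    resp.raise_for_status()  # surface the last error if every attempt was throttled
    return resp
```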
```python
import os
import requests
import pandas as pd

FMP_API_KEY = os.getenv("FMP_API_KEY")  # set this in your environment
BASE_URL = "https://financialmodelingprep.com/stable"

def fetch_stock_news_by_symbols(symbols, limit=50):
    """
    Pull company-specific stock news for one or more symbols.
    Example endpoint: /stable/news/stock?symbols=AAPL
    """
    if not FMP_API_KEY:
        raise ValueError("Missing FMP_API_KEY env var.")

    if isinstance(symbols, (list, tuple, set)):
        symbols = ",".join(symbols)

    url = f"{BASE_URL}/news/stock"
    params = {
        "symbols": symbols,
        "limit": limit,  # if your plan supports it
        "apikey": FMP_API_KEY
    }

    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    data = r.json()

    df = pd.DataFrame(data)

    # Keep only the fields we typically need for NLP
    keep = [c for c in ["symbol", "publishedDate", "title", "text", "url", "site"] if c in df.columns]
    return df[keep] if keep else df

def fetch_latest_stock_news(page=0, limit=20):
    """
    Pull the latest stock news stream.
    Example endpoint: /stable/news/stock-latest?page=0&limit=20
    """
    if not FMP_API_KEY:
        raise ValueError("Missing FMP_API_KEY env var.")

    url = f"{BASE_URL}/news/stock-latest"
    params = {
        "page": page,
        "limit": limit,
        "apikey": FMP_API_KEY
    }

    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    return pd.DataFrame(data)

# Example usage
news_df = fetch_stock_news_by_symbols(["AAPL", "MSFT"], limit=50)
print(news_df.head(3))
```
This step gives you a clean DataFrame with the fields you need for NLP: title + full text + timestamp + symbol.
Next, we'll standardize the text so the sentiment model sees consistent inputs.
Raw financial news contains noise that can distort sentiment scores if you pass it directly to a model. Boilerplate phrases, disclaimers, and formatting artifacts add tokens without adding meaning. A focused preprocessing step fixes this early.
Start by merging the title and article text. Headlines often carry strong sentiment cues, while the body provides context. Treat them as a single document to avoid underweighting either signal.
Next, normalize the text: decode HTML entities, strip URLs and leftover HTML tags, drop boilerplate phrases such as "click here" or "read more", lowercase everything, and collapse repeated whitespace.
At the same time, preserve financially meaningful tokens. Do not strip tickers, percentages, or currency values. Terms like +5%, $2B, or EPS often carry sentiment weight and should remain intact.
Finally, remove generic stopwords while keeping finance-specific vocabulary. Words such as upgrade, downgrade, guidance, miss, or beat must stay. Over-aggressive cleaning weakens sentiment accuracy in financial text.
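The cleaning function in the next step deliberately leaves stopwords alone. If you do want to remove them, protect finance-relevant words with an explicit keep-list; the sketch below uses small illustrative word lists, not exhaustive ones.

```python
# Illustrative stopword handling: a generic list minus finance-relevant terms.
# In practice you might start from a standard list (NLTK, spaCy) and prune it.
GENERIC_STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with",
    "that", "this", "is", "are", "was", "were", "it", "its", "as", "at", "by",
}
FINANCE_KEEP = {"up", "down", "over", "under", "not", "no",
                "beat", "miss", "guidance", "upgrade", "downgrade"}

def remove_generic_stopwords(text: str) -> str:
    """Drop generic stopwords while keeping direction, negation, and finance terms."""
    stop = GENERIC_STOPWORDS - FINANCE_KEEP
    return " ".join(tok for tok in text.split() if tok.lower() not in stop)
```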
Preprocessing choices directly influence sentiment score distributions. Small changes such as removing certain tokens, collapsing numbers, or filtering boilerplate too aggressively can shift the balance between positive, neutral, and negative labels. For this reason, preprocessing steps should be validated with spot checks and basic distribution reviews before trusting downstream sentiment outputs.
```python
import re
import html
import pandas as pd

# --- Core cleaners (fast + practical) ---
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_HTML_TAG_RE = re.compile(r"<[^>]+>")
_MULTI_SPACE_RE = re.compile(r"\s+")

# Optional: remove common boilerplate phrases (you can expand over time)
BOILERPLATE_PATTERNS = [
    r"\bclick here\b.*",   # "click here to..."
    r"\bread more\b.*",
    r"\bsubscribe\b.*",
    r"\bthis story was (originally )?published\b.*",
    r"\bnot financial advice\b.*",
]
_BOILERPLATE_RE = re.compile("|".join(BOILERPLATE_PATTERNS), flags=re.IGNORECASE)

def clean_financial_news_text(title: str, text: str) -> str:
    """
    Clean financial news while preserving important tokens:
    - tickers (AAPL, TSLA)
    - currency ($, ₹, €, etc.)
    - percentages (5%, 2.5%)
    - abbreviations like EPS, YoY, QoQ
    """
    title = title or ""
    text = text or ""

    # Combine headline + body (headline carries strong sentiment)
    combined = f"{title}. {text}".strip()

    # Decode HTML entities (e.g., &amp; -> &)
    combined = html.unescape(combined)

    # Remove URLs and HTML tags
    combined = _URL_RE.sub(" ", combined)
    combined = _HTML_TAG_RE.sub(" ", combined)

    # Remove boilerplate (optional but helpful)
    combined = _BOILERPLATE_RE.sub(" ", combined)

    # Normalize apostrophes and dashes a bit
    combined = combined.replace("\u2019", "'").replace("\u2013", "-").replace("\u2014", "-")

    # Keep finance tokens, remove weird leftover symbols except common finance ones
    # Allowed: letters/numbers, spaces, and these symbols: $ ₹ € % . , - + / & ' :
    combined = re.sub(r"[^a-zA-Z0-9\s\$\₹\€\%\.\,\-\+\/\&\'\:]", " ", combined)

    # Lowercase for consistency (works well for most models)
    combined = combined.lower()

    # Collapse extra whitespace
    combined = _MULTI_SPACE_RE.sub(" ", combined).strip()

    return combined

def preprocess_news_df(news_df: pd.DataFrame) -> pd.DataFrame:
    """
    Adds:
    - doc_text: cleaned text used for sentiment
    """
    df = news_df.copy()

    if "title" not in df.columns:
        df["title"] = ""
    if "text" not in df.columns:
        df["text"] = ""

    df["doc_text"] = [
        clean_financial_news_text(t, x)
        for t, x in zip(df["title"].astype(str), df["text"].astype(str))
    ]

    # Drop empty docs (no signal)
    df = df[df["doc_text"].str.len() > 0].reset_index(drop=True)
    return df

# Example usage
# news_df = fetch_stock_news_by_symbols(["AAPL", "MSFT"], limit=50)
clean_df = preprocess_news_df(news_df)
```
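As suggested above, spot-check what cleaning actually did before trusting any scores. A minimal review compares raw and cleaned lengths and eyeballs a few documents:

```python
# Quick sanity checks on the cleaned corpus before scoring
check = clean_df.copy()
check["raw_len"] = (check["title"].astype(str) + " " + check["text"].astype(str)).str.len()
check["clean_len"] = check["doc_text"].str.len()

print(check[["raw_len", "clean_len"]].describe())             # did cleaning remove a sane amount?
print(check.sample(min(3, len(check)))["doc_text"].tolist())  # eyeball a few cleaned documents
```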
Now you have doc_text ready for scoring. In this step, you convert each news document into a sentiment label and a numeric score you can aggregate later. Sentiment labels and scores are model-dependent outputs and should be treated as probabilistic signals rather than ground truth classifications.
I'll share a production-friendly setup with two options: FinBERT, a transformer model trained on financial text, and VADER, a fast lexicon-based baseline.
```python
import numpy as np
import pandas as pd
from transformers import pipeline

def build_finbert_pipeline(device: int = -1):
    """
    device = -1 for CPU, >=0 for GPU
    """
    model_name = "ProsusAI/finbert"  # widely used finance sentiment model
    clf = pipeline(
        task="text-classification",
        model=model_name,
        tokenizer=model_name,
        return_all_scores=True,
        device=device
    )
    return clf

def finbert_score_one(clf, text: str):
    """
    Returns:
    - label: positive/negative/neutral
    - score: continuous sentiment score in [-1, 1] computed as P(positive) - P(negative)
    - probs: dict of probabilities
    """
    scores = clf(text[:512])[0]  # truncate to avoid very long texts
    probs = {d["label"].lower(): float(d["score"]) for d in scores}

    pos = probs.get("positive", 0.0)
    neg = probs.get("negative", 0.0)
    neu = probs.get("neutral", 0.0)

    sentiment_score = pos - neg  # range ~[-1, 1]
    label = max([("positive", pos), ("neutral", neu), ("negative", neg)], key=lambda x: x[1])[0]
    return label, sentiment_score, probs

def score_news_finbert(df: pd.DataFrame, text_col: str = "doc_text", device: int = -1) -> pd.DataFrame:
    clf = build_finbert_pipeline(device=device)

    labels, scores, pos_probs, neg_probs, neu_probs = [], [], [], [], []
    for text in df[text_col].fillna("").astype(str):
        label, score, probs = finbert_score_one(clf, text)
        labels.append(label)
        scores.append(score)
        pos_probs.append(probs.get("positive", 0.0))
        neg_probs.append(probs.get("negative", 0.0))
        neu_probs.append(probs.get("neutral", 0.0))

    out = df.copy()
    out["sentiment_label"] = labels
    out["sentiment_score"] = scores
    out["p_pos"] = pos_probs
    out["p_neg"] = neg_probs
    out["p_neu"] = neu_probs
    return out

# Example usage
scored_df = score_news_finbert(clean_df, text_col="doc_text", device=-1)
```
Note that the code above truncates each document to its first 512 characters as a rough guard against the model's 512-token input limit. For long-form articles, this drops later context and can bias sentiment toward the opening paragraphs. A chunking-based approach for handling long texts is discussed in the limitations section below.
VADER is a general-purpose sentiment model and is not trained on financial language. It works well for quick prototyping, sanity checks, and baseline comparisons, but for production-grade financial sentiment analysis a finance-specific model such as FinBERT is the safer choice.
```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def score_news_vader(df: pd.DataFrame, text_col: str = "doc_text") -> pd.DataFrame:
    analyzer = SentimentIntensityAnalyzer()

    out = df.copy()
    compound_scores = []
    labels = []

    for text in out[text_col].fillna("").astype(str):
        s = analyzer.polarity_scores(text)
        compound = float(s["compound"])  # in [-1, 1]
        compound_scores.append(compound)

        if compound >= 0.05:
            labels.append("positive")
        elif compound <= -0.05:
            labels.append("negative")
        else:
            labels.append("neutral")

    out["sentiment_score"] = compound_scores
    out["sentiment_label"] = labels
    return out

# Example usage
# scored_df = score_news_vader(clean_df, text_col="doc_text")
```
At the end of this section, you have article-level sentiment per row.
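Before aggregating, it is worth reviewing the score and label distributions, which ties back to the preprocessing validation discussed earlier. An all-neutral or heavily skewed distribution is usually a pipeline problem, not a market signal.

```python
# Distribution review of article-level sentiment
print(scored_df["sentiment_label"].value_counts(normalize=True))
print(scored_df["sentiment_score"].describe())

# Spot-check the strongest positive and negative articles
cols = [c for c in ["symbol", "title", "sentiment_score"] if c in scored_df.columns]
print(scored_df.nlargest(3, "sentiment_score")[cols])
print(scored_df.nsmallest(3, "sentiment_score")[cols])
```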
Next, we'll aggregate these scores into a company-level sentiment signal you can track day-by-day or over rolling windows.
At this stage, scored_df contains one sentiment score per article. Now you convert it into a company-level signal by aggregating sentiment by symbol and date. This step makes the output usable for screening, modeling, and monitoring.
Aggregation windows should be chosen based on the downstream use case. Short windows may suit event-driven trading, while longer windows are more appropriate for monitoring trends or supporting research workflows. Using daily sentiment signals without aligning them to the intended application can lead to overreaction or misinterpretation.
```python
import pandas as pd

def add_news_date(df: pd.DataFrame, published_col: str = "publishedDate") -> pd.DataFrame:
    out = df.copy()
    out[published_col] = pd.to_datetime(out[published_col], errors="coerce", utc=True)
    out = out.dropna(subset=[published_col]).reset_index(drop=True)
    out["news_date"] = out[published_col].dt.date  # daily bucket
    return out

scored_df = add_news_date(scored_df)
```
```python
def daily_company_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    """
    Returns one row per (symbol, news_date) with:
    - avg sentiment
    - volume (#articles)
    - positive/negative/neutral counts
    """
    g = df.groupby(["symbol", "news_date"], as_index=False)
    daily = g.agg(
        avg_sentiment=("sentiment_score", "mean"),
        med_sentiment=("sentiment_score", "median"),
        news_count=("sentiment_score", "size"),
        pos_count=("sentiment_label", lambda s: (s == "positive").sum()),
        neg_count=("sentiment_label", lambda s: (s == "negative").sum()),
        neu_count=("sentiment_label", lambda s: (s == "neutral").sum()),
    )

    # Optional: a simple "balance" metric that penalizes negative coverage
    daily["sentiment_balance"] = (daily["pos_count"] - daily["neg_count"]) / daily["news_count"].clip(lower=1)

    return daily.sort_values(["symbol", "news_date"]).reset_index(drop=True)

daily_df = daily_company_sentiment(scored_df)
daily_df.head(10)
```
Rolling features help you avoid overreacting to a single viral headline. They introduce lag by design: the same smoothing that reduces noise also delays the signal's response, an important tradeoff when sentiment feeds faster or event-driven strategies.
```python
def add_rolling_sentiment(daily_df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    out = daily_df.copy()
    out["news_date"] = pd.to_datetime(out["news_date"])
    out = out.sort_values(["symbol", "news_date"])

    out[f"roll_{window}d_sentiment"] = (
        out.groupby("symbol")["avg_sentiment"]
        .transform(lambda s: s.rolling(window=window, min_periods=1).mean())
    )
    out[f"roll_{window}d_news_count"] = (
        out.groupby("symbol")["news_count"]
        .transform(lambda s: s.rolling(window=window, min_periods=1).sum())
    )
    return out

daily_df = add_rolling_sentiment(daily_df, window=7)
daily_df[["symbol", "news_date", "avg_sentiment", "roll_7d_sentiment", "news_count", "roll_7d_news_count"]].head(10)
```
Now you have a compact company-level dataset: daily sentiment + volume + rolling sentiment.
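To study these signals next to market data, as discussed earlier, join the daily sentiment table against daily prices on symbol and date. The sketch below assumes you already have a prices_df with symbol, date, and close columns (for example from FMP's historical price endpoints, which are not built in this article); the merge logic is the part being illustrated.

```python
def merge_sentiment_with_prices(daily_df: pd.DataFrame, prices_df: pd.DataFrame) -> pd.DataFrame:
    """
    Join company-level daily sentiment with daily closing prices.
    Assumes prices_df has columns: symbol, date, close (built outside this article).
    """
    s = daily_df.copy()
    p = prices_df.copy()
    s["news_date"] = pd.to_datetime(s["news_date"])
    p["date"] = pd.to_datetime(p["date"])

    merged = s.merge(p, left_on=["symbol", "news_date"],
                     right_on=["symbol", "date"], how="left").drop(columns=["date"])
    merged = merged.sort_values(["symbol", "news_date"])

    # Weekend/holiday news rows will have NaN prices in this simple daily join.
    merged["next_close"] = merged.groupby("symbol")["close"].shift(-1)
    merged["next_day_return"] = merged["next_close"] / merged["close"] - 1.0
    return merged
```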
Even with a clean pipeline, financial news sentiment comes with edge cases. Knowing these limits helps you design safer and more reliable systems.
Transformer models like FinBERT process a limited number of tokens. Truncation can drop important context from long reports.
Improvement: chunk long articles and average sentiment across chunks.
```python
def chunk_text(text: str, max_chars: int = 450):
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def score_with_chunks(clf, text: str):
    chunks = chunk_text(text)
    scores = []
    for c in chunks:
        _, s, _ = finbert_score_one(clf, c)
        scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0
```
The same story often appears across multiple publishers. Counting all of them inflates sentiment strength.
Improvement: de-duplicate using headline similarity.
```python
from difflib import SequenceMatcher

def is_duplicate(title_a, title_b, threshold=0.9):
    return SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio() >= threshold
```
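Applying that pairwise check across each day's coverage is enough at modest volumes. The helper below (its name is illustrative, not part of the pipeline above) is a naive O(n²) sketch that keeps the first occurrence of each near-duplicate headline; run it on scored_df before aggregating.

```python
def drop_duplicate_stories(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """
    Naive near-duplicate removal per (symbol, news_date): keep the first article,
    drop later ones whose titles are near-identical. O(n^2) per group, which is
    fine for daily per-symbol volumes but not for very large corpora.
    """
    keep_rows = []
    for _, group in df.groupby(["symbol", "news_date"], sort=False):
        kept_titles = []
        for idx, row in group.iterrows():
            title = str(row.get("title", ""))
            if any(is_duplicate(title, seen, threshold) for seen in kept_titles):
                continue
            kept_titles.append(title)
            keep_rows.append(idx)
    return df.loc[keep_rows].sort_index()

# Example usage (before aggregation):
# scored_df = drop_duplicate_stories(scored_df)
```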
Language changes. Market jargon evolves.
Improvement: periodically re-evaluate sentiment distributions and retrain or fine-tune when drift appears.
```python
scored_df["sentiment_bucket"] = pd.cut(
    scored_df["sentiment_score"],
    bins=[-1, -0.3, 0.3, 1],
    labels=["negative", "neutral", "positive"]
)
```
These refinements keep sentiment signals stable, interpretable, and production-ready.
In this article, we built an end-to-end NLP-powered sentiment analyzer using the FMP News API as the data foundation. Starting from structured financial news, we designed a clean pipeline that preprocesses text, applies a finance-aware sentiment model, and aggregates signals at the company level.
This approach avoids fragile heuristics and manual interpretation. It converts unstructured news into consistent, machine-readable sentiment features that fit naturally into quantitative workflows. Because the pipeline stays modular, you can extend it with better models, event tagging, or tighter aggregation logic as your use case evolves.
With high-quality news data from Financial Modeling Prep and a disciplined NLP pipeline, sentiment analysis becomes a practical signal—not a black box.