FMP
Jan 06, 2026
Financial markets react to news before numbers show up on charts. Earnings surprises, regulatory updates, analyst commentary, and macro headlines often shift sentiment first and price later. Traders and analysts who track only prices miss this early signal.
Manual news reading does not scale. Headline-based sentiment tools also fall short because they ignore context, tone, and company-specific language. Financial news carries nuance—words like guidance, downgrade, or headwinds change meaning based on how and where they appear.
In this article, we build an NLP-powered sentiment analyzer using the FMP News API as the data source. We fetch structured financial news, clean and normalize the text, apply an NLP sentiment model, and aggregate sentiment at the company level. The goal is simple: convert raw financial news into machine-readable sentiment signals that plug directly into quantitative workflows.
Sentiment in this pipeline acts as an input signal that complements price, volume, and fundamentals—it is not intended to function as a standalone trading decision.
Markets move on perception before confirmation. Price reacts after participants interpret news, not when the event actually occurs. News sentiment captures this interpretation layer early, often before fundamentals or technical indicators adjust.
Quantitative systems benefit from sentiment because it adds context to price and volume data. A price breakout backed by positive news sentiment carries more conviction than a breakout in isolation. Similarly, rising negative sentiment can warn of downside risk even when prices still look stable.
News sentiment also helps normalize information overload. Hundreds of articles may cover the same company in a single week. NLP-based sentiment analysis compresses this unstructured text into structured signals that models can consume consistently.
Instead of describing events, sentiment analysis captures market interpretation. This makes it useful for alpha generation, risk monitoring, and event-driven strategies.
The FMP News API provides structured, machine-readable financial news sourced from trusted publishers and market feeds. Instead of scraping headlines or parsing unstructured articles, you receive clean JSON responses designed for programmatic analysis.
Each news item includes essential metadata such as the company symbol, publication timestamp, title, and full article text. This structure makes it easy to filter news by ticker, date range, or relevance before passing the content into an NLP pipeline.
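The exact payload can vary by plan and endpoint, but a single item has roughly the shape below. The values here are invented for illustration; the field names are the ones the code later in this article relies on.

```python
# Illustrative shape of one item returned by the stock news endpoints
# (values are invented; field names match those used in the rest of this article)
sample_item = {
    "symbol": "AAPL",
    "publishedDate": "2026-01-06 14:32:00",
    "title": "Apple raises guidance ahead of product event",
    "site": "example.com",
    "url": "https://example.com/apple-guidance",
    "text": "Full article body goes here ...",
}
```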
For sentiment analysis, data quality matters more than volume. The FMP News API delivers consistently formatted text, which reduces preprocessing overhead and minimizes noise during tokenization and scoring. You spend less time cleaning data and more time extracting signal.
Because the API comes from Financial Modeling Prep, it integrates naturally with other core financial datasets such as stock prices, company fundamentals, and earnings data. This makes it straightforward to combine news sentiment with market movements and financial performance in a single workflow. You can access these related datasets through FMP's stable APIs.
A reliable sentiment system starts with a clear pipeline, not with a model. Each step should transform unstructured news into structured signals without losing financial context.
The pipeline begins by ingesting raw news from the FMP News API. At this stage, the system filters articles by symbol, time window, or relevance to avoid unnecessary noise. Clean inputs keep downstream processing efficient and predictable.
Next, the pipeline standardizes the text. It removes boilerplate content, normalizes casing, and preserves financially meaningful tokens such as tickers, percentages, and currency values. This step ensures the sentiment model sees consistent and comparable inputs.
The sentiment engine then processes the cleaned text and assigns scores or labels to each article. Instead of treating sentiment as a binary outcome, the pipeline keeps continuous scores to capture intensity and uncertainty.
Finally, the system aggregates article-level sentiment into company-level signals. These outputs serve as structured features that downstream models, screens, or research workflows can consume directly. The result is a clean, repeatable pipeline that converts financial news into actionable sentiment data.
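To make the flow concrete, here is a minimal sketch of how the stages compose end to end. The function names are the ones built in the sections that follow, so treat this as a roadmap for the rest of the article.

```python
def run_sentiment_pipeline(symbols, limit=50):
    """End-to-end sketch: ingest -> clean -> score -> aggregate."""
    news_df = fetch_stock_news_by_symbols(symbols, limit=limit)    # 1. ingest raw news
    clean_df = preprocess_news_df(news_df)                         # 2. standardize text
    scored_df = score_news_finbert(clean_df, text_col="doc_text")  # 3. article-level sentiment
    scored_df = add_news_date(scored_df)                           # 4a. daily bucket per article
    return daily_company_sentiment(scored_df)                      # 4b. company-level signal
```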
FMP gives you two simple ways to pull stock news: a symbol-filtered endpoint (/news/stock?symbols=...) for company-specific articles, and a latest-news stream (/news/stock-latest) for the broad market feed.
FMP uses a stable base URL and expects your API key as a query parameter.
To make these API calls, you'll need your own Financial Modeling Prep API key. You can generate one by creating an FMP account and selecting a plan that includes news data access.
Rate limits and per-request limits depend on your plan tier, which means parameters such as limit may behave differently across accounts. If you hit unexpected caps or truncated responses, it's usually due to plan-level constraints rather than an issue with the API itself.
You can review available plans and generate an API key here.
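If you do hit plan-level throttling in batch jobs, a light retry wrapper avoids hard failures. This is a minimal sketch that assumes throttling surfaces as an HTTP 429 (or transient 5xx) status; adapt it to whatever your plan actually returns.

```python
import time
import requests

def get_with_retry(url, params, max_retries=3, backoff=2.0, timeout=30):
    """Retry GET requests on throttling (429) or transient server errors (5xx)."""
    resp = None
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, timeout=timeout)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(backoff * (attempt + 1))  # simple linear backoff
            continue
        resp.raise_for_status()
        return resp
    resp.raise_for_status()  # surface the last error if every attempt was throttled
    return resp
```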
```python
import os
import requests
import pandas as pd

FMP_API_KEY = os.getenv("FMP_API_KEY")  # set this in your environment
BASE_URL = "https://financialmodelingprep.com/stable"

def fetch_stock_news_by_symbols(symbols, limit=50):
    """
    Pull company-specific stock news for one or more symbols.
    Example endpoint: /stable/news/stock?symbols=AAPL
    """
    if not FMP_API_KEY:
        raise ValueError("Missing FMP_API_KEY env var.")

    if isinstance(symbols, (list, tuple, set)):
        symbols = ",".join(symbols)

    url = f"{BASE_URL}/news/stock"
    params = {
        "symbols": symbols,
        "limit": limit,  # if your plan supports it
        "apikey": FMP_API_KEY
    }

    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    data = r.json()

    df = pd.DataFrame(data)

    # Keep only the fields we typically need for NLP
    keep = [c for c in ["symbol", "publishedDate", "title", "text", "url", "site"] if c in df.columns]
    return df[keep] if keep else df

def fetch_latest_stock_news(page=0, limit=20):
    """
    Pull the latest stock news stream.
    Example endpoint: /stable/news/stock-latest?page=0&limit=20
    """
    if not FMP_API_KEY:
        raise ValueError("Missing FMP_API_KEY env var.")

    url = f"{BASE_URL}/news/stock-latest"
    params = {
        "page": page,
        "limit": limit,
        "apikey": FMP_API_KEY
    }

    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    data = r.json()
    return pd.DataFrame(data)

# Example usage
news_df = fetch_stock_news_by_symbols(["AAPL", "MSFT"], limit=50)
print(news_df.head(3))
```
This step gives you a clean DataFrame with the fields you need for NLP: title + full text + timestamp + symbol.
Next, we'll standardize the text so the sentiment model sees consistent inputs.
Raw financial news contains noise that can distort sentiment scores if you pass it directly to a model. Boilerplate phrases, disclaimers, and formatting artifacts add tokens without adding meaning. A focused preprocessing step fixes this early.
Start by merging the title and article text. Headlines often carry strong sentiment cues, while the body provides context. Treat them as a single document to avoid underweighting either signal.
Next, normalize the text: decode HTML entities, strip URLs and leftover HTML tags, drop boilerplate phrases such as "click here" or "read more", lowercase everything, and collapse repeated whitespace.
At the same time, preserve financially meaningful tokens. Do not strip tickers, percentages, or currency values. Terms like +5%, $2B, or EPS often carry sentiment weight and should remain intact.
Finally, remove generic stopwords while keeping finance-specific vocabulary. Words such as upgrade, downgrade, guidance, miss, or beat must stay. Over-aggressive cleaning weakens sentiment accuracy in financial text.
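The cleaning function in the next step deliberately leaves stopwords alone. If you do want to remove them, protect finance-relevant words with an explicit keep-list; the sketch below uses small illustrative word lists, not exhaustive ones.

```python
# Illustrative stopword handling: a generic list minus finance-relevant terms.
# In practice you might start from a standard list (NLTK, spaCy) and prune it.
GENERIC_STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with",
    "that", "this", "is", "are", "was", "were", "it", "its", "as", "at", "by",
}
FINANCE_KEEP = {"up", "down", "over", "under", "not", "no",
                "beat", "miss", "guidance", "upgrade", "downgrade"}

def remove_generic_stopwords(text: str) -> str:
    """Drop generic stopwords while keeping direction, negation, and finance terms."""
    stop = GENERIC_STOPWORDS - FINANCE_KEEP
    return " ".join(tok for tok in text.split() if tok.lower() not in stop)
```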
Preprocessing choices directly influence sentiment score distributions. Small changes such as removing certain tokens, collapsing numbers, or filtering boilerplate too aggressively can shift the balance between positive, neutral, and negative labels. For this reason, preprocessing steps should be validated with spot checks and basic distribution reviews before trusting downstream sentiment outputs.
```python
import re
import html
import pandas as pd

# --- Core cleaners (fast + practical) ---
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_HTML_TAG_RE = re.compile(r"<[^>]+>")
_MULTI_SPACE_RE = re.compile(r"\s+")

# Optional: remove common boilerplate phrases (you can expand over time)
BOILERPLATE_PATTERNS = [
    r"\bclick here\b.*",   # "click here to..."
    r"\bread more\b.*",
    r"\bsubscribe\b.*",
    r"\bthis story was (originally )?published\b.*",
    r"\bnot financial advice\b.*",
]
_BOILERPLATE_RE = re.compile("|".join(BOILERPLATE_PATTERNS), flags=re.IGNORECASE)

def clean_financial_news_text(title: str, text: str) -> str:
    """
    Clean financial news while preserving important tokens:
    - tickers (AAPL, TSLA)
    - currency ($, ₹, €, etc.)
    - percentages (5%, 2.5%)
    - abbreviations like EPS, YoY, QoQ
    """
    title = title or ""
    text = text or ""

    # Combine headline + body (headline carries strong sentiment)
    combined = f"{title}. {text}".strip()

    # Decode HTML entities (e.g., &amp; -> &)
    combined = html.unescape(combined)

    # Remove URLs and HTML tags
    combined = _URL_RE.sub(" ", combined)
    combined = _HTML_TAG_RE.sub(" ", combined)

    # Remove boilerplate (optional but helpful)
    combined = _BOILERPLATE_RE.sub(" ", combined)

    # Normalize apostrophes and dashes a bit
    combined = combined.replace("\u2019", "'").replace("\u2013", "-").replace("\u2014", "-")

    # Keep finance tokens, remove weird leftover symbols except common finance ones
    # Allowed: letters/numbers, spaces, and these symbols: $ ₹ € % . , - + / & ' :
    combined = re.sub(r"[^a-zA-Z0-9\s\$\₹\€\%\.\,\-\+\/\&\'\:]", " ", combined)

    # Lowercase for consistency (works well for most models)
    combined = combined.lower()

    # Collapse extra whitespace
    combined = _MULTI_SPACE_RE.sub(" ", combined).strip()

    return combined

def preprocess_news_df(news_df: pd.DataFrame) -> pd.DataFrame:
    """
    Adds:
    - doc_text: cleaned text used for sentiment
    """
    df = news_df.copy()

    if "title" not in df.columns:
        df["title"] = ""
    if "text" not in df.columns:
        df["text"] = ""

    df["doc_text"] = [
        clean_financial_news_text(t, x)
        for t, x in zip(df["title"].astype(str), df["text"].astype(str))
    ]

    # Drop empty docs (no signal)
    df = df[df["doc_text"].str.len() > 0].reset_index(drop=True)
    return df

# Example usage
# news_df = fetch_stock_news_by_symbols(["AAPL", "MSFT"], limit=50)
clean_df = preprocess_news_df(news_df)
```
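As suggested above, spot-check what cleaning actually did before trusting any scores. A minimal review compares raw and cleaned lengths and eyeballs a few documents:

```python
# Quick sanity checks on the cleaned corpus before scoring
check = clean_df.copy()
check["raw_len"] = (check["title"].astype(str) + " " + check["text"].astype(str)).str.len()
check["clean_len"] = check["doc_text"].str.len()

print(check[["raw_len", "clean_len"]].describe())             # did cleaning remove a sane amount?
print(check.sample(min(3, len(check)))["doc_text"].tolist())  # eyeball a few cleaned documents
```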
Now you have doc_text ready for scoring. In this step, you convert each news document into a sentiment label and a numeric score you can aggregate later. Sentiment labels and scores are model-dependent outputs and should be treated as probabilistic signals rather than ground truth classifications.
I'll share a production-friendly setup with two options: FinBERT, a transformer model trained on financial text, and VADER, a fast lexicon-based baseline.
```python
import numpy as np
import pandas as pd
from transformers import pipeline

def build_finbert_pipeline(device: int = -1):
    """
    device = -1 for CPU, >=0 for GPU
    """
    model_name = "ProsusAI/finbert"  # widely used finance sentiment model
    clf = pipeline(
        task="text-classification",
        model=model_name,
        tokenizer=model_name,
        return_all_scores=True,
        device=device
    )
    return clf

def finbert_score_one(clf, text: str):
    """
    Returns:
    - label: positive/negative/neutral
    - score: continuous sentiment score in [-1, 1] computed as P(positive) - P(negative)
    - probs: dict of probabilities
    """
    scores = clf(text[:512])[0]  # truncate to avoid very long texts
    probs = {d["label"].lower(): float(d["score"]) for d in scores}

    pos = probs.get("positive", 0.0)
    neg = probs.get("negative", 0.0)
    neu = probs.get("neutral", 0.0)

    sentiment_score = pos - neg  # range ~[-1, 1]
    label = max([("positive", pos), ("neutral", neu), ("negative", neg)], key=lambda x: x[1])[0]
    return label, sentiment_score, probs

def score_news_finbert(df: pd.DataFrame, text_col: str = "doc_text", device: int = -1) -> pd.DataFrame:
    clf = build_finbert_pipeline(device=device)

    labels, scores, pos_probs, neg_probs, neu_probs = [], [], [], [], []
    for text in df[text_col].fillna("").astype(str):
        label, score, probs = finbert_score_one(clf, text)
        labels.append(label)
        scores.append(score)
        pos_probs.append(probs.get("positive", 0.0))
        neg_probs.append(probs.get("negative", 0.0))
        neu_probs.append(probs.get("neutral", 0.0))

    out = df.copy()
    out["sentiment_label"] = labels
    out["sentiment_score"] = scores
    out["p_pos"] = pos_probs
    out["p_neg"] = neg_probs
    out["p_neu"] = neu_probs
    return out

# Example usage
scored_df = score_news_finbert(clean_df, text_col="doc_text", device=-1)
```
Note that the code above truncates each document to its first 512 characters as a rough guard against the model's 512-token input limit. For long-form articles, this drops later context and can bias sentiment toward the opening paragraphs. A chunking-based approach for handling long texts is discussed in the limitations section below.
VADER is a general-purpose sentiment model and is not trained on financial language. It works well for quick prototyping, sanity checks, and baseline comparisons, but for production-grade financial sentiment analysis a finance-specific model such as FinBERT is the safer choice.
```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def score_news_vader(df: pd.DataFrame, text_col: str = "doc_text") -> pd.DataFrame:
    analyzer = SentimentIntensityAnalyzer()

    out = df.copy()
    compound_scores = []
    labels = []

    for text in out[text_col].fillna("").astype(str):
        s = analyzer.polarity_scores(text)
        compound = float(s["compound"])  # in [-1, 1]
        compound_scores.append(compound)

        if compound >= 0.05:
            labels.append("positive")
        elif compound <= -0.05:
            labels.append("negative")
        else:
            labels.append("neutral")

    out["sentiment_score"] = compound_scores
    out["sentiment_label"] = labels
    return out

# Example usage
# scored_df = score_news_vader(clean_df, text_col="doc_text")
```
At the end of this section, you have article-level sentiment per row.
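Before aggregating, it is worth reviewing the score and label distributions, which ties back to the preprocessing validation discussed earlier. An all-neutral or heavily skewed distribution is usually a pipeline problem, not a market signal.

```python
# Distribution review of article-level sentiment
print(scored_df["sentiment_label"].value_counts(normalize=True))
print(scored_df["sentiment_score"].describe())

# Spot-check the strongest positive and negative articles
cols = [c for c in ["symbol", "title", "sentiment_score"] if c in scored_df.columns]
print(scored_df.nlargest(3, "sentiment_score")[cols])
print(scored_df.nsmallest(3, "sentiment_score")[cols])
```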
Next, we'll aggregate these scores into a company-level sentiment signal you can track day-by-day or over rolling windows.
At this stage, scored_df contains one sentiment score per article. Now you convert it into a company-level signal by aggregating sentiment by symbol and date. This step makes the output usable for screening, modeling, and monitoring.
Aggregation windows should be chosen based on the downstream use case. Short windows may suit event-driven trading, while longer windows are more appropriate for monitoring trends or supporting research workflows. Using daily sentiment signals without aligning them to the intended application can lead to overreaction or misinterpretation.
```python
import pandas as pd

def add_news_date(df: pd.DataFrame, published_col: str = "publishedDate") -> pd.DataFrame:
    out = df.copy()
    out[published_col] = pd.to_datetime(out[published_col], errors="coerce", utc=True)
    out = out.dropna(subset=[published_col]).reset_index(drop=True)
    out["news_date"] = out[published_col].dt.date  # daily bucket
    return out

scored_df = add_news_date(scored_df)
```
```python
def daily_company_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    """
    Returns one row per (symbol, news_date) with:
    - avg sentiment
    - volume (#articles)
    - positive/negative/neutral counts
    """
    g = df.groupby(["symbol", "news_date"], as_index=False)
    daily = g.agg(
        avg_sentiment=("sentiment_score", "mean"),
        med_sentiment=("sentiment_score", "median"),
        news_count=("sentiment_score", "size"),
        pos_count=("sentiment_label", lambda s: (s == "positive").sum()),
        neg_count=("sentiment_label", lambda s: (s == "negative").sum()),
        neu_count=("sentiment_label", lambda s: (s == "neutral").sum()),
    )

    # Optional: a simple "balance" metric that penalizes negative coverage
    daily["sentiment_balance"] = (daily["pos_count"] - daily["neg_count"]) / daily["news_count"].clip(lower=1)

    return daily.sort_values(["symbol", "news_date"]).reset_index(drop=True)

daily_df = daily_company_sentiment(scored_df)
daily_df.head(10)
```
Rolling features help you avoid overreacting to a single viral headline. They introduce lag by design: the same smoothing that reduces noise also delays the signal's response, an important tradeoff when sentiment feeds faster or event-driven strategies.
```python
def add_rolling_sentiment(daily_df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    out = daily_df.copy()
    out["news_date"] = pd.to_datetime(out["news_date"])
    out = out.sort_values(["symbol", "news_date"])

    out[f"roll_{window}d_sentiment"] = (
        out.groupby("symbol")["avg_sentiment"]
        .transform(lambda s: s.rolling(window=window, min_periods=1).mean())
    )
    out[f"roll_{window}d_news_count"] = (
        out.groupby("symbol")["news_count"]
        .transform(lambda s: s.rolling(window=window, min_periods=1).sum())
    )
    return out

daily_df = add_rolling_sentiment(daily_df, window=7)
daily_df[["symbol", "news_date", "avg_sentiment", "roll_7d_sentiment", "news_count", "roll_7d_news_count"]].head(10)
```
Now you have a compact company-level dataset: daily sentiment + volume + rolling sentiment.
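To study these signals next to market data, as discussed earlier, join the daily sentiment table against daily prices on symbol and date. The sketch below assumes you already have a prices_df with symbol, date, and close columns (for example from FMP's historical price endpoints, which are not built in this article); the merge logic is the part being illustrated.

```python
def merge_sentiment_with_prices(daily_df: pd.DataFrame, prices_df: pd.DataFrame) -> pd.DataFrame:
    """
    Join company-level daily sentiment with daily closing prices.
    Assumes prices_df has columns: symbol, date, close (built outside this article).
    """
    s = daily_df.copy()
    p = prices_df.copy()
    s["news_date"] = pd.to_datetime(s["news_date"])
    p["date"] = pd.to_datetime(p["date"])

    merged = s.merge(p, left_on=["symbol", "news_date"],
                     right_on=["symbol", "date"], how="left").drop(columns=["date"])
    merged = merged.sort_values(["symbol", "news_date"])

    # Weekend/holiday news rows will have NaN prices in this simple daily join.
    merged["next_close"] = merged.groupby("symbol")["close"].shift(-1)
    merged["next_day_return"] = merged["next_close"] / merged["close"] - 1.0
    return merged
```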
Even with a clean pipeline, financial news sentiment comes with edge cases. Knowing these limits helps you design safer and more reliable systems.
Transformer models like FinBERT process a limited number of tokens. Truncation can drop important context from long reports.
Improvement: chunk long articles and average sentiment across chunks.
```python
def chunk_text(text: str, max_chars: int = 450):
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def score_with_chunks(clf, text: str):
    chunks = chunk_text(text)
    scores = []
    for c in chunks:
        _, s, _ = finbert_score_one(clf, c)
        scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0
```
The same story often appears across multiple publishers. Counting all of them inflates sentiment strength.
Improvement: de-duplicate using headline similarity.
```python
from difflib import SequenceMatcher

def is_duplicate(title_a, title_b, threshold=0.9):
    return SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio() >= threshold
```
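Applying that pairwise check across each day's coverage is enough at modest volumes. The helper below (its name is illustrative, not part of the pipeline above) is a naive O(n²) sketch that keeps the first occurrence of each near-duplicate headline; run it on scored_df before aggregating.

```python
def drop_duplicate_stories(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """
    Naive near-duplicate removal per (symbol, news_date): keep the first article,
    drop later ones whose titles are near-identical. O(n^2) per group, which is
    fine for daily per-symbol volumes but not for very large corpora.
    """
    keep_rows = []
    for _, group in df.groupby(["symbol", "news_date"], sort=False):
        kept_titles = []
        for idx, row in group.iterrows():
            title = str(row.get("title", ""))
            if any(is_duplicate(title, seen, threshold) for seen in kept_titles):
                continue
            kept_titles.append(title)
            keep_rows.append(idx)
    return df.loc[keep_rows].sort_index()

# Example usage (before aggregation):
# scored_df = drop_duplicate_stories(scored_df)
```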
Language changes. Market jargon evolves.
Improvement: periodically re-evaluate sentiment distributions and retrain or fine-tune when drift appears.
```python
scored_df["sentiment_bucket"] = pd.cut(
    scored_df["sentiment_score"],
    bins=[-1, -0.3, 0.3, 1],
    labels=["negative", "neutral", "positive"]
)
```
These refinements keep sentiment signals stable, interpretable, and production-ready.
In this article, we built an end-to-end NLP-powered sentiment analyzer using the FMP News API as the data foundation. Starting from structured financial news, we designed a clean pipeline that preprocesses text, applies a finance-aware sentiment model, and aggregates signals at the company level.
This approach avoids fragile heuristics and manual interpretation. It converts unstructured news into consistent, machine-readable sentiment features that fit naturally into quantitative workflows. Because the pipeline stays modular, you can extend it with better models, event tagging, or tighter aggregation logic as your use case evolves.
With high-quality news data from Financial Modeling Prep and a disciplined NLP pipeline, sentiment analysis becomes a practical signal—not a black box.