Parse SEC 424B Prospectus Filings with Python

EdgarTools adds a Python parser for SEC 424B prospectus filings — extract IPO pricing, underwriting syndicates, offering types, deal terms, shelf lifecycle data, and selling stockholder tables from SEC EDGAR into structured Python objects and DataFrames.


When a company raises capital — whether it's an IPO, a follow-on equity offering, a debt issuance, or a structured note — the details live in a 424B prospectus filed with the SEC. These filings are some of the most information-dense documents on EDGAR: offering prices, underwriting syndicates with per-bank allocations, dilution tables, selling stockholder lists, and the shelf registration history that ties it all together.

They're also some of the hardest to work with programmatically. 424B filings are unstructured HTML — there's no XBRL. The same information appears in different places depending on the offering type: a firm commitment IPO looks nothing like an at-the-market program or a Bank of America structured note. Until now, if you wanted structured data from a prospectus in Python, you had to write your own parser for each variant.

EdgarTools 5.23.0 adds Prospectus424B, a parser that handles all SEC 424B form variants (424B1 through 424B8) and extracts structured data from them. Call filing.obj() on any 424B filing and get back a typed Python object with pricing, underwriting, offering terms, and more.

from edgar import Company

company = Company("INGM")
filing = company.get_filings(form="424B4")[0]
prospectus = filing.obj()

prospectus.offering_type                    # OfferingType.FIRM_COMMITMENT
prospectus.cover_page.company_name          # 'Ingram Micro Holding Corp'
prospectus.cover_page.offering_price        # '$22.25'
prospectus.underwriting.lead_manager        # 'Morgan Stanley & Co. LLC'
len(prospectus.underwriting.underwriters)   # 16

Classifying 424B Offering Types with Python

The first challenge is figuring out what kind of offering a 424B filing represents. The SEC form type tells you almost nothing — 424B5 is used for everything from a $50 billion Goldman Sachs structured note to a $3 million biotech PIPE. The offering type determines what data you should expect to find and where to look for it.

The classifier reads the first 3,000 characters of the prospectus cover page and matches against a cascade of signal patterns:

  • Structured notes — "pricing supplement", index-linked references, barrier/buffer terms
  • ATM offerings — "at-the-market", equity distribution agreements, sales agent language
  • Exchange offers — "offer to exchange", tendering language
  • Rights offerings — subscription rights, subscription price
  • PIPE resale — "resale by the selling stockholders", no-proceeds language
  • Debt offerings — fixed-rate notes with maturity dates, aggregate principal amounts
  • Best efforts — placement agent references, securities purchase agreements, pre-funded warrants
  • Firm commitment — public offering price tables, underwriting discounts, named bulge-bracket banks

Each signal gets a confidence score. The cascade runs from most distinctive (structured notes have unambiguous vocabulary) to least (firm commitment is the default for large underwritten offerings). The result is a classification with high, medium, or low confidence.

prospectus.offering_type          # OfferingType.ATM
prospectus.offering_type_detail   # {'type': 'atm', 'confidence': 'high',
                                  #  'signals': ['at_the_market', ...]}
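The cascade above can be sketched as an ordered list of regular-expression checks over the first 3,000 characters of the cover page, stopping at the first offering type with any hit. This is a minimal illustration, not EdgarTools' implementation: the patterns, the rule that two or more hits means "high" confidence, and the classify_cover and Classification names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Illustrative signal patterns, ordered from most to least distinctive.
# These are assumptions for the sketch, not EdgarTools' actual pattern set.
CASCADE = [
    ("structured_note", [r"pricing supplement", r"market[- ]linked", r"\bbuffer\b", r"\bbarrier\b"]),
    ("atm", [r"at[- ]the[- ]market", r"equity distribution agreement", r"sales agent"]),
    ("exchange_offer", [r"offer to exchange", r"\btender"]),
    ("rights_offering", [r"subscription rights", r"subscription price"]),
    ("pipe_resale", [r"resale by the selling (?:stock|share)holders",
                     r"will not receive any (?:of the )?proceeds"]),
    ("debt", [r"aggregate principal amount", r"notes due \d{4}"]),
    ("best_efforts", [r"placement agent", r"securities purchase agreement",
                      r"pre-funded warrants"]),
    ("firm_commitment", [r"public offering price", r"underwriting discount"]),
]


@dataclass
class Classification:
    type: str
    confidence: str  # "high", "medium", or "low"
    signals: List[str] = field(default_factory=list)


def classify_cover(text: str, window: int = 3000) -> Classification:
    """Return the first offering type in the cascade with a matching signal."""
    head = text[:window].lower()
    for offering_type, patterns in CASCADE:
        hits = [p for p in patterns if re.search(p, head)]
        if hits:
            # In this sketch, two or more independent signals count as high confidence.
            confidence = "high" if len(hits) >= 2 else "medium"
            return Classification(offering_type, confidence, hits)
    # No signal at all: fall back to firm commitment with low confidence.
    return Classification("firm_commitment", "low", [])
```

Because list order encodes priority, a cover page that mentions both an equity distribution agreement and an underwriting discount is labeled ATM, which is the behavior the cascade description calls for.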

Here's what a parsed prospectus looks like in the terminal — the Ingram Micro IPO with cover page fields, pricing breakdown, and the full 16-bank underwriting syndicate extracted automatically:

Prospectus424B display — Ingram Micro IPO

Extracting Structured Data from SEC Prospectus Tables

A typical 424B filing contains 30–80 HTML tables. Most are layout containers, bullet lists, or tables of contents. A few contain the data that matters: offering prices, dilution figures, selling stockholder lists, capitalization breakdowns, and underwriting syndicate allocations.

The parser classifies every table in the document into one of 11 semantic types using structural heuristics — column counts, keyword combinations, numeric cell presence, and cell length distributions. Layout tables (single-cell containers, bullet lists, page numbers) are filtered out first. Then each remaining table is tested against type-specific predicates:

Table classification types
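A minimal sketch of the two-stage approach (filter out layout tables, then test ordered type predicates), assuming each HTML table has already been flattened into rows of cell strings. Only five of the eleven types are shown, and every keyword, threshold, and name here (is_layout, classify_table, PREDICATES) is an assumption for illustration rather than the parser's real rule set.

```python
import re
from typing import List

Table = List[List[str]]

# A cell that is just a number: optional parens, $, commas, decimals, or %.
NUMERIC = re.compile(r"^\(?\$?\s*[\d,]+(?:\.\d+)?%?\)?$")


def _cells(table: Table) -> List[str]:
    return [c.strip() for row in table for c in row if c and c.strip()]


def is_layout(table: Table) -> bool:
    """Single-cell containers, page numbers, and bullet lists."""
    cells = _cells(table)
    if len(cells) <= 1:
        return True
    max_cols = max(len(row) for row in table)
    has_numbers = any(NUMERIC.match(c) for c in cells)
    # Narrow tables with no numeric cells are treated as text layout.
    return max_cols <= 2 and not has_numbers


# Ordered (type, predicate) pairs; each predicate sees the lowercased
# table text, the column count, and the number of numeric cells.
PREDICATES = [
    ("dilution", lambda text, cols, nums: "net tangible book value" in text),
    ("pricing", lambda text, cols, nums: (
        nums > 0 and "offering price" in text and "proceeds" in text)),
    ("selling_stockholders", lambda text, cols, nums: (
        nums > 0 and cols >= 3
        and ("beneficially owned" in text or "selling stockholder" in text))),
    ("underwriting", lambda text, cols, nums: nums > 0 and "underwriter" in text),
    ("capitalization", lambda text, cols, nums: (
        "as adjusted" in text and "capitalization" in text)),
]


def classify_table(table: Table) -> str:
    if is_layout(table):
        return "layout"
    cells = _cells(table)
    text = " ".join(cells).lower()
    cols = max(len(row) for row in table)
    nums = sum(1 for c in cells if NUMERIC.match(c))
    for name, predicate in PREDICATES:
        if predicate(text, cols, nums):
            return name
    return "other"
```

Order matters: dilution is tested before capitalization because dilution tables can also contain "as adjusted" wording.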

Once classified, the extraction functions pull structured data from each table type:

# Pricing data from the pricing table
prospectus.pricing.columns[0].offering_price   # '$2.48'
prospectus.pricing.columns[0].fee_or_discount  # '$0.1488'
prospectus.pricing.columns[-1].proceeds        # '$3,496,800.00'

# Selling stockholders
prospectus.selling_stockholders.count           # 12
entry = prospectus.selling_stockholders.stockholders[0]
entry.name              # 'Armistice Capital Master Fund Ltd.'
entry.shares_before     # 1500000
entry.shares            # 1500000
entry.pct_after         # 0.0

# As a DataFrame
df = prospectus.selling_stockholders.to_dataframe()
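The pricing values above come back as display strings, while the selling stockholder fields are numeric, so cells like "1,500,000(1)" have to be converted somewhere along the way. A minimal sketch of that cell normalization, assuming the usual conventions in these tables: trailing footnote markers, a dash for zero, and an asterisk for "less than 1%". parse_number and parse_shares are hypothetical helpers, not EdgarTools functions.

```python
import re
from typing import Optional

DASHES = {"-", "\u2012", "\u2013", "\u2014"}  # hyphen, figure, en, and em dash
# Footnote markers such as "(1)" or "[2]" trailing a value, e.g. "1,500,000(1)".
FOOTNOTE = re.compile(r"(?<=[\w%\u2012\u2013\u2014-])\s*(?:[(\[]\d{1,2}[)\]]\s*)+$")


def parse_number(cell: str) -> Optional[float]:
    """Convert a cell like '$3,496,800.00' or '1,500,000(1)' to a float."""
    text = FOOTNOTE.sub("", cell.replace("\u00a0", " ")).strip()
    if not text:
        return None
    if text in DASHES:
        return 0.0  # a dash conventionally means "none" in these tables
    if text in {"*", "**"} or text.startswith("<"):
        return None  # "less than 1%" markers carry no exact value
    text = text.replace("$", "").strip()
    negative = text.startswith("(") and text.endswith(")")  # accounting negatives
    digits = re.sub(r"[^\d.]", "", text)
    if not digits:
        return None  # free text such as "N/A"
    value = float(digits)
    return -value if negative else value


def parse_shares(cell: str) -> Optional[int]:
    value = parse_number(cell)
    return None if value is None else int(value)
```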

PIPE Resale Prospectuses and Selling Stockholder Data

One of the most common reasons researchers and analysts parse 424B filings is to study PIPE transactions (private investments in public equity). When a company raises capital through a PIPE, it sells unregistered securities to institutional investors. Those securities generally can't be resold on the open market until the company registers them for resale, and the resale prospectus is typically filed as a 424B3.

The selling stockholder table in that resale prospectus is the definitive record of who participated in the PIPE: fund names, share counts before and after the offering, percentage ownership, and warrants. This data is valuable for studying institutional investment patterns, hedge fund positioning, and biotech financing — but it's locked inside unstructured HTML.

The parser extracts selling stockholder tables and returns them as structured Python objects with numeric properties:

from edgar import Company

company = Company("PBYI")  # Puma Biotechnology
filing = company.get_filings(form="424B3")[0]
prospectus = filing.obj()

# Not every 424B3 has a selling stockholder table, so check before using it
holders = prospectus.selling_stockholders
if holders and holders.count > 0:
    for holder in holders.stockholders:
        print(f"{holder.name:40s}  {holder.shares_before:>12,}  {holder.shares:>12,}")

    # Export to Excel (needs openpyxl) or Stata for analysis
    df = holders.to_dataframe()
    df.to_excel("pipe_stockholders.xlsx", index=False)
    df.to_stata("pipe_stockholders.dta", write_index=False)

For bulk analysis across hundreds or thousands of PIPE filings, you can filter by company, industry, or date range and concatenate the results into a single dataset:

import pandas as pd
from edgar import Company

biotechs = ["PBYI", "MRNA", "SGEN", "RARE", "ALNY"]

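all_holders = []
for ticker in biotechs:
    try:
        company = Company(ticker)
        filings = company.get_filings(form="424B3")
    except Exception as exc:  # e.g. a ticker that no longer resolves
        print(f"Skipping {ticker}: {exc}")
        continue
    for filing in filings:
        try:
            p = filing.obj()
        except Exception:
            continue  # skip filings the parser can't handle
        if p and p.selling_stockholders and p.selling_stockholders.count > 0:
            df = p.selling_stockholders.to_dataframe()
            df["company"] = p.company
            df["ticker"] = ticker
            df["filing_date"] = str(filing.filing_date)
            all_holders.append(df)

# pd.concat raises ValueError on an empty list, so only combine if something was found
if all_holders:
    result = pd.concat(all_holders, ignore_index=True)
    result.to_excel("biotech_pipe_holders.xlsx", index=False)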

Not every 424B3 contains a selling stockholder table — the form is used for many purposes including M&A stock-for-stock exchanges, base prospectus updates, and debt offerings. The parser's offering type classifier identifies PIPE resales specifically, so you can filter to just the filings that have the data you need.
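Once the per-filing DataFrames are combined, a common next question is how many distinct PIPEs each investor joined. Fund names are rarely spelled identically across filings (punctuation, "Ltd." versus "Ltd", footnote markers), so a light normalization step helps before counting. A stdlib-only sketch under that assumption: normalize_holder, count_participation, and the suffix list are illustrative, the suffix stripping is a crude heuristic that can merge distinct entities, and the input is any iterable of (filing_key, holder_name) pairs, for example a (ticker, filing_date) key built from the columns added above plus the DataFrame's holder-name column.

```python
import re
from collections import Counter, defaultdict
from typing import Hashable, Iterable, Tuple

# Trailing legal-entity suffixes to ignore when matching names (illustrative).
LEGAL_SUFFIXES = {"ltd", "lp", "llc", "llp", "inc", "corp", "co", "plc"}


def normalize_holder(name: str) -> str:
    """Lowercase, drop footnote markers and punctuation, strip legal suffixes."""
    name = re.sub(r"\(\d+\)", " ", name).lower().replace(".", "")
    tokens = re.sub(r"[^a-z0-9]+", " ", name).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def count_participation(rows: Iterable[Tuple[Hashable, str]]) -> Counter:
    """Count distinct filings per normalized holder name."""
    filings_by_holder = defaultdict(set)
    for filing_key, holder_name in rows:
        filings_by_holder[normalize_holder(holder_name)].add(filing_key)
    return Counter({holder: len(keys) for holder, keys in filings_by_holder.items()})
```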

Normalized Deal Terms: Price, Proceeds, and Underwriting Economics

The raw extracted data is spread across cover page fields, pricing tables, offering terms, and underwriting sections. Different filings store the same economic fact in different places — the offering price might appear on the cover page, in the pricing table, or only be computable from total proceeds divided by shares.

The Deal object synthesizes a single normalized view by triangulating across all available data sources:

deal = prospectus.deal

deal.price            # 2.48
deal.shares           # 1500000
deal.gross_proceeds   # 3720000.0
deal.net_proceeds     # 3496800.0
deal.fee_per_share    # 0.1488
deal.discount_rate    # 0.06 (6%)
deal.lead_bookrunner  # 'H.C. Wainwright & Co.'

Deal summary display

The resolution logic follows a priority chain. For price: cover page first (most reliable for clean per-share values), then pricing table per-unit column. For gross proceeds: cover page total amount, then pricing table aggregate, then shares × price as a last resort. Each property documents its source chain so you can trace where a value came from.
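The priority chain can be sketched as a "first non-missing source wins" lookup that also records which source supplied each value. This sketch assumes plain float inputs; resolve_deal, first_available, and the source labels are hypothetical, and deriving the fee from gross minus net proceeds is an assumption for illustration, since the text above doesn't say how EdgarTools computes fee_per_share.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resolved:
    value: Optional[float]
    source: Optional[str]  # which input supplied the value


def first_available(*candidates) -> Resolved:
    """Return the first (source, value) candidate whose value is present."""
    for source, value in candidates:
        if value is not None:
            return Resolved(value, source)
    return Resolved(None, None)


def resolve_deal(cover_price=None, table_unit_price=None, cover_total=None,
                 table_total=None, shares=None, table_net=None) -> dict:
    # Price: cover page first, then the pricing table's per-unit column.
    price = first_available(("cover_page", cover_price),
                            ("pricing_table", table_unit_price))
    # Gross proceeds: cover total, then table aggregate, then shares x price.
    computed = price.value * shares if price.value is not None and shares else None
    gross = first_available(("cover_page", cover_total),
                            ("pricing_table", table_total),
                            ("shares_x_price", computed))
    fee_per_share = discount_rate = None
    if gross.value and table_net is not None and shares:
        fee_total = gross.value - table_net  # assumption: fee = gross - net
        fee_per_share = round(fee_total / shares, 4)
        discount_rate = round(fee_total / gross.value, 4)
    return {"price": price, "gross_proceeds": gross,
            "fee_per_share": fee_per_share, "discount_rate": discount_rate}
```

With only a table price of 2.48, 1,500,000 shares, and net proceeds of 3,496,800, gross proceeds fall through to the shares × price fallback (3,720,000), giving a fee of 0.1488 per share and a 0.06 discount rate, which matches the Deal values shown above.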

Deal also exposes dilution metrics when available:

deal.dilution_per_share  # 1.73
deal.ntbv_before         # 0.27
deal.ntbv_after          # 0.75
deal.shares_before       # 5000000
deal.shares_after        # 6500000
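Dilution sections in prospectuses follow standard arithmetic: as-adjusted net tangible book value (NTBV) per share is the pre-offering NTBV plus net proceeds, divided by the post-offering share count, and dilution to new investors is the offering price minus that figure. The sketch below recomputes these as a cross-check; compute_dilution and the Dilution fields are hypothetical names, and the library exposes the values as reported by the filing rather than deriving them this way. Filings often also deduct estimated offering expenses, so a recomputation can differ from the reported figures by a cent or two.

```python
from dataclasses import dataclass


@dataclass
class Dilution:
    ntbv_after: float          # as-adjusted net tangible book value per share
    increase_per_share: float  # increase attributable to the new investors
    dilution_per_share: float  # offering price minus as-adjusted NTBV per share


def compute_dilution(ntbv_per_share_before: float, shares_before: int,
                     net_proceeds: float, new_shares: int,
                     offering_price: float) -> Dilution:
    # Total NTBV before the deal, plus the cash the company actually receives.
    ntbv_total_after = ntbv_per_share_before * shares_before + net_proceeds
    ntbv_after = ntbv_total_after / (shares_before + new_shares)
    return Dilution(
        ntbv_after=ntbv_after,
        increase_per_share=ntbv_after - ntbv_per_share_before,
        dilution_per_share=offering_price - ntbv_after,
    )


# Values from the Deal example: 0.27 NTBV/share, 5,000,000 -> 6,500,000 shares
result = compute_dilution(0.27, 5_000_000, 3_496_800, 1_500_000, 2.48)
print(round(result.ntbv_after, 2), round(result.dilution_per_share, 2))  # 0.75 1.73
```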

Tracking Shelf Registrations and Takedown History

Most 424B filings are "takedowns" from a shelf registration. A company files an S-3 (the shelf), the SEC declares it effective, and the company can then issue securities under that shelf for up to three years from the effective date by filing 424B prospectus supplements.

The ShelfLifecycle object reconstructs this history from the filing's related filings — giving you the full capital-raising timeline in one call:

lifecycle = prospectus.lifecycle

lifecycle.shelf_registration    # Filing(form='S-3', date='2023-08-02')
lifecycle.effective_date        # '2023-08-14'
lifecycle.review_period_days    # 12
lifecycle.shelf_expires         # date(2026, 8, 2)
lifecycle.days_to_expiry        # 144

lifecycle.takedown_number       # 5
lifecycle.total_takedowns       # 5
lifecycle.is_latest_takedown    # True
lifecycle.avg_days_between_takedowns  # 228.0

The lifecycle computation uses the SGML file_number field to find related filings without loading each one individually, which makes it fast enough to run on every prospectus without noticeable delay.
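The file-number approach can be sketched in a few lines: keep every filing that shares the shelf's file number, sort by date, then pick out the registration statement, the EFFECT notice, and the 424B takedowns. The FilingRecord type, the SHELF_FORMS set, and build_lifecycle are hypothetical stand-ins; EdgarTools reads the file number from the filing's SGML data rather than from hand-built records. One design choice to note: the sketch counts the three-year shelf window from the effective date, as Rule 415(a)(5) specifies, and falls back to the filing date only when no EFFECT notice is present.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

SHELF_FORMS = {"S-3", "S-3ASR", "F-3", "F-3ASR"}  # registration statements treated as shelves


@dataclass
class FilingRecord:
    form: str
    filed: date
    file_number: str


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


@dataclass
class Lifecycle:
    shelf: FilingRecord
    effective: Optional[date]
    takedowns: List[FilingRecord]

    @property
    def review_period_days(self) -> Optional[int]:
        return (self.effective - self.shelf.filed).days if self.effective else None

    @property
    def shelf_expires(self) -> date:
        # Three years from the initial effective date; filing date as fallback.
        return add_years(self.effective or self.shelf.filed, 3)

    @property
    def avg_days_between_takedowns(self) -> Optional[float]:
        dates = [t.filed for t in self.takedowns]
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        return sum(gaps) / len(gaps) if gaps else None

    def takedown_number(self, record: FilingRecord) -> int:
        return self.takedowns.index(record) + 1


def build_lifecycle(records: Iterable[FilingRecord], file_number: str) -> Lifecycle:
    """Group filings by registration file number and order them by date."""
    related = sorted((r for r in records if r.file_number == file_number),
                     key=lambda r: r.filed)
    shelf = next((r for r in related if r.form in SHELF_FORMS), None)
    if shelf is None:
        raise ValueError(f"no shelf registration found for {file_number}")
    effective = next((r.filed for r in related if r.form == "EFFECT"), None)
    takedowns = [r for r in related if r.form.startswith("424B")]
    return Lifecycle(shelf, effective, takedowns)
```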

The Rich display shows a complete timeline — the shelf registration, SEC effectiveness, and every takedown with the current filing highlighted:

Shelf Lifecycle display

Parsing Structured Note Pricing Supplements (424B2)

Structured notes are a special case. Banks like Goldman Sachs, J.P. Morgan, and Bank of America file hundreds of 424B2 prospectuses per year — each one a pricing supplement for a single note linked to an index, stock, or basket. These filings have a distinctive structure: a key terms table with CUSIP, maturity date, underlying reference, and payoff parameters.

The parser detects structured notes from cover page signals ("pricing supplement", "market-linked", index references) and extracts the key terms into Python objects:

prospectus.structured_note_terms.issuer       # 'Bank of America Corporation'
prospectus.structured_note_terms.cusip        # '06055HBV7'
prospectus.structured_note_terms.maturity_date  # 'January 20, 2028'
prospectus.structured_note_terms.underlying     # 'Russell 2000 Index'
prospectus.structured_note_terms.buffer_amount  # '10%'
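Key-terms tables in pricing supplements are typically two columns of label and value, so extraction mostly means normalizing labels and, when you need native types, converting dates and percentages. The sketch assumes rows arrive as lists of cell strings; the label aliases and the function names are hypothetical. As the output above shows, EdgarTools returns these terms as strings, so parse_date and parse_percent illustrate the kind of conversion you might apply afterwards.

```python
import re
from datetime import date, datetime
from typing import Dict, List, Optional

# Hypothetical label aliases -> field names; real supplements vary by issuer.
LABELS = {
    "issuer": "issuer",
    "cusip": "cusip",
    "cusip no": "cusip",
    "maturity date": "maturity_date",
    "underlying": "underlying",
    "underlying index": "underlying",
    "buffer amount": "buffer_amount",
}


def parse_key_terms(rows: List[List[str]]) -> Dict[str, str]:
    """Map two-column (label, value) rows onto normalized field names."""
    terms: Dict[str, str] = {}
    for row in rows:
        if len(row) < 2:
            continue  # skip spacer or single-cell rows
        label = row[0].strip().rstrip(":").strip().lower().replace(".", "")
        key = LABELS.get(label)
        if key and key not in terms:  # keep the first occurrence
            terms[key] = row[1].strip()
    return terms


def parse_date(text: str) -> Optional[date]:
    try:
        return datetime.strptime(text.strip(), "%B %d, %Y").date()
    except ValueError:
        return None


def parse_percent(text: str) -> Optional[float]:
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) / 100 if match else None
```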

LLM-Ready Context for AI-Powered SEC Analysis

Both Prospectus424B and ShelfLifecycle have to_context() methods that produce condensed text summaries sized for LLM context windows. If you're building AI workflows that analyze offerings — screening deals, summarizing terms, or flagging anomalies — you can feed the prospectus context directly into a prompt without serializing the full object:

context = prospectus.to_context(detail='standard')
# Returns compact Markdown-KV text with cover page, pricing, and deal terms

lifecycle_context = prospectus.lifecycle.to_context(detail='full')
# Returns timeline with available actions
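The idea behind a context method can be sketched as rendering only the populated fields as short key-value lines and stopping at a line boundary before a size budget is exceeded. This is a hypothetical illustration: to_kv_context, its heading layout, and the character budget are assumptions, not the format to_context() actually emits.

```python
from typing import Any, Mapping


def to_kv_context(title: str, sections: Mapping[str, Mapping[str, Any]],
                  max_chars: int = 2000) -> str:
    """Render populated fields as Markdown key-value lines within a budget."""
    lines = [f"# {title}"]
    for section, fields in sections.items():
        lines.append(f"## {section}")
        for key, value in fields.items():
            if value is None:
                continue  # skip missing fields instead of spending tokens on them
            lines.append(f"- {key}: {value}")
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:
            break  # stop at a line boundary once the budget is exhausted
        kept.append(line)
        used += len(line) + 1
    return "\n".join(kept)
```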

Getting Started with 424B Prospectus Parsing

If you work with equity offerings, debt issuances, or structured products on SEC EDGAR, you no longer need to parse HTML yourself. filing.obj() on any 424B filing returns a Prospectus424B with structured cover page data, offering type classification, pricing, underwriting terms, and a deal summary. The shelf lifecycle tells you where this offering sits in the company's broader capital-raising history.

from edgar import get_filings

# Screen recent at-the-market offerings
for filing in get_filings(form="424B5").head(20):
    p = filing.obj()
    if p.is_atm:
        print(f"{p.company:30s}  {p.cover_page.offering_amount}")

# Extract underwriting data for analysis
filing = get_filings(form="424B4")[0]
p = filing.obj()
for uw in p.underwriting.underwriters:
    print(f"{uw.name:35s}  {uw.shares_allocated}")

The parser handles the full range of 424B variants — firm commitment IPOs, at-the-market programs, best efforts PIPEs, rights offerings, exchange offers, debt issuances, and structured notes — behind a single Python API.
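Once the underwriter list is extracted, a natural next step is computing each bank's share of the syndicate. A sketch assuming you have collected (name, shares) pairs, for example from uw.name and uw.shares_allocated in the loop above; since the type of shares_allocated isn't shown, the sketch accepts integers, comma-formatted strings, or None. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple, Union

Shares = Union[int, str, None]


def to_int(value: Shares) -> Optional[int]:
    """Accept 1234567, '1,234,567', or None."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    digits = value.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def syndicate_shares(underwriters: Iterable[Tuple[str, Shares]]) -> Dict[str, float]:
    """Fraction of allocated shares per bank, largest allocation first."""
    parsed = [(name, to_int(shares)) for name, shares in underwriters]
    parsed = [(name, shares) for name, shares in parsed if shares is not None]
    total = sum(shares for _, shares in parsed)
    if total == 0:
        return {}
    parsed.sort(key=lambda pair: pair[1], reverse=True)
    return {name: shares / total for name, shares in parsed}
```

Usage: syndicate_shares((uw.name, uw.shares_allocated) for uw in p.underwriting.underwriters).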


Install or Upgrade

pip install --upgrade edgartools

Full release notes: CHANGELOG

EdgarTools is an open-source Python library for working with SEC EDGAR data, used by financial analysts, quant researchers, and compliance teams to extract structured data from SEC filings. Documentation · PyPI · GitHub

Follow as I build

I am excited to see what the community will build with the new Prospectus framework. I plan to use it to add features to edgar.tools:
