Skip to content

AI System using SEC Electronic Data Gathering, Analysis, and Retrieval (EDGAR) data

Notifications You must be signed in to change notification settings

emorynlp/edgar-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

EDGAR Database Schema

This document describes a database design for storing information from the U.S. Securities and Exchange Commission’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, specifically focusing on Form 10‑Q (quarterly report) filings. It includes:

  1. Company information (one record per reporting entity).
  2. Historical changes to any company-level attribute.
  3. Filing information for each Form 10‑Q, including metadata and the complete HTML of the filing.

1. Entity Relationship Diagram

  1. Table company holds the most recent values for each reporting entity. It has one row per reporting entity identified by its Central Index Key (CIK).
  2. Table company_attribute_history preserves every change to any company attribute. It has one row per change event for any attribute of a company.
  3. Table filing stores metadata plus the raw text of each Form 10‑Q. It has one row per Form 10‑Q submission. This can also store information like income, liabilities, patents, lawsuits, etc.

2. Table Definitions with Detailed Comments

Below are Structured Query Language (SQL) definitions for each table. Inline comments explain the purpose of each column in full detail.

2.1 Table: company

-- Table: company
-- Purpose: Stores the current snapshot of metadata for each reporting entity.
CREATE TABLE company (
  cik CHAR(10) PRIMARY KEY,  
      -- Central Index Key: a unique identifier the SEC assigns to each reporting entity.
  
  registrant_name TEXT NOT NULL,  
      -- The full legal name of the entity as most recently filed.

  trading_symbol VARCHAR(10),  
      -- The current stock ticker symbol under which the entity’s equity trades (if applicable).

  exchange VARCHAR(10), 
      -- The Exchange under which the company is listed (e.g. Nasdaq).

  sic_code CHAR(4),  
      -- Standard Industry Classification code (four digits) indicating the primary business sector.

  entity_type TEXT, 
      -- Type of entity (e.g., "Operating Company", "Trust").

  owner_organization TEXT,
      -- If the filer belongs to a broader reporting group, the identifier for that parent entity.
  
  employer_identification_number VARCHAR(20),
      -- U.S. Internal Revenue Service Employer Identification Number (EIN) of the entity.

  legal_entity_identifier VARCHAR(20),
      -- Global Legal Entity Identifier (LEI).

  state_of_incorporation VARCHAR(20),
      -- Jurisdiction (U.S. state or foreign country code)

  fiscal_year_end CHAR(4),  
      -- The end of the entity’s fiscal year in "MMDD" format (e.g. "0331" for March 31).

  website TEXT, 
      -- URL of the entity’s primary corporate website.

  investor_website TEXT, 
      -- URL of the investor relations website, if different from the main site.

  addresses JSONB,
      -- JSON array of address objects, each containing lines, city, state, postal code, and country. Format: {'mailing': {'street1':'100 MAIN ST.','street2':'SUITE 1','city':'NEW YORK',...},'business':{...}}.

  phone VARCHAR(20),
      -- Primary telephone number for investor relations or corporate contact.

);
Column Name Data Type Detailed Explanation
cik CHAR(10) Central Index Key. Permanent, unique identifier for each entity that submits filings to the SEC.
registrant_name TEXT Official legal name of the entity as reported in its most recent filings.
trading_symbol VARCHAR(10) Stock exchange ticker symbol; can be NULL if the entity is not publicly traded.
sic_code CHAR(4) Four-digit code describing the entity’s industry classification under the SEC taxonomy.
entity_type TEXT Denotes entity form (e.g., corporation, partnership, trust).
owner_organization TEXT Identifier or name of a parent or group if multiple filers are consolidated under a single reporting umbrella.
employer_identification_number VARCHAR(20) IRS Employer Identification Number for U.S. entities, useful for cross-reference with tax data.
legal_entity_identifier VARCHAR(20) ISO standard LEI, if the entity has one.
state_of_incorporation VARCHAR(20) Code for the state (e.g., "DE" for Delaware) or country where the entity is incorporated.
fiscal_year_end CHAR(4) Month and day when the entity’s fiscal year ends, formatted as two-digit month and two-digit day.
website TEXT URL for corporate homepage, which may provide access to investor relations, financial data, and press releases.
investor_website TEXT URL specifically for investor relations content or financial reports.
addresses JSONB Collection of mailing and business addresses, preserving structure (street lines, city, state, postal code, country).
phone VARCHAR(20) Contact telephone number for general inquiries or investor relations.

2.2 Table: company_attribute_history

-- Table: company_attribute_history
-- Purpose: Preserves a history of changes for each attribute of every reporting entity.
CREATE TABLE company_attribute_history (
  cik CHAR(10) NOT NULL REFERENCES company(cik),
      -- Central Index Key of the entity whose attribute changed. Foreign key referencing company.cik.
  attribute VARCHAR(50) NOT NULL,
      -- The name of the attribute that changed (for example, "registrant_name" or "trading_symbol").
  value TEXT NOT NULL,
      -- The previous value of the attribute, exactly as it appeared before the change.
  valid_from DATE NOT NULL,
      -- The calendar date on which this old attribute value began being in effect.
  valid_to DATE NOT NULL,
      -- The calendar date on which this attribute value ceased being in effect.
  PRIMARY KEY (cik, attribute, valid_from)
      -- Ensures only one record per entity-attribute per start date.
);

-- Index to speed up searches by attribute name and value.
CREATE INDEX ON company_attribute_history (attribute, value);
Column Name Data Type Detailed Explanation
cik CHAR(10) Links to the cik column in company, indicating which entity experienced the change.
attribute VARCHAR(50) Label of what changed. It matches the column name in company (for clarity in queries).
value TEXT The attribute’s previous setting before the change took effect.
valid_from DATE Start date when this old value was active, inclusive.
valid_to DATE End date when this old value was replaced.

2.3 Table: filing

-- Table: filing
-- Purpose: Stores metadata and the full HTML for each Form 10‑Q submission.
CREATE TABLE filing (
  filing_id BIGSERIAL PRIMARY KEY,
      -- Auto-incrementing unique identifier for each filing record in our database.
  accession_number VARCHAR(20) UNIQUE NOT NULL,
      -- SEC-issued identifier for this particular filing, guaranteed globally unique.
  cik CHAR(10) NOT NULL REFERENCES company(cik),
      -- Central Index Key linking the filing to its reporting entity.
  form_type VARCHAR(10),
      -- Type of SEC form (e.g. "10-Q" - this is for future-proofing, if we include other form types).
  filing_date DATE,
      -- The date on which the SEC officially received and accepted this filing.
  report_date DATE,
      -- The end date of the financial period covered by this filing (for example, the quarter-end date).
  primary_doc_url TEXT,
      -- A direct URL pointing to the primary document (text) on the SEC website.
  filer_category VARCHAR(50),
      -- The SEC filing category in effect on the filing date (e.g. "Accelerated Filer").
  raw_html TEXT,
      -- The complete text of the Form 10‑Q, including all narrative sections, for full-text search or post-processing.
  total_assets INT,
      -- The total assets ($) at the time of filing
  total_liabilities INT,
      -- The total liabilities ($) at the time of filing
  total_shareholders_equity INT,
      -- The total shareholder equity ($) at the time of filing
);

-- Indexes to speed up common queries by reporting entity and date, or by form type.
CREATE INDEX ON filing(cik, filing_date);
CREATE INDEX ON filing(form_type, filing_date);
Column Name Data Type Detailed Explanation
filing_id BIGSERIAL Internally generated numeric ID for fast joins and key references.
accession_number VARCHAR(20) Official SEC identifier for the filing, used to fetch documents from EDGAR.
cik CHAR(10) Foreign key linking back to company.cik, indicating which entity filed this document.
form_type VARCHAR(10) Specifies exactly which form was filed (quarterly report or amendment).
filing_date DATE Date that the SEC headquarters received and stamped the filing as official.
report_date DATE The financial period’s end date covered by the filing (quarterly report endpoint).
primary_doc_url TEXT Web link to the main document content.
filer_category VARCHAR(50) Company size classification on filing date, which determines regulatory requirements and deadlines.
raw_html TEXT Entire narrative and table content of the filing, for use in searches or postprocessing.
total_assets INT Assets at the time of filing.
total_liabilities INT Liabilities at the time of filing.
total_shareholders_equity INT Shareholder equity at the time of filing.

3. Workflow for Inserting Filings

  1. Extract: Read each saved filing’s metadata and full text.
  2. Load:
    • company: Insert or update using the Central Index Key (cik).
    • filing: Insert or update using the accession number.
  3. Compare: Compare new values for mutable columns in company (such as registrant_name or filer_category) against existing values.
  4. Track changes: Whenever a mutable value differs:
    • Insert a new company_attribute_history record for that attribute to set its valid_to date equal to that filing's date. valid_from will be the first date that company filed, or the previous valid_to date if this attribute-company pair already exists in company_attribute_history.

4. Sample Queries

  • Retrieve the current legal name for entity 0001000045:

    SELECT registrant_name
      FROM company
     WHERE cik = '0001000045';
  • Retrieve the legal name in effect on June 30, 2018, for entity 0001000045:

    WITH target AS (
      SELECT 
        c.registrant_name    AS current_name,
        h.value              AS historical_name
      FROM company c
      LEFT JOIN company_attribute_history h
        ON h.cik = c.cik
      AND h.attribute = 'registrant_name'
      AND :target_date BETWEEN h.valid_from AND h.valid_to
      WHERE c.cik = :cik
    )
    SELECT
      -- If a history row matched, use that; otherwise fall back to the current name.
      COALESCE(historical_name, current_name) AS name_on_that_date
    FROM target;
  • List all Form 10‑Q filings for entity 0001000045:

    SELECT accession_number, filing_date
      FROM filing
     WHERE cik = '0001000045'
     ORDER BY filing_date DESC;
  • Perform a full-text search for the word “patent” in all 10‑Q filings:

    SELECT accession_number
      FROM filing
     WHERE raw_html ILIKE '%patent%';

About

AI System using SEC Electronic Data Gathering, Analysis, and Retrieval (EDGAR) data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published