This document describes a database design for storing information from the U.S. Securities and Exchange Commission’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, specifically focusing on Form 10‑Q (quarterly report) filings. It includes:
- Company information (one record per reporting entity).
- Historical changes to any company-level attribute.
- Filing information for each Form 10‑Q, including metadata and the complete HTML of the filing.
- Table company holds the most recent values for each reporting entity. It has one row per reporting entity identified by its Central Index Key (CIK).
- Table company_attribute_history preserves every change to any company attribute. It has one row per change event for any attribute of a company.
- Table filing stores metadata plus the raw text of each Form 10‑Q. It has one row per Form 10‑Q submission. This can also store information like income, liabilities, patents, lawsuits, etc.
Below are Structured Query Language (SQL) definitions for each table. Inline comments explain the purpose of each column in full detail.
-- Table: company
-- Purpose: Stores the current snapshot of metadata for each reporting entity.
CREATE TABLE company (
cik CHAR(10) PRIMARY KEY,
-- Central Index Key: a unique identifier the SEC assigns to each reporting entity.
registrant_name TEXT NOT NULL,
-- The full legal name of the entity as most recently filed.
trading_symbol VARCHAR(10),
-- The current stock ticker symbol under which the entity’s equity trades (if applicable).
exchange VARCHAR(10),
-- The Exchange under which the company is listed (e.g. Nasdaq).
sic_code CHAR(4),
-- Standard Industry Classification code (four digits) indicating the primary business sector.
entity_type TEXT,
-- Type of entity (e.g., "Operating Company", "Trust").
owner_organization TEXT,
-- If the filer belongs to a broader reporting group, the identifier for that parent entity.
employer_identification_number VARCHAR(20),
-- U.S. Internal Revenue Service Employer Identification Number (EIN) of the entity.
legal_entity_identifier VARCHAR(20),
-- Global Legal Entity Identifier (LEI).
state_of_incorporation VARCHAR(20),
-- Jurisdiction (U.S. state or foreign country code)
fiscal_year_end CHAR(4),
-- The end of the entity’s fiscal year in "MMDD" format (e.g. "0331" for March 31).
website TEXT,
-- URL of the entity’s primary corporate website.
investor_website TEXT,
-- URL of the investor relations website, if different from the main site.
addresses JSONB,
-- JSON array of address objects, each containing lines, city, state, postal code, and country. Format: {'mailing': {'street1':'100 MAIN ST.','street2':'SUITE 1','city':'NEW YORK',...},'business':{...}}.
phone VARCHAR(20),
-- Primary telephone number for investor relations or corporate contact.
);
Column Name | Data Type | Detailed Explanation |
---|---|---|
cik |
CHAR(10) |
Central Index Key. Permanent, unique identifier for each entity that submits filings to the SEC. |
registrant_name |
TEXT |
Official legal name of the entity as reported in its most recent filings. |
trading_symbol |
VARCHAR(10) |
Stock exchange ticker symbol; can be NULL if the entity is not publicly traded. |
sic_code |
CHAR(4) |
Four-digit code describing the entity’s industry classification under the SEC taxonomy. |
entity_type |
TEXT |
Denotes entity form (e.g., corporation, partnership, trust). |
owner_organization |
TEXT |
Identifier or name of a parent or group if multiple filers are consolidated under a single reporting umbrella. |
employer_identification_number |
VARCHAR(20) |
IRS Employer Identification Number for U.S. entities, useful for cross-reference with tax data. |
legal_entity_identifier |
VARCHAR(20) |
ISO standard LEI, if the entity has one. |
state_of_incorporation |
VARCHAR(20) |
Code for the state (e.g., "DE" for Delaware) or country where the entity is incorporated. |
fiscal_year_end |
CHAR(4) |
Month and day when the entity’s fiscal year ends, formatted as two-digit month and two-digit day. |
website |
TEXT |
URL for corporate homepage, which may provide access to investor relations, financial data, and press releases. |
investor_website |
TEXT |
URL specifically for investor relations content or financial reports. |
addresses |
JSONB |
Collection of mailing and business addresses, preserving structure (street lines, city, state, postal code, country). |
phone |
VARCHAR(20) |
Contact telephone number for general inquiries or investor relations. |
-- Table: company_attribute_history
-- Purpose: Preserves a history of changes for each attribute of every reporting entity.
CREATE TABLE company_attribute_history (
cik CHAR(10) NOT NULL REFERENCES company(cik),
-- Central Index Key of the entity whose attribute changed. Foreign key referencing company.cik.
attribute VARCHAR(50) NOT NULL,
-- The name of the attribute that changed (for example, "registrant_name" or "trading_symbol").
value TEXT NOT NULL,
-- The previous value of the attribute, exactly as it appeared before the change.
valid_from DATE NOT NULL,
-- The calendar date on which this old attribute value began being in effect.
valid_to DATE NOT NULL,
-- The calendar date on which this attribute value ceased being in effect.
PRIMARY KEY (cik, attribute, valid_from)
-- Ensures only one record per entity-attribute per start date.
);
-- Index to speed up searches by attribute name and value.
CREATE INDEX ON company_attribute_history (attribute, value);
Column Name | Data Type | Detailed Explanation |
---|---|---|
cik |
CHAR(10) |
Links to the cik column in company , indicating which entity experienced the change. |
attribute |
VARCHAR(50) |
Label of what changed. It matches the column name in company (for clarity in queries). |
value |
TEXT |
The attribute’s previous setting before the change took effect. |
valid_from |
DATE |
Start date when this old value was active, inclusive. |
valid_to |
DATE |
End date when this old value was replaced. |
-- Table: filing
-- Purpose: Stores metadata and the full HTML for each Form 10‑Q submission.
CREATE TABLE filing (
filing_id BIGSERIAL PRIMARY KEY,
-- Auto-incrementing unique identifier for each filing record in our database.
accession_number VARCHAR(20) UNIQUE NOT NULL,
-- SEC-issued identifier for this particular filing, guaranteed globally unique.
cik CHAR(10) NOT NULL REFERENCES company(cik),
-- Central Index Key linking the filing to its reporting entity.
form_type VARCHAR(10),
-- Type of SEC form (e.g. "10-Q" - this is for future-proofing, if we include other form types).
filing_date DATE,
-- The date on which the SEC officially received and accepted this filing.
report_date DATE,
-- The end date of the financial period covered by this filing (for example, the quarter-end date).
primary_doc_url TEXT,
-- A direct URL pointing to the primary document (text) on the SEC website.
filer_category VARCHAR(50),
-- The SEC filing category in effect on the filing date (e.g. "Accelerated Filer").
raw_html TEXT,
-- The complete text of the Form 10‑Q, including all narrative sections, for full-text search or post-processing.
total_assets INT,
-- The total assets ($) at the time of filing
total_liabilities INT,
-- The total liabilities ($) at the time of filing
total_shareholders_equity INT,
-- The total shareholder equity ($) at the time of filing
);
-- Indexes to speed up common queries by reporting entity and date, or by form type.
CREATE INDEX ON filing(cik, filing_date);
CREATE INDEX ON filing(form_type, filing_date);
Column Name | Data Type | Detailed Explanation |
---|---|---|
filing_id |
BIGSERIAL |
Internally generated numeric ID for fast joins and key references. |
accession_number |
VARCHAR(20) |
Official SEC identifier for the filing, used to fetch documents from EDGAR. |
cik |
CHAR(10) |
Foreign key linking back to company.cik , indicating which entity filed this document. |
form_type |
VARCHAR(10) |
Specifies exactly which form was filed (quarterly report or amendment). |
filing_date |
DATE |
Date that the SEC headquarters received and stamped the filing as official. |
report_date |
DATE |
The financial period’s end date covered by the filing (quarterly report endpoint). |
primary_doc_url |
TEXT |
Web link to the main document content. |
filer_category |
VARCHAR(50) |
Company size classification on filing date, which determines regulatory requirements and deadlines. |
raw_html |
TEXT |
Entire narrative and table content of the filing, for use in searches or postprocessing. |
total_assets |
INT |
Assets at the time of filing. |
total_liabilities |
INT |
Liabilities at the time of filing. |
total_shareholders_equity |
INT |
Shareholder equity at the time of filing. |
- Extract: Read each saved filing’s metadata and full text.
- Load:
- company: Insert or update using the Central Index Key (
cik
). - filing: Insert or update using the accession number.
- company: Insert or update using the Central Index Key (
- Compare: Compare new values for mutable columns in
company
(such asregistrant_name
orfiler_category
) against existing values. - Track changes: Whenever a mutable value differs:
- Insert a new
company_attribute_history
record for that attribute to set itsvalid_to
date equal to that filing's date.valid_from
will be the first date that company filed, or the previousvalid_to
date if this attribute-company pair already exists incompany_attribute_history
.
- Insert a new
-
Retrieve the current legal name for entity 0001000045:
SELECT registrant_name FROM company WHERE cik = '0001000045';
-
Retrieve the legal name in effect on June 30, 2018, for entity 0001000045:
WITH target AS ( SELECT c.registrant_name AS current_name, h.value AS historical_name FROM company c LEFT JOIN company_attribute_history h ON h.cik = c.cik AND h.attribute = 'registrant_name' AND :target_date BETWEEN h.valid_from AND h.valid_to WHERE c.cik = :cik ) SELECT -- If a history row matched, use that; otherwise fall back to the current name. COALESCE(historical_name, current_name) AS name_on_that_date FROM target;
-
List all Form 10‑Q filings for entity 0001000045:
SELECT accession_number, filing_date FROM filing WHERE cik = '0001000045' ORDER BY filing_date DESC;
-
Perform a full-text search for the word “patent” in all 10‑Q filings:
SELECT accession_number FROM filing WHERE raw_html ILIKE '%patent%';