pipeline/is-contract/0.1

Pre-release
@afparsons released this 25 Apr 16:10 · 10 commits to master since this release

Scikit-Learn Pipeline

| Name | Class | State |
| --- | --- | --- |
| transformerpreprocessor | TransformerPreprocessor | head_character_n=2000, normalizer=<lexnlp.ml.normalizers.Normalizer object> |
| transformervectorizer | TransformerVectorizer | vectorizers=(<lexnlp.ml.vectorizers.VectorizerDoc2Vec object>, <lexnlp.ml.vectorizers.VectorizerKeywordSearch object>) |
| minmaxscaler | MinMaxScaler | feature_range=(-1.0, 1.0) |
| gaussiannb | GaussianNB | |
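
As an illustration of the `feature_range=(-1.0, 1.0)` state listed for the `minmaxscaler` step, the following is a minimal pure-Python sketch of linear min-max scaling applied to a single feature column. It is not the LexNLP or scikit-learn implementation; the function name `min_max_scale` and the sample values are hypothetical and exist only for this example.

```python
from typing import List, Sequence, Tuple


def min_max_scale(
    column: Sequence[float],
    feature_range: Tuple[float, float] = (-1.0, 1.0),
) -> List[float]:
    """Linearly rescale one feature column into feature_range."""
    low, high = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Constant column: there is no spread to rescale, so every value is
        # mapped to the lower bound (a choice made for this sketch).
        return [low for _ in column]
    # Map col_min -> low and col_max -> high, interpolating in between.
    return [(x - col_min) / span * (high - low) + low for x in column]


# Hypothetical feature values; the smallest maps to -1.0, the largest to 1.0.
scaled = min_max_scale([0.0, 5.0, 10.0])
print(scaled)
```

Scaling every feature into the same bounded range keeps features with large raw magnitudes from dominating those with small ones before they reach the classifier.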

Training data

| Dataset | Description | Hyperlink |
| --- | --- | --- |
| corpus/uspto-sample/0.1 | A sample of patent grant backgrounds from the United States Patent and Trademark Office | https://github.com/EleutherAI/pile-uspto |
| corpus/govinfo-fr-2021/0.1 | United States Federal Register, 2021 | https://www.govinfo.gov/bulkdata/FR/2021 |
| corpus/contract-types/0.1 | A sample of labeled contract types obtained from SEC EDGAR | https://www.sec.gov/edgar.shtml |
| corpus/bonds/0.1 | A sample of municipal bonds | ? |
| corpus/caselaw-access-project-ark-ill-nc-nm-subset-144million-characters/0.1 | Caselaw Access Project; official, book-published state case law from Arkansas, Illinois, North Carolina, and New Mexico | https://case.law/download/bulk_exports/latest/by_jurisdiction/case_text_open/ |
| corpus/atticus-cuad-v1-plaintext/0.1 | Atticus CUAD v1 contracts | https://www.atticusprojectai.org/cuad |
| corpus/eurlex-sample-10000/0.1 | EUR-Lex documents downloaded via api.epdb.eu | https://eur-lex.europa.eu/ http://api.epdb.eu/ |
| corpus/arxiv-abstracts-with-agreement/0.1 | ArXiv abstracts containing "agreement" | https://www.kaggle.com/datasets/Cornell-University/arxiv |
| corpus/sec-edgar-forms-3-4-5-8k-10k-sample/0.1 | Assorted SEC EDGAR filings | https://www.sec.gov/edgar.shtml |

Metrics

              precision    recall  f1-score   support

       False       1.00      1.00      1.00     20652
        True       0.85      0.91      0.88       580

    accuracy                           0.99     21232
   macro avg       0.93      0.95      0.94     21232
weighted avg       0.99      0.99      0.99     21232

Confusion matrix: true labels (rows) vs. predicted labels (columns)

           0      1
    0  20562     90
    1     50    530
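
The per-class figures reported for the `True` class can be recomputed from the confusion matrix. The sketch below uses only the four counts shown above; the variable names are chosen for this example, and the formulas are the standard definitions of precision, recall, F1, and accuracy.

```python
# Counts taken from the confusion matrix (rows: true label, columns: predicted).
tn, fp = 20562, 90   # true label 0 (not a contract)
fn, tp = 50, 530     # true label 1 (contract)

# Standard definitions for the positive ("True") class.
precision = tp / (tp + fp)   # share of predicted contracts that are contracts
recall = tp / (tp + fn)      # share of actual contracts that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.85 recall=0.91 f1=0.88 accuracy=0.99
```

Because only 580 of the 21232 evaluation documents are contracts, accuracy is dominated by the negative class; precision and recall on the `True` class are the more informative figures.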

Usage

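import cloudpickle
from sklearn.pipeline import Pipeline

# Import path assumed from LexNLP 2.x; adjust it if your installed version differs.
from lexnlp.extract.en.contracts.predictors import ProbabilityPredictorIsContract

# Load the serialized scikit-learn pipeline.
with open('pipeline_is_contract_classifier.cloudpickle', 'rb') as f:
    pipeline_is_contract_classifier: Pipeline = cloudpickle.load(f)

probability_predictor_is_contract = ProbabilityPredictorIsContract(
    pipeline=pipeline_is_contract_classifier,
)

# min_probability is the decision threshold; return_probability=True also requests the probability.
probability_predictor_is_contract.is_contract(
    text='...',
    min_probability=0.5,
    return_probability=True,
)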