pipeline/is-contract/0.1
Pre-release
afparsons released this on 25 Apr 16:10
Scikit-Learn Pipeline
Name | Class | State |
---|---|---|
transformerpreprocessor | TransformerPreprocessor | head_character_n=2000, normalizer=<lexnlp.ml.normalizers.Normalizer object> |
transformervectorizer | TransformerVectorizer | vectorizers=(<lexnlp.ml.vectorizers.VectorizerDoc2Vec object>, <lexnlp.ml.vectorizers.VectorizerKeywordSearch object>) |
minmaxscaler | MinMaxScaler | feature_range=(-1.0, 1.0) |
gaussiannb | GaussianNB | |
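The last two steps in the table can be sketched with plain Scikit-Learn. This is a minimal illustration only: the synthetic `X` and `y` arrays are hypothetical stand-ins for the output of the LexNLP preprocessor and vectorizer steps, which are not reproduced here, and the variable name `sketch_pipeline` is invented for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for vectorizer output: two well-separated clusters.
rng = np.random.default_rng(0)
features_negative = rng.normal(loc=0.0, scale=1.0, size=(50, 4))
features_positive = rng.normal(loc=5.0, scale=1.0, size=(50, 4))
X = np.vstack([features_negative, features_positive])
y = np.array([False] * 50 + [True] * 50)

# make_pipeline names each step after its lowercased class name,
# which matches the "minmaxscaler" and "gaussiannb" names in the table.
sketch_pipeline = make_pipeline(
    MinMaxScaler(feature_range=(-1.0, 1.0)),
    GaussianNB(),
)
sketch_pipeline.fit(X, y)

step_names = [name for name, _ in sketch_pipeline.steps]
# Features as seen by the classifier, rescaled into [-1, 1] per column.
scaled = sketch_pipeline.named_steps['minmaxscaler'].transform(X)
# One probability per class (False, True) for each row.
probabilities = sketch_pipeline.predict_proba(X)
```

The scaler is fitted on the training features, so every training column spans exactly the configured `feature_range` before it reaches `GaussianNB`.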
Training data
Dataset | Description | Hyperlink |
---|---|---|
corpus/uspto-sample/0.1 | A sample of patent grant backgrounds from the United States Patent and Trademark Office | https://github.com/EleutherAI/pile-uspto |
corpus/govinfo-fr-2021/0.1 | United States Federal Register, 2021 | https://www.govinfo.gov/bulkdata/FR/2021 |
corpus/contract-types/0.1 | A sample of labeled contract types obtained from SEC EDGAR | https://www.sec.gov/edgar.shtml |
corpus/bonds/0.1 | A sample of municipal bonds | ? |
corpus/caselaw-access-project-ark-ill-nc-nm-subset-144million-characters/0.1 | Caselaw Access Project; official, book-published state case law from Arkansas, Illinois, North Carolina, and New Mexico | https://case.law/download/bulk_exports/latest/by_jurisdiction/case_text_open/ |
corpus/atticus-cuad-v1-plaintext/0.1 | Atticus CUAD v1 contracts | https://www.atticusprojectai.org/cuad |
corpus/eurlex-sample-10000/0.1 | EUR-Lex documents downloaded via api.epdb.eu | https://eur-lex.europa.eu/ http://api.epdb.eu/ |
corpus/arxiv-abstracts-with-agreement/0.1 | ArXiv abstracts containing "agreement" | https://www.kaggle.com/datasets/Cornell-University/arxiv |
corpus/sec-edgar-forms-3-4-5-8k-10k-sample/0.1 | Assorted SEC EDGAR filings | https://www.sec.gov/edgar.shtml |
Metrics
Label | precision | recall | f1-score | support |
---|---|---|---|---|
False | 1.00 | 1.00 | 1.00 | 20652 |
True | 0.85 | 0.91 | 0.88 | 580 |
accuracy | | | 0.99 | 21232 |
macro avg | 0.93 | 0.95 | 0.94 | 21232 |
weighted avg | 0.99 | 0.99 | 0.99 | 21232 |
Confusion matrix: true (vertical) vs. predicted (horizontal)
True / Predicted | 0 | 1 |
---|---|---|
0 | 20562 | 90 |
1 | 50 | 530 |
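As a cross-check, the `True`-class metrics and the overall accuracy reported above can be recomputed from the confusion matrix. The sketch below uses only the four counts from this release; the variable names are chosen for illustration.

```python
# Confusion matrix from this release: rows are true labels, columns are predictions.
confusion = [
    [20562, 90],  # true 0 (False): predicted 0, predicted 1
    [50, 530],    # true 1 (True):  predicted 0, predicted 1
]
true_negative, false_positive = confusion[0]
false_negative, true_positive = confusion[1]

# Metrics for the positive ("is a contract") class.
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy over all evaluated documents.
total = sum(sum(row) for row in confusion)
accuracy = (true_positive + true_negative) / total

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# prints "precision=0.85 recall=0.91 f1=0.88 accuracy=0.99"
```

The row sums (20652 and 580) also match the `support` column, so the report and the matrix describe the same 21232 evaluation documents.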
Usage
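import cloudpickle
from sklearn.pipeline import Pipeline
# Import path assumed from the LexNLP source tree; adjust it if your version differs.
from lexnlp.extract.en.contracts.predictors import ProbabilityPredictorIsContract

# Load the serialized Scikit-Learn pipeline attached to this release.
with open('pipeline_is_contract_classifier.cloudpickle', 'rb') as f:
    pipeline_is_contract_classifier: Pipeline = cloudpickle.load(f)

# Wrap the pipeline in the predictor that exposes is_contract().
probability_predictor_is_contract: ProbabilityPredictorIsContract = \
    ProbabilityPredictorIsContract(pipeline=pipeline_is_contract_classifier)

probability_predictor_is_contract.is_contract(
    text='...',
    min_probability=0.5,
    return_probability=True,
)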