Currently, we have an `Extractor` class, whose job is to extract all metadata about a repository.
With #70 and #68, `Extractor` can list the files present in a repository (`list_files()`) and access their contents.
Ideally, the responsibility of an extractor should stop there. It should not be responsible for extracting metadata from file contents.
The proposal here is to have a separate object responsible for it: `Parser`. A `Parser` would take a file as input and extract specific RDF triples from it. The repository's RDF graph could then be enriched using the graphs returned by the parsers.
Below is a rough example of what the parser interface may look like. Each parser would only need to implement the parsing algorithm and the logic that decides which resources it is compatible with.
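
    from __future__ import annotations  # `Resource` (the file abstraction) is not defined in this sketch
    from abc import ABC, abstractmethod
    from typing import Optional
    import rdflib

    class Parser(ABC):
        def __init__(self, max_size_kb: Optional[int] = 2048):
            # Size limit in kB; files above it could be rejected in can_parse()
            self.max_size_kb = max_size_kb

        @abstractmethod
        def _parse(self, input: Resource) -> rdflib.Graph:
            """Extract triples"""
            ...

        @abstractmethod
        def can_parse(self, input: Resource) -> bool:
            """Match based on filename (content and size?)"""
            ...

        def parse(self, input: Resource) -> Optional[rdflib.Graph]:
            # Only delegate to the parser-specific logic for compatible resources
            if self.can_parse(input):
                return self._parse(input)
            return None

        # Potentially more helper methods that will be available to all parsers
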
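To make the interface more concrete, here is a minimal sketch of what one parser for `package.json` might look like. None of it is part of the proposal itself: `Resource` (with `name`, `content` and `size_kb` fields), `PackageJsonParser`, and the choice of schema.org predicates are assumptions for illustration, and a plain set of `(subject, predicate, object)` tuples stands in for `rdflib.Graph` so the example runs with the standard library only.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Stand-in for rdflib.Graph so the sketch has no third-party dependency.
Triple = tuple[str, str, str]
Graph = set[Triple]


@dataclass
class Resource:
    """Hypothetical file abstraction; the real one would come from Extractor."""
    name: str
    content: str
    size_kb: float


class Parser(ABC):
    def __init__(self, max_size_kb: Optional[int] = 2048):
        self.max_size_kb = max_size_kb

    @abstractmethod
    def _parse(self, input: Resource) -> Graph: ...

    @abstractmethod
    def can_parse(self, input: Resource) -> bool: ...

    def parse(self, input: Resource) -> Optional[Graph]:
        if self.can_parse(input):
            return self._parse(input)
        return None


class PackageJsonParser(Parser):
    """Illustrative parser: maps a few package.json keys to schema.org terms."""

    # Assumed mapping, chosen for the example only.
    FIELDS = {
        "name": "schema:name",
        "version": "schema:version",
        "license": "schema:license",
    }

    def can_parse(self, input: Resource) -> bool:
        # Match on the file's base name and respect the optional size limit.
        basename = input.name.rsplit("/", 1)[-1]
        small_enough = self.max_size_kb is None or input.size_kb <= self.max_size_kb
        return basename == "package.json" and small_enough

    def _parse(self, input: Resource) -> Graph:
        data = json.loads(input.content)
        subject = f"file:{input.name}"
        return {
            (subject, predicate, str(data[key]))
            for key, predicate in self.FIELDS.items()
            if key in data
        }


resource = Resource("package.json", '{"name": "demo", "version": "1.0.0"}', size_kb=0.1)
triples = PackageJsonParser().parse(resource)

# An incompatible file is skipped: parse() returns None instead of raising.
skipped = PackageJsonParser().parse(Resource("README.md", "# demo", size_kb=0.1))
```

With this split, the subclass only carries the matching rule and the extraction logic, while the shared `parse()` wrapper decides whether to run it at all.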
Parsers could be added for `pyproject.toml`, `setup.py`, licenses, `Cargo.toml`, R's `DESCRIPTION`, `package.json`, etc.
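The enrichment step could be sketched roughly as follows: every file is offered to every parser, and each graph that comes back is merged into the repository graph. Everything named here is hypothetical (`Resource`, `LicenseParser`, `DescriptionParser`, `enrich`, and the `"repo"` subject), the two parsers are deliberately naive, and a set of tuples again stands in for `rdflib.Graph`. With rdflib itself, the merge could use the graph's in-place union (`+=`).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol

Triple = tuple[str, str, str]
Graph = set[Triple]  # stand-in for rdflib.Graph


@dataclass
class Resource:
    """Hypothetical file abstraction with just a name and text content."""
    name: str
    content: str


class SupportsParse(Protocol):
    def parse(self, input: Resource) -> Optional[Graph]: ...


class LicenseParser:
    """Toy parser: recognises an MIT license file by its first line."""

    def parse(self, input: Resource) -> Optional[Graph]:
        if input.name.upper() not in {"LICENSE", "LICENSE.TXT", "LICENSE.MD"}:
            return None  # not a compatible resource
        if input.content.lstrip().startswith("MIT License"):
            return {("repo", "schema:license", "https://spdx.org/licenses/MIT")}
        return set()  # compatible, but nothing recognised


class DescriptionParser:
    """Toy parser for R's DESCRIPTION file (simple `Key: value` lines only)."""

    KEYS = {"Package": "schema:name", "Version": "schema:version"}

    def parse(self, input: Resource) -> Optional[Graph]:
        if input.name != "DESCRIPTION":
            return None
        graph: Graph = set()
        for line in input.content.splitlines():
            key, sep, value = line.partition(":")
            if sep and key in self.KEYS:
                graph.add(("repo", self.KEYS[key], value.strip()))
        return graph


def enrich(repo_graph: Graph, files: Iterable[Resource],
           parsers: Iterable[SupportsParse]) -> Graph:
    """Union every graph produced by a compatible parser into the repo graph."""
    parsers = list(parsers)  # allow re-iteration for each file
    enriched = set(repo_graph)
    for resource in files:
        for parser in parsers:
            graph = parser.parse(resource)
            if graph is not None:
                enriched |= graph
    return enriched


files = [
    Resource("LICENSE", "MIT License\n\nCopyright (c) the authors\n"),
    Resource("DESCRIPTION", "Package: demo\nVersion: 0.1.0\n"),
    Resource("main.py", "print('hi')\n"),  # no parser matches this file
]
base_graph = {("repo", "rdf:type", "schema:SoftwareSourceCode")}
repo_graph = enrich(base_graph, files, [LicenseParser(), DescriptionParser()])
```

Because each parser returns `None` for files it does not recognise, supporting a new file type only means appending another parser to the list.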