Support extracting LibreOffice documents #67

Open
wants to merge 1 commit into
base: main


VarshaUN

ScanCode Toolkit issue #4001 [https://github.com/aboutcode-org/scancode-toolkit/issues/4001] asks for support for extracting LibreOffice documents.

Signed-off-by: Varsha U N <varshaun58@gmail.com>

Signed-off-by: Varsha U N <varshamaddur2006@gmail.com>
@stefan6419846

print should not be used in production code, and having to call it explicitly does not seem right.

The goal is for the following code to work and perform the corresponding extraction, so that target_directory holds the individual files afterwards:

from extractcode import all_kinds
from extractcode.api import extract_archive
from extractcode.archive import should_extract


# Example file: https://github.com/mar10/wunderbaum/blob/main/test/gui_test.ods
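archive_path = '/path/to/gui_test.ods'
# Placeholder target directory for the extracted files
target_directory = '/path/to/extracted_files'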
if should_extract(location=archive_path, kinds=all_kinds):
    for _event in extract_archive(location=archive_path, target=target_directory):
        pass
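
To make the expected outcome concrete, here is a minimal standard-library sketch. It relies on the fact that OpenDocument files such as .ods are ZIP containers. It does not use extractcode at all; the helper names make_sample_ods and extract_members and the sample file contents are hypothetical and only stand in for a real document like gui_test.ods.

```python
import tempfile
import zipfile
from pathlib import Path


def make_sample_ods(path):
    """Write a tiny ODS-like ZIP container (hypothetical stand-in for a real .ods)."""
    with zipfile.ZipFile(path, "w") as archive:
        # OpenDocument packages declare their media type in a "mimetype" entry.
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet")
        archive.writestr("content.xml", "<document-content/>")


def extract_members(location, target):
    """Unpack every member of the ZIP at `location` into `target`.

    Returns the extracted file paths relative to `target`, sorted.
    """
    with zipfile.ZipFile(location) as archive:
        archive.extractall(target)
    return sorted(
        p.relative_to(target).as_posix()
        for p in Path(target).rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as workdir:
    archive_path = Path(workdir) / "sample.ods"
    target_directory = Path(workdir) / "extracted"
    make_sample_ods(archive_path)
    extracted = extract_members(archive_path, target_directory)
```

After this runs, extracted lists the individual files that ended up in the target directory, which is the state the snippet above is expected to reach through extract_archive.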

@VarshaUN
Author

@stefan6419846 Thanks for letting me know! My bad. Would this be right now?

import logging

from extractcode import all_kinds
from extractcode.api import extract_archive
from extractcode.archive import should_extract

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def extract_libre_office_document(location, target):
    """Extract Libre Office documents (e.g., .ods files) as ZIP archives."""
    if should_extract(location=location, kinds=all_kinds):
        for _event in extract_archive(location=location, target=target):
            # Lazy %-style formatting: the message is only built if INFO is enabled.
            logger.info("Extracting %s", _event)
    else:
        raise ValueError(f"File at {location} is not recognized as a supported archive type.")


# Placeholder paths; replace with real locations.
archive_path = '/path/to/gui_test.ods'
target_directory = '/path/to/extracted_files'

try:
    extract_libre_office_document(archive_path, target_directory)
except ValueError as e:
    logger.error(e)
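
For reference, one way a file could be recognized as an OpenDocument package without an explicit wrapper call is to inspect the "mimetype" entry inside the ZIP container. The sketch below uses only the standard library and is an assumption about how such a check might look. It is not how extractcode decides what to extract, and looks_like_opendocument and ODF_MIMETYPE_PREFIX are hypothetical names.

```python
import tempfile
import zipfile
from pathlib import Path

# OpenDocument media types share this prefix (e.g. "...opendocument.spreadsheet").
ODF_MIMETYPE_PREFIX = "application/vnd.oasis.opendocument."


def looks_like_opendocument(location):
    """Return True if `location` is a ZIP whose "mimetype" entry names an OpenDocument type."""
    if not zipfile.is_zipfile(location):
        return False
    with zipfile.ZipFile(location) as archive:
        try:
            declared = archive.read("mimetype")
        except KeyError:
            # A plain ZIP archive without a "mimetype" entry.
            return False
    text = declared.decode("ascii", errors="replace").strip()
    return text.startswith(ODF_MIMETYPE_PREFIX)


with tempfile.TemporaryDirectory() as workdir:
    # ODS-like container with the expected "mimetype" entry.
    ods_path = Path(workdir) / "sample.ods"
    with zipfile.ZipFile(ods_path, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet")

    # Ordinary ZIP archive with no "mimetype" entry.
    plain_zip = Path(workdir) / "plain.zip"
    with zipfile.ZipFile(plain_zip, "w") as archive:
        archive.writestr("readme.txt", "not a document")

    # Plain text file that is not a ZIP at all.
    text_path = Path(workdir) / "notes.txt"
    text_path.write_text("plain text, not an archive\n")

    results = (
        looks_like_opendocument(ods_path),
        looks_like_opendocument(plain_zip),
        looks_like_opendocument(text_path),
    )
```

Only the first sample is treated as an OpenDocument file. A check of this kind would live inside the library's own type detection rather than in a function the caller has to invoke.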
