Rewrite packed file parser with finite state machines #16

Merged: 1 commit, Jan 15, 2025

nanxstats (Collaborator):

Closes #14

This PR rewrites the packed file parser as a finite state machine. The existing implementation is also stateful, but the new one models its states and transitions explicitly, so the parser is easier to read and maintain.
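To illustrate what an explicit state machine looks like for this kind of line-oriented format, here is a minimal shell sketch. It assumes a simplified, hypothetical packed format loosely modeled on pkglite's packed text files (`Package:`/`File:`/`Format:`/`Content:` header lines followed by content lines indented by two spaces, with an empty line ending each record). It is not the pkglite implementation, and `parse_packed` is a hypothetical name.

```bash
#!/bin/bash
# Finite state machine parser sketch for a simplified packed format.
# ASSUMED format (hypothetical, loosely modeled on pkglite packed files;
# not pkglite's actual grammar or implementation):
#   Package: <name>
#   File: <path>
#   Format: <text|binary>
#   Content:
#     <content lines, each indented by two spaces>
# An empty line ends a record. Outside content, empty lines and "#"
# comment lines are skipped. Prints one summary line per record.

parse_packed() {
    local state="header"  # FSM state: "header" or "content"
    local line pkg="" file="" format="" nlines=0

    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case "$state" in
        header)
            case "$line" in
            "" | "#"*) ;;  # skip blank and comment lines
            "Package: "*) pkg="${line#"Package: "}" ;;
            "File: "*) file="${line#"File: "}" ;;
            "Format: "*) format="${line#"Format: "}" ;;
            "Content:") state="content"; nlines=0 ;;  # header -> content
            *) echo "parse error: unexpected header line: $line" >&2; return 1 ;;
            esac
            ;;
        content)
            case "$line" in
            "  "*) nlines=$((nlines + 1)) ;;  # stay in content
            "")  # content -> header: the record is complete
                printf '%s/%s format=%s lines=%d\n' "$pkg" "$file" "$format" "$nlines"
                pkg="" file="" format=""
                state="header" ;;
            *) echo "parse error: unindented content line: $line" >&2; return 1 ;;
            esac
            ;;
        esac
    done

    # End of input while in content also completes the last record
    if [ "$state" = "content" ]; then
        printf '%s/%s format=%s lines=%d\n' "$pkg" "$file" "$format" "$nlines"
    fi
}
```

Each outer `case` arm is one state and each inner arm is one transition, so an unexpected line has exactly one place where it is rejected, and end of input is handled as a final transition rather than as a special case inside the loop.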

I ran some real-world tests with the following shell script, comparing the output directories (one or more) that pkglite's unpack produces with this implementation against those produced by the existing implementation.

compare.sh

```bash
#!/bin/bash
# Compare two directory trees: same relative paths and identical file contents.

function compare_directories {
    # Drop any trailing slash so relative paths line up between the two trees
    local dir1="${1%/}"
    local dir2="${2%/}"
    local mismatch_count=0
    local i path
    local -a dir1_files=() dir2_files=()

    # List all files and directories in a stable (C locale) order, stripping
    # the root prefix so paths are relative (e.g. "/R/foo.R"). Reading line by
    # line avoids changing the global IFS and glob-expanding file names.
    while IFS= read -r path; do
        dir1_files+=("${path#"$dir1"}")
    done < <(find "$dir1" -type f -o -type d | LC_ALL=C sort)
    while IFS= read -r path; do
        dir2_files+=("${path#"$dir2"}")
    done < <(find "$dir2" -type f -o -type d | LC_ALL=C sort)

    # Check if directory structures match
    if [ "${#dir1_files[@]}" -ne "${#dir2_files[@]}" ]; then
        echo "Directory structures differ. First differences:" >&2
        diff <(printf "%s\n" "${dir1_files[@]}") <(printf "%s\n" "${dir2_files[@]}") | head -10 >&2
        return 1
    fi

    for i in "${!dir1_files[@]}"; do
        if [ "${dir1_files[i]}" != "${dir2_files[i]}" ]; then
            echo "Mismatch in directory structure: ${dir1_files[i]} vs ${dir2_files[i]}"
            mismatch_count=$((mismatch_count + 1))
            if [ "$mismatch_count" -ge 10 ]; then break; fi
        fi
    done

    # Files are compared pairwise by index, which is only meaningful
    # when the two path lists match exactly
    if [ "$mismatch_count" -gt 0 ]; then
        echo "Directories differ. See details above."
        return 1
    fi

    # Check file contents match
    for i in "${!dir1_files[@]}"; do
        local path1="$dir1${dir1_files[i]}"
        local path2="$dir2${dir2_files[i]}"

        # Skip directories (if only one side is a directory, cmp fails and it is reported)
        if [ -d "$path1" ] && [ -d "$path2" ]; then
            continue
        fi

        if ! cmp -s "$path1" "$path2"; then
            echo "File content mismatch: $path1 vs $path2"
            mismatch_count=$((mismatch_count + 1))
            if [ "$mismatch_count" -ge 10 ]; then break; fi
        fi
    done

    if [ "$mismatch_count" -eq 0 ]; then
        echo "Directories are identical."
    else
        echo "Directories differ. See details above."
        return 1
    fi
}

# Check arguments
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <directory1> <directory2>" >&2
    exit 1
fi

# Compare directories; the script exits with the function's return status
compare_directories "$1" "$2"
```

The output directories from the two implementations are identical.
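As a lighter-weight cross-check (a different technique from the script above, not what was used for these tests), `diff -rq` compares two trees recursively and reports only which files differ or exist on one side. A minimal sketch, where `compare_with_diff` is a hypothetical wrapper name:

```bash
#!/bin/bash
# Recursive tree comparison with diff (GNU and BSD diff):
#   -r  recurse into subdirectories; entries present on only one
#       side are reported as "Only in ..."
#   -q  report only whether files differ, not the line-by-line diff
# Returns 0 if the trees are identical, 1 if they differ, 2 on error.
compare_with_diff() {
    diff -rq "$1" "$2"
}
```

For example, `compare_with_diff out-old out-new` (hypothetical directory names) prints nothing and returns 0 when the two unpack outputs match.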

nanxstats merged commit 6cc5bca into main on Jan 15, 2025 (6 checks passed).
nanxstats deleted the parser branch on Jan 15, 2025, 18:46.
Linked issue: Use state machine for file parser