Merge pull request #3 from Lercas/2.0
update readme and dockerfile
Lercas authored Dec 20, 2024
2 parents 738f150 + 0d66bc4 commit b5cdcf2
Showing 2 changed files with 117 additions and 56 deletions.
17 changes: 10 additions & 7 deletions Dockerfile
@@ -1,14 +1,17 @@
FROM golang:1.21.5-alpine AS builder
FROM golang:1.23-alpine AS builder

WORKDIR /app
COPY go.mod ./

RUN go mod download
COPY go.mod go.sum ./
RUN go mod download && go mod verify

COPY . .
RUN CGO_ENABLED=0 go build -o sneakpeeker .

RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o sneakpeeker .

FROM alpine:3.20

FROM gcr.io/distroless/static-debian11
WORKDIR /root/
COPY --from=builder /app/sneakpeeker .
COPY --from=builder /app/sneakpeeker /usr/local/bin/sneakpeeker

ENTRYPOINT ["./sneakpeeker"]
ENTRYPOINT ["sneakpeeker"]
156 changes: 107 additions & 49 deletions README.md
@@ -1,105 +1,163 @@
# SneakPeeker

SneakPeeker is a tool designed to detect suspicious URLs in various file formats. It processes ZIP, DOCX, XLSX, PPTX, and PDF files to uncover hidden links that might indicate unauthorized access or data leaks.

## Features

- Scans directories and files for suspicious URLs
- Supports ZIP, DOCX, XLSX, PPTX, and PDF file formats
- Outputs found URLs to the console
- Optionally removes canary tokens from files
- Generates a JSON report file

## Installation

1. Clone the repository:
SneakPeeker is a versatile tool designed to detect and optionally remove suspicious URLs (such as canary tokens) from various file formats. In addition to its original capabilities, it now supports more formats, provides detailed reports, can run in dry-run mode, and leverages parallel processing for faster scanning.

# Key Features
- Multi-format Scanning: Supports scanning of `.pdf`, `.zip`, `.docx`, `.xlsx`, `.pptx`, `.txt` and `.html` files for suspicious URLs.
- Detailed JSON Reports:
- Includes file path, file size, processing time, whether the file is suspicious, and the URLs found.
- Provides a summary with total files scanned, number of suspicious files, and number of normal files.
- URL Removal and Backups:
- Optionally remove suspicious URLs from files.
- Creates a backup of the original file before modifications for safety.
- Dry-Run Mode:
- Analyze files without making any changes, useful for previewing results before final action.
- Parallel Processing:
- Scan multiple files concurrently for faster results.
- Configure the number of worker goroutines to improve performance on large directories.
- Verbose Logging:
- Verbose (`-v`) mode for more detailed output, useful for debugging or tracking the scanning process.
- Custom Ignored Domains:
- Specify domains to ignore during URL scanning via the `--ignore` option.
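
The parallel-processing feature can be pictured as a small worker pool. The sketch below is illustrative only and is not taken from the SneakPeeker source: `scanResult`, `scanAll`, and the `scan` callback are hypothetical names, and the real tool may distribute work differently.

```go
package main

import (
	"fmt"
	"sync"
)

// scanResult is a hypothetical per-file outcome.
type scanResult struct {
	Path       string
	Suspicious bool
}

// scanAll fans paths out to `workers` goroutines and collects one result
// per path. The order of the returned results is not deterministic.
func scanAll(paths []string, workers int, scan func(string) bool) []scanResult {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan scanResult)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- scanResult{Path: p, Suspicious: scan(p)}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var out []scanResult
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	paths := []string{"a.txt", "b.pdf", "c.docx"}
	// Stand-in check: pretend only b.pdf contains a suspicious URL.
	res := scanAll(paths, 5, func(p string) bool { return p == "b.pdf" })
	fmt.Println(len(res), "files scanned")
}
```

A bounded pool like this keeps the number of simultaneously open files at or below the worker count, which is the usual reason to make that number configurable.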

Installation
1. Clone the repository:
```bash
git clone https://github.com/Lercas/SneakPeeker.git
cd SneakPeeker
```

2. Build the tool:
2. Build the tool:
```bash
go build -o sneakpeeker main.go
```

Or you can install:
Or install directly:
```bash
go install github.com/Lercas/SneakPeeker@v1.0.1
```
go install github.com/Lercas/SneakPeeker@v2.0.0
```

## Usage
# Usage

```bash
./sneakpeeker [-f] [-r report_file] FILE_OR_DIRECTORY_PATH
./sneakpeeker [options] FILE_OR_DIRECTORY_PATH
```

### Parameters
## Options

`-f`: (Optional) Remove canary tokens from files.
`-r report_file`: (Optional) Specify the name of the JSON report file. Default is report.json.
`FILE_OR_DIRECTORY_PATH`: Path to the file or directory you want to scan.
- `-f` – Remove suspicious URLs (e.g., canary tokens) from files.
- `-r report_file` – Specify the name of the JSON report file. Default is `report.json`.
- `-v` – Enable verbose (debug-level) logging.
- `--dry-run` – Perform the scan without modifying any files.
- `--ignore domains` – Comma-separated list of domains to ignore during scanning.
- `-w workers` – Number of worker goroutines to use for parallel file scanning. Default is 5.
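
For readers who want to see how such options might be wired up, here is a minimal sketch using Go's standard `flag` package, which accepts both `-name` and `--name` spellings. It is an assumption that the tool parses its options this way; the `options` struct and `parseArgs` function are hypothetical, and only the flag names and defaults come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// options mirrors the documented flags; the struct itself is hypothetical.
type options struct {
	Remove  bool
	Report  string
	Verbose bool
	DryRun  bool
	Ignore  []string
	Workers int
	Target  string
}

// parseArgs parses the documented options followed by exactly one path.
func parseArgs(args []string) (options, error) {
	var o options
	var ignore string
	fs := flag.NewFlagSet("sneakpeeker", flag.ContinueOnError)
	fs.BoolVar(&o.Remove, "f", false, "remove suspicious URLs from files")
	fs.StringVar(&o.Report, "r", "report.json", "name of the JSON report file")
	fs.BoolVar(&o.Verbose, "v", false, "enable verbose logging")
	fs.BoolVar(&o.DryRun, "dry-run", false, "scan without modifying files")
	fs.StringVar(&ignore, "ignore", "", "comma-separated domains to ignore")
	fs.IntVar(&o.Workers, "w", 5, "number of worker goroutines")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, fmt.Errorf("expected exactly one path, got %d", fs.NArg())
	}
	o.Target = fs.Arg(0)
	// Normalize the ignore list: trim spaces, lowercase, drop empties.
	for _, d := range strings.Split(ignore, ",") {
		if d = strings.TrimSpace(d); d != "" {
			o.Ignore = append(o.Ignore, strings.ToLower(d))
		}
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"-w", "8", "--dry-run", "--ignore", "example.com, intranet.local", "/data"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

With the `flag` package, parsing stops at the first non-flag argument, so in this sketch the options must come before the path.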

## Examples

Scan a directory and output results to the console:

Scan a directory and output results:
```bash
./sneakpeeker /path/to/directory

```

Scan a file and output results to the console:
Scan a file and output results:
```bash

```

Scan a PDF and remove canary tokens:
```bash
./sneakpeeker /path/to/file.docx

```

Scan a file and remove canary tokens:
Scan a file in dry-run mode (no modifications):
```bash

```

Scan with a custom JSON report name:
```bash
./sneakpeeker -f /path/to/file.pdf

```

Scan a file and generate a JSON report:
Scan with a custom JSON report name:
```bash

```

Use multiple workers for faster scanning:
```bash
./sneakpeeker -r myreport.json /path/to/file.pdf

```

How It Works
# How It Works

- PDF Files: Scans for URL patterns in decompressed PDF streams.
- ZIP, DOCX, XLSX, PPTX Files: Decompresses the files and scans for URL patterns in the extracted contents.
- PDF Files: Extracts and decompresses PDF streams to search for URL patterns.
- ZIP/DOCX/XLSX/PPTX Files: Decompresses and inspects the content files inside for URLs.
- TXT/HTML Files: Directly scans the text content for URLs.
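
The ZIP-based and plain-text cases above can be sketched with the standard library alone. This is a simplified illustration, not the project's code: `urlPattern`, `findURLs`, `isIgnored`, `scanZip`, and `buildZip` are hypothetical names, the regular expression is deliberately naive, and PDF stream decompression is not covered.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/url"
	"regexp"
	"strings"
)

// urlPattern is an assumed, simple pattern; real URL matching is messier.
var urlPattern = regexp.MustCompile(`https?://[^\s"'<>()]+`)

// isIgnored reports whether host equals or is a subdomain of an ignored domain.
func isIgnored(host string, ignored []string) bool {
	for _, d := range ignored {
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

// findURLs returns the distinct URLs in content whose host is not ignored.
func findURLs(content string, ignored []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, raw := range urlPattern.FindAllString(content, -1) {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		if isIgnored(strings.ToLower(u.Hostname()), ignored) || seen[raw] {
			continue
		}
		seen[raw] = true
		out = append(out, raw)
	}
	return out
}

// scanZip applies findURLs to every entry of a ZIP archive held in memory.
// DOCX, XLSX, and PPTX files are ZIP archives, so the same walk applies.
func scanZip(data []byte, ignored []string) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out = append(out, findURLs(string(body), ignored)...)
	}
	return out, nil
}

// buildZip is a demo helper that creates a one-entry archive in memory.
func buildZip(name, content string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write([]byte(content)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	archive := buildZip("word/document.xml", `<w:t>http://test-example.local/x</w:t>`)
	urls, err := scanZip(archive, []string{"example.com"})
	fmt.Println(urls, err)
}
```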

Example Output
If suspicious URLs are found and the `-f` option is enabled (and `--dry-run` mode is not), they are removed from the file after a backup (`.bak`) is created.
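
The backup-then-remove behaviour can be sketched for a plain-text file as follows. This is a hedged illustration under stated assumptions: `removeURLs` and `runDemo` are hypothetical names, the backup is assumed to be the original path plus `.bak`, and container formats such as DOCX or PDF would need format-aware rewriting that is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// removeURLs strips the given URLs from a plain-text file.
// It writes a .bak copy before modifying the file; in dry-run mode, or when
// nothing would change, the file is left untouched and no backup is made.
func removeURLs(path string, urls []string, dryRun bool) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	cleaned := string(data)
	for _, u := range urls {
		cleaned = strings.ReplaceAll(cleaned, u, "")
	}
	if dryRun || cleaned == string(data) {
		return false, nil
	}
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	perm := info.Mode().Perm()
	// Back up first, so a failed rewrite never loses the original.
	if err := os.WriteFile(path+".bak", data, perm); err != nil {
		return false, err
	}
	if err := os.WriteFile(path, []byte(cleaned), perm); err != nil {
		return false, err
	}
	return true, nil
}

// runDemo exercises removeURLs on a throwaway file in a temp directory.
func runDemo(dryRun bool) (changed, hasBackup bool, content string, err error) {
	dir, err := os.MkdirTemp("", "sneakpeeker-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "note.txt")
	if err = os.WriteFile(path, []byte("ping http://test-example.local now"), 0o644); err != nil {
		return
	}
	changed, err = removeURLs(path, []string{"http://test-example.local"}, dryRun)
	if err != nil {
		return
	}
	_, statErr := os.Stat(path + ".bak")
	hasBackup = statErr == nil
	data, err := os.ReadFile(path)
	content = string(data)
	return
}

func main() {
	changed, hasBackup, content, err := runDemo(false)
	fmt.Println(changed, hasBackup, content, err)
}
```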

```bash
## Example Output

```plaintext
[INFO] The file /path/to/file.docx is suspicious. URLs found:
http://suspicious-example.local
https://another-suspicious-example.local
http://test-example.local
https://another-test-example.local
[DEBUG] The file /path/to/normalfile.txt seems normal.
```

[INFO] The file /path/to/anotherfile.pdf seems normal.
### Generated JSON report example:

```json
{
"reports": [
{
"file_path": "/path/to/file.docx",
"suspicious": true,
"found_urls": ["http://test-example.local", "https://another-test-example.local"],
"file_size": 2048,
"processed_at": "2024-12-20T15:04:05Z"
},
{
"file_path": "/path/to/normalfile.txt",
"suspicious": false,
"found_urls": [],
"file_size": 512,
"processed_at": "2024-12-20T15:04:10Z"
}
],
"summary": {
"total_files": 2,
"suspicious_files": 1,
"normal_files": 1
}
}
```
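
A consumer of this report could decode it with Go types along the following lines. The JSON keys are copied from the example above; the type and function names (`FileReport`, `Summary`, `Report`, `summarize`, `parseReport`) are assumptions made for illustration and may not match the tool's internal types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FileReport matches one entry of the "reports" array.
type FileReport struct {
	FilePath    string    `json:"file_path"`
	Suspicious  bool      `json:"suspicious"`
	FoundURLs   []string  `json:"found_urls"`
	FileSize    int64     `json:"file_size"`
	ProcessedAt time.Time `json:"processed_at"`
}

// Summary matches the "summary" object.
type Summary struct {
	TotalFiles      int `json:"total_files"`
	SuspiciousFiles int `json:"suspicious_files"`
	NormalFiles     int `json:"normal_files"`
}

// Report is the top-level document.
type Report struct {
	Reports []FileReport `json:"reports"`
	Summary Summary      `json:"summary"`
}

// summarize recomputes the summary counts from the per-file entries.
func summarize(reports []FileReport) Summary {
	s := Summary{TotalFiles: len(reports)}
	for _, r := range reports {
		if r.Suspicious {
			s.SuspiciousFiles++
		} else {
			s.NormalFiles++
		}
	}
	return s
}

// parseReport decodes a report document from JSON bytes.
func parseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	entries := []FileReport{{
		FilePath:    "/path/to/file.docx",
		Suspicious:  true,
		FoundURLs:   []string{"http://test-example.local"},
		FileSize:    2048,
		ProcessedAt: time.Date(2024, 12, 20, 15, 4, 5, 0, time.UTC),
	}}
	out, err := json.MarshalIndent(Report{Reports: entries, Summary: summarize(entries)}, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Recomputing the summary from the entries, as `summarize` does, gives a quick consistency check when post-processing a report.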

## Docker Usage
# Docker Usage

First, build the Docker image:
Build the Docker image:
```bash
docker build -t sneakpeeker:latest .
```

Scan a directory mounted into /data:
```bash
docker build -t canarycatcher:1.0 .
docker run --rm -v /path/to/scan:/data sneakpeeker:latest /data
```

Then, run the container with volume mounting to access your files. For example, if you want to scan the /path/to/scan directory on your host, you can run:
Use -f to remove suspicious URLs:
```bash
docker run --rm -v /path/to/scan:/data canarycatcher:1.0 /data
docker run --rm -v /path/to/scan:/data sneakpeeker:latest -f /data
```

If you want to use additional flags -f to remove canary tokens, you can do so as follows:
Generate a JSON report:
```bash
docker run --rm -v /path/to/scan:/data canarycatcher:1.0 -f /data
docker run --rm -v /path/to/scan:/data sneakpeeker:latest -r /data/report.json /data
```

Example command to run with a JSON report:
Add verbose logging and dry-run mode:
```bash
docker run --rm -v /path/to/scan:/data canarycatcher:1.0 -r /data/report.json /data
docker run --rm -v /path/to/scan:/data sneakpeeker:latest -v --dry-run /data
```
