Pkl migration #65

Merged on Dec 18, 2024 (53 commits)
ad041e5  pkl config init (Vypolor, Nov 25, 2024)
cb3b2a3  change metrics and logger packages (Vypolor, Nov 25, 2024)
bda8287  transform pkl implementation (Vypolor, Nov 25, 2024)
0a3bce8  unit tests + configuration changes (Vypolor, Nov 26, 2024)
7f96bc9  implement config components (Vypolor, Nov 27, 2024)
4aea9dc  fix Components module (Vypolor, Nov 27, 2024)
4b6c63a  fix Components module (Vypolor, Nov 27, 2024)
45e0ff1  pipelaner pkl usage example (Vypolor, Nov 27, 2024)
c3afc77  refactoring (alexeyxo, Dec 2, 2024)
a04b568  add logger implementation (alexeyxo, Dec 3, 2024)
2148364  fix cfg (alexeyxo, Dec 3, 2024)
a4f7e7a  pipelaner@0.0.5 (alexeyxo, Dec 3, 2024)
5116ee9  refactorinf sources (alexeyxo, Dec 4, 2024)
776d1df  fix tests (alexeyxo, Dec 4, 2024)
8cc384f  update pkl configs (alexeyxo, Dec 5, 2024)
4fb067f  update kafka config (alexeyxo, Dec 5, 2024)
d4d69d8  refactoring (alexeyxo, Dec 5, 2024)
1ca24b0  refactoring (alexeyxo, Dec 5, 2024)
88a88ef  fix linter (alexeyxo, Dec 5, 2024)
f2cce96  pkl 0.0.6 (alexeyxo, Dec 6, 2024)
c41e6c2  update threads parameter (alexeyxo, Dec 6, 2024)
3e65047  update (alexeyxo, Dec 6, 2024)
61cad96  fix race condtitions (alexeyxo, Dec 6, 2024)
fbfddf2  gracefull shutdown (alexeyxo, Dec 6, 2024)
e503c31  Merge remote-tracking branch 'origin/main' into pkl-migration (alexeyxo, Dec 6, 2024)
4d5dfff  update linter (alexeyxo, Dec 6, 2024)
d53a139  update linter config (alexeyxo, Dec 6, 2024)
e0e17f7  fix tests (alexeyxo, Dec 6, 2024)
02b2d82  Merge remote-tracking branch 'origin/main' into pkl-migration (alexeyxo, Dec 6, 2024)
f329c08  update pkl version (alexeyxo, Dec 6, 2024)
9193a4f  bump version (alexeyxo, Dec 6, 2024)
797cf0e  bump version (alexeyxo, Dec 6, 2024)
2e29822  refactoring (alexeyxo, Dec 9, 2024)
c8e2321  add images (alexeyxo, Dec 9, 2024)
593f4ba  bump pkl version (alexeyxo, Dec 10, 2024)
e1a05ba  add documentations (alexeyxo, Dec 12, 2024)
77f9422  fix lint (alexeyxo, Dec 12, 2024)
d221a63  add new examples (alexeyxo, Dec 13, 2024)
42be9ad  ref documantation (alexeyxo, Dec 13, 2024)
5a996bd  add documentation for generators (alexeyxo, Dec 13, 2024)
34eaf64  fix generators docs (alexeyxo, Dec 13, 2024)
d619068  fix generators docs (alexeyxo, Dec 13, 2024)
3f5430f  add transforms docs (alexeyxo, Dec 13, 2024)
6db6eba  fix transforms docs (alexeyxo, Dec 13, 2024)
47f9a0a  add sinks documentations (alexeyxo, Dec 13, 2024)
d6d9d0c  Added pkl release workflow (#66) (KingTur, Dec 13, 2024)
00e6db3  test Releaase (alexeyxo, Dec 13, 2024)
4040150  Merge branch 'pkl-migration' of github.com:pipelane/pipelaner into pk… (alexeyxo, Dec 13, 2024)
a894344  fix pathes and bump piplaner version (alexeyxo, Dec 13, 2024)
a570df7  fix examples (alexeyxo, Dec 13, 2024)
bfff64b  update sinks docs (alexeyxo, Dec 18, 2024)
aef8faf  update copyrights (alexeyxo, Dec 18, 2024)
de9b51b  update docs and examples (alexeyxo, Dec 18, 2024)

4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -26,8 +26,8 @@ jobs:
       - name: golangci-lint
         uses: golangci/golangci-lint-action@v6
         with:
-          version: v1.60
-          args: --timeout=5m
+          version: v1.62.2
+          args: --timeout=5m --config=.golangci.yml
           skip-cache: false
           # skip-build-cache: true

50 changes: 50 additions & 0 deletions .github/workflows/pkl.yml
@@ -0,0 +1,50 @@
name: Pkl Automation and Release

on:
  push:
    tags:
      - 'pipelaner@*'

env:
  PKL_VERSION: 0.27.1
  PKL_GO_VERSION: v0.8.1

jobs:
  pkl-create-release:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: stable
          cache: false

      - name: Install Pkl and Go tools
        run: |
          mkdir -p $HOME/.local/bin
          curl -L -o $HOME/.local/bin/pkl https://github.com/apple/pkl/releases/download/${PKL_VERSION}/pkl-linux-amd64
          curl -L -o $HOME/.local/bin/pkl-gen-go https://github.com/apple/pkl-go/releases/download/${PKL_GO_VERSION}/pkl-gen-go-linux-amd64.bin
          chmod +x $HOME/.local/bin/pkl*
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Update version in pkl config
        run: |
          TAG=${{ github.ref_name }}
          VERSION=${TAG#*@}
          sed -i "s/version = \".*\"/version = \"${VERSION}\"/" pkl/PklProject
        shell: bash

      - name: Generate Go code and Package Pkl project
        run: |
          pkl-gen-go pkl/Pipelaner.pkl
          pkl project package pkl

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v1
        with:
          files: .out/${{ github.ref_name }}/*
4 changes: 2 additions & 2 deletions .github/workflows/test.yml
@@ -13,11 +13,11 @@ jobs:
       - name: Setup Go
         uses: actions/setup-go@v4
         with:
-          go-version: '1.21.x'
+          go-version: '1.23.x'
       - name: Install dependencies
         run: |
           go mod tidy
       - name: Testing
         run: |
-          go test -v ./
+          go test -count=1 -v ./... -race

1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@
 go.work
 .DS_Store
 .idea
+bin
22 changes: 17 additions & 5 deletions Makefile
@@ -1,3 +1,4 @@
+
 .PHONY: install-linter
 install-linter:
 	@go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
@@ -8,12 +9,23 @@ lint:
 
 .PHONY: test
 test:
-	@go test -count=1 -v ./
-
+	@go test -count=1 -v ./... -race
 .PHONY: proto
 proto:
-	@rm -rf source/shared/proto
-	@mkdir source/shared/proto
+	@rm -rf sources/shared/proto
+	@mkdir sources/shared/proto
 	@docker run -v $(PWD):/defs namely/protoc-all:1.51_2 -i proto -d proto -o go -l go && \
-	mv go/github.com/pipelane/pipelaner/source/shared/proto source/shared && \
+	mv go/github.com/pipelane/pipelaner/sources/shared/proto sources/shared && \
 	rm -rf go
+
+.PHONY: install-pkl-go
+install-pkl-go:
+	go install github.com/apple/pkl-go/cmd/pkl-gen-go@v0.8.1
+
+.PHONY: pkl-generate-go
+pkl-generate-go:
+	pkl-gen-go pkl/Pipelaner.pkl
+
+.PHONY: pkl-build
+pkl-project:
+	pkl project package pkl
173 changes: 170 additions & 3 deletions README.md
@@ -1,3 +1,170 @@
-[![Go Lint](https://github.com/pipelane/pipelaner/actions/workflows/lint.yml/badge.svg)](https://github.com/pipelane/pipelaner/actions/workflows/lint.yml)
-[![Go Test](https://github.com/pipelane/pipelaner/actions/workflows/test.yml/badge.svg)](https://github.com/pipelane/pipelaner/actions/workflows/test.yml)
-# pipelaner

# **Pipelaner**

**Pipelaner** is a high-performance **Framework and Agent** for creating data pipelines. Pipelines are described following the **_Configuration as Code_** concept, using the [**Pkl**](https://github.com/apple/pkl) configuration language by **Apple**.

Pipelaner manages data streams through three key entities: **Generator**, **Transform** and **Sink**.

---

## 📖 **Contents**
- [Core Entities](#core-entities)
- [Generator](#generator)
- [Transform](#transform)
- [Sink](#sink)
- [Basic Parameters](#basic-parameters)
- [Built-in Pipeline Elements](#built-in-pipeline-elements)
- [Generators](#generators)
- [Transforms](#transforms)
- [Sinks](#sinks)
- [Scalability](#scalability)
- [Single-Node Deployment](#single-node-deployment)
- [Multi-Node Deployment](#multi-node-deployment)
- [Examples](#examples)
- [Support](#support)
- [License](#license)

---

## 📌 **Core Entities**

### **Generator**
The component responsible for creating or retrieving source data for the pipeline. Generators can produce messages, events, or retrieve data from various sources such as files, databases, or APIs.

- **Example use case:**
Reading data from a file or receiving events via webhooks.

---

### **Transform**
The component that processes data within the pipeline. Transforms perform operations such as filtering, aggregation, format conversion, or cleaning to prepare the data for further processing.

- **Example use case:**
Filtering records based on specific conditions or converting data format from JSON to CSV.

---

### **Sink**
The final destination for the data stream. Sinks send processed data to a target system, such as a database, API, or message queue.

- **Example use case:**
Saving data to PostgreSQL or sending it to a Kafka topic.

---

### **Basic Parameters**
| **Parameter** | **Type** | **Description** |
|-----------------------|---------|---------------------------------------------------------------------------------------------------|
| `name` | String | Unique name of the pipeline element. |
| `threads` | Int | Number of threads for processing messages. Defaults to the value of `GOMAXPROCS`. |
| `outputBufferSize` | Int | Size of the output buffer. **Not applicable to Sink components.** |

---

## 📦 **Built-in Pipeline Elements**

### **Generators**
| **Name** | **Description** |
|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [**cmd**](https://github.com/pipelane/pipelaner/tree/main/sources/generator/cmd) | Reads the output of a command, e.g., `"/usr/bin/log" "stream --style ndjson"`. |
| [**kafka**](https://github.com/pipelane/pipelaner/tree/main/sources/generator/kafka) | Apache Kafka consumer that streams `Value` into the pipeline. |
| [**pipelaner**](https://github.com/pipelane/pipelaner/tree/main/sources/generator/pipelaner) | gRPC server that streams values via [gRPC](https://github.com/pipelane/pipelaner/tree/main/proto/service.proto). |

---

### **Transforms**
| **Name** | **Description** |
|--------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [**batch**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/batch) | Forms batches of data with a specified size. |
| [**chunks**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/chunks) | Splits incoming data into chunks. |
| [**debounce**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/debounce) | Eliminates "bounce" (frequent repeats) in data. |
| [**filter**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/filter) | Filters data based on specified conditions. |
| [**remap**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/remap) | Reassigns fields or transforms the data structure. |
| [**throttling**](https://github.com/pipelane/pipelaner/tree/main/sources/transform/throttling) | Limits data processing rate. |

---

### **Sinks**
| **Name** | **Description** |
|--------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [**clickhouse**](https://github.com/pipelane/pipelaner/tree/main/sources/sink/clickhouse) | Sends data to a ClickHouse database. |
| [**console**](https://github.com/pipelane/pipelaner/tree/main/sources/sink/console) | Outputs data to the console. |
| [**http**](https://github.com/pipelane/pipelaner/tree/main/sources/sink/http) | Sends data to a specified HTTP endpoint. |
| [**kafka**](https://github.com/pipelane/pipelaner/tree/main/sources/sink/kafka) | Publishes data to Apache Kafka. |
| [**pipelaner**](https://github.com/pipelane/pipelaner/tree/main/sources/sink/pipelaner) | Streams data via [gRPC](https://github.com/pipelane/pipelaner/tree/main/proto/service.proto) to other Pipelaner nodes. |

---

## 🌐 **Scalability**

### **Single-Node Deployment**
For operation on a single host:
![Single Node](https://github.com/pipelane/pipelaner/blob/c8e232106e9acf8a1d8682d225e369f282f6523a/images/pipelaner-singlehost.png/?raw=true "Single Node Deployment")

---

### **Multi-Node Deployment**
For distributed data processing across multiple hosts:
![Multi-Node](https://github.com/pipelane/pipelaner/blob/c8e232106e9acf8a1d8682d225e369f282f6523a/images/pipelaner-multihost.png/?raw=true "Multi-Node Deployment")

For distributed interaction between nodes, you can use:
1. **gRPC** — via generators and sinks with the parameter `sourceName: "pipelaner"`.
2. **Apache Kafka** — for reading/writing data via topics.

Example configuration using Kafka:
```pkl
new Inputs.Kafka {
  ...
  common {
    ...
    topics {
      "kafka-topic"
    }
  }
}

new Sinks.Kafka {
  ...
  common {
    ...
    topics {
      "kafka-topic"
    }
  }
}
```

---
## 🚀 **Examples**

| **Examples** | **Description** |
|-------------------------------------------------------------------------------|---------------------------------------------------------|
| [**Basic Pipeline**](https://github.com/pipelane/pipelaner/tree/main/examples/basic) | A simple example illustrating the creation of a basic pipeline with prebuilt components. |
| [**Custom Components**](https://github.com/pipelane/pipelaner/tree/main/examples/custom) | An advanced example showing how to create and integrate custom Generators, Transforms, and Sinks. |

---

### **Overview**

1. **🌟 Basic Pipeline**
Learn the fundamentals of creating a pipeline with minimal configuration using ready-to-use components.

2. **🛠 Custom Components**
Extend **Pipelaner**’s functionality by developing your own Generators, Transforms, and Sinks.

---

Each example includes clear configuration files and explanations to help you get started quickly.

💡 **Tip**: Use these examples as templates to customize and build your own pipelines efficiently.


## 🤝 **Support**

If you have questions, suggestions, or encounter any issues, please [create an Issue](https://github.com/pipelane/pipelaner/issues/new) in the repository.
You can also participate in discussions in the [Discussions](https://github.com/pipelane/pipelaner/discussions) section.

---

## 📜 **License**

This project is licensed under the [Apache 2.0](https://github.com/pipelane/pipelaner/blob/main/LICENSE) license.
You are free to use, modify, and distribute the code under the terms of the license.