Make nodes reusable across pipelines #532
Replies: 4 comments 1 reply
-
Hi there @renepeinl, these are good points. Slowly but surely I keep adding to the documentation, so any feedback on that front is welcome. We are looking into import/export as well as git integration. Reusable nodes is an interesting concept that hasn't been raised before; I can definitely look into it. I know it is a bit painful to copy and paste the same code all the time. We use Python libraries to centralise our code and keep components reusable. Happy to write up some documentation on how to do that if that works for you? You can import your own private Python packages. I am also developing some Dataplane Python libraries with recipes to help reduce the amount of code one needs to keep writing. I haven't updated our documentation to reflect this yet, as I am still testing the package, but you can find it over here: Install python packages, Dataplane python package
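To make the "shared Python package" idea concrete, here is a minimal sketch of how reusable node code can be centralised today. The package name `mycompany_nodes` and its function are hypothetical placeholders, not part of Dataplane; the only assumption is that the package can be installed in the code editor as described in the Install python packages docs.

```python
# mycompany_nodes/cleaning.py -- lives in your own versioned, privately published package
import pandas as pd

def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Reusable transformation shared by several pipelines."""
    return df.dropna(axis=1, how="all")
```

```python
# Inside any Dataplane Python node, after installing the private package:
import pandas as pd
from mycompany_nodes.cleaning import drop_empty_columns

df = pd.DataFrame({"a": [1, 2], "b": [None, None]})
print(drop_empty_columns(df))
```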
-
+1 for reusable nodes; my use case involves a lot of modules that need to be shared between pipelines. I am also looking for an easy way to share data between Python nodes. I looked at the documentation but couldn't find anything there. Something like this maybe: Python node 1: dataplane.export.variable1 = value; Python node 2: $variable1. And in the GUI: [screenshot]
-
@fyodorr thanks for the suggestion. On the first question about reusable nodes, I am busy looking into a solution and agree it will be useful for teams. The second question, transferring variables between nodes, is already available; I still need to update the documentation. It is how you transfer data between pipeline steps in general: if the data is large, use the S3 method; for small, fast-moving data, use the Redis method as follows.

Install the dataplane Python package in the code editor: https://pypi.org/project/dataplane/

Sending node:

```python
import redis
from datetime import timedelta
from dataplane import pipeline_redis_store

redisConnect = redis.Redis(host="redis-service", port=6379, db=0)
data = "myvalue"

# Store the data under the key "hello", expiring after 15 minutes
rs = pipeline_redis_store(StoreKey="hello", Value=data, Redis=redisConnect,
                          Expire=True, ExpireDuration=timedelta(minutes=15))
```

Receiving node:

```python
import redis
from dataplane import pipeline_redis_get

redisConnect = redis.Redis(host="redis-service", port=6379, db=0)

# Retrieve the value stored by the sending node
rsget = pipeline_redis_get(StoreKey="hello", Redis=redisConnect)
print(rsget["value"])
```
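If you need to pass structured data rather than a plain string, one option is to serialise to JSON before storing and parse it again on the receiving side. This is a sketch that reuses only the `pipeline_redis_store` / `pipeline_redis_get` calls shown above; the key name and payload are illustrative.

```python
import json
import redis
from datetime import timedelta
from dataplane import pipeline_redis_store, pipeline_redis_get

redisConnect = redis.Redis(host="redis-service", port=6379, db=0)

# Sending node: serialise a dict to JSON before storing it
payload = {"rows": 42, "status": "ok"}
pipeline_redis_store(StoreKey="run-stats", Value=json.dumps(payload),
                     Redis=redisConnect, Expire=True,
                     ExpireDuration=timedelta(minutes=15))

# Receiving node: parse the JSON back into a dict
stats = json.loads(pipeline_redis_get(StoreKey="run-stats", Redis=redisConnect)["value"])
print(stats["rows"])
```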
-
Hi Saul. You say above "You can import your own private Python packages". How should we do that? This page shows how to access libs from PyPI. My private libraries are stored in AWS CodeArtifact, which takes the place of PyPI and proxies all access to it. When I install libs from my local machine using …
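In case it helps while the docs catch up, below is a rough sketch (not a documented Dataplane feature) of how a node could authenticate to an AWS CodeArtifact repository and install a private package at runtime. It assumes boto3 credentials are available to the worker; the domain, owner, repository, region, and package names are placeholders.

```python
import subprocess
import sys
import boto3

# Placeholder values -- replace with your own CodeArtifact details
DOMAIN, OWNER, REPO, REGION = "my-domain", "111122223333", "my-repo", "eu-west-1"

# Fetch a short-lived auth token for the CodeArtifact repository
token = boto3.client("codeartifact", region_name=REGION).get_authorization_token(
    domain=DOMAIN, domainOwner=OWNER
)["authorizationToken"]

# Build the pip index URL (CodeArtifact uses "aws" as the username)
index_url = (
    f"https://aws:{token}@{DOMAIN}-{OWNER}.d.codeartifact."
    f"{REGION}.amazonaws.com/pypi/{REPO}/simple/"
)

# Install the private package into the current environment
subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "--index-url", index_url, "my-private-package"])
```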
-
I'm not sure if I have missed something, since the documentation is very limited up to now. I would consider it one of the most important features of a tool like Dataplane to not only create a node once (e.g. a Python code block) but also be able to reuse that node in other pipelines.
To do that, we would need something like a central library of existing nodes, so that you could drag & drop them into your pipeline in the same way as you do for processors right now, but instead of being empty or containing the default print("Node-ID") statement, they would contain the code that was already written in another pipeline.
Ideally, you would also have some git integration for that, because if you write a significant amount of code, you surely want it to be versioned. Presumably, you could simulate that by running a bash script to clone a git repository and execute it, but then you lose the advantage of the built-in editor.
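For what it's worth, here is a rough sketch of that clone-and-execute workaround, written in Python to match the other examples in this thread rather than as a bash script. The repository URL and entry script are placeholders, and this is not a built-in Dataplane feature.

```python
import subprocess
import tempfile
from pathlib import Path

# Placeholder repository and entry point -- replace with your own
REPO_URL = "https://github.com/example-org/shared-nodes.git"
ENTRY_SCRIPT = "nodes/clean_data.py"

# Clone the versioned node code into a temporary directory
workdir = tempfile.mkdtemp()
subprocess.check_call(["git", "clone", "--depth", "1", REPO_URL, workdir])

# Execute the shared node script inside this node's Python environment
exec(compile(Path(workdir, ENTRY_SCRIPT).read_text(), ENTRY_SCRIPT, "exec"), {})
```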