
Experimental and incomplete Pandas dataframe and Apache Arrow support #483

Merged (6 commits) on Feb 6, 2025

@soininen (collaborator) commented on Feb 6, 2025

This PR introduces two new modules, arrow_value and dataframes. The most interesting function in arrow_value is from_database(), which works just like the function of the same name in parameter_value but returns pyarrow.RecordBatch objects instead of Maps or other IndexedValues. Currently it supports only a limited set of arrays, maps and time series, and there is no to_database() equivalent yet.

The dataframes module, on the other hand, contains functions that return dataframes from a database mapping's parameter value items or from queries. There is also a function to add or update a value from a dataframe. As with arrow_value, the functionality is limited and not all value types are supported.

Implements #353

Checklist before merging

  • Documentation (also in Toolbox repo) is up-to-date
  • Release notes have been updated
  • Unit tests have been added/updated accordingly
  • Code has been formatted by black & isort
  • Unit tests pass

Very much work-in-progress. The new functions work but will probably see a lot of changes in future commits.

Includes a very experimental Arrow interface and a function to directly query parameter values and return them as Pandas dataframes.
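A query over parameter values naturally yields a long layout with one row per value. The following is a minimal sketch of building such a dataframe, assuming the query returns plain tuples with scalar values; rows_to_dataframe() and the column names are hypothetical and not necessarily what fetch_as_dataframe() uses.

```python
import pandas as pd

# Hypothetical column layout; the real fetch_as_dataframe() may differ.
COLUMNS = [
    "entity_class_name",
    "entity_byname",
    "parameter_definition_name",
    "alternative_name",
    "value",
]


def rows_to_dataframe(rows):
    """Build a long-format dataframe: one row per parameter value."""
    return pd.DataFrame.from_records(rows, columns=COLUMNS)


rows = [
    ("node", "n1", "demand", "Base", 10.0),
    ("node", "n1", "demand", "high", 15.0),
    ("node", "n2", "demand", "Base", 8.0),
]
df = rows_to_dataframe(rows)
# Select the values of a single alternative and aggregate them.
base = df[df["alternative_name"] == "Base"]
print(base["value"].sum())  # 18.0
```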

Re #353
Added functions to get, add and update mapped items directly. These functions work on a "higher" level than fetch_as_dataframe(), which works with database queries and is read-only. The entity class name is now also included in the dataframe so that entities can be identified uniquely, which is not possible with byname alone.
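Why byname alone is not enough can be shown with a small sketch. It assumes a hypothetical layout in which two entity classes each contain an entity named "a" and byname is stored as a plain string (in a real database it may be a tuple of element names). Only when the entity class name is part of the index does each row get a unique key.

```python
import pandas as pd

# Hypothetical data: "node" and "unit" both contain an entity called "a".
df = pd.DataFrame({
    "entity_class_name": ["node", "unit"],
    "entity_byname": ["a", "a"],
    "parameter_definition_name": ["capacity", "capacity"],
    "alternative_name": ["Base", "Base"],
    "value": [5.0, 7.0],
})

# Without the class name the two values collide on the same key.
by_name_only = df.set_index(["entity_byname", "parameter_definition_name", "alternative_name"])
# Adding the class name makes every key unique.
with_class = df.set_index(
    ["entity_class_name", "entity_byname", "parameter_definition_name", "alternative_name"]
)
print(by_name_only.index.is_unique)  # False
print(with_class.index.is_unique)    # True
```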

Re #353
Entity class names, parameter names and alternative names are now categorical columns; entity element names are strings.
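The dtype choice in this commit can be sketched with pandas' categorical and string dtypes. This is a minimal illustration, assuming hypothetical column names (element_1 and element_2 stand in for the element name columns of a two-dimensional entity class) and assuming pandas' StringDtype for the string columns; the actual implementation may use different names or plain object columns.

```python
import pandas as pd

# Hypothetical column names used only for this illustration.
CATEGORY_COLUMNS = ["entity_class_name", "parameter_definition_name", "alternative_name"]
STRING_COLUMNS = ["element_1", "element_2"]


def apply_column_dtypes(df):
    """Return a copy with label columns as categories and element names as strings."""
    df = df.copy()
    for column in CATEGORY_COLUMNS:
        df[column] = df[column].astype("category")
    for column in STRING_COLUMNS:
        df[column] = df[column].astype("string")
    return df


raw = pd.DataFrame({
    "entity_class_name": ["node__unit"] * 4,
    "element_1": ["n1", "n1", "n2", "n2"],
    "element_2": ["u1", "u1", "u2", "u2"],
    "parameter_definition_name": ["flow"] * 4,
    "alternative_name": ["Base", "high", "Base", "high"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
typed = apply_column_dtypes(raw)
print(list(typed["alternative_name"].cat.categories))  # ['Base', 'high']
```

A categorical column stores each distinct label once and refers to it by integer codes, which pays off for columns such as alternative names that repeat on nearly every row.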

Re #353
@soininen merged commit fb3b427 into master on Feb 6, 2025
15 checks passed
@soininen deleted the 353_pandas_dataframes branch on February 6, 2025, 12:20