A new fast-to-read and highly compressed wind flow format for sharing wind speedups.
Author: diogo.friggo@dnv.com
Parqflow files hold flow timeseries data in Apache's .parquet format. The data itself is stored internally as columns, which allows for great compression, while users manipulate it as a table of several columns. Furthermore, metadata allows for appending extra dimensions without changing the format of the data (think of several hub heights, several directions, etc.). Each column of data also belongs to a regular rectangular grid which can be reconstructed from the additional metadata that describes a parqflow set of files (as shown at Adding eastings and northings below). Attached to these data one finds the following fields (showing sample values):
- version: '0.0.1'
- stamp: datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S+0000')
- engine: 'StarCCM+'
- levels: ['case', 'instrument', 'sector', 'unit', 'variable']
- dtypes: ['str', 'str', 'float', 'str', 'str']
- epsg: 25048
- dx: 10
- dy: 10
- nx: 873
- ny: 643
- min_x: 1507990
- min_y: 6911090
- max_x: 1513410
- max_y: 6950810
These allow one to reconstruct a gridded representation of the data by following row-major order.
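Row-major order means the x coordinate varies fastest: the first nx values share the lowest y, the next nx values the next y, and so on. As a minimal sketch (assuming a DataFrame df and a metadata dictionary loaded as shown further below; the column selection is hypothetical), one column can be reshaped into a 2-D grid:
values = df[column_name].to_numpy()  # one full column of nx * ny values; column_name is hypothetical
grid = values.reshape(metadata['ny'], metadata['nx'])  # grid[j, i] sits at (min_x + i*dx, min_y + j*dy)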
Each column of a parquet file is named as a tuple of (case, instrument, sector, unit, variable); see the selection sketch after this list.
- case: any of Neutral, Stable, Combined, NeutralBlended, StableBlended, CombinedBlended
- where Blended cases represent weighted averages of all masts (so any mast name in these .parquet files is meaningless)
- instrument: MeasurementInstrumentName_HubHeight
- sector: the center of the direction sector that this column represents; All-dir for the overall, all-directional value
- unit: the physical unit that the column of data represents
- variable: speed, turbulence_intensity, ...
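Since the five levels make up the column names, the returned table lends itself to a pandas MultiIndex on the columns. A minimal sketch of addressing columns, assuming df (loaded further below) carries such a MultiIndex with the level names listed above (an assumption about parqflow's output; the instrument name is hypothetical):
column = df[('Neutral', 'Mast1_110', 270.0, 'm_per_s', 'speed')]  # one column by its full tuple
speeds = df.xs('speed', axis=1, level='variable')  # every speed column, across the other levels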
from pathlib import Path
import pandas as pd
import numpy as np
# parqflow can be installed with pip install -e . from the folder containing setup.py
import parqflow as pf
Given a path to a parquet file, pf.read_metadata returns the contents of b'cfd' as a dictionary (demonstrated further below).
# this example only works with an earlier version of parqflow, see 'levels' above for supported levels
folder = Path(r'C:\cfd_file_format_proposal_sample_files')
all_possible_filters = {
'project': ['Project'],
'inlet': [
4.0, 11.0, 17.0, 21.0, 26.0, 30.0, 34.0, 39.0, 43.0,
47.0, 51.0, 56.0, 60.0, 64.0, 69.0, 73.0, 78.0, 84.0,
90.0, 96.0, 102.0, 110.0, 120.0, 130.0, 143.0, 158.0,
180.0, 210.0, 240.0, 270.0, 300.0, 330.0, 349.0, 356.0
],
'hub_height': [110, 115, 120, 125],
'unit': ['m_per_s', 'degrees', 'percent', 'm'],
'variable': ['horizontal_wind_speed', 'horizontal_wind_direction', 'wind_turbulence_intensity', 'upflow', 'elevation_at_hub_height'],
}
filters = {
# 'inlet': [0],
'hub_height': [110, 115],
'variable': ['horizontal_wind_speed'],
}
pf.filter_dataset(folder, filters)
Given a folder and a filters dictionary, pf.filter_dataset returns every matching column as a pandas.DataFrame:
df = pf.filter_dataset(folder, filters)
df
path = folder / 'SampleProject_000_isoheightSurface_110m.parquet'
metadata = pf.read_metadata(path)
min_x, min_y = metadata['min_x'], metadata['min_y']
max_x, max_y = metadata['max_x'], metadata['max_y']
Given a list of points of interest (POIs, e.g. turbine coordinates), pf.filter_grid_points returns, for each POI, the closest grid row. For a small grid of 100,000 grid points and 10 million randomly scattered turbines this search takes around 4 seconds.
pois = pd.DataFrame({
'x': np.random.uniform(min_x, max_x, 10_000_000),  # 10 million turbines!
'y': np.random.uniform(min_y, max_y, 10_000_000),
})
pois
pf.filter_grid_points(pois, df, metadata)
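On a regular grid this lookup needs no spatial index, which is what makes it fast. A minimal sketch of one way to do it (an assumption, not necessarily how parqflow implements filter_grid_points): snap each POI to its nearest grid node and convert the node indices to a row number in row-major order.
dx, dy = metadata['dx'], metadata['dy']
nx, ny = metadata['nx'], metadata['ny']
# nearest grid node along each axis, clipped to stay inside the grid
ix = np.clip(np.rint((pois['x'] - min_x) / dx).astype(int), 0, nx - 1)
iy = np.clip(np.rint((pois['y'] - min_y) / dy).astype(int), 0, ny - 1)
rows = iy * nx + ix  # row-major row number of the closest grid point for each POI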
Adding eastings and northings
metadata = pf.read_metadata(path)
min_x, min_y = metadata['min_x'], metadata['min_y']
max_x, max_y = metadata['max_x'], metadata['max_y']
dx, dy = metadata['dx'], metadata['dy']
# rebuild the grid axes from the extents and spacing
xx = np.arange(min_x, max_x + dx, dx)
yy = np.arange(min_y, max_y + dy, dy)
# row-major order: x varies fastest within each constant-y row
tuples = [(x, y) for y in yy for x in xx]
location_df = pd.DataFrame(tuples, columns=['x', 'y'])
# attach the coordinates to the data and index by them
location_aware_df = pd.concat([df, location_df], axis=1).set_index(['x', 'y'])
location_aware_df.loc[(1508000.0, 6912100.0)]  # look up one grid point; note the tuple for the (x, y) MultiIndex