Transportation Year 2 Updates #35

Draft
nreinicke wants to merge 1 commit into dsgrid-project-IEF-Phase2-2025
Conversation

nreinicke

I'm very new to this data format, so I'm starting this as a draft to get feedback. I've attempted to update the schema to accommodate the new load data from the evi-grid-national-framework. The new raw data is hourly load profiles spanning one week for each month of the year. We've run these for four scenarios:

  • low_baseline
  • high_baseline
  • high_inefficient
  • breakthrough_baseline

The raw data is in parquet format on Kestrel at `/projects/evix/evbps/grid-team-deliverables/2025-01-24-IEF/aggregations/load-profiles/month-week-hour`. Here's a sample of what the data looks like:

```text
region   month  body_style         charge_location  plug_name  hour  energy_kwh  scenario                 state  year
str      u8     enum               enum             cat        i32   f64         str                      str    i64
"08083"  1      "ldv-light-truck"  "enroute"        "DC150"    0     0.0         "breakthrough_baseline"  "CO"   2025
"08083"  8      "ldv-light-truck"  "public"         "L2"       0     14.131195   "breakthrough_baseline"  "CO"   2025
"08055"  9      "ldv-light-truck"  "public"         "L2"       0     0.0         "breakthrough_baseline"  "CO"   2025
"08057"  11     "ldv-car"          "home"           "L2"       0     0.0         "breakthrough_baseline"  "CO"   2025
```

Some questions:

  • How much post-processing should we do on our side? For example, it looks like the current transportation config names the load metrics like `electricity_ev_ldv_work_l2`. Should we run our data through a script that maps our format into that format, or should that be something we modify in the config so that dsgrid can handle it?
  • In our current pipeline, we aggregate BEV and PHEV light-duty vehicles into a larger category like `ldv-car`. Do we need to retain the distinction between BEV and PHEV for this analysis?

@nreinicke nreinicke marked this pull request as draft January 31, 2025 22:14
@nreinicke
Author

Tagging @ahcyip @elainethale @daniel-thom for any feedback. Sorry if I've totally butchered the existing schema.

@ahcyip
Contributor

ahcyip commented Feb 5, 2025

  • How much post-processing should we do on our side? For example, it looks like the current transportation config names the load metrics like `electricity_ev_ldv_work_l2`. Should we run our data through a script that maps our format into that format, or should that be something we modify in the config so that dsgrid can handle it?

I think the idea is that the data on our side should be as raw and detailed as possible and the config should be modified to handle the raw output.

  • In our current pipeline, we aggregate BEV and PHEV light-duty vehicles into a larger category like `ldv-car`. Do we need to retain the distinction between BEV and PHEV for this analysis?

@bborlaug should make the call on whether our data should keep BEV separate from PHEV. On one hand, the data will be very different (the magnitude, timing, and location of load per BEV vs. per PHEV differ substantially); on the other hand, if our pipeline is already aggregating, I don't know whether we will be checking the results with distinct BEV and PHEV categories or doing any analysis that uses the distinction, so we may not need to go "backwards" and keep the distinction for dsgrid.

@ahcyip
Contributor

ahcyip commented Feb 5, 2025

Regarding the date format, you may have to change from hour 0-167 to day_of_week 0-6 (zero-based, starting on Monday: Mon -> 0, Tue -> 1, ...) crossed with hour 0-23 (zero-based, starting at midnight) instead.
https://dsgrid.github.io/dsgrid/reference/dataset_formats.html
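
Something like this might work as a starting point (just a sketch; it assumes the hour column runs 0-167 within each representative week, starting Monday at midnight, and that the files can be read directly with polars):

```python
import polars as pl

# Sketch: split a 0-167 week-hour index into day_of_week and hour-of-day.
path = "/projects/evix/evbps/grid-team-deliverables/2025-01-24-IEF/aggregations/load-profiles/month-week-hour"
df = pl.read_parquet(f"{path}/**/*.parquet")

df = df.with_columns(
    (pl.col("hour") // 24).alias("day_of_week"),  # 0 = Monday ... 6 = Sunday
    (pl.col("hour") % 24).alias("hour"),          # 0-23, starting at midnight
)
```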

P.S. @daniel-thom helped me with the dsgrid software last time, but @nreinicke is a software pro, so Nick could probably handle everything discussed in https://dsgrid.github.io/dsgrid/tutorials/create_and_submit_dataset.html (if it is up to date). Also, @nreinicke sorry I forgot to pass this dsgrid documentation to you earlier - this may have covered a lot of what we chatted about.

@daniel-thom
Contributor

Regarding the date format, you may have to change from hour 0-167 to day_of_week 0-6 (zero-based, starting on Monday: Mon -> 0, Tue -> 1, ...) crossed with hour 0-23 (zero-based, starting at midnight) instead. https://dsgrid.github.io/dsgrid/reference/dataset_formats.html


The data tables are already in a very good format for dsgrid. Here are the minor changes that need to be made:

  • Converting to day_of_week as discussed by @ahcyip would be very helpful because we already have support for that. I'll emphasize that while we are flexible on this point, we would prefer consistency with prior formats so that it's less work.
  • dsgrid currently requires specific column names: (1) columns for the dimension types scenario, sector, subsector, metric, model_year, weather_year, and geography; (2) a value column called `value`; (3) the time columns can be whatever you want. We might need some discussion here.

I'd be happy to help with the post-processing to convert to a dsgrid format. This would be a simple Spark query.
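
For illustration, that query might look roughly like the following (the sector label, metric naming, and weather_year choice here are placeholders and would need to match the project's dimension config):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical mapping from the raw columns to dsgrid's required column names.
raw = spark.read.parquet(
    "/projects/evix/evbps/grid-team-deliverables/2025-01-24-IEF/"
    "aggregations/load-profiles/month-week-hour"
)

dsgrid_table = raw.select(
    F.col("scenario"),
    F.lit("transportation").alias("sector"),               # assumed sector label
    F.col("body_style").alias("subsector"),
    F.concat_ws(                                            # assumed metric naming
        "_", F.lit("electricity_ev"), F.col("charge_location"), F.col("plug_name")
    ).alias("metric"),
    F.col("year").alias("model_year"),
    F.col("year").alias("weather_year"),                    # assumption: same as model_year
    F.col("region").alias("geography"),
    F.col("month"),
    F.floor(F.col("hour") / 24).cast("int").alias("day_of_week"),  # if hour is 0-167
    (F.col("hour") % 24).alias("hour"),                     # hour of day, 0-23
    F.col("energy_kwh").alias("value"),                     # kWh
)
```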
