
sync with open source how #118

Draft · wants to merge 5,682 commits into base: li_trunk

Conversation

lesterhaynes

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

claudevdm and others added 21 commits November 22, 2024 12:54
* Add input boxes for required user inputs

* Remove unnecessary comments.

---------

Co-authored-by: Claude <cvandermerwe@google.com>
Bumps org.sonarqube from 3.0 to 6.0.0.5145.

---
updated-dependencies:
- dependency-name: org.sonarqube
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
From #30507 (comment), try to use the default machine types for Flink with more memory.
* Enable Java SDK Distroless container image variant

* Add LANG environment and /usr/lib/locale

* Use examples tests instead
* Use --enable-component-gateway when creating the flink cluster

* Update flink_cluster.sh
* Add a new precommit to test Flink container

* Changed trigger file for Flink container workflow

* updated the timeout

* only allow manual trigger to test

* fixed the PR check

* fixed the workflow checks
* More complete error message for StripErrorMetadata.

* Update sdks/python/apache_beam/yaml/yaml_mapping.py

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* fix formatting, paren

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
* Update website for 2.61.0 release

* Update CHANGES.md

* Update beam-2.61.0.md

* Update downloads.md

* Update CHANGES.md

* Update beam-2.61.0.md
* Fixed the new flink container precommit

* trigger it

* tried to trigger the workflow

* at least 2 workers

* trigger it
* Align SDK container version with pipeline submission env

* Disable ZetaSQL test on Java8
* [Accenture Baltics] Case Study

* changed the date

* changed the date

* Fixed the captions

* Removed the captions

* removed the link
… an hour for streaming pipelines instead of 1 minute. (#33175)

* Change the cache timeout for bundle processors to be an hour for streaming pipelines instead of 1 minute.  Use a hidden option so that it can be controlled further if desired.
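The policy this commit describes can be sketched as follows. This is a hypothetical illustration, not Beam's actual implementation; the helper and the option name `bundle_processor_cache_timeout_sec` are assumptions standing in for the "hidden option" the commit mentions:

```python
def bundle_processor_cache_timeout_sec(options, is_streaming):
    """Pick the idle-cache timeout for bundle processors.

    Streaming pipelines keep idle bundle processors cached for an
    hour; batch pipelines for one minute. A hidden override option
    (assumed name) takes precedence when set.
    """
    override = options.get('bundle_processor_cache_timeout_sec')
    if override is not None:
        return override
    return 3600 if is_streaming else 60
```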
* Fixed beam_PreCommit_Flink_Container.yml

* Update beam_PreCommit_Flink_Container.yml

* Update beam_PreCommit_Flink_Container.yml

* refactored the options

* added test type

* fixed the python gradle

* Added the python version

* Fixed the java test

* fixed java options

* fixed options

* fixed the options

* fixed the job name
eduramirezh and others added 30 commits January 7, 2025 09:18
* Enable caching in Python tests workflow

As can be seen in [BuildBudget's demo](https://buildbudget.dev/demo/workflow/2083803/), this
workflow costs ~$2k/month.

This change should reduce the time it takes and eventually its cost by using standard
caching techniques.

* fixup! Enable caching in Python tests workflow

* removed unnecessary input

* fixup! removed unnecessary input
* Refactored to separate authentication and session settings, and allow inheritance and overriding of SessionService

* Improve methods' javadoc
Bumps com.gradle.develocity from 3.17.6 to 3.19.

---
updated-dependencies:
- dependency-name: com.gradle.develocity
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Clean up post py38 TODOs
Bumps [yapf](https://github.com/google/yapf) from 0.29.0 to 0.43.0.
- [Changelog](https://github.com/google/yapf/blob/main/CHANGELOG.md)
- [Commits](google/yapf@v0.29.0...v0.43.0)

---
updated-dependencies:
- dependency-name: yapf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…in types over typing variants (#33427)

* Refactor: Add convert_collections_from_typing()

Added `convert_collections_from_typing()` to convert typing module collections to built-ins. This function effectively reverses the operation of the corresponding typing-conversion function. Includes comprehensive unit tests to verify the correct conversion of various typing collections to their builtin counterparts, including nested structures and type variables.

* Flip paradigm for convert_to_beam_type to be primitive and collections-centric

* update comment

* fix clobbered import from merge

* formatting

* fix imports

* address comments

* remove extra import artifacts from merge

---------

Co-authored-by: labs-code-app[bot] <161369871+labs-code-app[bot]@users.noreply.github.com>
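A rough sketch of the conversion this commit describes, assuming Python 3.9+ and only the standard `typing.get_origin`/`typing.get_args` helpers. It illustrates the idea; it is not the actual Beam implementation:

```python
import typing


def convert_collections_from_typing(typ):
    """Recursively convert typing aliases such as typing.List[int]
    into their builtin equivalents such as list[int]."""
    origin = typing.get_origin(typ)
    if origin is None:
        # Plain types (int, str, TypeVar, ...) pass through unchanged.
        return typ
    args = tuple(
        convert_collections_from_typing(a) for a in typing.get_args(typ))
    # get_origin() already returns the builtin container (list, dict, ...),
    # so re-subscripting it yields the builtin generic alias.
    return origin[args] if args else origin
```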
* Documents the connectors supported via the Managed API

* Corrects a row and adjusts title text
* Pin protobuf for older hadoop tests

* trigger postcommit
* Add Iceberg support for name-based mapping schema

* Add nullable annotation

* Add nested field

* iceberg-gcp already as a runtimeOnly

* Trigger IT tests
…m writers (#33231)

* add dynamic dest test

* fix and add some tests

* add to changes.md

* fix whitespace

* trigger postcommits

* address comments
* cleanup FileIO resources

* trigger integration tests

* cleanup
* Pin protobuf 3 for debezium

* CHANGES
* Add BQMS catalog

* trigger integration tests

* build fix

* use shaded jar

* shadowClosure

* use global timeout for tests

* define version in BeamModulePlugin

* address comments
…3176 (#33545)

* Add printing to k8s script

* Temporarily skip bad namespace

* Update stale_k8s_workload_cleaner.sh

* Add context

* Update for all singlestore io instances
* fix and update tests

* dont mention df yet

* add PR link

* whitespace
…g happens (#33384)

* Add check_splittability in filesystems.

For GCS, we determine the splittability based on whether the file
meets decompressive transcoding criteria.

When decompressive transcoding occurs, the size returned from
metadata (gzip file size) does not match the size of the content
returned (original data). In this case, we set the source to
unsplittable to ensure all its content is read.

* Rename the function and remove unused one.

* Revert the previous changes and use raw_download to retrieve raw data in gcs client lib

* Raise exception for doubly compressed gcs object. Apply yapf.

* Add some comments.

* Add integration tests and fix unit test failure.

* Fix lints

* More lints

* Add a one-line description to CHANGES.md
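The splittability rule the commit message describes can be sketched like this. The function name and metadata key are illustrative assumptions, not the actual `filesystems` API:

```python
def is_splittable(gcs_metadata):
    """Decide whether a GCS-backed source may be split.

    An object stored with Content-Encoding: gzip is subject to
    decompressive transcoding: GCS serves the decompressed bytes, so
    the size in the object metadata (the compressed size) disagrees
    with the number of bytes actually read. Such a source must be
    read whole, i.e. treated as unsplittable.
    """
    return gcs_metadata.get('contentEncoding') != 'gzip'
```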
* Add retry logic to each batch method of the GCS IO

A transient error might occur when writing a lot of shards to GCS, and right now
the GCS IO does not have any retry logic in place:

https://github.com/apache/beam/blob/a06454a2/sdks/python/apache_beam/io/gcp/gcsio.py#L269

It means that in such cases the entire bundle of elements fails, and then Beam
itself will attempt to retry the entire bundle, and will fail the job if it
exceeds the number of retries.

This change adds new logic to retry only failed requests, and uses the typical
exponential backoff strategy.

Note that this change accesses a private method (`_predicate`) of the retry
object, which we could avoid by basically copying the logic over here. But
existing code already accesses `_responses` property so maybe it's not a big
deal.

https://github.com/apache/beam/blob/b4c3a4ff/sdks/python/apache_beam/io/gcp/gcsio.py#L297

Existing (unresolved) issue in the GCS client library:

googleapis/python-storage#1277

* Catch correct exception type in `_batch_with_retry`

The `RetryError` would be always raised since the retry decorator would catch
all HTTP-related exceptions.

* Update changelog with GCSIO retry logic fix
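The strategy described above (retry only the failed requests of a batch, with exponential backoff) can be sketched as follows. This is a hedged illustration under assumed interfaces, not the actual GCS IO code: `send_batch` and the 5xx-means-transient rule are stand-ins:

```python
import random
import time


def batch_with_retry(requests, send_batch, max_attempts=5, base_delay=1.0):
    """Resend only the failed requests of a batch, backing off
    exponentially (with jitter) between attempts.

    ``send_batch`` is assumed to return one response per request, in
    order; responses with a 5xx status are treated as transient
    failures worth retrying.
    """
    pending = list(requests)
    for attempt in range(max_attempts):
        responses = send_batch(pending)
        # Keep only the requests whose responses indicate a
        # transient server-side failure.
        pending = [req for req, resp in zip(pending, responses)
                   if resp.status_code >= 500]
        if not pending:
            return
        # Exponential backoff with jitter before the next attempt.
        time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise RuntimeError(
        f'{len(pending)} requests still failing after {max_attempts} attempts')
```

Retrying only the failed subset avoids re-sending work that already succeeded, which is what distinguishes this from Beam's default behavior of retrying the whole bundle.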
* It's internal and test-only code, so it's fine to change the method name
* Update republish_released_docker_containers.yml

* Set up gcloud