
sync with open source how #118

Draft · wants to merge 5,682 commits into base: li_trunk

Conversation

lesterhaynes

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

claudevdm and others added 21 commits November 22, 2024 12:54
* Add input boxes for required user inputs

* Remove unnecessary comments.

---------

Co-authored-by: Claude <cvandermerwe@google.com>
Bumps org.sonarqube from 3.0 to 6.0.0.5145.

---
updated-dependencies:
- dependency-name: org.sonarqube
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
From #30507 (comment), try to use the default machine types for Flink with more memory.
* Enable Java SDK Distroless container image variant

* Add LANG environment and /usr/lib/locale

* Use examples tests instead
* Use --enable-component-gateway when creating the flink cluster

* Update flink_cluster.sh
* Add a new precommit to test Flink container

* Changed trigger file for Flink container workflow

* updated the timeout

* only allow manual trigger to test

* fixed the PR check

* fixed the workflow checks
* More complete error message for StripErrorMetadata.

* Update sdks/python/apache_beam/yaml/yaml_mapping.py

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* fix formatting, paren

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
* Update website for 2.61.0 release

* Update CHANGES.md

* Update beam-2.61.0.md

* Update downloads.md

* Update CHANGES.md

* Update beam-2.61.0.md
* Fixed the new flink container precommit

* trigger it

* tried to trigger the workflow

* at least 2 workers

* trigger it
* Align SDK container version with pipeline submission env

* Disable ZetaSQL test on Java8
* [Accenture Baltics] Case Study

* changed the date

* changed the date

* Fixed the captions

* Removed the captions

* removed the link
… an hour for streaming pipelines instead of 1 minute. (#33175)

* Change the cache timeout for bundle processors to be an hour for streaming pipelines instead of 1 minute.  Use a hidden option so that it can be controlled further if desired.
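The policy this commit describes can be sketched as follows. This is a hypothetical illustration, not Beam's actual implementation; the helper and the option name `bundle_processor_cache_timeout_sec` are assumptions standing in for the "hidden option" the commit mentions:

```python
def bundle_processor_cache_timeout_sec(options, is_streaming):
    """Pick the idle-cache timeout for bundle processors.

    Streaming pipelines keep idle bundle processors cached for an
    hour; batch pipelines for one minute. A hidden override option
    (assumed name) takes precedence when set.
    """
    override = options.get('bundle_processor_cache_timeout_sec')
    if override is not None:
        return override
    return 3600 if is_streaming else 60
```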
* Fixed beam_PreCommit_Flink_Container.yml

* Update beam_PreCommit_Flink_Container.yml

* Update beam_PreCommit_Flink_Container.yml

* refactored the options

* added test type

* fixed the python gradle

* Added the python version

* Fixed the java test

* fixed java options

* fixed options

* fixed the options

* fixed the job name
eduramirezh and others added 30 commits January 7, 2025 09:18
* Enable caching in Python tests workflow

As can be seen in [BuildBudget's demo](https://buildbudget.dev/demo/workflow/2083803/), this
workflow costs ~$2k/month.

This change should reduce the time it takes and eventually its cost by using standard
caching techniques.

* fixup! Enable caching in Python tests workflow

* removed unnecessary input

* fixup! removed unnecessary input
* Refactored to separate authentication and session settings, and allow inheritance and overriding of SessionService

* Improve methods' javadoc
Bumps com.gradle.develocity from 3.17.6 to 3.19.

---
updated-dependencies:
- dependency-name: com.gradle.develocity
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Clean up post py38 TODOs
Bumps [yapf](https://github.com/google/yapf) from 0.29.0 to 0.43.0.
- [Changelog](https://github.com/google/yapf/blob/main/CHANGELOG.md)
- [Commits](google/yapf@v0.29.0...v0.43.0)

---
updated-dependencies:
- dependency-name: yapf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…in types over typing variants (#33427)

* Refactor: Add convert_collections_from_typing()

Added `convert_collections_from_typing()` to convert typing module collections to built-ins. This function effectively reverses the operation of the corresponding typing-conversion function. Includes comprehensive unit tests to verify the correct conversion of various typing collections to their builtin counterparts, including nested structures and type variables.

* Flip paradigm for convert_to_beam_type to be primitive and collections-centric

* update comment

* fix clobbered import from merge

* formatting

* fix imports

* address comments

* remove extra import artifacts from merge

---------

Co-authored-by: labs-code-app[bot] <161369871+labs-code-app[bot]@users.noreply.github.com>
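A rough sketch of the conversion this commit describes, assuming Python 3.9+ and only the standard `typing.get_origin`/`typing.get_args` helpers. It illustrates the idea; it is not the actual Beam implementation:

```python
import typing


def convert_collections_from_typing(typ):
    """Recursively convert typing aliases such as typing.List[int]
    into their builtin equivalents such as list[int]."""
    origin = typing.get_origin(typ)
    if origin is None:
        # Plain types (int, str, TypeVar, ...) pass through unchanged.
        return typ
    args = tuple(
        convert_collections_from_typing(a) for a in typing.get_args(typ))
    # get_origin() already returns the builtin container (list, dict, ...),
    # so re-subscripting it yields the builtin generic alias.
    return origin[args] if args else origin
```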
* Documents the connectors supported via the Managed API

* Corrects a row and adjusts title text
* Pin protobuf for older hadoop tests

* trigger postcommit
* Add Iceberg support for name-based mapping schema

* Add nullable annotation

* Add nested field

* iceberg-gcp already as a runtimeOnly

* Trigger IT tests
…m writers (#33231)

* add dynamic dest test

* fix and add some tests

* add to changes.md

* fix whitespace

* trigger postcommits

* address comments
* cleanup FileIO resources

* trigger integration tests

* cleanup
* Pin protobuf 3 for debezium

* CHANGES
* Add BQMS catalog

* trigger integration tests

* build fix

* use shaded jar

* shadowClosure

* use global timeout for tests

* define version in BeamModulePlugin

* address comments
…3176 (#33545)

* Add printing to k8s script

* Temporarily skip bad namespace

* Update stale_k8s_workload_cleaner.sh

* Add context

* Update for all singlestore io instances
* fix and update tests

* dont mention df yet

* add PR link

* whitespace
…g happens (#33384)

* Add check_splittability in filesystems.

For GCS, we determine the splittability based on whether the file
meets decompressive transcoding criteria.

When decompressive transcoding occurs, the size returned from
metadata (gzip file size) does not match the size of the content
returned (original data). In this case, we set the source to
unsplittable to ensure all its content is read.

* Rename the function and remove unused one.

* Revert the previous changes and use raw_download to retrieve raw data in gcs client lib

* Raise exception for doubly compressed gcs object. Apply yapf.

* Add some comments.

* Add integration tests and fix unit test failure.

* Fix lints

* More lints

* Add a one-line description to CHANGES.md
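The splittability rule the commit message describes can be sketched like this. The function name and metadata key are illustrative assumptions, not the actual `filesystems` API:

```python
def is_splittable(gcs_metadata):
    """Decide whether a GCS-backed source may be split.

    An object stored with Content-Encoding: gzip is subject to
    decompressive transcoding: GCS serves the decompressed bytes, so
    the size in the object metadata (the compressed size) disagrees
    with the number of bytes actually read. Such a source must be
    read whole, i.e. treated as unsplittable.
    """
    return gcs_metadata.get('contentEncoding') != 'gzip'
```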
* Add retry logic to each batch method of the GCS IO

A transient error might occur when writing a lot of shards to GCS, and right now
the GCS IO does not have any retry logic in place:

https://github.com/apache/beam/blob/a06454a2/sdks/python/apache_beam/io/gcp/gcsio.py#L269

It means that in such cases the entire bundle of elements fails, and then Beam
itself will attempt to retry the entire bundle, and will fail the job if it
exceeds the number of retries.

This change adds new logic to retry only failed requests, and uses the typical
exponential backoff strategy.

Note that this change accesses a private method (`_predicate`) of the retry
object, which we could avoid by basically copying the logic over here. But
existing code already accesses `_responses` property so maybe it's not a big
deal.

https://github.com/apache/beam/blob/b4c3a4ff/sdks/python/apache_beam/io/gcp/gcsio.py#L297

Existing (unresolved) issue in the GCS client library:

googleapis/python-storage#1277

* Catch correct exception type in `_batch_with_retry`

The `RetryError` would be always raised since the retry decorator would catch
all HTTP-related exceptions.

* Update changelog with GCSIO retry logic fix
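The strategy described above (retry only the failed requests of a batch, with exponential backoff) can be sketched as follows. This is a hedged illustration under assumed interfaces, not the actual GCS IO code: `send_batch` and the 5xx-means-transient rule are stand-ins:

```python
import random
import time


def batch_with_retry(requests, send_batch, max_attempts=5, base_delay=1.0):
    """Resend only the failed requests of a batch, backing off
    exponentially (with jitter) between attempts.

    ``send_batch`` is assumed to return one response per request, in
    order; responses with a 5xx status are treated as transient
    failures worth retrying.
    """
    pending = list(requests)
    for attempt in range(max_attempts):
        responses = send_batch(pending)
        # Keep only the requests whose responses indicate a
        # transient server-side failure.
        pending = [req for req, resp in zip(pending, responses)
                   if resp.status_code >= 500]
        if not pending:
            return
        # Exponential backoff with jitter before the next attempt.
        time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
    raise RuntimeError(
        f'{len(pending)} requests still failing after {max_attempts} attempts')
```

Retrying only the failed subset avoids re-sending work that already succeeded, which is what distinguishes this from Beam's default behavior of retrying the whole bundle.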
* It's internal and test-only code, so it's fine to change the method name
* Update republish_released_docker_containers.yml

* Set up gcloud