forked from apache/beam
sync with open source how #118
Draft

lesterhaynes wants to merge 5,682 commits into linkedin:li_trunk from apache:master
Conversation
* Add input boxes for required user inputs
* Remove unnecessary comments.

Co-authored-by: Claude <cvandermerwe@google.com>

Bumps org.sonarqube from 3.0 to 6.0.0.5145.

---
updated-dependencies:
- dependency-name: org.sonarqube
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
From #30507 (comment), try to use the default machine types for Flink with more memory.
* Enable Java SDK Distroless container image variant
* Add LANG environment and /usr/lib/locale
* Use examples tests instead

* Use --enable-component-gateway when creating the flink cluster
* Update flink_cluster.sh

* Add a new precommit to test the Flink container
* Changed trigger file for Flink container workflow
* updated the timeout
* only allow manual trigger to test
* fixed the PR check
* fixed the workflow checks

* More complete error message for StripErrorMetadata.
* Update sdks/python/apache_beam/yaml/yaml_mapping.py
* fix formatting, paren

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* Fixed the new Flink container precommit
* trigger it
* tried to trigger the workflow
* at least 2 workers

* Align SDK container version with pipeline submission env
* Disable ZetaSQL test on Java8
…another metric type.
* [Accenture Baltics] Case Study
* changed the date
* Fixed the captions
* Removed the captions
* removed the link

Change the cache timeout for bundle processors to be an hour for streaming pipelines instead of 1 minute (#33175)

* Change the cache timeout for bundle processors to be an hour for streaming pipelines instead of 1 minute. Use a hidden option so that it can be controlled further if desired.
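The cache-timeout change above is essentially a time-to-live policy on cached bundle processors. A minimal sketch of that idea, assuming a plain dict backend and an injectable clock (`TTLCache` and its method names are illustrative, not Beam's actual implementation):

```python
import time


class TTLCache:
    """Minimal sketch of a time-based cache with lazy expiry on access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if self._clock() > expiry:
            del self._entries[key]  # expired; evict lazily on access
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)
```

With a one-hour TTL, a processor cached for a streaming pipeline survives idle gaps that a one-minute TTL would evict across; exposing the TTL through an option (as the commit describes) lets operators tune it without a code change.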
* Fixed beam_PreCommit_Flink_Container.yml
* Update beam_PreCommit_Flink_Container.yml
* refactored the options
* added test type
* fixed the python gradle
* Added the python version
* Fixed the java test
* fixed java options
* fixed the options
* fixed the job name

* Enable caching in Python tests workflow

  As can be seen in [BuildBudget's demo](https://buildbudget.dev/demo/workflow/2083803/), this workflow costs ~$2k/month. This change should reduce the time it takes and eventually its cost by using standard caching techniques.

* fixup! Enable caching in Python tests workflow
* removed unnecessary input
* fixup! removed unnecessary input

* Refactored to separate authentication and session settings, and allow inheritance and overriding of SessionService
* Improve methods' javadoc

Bumps com.gradle.develocity from 3.17.6 to 3.19.

---
updated-dependencies:
- dependency-name: com.gradle.develocity
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Clean up post py38 TODOs
Bumps [yapf](https://github.com/google/yapf) from 0.29.0 to 0.43.0. - [Changelog](https://github.com/google/yapf/blob/main/CHANGELOG.md) - [Commits](google/yapf@v0.29.0...v0.43.0) --- updated-dependencies: - dependency-name: yapf dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…in types over typing variants (#33427)

* Refactor: Add convert_collections_from_typing()

  Added to convert typing module collections to built-ins, effectively reversing the existing builtin-to-typing conversion. Includes comprehensive unit tests to verify the correct conversion of various typing collections to their builtin counterparts, including nested structures and type variables.

* Flip paradigm for convert_to_beam_type to be primitive- and collections-centric
* update comment
* fix clobbered import from merge
* formatting
* fix imports
* address comments
* remove extra import artifacts from merge

Co-authored-by: labs-code-app[bot] <161369871+labs-code-app[bot]@users.noreply.github.com>
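The typing-to-builtin conversion described in that commit can be sketched as a small recursive function. This is an illustrative standalone version, not the Beam helper itself, and it assumes Python 3.9+ where builtin generics like `list[int]` are subscriptable:

```python
import typing


def convert_collections_from_typing(typ):
    """Convert e.g. typing.List[int] to list[int], recursing into type args.

    typing.get_origin() already reports the builtin origin (list, dict, ...)
    for typing-module aliases, so we rebuild the annotation on that origin.
    Anything without an origin (plain classes, TypeVars) is returned as-is.
    """
    origin = typing.get_origin(typ)
    args = typing.get_args(typ)
    if origin is not None and args:
        converted = tuple(convert_collections_from_typing(a) for a in args)
        if origin in (list, dict, set, tuple, frozenset):
            # dict[(str, int)] is the same as dict[str, int]
            return origin[converted if len(converted) > 1 else converted[0]]
    return typ
```

Nested annotations such as `typing.List[typing.Dict[str, int]]` come out as `list[dict[str, int]]` because the recursion converts the arguments before rebuilding the outer type.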
* Documents the connectors supported via the Managed API
* Corrects a row and adjusts title text

* Pin protobuf for older hadoop tests
* trigger postcommit

* Add Iceberg support for name-based mapping schema
* Add nullable annotation
* Add nested field
* iceberg-gcp already as a runtimeOnly
* Trigger IT tests

…m writers (#33231)

* add dynamic dest test
* fix and add some tests
* add to changes.md
* fix whitespace
* trigger postcommits
* address comments

* cleanup FileIO resources
* trigger integration tests
* cleanup

* Pin protobuf 3 for debezium
* CHANGES

* Add BQMS catalog
* trigger integration tests
* build fix
* use shaded jar
* shadowClosure
* use global timeout for tests
* define version in BeamModulePlugin
* address comments
…3176 (#33545)

* Add printing to k8s script
* Temporarily skip bad namespace
* Update stale_k8s_workload_cleaner.sh
* Add context
* Update for all singlestore io instances

* fix and update tests
* dont mention df yet
* add PR link
* whitespace

…g happens (#33384)

* Add check_splittability in filesystems.

  For GCS, we determine the splittability based on whether the file meets decompressive transcoding criteria. When decompressive transcoding occurs, the size returned from metadata (gzip file size) does not match the size of the content returned (original data). In this case, we set the source to unsplittable to ensure all its content is read.

* Rename the function and remove an unused one.
* Revert the previous changes and use raw_download to retrieve raw data in the gcs client lib
* Raise exception for doubly compressed gcs object. Apply yapf.
* Add some comments.
* Add integration tests and fix unit test failure.
* Fix lints
* More lints
* Add a one-line description to CHANGES.md
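The splittability rule in that commit reduces to one check on object metadata: a GCS object stored with `Content-Encoding: gzip` may be transparently decompressed on download (decompressive transcoding), so the size in metadata no longer matches the bytes actually served and the file cannot be split by byte offset. A hedged sketch of the check (the metadata dict shape is an assumption; Beam's real logic lives in its filesystem layer):

```python
def is_splittable(metadata):
    """Return False when GCS decompressive transcoding may occur.

    `metadata` is a simplified dict of GCS object metadata. An object stored
    with Content-Encoding: gzip may be served decompressed, so its reported
    (compressed) size is unreliable for computing split points.
    """
    return metadata.get("contentEncoding") != "gzip"
```

A source that fails this check should be read as a single unsplittable range, which is exactly what the commit describes.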
…le StreamingGetWorkResponseChunk (#33512)
* Add retry logic to each batch method of the GCS IO

  A transient error might occur when writing a lot of shards to GCS, and right now the GCS IO does not have any retry logic in place: https://github.com/apache/beam/blob/a06454a2/sdks/python/apache_beam/io/gcp/gcsio.py#L269

  It means that in such cases the entire bundle of elements fails, and then Beam itself will attempt to retry the entire bundle, and will fail the job if it exceeds the number of retries. This change adds new logic to retry only failed requests, and uses the typical exponential backoff strategy.

  Note that this change accesses a private method (`_predicate`) of the retry object, which we could avoid by basically copying the logic over here. But existing code already accesses the `_responses` property, so maybe it's not a big deal: https://github.com/apache/beam/blob/b4c3a4ff/sdks/python/apache_beam/io/gcp/gcsio.py#L297

  Existing (unresolved) issue in the GCS client library: googleapis/python-storage#1277

* Catch correct exception type in `_batch_with_retry`

  The `RetryError` would always be raised since the retry decorator would catch all HTTP-related exceptions.

* Update changelog with GCSIO retry logic fix
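The retry-only-failed-requests strategy described above can be sketched as follows. All names are illustrative (this is not the actual `gcsio` code), and `send_batch` stands in for whatever callable issues one batch round-trip and returns per-request results:

```python
import random
import time

# Hypothetical set of HTTP status codes treated as transient.
TRANSIENT_CODES = {429, 500, 502, 503, 504}


def batch_with_retry(requests, send_batch, max_retries=5, base_delay=1.0):
    """Send a batch, retrying only transiently failed requests.

    `send_batch(batch)` must return a list of (status_code, response)
    pairs parallel to `batch`. Successful (or permanently failed)
    requests are recorded once; only transient failures are resent.
    """
    pending = list(requests)
    results = {}
    for attempt in range(max_retries + 1):
        responses = send_batch(pending)
        failed = []
        for req, (code, resp) in zip(pending, responses):
            if code in TRANSIENT_CODES and attempt < max_retries:
                failed.append(req)  # resend just this request
            else:
                results[id(req)] = (code, resp)
        if not failed:
            break
        # exponential backoff with jitter before retrying the failed subset
        time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
        pending = failed
    return [results[id(r)] for r in requests]
```

Compared with letting the runner retry the whole bundle, this keeps one flaky shard from forcing every other element in the bundle to be rewritten.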
* It's internal and test only code, fine to change method name
* Update republish_released_docker_containers.yml
* Set up gcloud
Please add a meaningful description for your change here.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
- Update CHANGES.md with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.