Add Apache ORC build sequence with a sample reader file #1373

Merged
yongtang merged 18 commits into tensorflow:master
Apr 28, 2021

Conversation

oliverhu
Contributor

Build step for #1372

…ct.. desperating

/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer  -fno-canonical-system-headers -Wno-builtin-macro-redefined -D__DATE__="redacted" -D__TIMESTAMP__="redacted" -D__TIME__="redacted" -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -g -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -O3 -DNDEBUG -fuse-ld=gold -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -pass-exit-codes -lstdc++ -lm -rdynamic tools/src/CMakeFiles/orc-statistics.dir/FileStatistics.cc.o -o orc-statistics  -Wl,-rpath,/usr/local/lib:  c++/src/liborc.a c++/libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a -lprotobuf -pthread c++/libs/thirdparty/zlib_ep-install/lib/libz.a c++/libs/thirdparty/snappy_ep-install/lib/libsnappy.a c++/libs/thirdparty/lz4_ep-install/lib/liblz4.a c++/libs/thirdparty/zstd_ep-install/lib/libzstd.a /usr/local/lib/libsasl2.so
@oliverhu
Contributor Author

@yongtang

@oliverhu
Contributor Author

Seems to be failing in CI 🤔 but passes locally

(p3) pi@pig:~/tf/io$ bazel build -s --verbose_failures --remote_cache=https://storage.googleapis.com/tensorflow-sigs-io --remote_upload_local_results=false //tensorflow_io/...
INFO: Invocation ID: d708e045-201c-4fab-8824-e240b24702a8
DEBUG: /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/rules_foreign_cc/workspace_definitions.bzl:6:6: `@rules_foreign_cc//:workspace_definitions.bzl` has been replaced by `@rules_foreign_cc//foreign_cc:repositories.bzl`. Please use the updated source location
WARNING: /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/local_config_tf/BUILD:2911:8: target 'libtensorflow_framework.so' is both a rule and a file; please choose another name for the rule
DEBUG: Rule 'libyuv' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1584999593 +0000"
DEBUG: Repository libyuv instantiated at:
  /home/pi/tf/io/WORKSPACE:135:19: in <toplevel>
Repository rule new_git_repository defined at:
  /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/bazel_tools/tools/build_defs/repo/git.bzl:186:37: in <toplevel>
DEBUG: Rule 'libgav1' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1591466645 -0700"
DEBUG: Repository libgav1 instantiated at:
  /home/pi/tf/io/WORKSPACE:1010:19: in <toplevel>
Repository rule new_git_repository defined at:
  /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/bazel_tools/tools/build_defs/repo/git.bzl:186:37: in <toplevel>
DEBUG: /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/bazel_gazelle/internal/go_repository.bzl:184:18: gazelle: gazelle: finding module path for import golang.org/x/sys/windows/svc/eventlog: exit status 1: can't load package: package golang.org/x/sys/windows/svc/eventlog: build constraints exclude all Go files in /home/pi/.cache/bazel/_bazel_pi/be6ac8eba0db45fb771509e53438aa5b/external/bazel_gazelle_go_repository_cache/pkg/mod/golang.org/x/sys@v0.0.0-20210420072515-93ed5bcd2bfe/windows/svc/eventlog
INFO: Analyzed 73 targets (0 packages loaded, 0 targets configured).
INFO: Found 73 targets...
INFO: Elapsed time: 0.846s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action

@yongtang
Member

I will take a look at my local env.

@oliverhu
Contributor Author

Thanks.. let me know.

@yongtang
Member

@oliverhu It looks like cmake+bazel is quite complex on macOS, and given that liborc depends on zstd, zlib, snappy and several other dependencies which already exist inside tensorflow_io's third_party libraries, it may not be the best choice to use cmake+bazel.

I have instead converted the liborc build into a bazel BUILD file. I think if you add the changes from:
3780b47

then the build will work.

@oliverhu
Contributor Author

Impressive, thanks! @yongtang gonna take a look at it today

@oliverhu
Contributor Author

@yongtang updated PR

@oliverhu
Contributor Author

I created a skeleton implementation PR (similar to the PCAP reader) in https://github.com/oliverhu/io/pull/1/files. It only works with the unit test asset so far, but it shows some progress if you are interested.

@yongtang
Member

@oliverhu Maybe you can move https://github.com/oliverhu/io/pull/1/files here? I think a PR with unit test covered would be good enough for the first initial PR.

@oliverhu
Contributor Author

@yongtang I was thinking of separating them into a few PRs: (1) add build scripts (2) add basic functionality with basic unit tests (similar to pcap) (3) add support similar to JSON IO (4) tutorials and documentation (5) other necessary features. What do you think?

@yongtang
Member

@oliverhu The plan sounds good to me. The only issue with the current PR is the file hello-time.cc, which I would rather not add. I think you can add liborc to the deps of python/ops/libtensorflow_io.so in the BUILD file. This will show that the liborc build works as the first step. The next PR can add dataset related cc files.

@oliverhu
Contributor Author

@yongtang sg, updated

@oliverhu
Contributor Author

CICD error is not related:

Error in download_and_extract: java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.bz2, http://www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.bz2] to /root/.cache/bazel/_bazel_root/a7ecd8237744645c5d189c197108d6d2/external/nasm/temp9141544444865095817/nasm-2.14.02.tar.bz2: connect timed out
ERROR: /root/.cache/bazel/_bazel_root/a7ecd8237744645c5d189c197108d6d2/external/libjpeg_turbo/BUILD.bazel:154:8: @libjpeg_turbo//:assembly depends on @nasm//:nasm in repository @nasm which failed to fetch. no such package '@nasm//': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.bz2, http://www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.bz2] to /root/.cache/bazel/_bazel_root/a7ecd8237744645c5d189c197108d6d2/external/nasm/temp9141544444865095817/nasm-2.14.02.tar.bz2: connect timed out

@yongtang mentioned this pull request Apr 26, 2021
@yongtang
Member

@oliverhu The CI error seems to be transient. Can you try bumping the commit to trigger it again?

@oliverhu
Contributor Author

Sure, triggered a build

@oliverhu
Contributor Author

@yongtang all tests passed except for a windows check 🤔 do we want to trigger the build again?

@yongtang
Member

@oliverhu The windows failure seems to be caused by different versions of protobuf (or grpc), as one is from tensorflow and another is from the orc dependency. There might be some discrepancies that need to be resolved.

@oliverhu
Contributor Author

I see.. seems related

2021-04-27T04:28:46.7567164Z ERROR: D:/a/io/io/tensorflow_io/core/BUILD:714:10: Linking of rule '//tensorflow_io/core:python/ops/libtensorflow_io.so' failed (Exit 1169): link.exe failed: error executing command 
2021-04-27T04:28:46.7568232Z command line(1): note: see previous definition of 'struct_stat'
2021-04-27T04:28:46.7570171Z   cd C:/users/runneradmin/_bazel_runneradmin/m4q24vrm/execroot/org_tensorflow_io
2021-04-27T04:28:46.7571062Z LINK : warning LNK4044: unrecognized option '/lm'; ignored
2021-04-27T04:28:46.7575175Z   SET LIB=C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\ATLMFC\lib\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\lib\x64;C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64
2021-04-27T04:28:46.7576843Z LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
2021-04-27T04:28:46.7588115Z     SET PATH=C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\\Extensions\Microsoft\IntelliCode\CLI;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.28.29910\bin\HostX64\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\VC\VCPackages;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\CommonExtensions\Microsoft\TestWindow;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\CommonExtensions\Microsoft\TeamFoundation\Team Explorer;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Team Tools\Performance Tools\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Team Tools\Performance Tools;C:\Program Files (x86)\Microsoft Visual Studio\Shared\Common\VSPerfCollectionTools\vs2019\\x64;C:\Program Files (x86)\Microsoft Visual Studio\Shared\Common\VSPerfCollectionTools\vs2019\;C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\x64\;C:\Program Files (x86)\HTML Help Workshop;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\CommonExtensions\Microsoft\FSharp\;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\devinit;C:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\;;C:\Windows\system32;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\Llvm\x64\bin;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin;C:\Program 
Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja;C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\VC\Linux\bin\ConnectionManagerExe
2021-04-27T04:28:46.7609037Z protobuf_lite.lib(common.obj) : error LNK2005: "public: __cdecl google::protobuf::internal::LogMessage::LogMessage(enum google::protobuf::LogLevel,char const *,int)" (??0LogMessage@internal@protobuf@google@@QEAA@W4LogLevel@23@PEBDH@Z) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7611532Z     SET ***
2021-04-27T04:28:46.7612903Z protobuf_lite.lib(common.obj) : error LNK2005: "public: __cdecl google::protobuf::internal::LogMessage::~LogMessage(void)" (??1LogMessage@internal@protobuf@google@@QEAA@XZ) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7614334Z     SET RUNFILES_MANIFEST_ONLY=1
2021-04-27T04:28:46.7614906Z     SET TEMP=C:\Users\RUNNER~1\AppData\Local\Temp
2021-04-27T04:28:46.7615777Z     SET TF_HEADER_DIR=C:/hostedtoolcache/windows/Python/3.8.9/x64/lib/site-packages/tensorflow/include
2021-04-27T04:28:46.7616878Z     SET TF_SHARED_LIBRARY_DIR=C:/hostedtoolcache/windows/Python/3.8.9/x64/lib/site-packages/tensorflow/python
2021-04-27T04:28:46.7617985Z     SET TF_SHARED_LIBRARY_NAME=_pywrap_tensorflow_internal.lib
2021-04-27T04:28:46.7618796Z     SET TMP=C:\Users\RUNNER~1\AppData\Local\Temp
2021-04-27T04:28:46.7620076Z   C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.28.29910/bin/HostX64/x64/link.exe @bazel-out/x64_windows-fastbuild/bin/tensorflow_io/core/python/ops/libtensorflow_io.so-2.params
2021-04-27T04:28:46.7621241Z Execution platform: @local_config_platform//:host
2021-04-27T04:28:46.7622842Z protobuf_lite.lib(common.obj) : error LNK2005: "public: class google::protobuf::internal::LogMessage & __cdecl google::protobuf::internal::LogMessage::operator<<(char const *)" (??6LogMessage@internal@protobuf@google@@QEAAAEAV0123@PEBD@Z) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7625461Z protobuf_lite.lib(common.obj) : error LNK2005: "public: void __cdecl google::protobuf::internal::LogFinisher::operator=(class google::protobuf::internal::LogMessage &)" (??4LogFinisher@internal@protobuf@google@@QEAAXAEAVLogMessage@123@@Z) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7628658Z protobuf_lite.lib(generated_message_util.obj) : error LNK2005: "class google::protobuf::internal::ExplicitlyConstructed<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > google::protobuf::internal::fixed_address_empty_string" (?fixed_address_empty_string@internal@protobuf@google@@3V?$ExplicitlyConstructed@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@123@A) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7634171Z protobuf_lite.lib(generated_enum_util.obj) : error LNK2005: "bool __cdecl google::protobuf::internal::InitializeEnumStrings(struct google::protobuf::internal::EnumEntry const *,int const *,unsigned __int64,class google::protobuf::internal::ExplicitlyConstructed<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > *)" (?InitializeEnumStrings@internal@protobuf@google@@YA_NPEBUEnumEntry@123@PEBH_KPEAV?$ExplicitlyConstructed@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@123@@Z) already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)
2021-04-27T04:28:46.7638436Z    Creating library bazel-out/x64_windows-fastbuild/bin/tensorflow_io/core/python/ops/python/ops/libtensorflow_io.so.if.lib and object bazel-out/x64_windows-fastbuild/bin/tensorflow_io/core/python/ops/python/ops/libtensorflow_io.so.if.exp
2021-04-27T04:28:46.7640278Z bazel-out\x64_windows-fastbuild\bin\tensorflow_io\core\python\ops\libtensorflow_io.so : fatal error LNK1169: one or more multiply defined symbols found
2021-04-27T04:28:47.0865798Z INFO: Elapsed time: 1005.536s, Critical Path: 19.79s
2021-04-27T04:28:47.0868758Z INFO: 5149 processes: 4776 remote cache hit, 339 internal, 34 local.
2021-04-27T04:28:47.0869864Z FAILED: Build did NOT complete successfully
2021-04-27T04:28:47.1095442Z FAILED: Build did NOT complete successfully

How do you guys repro windows related issues or test the windows build locally?
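
One way to see which libraries clash in a log like the one above is to tally the LNK2005 lines by library pair. Below is a minimal Python sketch of that idea. The function name `summarize_duplicates` and the regular expression are assumptions fitted to the lines quoted in this thread, not a general MSVC log parser, and the sample symbols are shortened for readability.

```python
import re
from collections import Counter

# Assumed shape of an MSVC duplicate-symbol line, based on the log in this thread:
#   <lib>(<obj>) : error LNK2005: <symbol> already defined in <lib>(<member>)
LNK2005_LINE = re.compile(
    r"(?P<new_lib>[\w.\-]+)\((?P<obj>[^)]+)\) : error LNK2005: "
    r".* already defined in (?P<old_lib>[\w.\-]+)\("
)


def summarize_duplicates(log_text):
    """Count LNK2005 errors per (redefining library, original library) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LNK2005_LINE.search(line)
        if match:
            counts[(match.group("new_lib"), match.group("old_lib"))] += 1
    return counts


# Shortened sample lines in the same format as the CI log.
sample_log = "\n".join([
    '2021-04-27T04:28:46Z protobuf_lite.lib(common.obj) : error LNK2005: "sym_a" (?a@@Z) '
    "already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)",
    "2021-04-27T04:28:46Z     SET RUNFILES_MANIFEST_ONLY=1",
    '2021-04-27T04:28:46Z protobuf_lite.lib(generated_enum_util.obj) : error LNK2005: "sym_b" (?b@@Z) '
    "already defined in _pywrap_tensorflow_internal.lib(_pywrap_tensorflow_internal.pyd)",
])

for (new_lib, old_lib), n in summarize_duplicates(sample_log).items():
    print(f"{n} symbol(s) in {new_lib} already defined in {old_lib}")
```

Run against the full log, every LNK2005 entry pairs `protobuf_lite.lib` with `_pywrap_tensorflow_internal.lib`, which matches the observation that protobuf is being linked in twice.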

@oliverhu
Contributor Author

@yongtang I moved @liborc into a dummy orc_ops cc_library and it works now. Even though I don't quite understand how it works for windows with current setup 🤔
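
For readers following along, the arrangement described above might look roughly like the BUILD fragment below. This is a hypothetical sketch in Starlark (Bazel's Python dialect), not the merged change: only the names `orc_ops`, `@liborc` and `python/ops/libtensorflow_io.so` come from this thread, while the rule kinds, attributes and elided entries are assumptions.

```python
# Starlark (Bazel BUILD file). Hypothetical sketch, not the merged change.

# A wrapper library with no sources of its own; it only pulls liborc
# into the dependency graph.
cc_library(
    name = "orc_ops",
    deps = [
        "@liborc",
    ],
)

# The shared library depends on the wrapper instead of on @liborc directly.
# Assumed to be a cc_binary built as a shared object.
cc_binary(
    name = "python/ops/libtensorflow_io.so",
    linkshared = 1,
    deps = [
        ":orc_ops",
        # ... existing op libraries elided ...
    ],
)
```

Why this arrangement avoids the duplicate protobuf symbols on Windows is not established in this thread.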

@oliverhu
Contributor Author

@yongtang updated the next PR as well: https://github.com/oliverhu/io/pull/1/files#diff-d4fa822d9271aeb06aa7aea41f509ba0bcc00e3cbbec14b300671df2375df9beR41 , the functionality should be similar to JSON IO as well (the keras model test passes).

@yongtang
Member

@yongtang I moved @liborc into a dummy orc_ops cc_library and it works now. Even though I don't quite understand how it works for windows with current setup

I am not sure either, though since the build is successful on all platforms, I think we can merge it for now.

@yongtang merged commit 43e9be4 into tensorflow:master Apr 28, 2021