Skip to content

Commit

Permalink
Simplify build by creating a fake extension (#34)
Browse files Browse the repository at this point in the history
Summary:
This PR simplifies our build setup from `setup.py`'s side: instead of creating one `Extension` object per `.so` file that we build, we declare only *one* fake Extension object and let CMake do all the work. This PR addresses https://www.internalfb.com/diff/D58585136?dst_version_fbid=386542291075917&transaction_fbid=3369755276656440.

Pros:

- We don't have to keep the .so files name in sync between CMake and setup.py. In D58585136 that proved to be challenging and brittle.
- The CMake buid is called only once. This is simpler to debug, less surprising, and slightly faster
- We can unblock D58585136

Cons:

- We have to handle the "move extensions from their temp build dir back into where they're expected" logic ourselves - see comments for clarifications.

We had a non-standard setup before, now we have another non-standard setup. Overall, I think we gained a little bit of simplicity with that setup, even though that made me write a lot of  comments. And that will help us unblock D58585136.

Pull Request resolved: #34

Reviewed By: ahmadsharif1

Differential Revision: D58646830

Pulled By: NicolasHug

fbshipit-source-id: fac238ad78931ee923ba3af7d5f6be1d1930bd36
  • Loading branch information
NicolasHug authored and facebook-github-bot committed Jun 17, 2024
1 parent 58350e1 commit 3816bc1
Show file tree
Hide file tree
Showing 3 changed files with 61 additions and 42 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/wheel.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ jobs:
run: |
wheel_path=`find dist -type f -name "*.whl"`
echo Installing $wheel_path
python -m pip install $wheel_path
python -m pip install $wheel_path -vvv
- name: Check out repo
uses: actions/checkout@v3
Expand Down
99 changes: 60 additions & 39 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,11 @@


class CMakeBuild(build_ext):

def __init__(self, *args, **kwargs):
self._install_prefix = None
super().__init__(*args, **kwargs)

def run(self):
try:
subprocess.check_output(["cmake", "--version"])
Expand All @@ -57,13 +62,51 @@ def run(self):
super().run()

def build_extension(self, ext):
install_prefix = Path(self.get_ext_fullpath(ext.name)).parent.absolute()
"""Call our CMake build system to build libtorchcodec?.so"""
# Setuptools was designed to build one extension (.so file) at a time,
# calling this method for each Extension object. We're using a
# CMake-based build where all our extensions are built together at once.
# If we were to declare one Extension object per .so file as in a
# standard setup, a) we'd have to keep the Extensions names in sync with
# the CMake targets, and b) we would be calling into CMake for every
# single extension: that's overkill and inefficient, since CMake builds
# all the extensions at once. To avoid all that we create a *single*
# fake Extension which triggers the CMake build only once.
assert ext.name == "FAKE_NAME", f"Unexpected extension name: {ext.name}"
# The price to pay for our non-standard setup is that we have to tell
# setuptools *where* those extensions are expected to be within the
# source tree (for sdists or editable installs) or within the wheel.
# Normally, setuptools relies on the extension's name to figure that
# out, e.g. an extension named `torchcodec.libtorchcodec.so` would be
# placed in `torchcodec/` and importable from `torchcodec.`. From that,
# setuptools knows how to move the extensions from their temp build
# directories back into the proper dir.
# Our fake extension's name is just a placeholder, so we have to handle
# that relocation logic ourselves.
# _install_prefix is the temp directory where the built extension(s)
# will be "installed" by CMake. Once they're copied to install_prefix,
# the built .so files still need to be copied back into:
# - the source tree (for editable installs) - this is handled in
# copy_extensions_to_source()
# - the (temp) wheel directory (when building a wheel). I cannot tell
# exactly *where* this is handled, but for this to work we must
# prepend the "/torchcodec" folder to _install_prefix: this tells
# setuptools to eventually move those .so files into `torchcodec/`.
# It may seem overkill to 'cmake install' the extensions in a temp
# directory and move them back to another dir, but this is what
# setuptools would do and expect even in a standard build setup.
self._install_prefix = (
Path(self.get_ext_fullpath(ext.name)).parent.absolute() / "torchcodec"
)
self._build_all_extensions_with_cmake()

def _build_all_extensions_with_cmake(self):
# Note that self.debug is True when you invoke setup.py like this:
# python setup.py build_ext --debug install
build_type = "Debug" if self.debug else "Release"
torch_dir = Path(torch.utils.cmake_prefix_path) / "Torch"
cmake_args = [
f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
f"-DCMAKE_INSTALL_PREFIX={self._install_prefix}",
f"-DTorch_DIR={torch_dir}",
"-DCMAKE_VERBOSE_MAKEFILE=ON",
f"-DCMAKE_BUILD_TYPE={build_type}",
Expand All @@ -77,17 +120,18 @@ def build_extension(self, ext):
subprocess.check_call(["cmake", "--build", "."], cwd=self.build_temp)
subprocess.check_call(["cmake", "--install", "."], cwd=self.build_temp)

def get_ext_filename(self, fullname):
# When copying the .so files from the build tmp dir to the actual
# package dir, this tells setuptools to look for a .so file without the
# Python ABI suffix, i.e. "libtorchcodec4.so" instead of e.g.
# "libtorchcodec4.cpython-38-x86_64-linux-gnu.so", which is what
# setuptools looks for by default.
ext_filename = super().get_ext_filename(fullname)
ext_filename_parts = ext_filename.split(".")
without_abi = ext_filename_parts[:-2] + ext_filename_parts[-1:]
ext_filename = ".".join(without_abi)
return ext_filename
def copy_extensions_to_source(self):
"""Copy built extensions from temporary folder back into source tree.
This is called by setuptools at the end of .run() during editable installs.
"""
self.get_finalized_command("build_py")

for so_file in self._install_prefix.glob("*.so"):
assert "libtorchcodec" in so_file.name
destination = Path("src/torchcodec/") / so_file.name
print(f"Copying {so_file} to {destination}")
self.copy_file(so_file, destination, level=self.verbose)


NOT_A_LICENSE_VIOLATION_VAR = "I_CONFIRM_THIS_IS_NOT_A_LICENSE_VIOLATION"
Expand All @@ -105,29 +149,6 @@ def get_ext_filename(self, fullname):
f"If you have a good reason *not* to, then set {NOT_A_LICENSE_VIOLATION_VAR}."
)

# If BUILD_AGAINST_ALL_FFMPEG_FROM_S3 is set then we want to build against all
# ffmpeg major version that are available on our S3 bucket.
# If BUILD_AGAINST_ALL_FFMPEG_FROM_S3 is not set, we only build against the
# installed FFmpeg. We don't know what FFmpeg version that is, so we build
# `libtorchcodec.so` without any version suffix. We could probably figure out
# the version number by invoking `pkg-config --modversion`.
FFMPEG_MAJOR_VERSIONS = (4, 5, 6, 7) if build_against_all_ffmpeg_from_s3 else ("",)
extensions = [
Extension(
# The names here must be kept in sync with the target names in the
# CMakeLists file. Grep for [ LIBTORCHCODEC_KEEP_IN_SYNC ]. The name
# parameter specifies not just the name but mainly *where* the .so file
# should be copied to. By setting the name this way, we're telling
# `libtorchcodec{ffmpeg_major_version}.so` to be copied at the root of
# the torchcodec package (next to the __init__.py file). This is where
# it's expected to be when we call torch.ops.load_library().
name=f"torchcodec.libtorchcodec{ffmpeg_major_version}",
# sources is a mandatory parameter so we have to pass it, but we leave
# it empty because we'll be building the extensions with our custom
# CMakeBuild class, and the sources are specified within the
# CMakeLists.txt file.
sources=[],
)
for ffmpeg_major_version in FFMPEG_MAJOR_VERSIONS
]
setup(ext_modules=extensions, cmdclass={"build_ext": CMakeBuild})
# See `CMakeBuild.build_extension()`.
fake_extension = Extension(name="FAKE_NAME", sources=[])
setup(ext_modules=[fake_extension], cmdclass={"build_ext": CMakeBuild})
2 changes: 0 additions & 2 deletions src/torchcodec/decoders/_core/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -62,8 +62,6 @@ if(DEFINED ENV{BUILD_AGAINST_ALL_FFMPEG_FROM_S3})
${CMAKE_CURRENT_SOURCE_DIR}/fetch_and_expose_non_gpl_ffmpeg_libs.cmake
)

# The libtorchcodec names here must be kept in sync with the library names
# in setup.py. Grep for [ LIBTORCHCODEC_KEEP_IN_SYNC ]
make_torchcodec_library(libtorchcodec4 ffmpeg4)
make_torchcodec_library(libtorchcodec5 ffmpeg5)
make_torchcodec_library(libtorchcodec6 ffmpeg6)
Expand Down

0 comments on commit 3816bc1

Please sign in to comment.