RFE: multi-arch dependencies #2197

Open
pmatilai opened this issue Sep 19, 2022 · 22 comments · May be fixed by #3578
Labels
design Complicated design issue fileformat Matters concerning package (file) format generator Dependency generation related RFE transaction

@pmatilai
Member

pmatilai commented Sep 19, 2022

The current dependency design comes from the nineties, and the multilib support (those dreaded ()(64bit) postfix markers on dependencies) from the early millennium was always an ugly hack, necessitated by backwards compatibility. Changing it is not so much technically complicated as it is a compatibility issue: there hasn't been a good time to make such a change. The upcoming rpm v6 format provides a nice point to reset the dependency namespace, but we could and should make the option available before that.

#1038 is a PR in this direction, but there are further details to consider wrt storage and otherwise.

@pmatilai pmatilai added RFE fileformat Matters concerning package (file) format labels Sep 19, 2022
@pmatilai pmatilai added this to RPM Sep 19, 2022
@pmatilai pmatilai moved this to Backlog in RPM Sep 19, 2022
@pmatilai pmatilai added this to the 4.20.0 milestone Nov 17, 2022
@pmatilai
Member Author

pmatilai commented Nov 17, 2022

PR #1038 suggested appending the entire arch as a marker to the extracted sonames, eg libfoo.so.1()(x86_64) but I dislike it as it pollutes the dependency name, making it unnecessarily hard to query for.

Just this morning it dawned on me that perhaps we could place the architecture into the EVR string instead, ie the above example would become libfoo.so.1 = x86_64. This allows the dependency to be used in both an arch-specific and an arch-independent manner, depending (pun half intended) on the situation, whether when querying or in dependent packages.

One potential issue is that old 32bit packages would now match anything at all, but that should be workable with a new rpmds flag to indicate arch-specific dependencies, and by not letting such dependencies match any provides without a matching flag, which the old packages obviously wouldn't have. Equals (or not) is the only relevant comparison for this kind of "EVR" of course.
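
The matching rule described above could be sketched roughly as follows. This is only one possible reading of the idea, not rpm code: the `Dep` class and the `arch_specific` field are hypothetical stand-ins for an rpmds entry and the proposed new flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """Hypothetical stand-in for one rpmds entry (not the real rpm API)."""
    name: str                    # e.g. "libfoo.so.1"
    arch: str = ""               # architecture stored in the "EVR" slot
    arch_specific: bool = False  # assumed new flag; old packages never set it


def satisfies(provide: Dep, require: Dep) -> bool:
    """Return True if the provide can satisfy the requirement."""
    if provide.name != require.name:
        return False
    # Flagged and unflagged dependencies never match each other, so an
    # old unversioned 32bit requirement cannot pick up a new provide.
    if provide.arch_specific != require.arch_specific:
        return False
    if not require.arch_specific:
        return True
    # Equality is the only meaningful comparison for this kind of "EVR".
    return provide.arch == require.arch
```

With this reading, an old-style `libfoo.so.1` requirement is only satisfied by an old-style provide, while a flagged `libfoo.so.1 = x86_64` requirement needs a flagged provide with the same architecture.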

Thoughts? Downsides I'm not seeing? Multiple different versions of a provide is not something we commonly have so there may be gremlins related to that.

Turning Neal's patch from #1038 into this is just a couple of lines: pmatilai@046625c (in case people want to play around with it)

@Conan-Kudo
Member

Thoughts? Downsides I'm not seeing? Multiple different versions of a provide is not something we commonly have so there may be gremlins related to that.

Actually, why don't we do this for adding the actual version for soname dependencies? I don't think it makes sense to use the arch this way, because it makes versioned sonames potentially unreliable.

I think that's why we never did this before, and that was one reason I didn't change it to an EVR string in #1038.

@pmatilai
Member Author

Do you mean the soname version or symbol versions? In either case though, the problem is they look like versions but are nothing like rpm versions at all: only equivalence matters, so it'd just be plenty confusing. Also, the soname version is a literal part of the string programs link against, and breaking that up would only make matters more difficult.

@Conan-Kudo
Member

Conan-Kudo commented Nov 17, 2022

I mean something like this: libfoo.so.1 = 1.2.3-4. Debian actually does this in their symbol files, which allows packages to determine the minimum version they need alongside a soname.

@pmatilai
Member Author

Interesting, I didn't know that. It's an idea. Of course, arch in dependency version is also just an idea that does have some desirable qualities over something else, but also downsides and limits.

Anyway, before any actual decisions we need to come up with an actual list of requirements and goals for the multiarch support and then see how to best achieve those, so these ideas are more like food for thought until then.

@pmatilai pmatilai removed this from the 4.20.0 milestone Jan 26, 2023
@pmatilai pmatilai added this to the 4.20.0 milestone Aug 28, 2023
@pmatilai pmatilai moved this from Backlog to Todo in RPM Aug 28, 2023
@pmatilai pmatilai added generator Dependency generation related transaction labels Sep 14, 2023
@pmatilai pmatilai moved this from Todo to Priority in RPM Feb 21, 2024
@pmatilai pmatilai self-assigned this Feb 21, 2024
@pmatilai pmatilai moved this from Priority to In Progress in RPM Feb 21, 2024
@pmatilai pmatilai moved this from In Progress to Todo in RPM Feb 21, 2024
@pmatilai pmatilai modified the milestones: 4.20.0, 6.0.0 Mar 7, 2024
@pmatilai
Member Author

pmatilai commented Mar 7, 2024

Okay, this is not going to make it to 4.20. But we need to start looking into this soon, shortly after 4.20 is branched, so we're not in this same situation one year from now.

@pmatilai
Member Author

pmatilai commented Mar 8, 2024

For one thing, we need to have a spec for the new format first, and only then implement it. Using #1038 as a starting point:

  • ELF dependencies are tagged with arch information in parentheses after the main dependency token
  • the arch info is formed as (not necessarily in this order)
    • shorthand name of the architecture
    • for architectures with configurable big/little-endian, the endianness is included
    • other ABI points to consider, somehow generally:
      • soft/hard floating point
      • different address/instruction size (ARM thumb, x32, IIRC some MIPSen etc)

Basically: something like <token>[(<version>)](<archinfo>), note (version) being optional and without the parentheses if there's no version.

We'd like to keep the archinfo as short as possible for space and aesthetics reasons, so it makes sense to optimize the notation for the common case and let the oddballs suffer. Such as:

  • almost everything is little-endian so maybe only encode the endianness in the name for big-endian
  • soft FP is a very rare thing these days, that's the exception that should be encoded if at all - I know it's a thing you need to care about on Arm, at least in the past, but is it something that needs to be in every single dependency, really?
  • have a common notation for the case where address/instruction size differs, like arm thumb and x32, to make us not weep inside (too much)
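
A minimal sketch of the `<token>[(<version>)](<archinfo>)` notation described above, to make the optional version part concrete. The function names are hypothetical, and the archinfo value used in the examples (`x86_64`) is only a placeholder since the actual archinfo format is still open at this point.

```python
import re

# Hypothetical helpers for the proposed notation:
#   <token>[(<version>)](<archinfo>)
_DEP_RE = re.compile(
    r"(?P<token>[^()]+)"             # main dependency token, e.g. a soname
    r"(?:\((?P<version>[^()]+)\))?"  # optional version, with its parentheses
    r"\((?P<archinfo>[^()]+)\)"      # arch info, always the last group
)


def format_dep(token, archinfo, version=None):
    """Build a dependency string; the version group is omitted if unset."""
    version_part = f"({version})" if version else ""
    return f"{token}{version_part}({archinfo})"


def parse_dep(dep):
    """Split a dependency string into (token, version or None, archinfo)."""
    m = _DEP_RE.fullmatch(dep)
    if m is None:
        raise ValueError(f"not a multiarch dependency: {dep!r}")
    return m.group("token"), m.group("version"), m.group("archinfo")
```

Note that with the version parentheses dropped when there is no version, a single parenthesized group has to be read as the archinfo, which is what the parser above assumes.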

@Conan-Kudo
Member

soft FP is a very rare thing these days, that's the exception that should be encoded if at all - I know it's a thing you need to care about on Arm, at least in the past, but is it something that needs to be in every single dependency, really?

We may need to care about it again with RISC-V.

@pmatilai pmatilai removed their assignment May 22, 2024
@pmatilai pmatilai added the design Complicated design issue label Jun 14, 2024
@pmatilai
Member Author

pmatilai commented Feb 14, 2025

Been thinking about this again, and wondering if we couldn't get away with an optional "flags" field consisting of single letters in alphabetical order: <basearch>-<bits>[-<flags>] that's only there for exceptional stuff like configurable endianness or varying floating point system, and which are generally platform specific. So we won't try to encode general characteristics of each platform into this arch classifier, only what's absolutely necessary to differentiate. So you might have something like

b - big endian (and lack of it means little-endian, the common norm these days)
s - software FPU (and lack of it means hardware, as per the norm)
t - thumb for Arm Thumb instruction set
x - extended instruction set - for example x32 would then be x86-32-x

...and since they are arch-specific, you have 25 flags to choose from, double that if you include uppercase and even more if numbers are allowed. Should be enough space for even all the wacko Arm variants 😆 There's no doubt some such flags are technically generic in nature, like endianness. But whether it makes sense to eg reserve lower-case for common and upper-case for arch-specific or some such thing, I kinda doubt it.
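
As a rough illustration of the `<basearch>-<bits>[-<flags>]` idea above, here is a small sketch that assembles such an identifier with the flag letters kept in alphabetical order. The function name `abi_id` is hypothetical, and apart from `x86-32-x` (the x32 example given above) the base architecture names are assumptions.

```python
def abi_id(basearch: str, bits: int, flags: str = "") -> str:
    """Build "<basearch>-<bits>[-<flags>]" with sorted single-letter flags."""
    # Flags are single letters; drop duplicates and sort alphabetically so
    # the same set of flags always produces the same identifier.
    sorted_flags = "".join(sorted(set(flags)))
    parts = [basearch, str(bits)]
    if sorted_flags:
        parts.append(sorted_flags)
    return "-".join(parts)
```

For example, `abi_id("x86", 32, "x")` gives `x86-32-x`, and an architecture without any exceptional properties gets no flags field at all, as in `abi_id("x86", 64)` giving `x86-64`.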

@pmatilai
Member Author

And to be clear, why I think such flags should be strictly arch-specific: so we don't need to argue about them. Each arch can go as crazy as they like, it's not our headache.

@Conan-Kudo
Member

It might be simpler for logic reasons to just encode even "default" bits, so that we're out of the business of deciding norms.

@pmatilai
Member Author

Well, sure. I mean, if we declare it's all arch specific. So if the maintainers of an architecture (what a noble concept, I wish we had them 😄 ) decide it's more important to explicitly spell out eg the endianness than try to save a precious letter, it's fine by me. The point about defaults is that it'd be nuts to encode endianness on x86_64 because that doesn't change.

@Conan-Kudo
Member

Sure. But x86_64 and s390x are the only two arches in common use I know of that aren't bi-endian.

@pmatilai
Member Author

Well, x86_32 is going to hang around for a long time as well. Anyway, the point being, there's no point encoding data that is irrelevant for the platform because it just wastes letters that could be better used for something else. x86_64 has its share of other extensions (though those are now tracked via arch sublevels).

Common flags would just force us to argue endlessly over whether a flag should be common across all architectures etc, and we'd never get anything added 😅

@pmatilai
Member Author

On a semi-related note: while looking at elfdeps.cc I just realized that bumping the codebase to C++20 gives us std::format() which is going to make a lot of string stuff much much nicer. So thanks again @Conan-Kudo for gently elbowing us on that.

@Conan-Kudo
Member

I will take great joy in elfdeps moving to C++20.

@Conan-Kudo
Member

This also reminds me, how would we encode information for multi-arch objects (like FatELF or Mach-O binaries)?

@pmatilai
Member Author

Ugh, those. I don't even know what they look like, much less what to do about them...

@pmatilai
Member Author

Trying to wrap my head around the multi-arch object concept: as for provides, we could just emit all info we come across. As for requires, it gets stranger. I suppose you could stuff 'em into a rich dependency like (A or B or C) for each symbol, but that doesn't seem right either - you'd want to treat the dependencies of an arch as alternative sets instead.

@pmatilai
Member Author

Yet more musings on this: starting to think maybe we should just slap the endianness to the end of the bitness field, eg "64l" or "32b" regardless of the arch, because this is a fundamental piece of info in every ELF object. That'd be one reason to have that flags-field anywhere so the exceptions would really stand out, and it'd be consistent everywhere. It's not like there'd be an insurmountable amount of space wasted by that.

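Both pieces of the combined bitness/endianness field mentioned above are available in the first few bytes of any ELF object (the `EI_CLASS` and `EI_DATA` bytes of `e_ident`). A minimal sketch of deriving a marker like "64l" or "32b" from them, with a hypothetical function name and not taken from elfdeps:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA = 4, 5        # offsets into e_ident per the ELF spec
_BITS = {1: "32", 2: "64"}      # ELFCLASS32, ELFCLASS64
_ENDIAN = {1: "l", 2: "b"}      # ELFDATA2LSB, ELFDATA2MSB


def bits_endian(e_ident: bytes) -> str:
    """Return e.g. "64l" or "32b" for the start of an ELF header."""
    if len(e_ident) < 6 or e_ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    try:
        return _BITS[e_ident[EI_CLASS]] + _ENDIAN[e_ident[EI_DATA]]
    except KeyError:
        raise ValueError("unknown ELF class or data encoding") from None
```

In practice the input would be the first 16 bytes read from a file opened in binary mode, eg `bits_endian(open(path, "rb").read(16))`.
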
pmatilai added a commit to pmatilai/rpm that referenced this issue Feb 18, 2025
Traditional rpm dependencies have only supported biarch style
installations between 64bit and "something else", traditionally of
course 32bit. This has been denoted with (64bit) markers at the end of
ELF dependency tokens for some 64bit architectures and for others not,
and lets a dependency token of a completely different architecture
satisfy that of another. This is inconsistent and limiting at best.

Add a new multiarch mode for elfdeps where each ELF dependency token
carries an ABI identifier consisting of the base architecture, its
bitness and endianness, optionally followed by arch-specific flags.
v4 packages continue to use the traditional biarch dependencies, for
v6 packages we use the new multiarch style. It's possible to generate
both for transition-period compatibility though.

Inspired by and loosely based on an earlier patch by Neal Gompa.

Fixes: rpm-software-management#2197
@pmatilai pmatilai linked a pull request Feb 18, 2025 that will close this issue
@pmatilai
Member Author

Needed a breather from all the signature work, so put together a draft based on the above musings: #3578

@pmatilai
Member Author

pmatilai commented Feb 19, 2025

Collecting musings from the draft PR for further processing:

It seems to me %{_isa} should match the multiarch marker we add. The current basearch-bits ISA info clearly is closely related, and in its current form insufficient to express what it needs in a world of real multiarch. One thing currently missing in %{_isa} is the endianness, and that links us back to this older ticket requesting a macro for determining the endianness: #365. The other part is the flags, which seems trickier because of how it's all generated from installplatform based on some arch names - there should be some kind of clear mapping of these things that is not a case-statement in a shell script 🤪

It also seems to me the multiarch marker(s) for a single package should actually be collected in a tag of their own. That it gets embedded in the dependencies is good for distinguishing between those, but hard to use for other purposes, and clearly this all crosses paths with the whole notion of how rpm looks at architectures.

Another issue brought up by @voxik is that this seems to make noarch -> buildarch dependencies even more painful (#3579)

Projects
Status: Todo

2 participants