Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update yEnc module #114

Merged
merged 4 commits into from
May 10, 2024
Merged

Update yEnc module #114

merged 4 commits into from
May 10, 2024

Conversation

animetosho
Copy link
Contributor

This brings the yEnc code up to date with rapidyenc v1.1.1, and removes almost all sabctools' specific modifications (should allow easier updates in future).

Key changes

  • add AVX10 and experimental RISC-V Vector/crypto support
  • add faster baseline CRC32 implementation (though crcutil still seems to be faster on older x86 processors)
  • add platform optimised CRC manipulation routines, replacing such functionality from crcutil
    • this allows rapidyenc to optionally be built without crcutil
  • avoid a performance pitfall on Zen4 processors
  • officially support decoding where destination/source buffer are the same
    • fixes a rare bug with this setup, which could cause the end to be mis-detected
  • de-dupe some code/lookup tables for a slight reduction in binary size

Possible future changes

  • I don't think the UNSUPPORTED_PLATFORM_ARM check is relevant any more, as the code itself checks for the header, but it's left in as it might need some testing (this is the only remaining sabctools specific modification to rapidyenc)
  • this PR redirects sabctools to use the new CRC functions, although I've kept crcutil present. If desired, crcutil can be disabled for a smaller build
  • rapidyenc can now be built without the encoder, though sabctools does use it (only for testing I believe)
  • there are CRC32 tests where the CRC32 exceeds 32 bits. I'm not sure of the purpose of these tests (they don't appear to be related to Incorrect CRC32 failure #96), but I've left them in, except for a problematic one

@sanderjo
Copy link
Collaborator

sanderjo commented May 8, 2024

experimental RISC-V Vector/crypto support

Wow! That is based on RV1.0? How did you test? On a virtual RISC-V RV1 machine?

If so, time for me to buy a newer RISC-V SBC; my current one support RV0.7.x

@animetosho
Copy link
Contributor Author

animetosho commented May 8, 2024

Wow! That is based on RV1.0? How did you test? On a virtual RISC-V RV1 machine?

Yes. GCC 14 might be able to compile it for 0.7.1, but I don't know if it works as I'm only really supporting 1.0.
Actually, the feature detection doesn't seem to offer the ability to query the version, so I wonder how it'd work on a draft-spec CPU.

I don't have any hardware, so I tested it via qemu*. No clue what actual performance is like (and even if I had hardware, RVV is, let's just say, rather liberal in what hardware is allowed to do, which makes it a little tricky to predict performance in a general sense).

* I've only tested the yEnc code itself, not sabctools


The build failures seem to be failing at upgrading pip, which shouldn't be affected by this PR. Not sure what the correct fix is.

@sanderjo
Copy link
Collaborator

sanderjo commented May 8, 2024

I don't have any hardware

Well ... because you made the RV1.0 code ... I ordered RV1.0 hardware: "Banana Pi BPI-F3 SpacemiT K1 8 Core RISC-V chip". To arrive at end of May.

@Safihre
Copy link
Member

Safihre commented May 8, 2024

The code changes are beyond my understanding, but do I understand correctly that crcutil is never used anymore? Even not as a fallback? Only if specifically enabled by compiler flag? Then indeed we can just remove it fully.

The crc32 exceeding 32bit, wasn't that for either the 32bit arm platforms or that exotic 32bit on 64bit platform that Linux supports?

The unsupported arm flag was for an old Synology chip, but I think we dropped that one already because unrar doesn't support old arm anymore. Will have to check that.

@animetosho
Copy link
Contributor Author

animetosho commented May 8, 2024

but do I understand correctly that crcutil is never used anymore? Even not as a fallback? Only if specifically enabled by compiler flag?

It's still enabled by default, but only used on x86 CPUs which don't support PCLMULQDQ (which are quite old at this stage). There, the crcutil code is slightly faster than the fallback CRC code I have (e.g. 1150.71 vs 1325.75 MB/s on a Xeon L3426).
On the stripped rapidyenc shared Linux binary, removing crcutil saves around 60KB (and maybe makes builds a little faster).

The crc32 exceeding 32bit, wasn't that for either the 32bit arm platforms or that exotic 32bit on 64bit platform that Linux supports?

That doesn't quite sound right, as they should have no issue with sticking to 32-bit.
My guess is that the crcutil interface took in 64-bit integers (since it was a generic CRC implementation, instead of being specific to CRC32), so it just got parsed as such. And because of that, the limits got tested that way.

@mnightingale
Copy link
Contributor

there are CRC32 tests where the CRC32 exceeds 32 bits. I'm not sure of the purpose of these tests (they don't appear to be related to Incorrect CRC32 failure #96), but I've left them in, except for a problematic one

There is a bit of discussion around that in #80 (comment) looks like an extreme case since Python would let you pass huge values and I think crcutil wouldn't handle them as expected. I believe we resolved to just disregard >32 bits.


Test failures are because macos-latest runner is an M1 (macos-14-arm64) then I think it tries to run x64 Python.
It should just need updating to use a x64 runner, and could have seperate arm64 added.

@Safihre
Copy link
Member

Safihre commented May 9, 2024

It's still enabled by default, but only used on x86 CPUs which don't support PCLMULQDQ (which are quite old at this stage). There, the crcutil code is slightly faster than the fallback CRC code I have (e.g. 1150.71 vs 1325.75 MB/s on a Xeon L3426).

Seems like such a small benefit that we could just drop crcutil?
Or you still use it in rapidyenc also? Then we can keep it, just to keep things as similar as possible for easy future updates.

@animetosho
Copy link
Contributor Author

There is a bit of discussion around that in #80 (comment) looks like an extreme case since Python would let you pass huge values and I think crcutil wouldn't handle them as expected. I believe we resolved to just disregard >32 bits.

I think that was for the length as opposed to the CRC32 itself?
I did add tests to check the extremities of 64-bit lengths, since the new code does handle the full range correctly.

I suppose it doesn't matter much, but perhaps the tests involving >32-bit CRC32s can be removed, as they're not really a valid scenario.

Test failures are because macos-latest runner is an M1 (macos-14-arm64)

Ooh, that's interesting. I've never been able to test anything on ARM Mac, so that could be useful if Github supports it now.

Or you still use it in rapidyenc also? Then we can keep it, just to keep things as similar as possible for easy future updates.

I've left it enabled by default because it is technically faster and I don't really mind the size cost, but I support a build flag to disable it.
If the size cost isn't a concern for you, it probably makes sense to keep it for now.

@Safihre
Copy link
Member

Safihre commented May 10, 2024

I fixed the macOS runs on the main branch by simply not forcing the architecture, only for the 32bit Windows build.

Thanks a lot for the update :)

@Safihre Safihre merged commit 04386bd into sabnzbd:master May 10, 2024
17 of 20 checks passed
@animetosho animetosho deleted the update_rapidyenc branch May 18, 2024 12:44
animetosho added a commit to animetosho/sabctools that referenced this pull request May 18, 2024
animetosho added a commit to animetosho/sabctools that referenced this pull request May 18, 2024
@animetosho animetosho mentioned this pull request May 18, 2024
Safihre pushed a commit that referenced this pull request May 18, 2024
* Fix RISC-V compile flags not being included
Fixes #116

* Fix ARM PMULL clearing declared CRC ISA

* Remove invalid CRC32 tests
See last point from #114

* Remove UNSUPPORTED_PLATFORM_ARM
Ref #114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants