Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wqueue remove enter_critical_section #14623

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hujun260
Copy link
Contributor

@hujun260 hujun260 commented Nov 4, 2024

Summary

we use small lock to replace enter_critical_section to avoid busywait

reason:
We decouple semcount from business logic
by using an independent counting variable,
which allows us to remove critical sections in many cases.

Impact

work queue

Testing

Build Host:

OS: Ubuntu 20.04
CPU: x86_64
Compiler: GCC 9.4.0
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic
-machine virt,virtualization=on,gic-version=3
-net none -chardev stdio,id=con,mux=on -serial chardev:con
-mon chardev=con,mode=readline -kernel ./nuttx

@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: avr Issues related to all AVR(8-bit or 32-bit) architectures Arch: ceva Issues related to CEVA architecture Arch: hc Issues related to HC architecture Arch: mips Issues related to the MIPS architecture Arch: openrisc Issues related to the OpenRISC architecture Arch: renesas Issues related to the Renesas chips Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: simulator Issues related to the SIMulator Arch: sparc Issues related to the SPARC architecture Arch: tricore Issues related to the TriCore architecture from Infineon Arch: x86 Issues related to the x86 architecture Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Arch: z16 Issues related to the Z16 architecture Arch: z80 Issues related to the Z80 architecture Area: File System File System issues Area: OS Components OS Components issues Size: L The size of the change in this PR is large labels Nov 4, 2024
@hujun260 hujun260 changed the title Apache 8 wqueue remove enter_critical_section Nov 4, 2024
@github-actions github-actions bot removed Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: avr Issues related to all AVR(8-bit or 32-bit) architectures Arch: ceva Issues related to CEVA architecture Arch: hc Issues related to HC architecture Arch: mips Issues related to the MIPS architecture Arch: openrisc Issues related to the OpenRISC architecture Arch: renesas Issues related to the Renesas chips labels Nov 4, 2024
@github-actions github-actions bot added Size: M The size of the change in this PR is medium and removed Arch: tricore Issues related to the TriCore architecture from Infineon Arch: x86 Issues related to the x86 architecture Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Arch: z16 Issues related to the Z16 architecture Arch: z80 Issues related to the Z80 architecture Area: File System File System issues Size: L The size of the change in this PR is large labels Nov 4, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 4, 2024

[Experimental Bot, please feedback here]

No, this PR summary does not adequately meet the NuttX requirements. Here's why and how to fix it:

Missing Information in Summary:

  • What functional part of the code is being changed? While it mentions "work queue," this is too vague. Specify the exact files, functions, or modules affected (e.g., sched/workqueue.c, work_queue(), etc.).
  • How does the change exactly work (what will change and how)? Saying "use small lock to replace enter_critical_section to avoid busywait" is a good start, but needs more detail. What kind of "small lock"? A mutex? A spinlock? Explain the mechanism and why it avoids busywaiting. Provide a before/after code snippet if possible.
  • Related Issues: Are there any related NuttX or NuttX Apps issues this PR addresses? If so, provide links.

Missing Information in Impact:

  • All Impact Sections are Incomplete: Simply saying "work queue" isn't enough. Answer ALL the questions with "YES" or "NO" and then provide details for any "YES" answers. For example:
    • Is new feature added? NO (presumably)
    • Is existing feature changed? YES (explain how the work queue behavior changes)
    • Impact on user: NO/YES (will user code using work queues need to change?)
    • Impact on build: NO/YES (are there any new configuration options or dependencies?)
    • Impact on hardware: NO/YES (does this affect specific architectures or boards?)
    • Impact on documentation: YES (at least the code comments should be updated; possibly also the official documentation) Explain what documentation changes are needed.
    • Impact on security: NO/YES (does changing the locking mechanism have any security implications? Consider race conditions, deadlocks, etc.)
    • Impact on compatibility: NO/YES (does this break compatibility with any existing drivers or applications?)

Missing Information in Testing:

  • Target(s): While you mention qemu-armv8a:nsh_smp, be more specific. Which QEMU version? Which board configuration within qemu-armv8a?
  • Testing logs before change: Provide actual logs demonstrating the problem this PR fixes. For example, show the busywaiting behavior.
  • Testing logs after change: Provide logs demonstrating the improved behavior after applying this PR. Show that the busywaiting is eliminated or reduced. Quantify the improvement if possible (e.g., "reduced CPU usage by X%").

Example of a better Summary:

This PR replaces the use of enter_critical_section() with a lightweight mutex in the work queue implementation (specifically in sched/workqueue.c within functions work_queue() and work_cancel()). Currently, the work queue uses critical sections, which can lead to busywaiting when multiple threads contend for the work queue. This change introduces a mutex to protect access to the work queue's internal state, allowing threads to block efficiently instead of busywaiting. This improves performance, especially in multi-core systems. This PR addresses NuttX Issue #1234.

Example of a better Impact section (hypothetical):

  • Is new feature added? NO
  • Is existing feature changed? YES, the work queue's internal locking mechanism is changed, improving performance under contention.
  • Impact on user: NO, existing user code using the work queue API should not require modification.
  • Impact on build: NO
  • Impact on hardware: NO
  • Impact on documentation: YES, the code comments in sched/workqueue.c will be updated to reflect the new locking mechanism.
  • Impact on security: NO, the new locking mechanism is designed to prevent race conditions and deadlocks.
  • Impact on compatibility: YES, This change might slightly affect timing in some edge cases where extremely precise timing with work queues was relied upon. However, this is expected to be negligible in most scenarios.

By providing complete and detailed information in your PR, you make it easier for reviewers to understand your changes and ensure they meet the project's standards. This will lead to a faster and smoother review process.

@hujun260 hujun260 force-pushed the apache_8 branch 3 times, most recently from e2e9525 to d7d8355 Compare November 4, 2024 08:04
@hujun260 hujun260 marked this pull request as draft November 4, 2024 09:27
@acassis
Copy link
Contributor

acassis commented Nov 5, 2024

Note that some places said work_cancel() only returns success, these comments need to be fixed. i.e.:

      ret = work_cancel(LPWORK, &priv->cbwork);
      if (ret < 0)
        {
          /* NOTE: Currently, work_cancel only returns success */
  
          lcderr("ERROR: Failed to cancel work: %d\n", ret);
        }

@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Area: File System File System issues labels Nov 6, 2024
@hujun260
Copy link
Contributor Author

hujun260 commented Nov 6, 2024

This PR requires #14578 to be merged first.

@hujun260 hujun260 marked this pull request as ready for review November 6, 2024 13:31
sched/wqueue/wqueue.h Outdated Show resolved Hide resolved
sched/wqueue/wqueue.h Outdated Show resolved Hide resolved
sched/wqueue/kwork_notifier.c Outdated Show resolved Hide resolved

/* Remove the notification from the pending list */

dq_rem(&notifier->entry, &g_notifier_pending);
notifier = work_notifier_find(notifier->key);
if (notifier != NULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need skip line 167 if notifier is NULL

leave_critical_section(flags);
/* We have being canceled */

if (work->worker != NULL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, there was no need for additional checks mainly because work_timer_expiry was within a critical section, and the cancellation also took place within the same critical section. If it was canceled, the callback for work_timer_expiry would not be triggered. However, now that we have switched to using smaller locks and the cancellation point is outside the critical section, we need to add a check.

@xiaoxiang781216
Copy link
Contributor

@hujun260 many comments aren't addressed in the new reversion yet.

@hujun260 hujun260 force-pushed the apache_8 branch 2 times, most recently from c487a1d to 45c5016 Compare January 12, 2025 05:50
reason:
We decouple semcount from business logic
by using an independent counting variable,
which allows us to remove critical sections in many cases.

Signed-off-by: hujun5 <hujun5@xiaomi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arch: arm Issues related to ARM (32-bit) architecture Area: File System File System issues Area: OS Components OS Components issues Size: M The size of the change in this PR is medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants