
MCS: Fix passive server core migration during ReplyRecv #986

Open · wants to merge 1 commit into master
Conversation

@JE-Archer (Member) commented Mar 9, 2023

Fixes an issue where passive servers are not scheduled after migrating to another core during ReplyRecv. If, during the handling of the receive part of ReplyRecv, the current thread migrates to a remote core, it is inserted into the scheduling queues of the remote core, but no reschedule IPI is sent to that core. In the worst case, if the remote core is running the idle thread, the passive server may never be scheduled. This commit fixes the issue by sending a reschedule IPI, if necessary, after the current thread is enqueued.

Note that while handling the receive part of the syscall, the current thread receives the caller's scheduling context and has its tcbAffinity set to the remote core, but it is not yet enqueued on the remote core. Only during the call to schedule() after the syscall has been handled is the current thread, whose affinity is now the remote core, enqueued on that core. This enqueue merely inserts the TCB into the queue; it does not send a reschedule IPI to the remote core. If the remote core is running the idle thread, it will keep running the idle thread even if the kernel is entered via an interrupt, and even though the passive server's TCB is in the core's ready queues, because the remote core's scheduler action never changes from ResumeCurrentThread (the idle thread). This behaviour can be reliably reproduced with seL4test test IPC0028 (which was disabled for me, so double-check that it actually runs).

There are multiple ways this issue could be fixed. For example, the enqueue and IPI could be done while handling the syscall. However, that would probably necessitate changing the scheduling code anyway, since schedule() would still call SCHED_ENQUEUE_CURRENT_TCB. For now I have chosen the simple fix of adding an IPI where the enqueue is done.

See also #941 for more discussion.

Test with: seL4/sel4test#89

@gernotheiser (Member)

Can the kernel check what prio is running on the destination core? It would be good to only interrupt if the migrating thread should actually preempt what's currently running.

@JE-Archer (Member, Author)

> Can the kernel check what prio is running on the destination core? It would be good to only interrupt if the migrating thread should actually preempt what's currently running

The function responsible for triggering the IPI, remoteQueueUpdate, already checks for this. This fix just makes sure remoteQueueUpdate is called after enqueuing:

```c
void remoteQueueUpdate(tcb_t *tcb)
{
    /* only ipi if the target is for the current domain */
    if (tcb->tcbAffinity != getCurrentCPUIndex() && tcb->tcbDomain == ksCurDomain) {
        tcb_t *targetCurThread = NODE_STATE_ON_CORE(ksCurThread, tcb->tcbAffinity);

        /* reschedule if the target core is idle or we are waking a higher priority thread (or
         * if a new irq would need to be set on MCS) */
        if (targetCurThread == NODE_STATE_ON_CORE(ksIdleThread, tcb->tcbAffinity)  ||
            tcb->tcbPriority > targetCurThread->tcbPriority
#ifdef CONFIG_KERNEL_MCS
            || NODE_STATE_ON_CORE(ksReprogram, tcb->tcbAffinity)
#endif
           ) {
            ARCH_NODE_STATE(ipiReschedulePending) |= BIT(tcb->tcbAffinity);
        }
    }
}
```

@Indanz (Contributor) commented Mar 10, 2023

The problem I have with this solution is that conceptually it doesn't make sense: ksCurThread should always be running on the current core. This fix relies on SCHED_ENQUEUE_CURRENT_TCB/SCHED_APPEND_CURRENT_TCB being called before ksCurThread gets updated. It introduces subtle ordering assumptions which can be easily broken in the future when innocuous unrelated changes are made.

I am also not convinced it solves all corner cases that trigger this bug. I don't have access to MEP's hardware any more where this could be easily triggered, so can't test if this solves the problem I originally ran into, but I doubt it.

I think we have to make a table of all permutations of when tasks can be woken up cross-core and make sure there are tests for all cases. Any fix we come up with can then be properly tested.

@JE-Archer (Member, Author)

> The problem I have with this solution is that conceptually it doesn't make sense: ksCurThread should always be running on the current core. This fix relies on SCHED_ENQUEUE_CURRENT_TCB/SCHED_APPEND_CURRENT_TCB being called before ksCurThread gets updated. It introduces subtle ordering assumptions which can be easily broken in the future when innocuous unrelated changes are made.

I agree that it is conceptually weird that a thread on a remote core is enqueued on the remote core during a call to schedule() on the current core, and especially weird that this happens through a call to SCHED_ENQUEUE_CURRENT_TCB, where the "current TCB" is understood to mean the TCB running on the current core. However, that is the way the code currently works with SC donation across cores during ReplyRecv.

Maybe we could enqueue the current thread on the other core during handleRecv and then set NODE_STATE(ksCurThread) to NULL, and adapt schedule() to handle NULL ksCurThread, or something along those lines?

> I am also not convinced it solves all corner cases that trigger this bug. I don't have access to MEP's hardware any more where this could be easily triggered, so can't test if this solves the problem I originally ran into, but I doubt it.

If you can describe the scenarios I can try to reproduce them. What sort of hardware are we talking about?

@Indanz (Contributor) commented Mar 10, 2023

> I agree that it is weird conceptually that a thread on a remote core is enqueued on the remote core during a call to schedule() on the current core, and especially weird that it is done through a call to SCHED_ENQUEUE_CURRENT_TCB, where the "current TCB" is understood to mean the TCB running on the current core. However that is the way the code currently works with SC donation across cores during ReplyRecv.

If that's indeed what happens now, then the fact that it works at all seems pure coincidence.

> Maybe we could enqueue the current thread on the other core during handleRecv and then set NODE_STATE(ksCurThread) to NULL, and adapt schedule() to handle NULL ksCurThread, or something along those lines?

I believe that schedule should only be concerned with scheduling on the current core, and should not be involved with cross-core scheduling at all, either directly or indirectly.

We should not forget to update the fastpath code too, including the signal one that was added recently and is default off.

> If you can describe the scenarios I can try to reproduce them. What sort of hardware are we talking about?

See #941 for the scenario. The hardware is the tqma board, which has an i.MX8 SoC, but I meant the software stack more than the hardware itself. With MEP's software it was easy to do load testing with many tasks and different scenarios. sel4test is good for testing specific scenarios, but not for SMP load testing where lots of things happen concurrently to shake out concurrency bugs.

@lsf37 lsf37 added MCS issues about the mixed-criticality system config SMP Issues related to muticore functionality hw-test sel4test hardware builds + runs for this PR labels Mar 10, 2023
@JE-Archer (Member, Author)

> I believe that schedule should only be concerned with scheduling on the current core, and should not be involved with cross-core scheduling at all, neither directly or indirectly.

I have pushed a new draft fix that moves the cross-core enqueue to receiveIPC and adjusts schedule() to deal with a current TCB that has migrated to a remote core. But there are many ways to change receiveIPC to achieve the same purpose, so I would welcome input on which way is best. I am also unsure of the fastfail logic in schedule(). There is also the question of whether to leave ksCurThread set to the migrated thread and check its affinity, or to set it to NULL or something else, with some invariant like NODE_STATE(ksCurThread) == NULL || NODE_STATE(ksCurThread)->tcbAffinity == getCurrentCPUIndex().

This commit also only fixes ReplyRecv when receiving an SC from a caller on an endpoint; it does not fix receiving from a bound notification, as the former is what is tested by IPC0028. I will submit a pull request with a test for the latter case to seL4test and then update this PR with a fix.

```
Fixes an issue where passive servers are not scheduled after receiving
the scheduling context of a calling thread on a remote core during
ReplyRecv.

If during the handling of the receive part of ReplyRecv, the current
thread migrates to a remote core by borrowing the calling thread's
scheduling context, it will be inserted into the scheduling queues of
the remote core, but no reschedule IPI will be sent to the remote core.
In the worst case, if the remote core is running the idle thread, the
passive server may never be scheduled. Additionally, the existing code
only incidentally enqueues the passive server's TCB on the remote core
during a call to schedule() on the current core, via
SCHED_ENQUEUE_CURRENT_TCB. However, SCHED_ENQUEUE_CURRENT_TCB assumes
that the current TCB is running on the current CPU core.

This commit fixes this case of TCB migration by calling possibleSwitchTo
during receiveIPC. It also adds an assertion to
SCHED_ENQUEUE_CURRENT_TCB and adapts the schedule() function to ignore
the current TCB if the current TCB has changed CPU core during the
kernel entry.

Signed-off-by: James Archer <j.archer@unsw.edu.au>
```
@lsf37 (Member) commented May 5, 2023

> with some invariant like NODE_STATE(ksCurThread) == NULL || NODE_STATE(ksCurThread)->tcbAffinity == getCurrentCPUIndex().

If at all possible ksCurThread should never be NULL and always point to a valid TCB. Everything else would seriously mess with a lot of invariants we currently have about the current thread.
