
MCS: Fix passive server core migration during ReplyRecv #986

Open · wants to merge 1 commit into master
Conversation

@JE-Archer (Member) commented Mar 9, 2023

Fixes an issue where passive servers are not scheduled after migrating to another core during ReplyRecv. If, during the handling of the receive part of ReplyRecv, the current thread migrates to a remote core, it is inserted into the scheduling queues of the remote core, but no reschedule IPI is sent to that core. In the worst case, if the remote core is running the idle thread, the passive server may never be scheduled. This commit fixes the issue by sending a reschedule IPI, if necessary, after the current thread is enqueued.

Note that while handling the receive part of the syscall, the current thread receives the caller's scheduling context and has its tcbAffinity set to the remote core, but it is not yet enqueued on the remote core. Only during the call to schedule() after the syscall has been handled is the current thread, whose affinity is now the remote core, enqueued on that core. This enqueue merely inserts the TCB into the queue; it does not send a reschedule IPI to the remote core. If the remote core is running the idle thread, it will keep running the idle thread even if the kernel is entered via an interrupt, and even though the passive server's TCB is in the core's ready queues, because the remote core's scheduler action never changes from ResumeCurrentThread (the idle thread). This behaviour can be reliably reproduced with seL4test test IPC0028 (which was disabled for me, so double-check that it actually runs).

There are multiple ways this issue could be fixed. For example, the enqueue and IPI could be done while handling the syscall. However, that would probably necessitate changing the scheduling code anyway, since schedule() would still call SCHED_ENQUEUE_CURRENT_TCB. For now I have chosen the simple fix of adding an IPI where the enqueue is done.

See also #941 for more discussion.

Test with: seL4/sel4test#89

@gernotheiser (Member)

Can the kernel check what prio is running on the destination core? It would be good to only interrupt if the migrating thread should actually preempt what's currently running.

@JE-Archer (Member, Author)

> Can the kernel check what prio is running on the destination core? It would be good to only interrupt if the migrating thread should actually preempt what's currently running

The function responsible for triggering the IPI, remoteQueueUpdate, already checks for this. This fix just makes sure remoteQueueUpdate is called after enqueuing:

```c
void remoteQueueUpdate(tcb_t *tcb)
{
    /* only ipi if the target is for the current domain */
    if (tcb->tcbAffinity != getCurrentCPUIndex() && tcb->tcbDomain == ksCurDomain) {
        tcb_t *targetCurThread = NODE_STATE_ON_CORE(ksCurThread, tcb->tcbAffinity);

        /* reschedule if the target core is idle or we are waking a higher priority thread (or
         * if a new irq would need to be set on MCS) */
        if (targetCurThread == NODE_STATE_ON_CORE(ksIdleThread, tcb->tcbAffinity)  ||
            tcb->tcbPriority > targetCurThread->tcbPriority
#ifdef CONFIG_KERNEL_MCS
            || NODE_STATE_ON_CORE(ksReprogram, tcb->tcbAffinity)
#endif
           ) {
            ARCH_NODE_STATE(ipiReschedulePending) |= BIT(tcb->tcbAffinity);
        }
    }
}
```

@Indanz (Contributor) commented Mar 10, 2023

The problem I have with this solution is that conceptually it doesn't make sense: ksCurThread should always be running on the current core. This fix relies on SCHED_ENQUEUE_CURRENT_TCB/SCHED_APPEND_CURRENT_TCB being called before ksCurThread gets updated. It introduces subtle ordering assumptions which can be easily broken in the future when innocuous unrelated changes are made.

I am also not convinced it solves all corner cases that trigger this bug. I don't have access to MEP's hardware any more where this could be easily triggered, so can't test if this solves the problem I originally ran into, but I doubt it.

I think we have to make a table of all permutations of when tasks can be woken up cross-core and make sure there are tests for all cases. Any fix we come up with can then be properly tested.

@JE-Archer (Member, Author)

> The problem I have with this solution is that conceptually it doesn't make sense: ksCurThread should always be running on the current core. This fix relies on SCHED_ENQUEUE_CURRENT_TCB/SCHED_APPEND_CURRENT_TCB being called before ksCurThread gets updated. It introduces subtle ordering assumptions which can be easily broken in the future when innocuous unrelated changes are made.

I agree that it is conceptually weird that a thread on a remote core is enqueued on the remote core during a call to schedule() on the current core, and especially weird that this happens through a call to SCHED_ENQUEUE_CURRENT_TCB, where the "current TCB" is understood to mean the TCB running on the current core. However, that is the way the code currently works with SC donation across cores during ReplyRecv.

Maybe we could enqueue the current thread on the other core during handleRecv and then set NODE_STATE(ksCurThread) to NULL, and adapt schedule() to handle NULL ksCurThread, or something along those lines?

> I am also not convinced it solves all corner cases that trigger this bug. I don't have access to MEP's hardware any more where this could be easily triggered, so can't test if this solves the problem I originally ran into, but I doubt it.

If you can describe the scenarios I can try to reproduce them. What sort of hardware are we talking about?

@Indanz (Contributor) commented Mar 10, 2023

> I agree that it is weird conceptually that a thread on a remote core is enqueued on the remote core during a call to schedule() on the current core, and especially weird that it is done through a call to SCHED_ENQUEUE_CURRENT_TCB, where the "current TCB" is understood to mean the TCB running on the current core. However that is the way the code currently works with SC donation across cores during ReplyRecv.

If that's indeed what happens now, then the fact that it works at all seems pure coincidence.

> Maybe we could enqueue the current thread on the other core during handleRecv and then set NODE_STATE(ksCurThread) to NULL, and adapt schedule() to handle NULL ksCurThread, or something along those lines?

I believe that schedule should only be concerned with scheduling on the current core, and should not be involved with cross-core scheduling at all, either directly or indirectly.

We should not forget to update the fastpath code too, including the signal one that was added recently and is default off.

> If you can describe the scenarios I can try to reproduce them. What sort of hardware are we talking about?

See #941 for the scenario. The hardware is the tqma board, which has an i.MX8 SoC, but I meant the software stack more than the hardware itself. With MEP's software it was easy to do load testing with many tasks and different scenarios. sel4test is good for testing specific scenarios, but not for SMP load testing where lots of things happen concurrently to shake out concurrency bugs.

@lsf37 lsf37 added MCS issues about the mixed-criticality system config SMP Issues related to muticore functionality hw-test sel4test hardware builds + runs for this PR labels Mar 10, 2023
@JE-Archer (Member, Author)

> I believe that schedule should only be concerned with scheduling on the current core, and should not be involved with cross-core scheduling at all, neither directly or indirectly.

I have pushed a new draft fix that moves the cross-core enqueue to receiveIPC and adjusts schedule() to deal with a current TCB that has migrated to a remote core. But there are many ways to change receiveIPC to achieve the same purpose, so I would welcome input on which way is best. I am also unsure of the fastfail logic in schedule(). There is also the question of whether to leave ksCurThread set to the migrated thread and check its affinity, or to set it to NULL or something else, with some invariant like NODE_STATE(ksCurThread) == NULL || NODE_STATE(ksCurThread)->tcbAffinity == getCurrentCPUIndex().

This commit also only fixes ReplyRecv when receiving an SC from a caller on an endpoint; it does not fix receiving from a bound notification, as the former is what is tested by IPC0028. I will submit a pull request with a test for the latter case to seL4test and then update this PR with a fix.

```
Fixes an issue where passive servers are not scheduled after receiving
the scheduling context of a calling thread on a remote core during
ReplyRecv.

If during the handling of the receive part of ReplyRecv, the current
thread migrates to a remote core by borrowing the calling thread's
scheduling context, it will be inserted into the scheduling queues of
the remote core, but no reschedule IPI will be sent to the remote core.
In the worst case, if the remote core is running the idle thread, the
passive server may never be scheduled. Additionally, the existing code
only incidentally enqueues the passive server's TCB on the remote core
during a call to schedule() on the current core, via
SCHED_ENQUEUE_CURRENT_TCB. However, SCHED_ENQUEUE_CURRENT_TCB assumes
that the current TCB is running on the current CPU core.

This commit fixes this case of TCB migration by calling possibleSwitchTo
during receiveIPC. It also adds an assertion to
SCHED_ENQUEUE_CURRENT_TCB and adapts the schedule() function to ignore
the current TCB if the current TCB has changed CPU core during the
kernel entry.

Signed-off-by: James Archer <j.archer@unsw.edu.au>
```
@lsf37 (Member) commented May 5, 2023

> with some invariant like NODE_STATE(ksCurThread) == NULL || NODE_STATE(ksCurThread)->tcbAffinity == getCurrentCPUIndex().

If at all possible ksCurThread should never be NULL and always point to a valid TCB. Everything else would seriously mess with a lot of invariants we currently have about the current thread.
