SCHEDULING JOBS
The SRMS scheduler is charged with the responsibility of scheduling real-time jobs
according to the SRMS algorithm. Structurally, the scheduler runs as a separate thread in
the process space of the SRMS service. To insure that it preempts all task threads,
it runs at the second-to-highest priority level in the NT system (See the graphic below).
Only the event timer thread is higher, since it
produces the interrupts which wake the scheduler thread. The SRMS scheduler begins
when the user selects "Start System" from the File menu on the service's GUI and
terminates when the user selects "Stop System" or all tasks unregister.
The way it works is conceptually simple: the SRMS service maintains a list called task_list
which is a doubly-linked list of task_struct entries representing the active task set.
Each time the SRMS scheduler thread is run, it traverses this list in search for the next
highest priority job with work to do according to the algorithm. Once it finds that job,
it moves that task thread to a ready state, sets a timer to fire at the next time that the
scheduler should interrupt the system, and then waits on that timer. The selected
real-time task runs until the timer fires, and at that point, the SRMS scheduler resumes
and the procedure repeats itself.
Each time that the event timer fires, the timer's callback routine sends a WM_TimerFired
message to the SRMS scheduler's thread. The arrival of this message wakes the
scheduler thread from its wait state and immediately calls schedule_next_job(),
which is the function that determines which job gets to run. Essentially, that function is
the SRMS scheduler. Here is the function's prototype and an explanation of its parameters:
VOID schedule_next_job(HWND hWnd, UINT uMsg, UINT timerEvent, DWORD
curTime)
The first parameter, hWnd, is the handle to the SRMS interface window. Any
Win32 application with a window has a window handle which was returned by Win32's CreateWindow()
function when the window was created. Any specific messages that need to be passed to the
scheduler get passed via the uMsg parameter. The value of uMsg will
almost always ignored except when the SRMS system initially starts because the start_system()
function uses the SRMS_SCHEDULE_FIRST_JOB flag as its value. The timerEvent
parameter always has the identifier of the timer that fired, either PERIODIC_TIMER_ID
or POLICING_TIMER_ID, which are both explained later with the SetNextSchedulerEvent() function.
That value could also be zero, which means that a job finished early before a timer could
fire. Finally, the value of curTime is the current system time in
milliseconds which is returned by Win32's timeGetTime() function.
At all times, the scheduler keeps track of which task is currently scheduled to run.
The task pointer curTask will either point to the currently scheduled task's
entry in the task_list or be equal to NULL if there is no task
currently scheduled. Initially, curTask is NULL when the
system starts, but its value is later changed by the scheduler.
Upon entering the schedule_next_job() function, the scheduler checks if curTask
is NULL or not. If the value is NULL, then it skips to find the
next task with work to do. Otherwise, the scheduler needs to suspend the currently
scheduled task and record the execution history of its job. Assuming that there is a
currently scheduled job when the scheduler wakes, the scheduler immediately calls Win32's SuspendThread()
function to move the task thread from the NT kernel's ready state to a wait state. This
move will ensure that the task thread uses no more resources than it is allowed.
Next, the scheduler needs to record the job's execution time and perform some record
keeping. Job execution times are calculated with the help of Win32's QueryPerformanceCounter()
function. This function returns the value from NT's high-resolution performance counter,
if one exists in the hardware. Since all Pentium-class machines support a
high-resolution counter, the SRMS service assumes that one exists and does not check
otherwise. So, immediately after suspending the task thread, schedule_next_job()
queries the high-resolution counter and records the job's end time in the liEndJobTickCount
filed of the task_struct pointed by curTask. By
subtracting off the job's start time that was recorded just before it was scheduled,
the function determines the total amount of time that the job held the processor before it
was interrupted by the scheduler's timer. This value is then stored in dJobRunTime.
[In earlier test versions of the SRMS service, the schedule_next_job()
function recorded a job's execution time by using Win32's GetThreadTimes()
function, which determines the amount of time that a thread spends in NT kernel and user
space. Although this function sounds ideal in this situation, it was found that NT
uses its lower-resolution counter to record those times, and therefore made the timing
inaccurate. After implementing both methods and comparing their results, it was
determined that QueryPerformanceCounter() is more accurate in determining
execution times. Refer to the section on Timing and
Event Scheduling to see more information about Window's timing inaccuracies.]
Now that the SRMS scheduler knows for how long the job executed, it needs to subtract that value from the job's remaining execution time to determine for how much longer the job needs to run. By executing this line:
curTask->exec_left -= (LONG)ceil(dJobRunTime)
the scheduler rounds the job run time up to the nearest millisecond and subtracts from
the task's execution time remaining. [Since NT does not support smaller than
millisecond event scheduling, the fractional portion of dJobRunTime is
handled by rounding it up to the nearest millisecond.] Next it records the job's end
time, which is the difference between the current millisecond count returned by timeGetTime()
and the system start time, which is the time when the SRMS scheduler first started.
Also, it permanently stores the job's run time (dJobRunTime) in the dJobElapsed
field of the task's execution history array.
Once the record keeping portion of the scheduler finishes, it sets curTask
to NULL, indicating that there is currently no task being scheduled. It then
traverses the task set to see if any jobs have ended, and if so, it calls the record_job_results()
function to record more general statistics. record_job_results() checks to
see if the job completed during its period and, depending on that, it records met and
missed deadline counts, used and wasted time, and delivered QoS. It also calls admit_next_job()to
try and admit the next job for a task.
Next, the scheduler needs to find another job to schedule. To do this, it traverses the
task_list calling need_execution() on each task to determine if
a task still needs more CPU time to complete its job. If a task's ready time has passed,
and its remaining execution time is greater than zero, and the job has not yet signaled
the scheduler that it is finished, then need_execution() returns a positive
value. The scheduler traverses the task_list until it finds a high priority
task that needs execution. If none can be found and the doDualPriority
boolean is set to true, then the scheduler searches the list for any task (high or low
priority) with work to do. If still no job can be found, then the scheduler calls SetNextSchedulerEvent()
to set a timer for the next scheduler interruption and then returns. Otherwise, the
scheduler assigns curTask to point to the task that has work to finish. The
scheduler then records the job's start time with QueryPerformanceCounter()
and also records the job's system start time, remaining execution time, and budget in the execHistory
array. When done, SetNextSchedulerEvent() is called to set a timer for the
next time the scheduler is supposed to interrupt the processor. The scheduler then moves
the selected task thread from its wait state to the ready state using Win32's ResumeThread()
function. Finally, the task's hSchedEvent
handle is signaled to wake the task up and let it resume execution. When finished, the
scheduler function returns to and waits for the timer to fire.
The SRMS scheduler calls the SetNextSchedulerEvent() function to set an
event timer to fire at the next time that the scheduler should interrupt the system.
It takes as a parameter a pointer to the next job to be scheduled. It
compares that task's expected execution time with each task's deadline and sets the timer
to fire at the earliest of all those times. If the timer is set to fire at the end
of the next job, then the timer becomes a policing timer. On the other hand, if the
timer fires on a periodic boundary, it is termed a periodic timer. Policing timers
are employed to ensure that tasks do not attempt to acquire more processor time than they
are given. Tasks may not accurately compute their execution times, so a policing
mechanism is necessary. Periodic timers are timers that must fire at the end of each
task's period. To keep track of which timer is currently active, the timerType
value defined in rt.c is set to either PERIODIC_TIMER_ID or POLICING_TIMER_ID.
When the SRMS scheduler gets woken up out of its wait state as a result of the
timer firing, it normally passes the value of timerType to schedule_next_job()
so that the scheduling function knows which timer just fired.