SCHEDULING JOBS

OVERVIEW OF THE SCHEDULER:

The SRMS scheduler is charged with the responsibility of scheduling real-time jobs according to the SRMS algorithm. Structurally, the scheduler runs as a separate thread in the process space of the SRMS service.  To insure that it preempts all task threads, it runs at the second-to-highest priority level in the NT system (See the graphic below).   Only the event timer thread is higher, since it produces the interrupts which wake the scheduler thread.  The SRMS scheduler begins when the user selects "Start System" from the File menu on the service's GUI and terminates when the user selects "Stop System" or all tasks unregister.   The way it works is conceptually simple:  the SRMS service maintains a list called task_list which is a doubly-linked list of task_struct entries representing the active task set. Each time the SRMS scheduler thread is run, it traverses this list in search for the next highest priority job with work to do according to the algorithm. Once it finds that job, it moves that task thread to a ready state, sets a timer to fire at the next time that the scheduler should interrupt the system, and then waits on that timer.  The selected real-time task runs until the timer fires, and at that point, the SRMS scheduler resumes and the procedure repeats itself.

EXPLANATION OF schedule_next_job():

Each time that the event timer fires, the timer's callback routine sends a WM_TimerFired message to the SRMS scheduler's thread.  The arrival of this message wakes the scheduler thread from its wait state and immediately calls schedule_next_job(), which is the function that determines which job gets to run. Essentially, that function is the SRMS scheduler. Here is the function's prototype and an explanation of its parameters:

VOID schedule_next_job(HWND hWnd, UINT uMsg, UINT timerEvent, DWORD curTime)

The first parameter, hWnd, is the handle to the SRMS interface window. Any Win32 application with a window has a window handle which was returned by Win32's CreateWindow() function when the window was created. Any specific messages that need to be passed to the scheduler get passed via the uMsg parameter.  The value of uMsg will almost always ignored except when the SRMS system initially starts because the start_system() function uses the SRMS_SCHEDULE_FIRST_JOB flag as its value.  The timerEvent parameter always has the identifier of the timer that fired, either PERIODIC_TIMER_ID or POLICING_TIMER_ID, which are both explained later with the SetNextSchedulerEvent() function. That value could also be zero, which means that a job finished early before a timer could fire.  Finally, the value of curTime is the current system time in milliseconds which is returned by Win32's timeGetTime() function.

SCHEDULER'S RECORD KEEPING RESPONSIBILITIES:

At all times, the scheduler keeps track of which task is currently scheduled to run. The task pointer curTask will either point to the currently scheduled task's entry in the task_list or be equal to NULL if there is no task currently scheduled.  Initially, curTask is NULL when the system starts, but its value is later changed by the scheduler.

Upon entering the schedule_next_job() function, the scheduler checks if curTask is NULL or not. If the value is NULL, then it skips to find the next task with work to do. Otherwise, the scheduler needs to suspend the currently scheduled task and record the execution history of its job. Assuming that there is a currently scheduled job when the scheduler wakes, the scheduler immediately calls Win32's SuspendThread() function to move the task thread from the NT kernel's ready state to a wait state. This move will ensure that the task thread uses no more resources than it is allowed.

Next, the scheduler needs to record the job's execution time and perform some record keeping. Job execution times are calculated with the help of Win32's QueryPerformanceCounter() function. This function returns the value from NT's high-resolution performance counter, if one exists in the hardware.  Since all Pentium-class machines support a high-resolution counter, the SRMS service assumes that one exists and does not check otherwise.  So, immediately after suspending the task thread, schedule_next_job() queries the high-resolution counter and records the job's end time in the liEndJobTickCount filed of the task_struct pointed by curTask.   By subtracting off  the job's start time that was recorded just before it was scheduled, the function determines the total amount of time that the job held the processor before it was interrupted by the scheduler's timer.  This value is then stored in dJobRunTime

[In earlier test versions of the SRMS service, the schedule_next_job() function recorded a job's execution time by using Win32's GetThreadTimes() function, which determines the amount of time that a thread spends in NT kernel and user space.   Although this function sounds ideal in this situation, it was found that NT uses its lower-resolution counter to record those times, and therefore made the timing inaccurate.  After implementing both methods and comparing their results, it was determined that QueryPerformanceCounter() is more accurate in determining execution times.  Refer to the section on Timing and Event Scheduling to see more information about Window's timing inaccuracies.]

Now that the SRMS scheduler knows for how long the job executed, it needs to subtract that value from the job's remaining execution time to determine for how much longer the job needs to run.  By executing this line:

curTask->exec_left -= (LONG)ceil(dJobRunTime) 

the scheduler rounds the job run time up to the nearest millisecond and subtracts from the task's execution time remaining.  [Since NT does not support smaller than millisecond event scheduling, the fractional portion of dJobRunTime is handled by rounding it up to the nearest millisecond.]  Next it records the job's end time, which is the difference between the current millisecond count returned by timeGetTime() and the system start time, which is the time when the SRMS scheduler first started.   Also, it permanently stores the job's run time (dJobRunTime) in the dJobElapsed field of the task's execution history array.

Once the record keeping portion of the scheduler finishes, it sets curTask to NULL, indicating that there is currently no task being scheduled. It then traverses the task set to see if any jobs have ended, and if so, it calls the record_job_results() function to record more general statistics. record_job_results() checks to see if the job completed during its period and, depending on that, it records met and missed deadline counts, used and wasted time, and delivered QoS. It also calls admit_next_job()to try and admit the next job for a task.

DETERMINING THE NEXT JOB TO SCHEDULE:

Next, the scheduler needs to find another job to schedule. To do this, it traverses the task_list calling need_execution() on each task to determine if a task still needs more CPU time to complete its job. If a task's ready time has passed, and its remaining execution time is greater than zero, and the job has not yet signaled the scheduler that it is finished, then need_execution() returns a positive value. The scheduler traverses the task_list until it finds a high priority task that needs execution. If none can be found and the doDualPriority boolean is set to true, then the scheduler searches the list for any task (high or low priority) with work to do. If still no job can be found, then the scheduler calls SetNextSchedulerEvent() to set a timer for the next scheduler interruption and then returns. Otherwise, the scheduler assigns curTask to point to the task that has work to finish. The scheduler then records the job's start time with QueryPerformanceCounter() and also records the job's system start time, remaining execution time, and budget in the execHistory array. When done, SetNextSchedulerEvent() is called to set a timer for the next time the scheduler is supposed to interrupt the processor. The scheduler then moves the selected task thread from its wait state to the ready state using Win32's ResumeThread() function. Finally, the task's hSchedEvent handle is signaled to wake the task up and let it resume execution. When finished, the scheduler function returns to and waits for the timer to fire.

SETTING THE NEXT SCHEDULER INTERRUPT EVENT:

The SRMS scheduler calls the SetNextSchedulerEvent() function to set an event timer to fire at the next time that the scheduler should interrupt the system.   It takes as a parameter a pointer to the next job to be scheduled.  It compares that task's expected execution time with each task's deadline and sets the timer to fire at the earliest of all those times.  If the timer is set to fire at the end of the next job, then the timer becomes a policing timer.  On the other hand, if the timer fires on a periodic boundary, it is termed a periodic timer.  Policing timers are employed to ensure that tasks do not attempt to acquire more processor time than they are given.  Tasks may not accurately compute their execution times, so a policing mechanism is necessary.  Periodic timers are timers that must fire at the end of each task's period.  To keep track of which timer is currently active, the timerType value defined in rt.c is set to either PERIODIC_TIMER_ID or POLICING_TIMER_ID.   When the SRMS scheduler gets woken up out of its wait state as a result of the timer firing, it normally passes the value of timerType to schedule_next_job() so that the scheduling function knows which timer just fired.

< Back to the SRMS Service Home Page >