Project 2 - Scheduling Algorithm Performance


The scheduling of processes is an important part of making our computer efficient, and hence fast. We are going to write a scheduling simulator to measure what differences scheduling can make.

The Simulator

You will write a scheduling simulator that will implement three different scheduling algorithms: round robin, shortest job first, and shortest job remaining. You may use whatever programming language you like. If you need additional programs or libraries installed on mclovin, please email me with the name of the appropriate apt-get package. You must provide a working version on mclovin, though you do not have to do your work there.

Running the program

Your program should accept the following arguments:

<program name> <scheduling algorithm> [optional algorithm parameter] [verbose] <process time file n>*

where:

scheduling algorithm is one of the following:

  • RR: round robin
  • SJF: shortest job first
  • SJR: shortest job remaining

parameter is a parameter, present only for round robin, that specifies the time quantum a process should run.

verbose is an optional parameter that produces additional trace output as described below.

process time file is the name of a file containing process run information. Note that there may be an unlimited number of these files provided on the command line. Each file will be named in the format process-N.txt, where N is an integer between 1 and 65535. Each file represents the operation of a single process over time. Each file has the format:

start 0
B 120
I 4200
B 100
I 3700
B 110
end

where B is the burst time in microseconds, and I is the time the process is blocked and unable to run, also in microseconds. Some processes may have a start time that is later than zero; they should not be included in your scheduling decisions until after their arrival.
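
As a rough illustration of the input handling described above, here is a minimal Python sketch. It is only one way to do it, and it makes assumptions the assignment does not state: that the verbose option is the literal word "verbose", that the round robin quantum is an integer, and that the names ProcessSpec, parse_process_text, pid_from_filename and parse_args are made up for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProcessSpec:
    pid: int                     # N taken from the file name process-N.txt
    start: int = 0               # arrival time in microseconds
    phases: list = field(default_factory=list)  # e.g. [("B", 120), ("I", 4200)]


def parse_process_text(pid, text):
    """Parse the contents of one process file into a ProcessSpec."""
    spec = ProcessSpec(pid=pid)
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue                      # skip blank lines
        if parts[0] == "start":
            spec.start = int(parts[1])
        elif parts[0] in ("B", "I"):
            spec.phases.append((parts[0], int(parts[1])))
        elif parts[0] == "end":
            break
    return spec


def pid_from_filename(name):
    """Extract N from a name (or path) ending in process-N.txt."""
    match = re.search(r"process-(\d+)\.txt$", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1))


def parse_args(argv):
    """Split argv (without the program name) into its four parts."""
    algorithm, rest = argv[0], list(argv[1:])
    quantum = None
    if algorithm == "RR":
        quantum = int(rest.pop(0))        # assumed: integer quantum
    verbose = bool(rest) and rest[0] == "verbose"  # assumed: literal keyword
    if verbose:
        rest.pop(0)
    return algorithm, quantum, verbose, rest
```

Remember that the parsed phases are what the simulation knows. Your scheduler must not peek at them to make decisions.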

Limitations

When doing SJF or SJR, you need to consider the burst time of the process. The natural thing you will want to do is look at the burst time in the file. You cannot do this. In the real world, we do not get to ask the process what its burst time is - we can only predict what is coming up based on what happened in the past. You must therefore limit your burst time prediction to what you have determined based on past performance. For ties, schedule the lower process number first.
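
One common way to predict a burst from past behavior is exponential averaging, where the new estimate is a weighted mix of the last observed burst and the previous estimate. The assignment does not require this method. The Python sketch below is just one possibility, and the class name BurstEstimator, the weight alpha and the initial guess of 100 microseconds are all assumptions made for illustration.

```python
class BurstEstimator:
    """Predict each process's next burst from bursts that already completed."""

    def __init__(self, alpha=0.5, initial_guess=100.0):
        self.alpha = alpha                  # weight on the most recent burst (assumed)
        self.initial_guess = initial_guess  # used before any burst is observed (assumed)
        self.estimates = {}                 # pid -> current estimate

    def estimate(self, pid):
        """Return the current prediction for pid's next burst."""
        return self.estimates.get(pid, self.initial_guess)

    def observe(self, pid, completed_burst):
        """Fold a burst that has fully completed into pid's estimate."""
        old = self.estimate(pid)
        self.estimates[pid] = self.alpha * completed_burst + (1 - self.alpha) * old
```

Note that observe is only ever called with a burst that has already finished in simulation time, which keeps the estimator within the limitation above.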

Output

When the verbose option is not enabled, your program need only print which process runs in which time interval, one line per interval, in the format:

<process number> <start time> <end time>

You may assume that scheduling and switching processes takes zero time.

For example, assume we had two processes, 1 and 2, with the same pattern of behavior shown above. For SJF, your output would look like:

1 0 120
2 120 240
Idle 240 4320
1 4320 4420
Idle 4420 4440
2 4440 4540
Idle 4540 8120
1 8120 8230
Idle 8230 8240
2 8240 8350
end
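
The Idle lines in the example can be derived from the gaps between consecutive run intervals. Here is a minimal Python sketch of that step, assuming you have already collected the runs as (process number, start, end) tuples in time order and that the clock starts at zero. The function name format_schedule is made up for this example.

```python
def format_schedule(runs):
    """Turn sorted, non-overlapping (pid, start, end) runs into output lines."""
    lines = []
    clock = 0                         # assumed: simulation time starts at 0
    for pid, start, end in runs:
        if start > clock:             # nothing was runnable during this gap
            lines.append(f"Idle {clock} {start}")
        lines.append(f"{pid} {start} {end}")
        clock = end
    lines.append("end")
    return lines


if __name__ == "__main__":
    runs = [(1, 0, 120), (2, 120, 240), (1, 4320, 4420),
            (2, 4440, 4540), (1, 8120, 8230), (2, 8240, 8350)]
    print("\n".join(format_schedule(runs)))
```

Run on the six intervals shown, this reproduces the example output above, including the Idle lines and the final end.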

If the verbose option is specified, you must provide additional information. Each time you make a scheduling decision you should print the current time and, for SJF or SJR, what your current modelled burst time is for each process. For RR, along with the current time you should show the state of your queue, that is what order you expect to schedule processes in the near future.
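
The assignment does not fix an exact layout for the verbose trace, so the Python sketch below only shows one possible shape for it. The line formats and the function names verbose_sjf_line and verbose_rr_line are assumptions for illustration, not a required format.

```python
def verbose_sjf_line(now, estimates):
    """Format the current time plus each process's modelled burst time.

    estimates maps process number -> current estimate (SJF or SJR).
    """
    parts = [f"{pid}:{est:g}" for pid, est in sorted(estimates.items())]
    return f"t={now} estimates " + " ".join(parts)


def verbose_rr_line(now, queue):
    """Format the current time plus the round robin queue, front first."""
    return f"t={now} queue " + " ".join(str(pid) for pid in queue)
```

Whatever layout you choose, print one such line each time a scheduling decision is made.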

What to turn in:

You will e-mail me two things prior to class:
  • The path to a directory on mclovin containing a compiled and executable (or interpretable) version that I can test. Your code must be on mclovin and work there! This directory must also contain a text file describing how to compile and execute your program.
  • The code of your simulator. Do not include any executable files.

Frequent questions on the project

There is some confusion about the project, so let me try and address some common questions I have received.

What information you can use: you are writing a simulation. In your simulation you need to keep track of the entire state of the universe, including what time it is, what the scheduler knows, and what state the processes are in. It is OK to know and keep track of all that in your program. The simulation doesn't have to run tick by tick - you can jump forward in time to the next event that happens.
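
Jumping forward to the next event is usually done with a priority queue of pending events ordered by time. Here is a minimal Python sketch of that idea. The event tuple layout (time, process number, kind), the kind strings, and the names run_events and handle are all assumptions made for this example.

```python
import heapq


def run_events(initial_events, handle):
    """Process events in time order, jumping the clock from one to the next.

    initial_events: iterable of (time, pid, kind) tuples.
    handle: callable (now, pid, kind) -> list of new events to schedule.
    Returns the time of the last event processed (0 if there were none).
    """
    heap = list(initial_events)
    heapq.heapify(heap)
    clock = 0
    while heap:
        # Pop the earliest event; the clock jumps straight to it,
        # with no loop over the microseconds in between.
        clock, pid, kind = heapq.heappop(heap)
        for new_event in handle(clock, pid, kind):
            heapq.heappush(heap, new_event)
    return clock
```

For example, a handler might react to a burst finishing at time t by scheduling an event at t + 4200 for when that process becomes runnable again.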

What you cannot do, however, is allow your scheduler to know things that are in the future. Your scheduler can only know and make decisions based on what has happened in the past. Therefore, it can use any burst times or idle times that it observed to have completed in the past as part of its calculation. It cannot use any ongoing or future burst times or idle times to make a decision.

Notice the separation between what the simulation knows (everything) and what the scheduler knows (only what has happened in the past in simulation time). Don't confuse the two.

The difference between SJF and SJR: For both SJF and SJR you are computing an estimated burst time based on past performance. Your estimate will likely be wrong, but you will have to live with that and do the best you can with the information you have available to you, just like life itself.

When the scheduler is running SJF, once a job is scheduled, based on the very imperfect information it has, that job runs until completion of its current burst time. It cannot be pre-empted. Note that running to completion means running to completion of a particular burst time, not completion of the process. If the burst time turns out to be much longer than your scheduler estimated, then too bad; the process gets its full burst time before the scheduler runs again and you can update your estimate.
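
The SJF choice itself can be very small. Here is a minimal Python sketch, assuming you have a collection of ready process numbers and some function that returns your current estimate for each one. The name pick_sjf is made up for this example.

```python
def pick_sjf(ready, estimate):
    """Pick the ready process with the shortest estimated burst.

    ready: iterable of process numbers that can run right now.
    estimate: callable, process number -> estimated next burst.
    Ties go to the lower process number. Returns None if nothing is ready.
    """
    # Sorting key is (estimate, pid), so equal estimates fall back to pid.
    return min(ready, key=lambda pid: (estimate(pid), pid), default=None)
```

Once a process is picked this way, the scheduler does not get another say until that process's current burst ends.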

For SJR, should another process arrive that has a shorter estimated burst time than the currently running process has left, then the current process should be suspended. The new process should be run - either for its full burst time or until another process arrives with a shorter expected burst time. Note that for SJR you will have to keep track of the estimated burst time and how long a particular process has been running so you can estimate how much of the current burst is left. You then need to schedule based on the current estimated remaining burst time for processes which have been pre-empted, or the full burst time for those which haven't run yet. Again, the scheduler's estimates of burst time will be imperfect, but we are all imperfect so we have to live with it.
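
The SJR bookkeeping described above comes down to comparing the newcomer's estimate against what you think is left of the running burst. Here is a minimal Python sketch of that comparison. The function names are made up, and clamping the remaining estimate at zero when a process has already run past its estimate is an assumption; you may prefer a different rule.

```python
def estimated_remaining(estimated_burst, ran_so_far):
    """Estimate how much of the current burst is left.

    Assumed policy: if the process has already run longer than its
    estimate, treat the remainder as zero rather than going negative.
    """
    return max(estimated_burst - ran_so_far, 0)


def should_preempt(running_estimate, running_ran, newcomer_estimate):
    """Return True if the newcomer's estimate is strictly shorter than
    the running process's estimated remaining burst."""
    return newcomer_estimate < estimated_remaining(running_estimate, running_ran)
```

With the zero clamp, a process that has overrun its estimate is never pre-empted, since no newcomer can have an estimate below zero.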

When a new process arrives to be scheduled for the first time, you have no idea what to expect its burst time to be. You'll have to make some decision about what to do. All such decisions are wrong in some way; take your best shot at it.