Scheduling Algorithm Performance

Project 2 - Scheduling Algorithm Performance

Process scheduling is an important part of making a computer efficient, and hence fast. We are going to write a scheduling simulator to measure how much difference the choice of scheduling algorithm can make.

The Simulator

You will write a scheduling simulator that will implement three different scheduling algorithms: round robin, shortest job first, and shortest job remaining. You may use whatever programming language you like. If you need additional programs or libraries installed on aji, please email me with the name of the appropriate apt-get package. You must provide a working version on aji, though you do not have to do your work there.

Running the program

Your program should accept the following arguments:

<program name> <scheduling algorithm> [optional algorithm parameter] [verbose] <process time file n>*

where:

scheduling algorithm is one of the following:

RR: round robin
SJF: shortest job first
SJR: shortest job remaining

parameter is an optional parameter, present only for round robin, that specifies the time quantum a process may run before it is pre-empted.

verbose is an optional parameter that produces additional trace output as described below.

process time file is the name of a file containing process run information. Note that up to 100 of these files may be provided on the command line. Each file will be named in the format process-N.txt, where N is a unique integer between 1 and 65535. Each file represents the operation of a single process over time. Each file has the format:

start 0 B 120 I 4200 B 100 I 3700 B 110 end

where the number after start is the process's start (arrival) time, B is a CPU burst time in microseconds, and I is the time the process is blocked and unable to run, also in microseconds. Some processes may have a start time later than zero; they should not be included in your scheduling decisions until they arrive.
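As an illustration, reading a process time file might look like the Python sketch below (any language is allowed). It assumes tokens are separated by whitespace and that the process number is taken from the process-N.txt file name; the names ProcessSpec, parse_process and pid_from_filename are hypothetical. The parsed phases are what the simulation knows; the scheduler itself must not look at them.

```python
import os
import re
from dataclasses import dataclass, field


@dataclass
class ProcessSpec:
    """Everything the *simulation* knows about one process (hypothetical name)."""
    pid: int
    arrival: int  # the number after "start"
    # Ordered phases: ("B", burst length) or ("I", blocked time), in microseconds.
    phases: list[tuple[str, int]] = field(default_factory=list)


def pid_from_filename(path: str) -> int:
    """Extract N from a path ending in process-N.txt."""
    m = re.fullmatch(r"process-(\d+)\.txt", os.path.basename(path))
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(m.group(1))


def parse_process(pid: int, text: str) -> ProcessSpec:
    """Parse 'start 0 B 120 I 4200 ... end' into a ProcessSpec."""
    tokens = text.split()
    if len(tokens) < 3 or tokens[0] != "start" or tokens[-1] != "end":
        raise ValueError(f"process {pid}: expected 'start <time> ... end'")
    spec = ProcessSpec(pid=pid, arrival=int(tokens[1]))
    body = tokens[2:-1]
    if len(body) % 2 != 0:
        raise ValueError(f"process {pid}: unpaired token in {body}")
    # Walk the body two tokens at a time: a phase letter followed by its length.
    for kind, value in zip(body[0::2], body[1::2]):
        if kind not in ("B", "I"):
            raise ValueError(f"process {pid}: unknown phase {kind!r}")
        spec.phases.append((kind, int(value)))
    return spec
```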

Limitations

When doing SJF or SJR, you need to consider the burst time of the process. The natural thing you will want to do is look at the burst time in the file. You cannot do this. In the real world, we do not get to ask the process what its burst time is - we can only predict what is coming up based on what happened in the past. You must therefore limit your burst time prediction to what you have determined based on past performance. For ties, schedule the lower process number first.
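One common way to predict from past performance is exponential averaging: after each completed burst t_n, the estimate becomes tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n. The sketch below is only one option, since the assignment does not prescribe a formula. The smoothing factor alpha = 0.5, the initial guess of 100 microseconds for a process with no completed burst, and the class name BurstPredictor are all assumptions.

```python
class BurstPredictor:
    """Exponential average over *completed* bursts only (one common technique;
    not a formula required by the assignment):
        tau_next = alpha * last_observed_burst + (1 - alpha) * tau
    """

    def __init__(self, alpha: float = 0.5, initial_guess: float = 100.0):
        # Both defaults are arbitrary assumptions for this sketch.
        self.alpha = alpha
        self.initial_guess = initial_guess
        self.estimates: dict[int, float] = {}

    def estimate(self, pid: int) -> float:
        # A process that has not completed any burst yet gets the initial guess.
        return self.estimates.get(pid, self.initial_guess)

    def observe(self, pid: int, burst: int) -> None:
        # Call only after the burst has fully completed in simulation time,
        # so the scheduler never sees an ongoing or future burst.
        previous = self.estimate(pid)
        self.estimates[pid] = self.alpha * burst + (1 - self.alpha) * previous
```

With these defaults, observing bursts of 120 and then 100 moves a process's estimate from 100 to 110 and then to 105.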

Output

When the verbose option is not enabled, your program only needs to print out which process runs in which time interval, in the format:

<process number> <start time> <end time>

You may assume that scheduling and switching processes takes zero time.

For example, assume we had two processes, 1 and 2, with the same pattern of behavior shown above. For SJF, your output would look like:

1 0 120
2 120 240
Idle 240 4320
1 4320 4420
Idle 4420 4440
2 4440 4540
Idle 4540 8120
1 8120 8230
Idle 8230 8240
2 8240 8350
end
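If the simulation records each run as a (process, start, end) interval, a small helper can produce this format and fill the gaps with Idle lines. This is a sketch assuming a single CPU (intervals never overlap) and that an Idle line should also be printed if nothing has arrived at time 0; format_schedule is a hypothetical name.

```python
def format_schedule(runs: list[tuple[int, int, int]]) -> list[str]:
    """Turn (pid, start, end) run intervals into output lines, inserting
    'Idle <start> <end>' wherever the CPU had nothing to run."""
    lines = []
    clock = 0  # end of the last interval printed so far
    for pid, start, end in sorted(runs, key=lambda run: run[1]):
        if start > clock:
            lines.append(f"Idle {clock} {start}")
        lines.append(f"{pid} {start} {end}")
        clock = end
    lines.append("end")
    return lines
```

Feeding it the six run intervals from the two-process example reproduces that output line for line.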

Update: print out the total wait time. As I mentioned in class, your program should print out the total time each process waited when it completes. I will not be strict about the format since this is a late update. Processes do not have to be sorted by number. Here is an example of how your output might look:

End for process 2 waited 120
End for process 1 waited 0
End for process 3 waited 120
End for process 4 waited 240
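If "waited" is taken to mean time spent ready to run but not running (a natural reading, though the assignment does not define it), the total can be computed when a process completes, with no extra bookkeeping. Every microsecond between arrival and completion is spent running, blocked, or waiting, and the running and blocked totals are fixed by the process file. A sketch, with total_wait as a hypothetical name:

```python
def total_wait(arrival: int, completion: int, phases: list[tuple[str, int]]) -> int:
    """Ready-queue wait = turnaround - time running - time blocked.
    phases is the ordered list of ("B", us) / ("I", us) pairs from the file."""
    running = sum(length for kind, length in phases if kind == "B")
    blocked = sum(length for kind, length in phases if kind == "I")
    return (completion - arrival) - running - blocked
```

In the two-process SJF example, each process has 330 microseconds of bursts and 7900 of blocking; process 1 completes at 8230 and process 2 at 8350, giving waits of 0 and 120.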

If the verbose option is specified, you must provide additional information. Each time you make a scheduling decision you should print the current time and, for SJF or SJR, what your current modelled burst time is for each process.

What to turn in:

On aji, create a directory called scheduler in your home directory. This directory will include:

A file named RUN.txt containing the single line needed to run your program, excluding the arguments. For example, if your program ran from the command line and was named clay-schedule, then your file might contain only the single line ./clay-schedule. If you used Java, for example, and needed to invoke the Java interpreter, the file might instead contain something like java -jar clay-schedule.jar. This will vary by language. It is important that this be correct - I will use a program that extracts this line and then automatically uses it to run and test your program with different parameters.
A file named BUILD.txt that briefly describes how to compile or build your code.
A file named CODE.txt that lists the names of the files used in your simulation, with a brief description of what each file is for or what the code inside does.
The files containing the source as described above.

You will then create a tar file in your home directory named <user_name>-scheduler.tar, where <user_name> is your aji.cs.georgetown.edu login, by typing

tar -cf <user_name>-scheduler.tar scheduler

Gzip the tar file by typing

gzip <user_name>-scheduler.tar

The file's timestamp will be used to verify whether the work was completed on time, so you should set the permissions on the file you submit to make sure that its timestamp is not changed. Do this by typing:

chmod a-w <user_name>-scheduler.tar.gz

Upload the gzipped tar file to Canvas for the assignment. This provides a timestamp and source code backup.

The project is due April 6th, before class.

Frequent questions on the project

Let me try to address some common questions I have received about this project.

What information you can use: you are writing a simulation. In your simulation you need to keep track of the entire state of the universe, including what time it is, what the scheduler knows, and what state the processes are in. It is OK to know and keep track of all of that in your program. The simulation doesn't have to run tick by tick - you can jump forward in time to the next event that happens.
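Jumping from event to event is usually done with a priority queue ordered by event time. The sketch below uses Python's heapq module; the event kinds ("arrive", "burst_done", "unblock") and the names EventQueue and drain are hypothetical. A sequence counter keeps events that share a timestamp in insertion order; how simultaneous events should really be ordered is a design decision left to you.

```python
import heapq
import itertools


class EventQueue:
    """Min-heap of (time, seq, kind, pid). The seq counter breaks time ties in
    insertion order, so tuples never fall through to comparing kind or pid."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, int]] = []
        self._seq = itertools.count()

    def push(self, time: int, kind: str, pid: int) -> None:
        heapq.heappush(self._heap, (time, next(self._seq), kind, pid))

    def pop(self) -> tuple[int, str, int]:
        time, _, kind, pid = heapq.heappop(self._heap)
        return time, kind, pid

    def __bool__(self) -> bool:
        return bool(self._heap)


def drain(queue: EventQueue) -> list[tuple[int, str, int]]:
    """Process events in time order, jumping the clock straight to each one."""
    clock = 0
    handled = []
    while queue:
        time, kind, pid = queue.pop()
        assert time >= clock, "events must never go back in time"
        clock = time  # no tick-by-tick loop: skip directly to the next event
        handled.append((clock, kind, pid))
        # A real simulator would update process state here and push follow-up
        # events, e.g. a "burst_done" at clock + the actual burst length.
    return handled
```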

What you cannot do, however, is allow your scheduler to know things that are in the future. Your scheduler can only know and make decisions based on what has happened in the past. Therefore, it can use any burst times or idle times that it observed to have completed in the past as part of its calculation. It cannot use any ongoing or future burst times or idle times to make a decision.

Notice the separation between what the simulation knows (everything) and what the scheduler knows (only what has happened in the past, in simulation time). Don't confuse the two.

The difference between SJF and SJR: For both SJF and SJR you are computing an estimated burst time based on past performance. Your estimate will likely be wrong, but you will have to live with that and do the best you can with the information you have available to you, just like life itself.

When the scheduler is running SJF, once a job is scheduled (based on the very imperfect information the scheduler has), that job runs until its current burst completes. It cannot be pre-empted. Note that running to completion means running to the end of a particular burst, not to the completion of the process. If the burst turns out to be much longer than your scheduler estimated, then too bad; the process gets its full burst time before the scheduler runs again, and only then can you update your estimate.
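For SJF, the scheduling decision reduces to picking the minimum estimate, with the lower process number breaking ties as required in the Limitations section. In this sketch the chosen process is then charged its actual burst, which only the simulation knows, so the scheduler's choice and the simulated outcome stay separate. The function names, and passing estimate and actual_burst in as callables, are assumptions of the sketch.

```python
from collections.abc import Callable


def pick_sjf(ready: list[int], estimate: Callable[[int], float]) -> int:
    """Scheduler side: choose the ready process with the smallest *estimated*
    burst; ties go to the lower process number."""
    return min(ready, key=lambda pid: (estimate(pid), pid))


def dispatch_sjf(now: int, ready: list[int],
                 estimate: Callable[[int], float],
                 actual_burst: Callable[[int], int]) -> tuple[int, int, int]:
    """Simulation side: remove the chosen process from the ready list and run
    its *actual* next burst to completion (SJF never pre-empts).
    Returns (pid, start, end) for the output."""
    pid = pick_sjf(ready, estimate)           # decision uses estimates only
    ready.remove(pid)
    return pid, now, now + actual_burst(pid)  # outcome uses the real burst
```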

For SJR, should another process arrive that has a shorter estimated burst time than the currently running process has left, then the current process should be suspended. The new process should be run - either for its full burst time or until another process arrives with a shorter expected burst time. Note that for SJR you will have to keep track of the estimated burst time and how long a particular process has been running, so you can estimate how much of the current burst is left. You then need to schedule based on the current estimated remaining burst time for processes that have been pre-empted, or the full estimated burst time for those that haven't run yet. Again, the scheduler's estimates of burst time will be imperfect, but we are all imperfect, so we have to live with it.
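The SJR bookkeeping described above can be sketched as a ranking key: a process is ranked by its estimated remaining time (its estimate minus the time it has already run in the current burst), so one that has not started its burst is ranked by its full estimate, and ties go to the lower process number. Clamping the remaining estimate at zero once a burst outruns its estimate is one arbitrary choice among several; with positive estimates it means an overrunning process is effectively never pre-empted. The function names are hypothetical.

```python
def remaining_estimate(estimate: float, ran_so_far: int) -> float:
    """Estimated time left in the current burst, clamped at 0 once the burst
    has outrun its estimate (an arbitrary choice for this sketch)."""
    return max(estimate - ran_so_far, 0.0)


def sjr_key(pid: int, estimate: float, ran_so_far: int = 0) -> tuple[float, int]:
    """Smaller estimated remaining time first, then lower pid. A process that
    has not started its burst has ran_so_far == 0, i.e. its full estimate."""
    return (remaining_estimate(estimate, ran_so_far), pid)


def should_preempt(running: tuple[float, int], newcomer: tuple[float, int]) -> bool:
    """Pre-empt only if the newcomer's key is strictly smaller, so an exact
    tie on remaining time is decided by the lower process number."""
    return newcomer < running
```

For instance, a process with an estimate of 110 that has run for 50 has key (60.0, pid); a newcomer estimated at 40 pre-empts it, while one estimated at 60 does so only if its process number is lower.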

When a new process arrives to be scheduled for the first time, you have no idea what to expect its burst time to be. You'll have to make some decision about what to do. Every such decision is wrong in some way; take your best shot at it.

Operating Systems