Project 3 - Paging Algorithm Performance


As we have discussed in class, the choice of a page replacement algorithm can greatly affect the performance of a computer system. In this project, you will simulate several different algorithms and test their performance on memory trace files of programs running on a Linux system.

The Simulator

Your first task will be to write a simulation of a single-level page table that runs on your account on aji. Assume a 4 KB page size and a 32-bit processor.
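To make the page size and address width concrete, here is a minimal sketch of how a 32-bit address splits into a page number and an offset when pages are 4 KB. The macro and function names are my own choices for illustration and are not required by the assignment.

```c
#include <stdint.h>

/* Assumed constants: 4 KB pages in a 32-bit address space. */
#define PAGE_SHIFT 12u                         /* log2(4096) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)          /* 4096 bytes per page */
#define NUM_PAGES  (1u << (32u - PAGE_SHIFT))  /* 2^20 page table entries */

/* Upper 20 bits: the index into a single-level page table. */
static inline uint32_t page_number(uint32_t addr) {
    return addr >> PAGE_SHIFT;
}

/* Lower 12 bits: the offset within the page (not needed for replacement). */
static inline uint32_t page_offset(uint32_t addr) {
    return addr & (PAGE_SIZE - 1u);
}
```

With these definitions, the address 0x40000c35 falls in page 0x40000 at offset 0xc35, and a single-level table needs 1,048,576 entries.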

Program speed matters, so I suggest C or C++; Java is acceptable but not preferred. Interpreted languages such as Python or Ruby are probably right out, but hey, prove me wrong.

Running the program

The program should be contained in a single file named with your netid (if possible with the language you choose - if not, keep it simple). For example, mine would be named clay.exe. Your program should accept the following arguments:

<yournetid.exe> <number of frames> <algorithm to use> [verbose]

where:

number of frames is the number of physical frames of RAM available to the process.

algorithm to use is one of the following (see below for more information about which to implement):

  • FIFO: First In, First Out
  • LFU: Least Frequently Used
  • LRU: Least Recently Used
  • MFU: Most Frequently Used
  • CLOCK: Global Clock
verbose is an optional flag that will provide additional output to show how your simulator is running.
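One possible way to handle these arguments is sketched below. It rests on assumptions the handout does not state: algorithm names are matched without regard to case, the optional flag is the literal word verbose, and the type and function names (config_t, parse_args, and so on) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

typedef enum { ALG_FIFO, ALG_LFU, ALG_LRU, ALG_MFU, ALG_CLOCK, ALG_INVALID } algorithm_t;

typedef struct {
    long        frames;   /* number of physical frames */
    algorithm_t alg;      /* replacement algorithm to simulate */
    int         verbose;  /* nonzero if the optional flag was given */
} config_t;

/* Maps an algorithm name to its enum value (case-insensitive by assumption). */
algorithm_t parse_algorithm(const char *name) {
    static const char *names[] = { "FIFO", "LFU", "LRU", "MFU", "CLOCK" };
    for (int i = 0; i < 5; i++)
        if (strcasecmp(name, names[i]) == 0)
            return (algorithm_t)i;
    return ALG_INVALID;
}

/* Fills in *cfg from the command line. Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, config_t *cfg) {
    char *end;
    if (argc < 3 || argc > 4)
        return -1;
    cfg->frames = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || cfg->frames <= 0)
        return -1;                       /* not a positive integer */
    cfg->alg = parse_algorithm(argv[2]);
    if (cfg->alg == ALG_INVALID)
        return -1;
    cfg->verbose = (argc == 4 && strcmp(argv[3], "verbose") == 0);
    if (argc == 4 && !cfg->verbose)
        return -1;                       /* unknown fourth argument */
    return 0;
}
```

A main function would call parse_args(argc, argv, &cfg) first and print a usage message if it returns -1.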

Your simulator should read input lines from standard input of the form:

0x40000c35 W
0x40000c36 W
0x40011ac7 R
0x40011aca R
0x40000c44 W

where the first hex value is the address being accessed, and the second value indicates a read or write at that memory address. We are using standard input because the trace files are very large when uncompressed. We won't uncompress them. Instead, we will use gzcat to spew the uncompressed trace file, and pipe the output into our simulations. For example, a run might look like:

gzcat gpp.gz | clay 10 LRU
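Reading the trace lines described above from standard input could look roughly like the following. This is only a sketch: the function names are hypothetical, malformed lines are silently skipped (an assumption on my part), and sscanf is chosen for clarity rather than speed, so a hand-rolled hex parser may well be faster.

```c
#include <inttypes.h>
#include <stdio.h>

/* Parses one trace line such as "0x40000c35 W".
   Returns 1 on success, 0 if the line is malformed. */
int parse_trace_line(const char *line, uint32_t *addr, char *op) {
    /* %x accepts an optional 0x prefix; the space skips whitespace before R/W. */
    if (sscanf(line, "%" SCNx32 " %c", addr, op) != 2)
        return 0;
    return *op == 'R' || *op == 'W';
}

/* Reads every line from `in` and calls `visit` once per valid access.
   Returns the number of valid accesses seen. */
unsigned long read_trace(FILE *in, void (*visit)(uint32_t addr, int is_write)) {
    char line[64];
    unsigned long count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        uint32_t addr;
        char op;
        if (!parse_trace_line(line, &addr, &op))
            continue;                    /* skip anything unparseable */
        visit(addr, op == 'W');
        count++;
    }
    return count;
}
```

In a full simulator, read_trace(stdin, handle_access) would drive the simulation, where handle_access is whatever function you write to update your page table.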

Algorithms to Implement

You will have to implement five different page replacement algorithms. While this sounds awful given it is nice out, with careful planning of how you store and update your page table it isn't really very difficult to implement additional algorithms.
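As one illustration of that kind of planning, the sketch below keeps enough bookkeeping per frame that every algorithm can pick a victim from the same structure. The struct layout, the names, the linear scan, and the tie-breaking rule (lowest frame index wins) are all assumptions of mine, not requirements, and a linear scan is unlikely to be the fastest approach.

```c
#include <stdint.h>

typedef enum { SCAN_FIFO, SCAN_LFU, SCAN_LRU, SCAN_MFU } scan_alg_t;

typedef struct {
    uint32_t page;        /* virtual page number held in this frame */
    uint64_t loaded_at;   /* access count when the page was brought in (FIFO) */
    uint64_t last_used;   /* access count at the most recent reference (LRU) */
    uint64_t uses;        /* references since the page was loaded (LFU, MFU) */
    int      referenced;  /* reference bit (CLOCK) */
    int      dirty;       /* written to since it was brought in */
} frame_t;

/* Returns the index of the frame to evict.
   Assumes all n frames are occupied and n >= 1; ties go to the lowest index. */
int pick_victim(const frame_t *f, int n, scan_alg_t alg) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        int better;
        switch (alg) {
        case SCAN_FIFO: better = f[i].loaded_at < f[best].loaded_at; break;
        case SCAN_LRU:  better = f[i].last_used < f[best].last_used; break;
        case SCAN_LFU:  better = f[i].uses < f[best].uses;           break;
        default:        better = f[i].uses > f[best].uses;           break; /* SCAN_MFU */
        }
        if (better)
            best = i;
    }
    return best;
}

/* CLOCK: sweep from *hand, clearing reference bits as it goes,
   and evict the first frame whose bit is already clear. */
int clock_victim(frame_t *f, int n, int *hand) {
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % n;
        if (!f[i].referenced)
            return i;
        f[i].referenced = 0;   /* give this page a second chance */
    }
}
```

With this layout, each access only has to update last_used, uses, referenced, and dirty for the frame it touches, and adding another algorithm means adding another comparison.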

Output

Your program should simulate the algorithm specified using the number of frames specified. When complete, it should output statistics in the following format:

Number of memory accesses:
Number of misses:
Number of writes:
Number of drops:
Where the number of memory accesses is the number of lines in the trace, as each line represents one memory access; the number of misses counts every time the referenced page is not found in a frame; the number of writes counts how many frames are written to disk (this happens when the page in a frame is evicted and it has been written to since it was brought in); and the number of drops counts the frames that were evicted but were not dirty and hence not written back to disk.
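The four counters can be kept in a small struct and updated in two places: once per trace line, and once per eviction. The sketch below assumes each value is printed after the colon separated by a single space, which the format above does not spell out; the names are hypothetical.

```c
#include <stdio.h>

typedef struct {
    unsigned long accesses;  /* one per trace line */
    unsigned long misses;    /* page was not resident in any frame */
    unsigned long writes;    /* evicted pages that were dirty */
    unsigned long drops;     /* evicted pages that were clean */
} stats_t;

/* Call once per trace line; `hit` is nonzero if the page was already resident. */
void count_access(stats_t *s, int hit) {
    s->accesses++;
    if (!hit)
        s->misses++;
}

/* Call whenever a resident page is evicted to make room for another. */
void count_eviction(stats_t *s, int victim_dirty) {
    if (victim_dirty)
        s->writes++;   /* must be written back to disk */
    else
        s->drops++;    /* can simply be discarded */
}

/* Prints the summary; the spacing after each colon is an assumption. */
void print_stats(const stats_t *s) {
    printf("Number of memory accesses: %lu\n", s->accesses);
    printf("Number of misses: %lu\n", s->misses);
    printf("Number of writes: %lu\n", s->writes);
    printf("Number of drops: %lu\n", s->drops);
}
```

Note that misses which land in a still-empty frame evict nothing, so the number of misses will be at least the number of writes plus the number of drops.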

Additionally, if the verbose flag is set, it should output a message every time a page is replaced. This message should indicate the hex address of the page being replaced, the page replacing it, and whether the replaced page is overwritten or swapped to disk first. The format must be:
Page <old page number in hex> (overwrites|swaps) <new page number in hex>

For example:

Page 0x60970 overwrites 0x60929
or
Page 0x60970 swaps 0x60929
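A small helper along these lines can produce the verbose message. It assumes that "swaps" is used when the replaced page is dirty (so it goes to disk first) and "overwrites" when it is clean; the function name and signature are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Writes the verbose replacement message into buf (no trailing newline).
   old_dirty is nonzero if the replaced page was written to since it was loaded.
   Returns the number of characters the full message needs, as snprintf does. */
int format_replacement(char *buf, size_t size,
                       uint32_t old_page, uint32_t new_page, int old_dirty) {
    return snprintf(buf, size, "Page 0x%" PRIx32 " %s 0x%" PRIx32,
                    old_page, old_dirty ? "swaps" : "overwrites", new_page);
}
```

Since verbose output is not used for timed runs, a simple formatted print like this should be fast enough.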

Trace Files

These trace files were gathered on a Linux system. They have been reduced in terms of the number of references, however, because full traces are prohibitively large. They are listed below in order of number of memory accesses.

  • g++ compilation of an 051 final project, ~320K references: gpp.gz
  • A portion of an emacs run, ~2M references: emacs.sm.gz
  • A few seconds of the top utility running, ~5M references: top.gz
  • A run of an 051 project solution, ~7M references: stars.gz
  • A longer part of an emacs run, ~10M references: emacs.gz
  • Latex run of a single page letter of recommendation, ~31M references: latex.gz

You do not have to download everything to your own directory (though you can if you want); each of these files is available on aji at the path: ~clay/<filename>

The need for speed

Speed counts. We want our systems to be fast. We will therefore be using the command time to get a measure of how fast our programs are. Make sure that you time your program, not gzcat. Do not use the verbose flag for timed runs.

You can time your programs like this:

gzcat ~clay/emacs.sm.gz | /usr/bin/time clay 10 LRU

You will get additional output that looks like this:

0.43user 0.19system 0:03.67elapsed 17%CPU (0avgtext+0avgdata 2608maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps

The time we are most interested in is the user time, since that is how much CPU time your program took to run on that trace, so that is the time you will report. All timings must be done on aji. There will be a bonus for fast programs, and a small deduction for really slow ones.

What to turn in:

On aji you will create in your home directory a directory called paging. This directory will include:
  • A file named RUN.txt which will include a single line that is needed to run your program, excluding the arguments. For example, if your program ran from the command line and was named clay-paging then your file might only contain on a single line ./clay-paging. If you used java, for example, and needed to invoke the java interpreter the file might instead contain something like java -jar clay-paging.jar. This will vary by language. It is important that this be correct - I will use a program that extracts this line and then automatically uses it to run and test your program with different parameters.
  • A file named BUILD.txt that briefly describes how to compile or build your code.
  • A file named CODE.txt that lists the names of the files used in your simulation, with a brief description of what each file is for or what the code inside does.
  • The files containing the source as described above.
You will then create a tar file in your home directory named <user_name>-pagingr.tar, where <user_name> is your aji.cs.georgetown.edu login, by typing

tar -cf <user_name>-pagingr.tar paging

Gzip the tar file by typing

gzip <user_name>-pagingr.tar

Since the file's timestamp will be used to verify whether the work was completed on time, you should remove write permission from the file you submit so that the timestamp is not changed. Do this by typing:

chmod a-w <user_name>-pagingr.tar.gz

Upload the gzipped tar file to Canvas for the assignment. This provides a timestamp and source code backup.

In addition, you will run your program for each algorithm on the latex.gz file on aji. You will then email me a PDF document that contains a nicely formatted (meaning printable) report that shows the results of each of your implemented algorithms on the latex.gz trace file, including timing information. Use 50 frames for this analysis. A table showing all this would be perfect.