Project 98,172 - Paging Algorithm Performance


As we have discussed in class, the choice of a page replacement algorithm can greatly affect the performance of a computer system. In this project, you will simulate several different algorithms and test their performance on memory trace files of programs running on a Linux system.

The Simulator

Your first task will be to write a simulation of a single-level page table that runs on your account on mclovin. Assume that the page size is 4 KB and that the processor is 32-bit.
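To make the arithmetic concrete: with 4 KB pages the low 12 bits of an address are the offset within the page, and the remaining 20 bits are the page number, so a single-level table has 2^20 entries. Here is a minimal sketch of that split. The macro and function names are illustrative assumptions, not a required interface.

```c
#include <stdint.h>

/* Assumed constants: 4 KB pages on a 32-bit address space. */
#define PAGE_SHIFT 12u                          /* log2(4096) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)           /* 4096 bytes per page */
#define NUM_PAGES  (1u << (32u - PAGE_SHIFT))   /* 2^20 page table entries */

/* Upper 20 bits: index into the single-level page table. */
static uint32_t page_number(uint32_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Lower 12 bits: byte offset inside the page (not needed for replacement). */
static uint32_t page_offset(uint32_t addr)
{
    return addr & (PAGE_SIZE - 1u);
}
```

For the trace address 0x40000c35 this gives page number 0x40000 and offset 0xc35.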

Program speed matters, so I suggest C or C++; Java is acceptable but not preferred. Interpreted languages such as Python or Ruby are probably right out, but hey, prove me wrong.

Running the program

The program should be contained in a single file named with your netid (if possible with the language you choose - if not, keep it simple). For example, mine would be named clay.exe. Your program should accept the following arguments:

<yournetid.exe> <number of frames> <algorithm to use> [verbose]

where:

number of frames is the number of physical frames of RAM available to the process.

algorithm to use is one of the following (see below for more information about which to implement):

  • FIFO: First In, First Out
  • GC: Global Clock
  • LFU: Least Frequently Used
  • LRU: Least Recently Used
  • MFU: Most Frequently Used
  • OPT: Belady's Optimal Algorithm
  • RAND: Random
  • OWN: The algorithm you designed
verbose is an optional flag that will provide additional output to show how your simulator is running.
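One way to turn the algorithm argument into something your simulator can switch on is sketched below. The enum and function names are hypothetical. Matching case-insensitively is an assumption on my part: the list above uses upper case, while the sample run further down passes a lower-case name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical identifiers for the algorithm names listed above. */
enum algo {
    ALG_FIFO, ALG_GC, ALG_LFU, ALG_LRU,
    ALG_MFU, ALG_OPT, ALG_RAND, ALG_OWN,
    ALG_UNKNOWN
};

/* Compare two strings, ignoring ASCII case. Returns 1 if equal. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both must end at the same time */
}

/* Map a command-line name to an enum value (assumption: case-insensitive). */
static enum algo parse_algo(const char *name)
{
    static const char *names[] = {
        "FIFO", "GC", "LFU", "LRU", "MFU", "OPT", "RAND", "OWN"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (equal_ignore_case(name, names[i]))
            return (enum algo)i;   /* table order matches the enum order */
    }
    return ALG_UNKNOWN;
}
```

In main you would call parse_algo(argv[2]) and print a usage message if it returns ALG_UNKNOWN.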

Your simulator should read input lines from standard input of the form:

0x40000c35 W
0x40000c36 W
0x40011ac7 R
0x40011aca R
0x40000c44 W

where the first hex value is the address being accessed, and the second value indicates a read or write at that memory address. We are using standard input because the trace files are very large when uncompressed. We won't uncompress them. Instead, we will use gzcat to spew the uncompressed trace file, and pipe the output into our simulations. For example, a run might look like:

gzcat gpp.gz | clay 10 opt
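Reading the trace is mostly a matter of parsing one "address operation" pair per line. The following is a minimal sketch using sscanf; the function name and return convention are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse one trace line such as "0x40000c35 W".
 * On success stores the address and a write flag and returns 1.
 * Returns 0 if the line is malformed. */
static int parse_trace_line(const char *line, uint32_t *addr, int *is_write)
{
    unsigned int a;
    char op;

    /* %x accepts an optional 0x prefix; " %c" skips the separating space. */
    if (sscanf(line, "%x %c", &a, &op) != 2)
        return 0;
    if (op != 'R' && op != 'W')
        return 0;

    *addr = (uint32_t)a;
    *is_write = (op == 'W');
    return 1;
}
```

A typical driver would call fgets(buf, sizeof buf, stdin) in a loop and hand each line to this function. Since speed counts, you may later want to replace sscanf with a hand-rolled hex parser, but measure before you bother.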

Algorithms to Implement

You will have to implement six different page replacement algorithms. While this sounds awful given it is nice out and you might be about to graduate (assuming you pass this class), with careful planning of how you store and update your page table it isn't really very difficult to implement additional algorithms. You can choose which to implement from this menu:

All from this list:

  • FIFO: First In, First Out
  • GC: Global Clock
  • LRU: Least Recently Used

and two of the following:

  • OPT: Belady's Optimal Algorithm
  • LFU: Least Frequently Used
  • MFU: Most Frequently Used
  • RAND: Random
and
  • OWN: one algorithm of your own design.
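To illustrate how little code a replacement policy needs once the frame bookkeeping is in place, here is a sketch of victim selection for the textbook clock (second-chance) algorithm. I am assuming that is what GC refers to; check your notes if we defined it differently in class. The struct and function names are hypothetical, and nframes is assumed to be greater than zero.

```c
#include <stddef.h>

/* Minimal per-frame state for the clock sweep (illustrative only). */
struct clock_frame {
    int referenced;   /* set to 1 whenever the page in this frame is accessed */
};

/* Sweep from *hand until a frame with referenced == 0 is found.
 * Frames passed over get their bit cleared (their "second chance").
 * Returns the victim index and leaves *hand one past the victim. */
static size_t clock_pick_victim(struct clock_frame *frames, size_t nframes,
                                size_t *hand)
{
    for (;;) {
        size_t i = *hand;
        *hand = (*hand + 1) % nframes;
        if (!frames[i].referenced)
            return i;
        frames[i].referenced = 0;
    }
}
```

The loop always terminates: in the worst case it clears every bit on the first pass and picks the frame it started from on the second.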

    Output

    Your program should simulate the algorithm specified using the number of frames specified. When complete, it should output statistics in the following format:

    Number of memory accesses:
    Number of misses:
    Number of writes:
    Number of drops:
    where the number of memory accesses is the number of lines in the trace, as each line represents one memory access; the number of misses is the number of times the requested page is not found in a frame; the number of writes is the count of frames written to disk (this happens when a page in a frame is evicted and has been written to since it was brought in); and the number of drops is the number of frames that were evicted but were not dirty and hence not written back to disk.
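The distinction between a write and a drop comes down to the dirty bit of the page being evicted. Here is a rough sketch of that bookkeeping; the struct layouts and the function name are assumptions for illustration, not a required design.

```c
#include <stdint.h>

/* Counters matching the four statistics (hypothetical layout). */
struct stats {
    unsigned long accesses, misses, writes, drops;
};

/* One physical frame (hypothetical layout). */
struct frame {
    uint32_t page;   /* page number currently held */
    int valid;       /* 0 while the frame is still empty */
    int dirty;       /* 1 if the page was written since it was loaded */
};

/* Load new_page into victim, counting the eviction if the frame was in use.
 * A dirty victim costs a disk write; a clean one is simply dropped. */
static void replace_page(struct frame *victim, uint32_t new_page,
                         int is_write, struct stats *st)
{
    if (victim->valid) {
        if (victim->dirty)
            st->writes++;
        else
            st->drops++;
    }
    victim->page = new_page;
    victim->valid = 1;
    victim->dirty = is_write;   /* a write miss leaves the new page dirty */
}
```

Filling an empty frame counts as a miss but as neither a write nor a drop. Remember that a write that hits an already-resident page must also set its dirty bit.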

    Additionally, if the verbose flag is set, it should output a message every time a page is replaced. This message should indicate the hex address of the page being replaced, the page replacing it, and whether the replaced page is overwritten or swapped to disk first. The format must be:
    Page <old page number in hex> (overwrites|swaps) <new page number in hex>

    For example:

    Page 0x60970 overwrites 0x60929
    or
    Page 0x60970 swaps 0x60929
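A sketch of producing that line is shown here. I am assuming "swaps" is printed when the evicted page was dirty and "overwrites" when it was clean, and that page numbers are printed without zero padding; the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Format the verbose replacement message into buf.
 * Assumption: dirty victim -> "swaps", clean victim -> "overwrites".
 * Returns the number of characters written (as snprintf does). */
static int format_replacement(char *buf, size_t n, uint32_t old_page,
                              uint32_t new_page, int was_dirty)
{
    return snprintf(buf, n, "Page 0x%x %s 0x%x",
                    (unsigned int)old_page,
                    was_dirty ? "swaps" : "overwrites",
                    (unsigned int)new_page);
}
```

Print the buffer with puts or fputs only when the verbose flag is set, so timed runs do not pay for the formatting.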

    Trace Files

    These trace files were gathered on a Linux system using pin. They have been reduced in terms of the number of references, however, because full traces are prohibitively large. They are listed below in order of number of memory accesses.

    • g++ compilation of an 051 final project, ~320K references: gpp.gz
    • A portion of an emacs run, ~2M references: emacs.sm.gz
    • A few seconds of the top utility running, ~5M references: top.gz
    • A run of the infamous 051 starfish solution, ~7M references: stars.gz
    • A longer part of an emacs run, ~10M references: emacs.gz
    • A LaTeX run of a single-page letter of recommendation, ~31M references: latex.gz

    Instead of having to download everything to your own directory (though you can if you want) you can reach each of these files on mclovin using the path: ~clay/<filename>

    The need for speed

    Speed counts. We want our systems to be fast. We will therefore be using the command time to get a measure of how fast our programs are. Make sure that you time your program, not gzcat. Do not use the verbose flag for timed runs.

    You can time your programs like this:

    gzcat ~clay/emacs.sm.gz | /usr/bin/time clay 10 opt

    You will get additional output that looks like this:

    0.43user 0.19system 0:03.67elapsed 17%CPU (0avgtext+0avgdata 2608maxresident)k
    0inputs+0outputs (0major+210minor)pagefaults 0swaps

    The time we are most interested in is the user time, since that is how much CPU time your program took to run on that trace, so that is the time you will report. All timings must be done on mclovin. There will be a bonus for fast programs, and a deduction for really slow ones.

    What to turn in:

    Your code must be present in your home directory on mclovin in a directory named <netid>-paging. In that directory should be your executable code and a README file describing how to run it. You will e-mail me three things prior to class:
    • The code of your simulator, including the README file.
    • A nicely formatted (meaning printable) report that shows the results of each of your implemented algorithms on each trace file, including timing information. Use 50 frames for this analysis. A table showing all this would be perfect. This must be a PDF file; I reserve the right to edit any editable document you submit for my own enjoyment.
    • A description of your page replacement algorithm: why you think it should work well, and how it met your expectations.