COSC 388: Machine Learning

Project 5
Fall 2003

Due: Dec 5 @ 5 P.M.
13 points

  1. Take 50 to 100 e-mail messages and divide them into two classes (e.g., spam/not-spam or personal/administrative).

  2. Transform the e-mail messages into a set of training examples.

  3. Use WEKA to evaluate three learners on your data set. One must be a support vector machine (i.e., SMO).

  4. Feel free to use whatever you need to produce the training data: Perl, C++, Java, C, etc.

  5. Submit an archive, like you've done for past projects.

    1. Include the program(s) you used to transform your e-mail into training data.

    2. In a text file named README, include a detailed description of the steps you took and the results of the evaluation. This is a little tricky since the source of data for this project is your personal e-mail. Obviously, do not reveal any personal information, but try to give as much detail as possible about the terms selected as being most relevant and about the induced concept descriptions. There is no requirement to include the training data in the archive. Do not include the original e-mail messages in the archive.
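
As an illustration of step 2, here is a minimal sketch in Python of one way to turn labeled messages into an ARFF file, the format WEKA reads. Everything in it is an assumption rather than a requirement: the function names (tokenize, build_vocabulary, to_arff) are hypothetical, each attribute is a binary flag for whether a word occurs in the message, and terms are chosen by how many messages contain them. You may well prefer a different representation or term-selection method.

```python
import re
from collections import Counter

# Assumption: a "term" is any run of ASCII letters, compared case-insensitively.
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-case the text and return the set of alphabetic tokens in it."""
    return set(TOKEN.findall(text.lower()))


def build_vocabulary(messages, max_terms=100):
    """Return the terms that occur in the most messages (document frequency)."""
    doc_freq = Counter()
    for text, _label in messages:
        doc_freq.update(tokenize(text))
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _count in ranked[:max_terms]]


def to_arff(messages, vocabulary, classes, relation="email"):
    """Build ARFF text with one binary attribute per term plus a class attribute."""
    lines = ["@relation " + relation, ""]
    for term in vocabulary:
        # The "w_" prefix keeps a term such as "class" from clashing
        # with the class attribute's name.
        lines.append("@attribute w_%s {0,1}" % term)
    lines.append("@attribute class {%s}" % ",".join(classes))
    lines.append("")
    lines.append("@data")
    for text, label in messages:
        tokens = tokenize(text)
        row = ["1" if term in tokens else "0" for term in vocabulary]
        row.append(label)
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up toy messages; real input would be your own labeled e-mail.
    toy = [
        ("Free prize free", "spam"),
        ("Lunch at noon", "not-spam"),
        ("Free lunch", "not-spam"),
    ]
    vocab = build_vocabulary(toy, max_terms=3)
    print(to_arff(toy, vocab, ["spam", "not-spam"]))
```

Saving the printed text to a file with an .arff extension gives a data set that can be opened in WEKA. Binary flags keep the sketch short; word counts or weighted frequencies are equally reasonable choices, and the README should say which one you used.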

Instructions for Submission: In the header comments, provide the following information:

//
// Name
// E-mail Address
// Platform: Windows, OS X, Redhat, Solaris (cssun/gusun/daruma), etc.
// Development Environment: gcc, g++, java, g77, etc.
// Mail Client: mailx, pine, GUMail, Netscape, Yahoo!, etc.
//

When you are ready to submit your project for grading, create a compressed archive of a directory containing the files of your project and send it to me by e-mail as an attachment, as you did for Project 1.

Submit your project before 5:00 P.M. on the due date.

Once you have submitted your project, it is important to keep an electronic copy of it on either cssun or gusun. These systems are regularly backed up, and if we lose your project or the e-mail system breaks, then we will need to look at the modification date and time of your project to ensure that you submitted it before it was due. If you developed your code on a Windows machine, then use a secure ftp client to transfer your files or the archive to cssun or gusun.

Finally, when storing source code on university machines, it is important to set file permissions so others cannot read the file. To turn off such read/write permissions, type at the UNIX prompt chmod og-rw <file>, where <file> is the name of your source file.