You are encouraged to use existing resources (data/tools) so long as you are creating a system that combines them in a new way or experiments with different ideas.
You are not restricted to Python.
Start simple, then iterate: Implementing anything complicated will probably take longer than you expect (Hofstadter’s Law).
The project proposal should specify who will be responsible for what. Divide up tasks so that everyone can contribute at once, rather than one person becoming a bottleneck.
These are both due at the project deadline.
Assume the reader has taken the course (but not seen your presentation). It is up to you how to organize the report, but the reader should be able to answer the following questions:
What is the purpose of this project? Why is this project interesting?
How is your system organized? Without walking us through every line of code or every file, what are the main components and how do they relate? Tip: Give an example input and illustrate what each stage of the system produces.
How is data used in your system? Does it use statistical methods or machine learning? If there is learning, is it supervised or unsupervised? Is there randomness in the system (such that it can produce different outputs for the same input)?
To what extent is your system language-specific? What kinds of changes would be necessary to adapt it to another language?
What is new about the system, what ideas or methods have been borrowed from the literature (if similar tasks have been done before), and what parts of the system are being used off-the-shelf (include download URLs)? Aim for a discussion of 5-10 relevant publications.
What are the evaluation criteria (quantitative & qualitative)? If quantitative evaluation is possible, explain how it is conducted (e.g., gold standard, post hoc human judgments, fully automatic measures like perplexity). Describe a baseline if appropriate. Give some examples of good outputs and bad outputs your system produces, and discuss them.
What was learned by doing this project? E.g., what design decisions were made, and why? What was learned through trial and error, or through experimental comparisons?
What made your task difficult? What would you do next if you had more time?
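As an illustration of the evaluation question above, a quantitative comparison against a gold standard with a simple baseline might look like the following minimal Python sketch. The labels, the toy data, and the function names `accuracy` and `majority_baseline` are hypothetical, chosen only for illustration; your own task may call for a different metric or a different baseline.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of items whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def majority_baseline(train_labels, n_items):
    """Predict the most frequent training label for every test item."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_items


# Hypothetical toy data, for illustration only.
train_labels = ["pos", "pos", "neg", "pos"]
gold = ["pos", "neg", "pos", "neg"]
system_output = ["pos", "neg", "neg", "neg"]

baseline_output = majority_baseline(train_labels, len(gold))
print("baseline accuracy:", accuracy(baseline_output, gold))
print("system accuracy:", accuracy(system_output, gold))
```

Reporting the baseline score next to the system score lets the reader judge how much of the result comes from the system itself rather than from the label distribution.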
Your report should be roughly 4-8 pages (not including references and appendices). For inspiration (good and bad examples), you can consult shared task system papers, e.g. at a SemEval workshop. Your report should start with an abstract (a few sentences) that advertises/summarizes the main achievements.
You will have 15 minutes to present your project to the class. The presentation should describe the project motivation and goals; what was challenging about the task; what approach was taken, what aspects of the approach were new, and what the most interesting findings were. You need not (and probably should not) try to present every detail that is in your writeup—think of the talk as an advertisement that should motivate people to read the report. You can decide how to share time in the presentation (e.g., each person could present for 3-4 minutes). If you have a live demo, that should be part of the 15 minutes. There will be 5 minutes for questions afterwards.
After the presentations, each of you will be asked to give private feedback on your teammates. If the team functions smoothly and everyone has contributed a reasonable share to the project, everyone on the team will get the same grade. However, if somebody has been an especially strong contributor or an especially weak contributor, we will adjust individual scores to take that into account.