Final project

Ethics Discussion (Due Wednesday 3/27 11:59pm)

The first step for the project will be to form a group and participate in a discussion about the potential ethical issues that arise when applying machine learning models in the real world. For this stage you should:

Form a group of 2-4 students and choose a cool team name. If you would like help finding a group, please email me! If you would like to work individually, you must get prior approval from the professor.
Complete the ethics and AI assignment as a group.

Proposal (Due Thursday 4/10 11:59pm)

The next step for the project will be to formally propose a project topic, for this milestone you should:

Choose a project topic. You may choose a topic from the list below or you may propose your own topic.
With your team, write a short (up to 1 page) proposal for your project and submit it on gradescope. Your proposal should include:
The names of the team members
A one paragraph high-level description of the goal(s) of the project (e.g. object detection in images, classifying text data, etc.), how this goal could be useful in real-world applications and why this topic interests you.
A one paragraph description of how you intend to approach this problem and any challenges you forsee. This should include identifying datasets that you might use, identifying at least one referance (academic paper or website) for a technique you intend to try.
A one paragraph description of how you will evaluate success for your application.

Check-in (Due Monday 4/24 11:59pm)

As a progress report you and your team will submit a short (1 page) summery of progress on Gradescope. This summary will include:

A high-level description of any updates you have made to the goals of your project.
A description of what methods you have tried and any preliminary results.
A timeline for finishing the remaining goals of your project.
A brief description of the contributions made by each team member.

Final deliverables (Due 5/9, 11:59pm)

Please see the final project template here, which includes instructions and guidelines for what to submit. Submissions will be through gradescope.

Rubric: link

Overleaf template: link

Course server

There is a GPU server allocated to this class at teapot.cs.hmc.edu. Please follow the instructions in homework 6 or 7 to setup a Jupyter session for project work.

Data directory If you are working with a large dataset on the server, please do not save it to your home directory as this may cause you to exceed your account’s allocated limit. Instead, please create a sub-directory of /cs/cs152/shared for your project and save your data there.

Examples from previous semesters

Here are examples of great projects kindly shared by students from previous semesters. Note that these may be longer than what is expected for a perfect grade. 4-6 pages should be sufficient.

Possible projects

Neural style transfer

Link: https://arxiv.org/pdf/1508.06576.pdf

Summary: Style transfer is the process of taking an existing image and applying an artistic style to it, such as making a photograph look like a painting or a drawing (check out the examples in the linked paper!). This can be accomplished with neural networks. For this project you could: implement the neural style transfer algorithm, evaluate it with different kinds of images and compare it to other methods for restyling images.

Suggested datasets: Any artistic images you’d like!

Semi-supervised learning with MixMatch

Link: https://arxiv.org/pdf/1905.02249.pdf

Summary: Semi-supervised learning is the problem of learning when we don’t have all the labels for our training data. MixMatch is a state-of-the-art approach for semi-supervised image classification. In this project you could: implement the Mix-Match algorithm, compare the different versions of it discussed in the paper and evaluate it on several different datasets. You could also test it on your own proposed semi-supervised learning task.

Suggested datasets: Street view house numbers, CIFAR-10, STL-10

Audio generation and classification with WaveNet

Link: https://arxiv.org/pdf/1609.03499.pdf

Summary: WaveNet is at network that forms the basis for many text-to-speech systems (think Alexa or Siri) it also allows for classifying audio. For this project you could: implement WaveNet, train it to generate speech (or other audio like music!) and evaluate it compared to existing tools for generation. You could also try to use it to classify speech or music.

Suggested datasets: Spoken digits, Speech commands, Crema-D, GTZAN

U-Nets for segmentation, depth-prediction, colorization or super-resolution

Link: https://arxiv.org/pdf/1505.04597.pdf

Summary: U-Nets are a very flexible type of neural network used for many computer vision tasks. They were originally introduced for segmenting different parts of medical images, but can used for everything from colorizing images to upscaling images to predicting depth in am image. For this project you could: implement the U-Net architecture, train a U-Net on one or more of the above tasks and evaluate its performance.

Suggested datasets: Oxford flowers ImageNet, NYU Depth

Object detection with YOLO

Introduction: https://pyimagesearch.com/2022/04/11/understanding-a-real-time-object-detection-network-you-only-look-once-yolov1/

Original paper: https://arxiv.org/pdf/1506.02640v5.pdf

Summary: Object detection is the task of locating objects within an image. This is a step more difficult than just classifying images, but is very useful in practice. For this project you could: implement the YOLO object detection model, test different variations of the model and evaluate it on new data.

Suggested datasets: VOC, Wider face, Kitti

Image generation with Generative Adversarial Networks

Introduction: https://developers.google.com/machine-learning/gan

Original paper: https://arxiv.org/pdf/1406.2661.pdf

Summary: Generative adversarial networks (GANs for short) are an effective way to generate realistic looking images using neural networks. They have caused a considerable amount of excitement and concern for their performance in generating realistic images of humans. For this project you could: implement a generative adversarial network, explore reasonable ways to evaluate the performance of GANs and dive into the ethical implications.

Suggested datasets: MNIST, Omniglot, CIFAR-10, FFHQ

Image classification with visual transformers

Link: https://arxiv.org/pdf/2010.11929.pdf

Summary: Transformer-based neural networks have transformed the field of natural language processing in recent years, as evidenced by the performance of models such as ChatGPT. There is growing evidence that they are also extremely useful for classifying images. For this project you might: implement the visual transformer architecture, compare it to convolutional neural network based architecture for image classification and visualize features to understand the differences in the approaches. You might also consider applying it to your own dataset.

Suggested datasets: Oxford flowers, CIFAR-10, ImageNet

Text generation or classification with GPT

Link: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf

Summary: Large language models, such at GPT-3 and GPT-4 have gained a lot of attention recently, as their performance in generating plausible text is (debatably) approaching human levels. The GPT model is used by Chat-GPT and many other applications to model language. For this project you could implement and train your own version of the original (GPT-1) model, compare it against available tools such as Chat-GPT and explore how to distinguish generated text from real human writing.

Suggested datasets: Amazon reviews IMDB reviews

Other possible projects:

Machine learning with differential privacy

https://arxiv.org/pdf/1607.00133.pdf

Graph neural networks

https://arxiv.org/pdf/1810.00826.pdf