Semester Project

The best part of the semester: the awesome data-based group project!

Two monsters working magic to come up with a complete data analysis and presentation.

Figure 1: Artwork by @allison_horst.

The semester project is broken down into four parts. Each part is successively more technical (and worth more points toward the overall project grade). The due dates are given below. Click on the appropriate full instructions to see the assignment for each part.

Data

You might consider looking at some of these places to find a dataset: https://hardin47.netlify.app/courses/data/

While the TidyTuesday datasets are super fun, they aren’t necessarily ideal for prediction (many of them include the entire population). I think you’ll have more luck looking at the UCI Machine Learning Repository.

Part 1 (worth 10 pts)

The goal of this data assignment is to understand the variables in your dataset and their connections with each other. Your task is to collect and describe a set of data of your choice and to perform some descriptive statistical analyses. The hardest part will be finding an appropriate dataset to use. Additionally, you will want to think carefully about the observational units (rows) in the dataset: they must be independent.

Part 2 (worth 20 pts)

Your task for the SLR project is to apply the tools of simple linear regression in order to answer questions about the relationship between two continuous (quantitative) variables. After the report is turned in, your pair (or you solo) will assess a different project based on the questions included in the full instructions. It is in your best interest to read those questions before writing up your analysis!

Part 3 (worth 30 pts)

The task for the MLR project is to build multiple linear regression (MLR) models using the tools we’ve covered in class. You should come up with a single model in the end and assess how well that model predicts. In particular, you will:

Part 4 (worth 40 pts)

In the final part of the project, you have three separate tasks. The first is to apply topics from sparse and smooth linear models. The second is to apply something new (see full instructions). Woo hoo, you get to learn / try something new!! The final task is to summarize the semester project in a meaningful way for a client. Feel free to re-do anything from previous parts of the project to make your final report even better.

Yes, it is going to seem awkward to try to combine all three tasks. Do your best.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m158-lin-mod, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".