The plan, the details, and the logistics.
Class: Tuesdays & Thursdays, 2:45 - 4pm
Jo Hardin
2351 Estella
jo.hardin@pomona.edu
Office Hours: (Estella 2351)
Tuesday 8:30-10:30am
Wednesday 1:30-4pm
Thursday 8:30-10:30am
Mentor Sessions:
6-8pm Mondays (Estella 2131) and Thursdays (Estella 2113)
Mentor: Lauren Quesada
Linear Models is a second course in statistics that builds on introductory statistics using mathematical tools (including calculus and linear algebra). The simple linear regression model will be extended to multiple linear regression, which we will analyze in depth. We will investigate the impact of residuals, and we will use graphical tools to enhance both understanding and communication of the models. For data with many explanatory variables, we will use ridge regression and Lasso for predictive modeling. The statistical software R will be used for all analyses, homework, and projects. The focus will be on understanding the methods and interpreting results; we will also discuss good modeling practices, whose ideas extend beyond linear models to other types of inference and prediction.
Anonymous Feedback: As someone who is constantly learning and growing in many ways, I welcome your feedback about the course, the classroom dynamics, or anything else you’d like me to know. There is a link to an anonymous feedback form on the landing page of our Sakai webpage. Please provide me with feedback at any time!
By the end of the semester, students will be able to do the following:
(adapted from Monica Linden, Brown University):
In an ideal world, science would be objective. However, much of science is subjective and is historically built on a small subset of privileged voices. In this class, we will make an effort to recognize how science (and statistics!) has played a role both in understanding diversity and in promoting systems of power and privilege. I acknowledge that there may be both overt and covert biases in the material due to the lens through which it was written, even though the material is primarily of a scientific nature. Integrating a diverse set of experiences is important for a more comprehensive understanding of science. I would like to discuss issues of diversity in statistics as part of the course from time to time.
Please contact me if you have any suggestions to improve the quality of the course materials.
Furthermore, I would like to create a learning environment for my students that supports a diversity of thoughts, perspectives, and experiences, and honors your identities (including race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:
All resources are freely available. They also all have print versions which can be purchased. Using the online version of the books is expected.
Content: An Introduction to Statistical Learning, James, Witten, Hastie, and Tibshirani, https://www.statlearning.com/
Content: Applied Linear Statistical Models, 5th ed., Kutner, Nachtsheim, Neter, Li. You should be able to find it online.
R resource: R for Data Science, Wickham, https://r4ds.had.co.nz/
R resource: Tidy Modeling with R, Kuhn and Silge, https://www.tmwr.org/
tidymodels
Homework will be assigned from the text(s) with some additional problems. One homework grade will be dropped. Homework will be done using the statistical software package R (in the RStudio IDE) and posted to GitHub. All homework must be done in R Markdown and compiled to PDF.
Homework assignments will be graded out of 5 points, which are based on a combination of accuracy and effort. Below are rough guidelines for grading.
[5] All problems completed with detailed solutions provided and 75% or more of the problems are fully correct. Additionally, there are no extraneous messages, warnings, or printed lists of numbers.
[4] All problems completed with detailed solutions and 50-75% correct; OR close to all problems completed and 75%-100% correct; OR all problems completed but with extraneous messages, warnings, or printed lists of numbers.
[3] Close to all problems completed with less than 75% correct.
[2] More than half but fewer than all problems completed and > 75% correct.
[1] More than half but fewer than all problems completed and < 75% correct; OR less than half of problems completed.
[0] No work submitted, OR half or less than half of the problems submitted and without any detail/work shown to explain the solutions. You will get a zero if your file is not compiled and submitted on GitHub.
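The rubric above takes points off for extraneous messages and warnings in the compiled PDF. One possible way to keep them out of the output, offered here as a sketch rather than a course requirement, is to set knitr chunk options in a setup chunk near the top of the .Rmd file. The particular options chosen are an assumption; the syllabus does not prescribe a setup.

```r
# Hypothetical contents of a setup chunk in a homework .Rmd file
# (an illustration, not something the syllabus prescribes).
# message = FALSE keeps package start-up and other messages out of the compiled document;
# warning = FALSE keeps warnings out of the compiled document.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

Warnings often point to real problems, so it is worth reading them in the console while working, even if they are hidden in the final document.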
There will be a semester-long group project. You will run a series of different linear models on a dataset of your choice. More information at: Math 158 Semester Project
GitHub will be used as a way to practice reproducible and collaborative science. There may be a slight learning curve, but knowing Git will be an extremely useful skill as you venture on after this class.
R will be used for all homework assignments. R is freely available at http://www.r-project.org/ and is already installed on college computers. Additionally, you need to install RStudio in order to use R Markdown, http://rstudio.org/. If you are not already familiar with R, please work through some of the materials provided ASAP.
You are welcome to use Pomona’s RStudio cloud server at https://rstudio.cloud. If you use the server, you can connect directly to your Git account without installing Git locally on your own computer. [If you are not a Pomona student, you will need to get an account from Pomona’s ITS. Go to ITS, tell them that you are taking a Pomona course, and ask for an account for using RStudio.]
This class will be interactive, and your participation is expected (every day in class). Although notes will be posted, your participation is an integral part of the in-class learning process.
In class: after answering one question, wait until 5 other people have spoken before answering another question. [Feel free to ask as many questions as often as you like!]
You are on your honor to present only your work as part of your course assessments. Below, I’ve provided Pomona’s academic honesty policy. But before the policy, I’ve given some thoughts on cheating which I have taken from Nick Ball’s CHEM 147 Collective (thank you, Prof Ball!). Prof Ball gives us all something to think about when we are learning in a classroom as well as on our journey to become scientists and professionals:
There are many known reasons why we may feel the need to “cheat” on problem sets or exams:
Being accused of cheating – whether it has occurred or not – can be devastating for students. The college requires me to respond to potential academic dishonesty with a process that is very long and damaging. As your instructor, I care about you and want to offer alternatives to prevent us from having to go through this process. If you find yourself in a situation where “cheating” seems like the only option:
Please come talk to me. We will figure this out together.
Pomona College is an academic community, all of whose members are expected to abide by ethical standards both in their conduct and in their exercise of responsibilities toward other members of the community. The college expects students to understand and adhere to basic standards of honesty and academic integrity. These standards include, but are not limited to, the following:
The faculty at Pomona College knows that person-to-person interaction provides the best liberal arts education. The best learning occurs in small communities. This year we are gathering in person for what we do best: create, generate, and share knowledge. During the past academic year, we built community remotely, and this year we will build on the pedagogical improvements we acquired last year. For example, we might meet on Zoom from time to time, or hold discussions online on the Sakai Discussions Board.
Our health, both mental and physical, is paramount. We must consider the health of others inside and outside the classroom. All Claremont Colleges students have signed agreements regulating on-campus behavior during the pandemic; in the classroom, we will uphold these agreements. We need to take care of each other for this course to be successful. I ask you therefore to adhere to the following principles:
There is a mask mandate for all indoor spaces on campus. You must wear a mask for the entire class; eating and drinking are not permitted. Your mask must cover your mouth and nose. The college has zero tolerance for violations of this policy, and our shared commitment to the health and safety of our community members means that if you come to class unmasked, you will have to leave class for the day.
Class attendance is required, but if you need to miss class for health reasons, concerning symptoms, suspected Covid exposure, unexpected dependent care, technology issues, or other emergency reasons, I will work with you. Let me underscore this: please always base your decisions on health, safety, and wellness (yours and others'), and I will work with you at the other end. Take any potential symptoms seriously; we’re counting on each other.
When not in class, avoid closed public spaces, and if you can’t avoid them: wear your mask properly, wash your hands, and maintain social distance.
If you, or a family member, are experiencing hardship because of the pandemic, talk to me or to someone in the Dean of Students office. You are not alone during this time.
The pandemic is fast-moving, and we might have to adjust the principles as the semester evolves. I am always happy to receive your feedback to make this course work.
Let’s care for each other, show empathy, and be supportive. While there will likely be some community transmission and breakthrough infections, together, we can minimize their effect on our community and on your learning.
Please email me and/or set up a time to talk if you have any questions about or difficulty with the material, the computing, or the course. Talk to me as soon as possible if you find yourself struggling. The material will build on itself, so it will be much easier to catch up if the concepts get clarified earlier rather than later. This semester is going to be fun. Let’s do it.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m158-lin-mod, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".