What is git?

The most common version control system is called git. It helps you with:

  • keeping old snapshots of your work in case you need to undo a mistake
  • collaborating with others on your team by sharing a project
  • publishing your project online, for sharing or as a backup

Reviewing terminology

From your pre-class reading, what does each of these terms mean?

  • Easy:
    • repository (or repo)
    • commits and committing
  • Medium:
    • local and remote
    • push and pull
  • Hard:
    • clone
    • branch
    • merge conflict

Choosing when to commit

Recall the example story from the notes about work on a project, shown below. When are the best times to commit?

  1. You start by downloading a dataset from the instructor and starting a new blank Python script or Jupyter notebook in your repo folder. Everything's fine so far. 😀
  2. You try to load the dataset but keep getting errors. You don't manage to solve it before you have to go to dinner. 😡
  3. A friend at dinner reminded you about setting the text encoding, and that fixed the problem. You get the dataset loading before bed. Yes! 😀
  4. The next day before MA346 you get the data cleaned without a problem. 😀
  5. During class, the instructor asks your team to make progress on a hypothesis test, but you run out of time in class before you can figure out all the details. The last few lines of code still give errors. 😡

Commit messages

Warning: You will be tempted to not take commit messages seriously, and you will later wish that you had. Remember the importance of good code explanations; the same advice applies here. Write clearly and meaningfully!

Examples:

  • Downloaded dataset and started new Python script
  • Wrote code to load data
  • Added code to clean data
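
The GitHub app asks for a message each time you commit. For comparison, here is a minimal command-line sketch that records the three example messages above as three separate commits. It assumes git is installed. The throwaway folder, the placeholder name and email, and the one-line file contents are illustrative only and are not part of the course materials.

```shell
# Work in a throwaway folder so no real project is touched.
cd "$(mktemp -d)"
git init --quiet

# Placeholder identity for this demo repo only (hypothetical values).
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit: a new, nearly blank script.
echo "# Project 1 script" > project-1.py
git add project-1.py
git commit --quiet -m "Downloaded dataset and started new Python script"

# Second commit: stand-in for the data-loading code.
echo "# code to load data goes here" >> project-1.py
git add project-1.py
git commit --quiet -m "Wrote code to load data"

# Third commit: stand-in for the data-cleaning code.
echo "# code to clean data goes here" >> project-1.py
git add project-1.py
git commit --quiet -m "Added code to clean data"

# List the commit messages, newest first.
git log --format=%s
```

The final command prints the three messages with the most recent one at the top, which is the same history the GitHub app shows. Each message describes one completed step, so anyone reading the history later can tell what changed without opening the files.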

Avoiding merge conflicts

Anita and Barry are teammates on Project 1. Which of the following scenarios are correct ways to collaborate using git? What could go wrong?

  • Anita edits code near the top of project-1.py that cleans the data while Barry edits code near the middle of project-1.py that filters the dataset.
  • Anita edits data analysis code in analysis.ipynb while Barry creates a presentation in project-1.pptx.
  • Anita and Barry both edit hypothesis-test.py, which contains just 10 lines of code, each making various edits to add features, fix bugs, or tweak the output.
  • Anita edits a Jupyter notebook that analyzes data while Barry edits a different Jupyter notebook that cleans data and gets it ready for the analysis in the notebook Anita's writing.

Exercise 1

  1. Create a GitHub account and download the GitHub app, as described in the course notes.
  2. Create and publish an empty (and private) repository to use in Project 1 for this course, as described in the course notes.
  3. Add the Project 1 datasets to that repository and publish again, as described in the course notes.
  4. Invite your instructor as a collaborator, so he can see your Project 1 repository, as described in the course notes.
  5. Do the same for your Project 1 teammate, if you've chosen to have one.

Exercise 2

Which of the following statements are true, and why?

  1. If I don't intend to share my work online, there's no value in using git or the GitHub app.
  2. It is possible to use git without using the GitHub app.
  3. If I'm teammates with my friend Erl, then once I've pulled Erl's most recent commits from our shared repo, I can push my commits and they will have merged with one another successfully.
  4. It's best practice to commit your latest changes at the end of your work each day.

Homework

Add a Python script or Jupyter notebook to your Project 1 repository.

In that script or notebook, complete steps 2–3 from the Project 1 assignment document. That is, you will end up with a function that can take as input a state name (such as "Idaho") and a column name from the vaccinations DataFrame (such as "people_vaccinated") and return a pandas Series of just that column of data, indexed by date.

We will use that function in class next time, so the work is due before class next week.

Submit your work by committing and pushing to the repository. I will be able to see it because you have already invited me to the repository as a collaborator.