Why does Git allow using wrong values in user.name and user.email?

Question

I recently started working on a project that uses GitHub for source control. I'm used to SVN and am happy with it; I'm new to Git, and a couple of things about it are very annoying. One of them is that I need to explicitly configure user.name and user.email parameters (or whatever the proper term is, it doesn't seem to be mentioned in the docs, like many other things). Maybe it makes sense because I don't need to provide credentials when I commit locally. But it actually does ask for my credentials when I push my changes and just accepts whatever user.name value was set without checking whether it matches my login. Then GitHub shows my changes under someone else's name, which is very confusing.

Is there some deep wisdom behind this or is it just sloppy code?

It's just procedure, I suppose. The name and email you use to commit changes are kind of just a label. GitHub automatically associates them with your account behind the scenes. This is why Git has support for signing commits and tags, which you can use to prove that a Git object in its entirety legitimately came from you and you alone.
– scrowler, Commented Nov 20, 2016 at 23:58
Git is a distributed system. Why would the login credentials to a particular remote repository be related to your user.name? And if it checked that, should it then reject your pushes that contain commits from other people? How would you then work with a team?
– Thilo, Commented Nov 21, 2016 at 0:01
Why wouldn't they be related? I'm still trying to understand the thinking behind Git. It just seems wrong and confusing that it's possible, by mistake or intent, to ascribe the changes I make to someone else. And then there doesn't seem to be a way of fixing that.
– biggvsdiccvs, Commented Nov 21, 2016 at 0:07
You can also put in completely bogus timestamps or a manufactured merge history. By default, a repository will accept commits from authorized users regardless of their content and trust those users to know what they are pushing. If this turns out to be an issue, you could configure your repository to audit everything according to certain criteria (dates make sense, all authors must have signed off their own commits, etc.).
– Thilo, Commented Nov 21, 2016 at 0:14
Having said that, it would be nice for someone who only uses GitHub to have user.name set up automatically according to their account. I would not be surprised if there is already such a tool.
– Thilo, Commented Nov 21, 2016 at 0:15

torek · Accepted Answer · 2016-11-21 02:18:56Z

It's neither deep wisdom nor sloppy code. It's just the nature of a distributed system.

Your Git repository is yours. You control it, in all aspects. You decide what to put in and what to keep out. You also decide whether and when to digitally sign commits and/or tags (see git tag --sign and a mountain of PGP documentation).

You do, of course, also have control at transfer points. Specifically, at various times, someone gives you some set of commits and/or tags, plus the stuff that goes with them (trees and blobs), and asks you to put them in your repository. This operation is git fetch if you are retrieving data from them, or git push if they are sending data to you. You can, at that time, decide whether to accept them or reject them. Git provides direct control over this rather binary operation through "hooks".
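As a minimal sketch of such a hook, assuming a server-side pre-receive hook written in Python: Git writes one "<old-id> <new-id> <ref-name>" line per updated ref to the hook's standard input, and a non-zero exit status rejects the whole push. The policy shown (refusing ref deletions) and the function names are hypothetical, chosen only to illustrate the all-or-nothing decision.

```python
import sys

# An all-zero object id means "no such ref" (SHA-1 repositories); as the new
# id it signals that the pusher wants the ref deleted.
ZERO_ID = "0" * 40


def parse_updates(lines):
    """Turn pre-receive input lines into (old, new, ref) tuples, skipping blanks."""
    updates = []
    for line in lines:
        line = line.strip()
        if line:
            old, new, ref = line.split(" ", 2)
            updates.append((old, new, ref))
    return updates


def rejection_reasons(updates):
    """Hypothetical policy for illustration: refuse to delete any ref."""
    return [f"deleting {ref} is not allowed"
            for _old, new, ref in updates if new == ZERO_ID]


def run_hook(stream, err=sys.stderr):
    """Return the hook's exit status: 0 accepts the push, 1 rejects all of it."""
    reasons = rejection_reasons(parse_updates(stream))
    for reason in reasons:
        print(f"rejected: {reason}", file=err)  # relayed back to the pusher
    return 1 if reasons else 0
```

Installed as hooks/pre-receive in the receiving repository, the script would end with sys.exit(run_hook(sys.stdin)). The hook can only answer yes or no; it has no channel for handing back modified commits.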

You could both reject them (tell the other end "no") and secretly copy and modify them, so that you secretly accept them but change them. One can even imagine a system in which this process is formalized and allowed directly during a fetch or push session: "I see you are offering me these commits and other objects, but I don't like them as they are, I will modify them."

There are some good technical reasons not to do it this way. In particular, the identity of a Git object is a cryptographic hash of its contents, and if the receiving Git were to adjust or replace some or all of the contents, it would necessarily also come up with a new hash. The hash function is deliberately designed to be "one-way", i.e., given just a hash, it is very difficult to come up with contents that produce that hash. Therefore, for this to work well, the receiving Git would have to say to the sending Git, "I don't really like that, but I will take it if you change it to this," thereby becoming a sending Git itself; the original sender would then become the receiver and have to go through the same process yet again. So instead, where Git does implement accept-or-reject, there is no intermediate version: the receiving Git simply rejects the attempt, and it's up to the sender to decide whether to correct the problem.
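To make the hash argument concrete, here is a small Python sketch of how Git derives an object's id, assuming the default SHA-1 object format: the id is the SHA-1 of a "<type> <size>\0" header followed by the raw contents. The commit text built by the hypothetical commit_body helper is a simplified root commit with a fixed timestamp. Changing only the author line yields a completely different id, which is why a receiving Git cannot quietly "fix" an incoming commit and still call it the same commit.

```python
import hashlib

# Well-known id of the empty tree in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object id as Git computes it: header "<type> <size>\\0" + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, ident: str, message: str) -> bytes:
    """Build a minimal root commit (no parent) with a fixed, arbitrary timestamp."""
    stamp = "1479686400 +0000"
    return (
        f"tree {tree}\n"
        f"author {ident} {stamp}\n"
        f"committer {ident} {stamp}\n"
        f"\n{message}\n"
    ).encode()


original = git_object_id("commit", commit_body(EMPTY_TREE, "A U Thor <author@example.com>", "Fix bug"))
relabeled = git_object_id("commit", commit_body(EMPTY_TREE, "Someone Else <else@example.com>", "Fix bug"))
```

Same tree, same message, same time: only the name and email differ, yet original and relabeled are unrelated 40-character ids.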

(The actual vetting process is really run only on push since fetch just puts new commits in a place where you, the person in control of your repository, can examine them before storing them under your names. There is virtually no vetting for tags on fetch, even though they go into a global name-space: any tags you already have are retained, rejecting the attempt to store the new one, but any tags you do not have are accepted and stored, and you would have to manually rip them out if you decide you hate them after all.)
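The tag rule in that parenthetical can be modeled as a small merge. This is a toy Python model of the behavior described, not Git's implementation; tags are represented as a hypothetical {name: object-id} dictionary.

```python
def merge_fetched_tags(local, offered):
    """Keep every tag we already have; accept, unvetted, any tag we lack.

    Returns the merged {name: object_id} mapping and the names of offered
    tags that conflicted with an existing tag (those keep the local value).
    """
    merged = dict(local)
    conflicts = []
    for name, target in offered.items():
        if name not in merged:
            merged[name] = target      # new tag: stored without any vetting
        elif merged[name] != target:
            conflicts.append(name)     # existing tag retained, new one refused
    return merged, conflicts
```

Removing an unwanted tag afterwards (for example with git tag -d) is the manual cleanup the paragraph refers to.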

GitHub has its own Git repositories, and GitHub could do this kind of vetting: make sure that the user name and email address in incoming pushed commits match a valid user name and email address stored in the account information of whoever authenticated to do the push. It's merely traditional not to bother, since this would also be a pain for people who aggregate others' work and therefore push commits that deliberately give credit to the original author. One would presumably also have to bypass it on the initial push that creates a new GitHub repository for some existing, long-running project with many authors.
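A minimal Python sketch of the vetting GitHub could do, assuming the server already knows which account authenticated the push and which email addresses that account has verified. The Commit class, the function name, and the allow_foreign_authors switch (standing in for the aggregator and initial-import exceptions) are all hypothetical; emails are compared case-insensitively as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    author_email: str


def unverified_commits(commits, account_emails, allow_foreign_authors=False):
    """Return pushed commits whose author email is not one of the pusher's.

    allow_foreign_authors models the bypass an aggregator, or the initial
    import of a long-running multi-author project, would need.
    """
    if allow_foreign_authors:
        return []
    known = {email.lower() for email in account_emails}
    return [c for c in commits if c.author_email.lower() not in known]
```

A non-empty result would become a rejected push; the sender would then have to rewrite those commits (producing new hashes) or ask for the bypass.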

Note that what you give to GitHub is not a user-name-and-email-address, though: it is, instead, some sort of authentication credential (such as an ssh key, or a time-limited authentication cookie). It tells GitHub that you know some sort of shared secret: that you are (probably) you. (GitHub does keep a mapping: ssh keys map to a GitHub account, and GitHub obviously has an email address associated with the account.)
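The credential-to-account mapping in that parenthetical can be sketched like this, assuming OpenSSH-style SHA256 fingerprints (base64 of the SHA-256 of the decoded key blob, with padding stripped) and a hypothetical key_to_account table. Nothing in the lookup touches commit contents.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line):
    """OpenSSH-style SHA256 fingerprint of a "type base64-blob [comment]" line."""
    _key_type, b64_blob = public_key_line.split()[:2]
    digest = hashlib.sha256(base64.b64decode(b64_blob)).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def account_for_key(public_key_line, key_to_account):
    """Look up which (hypothetical) account a presented public key belongs to."""
    return key_to_account.get(ssh_fingerprint(public_key_line))
```

The account found this way is what the server authorizes; the user.name and user.email inside the pushed commits never enter into it.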

Todd A. Jacobs · Accepted Answer · 2016-11-21 01:58:40Z

TL;DR

Consider what would happen if your username on GitHub was foo while your work email address was something else entirely. If Git or GitHub enforced a direct mapping between identity and email address, how would you expect Git to handle this portably and reliably?

How about if your name was John Q. Public, your username on localhost was john-public, and your GitHub account was jpublic? How should Git handle these differences across systems?

Git can't, so Git doesn't. Instead, Git treats commit data and authentication as separate things.

Don't Confuse Commit Data with Credentials

Data stored in Git commit objects and the credentials you're presenting to GitHub aren't the same thing at all. You're thinking that your username or email address is your identity in Git, but neither actually has anything to do with authentication or authorization within Git or GitHub. The credentials you're presenting to GitHub are your GitHub username and password (or access token), or an SSH key registered to your GitHub account, and any relationship to your local username or email address is purely coincidental.
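To see what is actually stored, here is a Python sketch that pulls the identity fields out of a raw commit object (the text git cat-file -p <commit> prints). The sample commit and the identities function are made up for illustration. The only identity data are the author and committer lines, each a free-form name, an email, and a timestamp; there is no credential field anywhere.

```python
import re

# "author Name <email> <unix-time> <tz-offset>" as stored in a commit object.
IDENT = re.compile(r"^(author|committer) (.*) <([^>]*)> (\d+) ([+-]\d{4})$")


def identities(raw_commit: str) -> dict:
    """Extract author/committer (name, email) pairs from a raw commit object."""
    found = {}
    for line in raw_commit.split("\n"):
        if line == "":
            break  # headers end at the first blank line; the message follows
        m = IDENT.match(line)
        if m:
            role, name, email, _time, _tz = m.groups()
            found[role] = (name, email)
    return found


raw = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author John Q. Public <john@example.com> 1479686400 +0000\n"
    "committer jpublic <jp@example.org> 1479690000 -0500\n"
    "\n"
    "author Fake <fake@example.com> 1 +0000\n"  # part of the message, ignored
)
```

Here identities(raw) yields different author and committer entries, and the header loop stops before the look-alike line in the message.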

If you ever use Git on an NFS-mounted share, work for different companies over the life of a project, work for more than one company at a time, or need to keep work and non-work projects logically separated, you learn to appreciate that Git's email attribution mechanisms are both flexible and portable.

Remember that Git is a content tracker, not an authentication system. Most of the authentication you do with third parties like GitHub is actually handled outside of Git by the SSH or HTTPS protocols, neither of which cares about fields in your commit objects.

Username and Email Address Aren't Identities

One of them is that I need to explicitly configure user.name and user.email parameters (or whatever the proper term is, it doesn't seem to be mentioned in the docs, like many other things). Maybe it makes sense because I don't need to provide credentials when I commit locally. But it actually does ask for my credentials when I push my changes and just accepts whatever user.name value was set without checking whether it matches my login. Then GitHub shows my changes under someone else's name, which is very confusing.

You're conflating many very different issues. Some of the more obvious ones are listed below, but there are certainly others.

Git commit objects record both an author and a committer (the values you can set through GIT_AUTHOR_NAME and GIT_COMMITTER_NAME). The committer and the author aren't necessarily the same, and being able to apply patches to the code base on behalf of someone else is considered a design feature.
GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL can vary from system to system, and even from project to project since Git supports per-project configuration files. This email information is attached to the commit and may be used by git-format-patch, but it doesn't intrinsically have anything to do with SSH or HTTP(S) authentication.
GitHub assigns changes to users based on email addresses. However, this is a user-facing implementation decision by GitHub; Git itself doesn't conflate commit objects with authentication. At the command line, you can work a lot of magic with a .mailmap file (at the top of the repository, or wherever the mailmap.file setting points).
GitHub allows you to add multiple email addresses to your account for tracking commits that belong to you, and also allows you to use a private address if you like.
GitHub uses a variety of authentication mechanisms, but in general users push or pull using SSH or HTTPS. Over SSH, GitHub identifies you by your registered SSH key; over HTTPS, by a username and password (or token). Usernames need not match on the local and remote systems.
Other authentication mechanisms like SMTP have their own configuration values separate from Git's user.name or user.email.
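The mailmap point in the list above can be illustrated with a simplified Python sketch of how a mailmap rewrites identities for display. It covers the four documented line forms (Proper Name <commit@email>; <proper@email> <commit@email>; Proper Name <proper@email> <commit@email>; Proper Name <proper@email> Commit Name <commit@email>). Assumptions: emails are matched case-insensitively, names exactly, and # starts a comment; Git's own matching rules are more detailed, and the function names are hypothetical. The commit objects themselves are never changed.

```python
import re

PAIR = re.compile(r"([^<]*)<([^>]*)>")


def parse_mailmap(text):
    """Map (commit_email_lower, commit_name_or_None) -> (proper_name, proper_email)."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pairs = [(n.strip() or None, e.strip()) for n, e in PAIR.findall(line)]
        if len(pairs) == 1:
            # "Proper Name <commit@email>": replace the name only.
            proper_name, commit_email = pairs[0]
            table[(commit_email.lower(), None)] = (proper_name, None)
        elif len(pairs) == 2:
            (proper_name, proper_email), (commit_name, commit_email) = pairs
            table[(commit_email.lower(), commit_name)] = (proper_name, proper_email or None)
    return table


def resolve(table, name, email):
    """Return the display (name, email) for a commit identity."""
    key = email.lower()
    hit = table.get((key, name)) or table.get((key, None))
    if not hit:
        return name, email
    new_name, new_email = hit
    return new_name or name, new_email or email
```

Entries that also name a commit name only apply when both name and email match; email-only entries apply to any name.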

In general, Git's decision to keep authentication separate from author or committer details is a good one for portability. You can have different usernames or email addresses on different systems or projects, and your identity information is relatively portable when kept in ~/.gitconfig, $GIT_DIR/config (usually .git/config), or the appropriate environment variables.
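The layering of those locations can be sketched in Python. This is a simplified model under stated assumptions: it checks the GIT_AUTHOR_NAME/GIT_AUTHOR_EMAIL environment variables, then the repository's config, then ~/.gitconfig, and ignores system-wide config, other identity-related keys, and any fallback Git may derive from the operating system. The function name and the dictionaries standing in for config files are hypothetical.

```python
def resolve_author(env, repo_config, global_config):
    """Return (name, email) for new commits, most specific source first."""
    def pick(env_var, key):
        for source, lookup in ((env, env_var), (repo_config, key), (global_config, key)):
            value = source.get(lookup)
            if value:
                return value
        return None

    name = pick("GIT_AUTHOR_NAME", "user.name")
    email = pick("GIT_AUTHOR_EMAIL", "user.email")
    if name is None or email is None:
        # This model simply fails; Git itself may guess from the system first.
        raise LookupError("author identity unknown: set user.name and user.email")
    return name, email
```

A per-repository user.email is how a work project and a personal project on the same machine end up with different addresses, with no credential involved.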

Ah, right, something I failed to mention but you remembered: the user-facing decision GitHub makes, to assign changes to specific GitHub users (i.e., GitHub accounts). This decision seems mysterious. Do you know precisely how GitHub chooses these?
@torek GitHub just matches all emails (not usernames) associated with your account to Git commits. I'm honestly not sure if it differentiates between authors and committers, but that's all it does: credits the GitHub username associated with an email address for the activity. It's just a social metric, not a verification of identity, so you can't rely on it. As for implementation, it probably uses the mailmap mechanism under the hood, but you'd have to ask someone with access to the source to know for sure.
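The matching described in this reply can be sketched as a lookup from commit email to account. This is a hypothetical Python model, not GitHub's code: the account data are made up, emails are compared case-insensitively by assumption, and, as the reply stresses, nothing in it checks who actually pushed.

```python
def attribute_commits(commit_emails, account_emails):
    """Return, for each commit email, the account listing it (or None).

    account_emails: {"account": ["email", ...]}. A commit shown under "the
    wrong person" simply means some other account lists that address.
    """
    owner = {}
    for account, emails in account_emails.items():
        for email in emails:
            owner[email.lower()] = account
    return [owner.get(email.lower()) for email in commit_emails]
```

A commit whose email no account lists stays unattributed (None here), which matches the "social metric, not verification" reading.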
