Difference between git clean -f and git checkout -- <filename>

Question

I need to clear my working directory. What is the difference between git clean -f and git checkout -- <filename>? By the way, git clean -f doesn't work for me, and the other variants below don't make sense to me either:

git clean -n -d
git clean -f -d
git clean -f -X
git clean -f -x
git clean -n
git clean -fd

But git checkout -- . works

clean is used to remove untracked files; checkout as you wrote it will revert changes to a modified, but tracked, file. As far as the various options you can pass to clean, have you tried reading the documentation?
– Cory Kramer, Commented Oct 14, 2021 at 11:20
While I completely second @CoryKramer's comment, I re-read the doc, and was reminded that sentences such as "Remove untracked files from the working tree" are pretty opaque to new git users.
– LeGEC, Commented Oct 14, 2021 at 13:37
First: all commands have detailed help, which you can access by running git help <command>. The page linked by @CoryKramer has the exact same content as git help clean. Second: git clean explicitly avoids touching files which are "tracked" (aka: known to git). To manipulate tracked files, you can use other commands, such as git checkout (as you already found out), git restore or git reset.
– LeGEC, Commented Oct 14, 2021 at 13:40

torek · Accepted Answer · 2021-10-14 14:17:54Z

The git checkout command itself is quite complicated, but:

git checkout -- <pathspec>

is relatively simple: it extracts the given specified path from Git's index, writing the result to your working tree.

Git's index—which Git also calls the staging area, and sometimes (rarely these days) the cache—holds your proposed next commit.

Git first fills in its index from the current commit, when you check out that commit. This is one of the many things that git checkout does, which is why it's so complicated. After that, you can manipulate the stuff in Git's index using git add and git rm, and also some other commands that aren't really meant for human beings to use.¹

When you run git commit, Git takes whatever files are in its index right then and uses those to make the new commit. That's why "your proposed next commit" is a good description of what the index is all about.²
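A small sketch of that point, assuming a throwaway repository in a temporary directory (the file name note.txt and the user name and email are made up for illustration): the commit records what was in the index at commit time, not what is in the working tree.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "first draft" > note.txt
git add note.txt                           # the index now holds "first draft"
echo "second draft, never staged" > note.txt

git commit -q -m "commit whatever is in the index"
committed=$(git show HEAD:note.txt)        # content stored in the new commit
working=$(cat note.txt)                    # content still sitting in the working tree
echo "committed: $committed"
echo "working tree: $working"
```

The commit holds the staged text, while the later edit stays in the working tree only.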

Meanwhile, git clean is about removing files that definitely aren't in Git's index right now. Since git checkout -- somefile extracts the file somefile from Git's index right now, git clean cannot do anything about that file. The fact that git checkout -- somefile can do something means: the file does exist in Git's index. The fact that the file does exist in Git's index means: git clean won't do anything with this file.
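To see the two commands side by side, here is a minimal sketch in a scratch repository (tracked.txt and untracked.txt are hypothetical names, and the identity settings are placeholders). It modifies a tracked file, creates an untracked one, and runs each command in turn.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "original" > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt"

echo "edited in place" > tracked.txt       # modify a file that is in the index
echo "scratch" > untracked.txt             # create a file the index knows nothing about

git clean -f -q                            # removes untracked.txt only
after_clean=$(cat tracked.txt)             # the edit to tracked.txt is still there

git checkout -- tracked.txt                # re-extracts tracked.txt from the index
after_checkout=$(cat tracked.txt)

echo "after clean: $after_clean"
echo "after checkout: $after_checkout"
```

This is probably why git clean -f appeared not to work in the question: if the only changes are edits to tracked files, there is nothing for it to remove.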

¹These include git update-index and git read-tree. They are mostly meant for writing new Git commands, not for everyday programming.

²The index takes on an expanded role during merge conflicts. At this point, you can't run git commit: it will tell you that there are "conflicts in index" and "cannot commit". So even then, it's still sort of your proposed next commit, it's just that it's a jumbled-up mess and can't be committed until you fix it. That's what you are doing when you fix merge conflicts: fixing the mess Git left behind in its own index, as well as any mess it left behind in your working tree.

Making sense out of this

To really understand what's going on here—which is important if you're going to use Git at all—you need to know some things about Git:

Git is all about commits.
Git isn't about files. Commits hold files, yes, but Git is about commits.
Git isn't about branches either. Branches let you (and Git) find commits. But Git is all about the commits.

And did I mention commits? Oh, I did. Okay then! Commits are Git's raison d'être, but: What, exactly, is a commit anyway? Well, we just said it's a thingy that holds files, but we need a little bit more. A commit is:

Numbered. The numbers on commits are unique, but they're really big and ugly looking. They don't simply count up, one two three. Instead, they're very large numbers (up to 2¹⁶⁰-1 right now and in the future, even bigger) that are normally expressed in hexadecimal, e.g., 1fd3599ca076ba9f03c88661013810a9536921ea. Nobody is ever going to remember one of these, and if you need to use one with Git, you'll probably use your mouse to cut and paste it.
A storage holder: a container, of sorts, like a box on a shelf. This storage comes in two parts: each commit holds a snapshot and some metadata.

We won't go into all the details here, but we will say this about the snapshot: Git stores every file in every commit. This could take a tremendous amount of disk space, but Git saves on disk space in two particularly clever ways:

The files are all pre-compressed and Git-ified, so that they're shrunken. Sometimes they're very compressed (via what Git calls packed objects) and sometimes they're less compressed (via what Git calls loose objects), but they're always compressed.
More importantly, they're automatically de-duplicated. Git makes sure that the content gets stored exactly once. That is, if you write a file whose contents are hello world, that content goes into the repository. From then on, any time you save the same content, Git simply re-uses the existing copy. It doesn't matter where you put that content: in one file, or another file, or ten files all in the same commit, or whatever. No matter how many times you put that content into the repository, Git just keeps re-using the one saved file.
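The de-duplication can be observed with git hash-object, a plumbing command not mentioned above and used here only for illustration. In this sketch (file names one.txt and two.txt are made up) two files with identical content map to one stored object.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

printf 'hello world\n' > one.txt
printf 'hello world\n' > two.txt

# -w writes the compressed object into the repository and prints its ID
id_one=$(git hash-object -w one.txt)
id_two=$(git hash-object -w two.txt)

echo "one.txt -> $id_one"
echo "two.txt -> $id_two"

# Read the stored content back out by its ID
stored=$(git cat-file -p "$id_one")
echo "stored content: $stored"
```

Both files report the same ID, because the ID is computed from the content alone, so Git has a single object to store.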

For this to work, the compressed, Git-ified, and de-duplicated content has to be read-only. It is in fact read-only: nothing, not even Git itself, can change it.¹ So all the files in a Git snapshot (in fact, everything in every commit, including the metadata too) are read-only. Nothing about any commit can ever be changed.

This means the snapshots don't take much space: they're compressed and de-duplicated. But it also means you can't use them to get any work done—at least, not yet.

¹If something, such as a disk error—these do occur sometimes—changes even a single stored bit of a file, Git will notice this as a corrupted internal object. Git won't fix the problem, which is itself a different problem, but it will at least notice.

Getting work done

To get any work done, when starting with some commit via git checkout main or git checkout feature or whatever, Git:

locates the commit: the branch name helps it find the commit's number;
reads all the stored files from the commit; and
expands those files back into ordinary, usable files.

These ordinary, usable files go into an ordinary, usable folder (or directory) on your computer. This is your working tree or work-tree. It's where you get your work done!
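The lookup steps above can be inspected by hand. This is a rough sketch using a scratch repository with a single hypothetical file, README. It asks Git which commit the current branch name resolves to and which files that commit's snapshot holds.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "hi" > README
git add README
git commit -q -m "first commit"

branch=$(git symbolic-ref --short HEAD)    # name of the branch that is checked out
commit_id=$(git rev-parse "$branch")       # branch name -> commit number
files=$(git ls-tree -r --name-only "$commit_id")   # files stored in that snapshot

echo "branch $branch is commit $commit_id"
echo "snapshot contains: $files"
```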

Git does not use these files. Git provides them to you, so that you can use them. They are now yours. This working tree is just ordinary files on your ordinary computer, that you use in ordinary ways. You can edit them. You can remove them. You can make new ones. You can fold, spindle, or mutilate them to your heart's content. Git isn't using them, at all.

Eventually, though, you'll want to make a new commit. You'll probably want some of your updates to be in this new commit, as part of its new snapshot.

To get updated files into the new commit, you must now run git add. What git add does is:

read the working tree copy;
compress it and otherwise Git-ify it, and check to see if it's a duplicate.

Having done all the checking, if the file is a duplicate, Git throws out the work it did here and uses the duplicate. If not, Git arranges for the file to go into the repository, and now the content is ready to be committed. Either way, Git now updates its index—the thing we mentioned several times above—with the new copy.
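One way to watch the index change is git ls-files -s, which prints the mode, object ID, and stage number that the index records for each path. In this sketch (file.txt is a hypothetical name) editing the working tree copy leaves the index entry alone, and only git add updates it.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "version one" > file.txt
git add file.txt
git commit -q -m "version one"

before=$(git ls-files -s file.txt)         # index entry as of the commit
echo "version two, edited" > file.txt
unchanged=$(git ls-files -s file.txt)      # editing the working tree does not touch the index
git add file.txt
after=$(git ls-files -s file.txt)          # index entry now names the new content

echo "before edit:   $before"
echo "after edit:    $unchanged"
echo "after git add: $after"
```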

At the original git checkout time, Git filled in its index with "copies" of the de-duplicated files from the commit you extract. These are in Git's internal format, so these "copies" are all duplicates, so they take no space.² As you work, you modify working tree files, but they aren't in Git yet. Only once you run git add do they get partly into Git—and only once you run git commit are they safely saved for all time, or at least, as long as the new commit you just made continues to exist.

²The index itself needs a bit of overhead per file, typically a bit under 100 bytes, so "no space" is an overstatement. But a 10 megabyte file, stored in a Git repository, gets compressed and de-duplicated and the index "copy" is then still just the ~100 bytes. If you modify it and git add the modified copy, Git might have to make new content, which might take a bit of time to compress the 10 MB into the internal Git format. If you don't, though, there's just that tiny index entry. And if you do, then after Git has compressed it, there's a compressed copy ready to go, and the tiny index entry. Either way, the index is still ready to go.

Untracked files

What all of this means is that you can have files in your working tree that are not in Git's index.

Typically, most of these are files you created. That is, you start with:

git checkout somebranch

This fills in your working tree, and Git's index, from the latest commit on somebranch. Then you modify some of those files, and maybe make new ones that you don't bother to git add.

Those new files you made, that you did not git add, are not in Git's index right now. That's because you made them, or you ran some program that made them, but you never added them to Git's index with git add.

Right now, those files are untracked files.

Once you do add them to Git's index with git add, they stop being untracked. That assumes you ever do add them. If you never do, they stay untracked.
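As a quick illustration (a.txt and b.txt are hypothetical names), git status --porcelain marks an untracked file with ?? and, once it has been added, with A.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "tracked" > a.txt
git add a.txt
git commit -q -m "add a.txt"

echo "new" > b.txt                         # in the working tree, not in the index
status_before=$(git status --porcelain)    # "?? b.txt": untracked

git add b.txt                              # now it is in the index
status_after=$(git status --porcelain)     # "A  b.txt": tracked, staged as a new file

echo "$status_before"
echo "$status_after"
```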

You can also remove files from Git's index, using git rm. A regular:

git rm fileX

removes file fileX from both Git's index and your working tree, but you can run:

git rm --cached fileX

to remove fileX from Git's index without removing it from your working tree. You might do this if you created fileX just now and then ran git add by mistake, and want to undo your git add, but keep the file, for instance.
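A sketch of that undo, in a scratch repository with one unrelated committed file (base.txt, a made-up name): after git rm --cached, fileX is gone from the index but still on disk, so it is untracked again.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"

echo "added by mistake" > fileX
git add fileX                              # oops: fileX is now in the index
git rm --cached -q fileX                   # take it back out of the index only

in_index=$(git ls-files fileX)             # empty: no longer in the index
status=$(git status --porcelain)           # "?? fileX": untracked again
echo "status: $status"
```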

The important thing here, though, is this: However you get this set up, any file that is in your working tree, but not in Git's index, right now, is an untracked file. It doesn't matter how you go about setting up the situation. An untracked file is one that isn't in Git's index, right now. You can change this by changing what's in Git's index, or by changing what's in your working tree, or both.

Ignored files

Sometimes we work with programs that make a lot of files. For instance, C and C++ compilers often build .o (object) files, Python builds .pyc or .pyo files, and so on. These files should never be committed.³ To keep Git from committing them, you have to make sure you never git add them either.

The git status command is normally very helpful, but when you run git status, it complains about your untracked files. Git gets very whiny: Hey, look at these 500 untracked .o files, don't you want to git add them now? (No, Git, I don't. Stop bothering me!)

You can list untracked files in a .gitignore file, either by name or by pattern, such as *.o. This tells Git two things:

Shut the <expletive> up: don't tell me about them when I run git status.
Don't add them either, when I run git add.

This only works on untracked files. Once a file is in Git's index, it's a tracked file, and .gitignore won't ignore it. So the file's name is wrong: it should maybe be .git-shut-up-about-these-files-when-they-are-untracked-and-do-not-auto-add-them-with-any-en-masse-add-either-as-long-as-they-are-untracked-because-when-they-are-untracked-they-should-stay-untracked. But who's going to type in such a file name? So .gitignore it is.
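The tracked-versus-untracked distinction can be checked directly. In this sketch (tracked.o and new.o are hypothetical names) one .o file is committed on purpose with git add -f, which overrides the ignore rule. Git then keeps reporting changes to it, while an untracked .o file stays hidden.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "*.o" > .gitignore
echo "obj" > tracked.o
git add -f tracked.o .gitignore            # -f adds tracked.o despite the *.o rule
git commit -q -m "commit an object file by mistake"

echo "rebuilt object" > tracked.o          # tracked: .gitignore no longer applies
echo "obj" > new.o                         # untracked: .gitignore hides it

status=$(git status --porcelain)           # " M tracked.o", with no mention of new.o
echo "$status"
```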

³If you do commit them, it's a pain in the butt later. You can use git rm or git rm --cached to make more commits in which they're gone again, but any time you check out one of the old commits that has them, they come back at that point. No part of any existing commit can ever be changed!

`git clean` is about removing untracked files

Now that you know what untracked files are, the fact that git clean only affects untracked files makes sense. The git clean command removes certain untracked files.

It can also obey the .gitignore rules, depending on what flags you use. It can remove untracked-but-NOT-ignored files, or untracked-AND-ignored files, depending on whether you do or don't use the -x or -X flags: with neither flag it leaves ignored files alone, -x removes ignored files as well, and -X removes only ignored files. This gets a little messy and I always have to refer back to the documentation to remember which flag does what.
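A dry run with -n is a safe way to check which flag does what before deleting anything. The sketch below assumes a scratch repository with one ignored and one non-ignored untracked file (build.o and notes.txt, both made-up names).

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "*.o" > .gitignore
git add .gitignore
git commit -q -m "ignore object files"

echo "x" > notes.txt                       # untracked, not ignored
echo "x" > build.o                         # untracked and ignored

default_run=$(git clean -n)                # "Would remove notes.txt"
only_ignored=$(git clean -n -X)            # "Would remove build.o"
everything=$(git clean -n -x)              # lists both files

echo "$default_run"
echo "$only_ignored"
echo "$everything"

git clean -f -q                            # the real thing: removes notes.txt only
```

Without -d, git clean does not descend into untracked directories, which is why several of the commands in the question add it.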

`git checkout` is never about untracked files

The git checkout command has a lot of ways to use it, but it's always dealing with Git's index in some way or another. The index holds tracked files, by definition, because a file that's in the index is a tracked file (even if it's missing from the working tree).
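The parenthetical case can be tried out as follows (kept.txt is a hypothetical name): deleting the working tree copy does not remove the index entry, so the file is still tracked and git checkout -- can bring it back.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "content" > kept.txt
git add kept.txt
git commit -q -m "add kept.txt"

rm kept.txt                                # gone from the working tree only
still_tracked=$(git ls-files kept.txt)     # the index still lists it

git checkout -- kept.txt                   # re-extract it from the index
restored=$(cat kept.txt)

echo "index entry: $still_tracked"
echo "restored content: $restored"
```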

Git is about commits, and your working tree is not in Git

Always keep in mind that Git's real focus is on commits. The stuff in your working tree is there for you to get your work done, but it's not in Git. It may have come out of Git, and you might put stuff from your working tree back into Git later. But it's not in Git. If you git add a file, it's ... partway into Git, and will finish getting there later when you run git commit.
