Git Status Between Windows and Linux Does Not Agree

git status between Windows and Linux does not agree

Git stores, in the index, some special bits of information to know easily whether a file in the work-tree is modified or not. The index itself is a file that resides within the Git repository (.git/index; there may be additional auxiliary files, and temporary index files, here, but .git/index is the index, the first and real-est, as it were).

These special bits of information in the index are (derived from) the result of operating system stat calls. The stat calls on Linux and the stat calls on Windows deliver different data (specifically st_dev, though st_ino, st_uid, and st_gid can also be an issue), so a single index (and hence a single Git repository-and-work-tree) cannot[1] be correctly shared across a machine boundary. This holds for network drives, VM images, Dropbox folders (which have other issues), or any other sharing mechanism that allows either system to directly view the other system's data.
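To see what these cached bits look like, git ls-files --debug prints the stat-derived fields stored for each index entry. What follows is a minimal sketch in a throwaway repository; demo.txt is a made-up file name, and the --debug layout is documented as subject to change, so read it by eye rather than parsing it.

```shell
# Sketch: show the stat-derived data Git caches in the index for one file.
# Everything happens in a temporary repository, not in a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > demo.txt            # demo.txt is a hypothetical file name
git add demo.txt

# --debug adds the cached fields (ctime, mtime, dev, ino, uid, gid, size)
# after each path that ls-files lists.
debug_out=$(git ls-files --debug demo.txt)
printf '%s\n' "$debug_out"
```

The values shown are whatever the OS that last wrote the index reported. The other OS's stat calls will generally not reproduce them, which is the mismatch described above.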

The end result of all of this is that it's sometimes, just barely, possible to share a Git repository this way, but it's a bad idea: you'll get odd effects, such as Git missing some modified files, or thinking files are modified when they aren't. The latter is what you're seeing, probably.

It really works a lot better, though, not to share repository directories (nor work-trees) like this. That's even true on "friendlier" systems, such as macOS vs Linux when using VMs and, e.g., vagrant. It sort of works, sometimes, but it just is not reliable. Use separate clones and your life will be happier.


[1] At compile time, one can choose to have Git ignore the st_dev field, to enable sharing across network drives. That sometimes makes a difference, and sometimes doesn't. I suspect this option is chosen in most Windows builds so that Windows can share with Windows, but is not enabled in Linux builds. That means the Linux side won't ignore changes made by the Windows side, which will result in odd behavior.

The timestamps are normally compatible, but if one enables nanosecond-resolution time stamps, that may also be problematic.
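Separately from the compile-time choice, Git has a run-time setting, core.checkStat, that narrows which stat fields are compared. Setting it to minimal leaves out the uid, gid, inode number, and the sub-second parts of the timestamps. The sketch below only shows how one might try it in a scratch repository. It is not a claim that this makes a shared work-tree reliable.

```shell
# Sketch: relax the stat comparison in one (temporary) repository only.
repo=$(mktemp -d)
git init -q "$repo"

# "minimal" compares just whole-second mtime (and ctime, if trusted) plus size.
git -C "$repo" config core.checkStat minimal

check_stat=$(git -C "$repo" config --get core.checkStat)
echo "core.checkStat = $check_stat"
```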

Git - Windows AND linux line-endings

On Windows:

$ git config --global core.autocrlf true

On Linux:

$ git config --global core.autocrlf input

Read more about Dealing with line endings

git repo gives contradictory info from WSL than from Windows

The two git installations (native windows and WSL) are using a different setting for the core.autocrlf configuration, because these two installations are not using the same global config file.
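One way to confirm this is to ask each installation where its effective core.autocrlf value comes from. The helper name show_autocrlf below is made up for illustration; --show-origin is a standard git config option (Git 2.8 or later).

```shell
# Sketch: run this once in native Windows Git and once in WSL, then compare.
show_autocrlf() {
    # --show-origin prefixes the value with the config file that supplied it.
    # If nothing is set, git config prints nothing and exits non-zero,
    # hence the fallback message.
    git config --show-origin --get core.autocrlf \
        || echo "core.autocrlf is not set"
}

show_autocrlf
```

If the two environments print different files or different values, that is the disagreement described above.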

Put simply, the native Windows client converts LF to CRLF upon checkout, so the presence of CRLF is not "seen" as a change by git status. In contrast, the WSL client expects UNIX-style LF line endings, so git status sees every file as having been modified to change LF to CRLF.
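The difference between what is stored and what is checked out can be inspected with git ls-files --eol (Git 2.8 or later). This is a small sketch in a temporary repository that mimics the native Windows setting; crlf.txt is a made-up file name.

```shell
# Sketch: a CRLF work-tree file whose index copy is stored with LF.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf true            # mimic the native Windows setting
printf 'one\r\ntwo\r\n' > crlf.txt       # work-tree file with CRLF endings
git add crlf.txt

# The "i/" column describes the index copy, the "w/" column the work-tree copy.
eol_out=$(git ls-files --eol crlf.txt)
echo "$eol_out"
```

A client with a different core.autocrlf value looking at the same work-tree file would not apply the same conversion, which is why it reports the file as modified.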

Instead of relying on the global core.autocrlf setting, you should set it locally in the repository for any shared repositories. If the same repository is being accessed from both Linux/WSL and native Windows, you probably want this set to false so git does not change any line endings at all. Just beware that if you do set this to false, you'll have to make sure your editors can handle the line endings as they are (in general, most programmers' editors I've used do support UNIX LF, even on Windows).
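A per-repository setting lives in that repository's .git/config and overrides the global one. A minimal sketch, using a temporary repository as a stand-in for the shared one:

```shell
# Sketch: set core.autocrlf for one repository instead of globally.
repo=$(mktemp -d)                 # stand-in for the shared repository
git init -q "$repo"

# --local writes to $repo/.git/config, so both clients read the same value.
git -C "$repo" config --local core.autocrlf false

local_value=$(git -C "$repo" config --local --get core.autocrlf)
echo "local core.autocrlf = $local_value"
```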

The core.autocrlf is documented here for more info:

https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_core_autocrlf

Can anyone explain why git status shows files as modified when running under a share on Linux?

This blog post explains it quite nicely. Basically, it is a good idea to set:

git config --global core.autocrlf true

git forces refresh index after switching between Windows and Linux

You are completely correct here:

  • The thing you're using here, which Git variously calls the index, the staging area, or the cache, does in fact contain cache data.

  • The cache data that it contains is the result of system calls.

  • The system call data returned by a Linux system is different from the system call data returned by a Windows system.

Hence, an OS switch completely invalidates all the cache data.

... how can I use set the index file for different system?

Your best bet here is not to do this at all. Make two different work-trees, or perhaps even two different repositories. But, if that's more painful than this other alternative, try out these ideas:

The actual index file that Git uses merely defaults to .git/index. You can specify a different file by setting GIT_INDEX_FILE to some other (relative or absolute) path. So you could have .git/index-linux and .git/index-windows, and set GIT_INDEX_FILE based on whichever OS you're using.
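Here is a rough sketch of that idea. The function names index_name_for_os and use_os_index are made up for illustration, and the uname patterns are assumptions about what the various Windows shells report. WSL reports Linux, which is what you want here, since it uses Linux stat calls.

```shell
# Map a kernel name (as printed by `uname -s`) to a per-OS index file name.
# The names index-linux / index-windows follow the text above.
index_name_for_os() {
    case "$1" in
        Linux*)                echo "index-linux" ;;
        MINGW*|MSYS*|CYGWIN*)  echo "index-windows" ;;
        *)                     echo "index" ;;    # fall back to the default
    esac
}

# Inside a repository, point Git at the matching index file.
# --absolute-git-dir needs Git 2.13 or later.
use_os_index() {
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    GIT_INDEX_FILE="$git_dir/$(index_name_for_os "$(uname -s)")"
    export GIT_INDEX_FILE
}
```

You would call use_os_index from the top of the work-tree (or from a shell startup hook) before running other Git commands. A per-OS index file that does not exist yet starts out empty, so it probably needs to be filled once, for instance with git read-tree HEAD.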

Some Git commands use a temporary index. They do this by setting GIT_INDEX_FILE themselves. If they un-set it afterward, they may accidentally use .git/index at this point. So another option is to rename .git/index out of the way when switching OSes. Keep a .git/index-windows and .git/index-linux as before, but rename whichever one is in use to .git/index while it's in use, then rename it to .git/index-name before switching to the other system.
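The renaming variant could look something like this. Both helper names (park_index, activate_index) are hypothetical, and nothing here protects against a Git command running while the files are being moved.

```shell
# park_index GIT_DIR NAME: move the live index aside as index-NAME.
park_index() {
    [ -f "$1/index" ] && mv "$1/index" "$1/index-$2"
}

# activate_index GIT_DIR NAME: make index-NAME the live index again.
activate_index() {
    [ -f "$1/index-$2" ] && mv "$1/index-$2" "$1/index"
}
```

Before leaving Linux you would run park_index .git linux, and after switching you would run activate_index .git windows on the Windows side (and the reverse on the way back).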

Again, I don't recommend attempting either of these methods, but they are likely to work, more or less.

Why does git behave this way? Inconsistency between OS and VM accessing the same repository

Git does not really have a notion of "uncommitted file". What it does have is the index and the work-tree.

The main thing Git stores are commits:

  • Commits are permanent (mostly[1]), completely read-only entities stored in a database of sorts (a simple key-value store, really). Each one lets Git access the complete snapshot of the source that you, or the committer, made at that commit. Along with that snapshot, you (I'll leave out the "or the committer" from here on, but of course it is implied) get a chance to add your own metadata, specifically the log message about why you made that commit.

    The "true name" of any commit is its hash ID. Git uses the hash ID as the key in the key-value store, to retrieve the commit. Each commit also contains the hash ID of its predecessor or parent commit (or, for merge commits, two or more parent hash IDs—this is what makes them "merge commits").
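The hash ID and parent link are easy to look at directly. The sketch below makes two empty commits in a temporary repository; demo_commit is a made-up helper that supplies a throwaway identity so the example does not depend on your configuration.

```shell
# Sketch: each commit records the hash ID of its parent.
repo=$(mktemp -d)
cd "$repo"
git init -q .
demo_commit() {
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

demo_commit "first"
first_id=$(git rev-parse HEAD)     # hash ID of the first commit
demo_commit "second"

git cat-file -p HEAD               # prints tree, parent, author, committer, message
parent_id=$(git rev-parse HEAD^)   # the parent recorded in the second commit
```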

One commit is always the current commit. That's the one commit you selected (via git checkout) to work with. Because commits are read-only, you cannot change this commit. What you can do is, at some point, make a new commit. Normally, this new commit will use the current commit as the new commit's parent, and then become the current commit, and this is why you can always get back every file you ever committed: commits are permanent (mostly) and read-only (completely) and remember their parents.

The files stored with a commit—the snapshot you made—are saved in a compressed, Git-only format that is not useful to anything other than Git. So these files must be extracted from each commit before you can use them. Hence Git also has:

  • A work-tree. Here, Git can extract the files from a commit into the format in which the computer uses them. These files should not be shared across computers, not because it cannot work, but because it can and this just makes for big headaches, as you are discovering.

    Since files in the work-tree are stored in the native format, and are used by other programs, Git offers the ability to modify the files—specifically things like line-endings and permissions bits—as they come out of a commit on the way into the work-tree, and as they go from the work-tree into a commit. But there's one more key item and this is where the biggest headaches come from.

  • The index. This item sits between the current commit, and the work-tree.

The index stores all the files in their special Git-only format. It starts out containing the files as they were when they were committed. The key difference between the commit's copy of the files and the index's copy is that you can change the ones in the index. You change them by replacing them wholesale, using git add to copy the work-tree file back into the index.

When you make a new commit, Git simply uses whatever is in the index at that time. All the files are already there, all pre-packaged in the Git-only format. This makes committing very fast.
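You can watch the index copy being replaced with git ls-files --stage, which prints the mode, blob hash ID, stage number, and path for each entry. A minimal sketch in a temporary repository (notes.txt is a made-up file name):

```shell
# Sketch: editing the work-tree file does not touch the index; git add does.
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "version 1" > notes.txt
git add notes.txt
before=$(git ls-files --stage notes.txt)

echo "version 2" > notes.txt                  # only the work-tree copy changes
unchanged=$(git ls-files --stage notes.txt)

git add notes.txt                             # replace the index copy wholesale
after=$(git ls-files --stage notes.txt)

printf '%s\n%s\n%s\n' "$before" "$unchanged" "$after"
```

The first two lines printed are identical, and the third carries a different blob hash ID.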

What this also means is that the transformation from Git-only format to "useable by this computer" format, and vice versa, happens on the copy from index to work-tree (which changes files from Git-only to useable) and on the git add copy from work-tree to index (which changes useable to Git-only).

This is almost always the slowest-by-far part of dealing with commits and files, so the index keeps track of (indexes!) the work-tree, using OS-specific information. That OS-specific information, found via the OS about the work-tree, goes into the index.

If you share the work-tree and index and .git files across machines, what happens is that the index itself becomes useless, because the OS-specific work-tree data stored in the index is for the VM or the host, but never for both at the same time.

When the index is correct and describes the work-tree correctly, git status is fast and accurate. When it's not, the two diffs it must run—see my answer to the question you linked—cannot be done nearly as efficiently. If you use any kinds of file transformations, they must either be re-run, or assumed to have changed files.
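The effect of stale stat data can be imitated on a single machine. In this sketch, touch changes only the modification time, which is enough for the plumbing command git diff-files to report the file until the index is refreshed. The file name data.txt is made up.

```shell
# Sketch: a stat mismatch alone makes a file look modified to plumbing commands.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "same content" > data.txt
git add data.txt

touch -t 200001010000 data.txt         # change mtime only; content is untouched
stale=$(git diff-files --name-only)    # compares against the cached stat data

git update-index -q --refresh          # re-check content, re-cache stat data
fresh=$(git diff-files --name-only)

echo "before refresh: '$stale'  after refresh: '$fresh'"
```

Switching operating systems is roughly like doing that touch to every file at once, on every switch.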

The TL;DR of all of this is: never share a Git repository this way; use the fetch and push mechanisms to share it instead. This is not because it does not work, but rather because it can work, but becomes a horrible experience. The file name case-folding issues you identified are the tip of another whole nightmare iceberg (not directly solved by not-sharing the repository, but at least possible to solve that way).
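As a rough sketch of the separate-clones approach, the following uses a bare repository as a stand-in for whatever remote both sides can reach. The directory names (hub.git, on-linux, on-windows) and the demo_commit helper are made up; here both clones live on one machine purely so the example is self-contained.

```shell
work=$(mktemp -d)
demo_commit() {
    git -c user.name=Demo -c user.email=demo@example.com commit -q "$@"
}

git init -q --bare "$work/hub.git"            # stand-in for a shared remote

# First clone (think: the Linux side). Cloning an empty repository warns.
git clone -q "$work/hub.git" "$work/on-linux" 2>/dev/null
cd "$work/on-linux"
echo "first" > file.txt
git add file.txt
demo_commit -m "add file"
git push -q origin HEAD

# Second clone (think: the Windows side) gets its own index and work-tree.
git clone -q "$work/hub.git" "$work/on-windows"

# More work on one side, pushed and then pulled on the other.
cd "$work/on-linux"
echo "second" >> file.txt
demo_commit -a -m "update file"
git push -q origin HEAD

cd "$work/on-windows"
git pull -q --ff-only
```

Each side keeps its own .git/index, so neither ever sees stat data written by the other OS.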


[1] You can remove a commit, as long as you also remove all of its children and their children and so on. That is, removing a commit requires a sort of commit-line genocide. It's often a bad idea to do this, and if you are going to do it, you usually have to copy the entire chain of children. Sometimes it's a good idea, though, and in fact this is what git rebase does internally.

Note that git commit --amend does not change a commit. Instead, it just shoves aside (and thus eventually kills off and removes) the existing end-of-chain commit by creating a new replacement end-of-chain commit, using the current commit's parent as the new commit's parent.
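This is easy to check in a temporary repository. In the sketch below, demo_commit is a made-up helper that supplies a throwaway identity and allows empty commits.

```shell
# Sketch: --amend makes a new commit with the same parent; the old one remains.
repo=$(mktemp -d)
cd "$repo"
git init -q .
demo_commit() {
    git -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty "$@"
}

demo_commit -m "base"
demo_commit -m "tip"
old_tip=$(git rev-parse HEAD)
old_parent=$(git rev-parse HEAD^)

demo_commit --amend -m "tip, reworded"
new_tip=$(git rev-parse HEAD)
new_parent=$(git rev-parse HEAD^)

echo "old tip: $old_tip"
echo "new tip: $new_tip"
```

The two tips differ while the parent is unchanged, and git cat-file -e "$old_tip" still succeeds because the shoved-aside commit has not been removed yet.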


