2

I have git repository A that uses B as a submodule.

B's history has been rewritten after an LFS migration, but I would love it if A could still have its entire history functional. After the LFS migration, I do have a mapping OldSHA1 > NewSHA1 for submodule B, and now I just want to rewrite OldSHA1 gitlinks to NewSHA1 in repo A.

I have tried to run a filter-repo command on the repo A with a full OldSHA1==>NewSHA1 mapping as parameter but it doesn't seem to pick up gitlinks.

I also tried filter-branch as detailed in the thread "Repository with submodules after rewriting history of submodule", which seems to be after the exact thing I am trying to accomplish. I tried doing this with a single OldSHA1=>NewSHA1 mapping, and here's the command I am trying to run:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <OLDSHA1> ];
  then
    cd <SUBMODULE_ABSOLUTE_PATH>;
    git checkout <NEWSHA1>;
    cd ..;
    git add -u;
    git commit -m "updated gitlink";
  else
    git commit-tree "$@";
  fi' HEAD 

But I keep getting the following error:

fatal: reference is not a tree: <NEWSHA1>

Somehow, git checkout doesn't seem to pick up the tree of submodule B. I even tried to specify a path with git -C AbsolutePathToSubModule checkout but I get the same error.

So, a few questions:

  • Is there something obvious I'm doing wrong here?
  • Is there a better way of accomplishing this? It seems like I "simply" want to replace one string with another somewhere in the object database, but I can't find a simple way to do that.
  • Is there a way to do this on the entire repo like filter-repo does, or should I run this on every single branch?

Thanks for any help, advice, clue about how to accomplish this!

Edit 1:

After an answer in the comments, I edited my script to this:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <SpecificCommitID> ];
  then
    git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
  fi
  git commit-tree "$@";
  ' HEAD

But it has no effect :(

WARNING: Ref 'refs/heads/develop' is unchanged

Edit 2:

Thanks a lot to user @torek! This is a snippet to help anyone get started:

git filter-branch --index-filter '
if [ "$(git rev-parse --quiet --verify :<SUBMODULEPATH>)" = <OLDSHA1> ];
then
  git update-index --cacheinfo 160000,<NEWSHA1>,<SUBMODULEPATH>;
fi' HEAD --all

From there, you have to loop over all OLDSHA1/NEWSHA1 pairs, or use a case statement as a dictionary, as shown in their answer below.

Thanks again a lot!

5
  • For filter-branch what you want is to update the index directly, which will be a big pain. I'm not sure if filter-repo has any existing Python function to do this for you but if so, it will be much easier, and if not, it's an obvious feature request... Commented Sep 26, 2022 at 13:33
  • If you do want to do this in git filter-branch, remember that you don't have the submodule at all, all you have is the gitlink. You must inspect the hash ID of the gitlink (stored in Git's index in the given path name: use git rev-parse to retrieve it from the index) and if it's one of the ones to replace, use git update-index to shove the corrected gitlink into position. The rest, the git filter-branch code will handle on its own. Commented Sep 26, 2022 at 13:34
  • Hi @torek and thank you for your comments! I'm pretty new to the whole filter-branch thing and am struggling a bit, could you elaborate on how git update-index works? I found something like git update-index --cacheinfo 160000,<NewSha1>,<SubmodulePath> but I'm getting git update-index: --cacheinfo cannot add <SubmodulePath> ; cannot add to the index - missing --add option? when I do it in my script. And that still triggers with ---add Commented Sep 26, 2022 at 14:06
  • My bad, I typoed -add instead of --add. The final result is still not the desired one but I'm getting closer. Will write back as I fiddle with it. Thanks! Commented Sep 26, 2022 at 14:13
  • I updated my script in an edit at the end of my original post, baffled by how few examples of usage I can find online :( If you have any time to help, that would be amazing. Thanks! Commented Sep 26, 2022 at 14:42

4 Answers

2

This:

git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = <SpecificCommitID> ];
  then
    git update-index --add --cacheinfo 160000,<SpecificNewSha1>,<SubmodulePath>;
  fi
  git commit-tree "$@";
  ' HEAD

is not what you want as it tests the hash ID of the superproject commit. You need to test the hash ID of the submodule commit in the index entry, e.g.:

if [ "$(git rev-parse --quiet --verify :SubmodulePath)" = oldhash ]; then ...; fi

and of course that has to test all the old rewritten submodule hash IDs to run them through the mapping function.

(This will definitely be easier in filter-repo where you can use a dictionary lookup.)


If you use:

sm_hash=$(git rev-parse :submodule-path)

or similar to prefix the test, remember to account for the cases where the submodule path is absent from the index, so that :submodule-path does not parse properly. I think --quiet --verify will do the right thing here (quietly produce no output) but it's worth testing first.

Once you have the hash, you can do:

case $sm_hash in
old1) new=new1;;
old2) new=new2;;
...
oldN) new=newN;;
*) new=$sm_hash
esac

as a poor man's dictionary lookup with default, but you will want to skip updating the submodule hash if it's unchanged-or-empty.
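
With many pairs, writing the case arms by hand gets tedious. As a rough sketch (not part of the original answer), the arms could be generated from a mapping file. The input format of one "old new" pair per line, the helper name build_index_filter, and the example path are all assumptions made for illustration; the generated shell only calls git update-index when a replacement was found, which covers the unchanged-or-empty case.

```python
import re

# Full-length SHA-1 hex hash (assumption: the repositories use SHA-1).
HEX40 = re.compile(r"^[0-9a-f]{40}$")


def build_index_filter(mapping_lines, submodule_path):
    """Return shell text for --index-filter built from 'old new' lines.

    Hypothetical helper: the input format and this function are
    assumptions for illustration, not part of git.
    """
    arms = []
    for line in mapping_lines:
        parts = line.split()
        # Skip blank lines, headers, and anything that is not a hash pair.
        if len(parts) != 2 or not all(HEX40.match(p) for p in parts):
            continue
        old, new = parts
        if old != new:
            arms.append(f"{old}) new={new};;")
    body = "\n".join(arms)
    return (
        f'sm_hash=$(git rev-parse --quiet --verify ":{submodule_path}")\n'
        "new=\n"
        'case "$sm_hash" in\n'
        f"{body}\n"
        "esac\n"
        # Only touch the index when a replacement was found, so commits
        # without the submodule (empty sm_hash) are left alone.
        'if [ -n "$new" ]; then\n'
        f'  git update-index --cacheinfo "160000,$new,{submodule_path}"\n'
        "fi\n"
    )
```

The returned text could then be passed as the --index-filter argument. Paths containing quote characters are not handled by this sketch.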


1

Easiest is going to be, with the old and new ids in a shamap file,

git filter-branch --setup '
        declare -A newsha
        while read old new; do newsha[$old]=$new; done <shamap
'                 --index-filter '
        if oldsha=`git rev-parse :submodulepath 2>&-`
        then git update-index --cacheinfo 160000,${newsha[$oldsha]-$oldsha},submodulepath
        fi
'

and if you're on a Mac you'll need to brew install bash to get past one of the problems in their neglected GNU install.

1 Comment

Does anyone have an idea how to make that work with git-filter-repo? I'd like to do this in a repository with lots of branches. See github.com/newren/git-filter-repo/issues/537 .
0

The comment using bash syntax, declare -A ..., will not work. git filter-branch is a Bourne shell script (see https://github.com/git/git/blob/a82fb66fed250e16d3010c75404503bea3f0ab61/git-filter-branch.sh#L1), and the Bourne shell does not have associative arrays.

2 Comments

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
mmm, that's true on distros that hobble their default shell. Red Hat, SUSE, Slackware, Arch, Gentoo, and all distros built on those default to bash. It's pretty much only Debian-based distros (that's a lot of them) that literally went backwards and switched to a less capable default shell.
0

Here's the way to do it with git-filter-repo.

This only works after running git-filter-repo in the submodule repo as this creates the commit-map file required for updating the parent repo, i.e. the repo containing the submodule.

git-filter-repo --commit-callback '
    commitMap = {}
    filename = r"<local-file-system-path-to-submodule-repo>/.git/filter-repo/commit-map"
    subModulePath = b"<relative-path-of-submodule-in-parent-repo>"

    with open(filename, "r") as file:
        # Skip first line
        next(file)
        
        # Process remaining lines
        for line in file:
            # Split line on first space and strip whitespace
            key, value = line.strip().split(maxsplit=1)
            commitMap[key.encode("utf-8")] = value.encode("utf-8")

    for change in commit.file_changes:
        if change.filename == subModulePath:
            # Remember the old ID before overwriting it, so the log is accurate
            old_id = change.blob_id
            value = commitMap.get(old_id, None)
            if value is not None:
                change.blob_id = value
                print(f"Rewrote {old_id} to {value}")
            else:
                print(f"Could not find {old_id}")
'

Code based on https://github.com/newren/git-filter-repo/issues/537#issuecomment-2264377427.
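
The callback above re-reads the commit-map for every commit. A possible refinement, sketched here under assumptions, moves the parsing and the lookup into plain functions that can be tested without a repository. The names load_commit_map and remap_gitlink are hypothetical. The sketch assumes SHA-1 hashes, a commit-map with one header line followed by "old new" pairs, and that an all-zero new hash marks a commit that was dropped, in which case the gitlink is left alone.

```python
GITLINK_MODE = b"160000"
NULL_SHA = b"0" * 40  # assumption: SHA-1 repository, 40 hex digits


def load_commit_map(path):
    """Parse a commit-map file into {old_hash: new_hash} with bytes keys."""
    mapping = {}
    with open(path, "rb") as fh:
        next(fh, None)  # skip the "old new" header line
        for line in fh:
            parts = line.split()
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping


def remap_gitlink(mode, blob_id, mapping):
    """Return the hash to store for one file change.

    Only gitlink entries (mode 160000) with a known, surviving
    replacement are changed; everything else is returned untouched.
    """
    if mode != GITLINK_MODE or blob_id is None:
        return blob_id
    new_id = mapping.get(blob_id)
    if new_id is None or new_id == NULL_SHA:
        return blob_id
    return new_id
```

Inside a commit callback this might be used as change.blob_id = remap_gitlink(change.mode, change.blob_id, mapping), with mapping loaded once, for example by driving filter-repo from a Python script rather than from the command line.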
