19

While git-blame and counting the number of lines changed by an author within a git repository are helpful, is there a command that lists every pathname modified in a repo, across all commits by an author or set of authors, and scores each file by the number of commits that author or set of authors made to it? E.g. the output from running such a command in a cloned git repo would be similar to:

1    /path/to/some/file/in/repo/file1
34   /path/to/some/file/in/repo/file2
3    /path/to/some/other/file/in/repo/anotherfile
...
2
  • :) No. Just wanted to identify parts of the code that could be focused on, and using # commits per file by author would be one way of identifying parts of the code to focus on for knowledge transfer when an employee is leaving. Commented Sep 15, 2014 at 21:10
  • Are you willing to write a batch file? If so, you could use git rev-list HEAD --count --author=someDude -- somefile.txt to create a count output. Commented Dec 9, 2014 at 17:37

3 Answers

27

Just realized that you can use --name-only to print the filenames, set the pretty format to an empty string, and then sort, uniq -c, and sort again by count so the files with the most commits come first. In *nix/OS X, you could use:

git log --name-only --author=John --pretty=format: | sort | uniq -c | sort -nr

Be sure that you are using the right author.

E.g. if we were trying to find the author names DHH has used in Rails, we might do:

git log --format='%aN <%aE>' | LC_ALL='C' sort -u | grep avid

and notice that all of DHH's commits in the Rails git repo use the author name "David Heinemeier Hansson". So, then we could do:

git log --name-only --author="David Heinemeier Hansson" --pretty=format: | sort | uniq -c | sort -nr

Which might output:

3624 
 611 actionpack/CHANGELOG
 432 activerecord/CHANGELOG
 329 railties/CHANGELOG
 206 activerecord/lib/active_record/base.rb
 195 activesupport/CHANGELOG
 157 actionpack/lib/action_controller/base.rb
 153 railties/Rakefile
 108 activerecord/lib/active_record/associations.rb
  79 actionpack/lib/action_view/helpers/javascript_helper.rb
  75 activerecord/lib/active_record/validations.rb
  74 activerecord/test/base_test.rb
  69 actionmailer/CHANGELOG
  66 railties/lib/rails_generator/generators/applications/app/app_generator.rb
  66 activerecord/Rakefile
  66 actionpack/lib/action_controller/caching.rb
  60 actionpack/lib/action_controller/routing.rb
  59 railties/lib/initializer.rb
  59 actionpack/Rakefile
  57 actionpack/lib/action_controller/request.rb
  ...

So, as of 2015-02-21, the file he had committed to most often was the ActionPack CHANGELOG at 611 commits, followed by the ActiveRecord CHANGELOG, and ActiveRecord::Base (activerecord/lib/active_record/base.rb) was the Ruby file he made the most commits to. The first line, 3624 with no path, is not a file: it counts the empty lines that --pretty=format: leaves in the log output between commits, so it roughly tracks the number of matching commits.

To leave that pathless first line (the 3624 above) out of the output, use --format= instead of --pretty=format:, e.g.:

git log --name-only --author="David Heinemeier Hansson" --format= | sort | uniq -c | sort -nr
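
The question also asks about a set of authors. git log accepts --author more than once and selects commits matching any of the patterns, so the pipeline above can be wrapped in a small helper. This is only a sketch: the function names tally_paths and commits_per_file are made up for illustration, and the blank separator lines are filtered out explicitly rather than relying on the format option.

```shell
# tally_paths: read `git log --name-only` output on stdin, drop blank
# separator lines, and print "<count> <path>" with the highest count first.
tally_paths() {
    grep -v '^$' | sort | uniq -c | sort -nr
}

# commits_per_file (hypothetical helper): every argument is handed to
# git log unchanged, so several --author options can be combined.
commits_per_file() {
    git log --name-only --pretty=format: "$@" | tally_paths
}
```

For example, commits_per_file --author="Alice" --author="Bob" would score each file by the commits of either author (the names are placeholders).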

3

Example with PowerShell

Display the commit count of the specified author for each file in the current working tree.

Short Form

$author = 'shaun';
dir -r | % { New-Object PSObject -Property `
@{ `
   Count = git rev-list HEAD --count --author=$author -- $_.Name; `
   FileName = $_.Name; `
}} `
| sort Count | % { $_.Count + ' -- ' + $_.FileName + ' -- ' + $author; }

Long Form

$author = 'shaun'; `
Get-ChildItem -recurse | ForEach-Object `
{ `
   New-Object PSObject -Property `
   @{ `
       Count = git rev-list HEAD --count --author=$author -- $_.Name; `
       FileName = $_.Name; `
    } `
} | `
Sort-Object Count | ForEach-Object `
{ `
   $_.Count + ' -- ' + $_.FileName + ' -- ' + $author; `
} 

Notes

  • ` means continue the command on a new line.
  • | means pipe the resultant objects to the next command.
  • $_.SomeProperty accesses a property from the piped in object.
  • you can copy/paste this directly into PowerShell, because the ` marks continue each command onto the next line.
  • include filter-branch to also track previously deleted files and other branches.
  • use git log --format='%aN' | sort -u to list all project authors and iterate over them.

Output

0 -- blame.txt~ -- shaun
0 -- .blame.txt.un~ -- shaun
1 -- GitBook-GitTools-06-RewritingHistory.asc -- shaun
1 -- GitBook-GitTools-05-Searching.asc -- shaun
1 -- GitBook-GitTools-03-StashingAndCleaning.asc -- shaun
1 -- GitBook-GitTools-07-ResetDemystified.asc -- shaun
1 -- README.md -- shaun
1 -- LICENSE -- shaun
1 -- GitBook-GitTools-09-Rerere.asc -- shaun
1 -- GitBook-GitBranching-Rebasing.asc -- shaun
1 -- blame2.txt -- shaun
1 -- GitBook-GettingStarted-FirstTimeSetup.asc -- shaun
1 -- GitBook-GitTools-02-InteractiveStaging.asc -- shaun
1 -- GitBook-GitTools-01-RevisionSelection.asc -- shaun
1 -- GitBook-GitInternals-Maintenance.asc -- shaun
2 -- goals.asc -- shaun
2 -- GitBook-GitTools-10-Debugging.asc -- shaun
3 -- blame.txt -- shaun
6 -- GitBook-GitTools-08-AdvancedMerging.asc -- shaun
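
For readers on *nix, roughly the same per-file loop can be sketched in shell. This is not a translation of the PowerShell above: it walks git ls-files instead of the working tree, so it prints repo-relative paths and skips untracked files, and the function name per_file_commit_counts is invented for this example. It runs one git rev-list per file, so it will be slow on large repositories.

```shell
# per_file_commit_counts AUTHOR (hypothetical helper): for every tracked
# file, print "<count> -- <path> -- <author>", lowest count first.
# Assumes it is run inside a git work tree; paths containing unusual
# characters may be printed quoted by git ls-files.
per_file_commit_counts() {
    author=$1
    git ls-files | while IFS= read -r path; do
        # Same rev-list invocation as in the answer, with a full path.
        count=$(git rev-list HEAD --count --author="$author" -- "$path")
        printf '%s -- %s -- %s\n' "$count" "$path" "$author"
    done | sort -n
}
```

Calling per_file_commit_counts shaun from the top of the repository should then list each tracked file with its commit count for that author.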

2 Comments

How does this work? I'd like to tweak it to get commits per top-level directory in repo, or at least get the full path of each file.
I added a long form of the PowerShell for you. Let me know if you have further questions.
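
On the question of commits per top-level directory: one possible approach, sketched here in shell rather than PowerShell, is to tag each commit in the git log output and count every top-level directory at most once per commit. The names count_top_dirs and commits_per_top_dir are hypothetical, files in the repository root are grouped under "." as an arbitrary choice, and a path that itself looks like "commit <hex>" would be misread as a marker.

```shell
# count_top_dirs: read `git log --name-only --format='commit %H'` output on
# stdin and print "<commits> <top-level dir>", highest count first.
count_top_dirs() {
    awk -F/ '
        /^commit [0-9a-f]+$/ { split("", seen); next }  # new commit: reset
        NF == 0 { next }                                 # skip blank lines
        {
            dir = (NF == 1) ? "." : $1                   # root files go under "."
            if (!(dir in seen)) { seen[dir] = 1; count[dir]++ }
        }
        END { for (d in count) print count[d], d }
    ' | sort -nr
}

# commits_per_top_dir (hypothetical helper): arguments go straight to git log.
commits_per_top_dir() {
    git log --name-only --format='commit %H' "$@" | count_top_dirs
}
```

Something like commits_per_top_dir --author=shaun would then give one line per top-level directory.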
1

I found it would be helpful by adding this git alias to .gitconfig:

# list commit counts by file
cc = "!cd ${GIT_PREFIX:-./}; git log --name-only --format= \"$@\" | sort | uniq -c | sort -nr | head -30 #"
# list commit counts by folder
ccf = "!cd ${GIT_PREFIX:-./}; git log --name-only --format= \"$@\" | rev | cut -d'/' -f2- | rev | sort | uniq -c | sort -nr | head -30 #"

And then you can use the same arguments as git log, e.g.

git cc --author=hank --since="1 year ago" -- path/to/some/folder
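
In the ccf alias, the rev | cut -d'/' -f2- | rev chain strips the last path component so that files are grouped by their containing folder. As a sketch of the same step without rev (which is not available everywhere), a sed substitution appears to behave the same way; parent_dir_of and commits_per_folder are names invented for this example. In both variants a file in the repository root has no slash to cut at, so it stays listed under its own name.

```shell
# parent_dir_of: replace each path on stdin with its containing folder by
# removing the final "/<name>" component; paths without a slash pass through.
parent_dir_of() {
    sed 's|/[^/]*$||'
}

# commits_per_folder (hypothetical helper): like the ccf alias, but as a
# shell function; all arguments are passed on to git log.
commits_per_folder() {
    git log --name-only --format= "$@" | parent_dir_of | sort | uniq -c | sort -nr | head -30
}
```

Like the alias, this counts one hit per changed file, so a commit touching three files in the same folder adds three to that folder's score.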
