“The implication here is that any code committed to a public repository may be accessible forever as long as there is at least one fork of that repository,” the report’s authors claim.
Am I dumb or is this exactly the purpose of forks? I feel like I’m missing something.
I think GitHub keeps all the commits of a repository and its forks in a single pool. So if someone commits a secret to one fork, that commit can be looked up through any of them, even if the fork it was committed to is private or has been deleted, or if no references to the commit exist anymore.
The big issue is discovery. If no one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. GitHub makes this easier for you:
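To make the "single pool" idea concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how GitHub actually stores anything: the ForkNetwork class, the fork names, and the fake hash "abc123" are all made up for illustration.

```python
# Toy model only: one object pool shared by a repository and all of its forks.
# ForkNetwork, the fork names, and the hash "abc123" are hypothetical.
class ForkNetwork:
    def __init__(self):
        self.objects = {}  # commit hash -> content, shared by every fork
        self.forks = {}    # fork name -> {branch name -> commit hash}

    def add_fork(self, name):
        self.forks[name] = {}

    def commit(self, fork, branch, commit_hash, content):
        # The commit lands in the shared pool; the fork only records a ref to it.
        self.objects[commit_hash] = content
        self.forks[fork][branch] = commit_hash

    def delete_fork(self, name):
        # Only the fork's refs go away; the shared pool is untouched.
        del self.forks[name]

    def lookup(self, fork, commit_hash):
        # Any surviving fork can resolve any hash that is in the pool.
        if fork not in self.forks:
            raise KeyError(fork)
        return self.objects.get(commit_hash)


net = ForkNetwork()
net.add_fork("upstream")
net.add_fork("alice-fork")
net.commit("alice-fork", "main", "abc123", "API_KEY=hunter2")
net.delete_fork("alice-fork")

# The fork is gone, but the commit is still reachable through "upstream".
leaked = net.lookup("upstream", "abc123")
print(leaked)
```

In this model, deleting a fork never touches the pool, which is the whole problem: the secret outlives the fork it was pushed to.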
I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.
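For a sense of how much "guessing" is involved: a full SHA-1 commit ID is 40 hex characters, but Git accepts an abbreviated prefix as long as it is unique, and Git's own minimum abbreviation length is 4. Assuming a host resolves short prefixes the same way (an assumption on my part, not something confirmed in this thread), the search space shrinks dramatically. The candidates helper below is just illustrative arithmetic.

```python
# Back-of-the-envelope arithmetic: how many commit-hash prefixes of a given
# length exist? The helper name `candidates` is hypothetical.
HEX_DIGITS = 16  # each character of a commit hash is one hex digit


def candidates(prefix_len):
    """Number of distinct hex prefixes of the given length."""
    return HEX_DIGITS ** prefix_len


# 4 = Git's minimum abbreviation, 7 = a commonly seen short hash, 40 = full SHA-1
for n in (4, 7, 40):
    print(f"{n:>2} hex chars: {candidates(n):,} possible values")
```

Guessing a full 40-character ID is hopeless, but enumerating every 4-character prefix is a small, finite job, which is why anything that resolves or suggests commits from short prefixes matters so much here.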
Thanks, I think that explains it a bit more. It's unexpected to me, as someone who isn't a Git expert, and I'm sure it is to many others.
I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files, and of course commits link to their parents. If a branch gets deleted or reset to an earlier commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, orphaned commits usually stick around for a really long time. Determining whether a commit is orphaned is work that Git usually doesn't bother doing. There's also a reflog that lets you recover lost commits if you make a mistake.
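Here is a small Python sketch of that structure, in case it helps. It is a simplified model, not real Git: the objects and refs dictionaries and the commit, reachable, and orphans helpers are made up for illustration, and real Git hashes the full commit object (tree, author, timestamps, and so on) rather than just a parent and a message.

```python
import hashlib

# Simplified model of Git's storage; all names here are hypothetical.
objects = {}  # commit hash -> (parent hash or None, message)
refs = {}     # branch name -> commit hash (branches are just pointers)


def commit(branch, message):
    """Store a new commit on top of the branch tip and move the branch to it."""
    parent = refs.get(branch)
    commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
    objects[commit_id] = (parent, message)
    refs[branch] = commit_id
    return commit_id


def reachable():
    """Walk parent links from every ref and collect the commits that are found."""
    seen = set()
    for tip in refs.values():
        while tip is not None and tip not in seen:
            seen.add(tip)
            tip = objects[tip][0]
    return seen


def orphans():
    """Commits still in storage that no branch leads to."""
    return set(objects) - reachable()


first = commit("main", "initial commit")
secret = commit("main", "add config with password")

# Move the branch back one commit, like a hard reset followed by a force push.
refs["main"] = first

print(secret in objects)  # the commit object is still stored
print(orphans())          # and nothing points at it anymore
```

Nothing in this model ever deletes from objects, which mirrors the point above: moving a branch only changes a pointer, and the commit stays around until something explicitly cleans it up.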