Git got big files or keys? Break out BFG

by mike fettis (@mikefettis), November 9th, 2018
Everybody messes up. Today's mistake was adding a big file to git before a .gitignore was in place to handle it. As a result, GitHub is rejecting the push, even after "removing" the file from git. The reason is that the file still exists in the git history. Time to clean up the mess, break out BFG, and nuke it from orbit. Sadly, this means Java is involved, but it's a necessary demon. BFG can be found below, and a Java JDK needs to be installed.
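To see why GitHub still rejects the push, here is a minimal sketch that reproduces the problem in a throwaway repo. It assumes git is installed. The temp directory, the `big.bin` name, the 5 MB size and the demo identity are all made up for illustration. The file is committed, then removed with `git rm`, and it still shows up in the history.

```shell
#!/usr/bin/env bash
# Reproduce "removed, but still in history" in a scratch repo.
# Assumes git is installed; the names, size and identity are illustrative.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .

# Throwaway identity so commits work even without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The mistake: a 5 MB file committed before any .gitignore existed.
head -c 5242880 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add data (oops)"

# The "fix": delete the file and ignore it from now on.
git rm -q big.bin
echo "big.bin" > .gitignore
git add .gitignore
git commit -q -m "remove big file, add .gitignore"

# The working tree no longer has it...
[ -e big.bin ] || echo "big.bin is gone from the working tree"
# ...but its blob is still reachable from the first commit.
git rev-list --objects --all | grep ' big.bin$'
```

The last command prints a line of the form `<blob id> big.bin`. The object is still reachable from the first commit, so pushing that branch means uploading it.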


BFG Repo-Cleaner by rtyley (rtyley.github.io): A simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history.

First things first: take a look at BFG Repo-Cleaner. Welcome back; hopefully there was some reading involved. BFG Repo-Cleaner will be used here to clean up the big files, but it can also clean up sensitive data that someone accidentally added to a repo (“cough cough” AWS keys). It does this by rewriting the git history and removing all traces of the file. Like many things in git, sometimes it is better not to explain the wizardry and just dive right in.
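BFG has a dedicated option for each of those two jobs: `--strip-blobs-bigger-than` for big files and `--replace-text` for secrets. Here is a small sketch that wraps them in shell functions. The function names, the 50M threshold and the `secrets-to-remove.txt` file name are hypothetical. It assumes `bfg` is already on the PATH, for example as a wrapper around `java -jar bfg.jar`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around two real BFG options.
# Run them from the folder that contains the repo; assumes `bfg` is on PATH.

strip_big_blobs() {   # usage: strip_big_blobs <repo-dir>
  # Remove every blob over 50 MB from history (the threshold is an example).
  bfg --strip-blobs-bigger-than 50M "$1"
}

scrub_secret() {      # usage: scrub_secret <repo-dir> <leaked-string>
  # --replace-text reads one expression per line; a plain line is matched
  # literally and replaced with ***REMOVED*** throughout history.
  printf '%s\n' "$2" > secrets-to-remove.txt
  bfg --replace-text secrets-to-remove.txt "$1"
  local status=$?
  rm -f secrets-to-remove.txt   # don't leave the secret lying around on disk
  return "$status"
}
```

Scrubbing history does not un-leak a key. Any key that was ever pushed is best treated as compromised and rotated.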

TLDR oh my git… just do this… black magic ensues.
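Here is a sketch of those commands, reconstructed from the step-by-step walkthrough that follows. Several details are assumptions rather than the author's exact script: the `~/bin/bfg` folder, BFG version 1.13.0 from Maven Central, and the helper names `install_bfg` and `nuke_from_history`. Check rtyley.github.io for the current release and download link.

```shell
#!/usr/bin/env bash
# Sketch of the install + clean-up steps; not the author's exact script.
# Assumptions: ~/bin/bfg as the install folder, BFG 1.13.0 from Maven Central,
# and the helper names install_bfg / nuke_from_history.
BFG_VERSION="${BFG_VERSION:-1.13.0}"   # check rtyley.github.io for newer releases
BFG_HOME="${BFG_HOME:-$HOME/bin/bfg}"

install_bfg() {
  mkdir -p "$BFG_HOME" || return
  # Download the versioned jar...
  curl -fsSL -o "$BFG_HOME/bfg-$BFG_VERSION.jar" \
    "https://repo1.maven.org/maven2/com/madgag/bfg/$BFG_VERSION/bfg-$BFG_VERSION.jar" || return
  # ...and point a stable name at it; an upgrade just re-points the symlink.
  ln -sfn "bfg-$BFG_VERSION.jar" "$BFG_HOME/bfg.jar" || return
  # Tiny wrapper so `bfg` works as a command once the folder is on PATH.
  printf '#!/bin/sh\nexec java -jar "%s/bfg.jar" "$@"\n' "$BFG_HOME" > "$BFG_HOME/bfg"
  chmod +x "$BFG_HOME/bfg"
  # Add the folder to PATH in .bash_profile (only once), then source it.
  grep -qsF "$BFG_HOME" "$HOME/.bash_profile" ||
    echo "export PATH=\"$BFG_HOME:\$PATH\"" >> "$HOME/.bash_profile"
  . "$HOME/.bash_profile"
}

nuke_from_history() {  # usage: nuke_from_history <repo-dir> <file-or-glob>
  local repo_dir=$1 pattern=$2
  (cd "$repo_dir" && git gc) || return
  # BFG runs from outside the repo and gets the repo path as an argument.
  bfg --delete-files "$pattern" "$repo_dir" || return
  # Back inside: expire the reflog, then gc the unreferenced objects away.
  (cd "$repo_dir" && git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive)
}
```

Run `install_bfg` once. Then, from the folder that contains the clone, run something like `nuke_from_history my-repo '*.zip'`. Quote the wildcard so the shell doesn't expand it. By default, BFG leaves the files in the latest commit alone, so the file has to be removed from the current commit first, as it was here. Wrapping the `cd` steps in subshells is a choice made in this sketch: it leaves the caller's working directory where it was.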
Welcome back from blindly running commands found on the internet. Everything worked correctly, right? Time to break down what just happened. The prework sets up BFG and loads it into the environment. A folder is created in the home directory to store the jar. The jar is then downloaded and a symlink is pointed at it, so that when a new version comes out the old symlink can simply be deleted and re-pointed. This is not strictly needed, but it certainly helps. Next, the folder is added to the PATH environment variable in the .bash_profile file, and .bash_profile is sourced so the current shell picks up the new PATH. None of this is required, but let's be honest: this is going to happen more than once, and it is better to have it set up for the future.

After that, the repo is cloned (most likely it already exists, so don't worry). Then git garbage collection is run. Next, move out of the directory, since BFG is run against the repo from the folder above it. Fire the BFG, passing in the file or wildcard that should be nuked. Drop back into the folder. Expire the git reference log (reflog), which clears out the references to the old history that BFG left behind. Finally, run git garbage collection again to clean up the rest of the cruft.
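The last two steps are the ones that actually free the space. The sketch below shows why, assuming git is installed. It uses `git commit --amend` in place of BFG, so it runs without Java, to rewrite a big file out of history. A plain `git gc` keeps the blob alive because the reflog still points at the old commit. Running `git reflog expire --expire=now --all` followed by `git gc --prune=now --aggressive` removes it. File names, sizes and the demo identity are made up.

```shell
#!/usr/bin/env bash
# Show that rewritten-away objects survive a plain gc until the reflog expires.
# git commit --amend stands in for BFG's rewrite here; assumes git is installed.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "hello" > README.md
head -c 1048576 /dev/urandom > big.bin          # 1 MB stand-in for the big file
git add README.md big.bin
git commit -q -m "initial commit"
blob=$(git rev-parse HEAD:big.bin)              # object id of the big file

# Rewrite history: replace the commit with one that has no big.bin.
git rm -q big.bin
git commit -q --amend -m "initial commit"

# A plain gc keeps the blob: the reflog still references the old commit.
git gc -q --prune=now
git cat-file -e "$blob" && echo "after plain gc: blob still present"

# Expire every reflog entry, then gc: nothing references the blob any more.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
git cat-file -e "$blob" 2>/dev/null || echo "after reflog expire + gc: blob gone"
```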

That’s that: the files have been removed, and all history of them ever existing has been wiped. This type of process can be especially useful when combined with a git hook and a regex that looks for specific things in files, like keys and whatnot. It can also easily be tied into a Jenkins build pipeline to protect people from themselves. Good luck, and when in doubt, break out the Big “Friggin” Gun.
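As a starting point for the git hook idea, here is a minimal pre-commit hook sketch. It assumes the hook lives at `.git/hooks/pre-commit` and is executable. It only looks for strings shaped like AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits). That regex and the helper names are illustrative; this is not a complete secret scanner.

```shell
#!/bin/sh
# .git/hooks/pre-commit  (make it executable: chmod +x .git/hooks/pre-commit)
# Sketch: refuse a commit whose staged changes add something shaped like an
# AWS access key ID. The AKIA... regex is an illustrative assumption.

# Reads a diff on stdin; succeeds if an added line contains a key-like string.
contains_aws_key() {
  grep -Eq '^[+].*AKIA[0-9A-Z]{16}'
}

check_staged_changes() {
  # -U0: no context lines, so only added (+) and removed (-) lines appear.
  if git diff --cached -U0 | contains_aws_key; then
    echo "pre-commit: staged changes look like they contain an AWS key; commit aborted." >&2
    return 1
  fi
}

check_staged_changes || exit 1
```

`AKIAIOSFODNN7EXAMPLE`, the example key from AWS's documentation, is a handy string for trying it out.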

BONUS: there is a fantastic zine from Julia Evans that talks about some other great git things.


New zine: Oh shit, git! (jvns.ca): Hello! Last week Katie Sylor-Miller and I released a new zine called "Oh shit, Git!". It has a bunch of common git…

Links for everything mentioned:


Removing sensitive data from a repository - User Documentation (help.github.com): If you commit sensitive data, such as a password or SSH key into a Git repository, you can remove it from the history…


BFG Repo-Cleaner by rtyley (rtyley.github.io): A simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history.


Git - git-reflog Documentation (git-scm.com): The "show" subcommand (which is also the default, in the absence of any subcommands) shows the log of the reference…


Git - git-gc Documentation (git-scm.com): If the number of packs exceeds the value of gc.autoPackLimit, then existing packs (except those marked with a .keep file…