Git: scan repositories for secrets using Gitleaks
Gitleaks: scan GitHub repositories for leaked secrets. Run as a Jenkins cronjob and send notifications to Slack.
A leak of confidential data such as RDS keys or passwords into a Git repository, even a private GitHub repository, is a very bad thing, so it's good to check your repositories to know if any developer pushed a commit with such data.
Scanning utilities
To check Git repositories for leaks, at first glance there are a lot of utilities:
Gittyleaks — looks interesting, but the last update was 2 years ago
Repo Supervisor — has a Web UI, uses AWS Lambda, is fully integrated with GitHub, and can be checked later
Truffle Hog — CLI only, looks not bad
Git Hound — a git plugin that can scan only before commits, not remote repositories
Gitrob — the last update was three years ago
Watchtower — looks interesting and even has a Web UI, but there is no pricing information on their website, so it's out of the race
GitGuardian — a really good solution, but overpriced
Gitleaks — CLI only; the one we will use in this post
So, from the list above it's worth trying Truffle Hog and Gitleaks, but I didn't like the Truffle Hog documentation.
Repo Supervisor looks promising too; I will check it in a following post.
From those two:
Gitleaks: just a scanner — give it a URL of a repository, and it will generate a JSON report with findings
Repo Supervisor: can be used in two ways:
just to scan a local directory
scan a remote repository on Pull Request/push/etc
So, for Gitleaks we can create a cron job in Jenkins or Kubernetes that will take a list of repositories to check and then send a report to a Slack channel.
Also, Gitleaks can be used with GitHub Actions (see more here>>>), but not all our developer teams use Actions. Another way could be pre-commit hooks.
Planning
So, for now, let’s try a solution with Jenkins, although there are various ways to run it:
trigger a job with the GitHub Pull Request Builder
trigger a job via the GitHub hook trigger for GITScm polling or Poll SCM
run it just as a cron task
First, we will create a simple job running on a schedule, and then check other solutions.
What do we have in our project:
around 200 Github repositories
around 10 developer teams: backend, frontend, analytics, iOS and Android mobile applications, gaming, and DevOps.
What we can do with Gitleaks:
create a Jenkins job for every team
the job will accept a parameter with a list of the team's repositories
create a dedicated Slack channel for every team
once a day, run the scan and send reports to the corresponding Slack channel
First, let's run Gitleaks manually to see how it works, and then we will do the automation job.
Gitleaks — manual run
Install it. On Arch Linux, it can be installed from the AUR:
$ yay -S gitleaks
GitHub token
Next, we need to create a token to access the GitHub organization's repositories.
Go to your GitHub user's settings and create a token:
Give it repo permissions:
And run Gitleaks with the token and a repository's URL, add --verbose, and save the results to a file:
$ gitleaks --access-token=ghp_C6h***3z5 --repo-url=https://github.com/example/BetterBI --verbose --report=analytics-repo.json
…
INFO[0036] scan time: 32 seconds 756 milliseconds 672 microseconds
INFO[0036] commits scanned: 1893
WARN[0036] leaks found: 111
Check the report:
$ less analytics-repo.json
And an example from the findings:
...
{
"line": " "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADA***CCaM=\n-----END PRIVATE KEY-----\n\",",
"lineNumber": 5,
"offender": "-----BEGIN PRIVATE KEY-----",
"offenderEntropy": -1,
"commit": "0f047f0cca3994b3465821ef133dbd3c8b55ee7a",
"repo": "BetterBI",
"repoURL": "https://github.com/example/BetterBI",
"leakURL": "https://github.com/example/BetterBI/blob/0f047f0cca3994b3465821ef133dbd3c8b55ee7a/adslib/roas_automation/example-service-account.json#L5",
"rule": "Asymmetric Private Key",
"commitMessage": "DT-657 update script for subs and add test\n\nDT-657 create test for check new json (add new subs)",
"author": "username",
"email": "example@users.noreply.github.com",
"file": "adslib/roas_automation/example-service-account.json",
"date": "2021-05-11T19:46:46+03:00",
"tags": "key, AsymmetricPrivateKey"
},
...
Here:
line: what exactly was found
offender: the string that triggered the rule
commit: the ID of the commit with the secret
The findings are matched with the regular expressions described in the default.go.
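If needed, the JSON report can be post-processed later, for example to print a short summary of the findings. A minimal Groovy sketch (not part of the job, just an illustration using the report file from the run above):
import groovy.json.JsonSlurper

// Gitleaks v7 writes the report as a JSON array of findings, as shown above
def findings = new JsonSlurper().parse(new File('analytics-repo.json'))

println "Total leaks: ${findings.size()}"

// group the findings by the triggered rule
findings.groupBy { it.rule }.each { rule, items ->
    println "${rule}: ${items.size()}"
}

// and list the affected files without duplicates
findings.collect { it.file }.unique().each { file ->
    println "  ${file}"
}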
Also, you can create your own configuration file and pass it to Gitleaks.
For example, the private RSA key above was found by the Asymmetric Private Key rule:
[[rules]]
description = "Asymmetric Private Key"
regex = '''-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----'''
tags = ["key", "AsymmetricPrivateKey"]
So, we can create a dedicated config file for each team or repository and pass them via Kubernetes ConfigMap or as a file in a Jenkins job.
Jenkins job
Now that we've seen how Gitleaks can be started, let's add a Jenkins job to run it periodically.
Pipeline script
So, for each team we will create a dedicated Jenkins job that will have a parameter with the team's repositories list.
Loops in Groovy
Some time ago I did a similar solution using Golang (check the Go: checking public repositories list in Github. Go slices comparison post for details), and there it was a bit simpler to loop over the list. With Groovy, I had to google a bit.
Create a new Jenkins job, and set its type to the Pipeline:
In the job's settings, create a string parameter with a list of the team's repositories; here only two are used:
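Alternatively, the parameter can be defined from the pipeline script itself with the properties() step; a minimal sketch, with the hypothetical repo-one,repo-two as the default value:
// The TEAM_REPOS parameter can also be defined from the pipeline:
// the first run adds the parameter to the job, the next runs use its value
properties([
    parameters([
        string(name: 'TEAM_REPOS',
               defaultValue: 'repo-one,repo-two',
               description: 'Comma-separated list of the team repositories')
    ])
])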
Next, go to the Jenkins script.
Set a variable named repos_list that will take the $TEAM_REPOS environment variable and split it into a list of items with the split() method. Then iterate over them with a for loop:
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
for (repo in repos_list) {
println repo
}
}
Run the job:
Jenkins Docker plugin
Our default approach to running Jenkins builds is to use a Docker container, to keep the host's system clean.
Add another parameter with the Password type and save the GitHub token here:
By using the Jenkins Docker Plugin, run a Docker container with Gitleaks and pass the token, the repository URL, and a report file. Note that the report's file name contains the repository's name, and the --entrypoint="" is needed because the image's default entrypoint is the gitleaks binary itself, while the Docker Pipeline plugin needs to run its own commands inside the container:
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
for (repo in repos_list) {
stage("Repository ${repo}") {
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
}
}
}
}
Here, for every repository name from the repos_list, we create a dedicated Jenkins Pipeline stage that also uses the repository's name.
Run and check:
Um… And here is an issue: the scan stops right after the first findings in the first scanned repository, as Gitleaks found a leak, returned exit code 1, and the job was immediately stopped:
Ignoring errors in a Jenkins stage{}
To solve it, we can use a try/catch solution: each stage will run in its own try block, in case of errors we will catch them with catch, and the build will proceed:
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
def build_ok = true
for (repo in repos_list) {
try {
stage("Repository ${repo}") {
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
}
}
} catch(e) {
currentBuild.result = 'FAILURE'
}
}
}
Run it:
Good: now all stages run regardless of a previous stage's result.
Slack notifications from Jenkins
The next step for us is to configure sending alarms to a Slack workspace.
Let’s use the Slack Notification plugin for this. See its documentation here>>>.
Create a Slack Bot
Go to the Slack Apps, and create a new application:
Go to the Permissions:
Add the following:
files:write
chat:write
Go to the OAuth & Permissions, and install the bot to the Slack workspace:
Save the token:
Jenkins credentials
Add the token to Jenkins. Go to Manage Jenkins > Manage Credentials:
Add a new one:
Set its type to the Secret file:
In the Slack workspace, create a new channel:
Invite the bot to the channel:
Add a new function to the Jenkins script, notifySlack(), and call it from the catch{} block to send alarms if any secrets were found during the scan:
def notifySlack(String buildStatus = 'STARTED') {
// Build status of null means success.
buildStatus = buildStatus ?: 'SUCCESS'
def color
// change for another Slack channel
def token = 'gitleaks-slack-bot'
if (buildStatus == 'STARTED') {
color = '#D4DADF'
} else if (buildStatus == 'SUCCESS') {
color = '#BDFFC3'
} else if (buildStatus == 'UNSTABLE') {
color = '#FFFE89'
} else {
color = '#FF9FA1'
}
def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
}
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
for (repo in repos_list) {
try {
stage("Repository ${repo}") {
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
}
}
} catch(e) {
currentBuild.result = 'FAILURE'
notifySlack(currentBuild.result)
}
}
}
Run the build, and get the following error:
12:42:39 ERROR: Slack notification failed. See Jenkins logs for details.
Check the Jenkins logs at https://<JENKINS_URL>/log/all:
jenkins.plugins.slack.StandardSlackService postToSlack Response Code: 404
Go to Manage Jenkins > Configure System, find the Slack plugin's options, and set the Custom slack app bot user:
The credentials here are the default ones; we are overriding them from the pipeline.
In the Advanced section, remove the Override URL if it was set:
Run again and now everything is working:
File upload to Slack
Now, let's add the upload of the report file with the findings to the message in the Slack channel by using the slackUploadFile() step:
def notifySlack(String buildStatus = 'STARTED', reportFile) {
// Build status of null means success.
buildStatus = buildStatus ?: 'SUCCESS'
def color
// change for another Slack channel
def token = 'gitleaks-slack-bot'
if (buildStatus == 'STARTED') {
color = '#D4DADF'
} else if (buildStatus == 'SUCCESS') {
color = '#BDFFC3'
} else if (buildStatus == 'UNSTABLE') {
color = '#FFFE89'
} else {
color = '#FF9FA1'
}
def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
slackUploadFile(credentialId: token, channel: "#devops-alarms-gitleaks-analytics", filePath: "${reportFile}")
}
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
for (repo in repos_list) {
try {
stage("Repository ${repo}") {
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
}
}
} catch(e) {
currentBuild.result = 'FAILURE'
notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
}
}
}
The channel here can be moved to a job parameter later, as shown in the sketch below.
Here, in the notifySlack() we've added another parameter, reportFile, and during the notifySlack() call we pass the report's file name as the second argument to the function.
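For example, the channel could become a third argument of notifySlack(), taken from a job parameter. A minimal sketch, assuming a SLACK_CHANNEL string parameter is added to the job (only two colors are kept to keep it short):
def notifySlack(String buildStatus, reportFile, channel) {
    // null means the build finished without errors
    buildStatus = buildStatus ?: 'SUCCESS'
    def color = (buildStatus == 'SUCCESS') ? '#BDFFC3' : '#FF9FA1'
    def token = 'gitleaks-slack-bot'
    def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
    slackSend(color: color, message: msg, tokenCredentialId: token, channel: channel)
    slackUploadFile(credentialId: token, channel: channel, filePath: "${reportFile}")
}

// and in the catch{} block:
// notifySlack(currentBuild.result, "analytics-${repo}-repo.json", env.SLACK_CHANNEL)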
Run the job, and check the Slack channel:
And the final thing is to set a schedule to run the job:
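The schedule can be set in the job's Build Triggers section; alternatively, it can be defined from the pipeline itself. A sketch with the cron trigger (Jenkins cron syntax; H spreads the exact minute within the hour):
// run the job every day around 12:00, as planned above
properties([
    pipelineTriggers([cron('H 12 * * *')])
])
// note: properties() overrides previously set job properties, so if parameters
// are also defined from the pipeline, put them into the same properties() call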
Gitleaks configuration
Commits to check
At this moment, Gitleaks performs a full scan of the repository: all commits, all history.
If we run it every day, each day we will get messages about the same old problematic commits.
As a way to mitigate it, we can create two jobs: in the first job we will do a full scan, and in the second one perform a kind of incremental scan for changes made during the last 24 hours.
I.e. the “incremental” job will run daily at 12:00 when all developers are in the office, and will check only the commits from the last day.
To do so, Gitleaks has the --commit-since option. Let's add a new variable called yesterday with yesterday's date taken with the previous() method of the Date class, and then pass this date to the --commit-since option:
...
node('master') {
def repos_list = "${env.TEAM_REPOS}".split(',')
def yesterday = new Date().previous().format('yyyy-MM-dd')
println yesterday
for (repo in repos_list) {
try {
stage("Repository ${repo}") {
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday}"
}
}
} catch(e) {
currentBuild.result = 'FAILURE'
notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
}
}
}
Gitleaks configuration file
Another thing is to create a dedicated rules file for Gitleaks.
This can be done with the --repo-config-path option, and each repository can have its own configuration file.
Add some default rules there, plus I’d like to check for passwords passed as plaintext to commits:
...
[[rules]]
description = "Plaintext password"
regex = '''(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']'''
tags = ["password", "PlainTextPassword"]
[allowlist]
description = "Allowlisted files"
files = ['''^\.?gitleaks.config$''']
With the (?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|'] regular expression, we are looking (case-insensitively) for a string starting with pass followed by a few more letters, then an optional ":" or "=" symbol, then one or more spaces, then a quote mark, any text, and a quote mark again.
Seems to be working:
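As a quick local sanity check (just a sketch, not part of the job), the regex can be tried against a few sample lines in Groovy:
// the rule's regular expression as a Groovy slashy string
def pattern = /(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']/

def samples = [
    'password: "SuperSecret123"',  // matches: ':', a space, and a quoted value
    "PASSWORD= 'qwerty'",          // matches: case-insensitive, '=' and a quoted value
    'password: secret'             // no match: the value is not quoted
]

samples.each { line ->
    println "${(line =~ pattern).find() ? 'LEAK ' : 'clean'}: ${line}"
}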
Save it to the repository as .github/gitleaks.config, and in the job add another option to use this file:
...
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday} --repo-config-path=.github/gitleaks.config"
}
...
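Alternatively, if you'd rather keep the rules on the Jenkins side instead of committing them to each repository, the config can be written from the job itself and passed as a local file. A sketch only, assuming Gitleaks v7 also accepts a --config-path option for a config that is not stored in the scanned repository, and reusing the GITHUB_TOKEN and repo variables from the loop above:
// write a per-team rules file into the job's workspace
writeFile file: 'gitleaks-team.toml', text: '''
[[rules]]
description = "Asymmetric Private Key"
regex = "-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----"
tags = ["key", "AsymmetricPrivateKey"]
'''

// and pass it to Gitleaks instead of --repo-config-path
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --config-path=gitleaks-team.toml"
}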
That’s all for now.
Originally published at RTFM: Linux, DevOps, and system administration.