Hosting my Git Repositories
For a long while now it has become more and more evident that you simply can't trust companies to host your data, even when you are paying for the service. The rise of large language models trained on hosted content has only made this more obvious.
Additionally, the products in the Git hosting space are less than ideal in my opinion. Instead of focusing on making Git hosting as good as it possibly could be, they are focused on building AI platforms that integrate with all your repositories, which I have zero interest in when I am paying for Git hosting. Beyond that, these companies are built on a software development model that is clearly less than ideal, and they have no interest in rethinking that model.
For these reasons and more I have decided it probably makes the most sense to simply host all my Git repositories myself.
The following are my notes on how I accomplished this, so that anyone who wants to do the same can hopefully follow along.
What we need
Basically, we need a server to host the repositories and run the software. You can of course pay a VPS provider like Digital Ocean, or you can run your own server at your house or wherever you want. Note: if you run it at your house you will likely need to set up some sort of Dynamic DNS. I believe Cloudflare provides this as a service, but I am sure there are others.
Then you need to have an operating system. I personally prefer Arch Linux so I have a server running Arch Linux.
We are going to be hosting our Git repositories over the SSH protocol with the help of a tool called Gitolite. Gitolite facilitates authenticating users and managing access to the Git repositories over SSH public/private key pairs.
The Setup
This portion assumes you already have an Arch Linux system setup for your server.
Get SSH Daemon Setup
First thing we need to do on that server is install the openssh package so
that we have the ssh daemon available.
sudo pacman -Sy openssh
Next we want to start the ssh daemon.
sudo systemctl start sshd
Then we want to enable it so that it starts up at boot.
sudo systemctl enable sshd
We can then verify it is running with the following.
systemctl status sshd
Then I configured sshd by editing /etc/ssh/sshd_config so that the
following options are set. Explanations of what they do are in the comments
above each option below. This is to harden sshd, as it may be exposed to the
internet directly and needs to be as secure as possible.
# Enable public key authentication
PublickeyAuthentication yes
# Tell it where to find the file containing authorized keys
AuthorizedKeysFile .ssh/authorized_keys
# Prevent password authentication
PasswordAuthentication no
# Prevent challenge response auth
ChallengeResponseAuthentication no
# Disable PAM authentication
UsePAM no
# Prevent root login
PermitRootLogin no
# Prevent keyboard interactive auth
KbdInteractiveAuthentication no
# Don't allow any empty passwords
PermitEmptyPasswords no
# Restrict allowed users
AllowUsers youruser gitolite
# Strong key exchange algorithms
KexAlgorithms curve25519-sha256,[email protected]
# Strong ciphers
Ciphers [email protected],[email protected]
# Strong MACs
MACs [email protected],[email protected]
# Only allow modern host keys
HostKeyAlgorithms ssh-ed25519
# Disable agent & TCP forwarding unless you need it
AllowAgentForwarding no
AllowTcpForwarding no
X11Forwarding no
# Disable tunneling unless needed
PermitTunnel no
# Disable rhosts-style trust
IgnoreRhosts yes
HostbasedAuthentication no
# Limit auth attempts
MaxAuthTries 3
# Limit concurrent connections
MaxSessions 2
MaxStartups 10:30:60
# Drop idle connections
ClientAliveInterval 300
ClientAliveCountMax 2
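Before restarting, it is worth validating the configuration, since a syntax error here can lock you out of the server. sshd has a built-in test mode for exactly this:

```shell
# Check /etc/ssh/sshd_config for syntax errors.
# Prints nothing and exits 0 on success; reports the bad line otherwise.
sudo sshd -t
```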
Then I restarted sshd to have the changes take effect.
sudo systemctl restart sshd
After this I simply scp'd my public ssh key over to the server and, on the
server, appended it to the authorized_keys file as follows so that I could
easily ssh over to the server and manage things.
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
Then I made sure the files had the correct permissions and ownership (replace user with your actual username).
chmod 600 ~/.ssh/authorized_keys
chown -R user:user ~/.ssh
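Since password authentication is now disabled, it is worth confirming from a second terminal (before closing your current session) that key-only login actually works. A quick check, with youruser and your-server standing in for your actual user and host:

```shell
# Should print "ok" without ever prompting for a password.
# BatchMode makes ssh fail immediately instead of falling back to a prompt.
ssh -o BatchMode=yes youruser@your-server echo ok
```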
Install & Setup Gitolite
First I installed the dependencies as follows.
sudo pacman -Sy base-devel git
Then I installed Gitolite as follows.
sudo pacman -Sy gitolite
The above created the gitolite user on the system and set up
/var/lib/gitolite as its home directory.
Then I simply needed to initialize Gitolite. In order to do this I ran the following.
sudo -u gitolite gitolite setup -pk path/to/my/public/ssh/key
This set up the gitolite-admin repository as well as a testing repository.
The gitolite-admin repository is how you configure and manage Gitolite. So
you simply clone it and make changes to the configs in there and commit and
push it up. It then handles creating repositories appropriately for you with
the configured access controls.
This supports what you need for private repositories, as well as anonymous (unauthenticated) repositories, which are useful for open source.
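Day-to-day management then looks roughly like the following. The repo name myproject, the key name yourkey, and the host your-server are placeholders for whatever you actually use:

```shell
# Clone the admin repository (authenticates with your ssh key)
git clone gitolite@your-server:gitolite-admin
cd gitolite-admin

# Declare a new repository in conf/gitolite.conf, e.g.:
#   repo myproject
#       RW+ = yourkey
#
# Then commit and push; Gitolite creates the repository on the server.
git add conf/gitolite.conf
git commit -m "Add myproject"
git push

# You can list the repositories your key has access to with:
ssh gitolite@your-server info
```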
What about Off-site Backups
One side benefit I was getting from the services I was leaving was off-site backups. So I figured I might as well address that in my setup as well.
It turns out there is a pretty great backup tool called Restic that integrates with Backblaze B2, letting you configure backups to be stored in a Backblaze B2 bucket pretty easily.
First I had to decide what I wanted to backup. In my case I simply wanted the following.
/var/lib/gitolite/repositories
/var/lib/gitolite/.gitolite
Then I installed Restic as follows.
sudo pacman -Sy restic
Then I had to go and create my Backblaze B2 bucket and an Application Key. I noted down the bucket name, the key id, and the key itself, and did the following.
sudo mkdir -p /etc/restic
sudo nvim /etc/restic/gitolite.env
I then filled /etc/restic/gitolite.env with the following content.
export RESTIC_REPOSITORY="b2:BUCKET_NAME:/gitolite"
export RESTIC_PASSWORD="use-a-long-random-passphrase"
export B2_ACCOUNT_ID="your-application-key-id"
export B2_ACCOUNT_KEY="your-application-key"
Then I locked down the permissions on that file as follows.
sudo chmod 600 /etc/restic/gitolite.env
sudo chown root:root /etc/restic/gitolite.env
Then I initialized the restic repository.
sudo -i
source /etc/restic/gitolite.env
restic init
Then I ran a manual backup to test things out.
restic backup \
/var/lib/gitolite/repositories \
/var/lib/gitolite/.gitolite
restic snapshots
Then I added an exclusion file to exclude unnecessary files.
nvim /etc/restic/gitolite.exclude
and gave it the following content.
**/tmp
**/*.lock
**/lost+found
Then I tested manual backup again with it.
restic backup \
--exclude-file /etc/restic/gitolite.exclude \
/var/lib/gitolite/repositories \
/var/lib/gitolite/.gitolite
Then I came up with a simple retention policy and tested it as follows.
restic forget \
--keep-daily 14 \
--keep-weekly 8 \
--keep-monthly 12 \
--prune
This removes old snapshots, etc. based on the retention policy.
Then I created a backup script so that I could automate this.
nvim /usr/local/bin/restic-gitolite-backup.sh
with the following content.
#!/usr/bin/env bash
set -euo pipefail
source /etc/restic/gitolite.env
restic backup \
--exclude-file /etc/restic/gitolite.exclude \
/var/lib/gitolite/repositories \
/var/lib/gitolite/.gitolite
restic forget \
--keep-daily 14 \
--keep-weekly 8 \
--keep-monthly 12 \
--prune
Then I made it executable as follows.
chmod +x /usr/local/bin/restic-gitolite-backup.sh
Then I registered it as a systemd service by creating the following.
nvim /etc/systemd/system/restic-gitolite-backup.service
with the following content.
[Unit]
Description=Restic backup of Gitolite repositories
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/restic-gitolite-backup.sh
Then I created a systemd timer to run it automatically.
nvim /etc/systemd/system/restic-gitolite-backup.timer
with the following content.
[Unit]
Description=Daily Restic Gitolite backup
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
Then I simply enabled it as follows.
systemctl daemon-reload
systemctl enable --now restic-gitolite-backup.timer
And verified that it was properly registered and running with the following.
systemctl list-timers
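You can also trigger the backup service once by hand and watch its logs, which is a nice sanity check before trusting the timer:

```shell
# Run the oneshot service immediately and inspect its output
sudo systemctl start restic-gitolite-backup.service
sudo journalctl -u restic-gitolite-backup.service -n 50
```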
Finally I tested the backups to make sure I could restore from them as follows.
restic snapshots
restic restore latest --target /tmp/restore-test
In theory you should be able to restore an individual repo as follows, but I have not tested this yet.
restic restore latest \
--include /var/lib/gitolite/repositories/myrepo.git \
--target /tmp/restore-test
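Beyond restoring, restic can also verify the repository's integrity, which I'd suggest running occasionally (as root, with the env file sourced as before):

```shell
source /etc/restic/gitolite.env

# Verify the repository structure and metadata
restic check

# Optionally also re-read a portion of the actual data blobs
restic check --read-data-subset=10%
```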
OpenSource Repositories
I also wanted to host anonymous repositories over the git:// protocol, which
doesn't facilitate any authentication. I do this via git-daemon, which ships
with git. It also has the benefit of playing nicely with Gitolite and
therefore allows sharing the same repository store.
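One detail worth knowing: git-daemon only serves repositories that contain a git-daemon-export-ok file. Gitolite can create that marker for you when you grant read access to the special daemon user in conf/gitolite.conf (you may also need to uncomment 'daemon' in the ENABLE list of ~/.gitolite/.gitolite.rc). The repo and key names below are placeholders:

```
repo myproject
    RW+ = yourkey
    R   = daemon
```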
To do this I created a systemd service description for it.
/etc/systemd/system/git-daemon.service
with the content
[Unit]
Description=Git Daemon (anonymous read-only)
After=network.target
[Service]
User=gitolite
Group=gitolite
ExecStart=/usr/lib/git-core/git-daemon \
--reuseaddr \
--base-path=/var/lib/gitolite/repositories \
--disable=receive-pack \
--informative-errors \
--verbose
Restart=on-failure
[Install]
WantedBy=multi-user.target
Then I enabled and started the service as follows.
sudo systemctl daemon-reload
sudo systemctl enable git-daemon
sudo systemctl start git-daemon
sudo systemctl status git-daemon
Then I simply exposed the git-daemon port 9418 in the firewall so that
people outside the network can access the open source repositories.
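How you open the port depends on your firewall; assuming ufw purely as an example, and with your-server and myrepo as placeholder names:

```shell
# Allow the git-daemon port through the firewall (ufw shown as one option)
sudo ufw allow 9418/tcp

# From another machine, an anonymous clone should now work:
git clone git://your-server/myrepo.git
```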
Dynamic DNS
I am using Cloudflare for DNS zone hosting. So I whipped up the following little
script, ~/bin/update_dns_record_with_public_ip, to dynamically set the IP
address.
#!/usr/bin/env bash
set -euo pipefail
ZONE_ID=
RECORD_ID=
CLOUDFLARE_API_TOKEN=
SUB_DOMAIN=
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
LOGFILE="$HOME/logs/cloudflare-ddns.log"
# Fetch the RECORD_ID of the subdomain. This only has to be done initially to get the value for RECORD_ID
# So just uncomment this, save the file, and run the script to get the RECORD_ID. Then simply update the
# script with the RECORD_ID value, recomment these lines, and save the script and you should be good to go.
#
# curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?name=$SUB_DOMAIN" \
# -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"
# exit 0
# Fetch the public IP address using ifconfig.me
IP_ADDRESS=$(curl -s ifconfig.me)
# Get the IP address Cloudflare currently has for the subdomain
DNS_IP=$(curl -s \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
-H "Content-Type: application/json" \
"https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
| jq -r '.result.content')
# Compare the existing DNS record's IP addr to the current public IP addr
if [[ "$IP_ADDRESS" == "$DNS_IP" ]]; then
echo "$TIMESTAMP - IP unchanged ($IP_ADDRESS). No update needed." >> "$LOGFILE"
exit 0
fi
RESPONSE=$(curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
-H "Content-Type: application/json" \
--data "{
\"type\": \"A\",
\"name\": \"$SUB_DOMAIN\",
\"content\": \"$IP_ADDRESS\",
\"ttl\": 120,
\"proxied\": false
}")
# Check success
SUCCESS=$(echo "$RESPONSE" | jq -r '.success')
if [[ "$SUCCESS" == "true" ]]; then
echo "$TIMESTAMP - DNS update successful ($IP_ADDRESS)" >> "$LOGFILE"
else
ERR_MSG=$(echo "$RESPONSE" | jq -r '.errors[]?.message')
echo "$TIMESTAMP - DNS update failed: $ERR_MSG" >> "$LOGFILE"
fi
This script expects your user's home directory to also contain a logs directory
for it to write its logs to.
This script fetches the public IP address using ifconfig.me and then uses the Cloudflare API to update the subdomain.
In order for this to work you need to create an API Token that is set up with permission to edit the zone's DNS records. This can be done from Cloudflare -> Avatar -> Profile -> API Tokens.
Then to use this you obviously would need to fill in the variables at the top of the script. The ZONE_ID can be obtained by selecting the zone in the Cloudflare UI and looking in the right sidebar for the API section and grabbing the zone id value. The RECORD_ID I believe is only available by using the API call that is commented out near the top of the script.
Once I had the script set up I was able to test it by just running it. But I don't want to do this manually; I want it to run automatically.
So I installed cronie with the following.
sudo pacman -Sy cronie vi
mkdir -p ~/.cache
sudo systemctl enable cronie.service
sudo systemctl start cronie.service
Then I edited my user's crontab via crontab -e and gave it the following
content.
*/5 * * * * /home/YOUR-USERNAME/bin/update_dns_record_with_public_ip >> /home/YOUR-USERNAME/logs/cloudflare-ddns-cron.log 2>&1
0 0 * * * truncate -s 0 /home/YOUR-USERNAME/logs/cloudflare-ddns.log
The above registers two cron jobs. The first one runs the script every 5 mins to make sure that the DNS record is up to date. The second one runs once every day and truncates the log file so it does not get out of hand.
I then waited and checked the log files to make sure everything was working as expected.
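A quick external check is to ask DNS directly and compare the answer against your actual public IP; sub.example.com is a placeholder for your record:

```shell
# What the world currently sees for your record
dig +short sub.example.com

# Your actual public IP; the two should match
curl -s ifconfig.me
```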
Also, don't forget that you will likely need to modify your router configuration to forward port 22 (and port 9418 if you are hosting anonymous repositories) from the outside of your network to your server's internal IP address. That is, if you aren't using a hosted VPS. If you are using a hosted VPS you shouldn't need this entire Dynamic DNS section at all, as you generally get a static IP address.
What about PRs, comments, etc.
Personally, I don't want or need any of those things. There are plenty of tools that can facilitate those kinds of interactions without the Git hosting needing to provide them. The most basic one, for open source repositories at least, is the old workflow of patches sent over email.
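For the curious, that email-based flow is built into git itself. A rough sketch (the addresses and patch names are placeholders, and git send-email needs to be configured with an SMTP server first):

```shell
# Contributor: turn the last commit into a patch file
git format-patch -1 HEAD

# ...and mail it to the maintainer
git send-email --to=maintainer@example.com 0001-*.patch

# Maintainer: apply a received patch, preserving authorship
git am < 0001-some-change.patch
```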