Write Your Own CI/CD pipeline, Part II

In the first part of this post we set up a basic Git webhooks receiver, but it has some rough edges and shortcomings, which we’ll try to address here.

NOTE: you can check the source code for this post at juankman94/githooks-receiver.

SSL

The easiest way to add SSL support to our application is to NOT add SSL support to our application. What I mean is that we won’t add code to support it, but rather rely on an external application to provide SSL for us. How you do this depends on your stack: it can be done via httpd(8), nginx(8), relayd(8), etc.

I like httpd, so that’s what I’ll show here. The first thing we need to do is set up SSL for Apache and then use it to proxy the requests to our application. The configuration for that is:

<VirtualHost *:443>
  #...

  # see https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypass
  <LocationMatch "^/repo/(?<repo>[^/]+)$">
    AllowMethods POST
    ProxyPass "http://localhost:8008"
  </LocationMatch>

  SSLEngine On
  #...
</VirtualHost>

And that’s it! Be sure to check the proxy documentation for more details and options.

If you have SELinux enabled (see selinux(8)), you need to give httpd(8) permission to connect to other network services. The appropriate boolean is httpd_can_network_connect; to make the change permanent, run:

$ sudo setsebool -P httpd_can_network_connect=1

Logging

According to the documentation, Sinatra relies on the Rack handler’s logging settings, so there are several ways to configure it. I went for the simplest approach I saw:

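#...
require "fileutils"
require "logger"

LOGGER_FILE = "log/hooks.log"

# Logger.new fails if the log directory does not exist yet
FileUtils.mkdir_p(File.dirname(LOGGER_FILE))

# only INFO and above are written; DEBUG messages are discarded
logger = Logger.new(LOGGER_FILE, level: Logger::INFO)
# point rack.logger at our file so Sinatra's logger helper writes there
before { env["rack.logger"] = logger }
#...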

Now we can log request messages from our application like this:

logger.info "[#{ params["repo"] }] received event"
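To see what level: Logger::INFO does in isolation, here is a minimal standalone sketch using only Ruby’s standard library. It logs to an in-memory StringIO instead of log/hooks.log, and the repository name is a made-up placeholder:

```ruby
require "logger"
require "stringio"

# Log to memory instead of log/hooks.log so the sketch runs anywhere.
io = StringIO.new
logger = Logger.new(io, level: Logger::INFO)

repo = "my-repo" # hypothetical repository name

logger.debug "[#{ repo }] payload details" # below INFO: discarded
logger.info  "[#{ repo }] received event"  # INFO: written

puts io.string
```

Messages below the configured level are dropped silently, so temporarily switching to Logger::DEBUG is a quick way to get more detail while debugging a hook.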

Execution Environment

Handling external processes through Ruby core’s Process module feels a bit cumbersome, so I looked for an alternative in markets/awesome-ruby and found posix-spawn, which provides a more efficient way to spawn new processes. What caught my attention, though, is that you can both provide custom environment variables to the child process AND restrict its access to the parent’s environment variables.

Let’s add 'posix-spawn' to our Gemfile and update app.rb as follows:

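require "posix/spawn"

# ...
  if ALLOWED_REPOS.include?(repo)
    data = JSON.parse request.body.read, symbolize_names: true
    # allowlist: the only variables the deployment script will see
    cmd_env = {
      "HOME" => ENV["HOME"],
      # prefer the PATH from before `bundle exec` modified it
      "PATH" => ENV["BUNDLER_ORIG_PATH"] || ENV["PATH"],
      "PWD"  => ENV["PWD"],
      "USER" => ENV["USER"],
    }
    cmd_options = {
      # drop every other variable inherited from the parent
      unsetenv_others: true,
    }
    logger.info "[#{ repo }] received event"

    # start the deployment command without waiting for it to finish;
    # detach so the child is reaped and does not linger as a zombie
    pid = POSIX::Spawn.spawn(cmd_env, cmd, data[:checkout_sha], cmd_options)
    Process.detach(pid)
  else
# ...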

Only a subset of the environment variables is passed down to the child process, for two reasons: first, better security; second, whenever bundle starts a program (like receiver) it sets a bunch of environment variables in the execution context, some for ruby(1), others for gem, and many more for bundle itself (all starting with BUNDLE_ or BUNDLER_).

So if those variables leaked into the child, a deployment script calling bundle would not use the system’s installation but receiver’s bundle installation (wherever BUNDLE_PATH points at the moment of execution) AND receiver’s own Gemfile (read via the BUNDLE_GEMFILE environment variable). For example, if you run:

$ bundle install --path vendor/bundle
$ bundle exec ruby app.rb --port 8008
# $BUNDLE_GEMFILE    => $PWD/Gemfile
# $BUNDLER_ORIG_PATH => $PATH
# $PATH              => $PWD/vendor/bundle/ruby/<ruby-version>/bin:$PATH

The nuance here is that a deployment script that runs bundle install would use receiver’s bundle to install receiver’s dependencies, not the repo’s dependencies!
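To see the effect of unsetenv_others in isolation, here is a minimal sketch. It uses Ruby core’s Process.spawn rather than posix-spawn, only so it runs without the gem; posix-spawn’s spawn is meant to follow the same interface, so the options should carry over. FOO_SECRET is a made-up variable standing in for anything in the parent’s environment (BUNDLE_GEMFILE, credentials, and so on) that the child should not see:

```ruby
require "rbconfig"

# Hypothetical variable the child must NOT inherit.
ENV["FOO_SECRET"] = "s3cr3t"

# Allowlisted environment, in the spirit of cmd_env in app.rb.
child_env = {
  "HOME" => ENV["HOME"],
  "PATH" => ENV["BUNDLER_ORIG_PATH"] || ENV["PATH"],
}

reader, writer = IO.pipe
# Run the current Ruby interpreter (absolute path, so no PATH lookup is
# needed) and have it print the names of the variables it can see.
pid = Process.spawn(child_env, RbConfig.ruby, "-e", "puts ENV.keys.sort",
                    unsetenv_others: true, out: writer)
writer.close
child_keys = reader.read.split("\n")
reader.close
Process.wait(pid)

p child_keys # FOO_SECRET should not appear here
```

Without unsetenv_others: true, the child would see FOO_SECRET (and any BUNDLE_* variables) in addition to the allowlisted ones, because the env hash only adds or overrides variables by default.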

Sinatra & systemd

We also want to set Sinatra’s environment to production, and thankfully we can do that in the unit file (see systemd.service(5); the Environment= directive itself is described in systemd.exec(5)):

# githooks.service
# ...
[Service]
Environment=APP_ENV=production
Type=simple
ExecStart=/opt/githooks/bin/run

Of course, after editing the service file we need to reload the daemon and restart the service:

$ sudo systemctl daemon-reload
$ sudo systemctl restart githooks
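If you want to double-check which environment the app will pick up, here is a hypothetical helper that mirrors, as far as I can tell from Sinatra’s source, how it resolves its environment at startup: APP_ENV first, then RACK_ENV, falling back to development. It is an illustration, not part of Sinatra’s API:

```ruby
# Hypothetical helper (not part of Sinatra) mirroring Sinatra's lookup:
# APP_ENV wins over RACK_ENV; without either, the default is :development.
def sinatra_environment(env = ENV)
  (env["APP_ENV"] || env["RACK_ENV"] || :development).to_sym
end

p sinatra_environment({ "APP_ENV" => "production" })
p sinatra_environment({ "APP_ENV" => "production", "RACK_ENV" => "test" })
p sinatra_environment({})
```

So with Environment=APP_ENV=production in the unit file, settings.production? should return true inside the app, even if RACK_ENV happens to be set elsewhere.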

Authorization

GitLab provides a way to send a secret token with each webhook to validate the authenticity of the received payloads.

Screenshot: GitLab’s webhook secret token field. You can set a different token for each webhook.

It can be anything you want; I decided to use a UUID because it’s easy to generate/replace and random enough. You can get one with the uuidgen(1) command. To keep the token easily replaceable, we need to store it somewhere that is accessible to the program but independent of the service manager (systemd in this case). So we can store it in a file in the application directory (and exclude it from version control):

$ cd ~/receiver
$ echo '.token' >> .gitignore
$ uuidgen > .token
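If you would rather create the token from Ruby than with uuidgen(1), SecureRandom.uuid from the standard library produces a random (version 4) UUID. This minimal sketch writes it into a temporary directory instead of ~/receiver, so it has no side effects, and restricts the file to its owner:

```ruby
require "securerandom"
require "tmpdir"

# Temporary stand-in for ~/receiver so the sketch touches nothing real.
dir = Dir.mktmpdir
token_path = File.join(dir, ".token")

token = SecureRandom.uuid
# The third argument is the permission for a newly created file: 0600
# means only the owner (the user running the service) can read or write it.
File.open(token_path, "w", 0o600) { |f| f.puts token }

puts File.read(token_path).chomp == token
```

Keep in mind that the real file has to live in the service’s working directory, since load_auth_token looks for .token relative to it.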

From the application’s perspective, it can be useful to have more than one way to obtain the token: we can read it from the .token file OR from the execution environment, if we want the token to persist only in memory. The code to obtain the token would look like this:

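# ...
# Resolve the token once, at startup: the environment wins over the file.
def load_auth_token
  if ENV["GITHOOKS_AUTH_TOKEN"]
    ENV["GITHOOKS_AUTH_TOKEN"]
  elsif File.file?(".token")
    # block form closes the file once the first line has been read
    File.open(".token", &:readline).chomp
  else
    raise "Missing token"
  end
end

# Use a constant instead of @token: a top-level instance variable belongs
# to `main`, so inside a route (where self is the Sinatra app) it is nil.
AUTH_TOKEN = load_auth_token

def validate_token(header_token)
  # a missing header (nil) is rejected as well
  raise "Invalid token" if header_token.nil? || header_token != AUTH_TOKEN
end

post "/repo/:repo" do
  begin
    validate_token request.env["HTTP_X_GITLAB_TOKEN"]
    # ...
  rescue => err
    status 401
    resp = { status: "unauthorized: #{ err }", code: 401 }
  end
# ...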

NOTE: you probably DO NOT want to handle exceptions like this; this example only illustrates how one could validate the token. A better approach would be to create a specific error class, e.g., InvalidTokenError, raise it from our validate_token method, and then use rescue InvalidTokenError => err in our block. DO NOT implement the code above verbatim.
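Following the note’s suggestion, here is a minimal, framework-free sketch of the InvalidTokenError approach. handle_hook, validate_token! and secure_equal? are hypothetical names standing in for the Sinatra route and helpers. Comparing SHA-256 digests without an early exit is my own addition, so that response timing doesn’t hint at how much of a guessed token is right; neither GitLab nor Sinatra requires it:

```ruby
require "digest"

# Dedicated error class, as suggested in the note.
class InvalidTokenError < StandardError; end

# Hypothetical helper: compare SHA-256 digests byte by byte without an
# early exit, so timing does not depend on how many characters match.
def secure_equal?(a, b)
  da = Digest::SHA256.digest(a.to_s)
  db = Digest::SHA256.digest(b.to_s)
  diff = 0
  da.bytes.zip(db.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Hypothetical counterpart of validate_token that raises the specific error.
def validate_token!(header_token, expected)
  raise InvalidTokenError, "missing token" if header_token.nil? || header_token.empty?
  raise InvalidTokenError, "invalid token" unless secure_equal?(header_token, expected)
end

# Hypothetical stand-in for the route body; returns [status, body].
def handle_hook(header_token, expected)
  validate_token!(header_token, expected)
  [200, { status: "ok", code: 200 }]
rescue InvalidTokenError => err
  # Only authorization failures become a 401; other errors propagate.
  [401, { status: "unauthorized: #{ err.message }", code: 401 }]
end

expected = "123e4567-e89b-12d3-a456-426614174000" # hypothetical token
p handle_hook(expected, expected)
p handle_hook("wrong", expected)
p handle_hook(nil, expected)
```

Because only InvalidTokenError is rescued, a malformed payload or a bug elsewhere in the route surfaces as a real error instead of being reported as a 401.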

Pipeline reporting

Okay, I haven’t figured this one out. But I’ll be sure to update the post when I do!

TODO: add reporting module.

UPDATE: a previous version of this post suggested running the systemd.service as a regular user, but that causes issues because such user services only run while the user has an active session.