SPARQL Endpoints as Stateless Containers

This post shows how stateless containers can serve as a scalable, cost-effective and secure deployment strategy for hosting SPARQL endpoints. It extends the stain/jena-fuseki Dockerfile to mount either Google Cloud Filestore over NFS as the data volume or, for an even more cost-effective approach to data persistence, a Cloud Storage bucket via GCS FUSE.

Mounting Filestore as a network file system onto a Cloud Run Service running Apache Jena Fuseki

Dockerfile snippet

...

# NFS (FILESTORE) CLOUD RUN SETUP
# Update and install in one layer so a cached, stale package index is never reused.
RUN apt-get update -y && \
    apt-get install -y nfs-common nfs-kernel-server


# Config and data
ENV FUSEKI_BASE /fuseki

...

docker-entrypoint.sh snippet

#!/bin/bash

...

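mkdir -p "$FUSEKI_BASE"
echo "Mounting Cloud Filestore."
# Abort if the mount fails, so Fuseki never starts on an unmounted, non-persistent directory.
mount -o nolock 10.192.202.18:/fuseki "$FUSEKI_BASE" || exit 1
echo "Mounting completed."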

...
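The mount step above hard-codes the Filestore address. As a minimal sketch of one alternative, the address and share name could come from environment variables set on the Cloud Run service. The variable names FILESTORE_IP_ADDRESS and FILE_SHARE_NAME and both helper functions are assumptions made for illustration; they are not part of the original entrypoint.

```shell
#!/bin/bash
# Sketch only: FILESTORE_IP_ADDRESS and FILE_SHARE_NAME are hypothetical
# environment variables, and both functions are illustrative helpers.

# Print "<ip>:/<share>", or fail if either value is missing.
nfs_source() {
  local ip="$1" share="$2"
  if [ -z "$ip" ] || [ -z "$share" ]; then
    echo "Filestore IP address and share name are both required" >&2
    return 1
  fi
  # Strip one leading slash so the result is never "ip://share".
  printf '%s:/%s\n' "$ip" "${share#/}"
}

# Mount the share on the given directory; return non-zero on any failure.
mount_filestore() {
  local target="$1" src
  src="$(nfs_source "${FILESTORE_IP_ADDRESS:-}" "${FILE_SHARE_NAME:-}")" || return 1
  mkdir -p "$target" || return 1
  echo "Mounting Cloud Filestore ($src) on $target."
  mount -o nolock "$src" "$target" || return 1
  echo "Mounting completed."
}

# In the entrypoint this could be called as:
#   mount_filestore "$FUSEKI_BASE" || exit 1
```

Keeping the address out of the image means the same image can be pointed at a different Filestore instance without a rebuild.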

Mounting a Google Cloud Storage bucket as a file system with GCS FUSE onto a Cloud Run Service running Apache Jena Fuseki

Dockerfile snippet

...

# GCS FUSE CLOUD RUN SETUP
# curl is installed explicitly because the key download below depends on it.
RUN set -e; \
    apt-get update -y && apt-get install -y \
    curl \
    gnupg2 \
    tini \
    lsb-release; \
    gcsFuseRepo=gcsfuse-$(lsb_release -c -s); \
    echo "deb http://packages.cloud.google.com/apt $gcsFuseRepo main" | \
    tee /etc/apt/sources.list.d/gcsfuse.list; \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
    apt-key add -; \
    apt-get update; \
    apt-get install -y gcsfuse && apt-get clean


# Config and data
ENV FUSEKI_BASE /fuseki

...

docker-entrypoint.sh snippet

#!/bin/bash

...

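mkdir -p "$FUSEKI_BASE"
echo "Mounting GCS Fuse."
# Abort if the mount fails, so Fuseki never starts on an unmounted, non-persistent directory.
gcsfuse --debug_gcs --debug_fuse gcs_fuse_cloud_run "$FUSEKI_BASE" || exit 1
echo "Mounting completed."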

...
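The gcsfuse call above fixes the bucket name and always enables debug logging. The following is a rough sketch, under stated assumptions, of how the bucket could be read from the environment and the debug flags made optional. GCS_BUCKET, GCSFUSE_DEBUG and the mount_bucket function are hypothetical names introduced here; only the gcsfuse flags themselves come from the snippet above.

```shell
#!/bin/bash
# Sketch only: GCS_BUCKET and GCSFUSE_DEBUG are hypothetical environment
# variables, and mount_bucket is an illustrative helper.

# Mount the bucket named by GCS_BUCKET on the given directory.
mount_bucket() {
  local target="$1"
  if [ -z "${GCS_BUCKET:-}" ] || [ -z "$target" ]; then
    echo "GCS_BUCKET and a mount point are both required" >&2
    return 1
  fi
  # Reuse the positional parameters as the gcsfuse argument list.
  set -- "$GCS_BUCKET" "$target"
  # Prepend the debug flags from the original snippet only when requested.
  if [ "${GCSFUSE_DEBUG:-0}" = "1" ]; then
    set -- --debug_gcs --debug_fuse "$@"
  fi
  mkdir -p "$target" || return 1
  echo "Mounting GCS Fuse."
  gcsfuse "$@" || return 1
  echo "Mounting completed."
}

# In the entrypoint this could be called as:
#   mount_bucket "$FUSEKI_BASE" || exit 1
```

With GCSFUSE_DEBUG unset the mount runs quietly, which may be preferable once the deployment is known to work.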
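
Things to note:
  • Concurrency: Cloud Run can send many concurrent requests to one instance and can scale out to several instances, so decide how many writers the dataset should have. If the data is stored in TDB, Jena's documentation advises that a database be accessed by only one JVM at a time.
  • Multiple services using the same storage, whether a GCS FUSE bucket or a Filestore NFS share: GCS FUSE provides no concurrency control for writes to the same object, so the last write wins.
  • VPC-SC and Access Control Policies can be used to limit access to the SPARQL endpoint.
  • See the stain/jena-docker repository for more information on the jena and jena-fuseki images.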
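Some of the notes above can be expressed as deployment flags. The sketch below wraps a gcloud run deploy call that limits the service to a single instance, restricts ingress to internal traffic and requires authentication. The function name and its three arguments (service name, image, VPC connector) are hypothetical placeholders, and the single-instance limit is one possible answer to the concurrency question rather than a setting taken from the original post. VPC-SC perimeters are configured separately and are not covered here.

```shell
#!/bin/bash
# Sketch only: deploy_fuseki and its arguments are illustrative; pass in
# the real service name, image and (for Filestore) VPC connector.

deploy_fuseki() {
  local service="${1:-}" image="${2:-}" connector="${3:-}"
  if [ -z "$service" ] || [ -z "$image" ]; then
    echo "usage: deploy_fuseki SERVICE IMAGE [VPC_CONNECTOR]" >&2
    return 1
  fi
  # gen2 is the execution environment that supports file system mounts;
  # one instance keeps a single Fuseki process writing to the dataset.
  set -- run deploy "$service" \
    --image "$image" \
    --execution-environment gen2 \
    --max-instances 1 \
    --ingress internal \
    --no-allow-unauthenticated
  # Filestore is reached over the VPC, so the NFS variant needs a connector.
  if [ -n "$connector" ]; then
    set -- "$@" --vpc-connector "$connector"
  fi
  gcloud "$@"
}

# Example call with placeholder values:
#   deploy_fuseki fuseki gcr.io/my-project/fuseki my-connector
```

For the GCS FUSE variant the connector argument can be omitted, since the bucket is reached without going through the VPC.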