SPARQL Endpoints as Stateless Containers
This post looks at stateless containers as a scalable, cost-effective and secure deployment strategy for hosting SPARQL endpoints. It extends the stain/jena-fuseki Dockerfile to persist data on either a Google Cloud Filestore share mounted over NFS or, for an even more cost-effective option, a Cloud Storage bucket mounted with GCS FUSE.
Mounting Filestore as a network file system onto a Cloud Run Service running Apache Jena Fuseki
Dockerfile snippet
...
# NFS (FILESTORE) CLOUD RUN SETUP
# Update and install in one layer so the package index is never stale.
RUN apt-get update -y && \
    apt-get install -y nfs-common nfs-kernel-server
# Config and data
ENV FUSEKI_BASE /fuseki
...
docker-entrypoint.sh snippet
#!/bin/bash
...
mkdir -p "$FUSEKI_BASE"
echo "Mounting Cloud Filestore."
mount -o nolock 10.192.202.18:/fuseki "$FUSEKI_BASE"
echo "Mounting completed."
...
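The address and share name hard-coded above could be lifted into environment variables so the same image works against different Filestore instances. The following is a minimal sketch of that idea, not the original entrypoint: `FILESTORE_IP_ADDRESS` and `FILE_SHARE_NAME` are hypothetical variable names, and `nfs_source` and `mount_filestore` are illustrative helpers.

```shell
#!/bin/bash
# Sketch only: FILESTORE_IP_ADDRESS and FILE_SHARE_NAME are hypothetical
# variable names; the original snippet hard-codes both values.

# Build the NFS source string "<ip>:/<share>" from its two parts.
nfs_source() {
  local ip="$1"
  local share="$2"
  if [ -z "$ip" ] || [ -z "$share" ]; then
    echo "nfs_source: IP address and share name are required" >&2
    return 1
  fi
  # Drop one leading slash so "fuseki" and "/fuseki" give the same result.
  share="${share#/}"
  printf '%s:/%s\n' "$ip" "$share"
}

# Mount the share at FUSEKI_BASE and stop on failure, so Fuseki never
# starts against an empty, unmounted directory.
mount_filestore() {
  local src
  src="$(nfs_source "$FILESTORE_IP_ADDRESS" "$FILE_SHARE_NAME")" || return 1
  mkdir -p "$FUSEKI_BASE" || return 1
  echo "Mounting Cloud Filestore."
  mount -o nolock "$src" "$FUSEKI_BASE" || return 1
  echo "Mounting completed."
}
```

With `FILESTORE_IP_ADDRESS=10.192.202.18` and `FILE_SHARE_NAME=fuseki`, calling `mount_filestore` issues the same mount command as the snippet above.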
Mounting a Google Cloud Storage bucket with GCS FUSE onto a Cloud Run Service running Apache Jena Fuseki
Dockerfile snippet
...
# GCS FUSE CLOUD RUN SETUP
RUN set -e; \
    apt-get update -y && apt-get install -y \
        gnupg2 \
        tini \
        lsb-release; \
    gcsFuseRepo=gcsfuse-$(lsb_release -c -s); \
    echo "deb http://packages.cloud.google.com/apt $gcsFuseRepo main" | \
        tee /etc/apt/sources.list.d/gcsfuse.list; \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
        apt-key add -; \
    apt-get update; \
    apt-get install -y gcsfuse && apt-get clean
# Config and data
ENV FUSEKI_BASE /fuseki
...
docker-entrypoint.sh snippet
#!/bin/bash
...
mkdir -p "$FUSEKI_BASE"
echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse gcs_fuse_cloud_run "$FUSEKI_BASE"
echo "Mounting completed."
...
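The bucket name and the debug flags in the snippet above could likewise be made configurable. The following is a rough sketch under assumptions, not the original entrypoint: `BUCKET` and `GCSFUSE_DEBUG` are hypothetical variable names, and `mount_bucket` is an illustrative helper. The two debug flags are the ones already used above.

```shell
#!/bin/bash
# Sketch only: BUCKET and GCSFUSE_DEBUG are hypothetical variable names;
# the original snippet hard-codes the bucket and always enables debugging.

# Mount the bucket at FUSEKI_BASE, with debug logging only on request.
# Returns non-zero if the bucket is not configured or the mount fails.
mount_bucket() {
  if [ -z "$BUCKET" ]; then
    echo "mount_bucket: BUCKET must be set" >&2
    return 1
  fi
  mkdir -p "$FUSEKI_BASE" || return 1
  echo "Mounting GCS Fuse."
  if [ "${GCSFUSE_DEBUG:-0}" = "1" ]; then
    # Same flags as the original snippet.
    gcsfuse --debug_gcs --debug_fuse "$BUCKET" "$FUSEKI_BASE" || return 1
  else
    gcsfuse "$BUCKET" "$FUSEKI_BASE" || return 1
  fi
  echo "Mounting completed."
}
```

With `BUCKET=gcs_fuse_cloud_run` and `GCSFUSE_DEBUG=1`, calling `mount_bucket` runs the same gcsfuse command as the snippet above. Leaving `GCSFUSE_DEBUG` unset keeps the logs quieter.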
Things to note:
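- Concurrency: Cloud Run can send many concurrent requests to one instance and can scale out to several instances, all of which share the same mounted data.
- Multiple services using the same storage, whether a GCS FUSE bucket or a Filestore NFS share
- VPC Service Controls (VPC-SC) and access control policies to limit access to the SPARQL endpoint
- See the stain/jena-docker repository for more information on the jena and jena-fuseki images.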
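One possible way to act on the concurrency and access notes at deploy time is sketched below. It is an illustration under assumptions, not the author's deployment: `SERVICE`, `IMAGE`, `REGION` and `VPC_CONNECTOR` are hypothetical placeholders, and capping the service at a single instance is just one conservative choice for keeping a single writer on the mounted data. It does not set up VPC-SC perimeters, which are configured separately.

```shell
#!/bin/bash
# Sketch only: SERVICE, IMAGE, REGION and VPC_CONNECTOR are hypothetical
# placeholders, and the limits below are assumptions, not the author's settings.

# Deploy the Fuseki image to Cloud Run with restricted scaling and access:
#   gen2 execution environment    used for network file system mounts
#   --vpc-connector               route to the private Filestore address
#   --max-instances 1             a single instance writes to the mounted data
#   --ingress internal            accept internal traffic only
#   --no-allow-unauthenticated    callers need IAM permission to invoke
deploy_fuseki() {
  if [ -z "$SERVICE" ] || [ -z "$IMAGE" ] || [ -z "$REGION" ] || [ -z "$VPC_CONNECTOR" ]; then
    echo "deploy_fuseki: SERVICE, IMAGE, REGION and VPC_CONNECTOR must be set" >&2
    return 1
  fi
  gcloud run deploy "$SERVICE" \
    --image "$IMAGE" \
    --region "$REGION" \
    --execution-environment gen2 \
    --vpc-connector "$VPC_CONNECTOR" \
    --max-instances 1 \
    --ingress internal \
    --no-allow-unauthenticated
}
```

The connector matters for the Filestore variant, since the share is reached over a private IP address. A deployment that only uses the GCS FUSE bucket may not need it.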