Background: JupyterHub is an easy-to-use, browser-based interface to the Spark + Scala + Python environment we’ve been experimenting with over the past few months. It is an always-on Jupyter notebook environment that, unlike a standalone Jupyter notebook, does not require users to configure anything on their own laptops and lets them run long jobs. Think of it as what GitHub does for git, or what DockerHub does for Docker: JupyterHub does the same for Jupyter notebooks. It is multi-user, so multiple researchers can share the environment.
In practice, when a researcher is ready to start coding their project in Python or Scala and that code’s execution needs to be striped (“striped” as in “stars and stripes”) across multiple high-performance computing nodes, the researcher can simply point their browser at the JupyterHub URL and log in. They are then presented with a fairly respectable Integrated Development Environment (IDE) that will execute, line by line and with reviewable output, any code they write. And that code is executed against the multi-node HPC environment. It is very powerful, and very cool.
To appreciate what JupyterHub does, it helps to understand what a researcher would have to do without it. The answer: a lot. They would need deep Unix shell experience and proficiency with a Java build tool called Maven. They would have to understand building .jars (packaged Java code), and the Java build environment, and the Spark submit environment. They would have to know an editor such as vim, Python and Python virtual environments, a Python virtual environment tool called conda, and a Python library management tool called pip. With JupyterHub in place, most if not all of the above is handled by an administrator instead, and the researcher can focus on writing and executing code against the environment. However, if the researcher does have those skills, JupyterHub gives them the option of launching a web-based terminal session into the underlying environment. Which is also really cool.
An Ubuntu 18.04 box built for the project. The version of JupyterHub used, “The Littlest JupyterHub,” requires Ubuntu 18.04. The box is configured to serve JupyterHub via HTTPS.
Python3, git, curl (apt-get)
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo -E python3 - --admin <admin-user-name>
JupyterHub plugin for LDAP authentication:
This is the configuration of JupyterHub on the server, dumped with the handy “tljh-config show” command (“tljh” stands for The Littlest JupyterHub, which is the name of the distribution we installed):
root@pom-itb-jhubdev:~# tljh-config show
client_id: <secret from github>
client_secret: <secret from github>
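The dump above is nested YAML, while “tljh-config set” takes dotted key paths. The following is a minimal Python sketch of that mapping, not the configuration actually used on this server. It assumes GitHub OAuth through the oauthenticator package: the `auth.type` value and the `GitHubOAuthenticator` nesting are assumptions drawn from the TLJH authenticator docs (only the two secrets appear in the dump), and the helper names `flatten` and `as_set_commands` are hypothetical.

```python
import shlex

# Assumed layout: mirrors what `tljh-config show` would print for GitHub OAuth.
# The secrets are placeholders copied from the dump, not real credentials.
config = {
    "auth": {
        "type": "oauthenticator.github.GitHubOAuthenticator",
        "GitHubOAuthenticator": {
            "client_id": "<secret from github>",
            "client_secret": "<secret from github>",
        },
    },
}


def flatten(node, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def as_set_commands(cfg):
    """Render each leaf as the `tljh-config set` command that would write it."""
    return [
        f"sudo tljh-config set {path} {shlex.quote(str(value))}"
        for path, value in flatten(cfg)
    ]


if __name__ == "__main__":
    for line in as_set_commands(config):
        print(line)
    # TLJH applies changed settings only after a reload.
    print("sudo tljh-config reload")
```

The script only prints commands and runs nothing, so an administrator can review the lines before pasting them into a root shell on the hub.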
People To Thank:
Jonathan Lanyon – Assistance with configuring AD authentication. Ultimately we learned that this tljh version doesn’t appear to support AD authentication, but the work performed will be needed when we put in a version that does.
Pat Flannery – guidance with virtual hosts and building and configuring the Ubuntu VM.
Michael Ramsey – his assistance with diagnosing Active Directory configuration issues, comparing them with environments known to work.
Asya Shklyar – The omniscient leader of all things HPC, without whose vision there would be nothing. She confirmed JupyterHub as the way to go, and offered Binder as the next step evolution. She also set the challenge of AD authentication, which has not been met … yet. (foreshadowing)
More Reading – Additional Docs:
The main JupyterHub docs: https://jupyterhub.readthedocs.io/en/stable/
Ldapauthenticator docs: https://github.com/jupyterhub/ldapauthenticator
Configuring authenticators for tljh: http://tljh.jupyter.org/en/latest/topic/authenticator-configuration.html
Binder docs: https://binderhub.readthedocs.io/en/latest/
By Andrew Crawford