Python & Node.js in Linux Userspace
Bootstrapping a Combined Virtual Environment for R&D
In this blog article I demonstrate how to bootstrap a combined Python and Node.js virtual environment entirely in userspace on Linux. Root privileges are required, if at all, only for installing system-level dependencies. The described setup serves as my current baseline for development and data analysis work. It is based on Python 3.4 (CPython), virtualenv, nodeenv, Node.js 6.2, PyQt4, numpy, matplotlib, pymongo and h5py. It runs on both 32-bit (x86) and 64-bit (x86_64) openSUSE 13.1 Linux (which, as of early 2016, is in long-term "Evergreen" support). The following text may also apply to other versions of openSUSE or to other Linux distributions in general, though package names and versions might differ.
Why should you care?
There are many reasons why this approach makes sense, though all of them ultimately depend on your requirements. As far as Python is concerned, I prefer a newer version of Python than those shipped with openSUSE 13.1 (2.7 and 3.3). Later versions of the Python interpreter are available as packages from a number of community repositories. However, installing them usually breaks distribution-specific symlinks in /usr/bin, which must then be repaired manually. Besides, I want to avoid installing Python packages system-wide, and I want to be able to quickly recompile and reinstall the Python interpreter with different options and flags. In the past, I also had to quickly set up (and maintain) compute nodes. There, it made sense to equip the nodes with as rudimentary an operating system as possible. I then deployed the latest version of my evolving virtual environment before every large(r) computation run. In another very common scenario, you might only have a user account without administrative privileges at your workplace or university, and this method applies there as well.
Combining Python and Node.js is almost a story of its own. Most of my code is written in Python (and C). Occasionally, though, I find a very useful library that is written in JavaScript and runs on Node.js (or someone deploys some useful JavaScript code on their website and I want to use it). With the increasing popularity of Node.js, the number of genuinely useful data analysis tools in its ecosystem is growing dramatically. Therefore I want to be able to quickly install Node.js packages with npm without cluttering my system. Beyond that, circumventing the relatively old version of Node.js shipped by openSUSE makes sense for exactly the same reasons described above for the Python interpreter. Combining my Python virtual environment (virtualenv) with an isolated Node.js environment is the logical consequence, and thankfully there is a (Python) tool for that: nodeenv.
In this blog article, I compile and install Python 3.4 (the "original" CPython from python.org) from source, just like every subsequent software package, using user privileges only.
Prerequisites
Make sure that the operating system satisfies all requirements (tools, libraries, headers) of Python and Node.js. On openSUSE, it should be sufficient to install the patterns "patterns_openSUSE-devel_basis" and "patterns_openSUSE-devel_C_C++". Beyond that, make sure that you have a version of readline (5 or 6) and its headers (readline-devel) installed, or the Python interpreter will eventually complain on start-up. A detailed description of the Python build process is provided in the Python Developer's Guide. For Node.js, you might have to install OpenSSL and its headers (libopenssl-devel).
Beyond that, I explain how to install PyQt4 (which matplotlib can use for its interactive Qt4 backend, for example). Please note that it is required neither by Python nor by Node.js, so you might want to skip those sections. However, if you do want to install PyQt, you also need to ensure that the required Qt development packages are installed on your system.
Getting started: Compiling Python, creating a virtual environment
For most of my work, I prefer to use a central project folder, which is located on my desktop. Let's go there.
Before I install anything, I create a couple of folders for future use. The first one, "_python34", will become the Python root directory. The second one, "_env34a", will hold my virtual environment. I name the folders after the Python version they contain (3.4). The appended "a" indicates that the folder holds the first virtual environment based on this version of Python. Systematic naming significantly eases the management of additional interpreters and virtual environments.
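The folder layout can also be created from a script. This is a minimal sketch, not the author's original commands (which were typed at the shell). It is written to run under Python 3.4 itself, so it avoids f-strings. The project folder in the usage comment is a hypothetical placeholder.

```python
import os


def make_layout(project_dir, version="34", env_suffix="a"):
    """Create the interpreter root and virtual-environment folders.

    Follows the naming scheme described above: "_python34" holds the
    interpreter, "_env34a" the first environment built on it.
    Safe to call repeatedly. Returns the absolute paths of both folders.
    """
    python_root = os.path.abspath(os.path.join(project_dir, "_python" + version))
    env_dir = os.path.abspath(os.path.join(project_dir, "_env" + version + env_suffix))
    for path in (python_root, env_dir):
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return python_root, env_dir

# Example (hypothetical project folder on the desktop):
# make_layout(os.path.expanduser("~/Desktop/project"))
```

A second environment on the same interpreter would then simply be make_layout(project_dir, env_suffix="b").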
Let's download the source code of the latest version of CPython 3.4 from python.org:
Unpacking the xz-compressed tarball leaves a new directory named "Python-3.4.4" in my project folder. I prefer to add "src." as a prefix to the names of folders that contain source code, so I rename it accordingly. Then I change into the source code folder.
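A sketch of the download-and-unpack step. It assumes the standard python.org layout for release tarballs. The helper names "fetch" and "unpack_with_prefix" are hypothetical. The renaming reproduces the "src." convention described above.

```python
import os
import tarfile
import urllib.request

PYTHON_VERSION = "3.4.4"
# python.org serves release tarballs under /ftp/python/<version>/
URL = "https://www.python.org/ftp/python/{0}/Python-{0}.tar.xz".format(PYTHON_VERSION)


def fetch(url, dest_dir):
    """Download url into dest_dir unless the file is already there."""
    target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


def unpack_with_prefix(archive, dest_dir, prefix="src."):
    """Extract a tarball and rename its top-level folder to prefix + name.

    Mode "r" lets tarfile detect xz, gz or bz2 compression by itself.
    Returns the path of the renamed source folder.
    """
    with tarfile.open(archive, "r") as tar:
        top = tar.getnames()[0].split("/")[0]
        # Use the safe "data" extraction filter where this Python has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    renamed = os.path.join(dest_dir, prefix + top)
    os.rename(os.path.join(dest_dir, top), renamed)
    return renamed

# Example, run from the project folder:
# unpack_with_prefix(fetch(URL, "."), ".")   # -> ./src.Python-3.4.4
```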
Now I configure, compile and install the Python interpreter. Do not forget to point to the previously created Python root directory with the "--prefix" switch when configuring the source code.
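The configure, compile and install sequence can be sketched as a small wrapper around the usual "./configure --prefix=...", "make" and "make install" steps. The parallel "-j" job count is my own addition and an assumption; plain "make" works as well. The prefix is resolved to an absolute path before the build switches into the source folder.

```python
import os
import subprocess


def build_commands(prefix, jobs=2):
    """Return the configure/make/install command lines for a CPython tree.

    The prefix is made absolute here, relative to the caller's working
    directory, because it gets baked into the installed interpreter.
    """
    prefix = os.path.abspath(prefix)
    return [
        ["./configure", "--prefix=" + prefix],
        ["make", "-j{0}".format(jobs)],  # -j is optional, see lead-in
        ["make", "install"],
    ]


def build(src_dir, prefix, jobs=2):
    """Run each step inside src_dir; check_call stops at the first failure."""
    for cmd in build_commands(prefix, jobs):
        subprocess.check_call(cmd, cwd=src_dir)

# Example, run from the project folder (hypothetical folder names):
# build("src.Python-3.4.4", "_python34")
```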
If the above process exited successfully, it should be possible to create and "source" a virtual environment.
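A sketch of creating the environment, plus a quick check that an interpreter really runs inside one. It assumes the standard-library "venv" module that ships with Python 3.4; the "virtualenv" tool named in the introduction produces the same "bin/activate" script. Sourcing itself happens in the shell, e.g. "source _env34a/bin/activate". The function names are hypothetical.

```python
import os
import subprocess
import sys


def create_env(python_root, env_dir):
    """Create env_dir using the interpreter installed under python_root.

    Assumes "make install" created bin/python3 below the prefix.
    Returns the path of the activate script to source in the shell.
    """
    python = os.path.join(os.path.abspath(python_root), "bin", "python3")
    subprocess.check_call([python, "-m", "venv", env_dir])
    return os.path.join(env_dir, "bin", "activate")


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment.

    venv makes sys.prefix differ from sys.base_prefix; older virtualenv
    releases set sys.real_prefix instead.
    """
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix

# Example: create_env("_python34", "_env34a")
```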
It is a good idea to link to the Python headers from within the virtual environment, because some programs expect to find them there.
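A sketch of the header link. It assumes CPython 3.4's default header folder name "python3.4m" below "_python34/include"; check your own installation if it differs. The helper name is hypothetical.

```python
import os


def link_headers(python_root, env_dir, include_name="python3.4m"):
    """Symlink the interpreter's header folder into the virtual environment.

    Creates <env_dir>/include/<include_name> pointing at
    <python_root>/include/<include_name>. Re-running is harmless:
    an existing link is left alone. Returns the link path.
    """
    source = os.path.join(os.path.abspath(python_root), "include", include_name)
    if not os.path.isdir(source):
        raise FileNotFoundError(source)
    target_dir = os.path.join(env_dir, "include")
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(target_dir, include_name)
    if not os.path.lexists(link):
        os.symlink(source, link)
    return link

# Example: link_headers("_python34", "_env34a")
```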
Installing PyQt4 into the virtual environment
One common dependency of many GUI applications is Qt, because it is simply one of the best open-source multi-platform GUI libraries. Most Qt applications still use Qt4 (though Qt5 is gaining popularity), so it is usually a good idea to install PyQt4. This is where things become a little challenging. PyQt4 does not support being installed into a virtual environment out of the box, so I need to apply a few tweaks to make it work. However, before I can even look at PyQt, I must install SIP. SIP is a tool for generating Python bindings for large C/C++ libraries, written by the same people who maintain PyQt. It is important to find a matching pair of SIP and PyQt, because otherwise the installation of PyQt might fail. Unfortunately, a match cannot be determined from the version numbers. Instead, one must dig a little deeper into Sourceforge.net and check when the desired version of PyQt was uploaded. Then one can pick a version of SIP that was published around the same date.
openSUSE 13.1 ships with Qt 4.8.5, which can be confirmed by running qmake -v.
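To check the Qt version from a script, one can parse the output of "qmake -v". The sketch assumes that Qt 4's qmake prints a line of the form "Using Qt version 4.8.5 in <libdir>". On some systems the binary is called "qmake-qt4" instead. The function names are hypothetical.

```python
import re
import subprocess


def qt_version(qmake_output):
    """Extract the Qt version from the text printed by "qmake -v".

    Returns a tuple of ints such as (4, 8, 5), or None if no
    "Using Qt version" line is present.
    """
    match = re.search(r"Using Qt version (\d+)\.(\d+)\.(\d+)", qmake_output)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_qt_version(qmake="qmake"):
    """Run qmake -v and parse its output."""
    out = subprocess.check_output([qmake, "-v"], universal_newlines=True)
    return qt_version(out)
```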
For this version of Qt, my matching (and working) pair of SIP and PyQt is sip-4.16.9 and PyQt-x11-gpl-4.11.4, both published on August 1st 2015. Both are available on Sourceforge.net. Once I have downloaded the corresponding files, I move them to the project folder and unpack them.
Now I can configure, compile and install SIP. Note that one must pass the Python include directory as an absolute path; nothing else seemed to work. To configure the source code, I already use the Python interpreter from the virtual environment.
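A sketch of the SIP configuration command. It assumes that the include setting meant above is the "--incdir" option of SIP 4's configure.py, i.e. the directory that receives sip.h. The helper name is hypothetical. Run the command inside the SIP source folder with the environment activated, then run make and make install.

```python
import os
import sys


def sip_configure_command(incdir, python=None):
    """Command line for SIP's configure.py with an absolute include dir.

    Defaults to the running interpreter, which should be the one from the
    activated virtual environment. --incdir is an assumption (see lead-in).
    """
    return [python or sys.executable, "configure.py",
            "--incdir=" + os.path.abspath(incdir)]

# Example, inside the unpacked SIP folder (hypothetical path):
# subprocess.check_call(sip_configure_command("../_env34a/include/python3.4m"))
```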
At this point, I can build PyQt. First, I go into its source code directory and configure the source.
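The PyQt configuration step can be expressed as a command list. "--confirm-license" answers the interactive GPL prompt. PyQt4 4.11 ships both "configure-ng.py" and the older "configure.py"; choosing the former is an assumption, and the helper name is hypothetical.

```python
import sys


def pyqt_configure_command(qmake=None, script="configure-ng.py", python=None):
    """Command line for configuring PyQt4 with the environment's interpreter.

    --confirm-license skips the interactive license question; --qmake is
    only needed when the right qmake is not the first one on PATH.
    """
    cmd = [python or sys.executable, script, "--confirm-license"]
    if qmake:
        cmd.append("--qmake=" + qmake)
    return cmd
```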
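The generated "Makefile" needs a few small changes. I open it with a text editor and change the paths in lines 52 and 53 so that they point to a writable location, for example inside the virtual environment, instead of a system directory.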
Note that the line numbers in your Makefile might differ from mine though it should be relatively easy to find the right ones. You should look for lines containing install instructions for "qsci".
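Because the line numbers vary, a pattern-based edit is more robust than one based on hard-coded line numbers. This sketch only rewrites lines that mention "qsci". The old and new directories are placeholders: read the actual system path off your own Makefile before using it. The function names are hypothetical.

```python
def patch_qsci_paths(makefile_text, old_dir, new_dir):
    """Replace old_dir with new_dir, but only on lines mentioning "qsci".

    Returns the patched text and the number of lines changed.
    """
    lines = makefile_text.splitlines(True)  # True keeps the line endings
    changed = 0
    for i, line in enumerate(lines):
        if "qsci" in line and old_dir in line:
            lines[i] = line.replace(old_dir, new_dir)
            changed += 1
    return "".join(lines), changed


def patch_makefile(path, old_dir, new_dir):
    """Patch a Makefile on disk in place; returns the number of changed lines."""
    with open(path) as f:
        text = f.read()
    text, changed = patch_qsci_paths(text, old_dir, new_dir)
    with open(path, "w") as f:
        f.write(text)
    return changed
```

Checking the returned count against the two lines you expect to change is a cheap guard against a wrong path.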
Now I can actually compile and install PyQt4 into the virtual environment. This step can take quite a while.
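Once make and make install have finished, a quick import check confirms that PyQt4 and SIP are visible to the environment's interpreter. "QT_VERSION_STR" and "PYQT_VERSION_STR" are version constants exposed by PyQt4.QtCore, and "SIP_VERSION_STR" is exposed by sip. The wrapper function itself is hypothetical.

```python
import importlib


def pyqt_versions():
    """Return (Qt, PyQt, SIP) version strings, or None if PyQt4 is missing.

    Run this with the virtual environment's interpreter.
    """
    try:
        qtcore = importlib.import_module("PyQt4.QtCore")
        sip = importlib.import_module("sip")
    except ImportError:
        return None
    return qtcore.QT_VERSION_STR, qtcore.PYQT_VERSION_STR, sip.SIP_VERSION_STR
```

With the versions used in this article, the first two entries would be expected to read "4.8.5" and "4.11.4".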
Installing relevant Python packages
After the installation of PyQt is complete, it is time to install further "must-have" Python packages. But before that, you should not forget to update pip to its latest version.
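Upgrading pip and installing packages can be sketched with "python -m pip", which guarantees that the pip of the active environment is used. Which packages count as "must-have" is an assumption here: numpy, pymongo and h5py from the introduction, with matplotlib following separately. The function names are hypothetical.

```python
import subprocess
import sys

PACKAGES = ["numpy", "pymongo", "h5py"]  # assumed from the introduction


def pip_commands(packages, python=None):
    """First upgrade pip itself, then install the given packages."""
    py = python or sys.executable
    return [
        [py, "-m", "pip", "install", "--upgrade", "pip"],
        [py, "-m", "pip", "install"] + list(packages),
    ]


def install(packages):
    """Run the pip commands; check_call raises if any step fails."""
    for cmd in pip_commands(packages):
        subprocess.check_call(cmd)
```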
If you have successfully suffered through the installation of PyQt, you can now also install matplotlib.
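A small smoke test can report the version of each baseline package, or None if it failed to install. The module list mirrors the introduction; the function name is hypothetical. Packages without a "__version__" attribute are reported as "unknown".

```python
import importlib

BASELINE = ["numpy", "matplotlib", "pymongo", "h5py"]


def report(modules):
    """Map each module name to its __version__, or None if import fails."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

# Example, with the environment activated:
# print(report(BASELINE))
```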
Installing Node.js into the virtual environment
With that step finished, my baseline environment for development and data analysis is almost complete. The only thing missing is Node.js. As mentioned before, my goal is to install Node.js directly into the virtual environment. Therefore I need another Python package: nodeenv. It takes care of downloading, compiling and installing Node.js from within the virtual environment and automates the process almost entirely. Let's install it.
Nodeenv is a rather versatile tool. It is definitely worth reading its manual as well as some of the blog posts illustrating its capabilities. For my purposes, it is enough to run it once with the "-p" switch, which attaches Node.js and npm to the active Python virtual environment.
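Both nodeenv steps can be sketched in Python, meant to be run with the environment's interpreter. First, install nodeenv with pip. Then call the "nodeenv" script from the environment's bin folder with "-p". The hypothetical "tool_in_env" helper then checks that "node" and "npm" resolve inside the environment rather than to a system-wide installation.

```python
import os
import shutil
import subprocess
import sys


def install_node_into_env():
    """pip-install nodeenv, then run "nodeenv -p" from this environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", "nodeenv"])
    # pip places the nodeenv console script next to the environment's python.
    nodeenv = os.path.join(os.path.dirname(sys.executable), "nodeenv")
    subprocess.check_call([nodeenv, "-p"])


def tool_in_env(tool, env_dir=None):
    """True if `tool`, as found on PATH, lives inside env_dir.

    env_dir defaults to $VIRTUAL_ENV, which the activate script exports.
    Symlinks are deliberately not resolved, since a tool's entry in bin/
    may link into another folder of the same environment.
    """
    env_dir = env_dir or os.environ.get("VIRTUAL_ENV")
    found = shutil.which(tool)
    if not env_dir or not found:
        return False
    return os.path.abspath(found).startswith(os.path.abspath(env_dir) + os.sep)

# Example, with the environment activated:
# install_node_into_env(); print(tool_in_env("node"), tool_in_env("npm"))
```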
Let's test npm by installing the d3 package for Node.js. Do not forget the "-g" switch.
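To confirm that "-g" did not touch the system, one can ask npm for its global package folder ("npm root -g") and check that it lies inside the environment. The sketch assumes that, after "nodeenv -p", npm's global folder is located inside the virtual environment. The helper names are hypothetical.

```python
import os
import subprocess


def npm_global_root():
    """Return the folder npm uses for globally installed packages."""
    out = subprocess.check_output(["npm", "root", "-g"], universal_newlines=True)
    return out.strip()


def package_installed_in_env(package, global_root, env_dir):
    """True if `package` sits in an npm global folder located inside env_dir."""
    root = os.path.abspath(global_root)
    inside = root.startswith(os.path.abspath(env_dir) + os.sep)
    return inside and os.path.isdir(os.path.join(root, package))

# Example, with the environment activated:
# package_installed_in_env("d3", npm_global_root(), os.environ["VIRTUAL_ENV"])
```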