Adding command line support to the namescript user script


One of the projects I worked on at Wikimedia Hackathon 2018 was adding support for running the namescript user script on the command line. Since the general process I used could be useful for other user scripts too, I’m going to describe it here.

Situation / requirements

namescript is a Wikidata user script to add labels, descriptions and aliases to Wikidata items for names. It is usually run when the user visits an item page and clicks a link on it, and only updates data that is not already there (for example, an existing description will not be replaced).
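The “only add what is missing” rule can be illustrated with a small sketch. This is not namescript’s actual code: the `missingTerms` function and the `{ languageCode: value }` shape for terms are assumptions made for illustration.

```javascript
// Hypothetical helper illustrating the "never overwrite" rule: given the terms
// an item already has and the terms the script would like to set, keep only
// the proposals for languages that have no value yet.
// Both arguments are plain objects mapping language codes to strings.
function missingTerms(existing, proposed) {
	const result = {};
	for (const [language, value] of Object.entries(proposed)) {
		// Skip any language that already has a value on the item.
		if (!Object.prototype.hasOwnProperty.call(existing, language)) {
			result[language] = value;
		}
	}
	return result;
}

// Example: the existing English description is left alone.
const existingDescriptions = { en: 'family name' };
const proposedDescriptions = { en: 'surname', fr: 'nom de famille', de: 'Familienname' };
console.log(missingTerms(existingDescriptions, proposedDescriptions));
// { fr: 'nom de famille', de: 'Familienname' }
```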

Recently, Harmonia Amanda, the script’s maintainer, wanted to fix a list of names with the script (initially 2,000, then over 10,000, by now more than 200,000). These names all had incorrect descriptions, so in this case the script should delete all existing descriptions before adding new ones (alternatively, the deletion could also be done with the dataDrainer script). Ideally, this would be possible without visiting each of the thousands of items individually.
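One way to express “delete all descriptions, then add new ones” in a single edit is sketched below. It assumes the edit goes through the Wikibase `wbeditentity` API module, where setting a value replaces the old description and `remove: ''` deletes one; the function name is hypothetical and this is not necessarily how namescript implements the mode.

```javascript
// Hypothetical sketch: build the `descriptions` part of the `data` parameter
// for wbeditentity so that every old description is either replaced or removed.
function replaceDescriptionsData(existing, replacements) {
	const descriptions = {};
	// First mark every existing description for removal...
	for (const language of Object.keys(existing)) {
		descriptions[language] = { language, remove: '' };
	}
	// ...then overwrite those entries (or add new ones) where a new value exists.
	for (const [language, value] of Object.entries(replacements)) {
		descriptions[language] = { language, value };
	}
	return { descriptions };
}

const data = replaceDescriptionsData(
	{ en: 'wrong description', nl: 'verkeerde beschrijving' },
	{ en: 'family name', fr: 'nom de famille' },
);
console.log(JSON.stringify(data));
// {"descriptions":{"en":{"language":"en","value":"family name"},"nl":{"language":"nl","remove":""},"fr":{"language":"fr","value":"nom de famille"}}}
```

Doing both steps in one request keeps the item history to a single edit per item, which matters when the batch runs into the hundreds of thousands.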

To batch-edit items with the script, I decided to make it runnable on the command line using Node.js. However, the script is sometimes updated on Wikidata, e. g. to add new translations (of either the user interface or the descriptions), so I wanted to avoid a second codebase for the CLI version that would slowly fall behind the user script version, or even diverge from it. Instead, the script was rewritten so that it supports both modes of operation: in the browser and on the command line.
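The batch mode a command-line version makes possible can be sketched as follows. The one-ID-per-line input format and the `editItem` callback (standing in for a single namescript edit) are assumptions; lines that are not item IDs are silently skipped.

```javascript
// Hypothetical sketch: parse item IDs (one per line) and edit them in sequence.
function parseItemIds(text) {
	return text
		.split(/\r?\n/)
		.map((line) => line.trim())
		.filter((line) => /^Q[1-9]\d*$/.test(line));
}

async function editAll(itemIds, editItem) {
	const failures = [];
	// Sequential on purpose: one request at a time is gentler on the wiki
	// than firing thousands of edits in parallel.
	for (const id of itemIds) {
		try {
			await editItem(id);
		} catch (error) {
			// Record the failure and continue with the remaining items.
			failures.push({ id, error: error.message });
		}
	}
	return failures;
}

const ids = parseItemIds('Q4115189\n\nnot an item\nQ42\n');
editAll(ids, async (id) => console.log('would edit ' + id)).then((failures) => {
	console.log(failures.length); // 0
});
```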

New source code layout

The main functionality of namescript now lives in namescript-lib.js, which “exports” a single namescript global that the other scripts work with. (We’re ignoring any actual module system here because I’m not sure how to make that work. Just one global seems acceptable to me.) It uses some environment-dependent functions which it does not implement itself; instead, they must be filled in via namescript.config (see below).
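The “single global plus configurable functions” layout might look roughly like this. The hook names `log` and `editEntity` and the `processItem` function are hypothetical; the code is written in ES5 style (apart from the Promise in the usage example), since that is what on-wiki user scripts could rely on at the time.

```javascript
// Minimal sketch of a library global whose environment-dependent parts
// are supplied later through namescript.config.
var namescript = {
	// Filled in by an entrypoint (browser or CLI) before the library is used.
	config: {},

	// Look up a configured function and fail loudly if none was registered.
	hook: function (name) {
		var fn = namescript.config[name];
		if (typeof fn !== 'function') {
			throw new Error('namescript.config.' + name + ' has not been configured');
		}
		return fn;
	},

	// Environment-independent logic delegates logging and API access.
	processItem: function (itemId, data) {
		namescript.hook('log')('Editing ' + itemId);
		return namescript.hook('editEntity')(itemId, data);
	},
};

// Usage: an entrypoint registers its implementations, then runs the library.
namescript.config.log = function (message) { console.log(message); };
namescript.config.editEntity = function (itemId, data) {
	// A real entrypoint would make an API request here; this stub just echoes.
	return Promise.resolve({ id: itemId, data: data });
};
namescript.processItem('Q4115189', { descriptions: {} }).then(function (result) {
	console.log(result.id); // Q4115189
});
```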

Some large datasets belonging to the script (e. g. translations) have been moved to namescript-data.json. I initially did this purely for aesthetic reasons, but it turns out to be mildly useful as well, because MediaWiki has some special support for JSON files, ensuring that they are well-formed and consistently formatted.

Two entrypoints are then responsible for loading namescript-lib.js and namescript-data.json, configuring namescript-lib.js, storing the data inside the namescript global, and starting the process. namescript-browser.js does this in a browser environment: it loads the files via jQuery and adds a link to the page which, when clicked, triggers the necessary edits using mw.Api. namescript-cli.js is the Node.js CLI counterpart; it makes the necessary edits directly using mwbot, according to the command line arguments. Both register various functions (how to log messages, how to make API requests, etc.) in namescript.config, where namescript-lib.js uses them.
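A Node.js entrypoint along these lines could be sketched as below. The `configureCli` function, the hook names and the `request` parameter are assumptions; in namescript-cli.js the requests go through mwbot, whose exact usage is not reproduced here, and the `request` function is assumed to add the edit token itself.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical CLI-side setup: load the data file and register Node-specific hooks.
function configureCli(namescript, dataFile, request) {
	// JSON.parse throws on malformed data, so a broken data file fails early.
	namescript.data = JSON.parse(fs.readFileSync(dataFile, 'utf8'));
	// Log to stderr (a choice made for this sketch) so stdout stays free for other output.
	namescript.config.log = (message) => console.error(message);
	// Edits become wbeditentity requests; `request` is assumed to add the token.
	namescript.config.editEntity = (id, data) => request({
		action: 'wbeditentity',
		id,
		data: JSON.stringify(data),
	});
	return namescript;
}

// Usage with a throwaway data file and a stub request function.
const dataFile = path.join(os.tmpdir(), 'namescript-data-example.json');
fs.writeFileSync(dataFile, JSON.stringify({ descriptions: { en: 'family name' } }));
const ns = configureCli({ config: {} }, dataFile, (params) => Promise.resolve(params));
ns.config.editEntity('Q4115189', { descriptions: {} })
	.then((params) => console.log(params.action)); // wbeditentity
```

The browser entrypoint would register the same hook names, backed by mw.Api instead, so namescript-lib.js never needs to know which environment it is running in.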

Finally, a convenience entrypoint, namescript.js, detects the current environment (browser or Node.js) and loads the appropriate real entrypoint. You can also load that entrypoint directly, but users of the user script were already referencing namescript.js, and it’s also more convenient to use from the CLI.
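Environment detection of this kind is usually a couple of `typeof` checks. The sketch below uses common heuristics, not necessarily the checks namescript.js performs, and both entrypoint locations are hypothetical placeholders.

```javascript
// Hypothetical locations of the two real entrypoints.
const CLI_ENTRYPOINT = './namescript-cli.js';
const BROWSER_ENTRYPOINT_URL =
	'https://www.wikidata.org/w/index.php?title=User:Example/namescript-browser.js&action=raw&ctype=text/javascript';

function detectEnvironment() {
	// Node.js exposes process.versions.node; browsers do not.
	if (typeof process !== 'undefined' && process.versions && process.versions.node) {
		return 'node';
	}
	// On a wiki page, the browser provides window and MediaWiki's mw object.
	if (typeof window !== 'undefined' && typeof window.mw !== 'undefined') {
		return 'browser';
	}
	return 'unknown';
}

function loadEntrypoint() {
	switch (detectEnvironment()) {
		case 'node':
			return require(CLI_ENTRYPOINT);
		case 'browser':
			// mw.loader.load accepts a URL and loads it as JavaScript by default.
			return window.mw.loader.load(BROWSER_ENTRYPOINT_URL);
		default:
			throw new Error('namescript: unsupported environment');
	}
}

console.log(detectEnvironment()); // prints: node (when run with Node.js)
```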

Further development

As I mentioned, the goal of this reorganization was to have a single codebase power both the Wikidata user script and the command line script. This means that the canonical source for the script’s code is still Harmonia Amanda’s user namespace on Wikidata; however, since I don’t have permission to edit that (and I like working with Git anyway), I also mirror the code on GitHub. The process for making changes usually goes something like this:

  1. Harmonia Amanda contacts me with a feature request, e. g. “it would be nice if the command-line version supported reading item IDs from files”.
  2. I run namescript-download.js to update my local copy with any changes made on-wiki (e. g. in namescript-data.json) and commit any outstanding changes.
  3. I implement the feature locally, testing it on the sandbox item and checking with Harmonia if the edits made there look correct.
  4. I commit my changes, push everything to GitHub, and tell Harmonia that they’re ready.
  5. She pulls the changes and tests the script locally as well, e. g. with a large batch edit which motivated the feature request in the first place.
  6. If everything is fine, she runs namescript-upload.js to upload all the changes back to Wikidata.
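Helper scripts like namescript-download.js and namescript-upload.js essentially need a mapping between local files and wiki pages. The sketch below shows one way to build that mapping: downloading via `index.php?action=raw` (which returns the page text unchanged) and uploading via parameters for the action API’s `edit` module. The page prefix and the file list are assumptions, not the script’s real on-wiki location.

```javascript
const WIKI = 'https://www.wikidata.org';
const PAGE_PREFIX = 'User:Example/'; // hypothetical page prefix

// Assumed set of on-wiki files.
const FILES = ['namescript.js', 'namescript-lib.js', 'namescript-browser.js', 'namescript-cli.js', 'namescript-data.json'];

// Download: action=raw returns the current page text as-is.
function rawUrl(file) {
	const params = new URLSearchParams({ title: PAGE_PREFIX + file, action: 'raw' });
	return WIKI + '/w/index.php?' + params.toString();
}

// Upload: parameters for the action API's edit module; the CSRF token must
// come from a logged-in session and is passed in by the caller.
function editParams(file, text, summary, token) {
	return { action: 'edit', title: PAGE_PREFIX + file, text, summary, token, format: 'json' };
}

for (const file of FILES) {
	console.log(rawUrl(file));
}
```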