Introducing m3api


For the past couple of years, I’ve been working on a new JavaScript library for the MediaWiki Action API, called m3api. On the occasion of its 1.0.0 release today, I want to talk about why I wrote it, what it does, and why I think you should use it :)

Quick links: npm package, GitHub repository, documentation, examples.

Why a new JS library for the MediaWiki API?

So why did I write a new library for the MediaWiki API at all? Aren’t there enough of them already?

I was looking for a library fulfilling two criteria, and didn’t find any that fulfilled both:

  1. Cross-platform: I want to be able to use the same interface to the API whether I’m writing code for the browser or for Node.js. (Small differences in setup are acceptable, but once setup is done, the interface should be uniform.) This apparently rules out virtually all the libraries; the only known exception on the list of libraries linked above (apart from m3api itself) is CeJS, which is a mystery to me.
  2. Reasonably modern: at a minimum, this means promises rather than callbacks. (As far as I can tell, this rules out CeJS, along with many other libraries.) Additional modern things that would be nice to have are async generators as the interface for API continuation and ES6 modules instead of Node.js require() / UMD / etc.

Since I couldn’t find a library matching my needs, I wrote it :)

Main characteristics

Naming things is hard; m3api stands for “minimal, modern MediaWiki API [client]” (three ‘m’s, you see). I’ve already mentioned “modern” above – m3api uses promises, async generators, ES6 modules, but also fetch() (even in Node – yay for undici), class syntax, object spreading and destructuring, FormData / Blob / File for file parameters, and more. (Some of this felt fairly “bleeding edge” when I started working on m3api, but keep in mind that this was almost five years ago. m3api may not support all the browsers supported by MediaWiki, but it does support the Node.js version that was shipped in stable Debian 12 (Bookworm) two years ago.)

I want to elaborate on the “minimal” term a bit more. Basically, the point is that I’m familiar with the MediaWiki Action API, and I don’t like libraries that aim to hide the API from me. I’m wary of basic CRUD abstraction methods; the action=edit API has plenty of useful options, many of which a higher-level method probably doesn’t make available. I want a library that helps me to work with the API directly. (I don’t mind if it also offers abstraction methods, but they’re not a high priority for me when writing my own library. Also, some other libraries seem to make it relatively hard to make direct API requests.)

However, “minimal” doesn’t mean that the library doesn’t have any features. There are plenty of features designed to make it easier to use the API; my basic rule of thumb is that the feature should be useful with more than one API action. For example, API continuation is present in several API actions, and somewhat tedious to use “manually”, so m3api offers support for it.

In addition to that, there are also several extension packages for m3api, as well as guidelines for others to implement additional extension packages. These implement support for specific API modules (m3api-query for action=query, m3api-botpassword for action=login) or other functionality that doesn’t belong in m3api itself (m3api-oauth2 for the OAuth 2.0 authorization flow). In combination, these libraries are intended to provide, if not a full API framework, then at least a powerful and flexible toolkit for working with the API.

Basic interface

The simplest way to make an API request with m3api looks like this:

import Session from 'm3api/node.js';
const session = new Session( 'en.wikipedia.org' );
const response = await session.request(
	{ action: 'query', meta: 'siteinfo' },
);

You can also specify default parameters that should apply to every request of a session when creating it:

import Session from 'm3api/node.js';
const session = new Session(
	'en.wikipedia.org',
	{ formatversion: 2 },
);
const response = await session.request(
	{ action: 'query', meta: 'siteinfo' },
);

These examples specify parameters to send to the API (action=query, meta=siteinfo, formatversion=2). Additionally, you can specify options as another object after the parameters, which instead influence how m3api sends the request. One option that you should always set is the userAgent, which controls the User-Agent HTTP header (see the User-Agent policy). Usually, you would set this option for all requests when creating the session:

import Session from 'm3api/node.js';
const session = new Session(
	'en.wikipedia.org',
	{ formatversion: 2 },
	{ userAgent: 'introducing-m3api-blog-post' },
);
const response = await session.request(
	{ action: 'query', meta: 'siteinfo' },
);

But you could also set it on the individual request, if you wanted:

import Session from 'm3api/node.js';
const session = new Session(
	'en.wikipedia.org',
	{ formatversion: 2 },
);
const response = await session.request(
	{ action: 'query', meta: 'siteinfo' },
	{ userAgent: 'introducing-m3api-blog-post' },
);

(It doesn’t make much sense to set the userAgent per request, but there are other options where it’s more useful, e.g. method: 'POST' and tokenType: 'csrf'.)

Other functions generally also follow this pattern of taking parameters followed by options, with the options being, well, optional. Both the parameters and options are merged with the defaults from the constructor, making for a convenient and uniform interface.

In addition to strings, parameter values can also be numbers, booleans, and arrays, for example:

const response = await session.request( {
	action: 'query',
	meta: [ 'siteinfo', 'userinfo' ],
	curtimestamp: true,
	formatversion: 2,
} );

List parameters can also be sets instead of arrays; more on that below.

API continuation

As mentioned above, m3api includes support for API continuation. I’m not aware of a great explanation of this feature in the API, so I’ll just use this section to talk about it in general as well as how m3api supports it ^^

Continuation is the mechanism by which the API returns a limited set of data while enabling you to make further requests to fetch additional data. The MediaWiki Action API’s continuation mechanism is highly flexible; a single API request can use many different modules, each of which contributes to continuation, and it all works out.

The basic principle is that the API may return, as part of the response, a continue object with parameters you should send with your next request. For instance, if you make an API request with action=query and list=allpages, the response may include "continue": { "apcontinue": "!important" }; your next request should then use the parameters action=query, list=allpages and apcontinue=!important. Continuation is finished when there is no continue object in a response.
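To make the protocol concrete, here’s a sketch of following continuation “by hand” with plain request() calls (the helper name and the formatversion=2 response shape are assumptions of this example, not part of m3api):

```javascript
// Illustrative sketch: follow continuation manually by spreading the
// continue object into the next request's parameters.
// getAllPageTitles() and the response shape are assumptions here.
async function getAllPageTitles( session ) {
	const titles = [];
	let continuation = {};
	do {
		const response = await session.request( {
			action: 'query',
			list: 'allpages',
			...continuation, // e.g. apcontinue=... on follow-up requests
		} );
		for ( const page of response.query.allpages ) {
			titles.push( page.title );
		}
		// no continue object means continuation is finished
		continuation = response.continue;
	} while ( continuation !== undefined );
	return titles;
}
```

This is exactly the kind of boilerplate that requestAndContinue() (below) takes off your hands.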

In m3api, the main interface to continuation is the requestAndContinue() method, which returns an async generator. It’s typically used in a for await loop like this:

for await ( const response of session.requestAndContinue( {
	action: 'query',
	list: 'allpages',
} ) ) {
	console.log( response );
}

Each response is a response object like the one that would be returned from a normal request() call. You can break; out of the loop at any time to stop making additional requests.

The above example shows a “simple” case of continuation: each request produces one “batch” of pages (or, for some modules, revisions), and the next request continues with the next batch of different pages. However, it’s possible for a response not to contain the full data of one batch of pages. (An extreme example of this would be action=query, generator=querypage, gqppage=Longpages, gqplimit=500, prop=revisions, rvprop=content – that is, the text content of the 500 longest pages on the wiki. This will run into the response size limit very quickly, but the batch still contains all 500 longest pages, even though not all 500 are returned with their text in the same response.)

In this case, continuation will first proceed within one batch of pages (i.e., requests will return additional data for the same set of pages), and only proceed to the next batch after the full data for the previous batch has been returned, spread across multiple API responses. (It’s the caller’s responsibility to merge those responses back together again in a way that makes sense.)

You can distinguish between these cases by the batchcomplete member in the response: if it’s present (set to "" in formatversion=1 or true in formatversion=2), then the request returned the full set of data for the current batch of pages, and following continuation will proceed to the next batch; if it’s not present, then the request didn’t return the full data yet, and following continuation will yield additional data for the same batch of pages.

m3api supports this distinction too, using the requestAndContinueReducingBatch() method. It also returns an async generator, but follows continuation internally until the end of a batch has been reached, yielding a value that represents the combined result of all the responses for that batch. If you continue iterating over the async generator, it will continue with the next batch, and so on. When you use this method, you have to provide a reducer() callback, which somehow merges the latest API response into the current accumulated value. The initial value for each batch can be specified via another callback, and otherwise defaults to {} (an empty object). This interface is similar to Array.reduce() (hence the name; elsewhere this operation is also known as fold), but with a separate “reduction” taking place for each batch of pages returned by the API.
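As an illustration, a reducer() callback might look something like this (the merge strategy, keyed by page ID on a formatversion=2 response, is just one plausible choice for this example, not something m3api prescribes):

```javascript
// Hypothetical reducer: merge each response's pages into an accumulated
// object keyed by page ID, shallowly combining data for pages that appear
// in more than one response of the same batch. Assumes formatversion=2.
function mergePagesReducer( accumulated, response ) {
	const pages = ( response.query && response.query.pages ) || [];
	for ( const page of pages ) {
		accumulated[ page.pageid ] = {
			...( accumulated[ page.pageid ] || {} ),
			...page,
		};
	}
	return accumulated;
}
```

A shallow merge like this is too naïve for some cases (e.g. a page’s revisions arriving split across responses would overwrite each other rather than concatenate), which is part of why merging is the caller’s responsibility.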

requestAndContinueReducingBatch() is a fairly low-level method, and is not intended to be used directly. The m3api-query extension package offers some more convenient methods (assuming you’re using action=query): queryFullPageByTitle(), queryFullPageByPageId() and queryFullRevisionByRevisionId() return the full data for a single page or revision (even that can be split across multiple responses!), while queryFullPages() and queryFullRevisions() return async generators that yield full pages or revisions.

import { queryFullPages } from 'm3api-query/index.js';

for await ( const page of queryFullPages( session, {
	action: 'query',
	list: 'allpages',
} ) ) {
	console.log( page );
}

You get a simple, flat stream of pages, and don’t have to care that some of them may have been returned in the same response, others in a later response, and some may even have been split across multiple responses. The way in which pages from multiple responses are merged is configurable via the options, but the default should work for most cases. This is one of the parts of m3api I’m proudest of – making it easy to correctly work with API continuation.

Combining requests

Another m3api feature I’m proud of is automatically combining concurrent compatible requests. The idea is taken from the Wikidata Bridge (an interface to edit Wikidata from Wikipedia), where the Wikidata team at Wikimedia Germany (that I’m a part of) implemented something similar. (I reimplemented the idea from scratch in m3api to avoid infringing any copyright.)

The Wikidata Bridge needs to load a lot of information from the API when it initializes itself:

  1. Whether the user has permission to edit the Wikipedia article.
  2. The Wikipedia site’s restriction levels, to determine what kind of protection the article has.
  3. Whether the user has permission to edit the Wikidata item.
  4. The Wikidata site’s restriction levels, to determine what kind of protection the item has.
  5. Whether the Wikidata site has temporary accounts enabled, to determine whether to show a “your IP address will be publicly visible” warning.
  6. The bridge configuration on Wikidata.
  7. The data type of the property of the statement being edited.
  8. The latest revision ID of the item being edited.
  9. The statements of the item being edited.
  10. The label of the property of the statement being edited.

A naïve implementation would make up to ten separate API requests to get this information (I’ve linked them above for the Beta Wikidata Bridge demo page). However, due to how API modules are designed to be flexible in which data they return, and how parameters that specify “I’d like this piece of data” are often multi-valued, you can also combine them into just three requests: action=query on Wikipedia (1 and 2), action=query on Wikidata (3 to 6), and action=wbgetentities on Wikidata (7 to 10). The simple approach to implement the initialization with just three requests would be to have one big blob of code that makes all the requests and extracts all the information from the responses, but this wouldn’t be very readable or maintainable: we’d rather have a bunch of smaller, self-contained services that each just specify the request parameters they need and extract the parts of the response that concern them. But how do we then combine those requests?

One approach I’ve used in the Wikidata Image Positions tool (written in Python) is to explicitly split the API requests into three “phases”: assemble the parameters, make the request, process the response. Then you can assemble the parameters from multiple requests, make only one request, and process the same response multiple times (example based on load_image()):

query_params = query_default_params()
image_attribution_query_add_params(
    query_params,
    image_title,
)
image_size_query_add_params(
    query_params,
    image_title,
)

query_response = session.get(**query_params)

attribution = image_attribution_query_process_response(
    query_response,
    image_title,
)
width, height = image_size_query_process_response(
    query_response,
    image_title,
)

But this is fairly cumbersome, and also requires the calling code to know which requests can be combined and which can’t. We can do better.

Because all requests are asynchronous in JavaScript, our request() function can return a Promise without immediately making an underlying network request. We can then wait for a very short period (specifically, until the next microtask), and see if any other requests come in during that time; if they do, we check if they’re compatible, and potentially merge them into the pending request. Then, we send the pending request(s), and resolve the associated promises with the response(s).
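The trick can be sketched in isolation (a toy illustration of the idea, not m3api’s actual implementation):

```javascript
// Toy sketch of the delay-by-one-microtask trick (not m3api's real code):
// calls made within the same event-loop run are collected into one batch,
// processed together, and their promises resolved individually.
function makeBatcher( processBatch ) {
	let pending = null;
	return function enqueue( item ) {
		return new Promise( ( resolve ) => {
			if ( pending === null ) {
				pending = [];
				// flush once the current run of synchronous code is done
				queueMicrotask( () => {
					const batch = pending;
					pending = null;
					const results = processBatch( batch.map( ( p ) => p.item ) );
					batch.forEach( ( p, i ) => p.resolve( results[ i ] ) );
				} );
			}
			pending.push( { item, resolve } );
		} );
	};
}
```

In m3api’s case, “processing a batch” additionally involves checking which of the collected requests are compatible and merging only those; incompatible requests are still sent separately.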

The effect of this is that, when several compatible requests are made within the same JS event loop run, m3api can merge them automatically. Most often, making several requests within the same JS event loop run looks like a call to Promise.all() with several requests (see the example below).

To determine whether requests are compatible, we need to distinguish between list-type parameters that can be merged, and ones that can’t be. The convention we used in the Wikidata Bridge, and which I reused for m3api, is that mergeable parameters are specified as Sets, while unmergeable parameters are specified as Arrays. (The reasoning behind this is that, in many other languages, sets are unordered, and when a parameter is mergeable then you probably don’t care about the order the parameters are sent in; conversely, when you care about the order, you probably don’t want another request’s values to be inserted in front of yours. This doesn’t 100% apply in JavaScript, because Sets obey insertion order, but I think it still makes some sense.)

So, two requests are compatible if all their parameters either only occur in one request (e.g. one has list=allpages while the other has meta=siteinfo), have the same value in both requests (e.g. both have action=query), or are specified as a Set in both requests. To make creating Sets more convenient, a set() helper function is provided, so that e.g. requests with list: set( 'allpages' ) and list: set( 'allusers' ) are compatible.
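The compatibility rule can be sketched as a standalone function (hypothetical code, not m3api’s actual implementation):

```javascript
// Hypothetical sketch of the compatibility rule described above: two
// parameter objects are compatible if every key they share either has
// the same value in both, or is a Set in both (mergeable via set union).
function areCompatible( paramsA, paramsB ) {
	for ( const key of Object.keys( paramsA ) ) {
		if ( !( key in paramsB ) ) {
			continue; // parameter only occurs in one request
		}
		const a = paramsA[ key ];
		const b = paramsB[ key ];
		if ( a instanceof Set && b instanceof Set ) {
			continue; // both Sets: mergeable
		}
		if ( a === b ) {
			continue; // identical value in both requests
		}
		return false;
	}
	return true;
}
```

Note that two Arrays are never compatible under this rule, even with equal contents, since they’re distinct objects – which is exactly what makes the action: [ 'query' ] “opt out of merging” trick mentioned later work.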

The upshot of this is that the following example code will only make one underlying network request, with siprop=general|statistics:

async function getSiteName( session ) {
	const response = await session.request( {
		action: 'query',
		meta: set( 'siteinfo' ),
		siprop: set( 'general' ),
	} );
	return response.query.general.sitename;
}

async function getSiteEdits( session ) {
	const response = await session.request( {
		action: 'query',
		meta: set( 'siteinfo' ),
		siprop: set( 'statistics' ),
	} );
	return response.query.statistics.edits;
}

const [ sitename, edits ] = await Promise.all( [
	getSiteName( session ),
	getSiteEdits( session ),
] );

In principle, it’s possible that automatically combining requests will cause bugs in code written by developers who aren’t aware of this m3api feature. (For example, if someone doesn’t use m3api-query, they might use code like response.query.pages[ 0 ] to access the only page they expect to be present in the response, without realizing that a merged request may have caused further pages to be returned.) However, I hope that this will be rare, thanks to the combination of requests only being combined if they happen within the same JS event loop run and array-type parameters not being eligible for combining. If I get a lot of bug reports about this feature, I may reconsider it for the next major version. (If you want to make absolutely sure that a particular request will not be combined with any other, specify the action as a single-element array, e.g. action: [ 'query' ] – every other request will also specify the action parameter, and they’ll all be incompatible, because arrays are not mergeable.)

Error handling

As you might expect, m3api detects errors in the response and throws them (or, if you prefer, it rejects the promise, because all of this is async). As you might also expect, any warnings in the response are detected and, by default, logged to the console via console.warn(). (I was actually surprised to discover the other day that MediaWiki’s own mw.Api() doesn’t do this. God knows how many on-wiki gadgets and user scripts use deprecated API parameters without realizing it because the warnings returned by the API go straight to /dev/null…)

m3api also supports transparently handling errors without throwing them. Several errors returned by the API can be handled by retrying the request in some form; m3api’s approach is to retry requests until a certain time limit (by default, 65 seconds) after the initial request has passed – I think this makes more sense than limiting the absolute number of retries, as some other libraries do. (You can change the limit using the maxRetriesSeconds request option – bots may want to use a much longer limit than interactive applications.) If the response by the API includes a Retry-After header, m3api will obey it (as long as it’s within said time limit); otherwise, error handlers for different error codes can be configured, which may likewise retry the request. m3api ships error handlers for badtoken (update the token, then retry), maxlag and readonly errors (sleep for an appropriate time period, then retry). The m3api-oauth2 extension package installs an error handler to refresh expired OAuth 2 access tokens (on Wikimedia wikis, they expire after 4 hours) and then retry the request. These retries are always transparent to the code that made the request.
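The deadline logic boils down to a small check (again a hypothetical sketch, not m3api’s actual code):

```javascript
// Hypothetical sketch of the retry deadline described above: retry only
// if the wait (from Retry-After or an error handler) would still finish
// within maxRetriesSeconds of the initial request.
function shouldRetry( requestStartMs, nowMs, retryAfterSeconds, maxRetriesSeconds = 65 ) {
	const deadlineMs = requestStartMs + maxRetriesSeconds * 1000;
	return nowMs + retryAfterSeconds * 1000 <= deadlineMs;
}
```

Framed this way, it’s easy to see why a time limit beats a fixed retry count: ten quick badtoken retries and one 60-second maxlag wait consume very different amounts of real time.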

Why you should use it

I’m of course biased, but I happen to think it’s a well-designed library, for various reasons including the ones detailed above ;) I’ll close by mentioning some of the recommendations in the API Etiquette (permalink) and outlining how m3api aligns with them:

request limit
This is partially up to the developer using m3api, but m3api supports “ask[ing] for multiple items in one request”, both manually by specifying parameters as lists or sets (e.g. titles: set( 'PageA', 'PageB', 'PageC' )) and automatically by combining requests as explained above. Also, as mentioned in the error handling section, Retry-After response headers are respected; this isn’t explicitly mentioned on the API Etiquette page, but I’ve heard it’s still considered good bot practice.
maxlag
Specifying the maxlag parameter is up to the developer using m3api, but m3api recommends it for bots, and if it is used, then m3api will automatically wait and retry the request if the API returns a maxlag error.
User-Agent header
m3api sends a general User-Agent header for itself by default, and also encourages developers to specify a custom User-Agent header. If developers neglect to specify the userAgent request option, a warning is logged (by default, to console.warn(), where it should be relatively visible).
data formats
m3api uses the JSON format (of course).

If you’re already using a different API library or framework, you’re free to continue using it, naturally. But if you’re currently making network requests to the API directly, or if you’re going to start a new project where you need to interact with the API, I encourage you to give m3api a try. And if you use it, please let me know how it’s working for you!