Interfacing TeXmacs and Zotero
January 6th, 2014

Note. Jump to section 2 to see the current progress (updated on January 6th, 2014). Most of this code is in my private BitBucket repository. Contact me if you want to help or have any ideas.

Some links: Zotero server API and several implementations. The JSON parser we use: JsonCpp, and the XML parser: TinyXML2.

1.Desired features

At the very minimum we'd like to:

  1. List all collections in a Zotero library and select one as the bibliography for a TeXmacs document. Clicking on Document→Update→Bibliography should retrieve the latest version and format it using the user's preferences.

  2. View and edit notes in TeXmacs and HTML formats.

Other interesting (?) features:

  1. Create Zotero reports as TeXmacs documents (to correctly see math inside titles, abstracts, etc.)

  2. Manage a “Drafts” category with the user's current work. Maybe allow the selection of individual bibliographic items and create collections/tags (corresponding to TeXmacs documents) automatically for them.

  3. Organize items into collections and organize collections (useful while writing a paper). (Don't: this is already done in Zotero).

There are several levels of integration of TeXmacs with Zotero and its standalone client.

  1. Read-only access to the local cache of the standalone client. Select some bibliographic entries as the bibliography for some document, then Document→Update→Bibliography would retrieve and format them (either first exporting as BibTeX or using a CSL interpreter or…).

  2. Read-write access to the local cache of the standalone client. Allows for the previous feature, as well as editing of attached notes, etc.

  3. Read-write access to the Zotero database on the server. Implements all of the above and works without a local client running, directly updates the online version, may be integrated with the groups feature, etc. Obviously won't work offline unless we keep a local database.

As we will see next, it only makes sense to implement item 3. You may now skip to §1.3.

Note 1. I should be careful not to let this get out of hand: it's tempting to implement a full-fledged Zotero client but that's not what I need.

1.1.Read-only access (local)

Item 1 would be easy using the recent SQLite plugin to read the local cache of the standalone Zotero client. There are several implementations available (in Python, mostly) which encapsulate the required SQL queries, e.g. the one bundled in Qnotero. However, this doesn't make much sense, because it cannot be used as a first step towards item 2.

1.2.Read-write access (local)

Item 2 seems a good starting point, but it cannot be developed starting from the code for item 1. Write access to the local database should be left to the standalone client or the Firefox plugin, in order to avoid database corruption, problems due to simultaneous access to the database, or breakage after internal changes in the database schema. (Even partial direct write access with SQLite, e.g. only to notes, is not a good idea: it would skip internal integrity checks, might lead to data loss, etc. A standalone client or a Firefox plugin may be running at the same time.) We should instead use some public API. Currently (Dec. 2013) the Zotero standalone client supports some HTTP requests to a local mini-server started at port 23119. The payloads are formatted using JSON.

However, the feature subset made available through this system is very small. Specifically, only those methods required for the so-called connectors (for Chrome and Safari) are available. For some examples see zotero-master/chrome/content/zotero/xpcom/server_connector.js and zotero-master/chrome/content/zotero/xpcom/connector/*.

Several discussions at zotero-dev (e.g. this one) mention the need for an implementation of an analogue of Zotero's server API for the standalone client. One possible approach is to try to implement (some of) this for them first, then use it from TeXmacs. However, the clients are written in JavaScript, making this difficult and tiresome.

Note 2. A bonus of this approach is that once communications with the local client are implemented, and given that it provides the same API as the Zotero server itself, it'd be a matter of changing a URL to have server access.

1.3.Read-write access (server)

Because of the limitations of item 1 and the work needed to extend the exported API of the standalone client for item 2, item 3 might be the right starting point. Zotero's server API provides a full feature set including nifty things like exporting to many formats (including BibTeX) and CSL formatting of bibliographic entries (see this forum post).

There are several full implementations of the Zotero server API, e.g. libZotero. All the API details are here.

At the very least, we need the following to start with:

  1. A way to make HTTPS requests to the server. Authentication is done using OAuth or a key generated by the user in her online Zotero preferences (let's stick to this for now) which is then attached as a parameter to the request.

    The easiest solution is a wrapper class around system calls to curl (the alternative, libcurl with OpenSSL support, departs from current practices within TeXmacs), as in:

    curl -H 'Zotero-API-Version: 2' \
         'https://api.zotero.org/users/<user-id>/items?key=<app-key>'

    This wrapper must also implement caching of entries, using the header

    If-Modified-Since-Version: <libraryVersion>

    as well as a long list of other things.

  2. An XML parser to compose requests to and parse replies from the server. To this end I've included TinyXML2 in src/Plugins/TinyXML2. It consists of one C++ header and one source file.

  3. A JSON parser to read the payload of some replies from and compose write requests to the server. For this purpose I've included the “amalgamated” version of JsonCpp in src/Plugins/Json. Again, it consists of just two files.

  4. SQLite for the local cache. I must decide whether the current interface with tree sql_exec (url db_name, string cmd) is enough. It uses sqlite3_get_table() to convert all types to strings then returns a tree of lists, whose first item contains the column names, as in:

    (tuple (tuple "Col1" "Col2" "Col3") (tuple "value 1" "2" "value 3"))

    As of Jan 2014, caching is not implemented.
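As an illustration of item 1 above, here is a minimal sketch of how such a wrapper might compose the curl command line. The helper names shell_quote and items_command are hypothetical (they are not part of TeXmacs or of the code described here), and the sketch assumes that the user id and key need no URL-encoding. It only builds the command string; running it and parsing the reply are left out.

```cpp
#include <string>

// Quote a string for a POSIX shell by wrapping it in single quotes.
// An embedded single quote is closed, escaped and reopened: ' -> '\''
std::string shell_quote(const std::string& s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') out += "'\\''";
    else out += c;
  }
  out += "'";
  return out;
}

// Hypothetical helper: compose the curl command that lists a user's items.
// When cached_version is positive, the If-Modified-Since-Version header is
// added so that the server can tell us the cached entries are still valid.
// Assumption: user_id and key contain only URL-safe characters.
std::string items_command(const std::string& server,
                          const std::string& user_id,
                          const std::string& key,
                          long cached_version) {
  std::string url = server + "/users/" + user_id + "/items?key=" + key;
  std::string cmd = "curl -H " + shell_quote("Zotero-API-Version: 2");
  if (cached_version > 0)
    cmd += " -H " + shell_quote("If-Modified-Since-Version: " +
                                std::to_string(cached_version));
  cmd += " " + shell_quote(url);
  return cmd;
}
```

With no cached version, items_command ("https://api.zotero.org", "12345", "abc", 0) reproduces the command shown in item 1 with the placeholders filled in. Quoting every argument keeps the shell from interpreting characters such as ? and & in the URL.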

1.4.User interface

We have the choice between:

2.Implementation details (Jan. 06, 2014)

Note 3. The following implementation has been dropped in favor of a thinner C++ layer (which simply issues one-shot asynchronous HTTP requests) with all the logic written in Scheme code.

For the Scheme examples in this section we assume the following:

; Basic configuration variables
(define server "https://api.zotero.org")
(define userid "insert your user id here")
(define apikey "insert your api key here")

A delicate issue is to achieve the right balance between C++ and Scheme code. Requests will be issued from Scheme code. Callbacks will be Scheme code to update the interface, etc.

Conventions:

2.1.Creating a connection

The first step is creating an object of type connection. Eventually this will start the OAuth exchange and obtain an authentication token (not yet implemented). For now it accepts a user key, which the user may create through Zotero's web interface.

Scheme example

; Connect (no actual online connection)
(define cn (tmz-connect server userid apikey))

2.2.Making requests (Deprecated)

Requests to the server are implemented as a two-stage process with objects of type request and command:

  1. Every request has an associated connection (for the authentication token) and an associated command to be executed upon completion. To ensure order of completion of requests, these are queued (and started if need be) with connection::push (request req). One may use connection::push_after (request what, request where) to place a request at a specific point in the queue.

  2. One request is started asynchronously as an external process (curl) registered with the tm_link infrastructure as a pipe_link. The latter requires a command which will be called periodically (several times per second) by tm_server_rep::interpose_handler(). The command must read back from the pipe and call tm_link_rep::stop() when it's done to close the link (i.e. pipe).

    This setup is done at request_rep::start() which sets the handler for available data in the pipe to request_rep::this, meaning that request_rep::apply() will get called.

  3. Calls to request_rep::apply() return immediately until the link is dead (all data has been transmitted). At that point apply() calls stop() on the link and then the virtual request_rep::parse() to parse the results. When this is done (possibly with errors) the request calls connection::complete(this) to detach itself from the queue. The connection then decides whether the command is to be called.

After a connection is made, a request of type privileges_request must be instantiated and pushed to the queue. It reads back from the server the access rights associated with the given key and user id, then calls its command. Completion commands will be called with a tree object containing the results.
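The queueing scheme described above can be modelled in a few lines. The following is only a simplified sketch, not the actual (now dropped) implementation: the names push, push_after, complete and apply come from the description, but everything else is an assumption. In particular, the pipe is simulated by a list of string chunks, results are plain strings rather than tree objects, poll() stands in for the periodic call from tm_server_rep::interpose_handler(), and the completion command is always called.

```cpp
#include <algorithm>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for request_rep. `chunks` simulates the data that
// the external curl process would write to the pipe, one piece per poll.
struct request {
  std::string name;
  std::deque<std::string> chunks;
  std::string received;                             // data read so far
  std::function<void(const std::string&)> command;  // completion command

  // Called on every tick. Returns false while data is still arriving and
  // true once the simulated link is dead (nothing left to read).
  bool apply() {
    if (chunks.empty()) return true;
    received += chunks.front();
    chunks.pop_front();
    return false;
  }
};
using request_ptr = std::shared_ptr<request>;

class connection {
  std::deque<request_ptr> queue;
public:
  // Append a request at the end of the queue.
  void push(request_ptr req) { queue.push_back(req); }

  // Insert `what` right after `where` (or at the end if `where` is absent).
  void push_after(request_ptr what, request_ptr where) {
    auto it = std::find(queue.begin(), queue.end(), where);
    if (it != queue.end()) ++it;
    queue.insert(it, what);
  }

  // Detach a finished request and run its completion command.
  // (Simplification: the command is called unconditionally.)
  void complete(request_ptr req) {
    queue.erase(std::remove(queue.begin(), queue.end(), req), queue.end());
    if (req->command) req->command(req->received);
  }

  // One tick of the periodic handler: only the head of the queue makes
  // progress, which is what guarantees the order of completion.
  void poll() {
    if (queue.empty()) return;
    request_ptr head = queue.front();
    if (head->apply()) complete(head);
  }

  bool idle() const { return queue.empty(); }
};

// Small demonstration: returns the order in which the commands were called.
std::vector<std::string> run_demo() {
  std::vector<std::string> log;
  auto make = [&log](const std::string& name, std::deque<std::string> data) {
    auto r = std::make_shared<request>();
    r->name = name;
    r->chunks = std::move(data);
    r->command = [&log, name](const std::string& body) {
      log.push_back(name + ":" + body);
    };
    return r;
  };
  connection cn;
  auto privileges = make("privileges", {"ab", "cd"});
  auto items = make("items", {"xyz"});
  auto groups = make("groups", {});
  cn.push(privileges);
  cn.push(items);
  cn.push_after(groups, privileges);  // groups now runs before items
  while (!cn.idle()) cn.poll();
  return log;
}
```

In run_demo() the privileges request is pushed first, so its command runs before any other, and push_after places the groups request ahead of the items request even though it was created later. Processing only the head of the queue is one simple way to get the ordering guarantee; the real code may well have started several external processes at once.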

Scheme example

First we define the callbacks, then connect and proceed to some requests.

; Our storage:
(define _privileges #f)
(define _item #f)
(define _group #f)
(define _user-groups #f)
(define _collection-list #f)
(define _item-list #f)

; Commands to be called when the requests are done:

(define (receive-privileges t) (set! _privileges t))
(define (receive-item t) (set! _item t))
(define (receive-group t) (set! _group t))
(define (receive-user-groups t) (set! _user-groups t))
(define (receive-collection-list t) (set! _collection-list t))
(define (receive-item-list t) (set! _item-list t))

(tmz-request-privileges cn receive-privileges)
(tmz-request-item cn receive-item "some item identifier")
(tmz-request-group cn receive-group "239896")
(tmz-request-user-groups cn receive-user-groups userid "atom")
(tmz-request-collection-list cn receive-collection-list 
                             "collection" "collection id")
(tmz-request-item-list cn receive-item-list "collection-top" "collection id")

2.3.Error handling

There is little done yet: