General architecture of TeXmacs (FSF GNU project)

1.Introduction

The TeXmacs program is written in C++. You need g++ and the make utility in order to compile TeXmacs. Currently, the source (in the src directory) of the TeXmacs implementation is divided into the following parts:

A set of basic and generic data structures in the Basic directory.
Standard resources for TeXmacs, such as TeX fonts, languages, encodings and dictionaries, in the Resource directory.
A documented graphical toolkit in the Window directory (although the documentation is a bit outdated).
The extension language for TeXmacs in the Prg directory.
The typesetting part of the editor in the directory src/Typeset.
The editor in the directory src/Edit.
The TeXmacs server in the directory src/Server.

All parts use the data structures from Basic. The graphical toolkit depends on Resource for the TeX fonts. The extension language is independent of Resource and Window. The typesetting part depends on all other parts except Prg. The main editor and the TeXmacs server use all previous parts.

The TeXmacs data are contained in the directory edit, which corresponds to the TeXmacs distribution without the source code. Roughly speaking, we have the following kinds of data:

Font data in fonts (encodings, .pk files, etc.).
Language data in languages (hyphenation patterns, dictionaries, etc.).
Document styles in style.
Initialization and other Scheme programs in progs.

The directory misc contains miscellaneous data such as the edit icon (misc/pixmaps/edit.xpm).

2.Internal representation of texts

TeXmacs represents all texts by trees (for a fixed text, the corresponding tree is called the edit tree). The nodes of such a tree are labeled by standard operators which are listed in Basic/Data/tree.hpp and Basic/Data/tree.cpp. The labels of the leaves of the tree are strings, which are either invisible (such as lengths or macro definitions), or visible (the real text).
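As a rough illustration of this representation, the following C++ sketch models a tree whose inner nodes carry an operator label and whose leaves carry strings. It is not the actual class from Basic/Data/tree.hpp: the names Op, Tree, leaf, node and leaf_text, as well as the operator labels, are assumptions made for this example only.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operator labels; the real list lives in Basic/Data/tree.hpp.
enum class Op { String, Concat, Frac };

// A minimal tree: inner nodes carry an operator label and children,
// leaves carry a string. (Requires C++17 for the recursive vector member.)
struct Tree {
    Op op;
    std::string label;           // meaningful only when op == Op::String
    std::vector<Tree> children;  // meaningful only when op != Op::String

    static Tree leaf(std::string s) {
        return Tree{Op::String, std::move(s), {}};
    }
    static Tree node(Op op, std::vector<Tree> kids) {
        assert(op != Op::String);  // inner nodes must carry a real operator
        return Tree{op, "", std::move(kids)};
    }
    bool is_leaf() const { return op == Op::String; }
};

// Concatenate the strings of all leaves in depth-first order.
std::string leaf_text(const Tree& t) {
    if (t.is_leaf()) return t.label;
    std::string out;
    for (const Tree& c : t.children) out += leaf_text(c);
    return out;
}
```

For instance, Tree::node(Op::Frac, {Tree::leaf("a"), Tree::leaf("b")}) would stand for a fraction with numerator "a" and denominator "b" in this toy model.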

The meaning of the text and the way it is typeset essentially depend on the current environment. The environment mainly consists of a relative hash table of type rel_hashmap<string,tree>, i.e. a mapping from the environment variables to their tree values. The current language and the current font are examples of system environment variables; new variables can be defined by the user.
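One plausible reading of a "relative" hash table is a stack of layers, where each layer records changes relative to the layers beneath it, so that a local change can later be discarded to restore the previous values. The sketch below follows that reading. It is not the rel_hashmap class itself: the class RelMap, its methods, and the variable names used in the usage note are assumptions, and plain strings stand in for tree values.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for the tree type: here a value is just a string.
using Value = std::string;

// Hypothetical "relative" map: a stack of layers, each recording changes
// relative to the layers beneath it.
class RelMap {
public:
    RelMap() : layers_(1) {}  // start with a single base layer

    void extend() { layers_.emplace_back(); }  // open a new layer
    void shorten() {                           // discard the newest layer
        assert(layers_.size() > 1);            // the base layer always stays
        layers_.pop_back();
    }
    void set(const std::string& var, const Value& v) {
        layers_.back()[var] = v;
    }

    // Look up var from the newest layer down; return fallback if unbound.
    Value get(const std::string& var, const Value& fallback = "") const {
        for (auto it = layers_.rbegin(); it != layers_.rend(); ++it) {
            auto found = it->find(var);
            if (found != it->end()) return found->second;
        }
        return fallback;
    }

private:
    std::vector<std::unordered_map<std::string, Value>> layers_;
};
```

In this sketch, setting "language" to "french" after a call to extend() shadows an earlier "english" binding, and shorten() brings the earlier binding back.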

2.1.Text

All text strings in TeXmacs consist of sequences of either specific or universal symbols. A specific symbol is a single character different from '\0', '<' and '>'. Its meaning may depend on the particular font which is being used. A universal symbol is a string starting with '<', followed by an arbitrary sequence of characters different from '\0', '<' and '>', and ending with '>'. The meaning of a universal symbol does not depend on the particular font which is used, but different fonts may render it in different ways.
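The definition above translates into a short tokenizer. The following sketch splits a string into its symbols; the function names split_symbols and is_universal are hypothetical, and the code assumes well-formed input in which every '<' is closed by a matching '>' and no stray '>' occurs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split s into symbols: a "<...>" group is one universal symbol, any other
// character is one specific symbol. Assumes well-formed input.
std::vector<std::string> split_symbols(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == '<') {
            const std::size_t close = s.find('>', i + 1);
            assert(close != std::string::npos);  // malformed input is out of scope
            out.push_back(s.substr(i, close - i + 1));
            i = close + 1;
        } else {
            out.push_back(std::string(1, s[i]));
            i += 1;
        }
    }
    return out;
}

// A symbol is universal when it is delimited by '<' and '>'.
bool is_universal(const std::string& sym) {
    return sym.size() >= 2 && sym.front() == '<' && sym.back() == '>';
}
```

Under these assumptions, the string "a<alpha>b" splits into the three symbols "a", "<alpha>" and "b", of which only the second is universal.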

2.2.The language

The language of the text is capable of performing a further semantic analysis of a text phrase. At the very least, it can split a phrase into words (which are smaller phrases) and inform the typesetter about the desired spaces between words and about hyphenation. In the future, additional semantics may be added to languages. For instance, spell checkers might be implemented for natural languages and parsers for mathematical formulas or programming languages.

3.Typesetting texts

Roughly speaking, the typesetter of TeXmacs takes a tree as input and produces a box, while accessing and modifying the typesetting environment. The box class is multifunctional. Its principal method is used for displaying the box on a PostScript device (either the screen or a printer). But it also contains a lot of typesetting information, such as logical and ink bounding boxes, the positions of scripts, etc.

Another functionality of boxes is to convert between physical cursors (positions on the screen) and logical cursors (paths in the edit tree). Actually, boxes are also organized into a tree, which often simplifies the conversion. However, because of macro expansions and line and page breaking, the conversion routines may become quite intricate. Notice also that, besides a horizontal and vertical position, the physical cursor also contains an infinitesimal horizontal position. Roughly speaking, this infinitesimal coordinate is used to give certain boxes (such as color changes) an extra infinitesimal width.
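To give a feeling for why a tree of boxes helps with this conversion, here is a deliberately simplified sketch: each box is a rectangle with child boxes, and a physical position is turned into a path by descending into whichever child contains the point. The names Box and find_path are hypothetical, and the sketch ignores macro expansion, line and page breaking, and the infinitesimal coordinate; the path it returns is a path in the box tree, which only stands in for a path in the edit tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified box: an axis-aligned rectangle with child boxes.
// (Requires C++17 for the recursive vector member.)
struct Box {
    int x1, y1, x2, y2;  // bounding box, with x1 <= x2 and y1 <= y2
    std::vector<Box> children;

    // Half-open test, so that adjacent boxes do not both claim a point.
    bool contains(int x, int y) const {
        return x1 <= x && x < x2 && y1 <= y && y < y2;
    }
};

// Return the path (child indices from the root) to the innermost box that
// contains the point (x, y). The path is empty if no child contains it.
std::vector<int> find_path(const Box& root, int x, int y) {
    assert(root.contains(x, y));
    std::vector<int> path;
    const Box* cur = &root;
    bool descended = true;
    while (descended) {
        descended = false;
        for (std::size_t i = 0; i < cur->children.size(); ++i) {
            if (cur->children[i].contains(x, y)) {
                path.push_back(static_cast<int>(i));
                cur = &cur->children[i];
                descended = true;
                break;  // continue the search inside this child
            }
        }
    }
    return path;
}
```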

4.Making modifications in texts

In Edit/Modify you find different routines for modifying the edit tree. Modifications proceed in several steps:

A certain input event triggers an action, such as make_fraction, which intends to modify the edit tree.
All modifications which make_fraction or its subroutines make to the edit tree eventually break down into seven elementary modification routines, namely assign, insert, remove, split, join, ins_unary and rem_unary.
Before performing the required modification, the elementary modification routine first notifies all views of the same text of the modification.
On notification, each view updates several things, such as the cursor position. It also notifies the typesetter of the text of the modification, since the typesetter maintains a list of already typeset paragraphs.
Once all views have been notified of the modification, it is actually performed.
Each user action like a keystroke or a mouse click is responsible for inserting undo points between sequences of elementary modifications. When undoing a modification, the editor will move to the previous undo point.
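The steps above can be sketched as a small notify-then-perform loop with undo points. This is only a toy model and not the code in Edit/Modify: the edit tree is flattened to a single string, only an insert-like modification is modeled, and the names Document, View, add_view, mark_undo_point and undo are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy document: the edit tree is flattened to one string so that the control
// flow stays visible. Every name below is hypothetical.
class Document {
public:
    // A view is told which elementary modification is about to happen.
    using View = std::function<void(const std::string& kind,
                                    std::size_t pos, std::size_t len)>;

    void add_view(View v) { views_.push_back(std::move(v)); }

    // Elementary modification: notify all views first, then perform.
    void insert(std::size_t pos, const std::string& s) {
        assert(pos <= text_.size());
        notify("insert", pos, s.size());
        text_.insert(pos, s);
        log_.push_back(Change{pos, s.size()});
    }

    // Called by a user action after its sequence of modifications.
    void mark_undo_point() { undo_points_.push_back(log_.size()); }

    // Move back to the previous undo point by reverting logged insertions,
    // newest first. Views are notified before each reverting removal.
    void undo() {
        if (!undo_points_.empty() && undo_points_.back() == log_.size())
            undo_points_.pop_back();  // skip the point we are standing on
        const std::size_t target =
            undo_points_.empty() ? 0 : undo_points_.back();
        while (log_.size() > target) {
            const Change c = log_.back();
            notify("remove", c.pos, c.len);
            text_.erase(c.pos, c.len);
            log_.pop_back();
        }
    }

    const std::string& text() const { return text_; }

private:
    struct Change { std::size_t pos; std::size_t len; };

    void notify(const std::string& kind, std::size_t pos, std::size_t len) {
        for (const View& v : views_) v(kind, pos, len);
    }

    std::string text_;
    std::vector<View> views_;
    std::vector<Change> log_;                // modifications performed so far
    std::vector<std::size_t> undo_points_;   // log sizes at each undo point
};
```

In this model, a keystroke handler would call insert one or more times and then mark_undo_point once, so that a single undo reverts the whole keystroke rather than one elementary modification.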

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".