1 Introduction

PHPLaTeX is a program for converting LaTeX documents into HTML5 with MathML and SVG. Its main selling points are that it is written entirely in PHP and that it attempts to mimic the basic TeX method of parsing a document by reading and expanding tokens. The main consequence of the first of these is that it is extremely portable. The main consequence of the second is that it can handle macro expansion.
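To make ``reading and expanding tokens'' concrete, here is a minimal sketch. The functions tokenize() and expand() are hypothetical illustrations, not the routines in ``latex.php''; they handle only parameterless macros and ignore catcodes and the skipping of spaces after control words.

```php
<?php
// Illustrative sketch only (hypothetical functions, not the code in latex.php):
// split a TeX string into tokens, then expand parameterless macros by pushing
// each replacement back onto the input, so nested macros are expanded too.

/** Split into control sequences ("\name" or "\x") and single characters. */
function tokenize(string $src): array
{
    preg_match_all('/\\\\(?:[A-Za-z]+|.)|./su', $src, $m);
    return $m[0];
}

/** Expand macros from $macros (name => replacement text) until none remain. */
function expand(array $tokens, array $macros, int $limit = 1000): array
{
    $out = [];
    while ($tokens !== []) {
        $tok = array_shift($tokens);
        if (!isset($macros[$tok])) {
            $out[] = $tok;          // not a macro: pass through unchanged
            continue;
        }
        if (--$limit < 0) {         // guard against self-referential macros
            throw new RuntimeException("Expansion limit reached at $tok");
        }
        // Re-read the replacement text in front of the remaining input.
        $tokens = array_merge(tokenize($macros[$tok]), $tokens);
    }
    return $out;
}

$macros = ['\\R' => '\\mathbb{R}', '\\Rn' => '\\R^n'];
echo implode('', expand(tokenize('\\Rn'), $macros)), "\n"; // \mathbb{R}^n
```

Because each replacement is pushed back onto the input rather than written straight to the output, a macro defined in terms of another macro is expanded fully.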

Not everything is implemented yet, and things that are may not be implemented fully. A full list of the currently available commands, together with some notes on specific commands, is given at the end.

This is not intended ever to be a PHP implementation of TeX. It would be nice to be able to load in a reasonably straightforward TeX or LaTeX document and convert it to HTML5 with MathML and SVG; however, I do not intend it ever to be able to load in a style file and use it without alterations. This is because TeX is a renderer whereas this program is a converter: the rendering is done by another program (i.e. your web browser). Therefore the aim of this program is to translate a document into something a web browser can understand. It has to be fairly complicated because TeX is actually a programming language whereas HTML is merely a markup language, so some of the programming commands in a normal TeX document have to be carried out before they can be converted to markup. Unfortunately, the division between pre-rendering and post-rendering processing is not clean and so cannot be fully automatic (or at least, I have no plans to attempt it). In essence, the closer a document is to the actual TeX engine, the less likely it is that one will be able to ``drag and drop'' it into this program. Fortunately, most LaTeX documents are a long, long way from that.

2 Mathematics

Mathematics implementation is a work in progress. Certain obvious features are not yet implemented; in particular, not all of the iTeX suite is currently supported. The main thing to know is that this program currently supports only LaTeX-style mathematics delimiters: \( and \) for inline mathematics and \[ and \] for display style.
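A minimal sketch of how input might be split on these delimiters before the mathematical parts are converted to MathML. splitMath() is a hypothetical helper; it does not handle nesting or escaped delimiters, and, matching the restriction above, it leaves dollar signs alone.

```php
<?php
// Hypothetical sketch: split input into text, inline-math and display-math
// segments using only the LaTeX-style delimiters \( \) and \[ \].
// Dollar signs are deliberately treated as ordinary text.

/** @return list<array{0: string, 1: string}> pairs of [kind, content] */
function splitMath(string $src): array
{
    // The capturing group keeps the matched math spans in the result.
    $parts = preg_split('/(\\\\\(.*?\\\\\)|\\\\\[.*?\\\\\])/s', $src, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $segments = [];
    foreach ($parts as $part) {
        // Require both delimiters so an unclosed "\(" stays plain text.
        if (str_starts_with($part, '\\(') && str_ends_with($part, '\\)')) {
            $segments[] = ['inline', substr($part, 2, -2)];
        } elseif (str_starts_with($part, '\\[') && str_ends_with($part, '\\]')) {
            $segments[] = ['display', substr($part, 2, -2)];
        } else {
            $segments[] = ['text', $part];
        }
    }
    return $segments;
}

print_r(splitMath('Let \\(x\\) satisfy \\[x^2 = 2\\]'));
```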

3 XY Pictures and SVG

The motivation for this program was to write something that could convert xymatrix-style pictures into SVG. A major issue with this is the integration of MathML and SVG: basically, there is none. Each can be embedded in the other, but neither knows anything about the embedded object. Therefore alignment and size are particularly troublesome. So this program has a rudimentary ``getWidthOf'' function which converts the inset mathematics into MathML and then estimates its width. This is then fed back into the SVG generation to help get the alignment right. However, once one has gone to the trouble of converting the mathematics into MathML, one may as well go the whole distance and convert the entire document.
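As a rough illustration of the kind of estimate ``getWidthOf'' has to make, the sketch below strips the markup from a MathML fragment and adds up per-character widths in em. Both estimateWidthEm() and its width classes are assumptions made for illustration; the real function's heuristics are not described here.

```php
<?php
// Hypothetical getWidthOf-style estimate (not PHPLaTeX's actual algorithm):
// strip the MathML markup and sum rough per-character widths in em.
// The width classes below are illustrative assumptions, not measured metrics.

function estimateWidthEm(string $mathml): float
{
    $text = html_entity_decode(strip_tags($mathml), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $chars = preg_split('//u', preg_replace('/\s+/u', '', $text), -1, PREG_SPLIT_NO_EMPTY);
    $width = 0.0;
    foreach ($chars as $ch) {
        if (preg_match('/[ilj.,;:!|\'()\[\]]/u', $ch)) {
            $width += 0.3;   // narrow glyphs and delimiters
        } elseif (preg_match('/[mwMW]/u', $ch)) {
            $width += 0.9;   // wide Latin letters
        } elseif (preg_match('/[+\-=<>]/u', $ch)) {
            $width += 0.8;   // operators and relations, allowing for surrounding space
        } else {
            $width += 0.55;  // everything else, including most Greek letters
        }
    }
    return $width;
}

echo estimateWidthEm('<math><mi>x</mi><mo>+</mo><mn>1</mn></math>'), "\n";
```

The em total would then be scaled by the font size to give a width in SVG user units.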

Most features of the xymatrix command are supported, plus one or two little extras. The following are not currently supported:

The arguments that can be given before the main picture begins.
Labels in the arrows. Most of the work is there, but I've yet to decide how to put a suitable gap in the arrow. (Labels above and below the arrows work.)
Arrows passing by nodes other than their source and target.

All of these can be specified but they are currently ignored.

The main extra is that although labels cannot be put in the middle of arrows, arrowheads can. The syntax extends the ``style'' syntax of an xymatrix arrow. The standard ``style'' syntax is to put ``@{tail stem head}'' in the arrow specification. If the stem consists of ``stem{tip}stem'' then the ``tip'' is put at the midpoint of the arrow. Note that the before and after parts of the stem have to be the same, and will collapse to just one copy (so ``-{>}-'' is an unbroken line).
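The stem rule can be sketched as follows. parseStem() is a hypothetical helper that receives only the stem part of the style specification; splitting off the tail and head is assumed to have been done already.

```php
<?php
// Sketch of the mid-arrow extension described above. parseStem() is a
// hypothetical helper, not PHPLaTeX's parser: it takes only the stem part of
// @{tail stem head} and recognises the form "stem{tip}stem".

/** @return array{stem: string, midTip: ?string} */
function parseStem(string $stem): array
{
    // Lazy prefix, greedy tip (so a tip may itself contain braces), suffix.
    if (!preg_match('/^(.*?)\{(.*)\}(.*)$/s', $stem, $m)) {
        return ['stem' => $stem, 'midTip' => null];   // ordinary stem
    }
    [, $before, $tip, $after] = $m;
    if ($before !== $after) {
        throw new InvalidArgumentException("Stem halves differ: '$before' vs '$after'");
    }
    // The two identical halves collapse into a single stem.
    return ['stem' => $before, 'midTip' => $tip];
}

$s = parseStem('-{>}-');
echo $s['stem'], ' ', $s['midTip'], "\n"; // - >
```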

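To save computation, positions along an arrow are determined by the parameter used in the arrow's formula rather than by arc length. For a curved arrow these differ: the point at the middle parameter value is not, in general, halfway along the curve.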
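The difference between the two notions of position can be seen with a quadratic Bezier curve. In this sketch (hypothetical helpers), quadBezier() evaluates the curve at a parameter value, while arcLengthMidParam() approximates the parameter at half the curve's length by summing short chords; the first is a single formula evaluation, the second needs many.

```php
<?php
// Illustration with hypothetical helpers: for a quadratic Bezier with control
// points $p0, $p1, $p2, the point at parameter t is cheap to compute, but
// t = 0.5 is generally not the point halfway along the curve's length.

function quadBezier(array $p0, array $p1, array $p2, float $t): array
{
    $u = 1 - $t;
    return [
        $u * $u * $p0[0] + 2 * $u * $t * $p1[0] + $t * $t * $p2[0],
        $u * $u * $p0[1] + 2 * $u * $t * $p1[1] + $t * $t * $p2[1],
    ];
}

/** Approximate the parameter at half the arc length using $n chords. */
function arcLengthMidParam(array $p0, array $p1, array $p2, int $n = 1000): float
{
    $lengths = [0.0];
    $prev = $p0;
    for ($i = 1; $i <= $n; $i++) {
        $pt = quadBezier($p0, $p1, $p2, $i / $n);
        $lengths[$i] = $lengths[$i - 1] + hypot($pt[0] - $prev[0], $pt[1] - $prev[1]);
        $prev = $pt;
    }
    $half = $lengths[$n] / 2;
    for ($i = 1; $i <= $n; $i++) {
        if ($lengths[$i] >= $half) {
            return $i / $n;
        }
    }
    return 1.0;
}

printf("t at half length: %.2f\n", arcLengthMidParam([0, 0], [10, 0], [100, 100]));
```

For a symmetric curve the two coincide; for a lopsided one like the example, the halfway point lies well beyond t = 0.5.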

[Figure: example xymatrix diagram with nodes A, B, C, D, E and F and arrows labelled f and g]

Much work still needs to be done on fine tuning the algorithms for label placement and text size.

The way that this works is as follows. The argument to xymatrix is parsed and split up into a matrix of entries. Each entry is converted to MathML and measured. Currently only the width is measured; the height and depth are set to a standard value (so tall or deep entries may get cropped). Each node is then set in a box (using the foreignObject tag) and horizontally centred within that box. To get the vertical alignment correct, a strut whose height is the maximum height of the node entries is placed at the beginning of each node's contents. The boxes all use the maximum width of the nodes, so that, hopefully, no node entry will get cropped horizontally.

Then the arrows are placed. Using the measured widths of each node, the arrows start and end at one of the eight compass points around the node. Which compass point is used depends on where the arrow is going to or coming from, but this can be overridden by using curving: the second arrow in the diagram above is curved by using @([d],[u]) to exit the upper node from ``d'' and enter the lower node from ``u''. The arrow is a straight line, a quadratic Bezier, or a cubic Bezier, depending on how complicated the curving specification is (basically we use the simplest that we can get away with). Setting the style is the simplest part, except that there is a fairly complicated dictionary to translate XY styles into SVG styles. Doubled and tripled arrow stems are drawn using TikZ's method: draw a thick line first and overlay it with successive white and black lines.

Finally, the positions of the labels are computed and the labels are rendered. This is a little more complicated than for the nodes because we don't want the label boxes to be wider than they have to be, since labels are more likely to sit in the middle of the diagram where things are more cluttered. Also, positioning them precisely is problematic since we specify the top left corner of the box without knowing exactly where its contents will be put.
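The compass-point step might look roughly like this. The function names, the xy-style direction letters and the representation of a node as [centre x, centre y, width, height] are assumptions for illustration; SVG's y axis points downwards, so ``u'' means a smaller y coordinate.

```php
<?php
// Hypothetical sketch of choosing one of eight compass points on a node's box.
// Directions use xy-style letters (u, d, l, r and combinations).

const DIRECTIONS = ['r', 'ur', 'u', 'ul', 'l', 'dl', 'd', 'dr'];

/** Pick the compass direction closest to the vector (dx, dy) in SVG coordinates. */
function nearestDirection(float $dx, float $dy): string
{
    $angle = atan2(-$dy, $dx);                   // flip y so "up" is positive
    $sector = (int) round($angle / (M_PI / 4));  // -4..4, one step per 45 degrees
    return DIRECTIONS[($sector + 8) % 8];
}

/** Point on the box of width $w, height $h centred at ($cx, $cy) for direction $dir. */
function compassPoint(float $cx, float $cy, float $w, float $h, string $dir): array
{
    $x = $cx + (str_contains($dir, 'r') ? $w / 2 : 0) - (str_contains($dir, 'l') ? $w / 2 : 0);
    $y = $cy + (str_contains($dir, 'd') ? $h / 2 : 0) - (str_contains($dir, 'u') ? $h / 2 : 0);
    return [$x, $y];
}

/** Arrow endpoints between two nodes; $exit/$enter override the choice, as @([d],[u]) does. */
function arrowEndpoints(array $from, array $to, ?string $exit = null, ?string $enter = null): array
{
    [$fx, $fy, $fw, $fh] = $from;
    [$tx, $ty, $tw, $th] = $to;
    $exit ??= nearestDirection($tx - $fx, $ty - $fy);
    $enter ??= nearestDirection($fx - $tx, $fy - $ty);
    return [compassPoint($fx, $fy, $fw, $fh, $exit), compassPoint($tx, $ty, $tw, $th, $enter)];
}
```

For two vertically stacked nodes, the automatic choice is to leave the upper node at ``d'' and enter the lower one at ``u''; passing explicit letters reproduces the override.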

By placing the arrows after the nodes, we can get away with having our node boxes a little on the large side. However, this makes other things more awkward. In particular, passing arrows under nodes is not easy to do. Also, the arrows currently point to their compass point, whereas what they ought to do is point towards the centre of the node but stop short at the compass point. (This latter problem wouldn't actually be solved by putting the nodes on last, as the arrowheads would be covered up. Solving it would just take a little more programming and is certainly feasible.)

4 Commands

4.1 Primitives

Primitives are PHP functions. They generally need to muck around with the internals of the system and so need access to the ``lower level'' of the program.

The following are currently implemented as primitives:

addtocounter

Counters are internal objects, so adding to one requires access to the internals; hence this is a primitive.
csname

Merging two separate tokens (in this case the backslash and the command name) into a new token is not allowed for commands, so this needs to be a primitive.
def

Creates a new def, so needs to be a primitive.
documentclass

Currently this doesn't do anything; it is more of a placeholder at the moment.
endmath

Commands swallow the spaces that follow them, but the end-of-mathematics symbol should not. So, for the moment, this needs to be a primitive to avoid that.
entity

This creates an entity from the given string. Entities are treated as single tokens, so this essentially merges tokens, which cannot be done by commands.
expandafter

As this involves expansion, it needs to be a primitive.
false

This alters the status of a conditional. This shouldn't be used by an author.
if

This checks the status of a conditional. It isn't the same as the TeX ``if'' command; rather, it is used for conditionals created by ``newif''. This shouldn't be used by an author.
newcommand

This creates a new command. Although in LaTeX, ``newcommand'' is built on top of ``def'', here they are separate to optimise optional arguments.
newcounter

This creates a new counter.
newenvironment

This creates a new environment. It may be possible to reprogram this using just ``newcommand''s and ``expandafter''s since all it really does is create two commands.
refstepcounter

This isn't properly implemented yet. At the moment it just adds 1 to the counter, but it should then reset all subcounters.
setcounter

This sets a counter to a value, hence is a primitive.
show

This shows the corresponding primitive, def, or command. For primitives it doesn't give a lot of information, though.
true

See ``false''.
usecounter

This prints the value of the counter.
usepackage

This loads in optional extras. Currently available packages are: ``amsmath.sty'' and ``itex.sty''. It is also used to automatically load in ``default.sty''. As this accesses the file system, it has to be a primitive. However, it could be rewritten to use a more generic file inclusion.
xymatrix

This converts an xymatrix command into an SVG diagram. As there is some complicated calculation going on, this is a primitive. Nothing strictly requires that (though the computation certainly couldn't yet be done as TeX commands), but one point of this program was to use PHP where that made sense, and here it certainly does.
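The counter primitives above all manipulate a shared store. Below is a minimal sketch of such a store, using a hypothetical Counters class rather than PHPLaTeX's internals. Unlike the current ``refstepcounter'', refStep() here also resets subcounters, which is the behaviour the note on ``refstepcounter'' says it should have.

```php
<?php
// Hypothetical sketch of a counter store behind primitives such as newcounter,
// setcounter, addtocounter, refstepcounter and usecounter.

final class Counters
{
    /** @var array<string, int> */
    private array $values = [];
    /** @var array<string, list<string>> parent counter => counters it resets */
    private array $children = [];

    public function newCounter(string $name, ?string $within = null): void
    {
        $this->values[$name] = 0;
        $this->children[$name] ??= [];
        if ($within !== null) {
            $this->children[$within][] = $name;
        }
    }

    public function set(string $name, int $value): void
    {
        $this->requireCounter($name);
        $this->values[$name] = $value;
    }

    public function add(string $name, int $delta): void
    {
        $this->requireCounter($name);
        $this->values[$name] += $delta;
    }

    /** Add 1, then reset every subcounter (recursively) to 0. */
    public function refStep(string $name): void
    {
        $this->add($name, 1);
        $this->resetChildren($name);
    }

    public function value(string $name): int
    {
        $this->requireCounter($name);
        return $this->values[$name];
    }

    private function resetChildren(string $name): void
    {
        foreach ($this->children[$name] as $child) {
            $this->values[$child] = 0;
            $this->resetChildren($child);
        }
    }

    private function requireCounter(string $name): void
    {
        if (!array_key_exists($name, $this->values)) {
            throw new InvalidArgumentException("No counter named '$name'");
        }
    }
}
```

With ``subsection'' created within ``section'', stepping ``section'' sets ``subsection'' back to 0, so the next subsection is numbered 1 again.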

4.2 Defs, Commands, and Environments

In PHPLaTeX, defs and commands are more alike than in ordinary LaTeX. This is mainly a consequence of how optional arguments for commands are implemented, and is of little importance when using them. There are two main differences from ordinary (La)TeX. First, although defs and commands can be nested, the nested defs and commands cannot currently take arguments (this will not be hard to fix; I just haven't got round to it). Second, any of a command's arguments can be designated as optional. That is, the person who defines the command can choose which arguments should be optional and what their default values should be. This feature is not available via the ``newcommand'' primitive; rather, one has to write a new primitive that defines the command directly. Thus this feature is more for hackers than authors.
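The following sketch shows how a command with freely chosen optional arguments could read its input. applyCommand() and readGroup() are hypothetical; each entry of the parameter list is either null (a required {...} argument) or a default string (an optional [...] argument), so an optional argument may come in any position, not just first.

```php
<?php
// Hypothetical sketch (not PHPLaTeX's internals): a command in which any
// argument may be optional. $params has one entry per argument: null for a
// required {...} argument, or a default string for an optional [...] one.

/** Read a group opened by $open at $pos; return [contents, next position] or null. */
function readGroup(string $src, int $pos, string $open, string $close): ?array
{
    if ($pos >= strlen($src) || $src[$pos] !== $open) {
        return null;
    }
    $depth = 0;
    for ($i = $pos; $i < strlen($src); $i++) {
        if ($src[$i] === $open) {
            $depth++;
        } elseif ($src[$i] === $close && --$depth === 0) {
            return [substr($src, $pos + 1, $i - $pos - 1), $i + 1];
        }
    }
    throw new RuntimeException("Unbalanced '$open' at offset $pos");
}

/** Read arguments from $src and substitute #1..#9 in $body; return [expansion, rest]. */
function applyCommand(string $body, array $params, string $src): array
{
    $pos = 0;
    $args = [];
    foreach ($params as $default) {
        while ($pos < strlen($src) && str_contains(" \t\r\n", $src[$pos])) {
            $pos++;                                    // skip spaces between arguments
        }
        $optional = $default !== null;
        $group = $optional ? readGroup($src, $pos, '[', ']') : readGroup($src, $pos, '{', '}');
        if ($group === null && !$optional) {
            throw new RuntimeException("Missing required argument at offset $pos");
        }
        $args[] = $group === null ? $default : $group[0];  // fall back to the default
        $pos = $group === null ? $pos : $group[1];
    }
    $map = [];
    foreach ($args as $i => $arg) {
        $map['#' . ($i + 1)] = $arg;
    }
    // strtr replaces all placeholders in one pass, so argument text is not rescanned.
    return [strtr($body, $map), substr($src, $pos)];
}

// A command taking [opt]{req}[opt2], with defaults "x" and "0":
[$out, $rest] = applyCommand('#2_{#1}^{#3}', ['x', null, '0'], '{a}[2] rest');
echo $out, "\n"; // a_{x}^{2}
```

Spaces before each argument are skipped, so a missing trailing optional argument also consumes the spaces in front of where it would have been.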

Environments are as in LaTeX: they simply define commands that get invoked upon beginning and ending.

At the moment, none of the defining commands (def, newcommand, or newenvironment) checks to see if a command of that name already exists. Thus the last definition (usually) wins.
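A sketch of the environment mechanism under these rules, using a hypothetical Definitions class: newEnvironment() stores two commands under the LaTeX-style names \name and \endname, and redefining either simply overwrites it.

```php
<?php
// Hypothetical sketch: \newenvironment{name}{begin}{end} just stores two
// commands, and \begin{name} / \end{name} look them up. No check is made for
// an existing definition, so the last one wins.

final class Definitions
{
    /** @var array<string, string> command name => replacement text */
    private array $commands = [];

    public function newCommand(string $name, string $body): void
    {
        $this->commands[$name] = $body; // silently overwrites an earlier definition
    }

    public function newEnvironment(string $name, string $begin, string $end): void
    {
        // As in LaTeX: \name holds the opening code, \endname the closing code.
        $this->newCommand('\\' . $name, $begin);
        $this->newCommand('\\end' . $name, $end);
    }

    public function begin(string $name): string
    {
        return $this->lookup('\\' . $name);
    }

    public function end(string $name): string
    {
        return $this->lookup('\\end' . $name);
    }

    private function lookup(string $cmd): string
    {
        if (!isset($this->commands[$cmd])) {
            throw new RuntimeException("Undefined command $cmd");
        }
        return $this->commands[$cmd];
    }
}
```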

Currently available commands and environments are:

4.2.1 Commands

\title, \begin, \end, \newif, \(, \[, \], \amporcol, \section, \subsection, \subsubsection, \thesubsection, \thesubsubsection, \newline, \par, \emph, \textbf, \", \`, \', \^, \ , \c, \ , \mathop, \mathnum, \mathchar, \mathparen, \left, \right, \frac, \sqrt, \root, \rule, \!, \negspace, \,, \thinspace, \:, \medspace, \;, \thickspace, \quad, \qquad, \ae, \AE, \oe, \OE, \aa, \AA, \o, \O, \ss, \dag, \ddag, \S, \P, \copyright, \pounds, \aleph, \wp, \Re, \Im, \surd, \angle, \partial, \infty, \clubsuit, \diamondsuit, \heartsuit, \spadesuit, \cdot, \vartheta, \varpi, \dots, \in, \to, \approx, \propto, \neq, \neg, \wedge, \vee, \supset, \subset, \emptyset, \pm, \implies, \prime, \nabla, \forall, \times, \notin, \ni, \prod, \sum, \ast, \equiv, \sim, \oplus, \cap, \cup, \rfloor, \euro, \int, \cong, \ne, \le, \ge, \gt, \lt, \otimes, \perp, \alpha, \beta, \gamma, \delta, \epsilon, \zeta, \eta, \theta, \iota, \kappa, \lambda, \mu, \nu, \xi, \omicron, \pi, \rho, \sigma, \tau, \upsilon, \phi, \chi, \psi, \omega

Via ``amsmath'' package:

\mathbb, \mathcal, \mathfrak, \big, \Big, \bigg, \Bigg, \text, \lvert, \rvert, \lVert, \rVert, \lbrace, \rbrace, \lbracket, \rbracket

Via ``itex'' package:

\infty, \infinity, \lbrace, \{, \rbrace, \}, \vert, \Vert, \|, \setminus, \backslash, \smallsetminus, \sslash, \lfloor, \lceil, \lang, \langle, \rfloor, \rceil, \rang, \rangle, \uparrow, \downarrow, \updownarrow, \#, \prime, \alpha, \beta, \gamma, \delta, \zeta, \eta, \theta, \iota, \kappa, \lambda, \mu, \nu, \xi, \pi, \rho, \sigma, \tau, \upsilon, \chi, \psi, \omega, \backepsilon, \varepsilon, \varkappa, \varpi, \varrho, \varsigma, \vartheta, \phi, \varphi, \arccos, \arcsin, \arctan, \arg, \cos, \cosh, \cot, \coth, \csc, \deg, \dim, \exp, \hom, \ker, \lg, \ln, \log, \sec, \sin, \sinh, \tan, \tanh, \det, \gcd, \inf, \lim, \liminf, \limsup, \max, \min, \Pr, \sup, \omicron, \epsilon, \cdot, \Alpha, \Beta, \Delta, \Gamma, \digamma, \Lambda, \Pi, \Phi, \Psi, \Sigma, \Theta, \Xi, \Zeta, \Eta, \Iota, \Kappa, \Mu, \Nu, \Rho, \Tau, \mho, \Omega, \Upsilon, \Upsi, \iff, \Longleftrightarrow, \Leftrightarrow, \impliedby, \Leftarrow, \implies, \Rightarrow, \hookleftarrow, \embedsin, \hookrightarrow, \longleftarrow, \longrightarrow, \leftarrow, \to, \rightarrow, \leftrightarrow, \mapsto, \map, \nearrow, \nearr, \nwarrow, \nwarr, \searrow, \searr, \swarrow, \swarr, \neArrow, \neArr, \nwArrow, \nwArr, \seArrow, \seArr, \swArrow, \swArr, \darr, \Downarrow, \uparr, \Uparrow, \downuparrow, \duparr, \updarr, \Updownarrow, \leftsquigarrow, \rightsquigarrow, \leftrightsquigarrow, \upuparrows, \rightleftarrows, \rightrightarrows, \dashleftarrow, \dashrightarrow, \curvearrowleft, \curvearrowbotright, \downdownarrows, \leftleftarrows, \leftrightarrows, \righttoleftarrow, \lefttorightarrow, \circlearrowleft, \circlearrowright, \dots, \ldots, \cdots, \ddots, \udots, \vdots, \cup, \union, \bigcup, \Union, \cap, \intersection, \bigcap, \Intersection, \in, \gt, \lt, \approxeq, \backsim, \backsimeq, \subset, \subseteq, \subseteqq, \subsetneq, \subsetneqq, \varsubsetneq, \varsubsetneqq, \prec, \parallel, \nparallel, \shortparallel, \nshortparallel, \perp, \eqslantgtr, \eqslantless, \gg, \ggg, \geq, 
\geqq, \geqslant, \gneq, \gneqq, \gnapprox, \gnsim, \gtrapprox, \ge, \le, \leq, \leqq, \leqslant, \lessapprox, \lessdot, \lesseqgtr, \lesseqqgtr, \lessgtr, \lneq, \lneqq, \lnsim, \lvertneqq, \gtrsim, \gtrdot, \gtreqless, \gtreqqless, \gtrless, \gvertneqq, \lesssim, \lnapprox, \nsubset, \nsubseteq, \nsubseteqq, \notin, \ni, \notni, \nmid, \nshortmid, \preceq, \npreceq, \ll, \ngeq, \ngeqq, \ngeqslant, \nleq, \nleqq, \nleqslant, \nless, \supset, \supseteq, \supseteqq, \supsetneq, \supsetneqq, \varsupsetneq, \varsupsetneqq, \approx, \asymp, \bowtie, \dashv, \Vdash, \vDash, \VDash, \vdash, \Vvdash, \models, \sim, \simeq, \nsim, \smile, \triangle, \triangledown, \triangleleft, \cong, \succ, \nsucc, \ngtr, \nsupset, \nsupseteq, \propto, \equiv, \nequiv, \frown, \triangleright, \ncong, \succeq, \succapprox, \succnapprox, \succcurlyeq, \succsim, \succnsim, \nsucceq, \nvDash, \nvdash, \nVDash, \amalg, \pm, \mp, \bigcirc, \wr, \odot, \uplus, \clubsuit, \spadesuit, \Diamond, \diamond, \sqcup, \sqcap, \sqsubset, \sqsubseteq, \sqsupset, \sqsupseteq, \Subset, \Supset, \ltimes, \div, \rtimes, \bot, \therefore, \thickapprox, \thicksim, \varpropto, \varnothing, \flat, \vee, \because, \between, \Bumpeq, \bumpeq, \circeq, \curlyeqprec, \curlyeqsucc, \doteq, \doteqdot, \eqcirc, \fallingdotseq, \multimap, \pitchfork, \precapprox, \precnapprox, \preccurlyeq, \precsim, \precnsim, \risingdotseq, \sharp, \bullet, \nexists, \dagger, \ddagger, \not, \top, \natural, \angle, \measuredangle, \backprime, \bigstar, \blacklozenge, \lozenge, \blacksquare, \blacktriangle, \blacktriangledown, \forall, \bigtriangleup, \bigtriangledown, \nprec, \aleph, \beth, \eth, \ell, \hbar, \Im, \imath, \jmath, \wp, \Re, \Perp, \Vbar, \Box, \square, \emptyset, \empty, \exists, \circ, \rhd, \lhd, \lll, \unrhd, \unlhd, \Del, \nabla, \sphericalangle, \heartsuit, \diamondsuit, \partial, \qed, \mod, \bottom, \neg, \neq, \ne, \shortmid, \mid, \int, \integral, \iint, \doubleintegral, \iiint, \tripleintegral, \iiiint, 
\quadrupleintegral, \oint, \conint, \contourintegral, \times, \star, \circleddash, \odash, \boxminus, \minusb, \boxplus, \plusb, \boxtimes, \timesb, \sum, \prod, \product, \coprod, \coproduct, \otimes, \Otimes, \bigotimes, \oplus, \Oplus, \bigoplus, \bigodot, \bigsqcup, \biguplus, \wedge, \Wedge, \bigwedge, \Vee, \bigvee

4.2.2 Environments

document, array, itemize, enumerate

Via ``amsmath'' package:

matrix, pmatrix, bmatrix, Bmatrix, vmatrix, Vmatrix, smallmatrix, cases, aligned, gathered, split

5 Notes on the Code

For the user, there are currently two interfaces to the code. The first is index.php, which provides a textarea, converts its contents and redisplays the form, optionally also displaying the source code. The second is file conversion. At the moment, this only works on files in the ``doc'' directory below it (currently only this file). In fact, it is what is rendering this document.

For the hacker, there are more things that one needs to know. The bulk of the program is in ``latex.php''. This contains the routines to read tokens, expand tokens, and various other useful things. Primitives are stored in the directory ``primitives''. A primitive is a function, and its file is the body of that function. Packages are stored in the directory ``packages''.
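One way to realise ``the file is the body of that function'' in PHP is to include the file from inside a closure, so that the file's code runs in the closure's scope and its return statement becomes the primitive's result. This is a hypothetical sketch: loadPrimitive() and the variable names $parser and $args that the file would see are assumptions, not PHPLaTeX's actual conventions.

```php
<?php
// Hypothetical loader: each file DIR/NAME.php is the body of a primitive.
// Including it inside a closure lets the file see the closure's local
// variables ($parser, $args), and the file's `return` becomes the result.

function loadPrimitive(string $dir, string $name): Closure
{
    $path = $dir . DIRECTORY_SEPARATOR . $name . '.php';
    if (!is_file($path)) {
        throw new RuntimeException("No primitive file for '$name'");
    }
    return function (object $parser, array $args) use ($path) {
        return include $path; // the file body runs in this function's scope
    };
}
```

A file such as primitives/double.php containing `<?php return $args[0] * 2;` would then behave as a function of $parser and $args.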

6 ToDo

Fine tune ``getWidthOf'' algorithm.
Sort out resetting of subcounters.
Implement the variety of formatting styles of counters.
Implement the rest of iTeX, and amsmath.
Fully allow nesting of def/newcommands (i.e. parameter expansion).
Hyperlinks!
Primitives: \if, \ifx, \loop, \let
Scoping: commands and styles only take effect for the current scope.
Catcodes!
A range of stylesheets.
Validation and better error handling.
Object-orient stuff to make it more transparent.
Only load the SVG arrowheads once (could check for ability to cross-load?)

6.3 Commands I Wish I'd Already Defined Before Writing This Documentation

\backslash, \verb, verbatim environment.

7 License, Source Code, etc

This program is made available under the GPL. It does not have an official release yet, as it is only at the alpha stage, but you can get it via git from the GitHub repository.