TypeScript Compiler : Documentation Output

Recently I have been developing my TypeScript UI project, which is hosted on CodePlex. CodePlex comes with a reasonable Documentation Wiki tab and so I have been trying to build documentation for all my classes, interfaces etc. both inline (i.e. in the code) and on the project site. However, manually converting JSDoc to Wiki Docs is slow, laborious and very hard to keep up-to-date. To add to this, my sister has agreed to translate much of the online documentation into German. This presents me with the issue of how to automate documentation generation so I can get on with coding, and how to guide her on what does and doesn’t need translating.

My solution to this problem is to modify the existing TypeScript compiler to add a “--documentation” option which will output documentation files for all the classes, interfaces and enums in a standard format (“.ts.wiki” files). I can then write a short C# program to parse these files and show my sister what needs translating.

This seems like it ought to be relatively simple, but it turns out to be horribly messy and tricky, mainly because of two big issues with the TS compiler:

  1. There is a total lack of comments/documentation on how the compiler works or what any of the names mean
  2. It’s not designed (so far as I can tell) with a proper post-processor

I have been tackling these issues and have been making reasonable progress, so in this article I will begin to explain where I’ve got to and where it is heading.

Where I began

I began by looking at the TypeScript compiler and considering which of its existing outputs would be closest to what I wanted. The compiler in its original form has two main outputs:

  1. The JS files
  2. Declaration files

The JS files are the compiled code and not exactly useful as documentation, especially since they generally don’t contain the comments and aren’t TypeScript code. I deduced then that whatever code produced the JS probably wasn’t going to help me in producing TS-based documentation with JSDoc descriptions included (which are, of course, comments). Declaration files, however, contain TypeScript output, with or without comments, in a standard format and without any of the actual code. So essentially, documentation, but laid out in a different way.

To proceed I knew I would need to add a new option to the compiler and a new file format. By looking at the file names and a few bits of the code I worked out that “emitters” are the things which use “walkers” to go down the symbol tree and emit the relevant output to a file. So I copied the declarationEmitter.ts file and refactored the copy until it was a “documentationEmitter”. Finally, by trawling the code I was able to find each line that refers to the declarationEmitter, duplicate it and change the copy to use the documentationEmitter, thus adding --documentation as a compiler option and .ts.wiki as an output format.

Hacking the declarationEmitter

The next stage (and my current stage) is hacking the declarationEmitter code till it becomes a documentationEmitter. A challenge with this is that I’m not outputting actual code, nor am I trying to output it to a single file, nor in the order that the code is in the script file (for instance I want to order function names alphabetically and separate functions, properties etc. into groups). This presents the post-processing issue. TS is designed to output as it parses, which works fine for a compiler like this, but not for documentation. Documentation needs to be written to file out of order (with respect to the code) and in the full light of all the relevant code around it. I have therefore come up with a workaround for the lack of an obvious (if any) post-processor.

The declarationEmitter class contains a “close” method which is supposed to close the current emitter’s file output stream. I am going to use the documentationEmitter’s “close” method as my post-processor kick to emit the documentation (and emit to multiple files). The rest of the emitter code will build a documentation-block tree as an intermediate step between symbol tree and documentation output. This means changing all the “emit” and callback methods so that instead of immediately writing to the output file, they are context aware and emit to the current documentation block (or create a new block where appropriate).
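To make the idea concrete, here is a minimal sketch of an emitter that buffers into a block tree and only writes files when close is called. Every name in it (Block, BufferingEmitter, enter, exit, emit, and the write callback passed to close) is made up for illustration and is not the compiler’s real emitter API; it only assumes that something calls enter/exit around each symbol.

```typescript
// Hypothetical sketch: none of these names come from the TypeScript compiler source.
interface Block {
    name: string;
    lines: string[];
    parent: Block | null;
    children: Block[];
}

class BufferingEmitter {
    private root: Block = { name: "root", lines: [], parent: null, children: [] };
    private current: Block = this.root;

    // Call when a module, class, function etc. starts (e.g. from a "pre" callback).
    enter(name: string): void {
        const block: Block = { name, lines: [], parent: this.current, children: [] };
        this.current.children.push(block);
        this.current = block;
    }

    // Call when that symbol ends (e.g. from the matching "post" callback).
    exit(): void {
        if (this.current.parent !== null) {
            this.current = this.current.parent;
        }
    }

    // Stands in for a direct write to the output file: text goes to the current block.
    emit(text: string): void {
        this.current.lines.push(text);
    }

    // Post-processing hook: nothing reaches `write` until the whole tree has been built.
    // One file is written per top-level block.
    close(write: (fileName: string, contents: string) => void): void {
        for (const block of this.root.children) {
            write(block.name + ".ts.wiki", this.flatten(block).join("\n"));
        }
    }

    // A block's own lines first, then each child's lines in turn.
    private flatten(block: Block): string[] {
        let out = block.lines.slice();
        for (const child of block.children) {
            out = out.concat(this.flatten(child));
        }
        return out;
    }
}
```

Because the write only happens in close, the output order is decided by flatten rather than by the order in which emit was called, which is the whole point of the workaround.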

A documentation block will consist of the text for that block, what type of block it is (e.g. class block, module block, function description block, etc.), the block signature (e.g. public, private, public static, private static), a reference to the block’s parent documentation block and an array of the child documentation blocks. This will allow me to construct a tree of documentation where the text is ready and just needs piecing together in a different order (e.g. class title, then which module (namespace) it belongs to).
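As a rough sketch of what such a block might look like, the following defines a block type with the fields listed above and one possible way of piecing a class together from it: title, owning module, then members grouped by kind and sorted by name. All names (DocBlock, BlockKind, addChild, renderClass) and the output layout are my own assumptions for illustration, not the actual .ts.wiki format.

```typescript
// Illustrative only: these names and the output layout are assumptions.
type BlockKind = "module" | "class" | "function" | "property";

interface DocBlock {
    kind: BlockKind;
    name: string;
    signature: string;        // e.g. "public", "private static"
    text: string;             // description text gathered from the comments
    parent: DocBlock | null;
    children: DocBlock[];
}

// Creates a block and links it into its parent's list of children.
function addChild(parent: DocBlock | null, kind: BlockKind, name: string,
                  signature: string, text: string): DocBlock {
    const block: DocBlock = { kind, name, signature, text, parent, children: [] };
    if (parent !== null) {
        parent.children.push(block);
    }
    return block;
}

// Renders a class block: title, then its module, then its description,
// then properties followed by functions, each group sorted alphabetically.
function renderClass(cls: DocBlock): string {
    const lines: string[] = ["Class " + cls.name];
    if (cls.parent !== null && cls.parent.kind === "module") {
        lines.push("Module: " + cls.parent.name);
    }
    lines.push(cls.text);
    const groups: BlockKind[] = ["property", "function"];
    for (const kind of groups) {
        const members = cls.children
            .filter(c => c.kind === kind)                     // filter copies, so sort leaves children untouched
            .sort((a, b) => a.name.localeCompare(b.name));
        for (const m of members) {
            lines.push(m.signature + " " + m.kind + " " + m.name + ": " + m.text);
        }
    }
    return lines.join("\n");
}
```

The parent reference is what lets the class title be followed by its module even though the module was seen first in the source, and the children array is what makes grouping and sorting possible after the fact.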

This re-coding should be simple, and conceptually it is, but in reality this is a laborious and tricky process. Some of the names of the emitter methods are obscure like “emitTypeNamesMember” (which so far as I can tell, emits the type information for a function, property, variable or something else e.g. number or { x: number; y:number }). It is not exactly clear for someone who doesn’t know what it does or what the exact contents of the symbols are. So at each stage I am left with the following steps:

  1. Hack it so it emits to the current documentation block but otherwise output remains the same
  2. Hack as much else of the code till it all compiles and produces some form of vague documentation-like output
  3. Deduce what incorrect documentation (either in content, format or both) came from where
  4. Work out what the hell it should have been, if it should have been there at all, and if appropriate, where in the documentation it is supposed to go.
  5. Go back and re-write the code to make it look right
  6. Repeat the above

This is not the nicest way to develop: it gives me no solid idea of how far I have got or how much work is left, and it involves a lot of guessing (not least because I have to mangle 1259 lines of code before it even compiles!).

Advice for others

If you want to do this sort of thing, good luck. It is difficult to get your head around and definitely time-consuming (unless you happen to know your way around a compiler so well that nothing is new to you!) Here’s some information that may help you:

  • Don’t get hung up in TypeScript Services or the Harness – they don’t really help you if you are trying to extend the compiler functionality.
    • Services and Harness are (so far as I can understand) wrappers for:
      • Supporting Node.js, Windows Script Host and web browser environments
      • Diagnostic services/information
  • TypeScript.ts contains the overriding control logic but doesn’t do any parsing etc. in itself – add methods here and link them up for things like calls to new emitters
  • AST – Abstract Syntax Tree (what I have been calling the symbol tree) – This is the breakdown of the TypeScript into symbols going down from Script (i.e. file level) through modules, classes, functions all the way to variables and their type specifiers along with comments.
    • You do not necessarily have to handle every possible type of symbol – just the ones you are interested in – you can add a general catch-all (that does nothing) for the rest
  • ASTWalker – A “walker” literally walks you down the symbol tree, symbol by symbol and you can request certain information about symbols as you go (e.g. directly related comments, symbol type, symbol name)
    • The walker has two callbacks, pre and post, pre happens just before it “walks over” the symbol, post happens just after it walks over the symbol
    • Pre and Post must return booleans:
      • returning false for pre (I think) makes the walker skip processing the symbol and its children (and you don’t get a post call)
    • Pre and Post pass you a symbol representation object which gives you everything you need about that symbol (though naming is obscure and beware not all properties are always there e.g. ASISymbol can be null)
    • Use GetASTWalkerFactory().walk(pre, post) to start walking down a tree – you can often use the same method for the pre/post callbacks with an extra parameter – see DeclarationEmitter.emitDocumentation for an example
  • Emitters – These use a walker to walk the symbol tree and handle the symbols they are interested in (e.g. the declarations emitter only handles public, exported or declared symbols such as exported modules and classes, but not private variables or code within functions)
    • Emitters output to a particular document (file) specified when they are created (but you can create other new files within the emitter)
    • Emitters can be told to output to a single file but you may wish to ignore this
    • Emitters generally contain callbacks for each symbol type that process the symbol and then pass the essential information to emitter methods
    • Emitter methods actually write to the doc file
  • IOHost – This is global to the compiler to standardise IO to files (to make it work across Node.js, WSH, web-browser)
    • Because this is global you can use it from anywhere so you can use it to create files (there is only one instance per compiler (program) instance)
  • Process – Again this is global and has some very helpful methods for giving debug trace
    • process.stdout and process.stderr are accessible from anywhere – use .write (with “\r\n” for new lines) to emit debug info e.g.
      • “declarationEmitter.ts : Line 59 : Constructor called\r\n”
      • This is a good standard output that lets you trace back to the TS source line easily (don’t forget the \r\n or everything ends up on one line!!)
  • If you are looking for a particular bit of symbol information, think what it is compiled to, find where it gets compiled in the main compiler code, and copy and paste what is there! It is the fastest way to work out how to get certain information. Also, look at what type of PullDecl is used – it affects what information is visible/accessible.
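To illustrate the pre/post callback pattern described in the walker notes above, here is a small self-contained sketch over a toy tree. ToyNode, Callback, walk and traceLine are all invented for this example; the compiler’s own AST and walker types are different, and the rule that a false return from pre skips the children and the post call is only the behaviour I guessed at above, not something I have confirmed.

```typescript
// A toy tree and walker for illustration; the real compiler's types differ.
interface ToyNode {
    kind: string;
    name: string;
    children: ToyNode[];
}

type Callback = (node: ToyNode, depth: number) => boolean;

// pre runs before a node's children are visited, post runs after them.
// If pre returns false, the node's children and its post call are skipped.
// The return value of post is ignored in this sketch.
function walk(node: ToyNode, pre: Callback, post: Callback, depth: number = 0): void {
    if (!pre(node, depth)) {
        return;
    }
    for (const child of node.children) {
        walk(child, pre, post, depth + 1);
    }
    post(node, depth);
}

// Builds a debug line in the "file : Line N : message" style, ending in \r\n.
function traceLine(file: string, line: number, message: string): string {
    return file + " : Line " + line + " : " + message + "\r\n";
}
```

Under Node.js the result of traceLine can be passed straight to process.stdout.write, and a pre callback that returns false for kinds it does not care about acts as the do-nothing catch-all mentioned above.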

I hope this article helps someone with their attempts at hacking the TS compiler and I will hopefully be submitting my code to the TS CodePlex project at some stage in the future (if not, I’ll at least post the code online for others to use so check back here for updates or follow me on Twitter!).

2 responses to “TypeScript Compiler : Documentation Output”

  1. Pingback: TypeScript Documentation Generation – Fork & Pull Request | Edward Nutting

  2. Thanks so much for this post! It’s very helpful as I’m just beginning some modifications to TypeScript and documentation of the compiler is indeed lacking – plus I don’t have much experience with compilers to begin with. Thanks again.
