GSF Embedding

GSF supports language embedding - supporting nested languages, like JavaScript and CSS support inside HTML files, Ruby support (and JavaScript and CSS support) inside ERb/RHTML files, and so on.

At the lexical level, the Lexer API already supports embedding; you can nest languages to an arbitrarily deep level. GSF adds this to other features based on parsing. This is achieved through two mechanisms:

Language plugins can register EmbeddingModel implementations which describe how to generate a "virtual source" for a given destination language from a given source language. For example, there is an embedding model which expresses the JavaScript virtual source corresponding to a given JSP file. It's this virtual source that is parsed by the JavaScript parser.
All GSF language plugins need to be well aware that the source they parse may not be the same as the document in the editor, and in particular, they must ALWAYS translate offsets from AST offsets to Lexical offsets before attempting to look at the document contents, and vice versa. This is made easy because the embedding model provides conversion methods which take one offset type and provide the other.

This document explains the concept of virtual sources.

Say you have this source RHTML file.

    
<p>
<script type="text/javascript" src="javascripts/effects.js"></script>
<style>body { background: red }</style>
<script>var x = <% foo() %>;</script>
<span>Hello</span>
<script>
function f() { alert(x.toString()); }
</script>
<input type="submit" onclick="f()"/>

In GSF, you will involve several source models here. It will create a separate "virtual source" for each of these languages: CSS, JavaScript, Ruby.

The JavaScript virtual source (which is passed to the JavaScript parser) looks something like this:

    
__netbeans_import__('javascripts/effects.js');
var x = __UNDEFINED__;
function f() { alert(x.toString()); }
;function(){
f()
}

The virtual source translator from RHTML to JavaScript replaced the server call <% foo %> with __UNDEFINED, and the onclick handlers have been translated into anonymous functions (such that writing "return this" etc. in an event handler makes sense from an ast perspective, you can only return inside functions).

Similarly, the Ruby embedding model (RHTML to Ruby) will create this:

    
class ActionView::Base
_buf='';;_buf << '<p>
<script type="text/javascript" src="javascripts/effects.js"></script>
<style>body { background: red }</style>
<script>var x = ';
foo() ;_buf << ';</script>
<span>Hello</span>
<script>
function f() { alert(x.toString()); }
</script>
<input type="submit" onclick="f()"/>
';
end

In both cases, some of the code is coming from the original source, some is generated (e.g. the class ActionView::Base part above, and the netbeans_import to record JavaScript file inclusion and functions to contain event handlers). Some source is removed - typically embedded source that isn't relevant for this language -- such as all the markup.

These blocks of code are maintained by each embedding model, which has a set of code blocks with corresponding offsets in the generated source and in the original source. You get a map like this:

    
codeBlocks=
CodeBlockData[
 RHTML(0,0)="",
 RUBY(0,31)="class ActionView::Base
_buf='';"], 

CodeBlockData[
 RHTML(0,130)="<p>
<script type="text/javascript" src="javascripts/effects.js"></script>
<style>body { background: red }</style>
<script>var x = ",
 RUBY(31,174)=";_buf << '<p>
<script type="text/javascript" src="javascripts/effects.js"></script>
<style>body { background: red }</style>
<script>var x = ';
"], 

CodeBlockData[
 RHTML(132,139)=" foo() ",
 RUBY(174,181)=" foo() "], 
 
CodeBlockData[
 RHTML(141,266)=";</script>
<span>Hello</span>
<script>
function f() { alert(x.toString()); }
</script>
<input type="submit" onclick="f()"/>

",
 RUBY(181,319)=";_buf << ';</script>
<span>Hello</span>
<script>
function f() { alert(x.toString()); }
</script>
<input type="submit" onclick="f()"/>

';
"]

In GSF, a virtual source translator can return a collection of models which is used when you don't create a single "synthesized" model like we do here - e.g. the multiple <script> blocks in JavaScript get merged into one. For CSS, the theory is that each style="foo:bar" block would generate its own model - these are all completely independent and don't have to (and shouldn't) be parsed together. Thus, for CSS you get a series of models, but there is a special method which returns the specific virtual source corresponding to a given offset.

Tor Norbye <tor@netbeans.org>