Monday, September 13, 2010

Source code architecture, part II

The first part told the story of the requirements surrounding a packaging mechanism for code chunks. Alex left some helpful comments (on Buzz only, not here). Here, we expand upon what was said in the first part. This should establish a sufficient understanding upon which to build a solution.

Assets


Why talk about assets and code chunks instead of functions (procedures, class members, what have you)? The influence here comes entirely from literate programming, where one can name fairly arbitrary chunks of code and re-use them. But surely in a language that has been influenced by Haskell (amongst many others), functions would be sufficient? Maybe, if the polymorphism supported by the language is sufficiently rich. And that's the source of my hedging: at this point, I don't want to mix these two issues (organization and polymorphism). But it is an issue to keep in mind: in a strongly typed language, insufficient polymorphism and/or persistent abstractions lead to macros. [A 'persistent' abstraction is an abstraction mechanism over which the programmer has no control, i.e. one that the programmer cannot reliably tell the compiler to inline. I'll probably make a post on that in the future.] Macros are not evil per se, but they are a way of "giving up" on a language's intrinsic features as "not powerful enough". I would like to avoid that.

All this to say that, in theory, all assets should be 'like functions' where external dependencies, both input and output, should be visible and controllable. In the meantime, let's just go for "code chunks".
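To make "visible and controllable" a little more concrete, here is a minimal sketch in Python (chosen only for illustration; it is not MathScheme). The `Chunk` record, its `requires` and `provides` fields, and the `unmet` helper are all hypothetical names, not part of any existing design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A hypothetical code chunk whose external dependencies are explicit."""
    name: str
    requires: frozenset  # names the chunk expects from its surroundings (inputs)
    provides: frozenset  # names the chunk makes available to others (outputs)
    body: str            # the chunk's text, left uninterpreted here


def unmet(chunk, available):
    """Return the required names that the surroundings do not supply."""
    return chunk.requires - frozenset(available)


# Illustrative data only: a chunk that needs 'sqrt' and 'dot' to make sense.
norm = Chunk(
    name="norm",
    requires=frozenset({"sqrt", "dot"}),
    provides=frozenset({"norm"}),
    body="norm v = sqrt (dot v v)",
)
```

With this shape, asking `unmet(norm, {"sqrt"})` reports that `dot` is still missing, which is the kind of question one can always ask of a function but not of an arbitrary chunk of text.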

Naming


There was an implicit assumption that all our chunks are (somehow) named; this can be taken as an additional axiom. Names are not the only 'solution', but they sure are very convenient. For theoretical purposes, as well as ease of implementation, there are other solutions. The explicit design choice here is to favour human understandability above other measures. So names it is.

Another assumption is that names are expected to be locally meaningful, but not necessarily globally stable. In other words, it is implicitly assumed that we have a renaming mechanism. That is an easy assumption because, at the lower levels, we already do have a renaming mechanism, it is used a lot, and it is very handy. So we just extend it to the higher levels too.
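A rough sketch of what renaming at the chunk level could look like, again in Python and purely as an illustration. It assumes chunks refer to each other with noweb-style `<<name>>` markers, which is a convention borrowed from literate programming tools and not something fixed by this post; the `rename` function and the sample names are hypothetical.

```python
import re

# Assumption: references between chunks are written as <<name>>.
REF = re.compile(r"<<([^<>]+)>>")


def rename(chunks, mapping):
    """Return a new chunk table with names and references renamed consistently."""
    def new(name):
        return mapping.get(name, name)

    renamed = {}
    for name, body in chunks.items():
        target = new(name)
        if target in renamed:
            # Two chunks must not end up sharing a name.
            raise ValueError(f"renaming makes {target!r} ambiguous")
        # Rewrite every reference inside the body with the same mapping.
        renamed[target] = REF.sub(lambda m: "<<" + new(m.group(1)) + ">>", body)
    return renamed


chunks = {
    "main": "setup\n<<helper>>\nteardown",
    "helper": "do the work",
}
renamed = rename(chunks, {"helper": "group/helper"})
```

The original table is left untouched, so the old names stay meaningful locally while the renamed table can be used elsewhere.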

Assembly


Because we privilege human understanding (of source code), machine-ready code might require a fair bit of processing to 'build' it. Clearly such processing can be an important part of 'understanding' too. Let us see it another way: whether it is the denotational semantics (of the source code) or the operational semantics (of the assembly code) which is 'complicated', both will present a barrier to understandability. Ideally, the operational semantics of the assembly code would be a straightforward implementation of some kind of hierarchical denotational semantics for the source code.

But we already assume that we 'understand' the MathScheme language. And the language has built-in features dealing with syntax, as well as black boxes. So it is a rather simple leap to think that 'assembly' should be regarded as something which should be an internal rather than external entity. In other words, as a program, it should be written in the same language.

Context


Why do we name chunks at all? To be able to conveniently refer to them. Why would we want to do that? To be able to (re)use them. The simplest situation is a named function: we just want to call it. In the situation at hand, what we really want to do is to create a context, aka a set of definitions, within which the code we are currently reading/writing 'makes sense'.
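One way to read "context" operationally is as the set of definitions a chunk needs, directly or indirectly. The Python sketch below assumes each chunk lists the names it refers to; the `context` function, the `deps` table and the sample chunk names are all hypothetical and only meant to illustrate the idea.

```python
def context(chunks, deps, root):
    """Collect the definitions that `root` needs, directly or indirectly."""
    needed = {}
    pending = list(deps.get(root, ()))
    while pending:
        name = pending.pop()
        if name in needed:
            continue  # already collected; this also guards against cycles
        needed[name] = chunks[name]  # raises KeyError for an undefined name
        pending.extend(deps.get(name, ()))
    return needed


# Illustrative data only: bodies are elided, and 'lattice' is unrelated to 'ring'.
chunks = {"monoid": "...", "group": "...", "ring": "...", "lattice": "..."}
deps = {"ring": ["group"], "group": ["monoid"]}

ring_context = context(chunks, deps, "ring")
```

Here `ring_context` holds the definitions for `group` and `monoid` but not `lattice`: only what is needed for `ring` to 'make sense' is pulled in, and everything is looked up by name, not copied by hand.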

Ideally, everything would be done by reference, so as to minimize actual duplication. In practice, we sometimes do want duplication. But that will be considered a separate problem, though it needs to be 'part of' the design.
