Sunday, November 8, 2009

Concept: "file"

So let's examine a possible domain and consider the semantic information that is already part of any programmer's understanding of that domain. Let's take an "easy" one: the file. Not even the file system, just the file.

1. It has a name, a size, modification and creation dates, a type, and contents. The name is a string with a main part and an extension; the size is a number; the dates are dates; the type is something we can think of as a string or a selection from a database; and the contents are the main event.
2. Its contents can be text or binary. Text is a series of human-readable characters; binary is usually a bunch of packed structures.
3. It has specific sets of commands in different programming languages, like "open", "read", "close"; these are mnemonics for various actions we know we can take with files.
4. There are certain patterns used for file processing in different languages (Perl's while (<IN>) { ... }, for example).
5. There are command-line invocations we can use for doing things with files from the outside.
6. Files can be mailed, or attached.
7. Files can be documents. Documents can be managed.
8. A file is an analogy to the paper files that businesses keep in manila folders.
9. It's not a very good analogy.
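
To make items 1, 2, and 4 a little more concrete, here's a rough Python sketch of how that knowledge shows up in ordinary code. The function names (describe_file, looks_like_text, count_lines) are made up for illustration, and the text-versus-binary test is only a crude assumption (no NUL bytes in the first chunk), not a real definition of "text". Creation date is left out because what the operating system reports for it varies by platform.

```python
import mimetypes
from datetime import datetime
from pathlib import Path


def describe_file(path):
    """Collect the attributes from item 1 for a single file (hypothetical helper)."""
    p = Path(path)
    info = p.stat()
    # guess_type works from the name alone, much like a human reading the extension.
    guessed_type, _encoding = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "main_part": p.stem,       # the name without its extension
        "extension": p.suffix,     # includes the leading dot, e.g. ".txt"
        "size": info.st_size,      # size in bytes
        "modified": datetime.fromtimestamp(info.st_mtime),
        "type": guessed_type,      # None when the extension is not recognized
    }


def looks_like_text(path, sample_size=1024):
    """Crude heuristic (an assumption): a chunk with no NUL bytes is 'text'.

    UTF-16 text, for one, would be misclassified as binary.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    return b"\x00" not in sample


def count_lines(path):
    """Python's counterpart to the line-at-a-time pattern in item 4."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for _line in handle:
            count += 1
    return count
```

Even this small sketch leans on "open", "read", and "close" (the last one hidden inside the with block), which is rather the point: none of that vocabulary is defined anywhere in the code.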

So here are the "neighboring concepts" to "file": name, size, modification, creation, date, contents (container), string, number, parts, main, extension, type, table or database, "focus" (because the contents of a file are sort of the focus of the concept), text, binary, human-readable, human, reading, maybe even parsing, packed structure, structure, series or sequence, command, programming language, language, programming, code, open, read, modify, write, close, delete, action, pattern, "while (<IN>) {...}", command line, mail, mailing, attachments, document, document management, analogy, the business meaning of "file".

And that's just what I can think of off the top of my head. Each of the key words I used in that description is just as complex in its own right - the whole point of a semantic approach is that concepts don't break down into simpler concepts; they derive their meaning from the network of relationships with other concepts, each of which can be seen as a world in itself.
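
One way to picture that network is as a plain graph, where a concept is nothing more than a node and its links. The sketch below is only that, a sketch: the concepts are a small subset of the list above, and the particular links between them are my own illustrative guesses, not a claim about the "right" structure. The names LINKS, build_network, and within are hypothetical.

```python
from collections import deque

# A hand-picked subset of the neighboring concepts.
# The specific links are illustrative assumptions.
LINKS = [
    ("file", "name"),
    ("file", "size"),
    ("file", "contents"),
    ("name", "string"),
    ("name", "extension"),
    ("size", "number"),
    ("contents", "text"),
    ("contents", "binary"),
    ("text", "human-readable"),
    ("binary", "packed structure"),
]


def build_network(links):
    """Build an undirected adjacency map: concept -> set of related concepts."""
    network = {}
    for a, b in links:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def within(network, start, max_hops):
    """Breadth-first walk: map each concept within max_hops of start to its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in network.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances
```

Calling within(build_network(LINKS), "file", 1) gives only the immediate neighbors; raising max_hops pulls in more and more of the graph, and with the full list above there is no obvious hop count at which to stop.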

That means it's going to be difficult or even impossible to draw a line around a hypothetical semantic programming system, saying, "This is standard knowledge." Leaving anything out will lead to increased levels of Martian logic. So we'll be walking a tightrope.

The thing to do at this point is to consider the above set of semantic information, and to imagine (1) how it might be expressed in a specification language and (2) what categories of information might be expected to appear in a given unit's definition.

It's interesting to note that some of the concepts above are more or less specific to programming or computer systems ("packed structure"), while others are more general in nature ("focus") - that's going to be key, I think.
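
Purely as a thought experiment, here's one shape a unit's definition might take if each neighboring concept carried a category and a scope. Everything in it is an assumption made up for illustration: the Neighbor class, the category labels, and most of the scope tags. Only "packed structure" as programming-specific and "focus" as general come from the observation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    """One neighboring concept in a unit's definition (hypothetical structure)."""
    concept: str
    category: str  # assumed labels: "attribute", "content", "action", "idiom", "analogy"
    scope: str     # "programming" or "general"


# A small, hand-tagged slice of the "file" unit. The tags are guesses.
FILE_UNIT = [
    Neighbor("name", "attribute", "general"),
    Neighbor("size", "attribute", "general"),
    Neighbor("focus", "content", "general"),
    Neighbor("packed structure", "content", "programming"),
    Neighbor("open", "action", "programming"),
    Neighbor("while (<IN>) {...}", "idiom", "programming"),
    Neighbor("business file", "analogy", "general"),
]


def group_by(unit, key):
    """Group concept names by one of the Neighbor fields, e.g. "category" or "scope"."""
    groups = {}
    for neighbor in unit:
        groups.setdefault(getattr(neighbor, key), []).append(neighbor.concept)
    return groups
```

group_by(FILE_UNIT, "category") answers the "what categories appear" question for this one unit, and group_by(FILE_UNIT, "scope") splits it along the programming-versus-general line.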
