File: THOUGHTS

-*- fill-prefix: "    " -*-

  1 [Thu Jan 11 15:10:23 1996] Created.

  2 [Thu Jan 11 15:10:43 1996] A little something on (implicit) up and
    down casting, together with the problem of casting through mutability.
    The latter is the problem of being able to address an Array as a
    MutableArray, if the latter plainly is a subclass of the former and
    downcasting is allowed.  Obviously, upcasting is always allowed: if
    Foo is a Bar, then one can pass a Foo where a Bar is expected.

    A possible solution for the mutable subclass problem seems to be to
    extend the language to be able to have two classes: `Array' and
    `mutable Array', and to not allow downcasting through a mutable
    qualification.  However, it is unclear what kind of implications this
    will have.

    Another possibility is the following: Up to now, I always thought that
    casting an A to a B was allowable if they had a common subclass, since
    then it is possible for something assumed to be an A to actually be a
    C which also happens to be a subclass of B.  However, this can only be
    true if not both A and B are state instances.  Thus, either A is a
    state instance, or B is, but not both.  I guess that the proper use of
    behaviour added to state instances could solve the mutable subclass
    problem, though I still have to work this out.

  3 [Thu Jan 11 17:51:45 1996] There exists public, protected and private
    at the level of classes.  Is it desirable for such hiding to exist at
    the level of modules/units/packages/insert-your-favourite-name?

  4 [Thu Jan 11 22:50:08 1996] One moment to stop and write about units,
    now I have written a bit of them, and the parser.  A unit file
    declares the classes and extensions present in the unit.  The compiler
    resolves qualified class names using the unit definition files.  The
    resolver will use them to know what to collect from

  5 [Fri Jan 12 13:04:43 1996] Nice abstraction, that Smalltalk `Process'
    class.

  6 [Wed Jan 17 14:50:19 1996] When having built a system to include
    condition checking, the post conditions of methods not catching an
    exception must also be checked.  This probably means that including
    condition checking requires recompilation (at least of those parts
    whose conditions are to be checked?).

  7 [Wed Jan 17 16:36:23 1996] Institutionalize uniqued strings, which
    actually employ a subclass of the normal (non-mutable) strings, and
    which override `hash'.  Hashing (strings) repeatedly while searching
    through several dictionaries (or sets) is a waste of time.
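
    A minimal C sketch of the idea, not the TOM runtime itself: the
    names `ustring', `ustring_unique' and `ustring_hash' and the bucket
    count are assumptions made for illustration.  The hash is computed
    once, when the string is uniqued, and the overridden `hash' merely
    reads it back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a uniqued string carries its hash, computed once. */
struct ustring {
    unsigned long hash;      /* cached at creation, never recomputed */
    struct ustring *next;    /* chain within one bucket of the unique table */
    char chars[];            /* the immutable contents */
};

#define UNIQUE_BUCKETS 64    /* arbitrary size chosen for the sketch */
static struct ustring *unique_table[UNIQUE_BUCKETS];

/* Plain string hash (multiply by 33 and add); any hash would do here. */
static unsigned long hash_chars(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the one shared instance for these contents, creating it if needed. */
struct ustring *ustring_unique(const char *s)
{
    unsigned long h = hash_chars(s);
    struct ustring **bucket = &unique_table[h % UNIQUE_BUCKETS];
    struct ustring *u;

    for (u = *bucket; u; u = u->next)
        if (u->hash == h && strcmp(u->chars, s) == 0)
            return u;

    u = malloc(sizeof *u + strlen(s) + 1);
    assert(u != NULL);
    u->hash = h;
    strcpy(u->chars, s);
    u->next = *bucket;
    *bucket = u;
    return u;
}

/* The overridden `hash': a field read instead of a pass over the characters. */
unsigned long ustring_hash(const struct ustring *u)
{
    return u->hash;
}
```

    Since equal contents always yield the same instance, a dictionary
    keyed on uniqued strings can also compare keys by pointer.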

  8 [Sat Jan 20 19:42:12 1996] [The results of Michael and Tiggr visiting
    the Amsterdam Arena.]  Multiple state inheritance, as implemented in
    C++, has the disadvantage that for `FooBar', inheriting from `Foo' and
    `Bar', the value of `self' (or `this' in C++ speak) is different in
    methods implemented by `Foo' and those implemented by `Bar'.  In fact,
    passing a `FooBar' where a `Bar' is expected may not pass a pointer to
    the `FooBar' but to the `Bar' part of it.  In short, if I have a
    `FooBar', pass it as a `Bar' and later retrieve it (thus as a `Bar'),
    I won't be able to handle it as a `FooBar', even though it is, or
    better, was.

    Thus, in short, the problem is that the value of `self' is not
    identical in all methods which can ever be invoked for a given
    instance of a class having multiple state superclasses.
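
    The effect can be imitated in plain C.  The sketch below assumes
    the layout a C++ compiler typically chooses for `FooBar : Foo, Bar'
    (the `Foo' part first, then the `Bar' part); the struct and function
    names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Foo { int foo_ivar; };
struct Bar { int bar_ivar; };

/* Assumed layout of a class inheriting state from both Foo and Bar. */
struct FooBar {
    struct Foo foo_part;
    struct Bar bar_part;
    int foobar_ivar;
};

/* Upcast to Foo: the first base shares the address of the whole object. */
struct Foo *foobar_as_foo(struct FooBar *fb)
{
    assert(fb != NULL);
    return &fb->foo_part;
}

/* Upcast to Bar: the pointer moves to the embedded Bar part, so methods
   of Bar see a different `self' than methods of Foo or FooBar. */
struct Bar *foobar_as_bar(struct FooBar *fb)
{
    assert(fb != NULL);
    return &fb->bar_part;
}
```

    Comparing the result of `foobar_as_bar' with the original pointer
    shows the shift: the two differ by the offset of `bar_part'.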

    A possible solution to this problem is to access instance variables
    indirectly, much like pointers to method implementations are retrieved
    indirectly.  The latter are keyed on the selector of the method being
    invoked.  Instance variables can be related to a unique value, which
    can then be used to retrieve the offset from `self' to the desired
    instance variable.  This value is called the `ivar index value' (iiv).
    Of course, for every class, all instance variables, including those
    inherited, must have a unique index value.  In the context of dynamic
    loading, which can introduce new classes, this implies that all
    instance variables of all classes must have a unique index value and
    also that the index value can not be precomputed for dynamically
    loaded code.  In fact, libraries, which are not built in the context
    of the resulting application executable, also can not use precomputed
    iiv's.  If this doesn't seem obvious, consider two libraries, each
    having been built in an empty context, which are then used in the same
    application.
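
    A sketch in C of such an indirect access, under assumptions: the
    structures `class_info' and `object', the function `ivar_address'
    and the variables `iiv_foo_count' and `iiv_bar_name' are
    hypothetical names, and -1 is an arbitrary marker for `no such
    ivar'.  The iiv's are variables rather than constants because they
    are not known until all classes of the program are.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime structures; names are illustrative only. */
struct class_info {
    const int *iit;     /* ivar index table: iit[iiv] = offset from self, or -1 */
    int iit_len;
};

struct object {
    const struct class_info *isa;
    /* instance variables follow in the concrete object */
};

/* One program-wide index per ivar.  Not precomputed: the values are
   filled in once the whole program is known. */
int iiv_foo_count;
int iiv_bar_name;

/* Find an ivar through the table of the object's class; `self' itself
   is never adjusted. */
void *ivar_address(struct object *self, int iiv)
{
    const struct class_info *cls = self->isa;
    assert(iiv >= 0 && iiv < cls->iit_len);
    assert(cls->iit[iiv] >= 0);     /* the class must actually have the ivar */
    return (char *)self + cls->iit[iiv];
}
```

    A method of `Bar' would then read its ivar as
    `*(const char **) ivar_address (self, iiv_bar_name)', whatever the
    actual class of `self' is.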

    The multiple inheritance semantics resulting from this mechanism for
    instance variable access imply that repeated inheritance is always
    shared inheritance.  This seems not illogical, since copied
    inheritance raises the question whether the distinction between the
    `is-a' and `part-of' hierarchies has been garbled.

    Another observation, resulting from having an identical `self' in all
    methods invokable for a given object, is that it implies maximum code
    reuse, as opposed to the source reuse of Eiffel, where inheritance
    sharing can be defined for each distinct instance variable, but only
    if the long form of the superclasses is available (the short form does
    not suffice for this).

    A (raw, imprecise, header file based, not showing `hidden' classes)
    count of instance variables in Objective-C class libraries shows +/-
    150 for <foundation/foundation.h>, +/- 800 for <appkit/appkit.h>,
    +/- 300 for <dbkit/dbkit.h>, +/- 200 for <eoaccess/eoaccess.h>, and
    +/- 200 for <eointerface/eointerface.h>.

    This implies that, for a normal EOF application, which uses appkit,
    foundation, eoaccess, and eointerface, there are more than 1000
    instance variables.  Clearly, a straight table for +/- 200 classes in
    that application would induce too high a memory penalty: 200 * 1000 *
    4 = 800k of memory.

    Fortunately, the instance variable index tables are very sparse, and
    only those locations containing information are actually being used.
    This means that tables can be shared as long as they do not contain
    conflicting information.  Such conflicts arise because of multiple
    inheritance: The tables for `Foo' and `Bar' can not be merged and used
    for `FooBar', since the offsets for the ivars from one of the classes
    will have shifted according to the size of the other.  Observe that in
    a single state inheritance lattice, a single table can be used.
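
    The sharing test can be sketched in C as follows.  The marker
    `IIT_EMPTY' and the functions `iit_compatible' and `iit_merge' are
    assumptions for illustration, as is representing a table as a plain
    array of offsets.

```c
#include <assert.h>

#define IIT_EMPTY (-1)   /* slot holds no information for this class */

/* Return 1 if tables a and b agree wherever both hold information. */
int iit_compatible(const int *a, const int *b, int len)
{
    int i;
    for (i = 0; i < len; i++)
        if (a[i] != IIT_EMPTY && b[i] != IIT_EMPTY && a[i] != b[i])
            return 0;
    return 1;
}

/* Overlay b onto a, so that one table can serve both classes.
   Only valid when the two tables are compatible. */
void iit_merge(int *a, const int *b, int len)
{
    int i;
    assert(iit_compatible(a, b, len));
    for (i = 0; i < len; i++)
        if (a[i] == IIT_EMPTY)
            a[i] = b[i];
}
```

    With one ivar each at offset 4, `Foo' ({4, empty}) and `Bar'
    ({empty, 4}) can share the table {4, 4}.  `FooBar' needs {4, 8},
    since the ivar of `Bar' has shifted, and that conflicts with the
    shared table.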

  9 [Sun Jan 21 14:38:18 1996] A quick test (see test-iit.c) on the m68k
    shows that, with SELF on the stack, a direct ivar read reference is
    two (memory referencing) instructions.  A reference through the ivar
    index table (iit) using a precomputed iiv is two instructions extra
    (both memory referencing, though GCC 2.7.0 throws in an extra, and
    unnecessary, register-to-register transfer); in case the iiv is not
    precomputed another extra reference is needed.  Of course, these
    numbers change with register allocation and CSE activity.

 10 [Sun Jan 21 21:50:35 1996] Running a little test on C++, it shows that
    a direct ivar reference is equally expensive (obviously), being 2
    instructions.  When accessing a member of a virtual superclass, one
    extra instruction is needed.  This is cheaper than using a precomputed
    iiv, but at the expense of the virtual superclass table pointer in
    each instance.

    In fact, given that virtual inheritance or virtual methods are not
    that uncommon, the following statement can be assumed to be generic: a
    class inheriting from multiple superclasses not only needs a (pointer
    to) a virtual table itself, but also inherits a (pointer to) the
    virtual table from each of its superclasses.  This implies that, for a
    class A: B, C, each of which needs a virtual table, 12 bytes per
    instance (three 4-byte pointers) are needed for table maintenance.

 11 [Sun Jan 21 22:20:58 1996] Obviously, the size of the tom ivar index
    tables can be greatly reduced by providing only one entry per class
    introducing new instance variables.  However, this implies that, for
    all but the first ivar of each class, the offset must be computed by
    adding the `class ivar offset' retrieved from the iit and the offset
    of the desired ivar, which is known at compile time.  Thus, it is
    questionable whether the extra add instruction is desirable and worth
    it for saving some memory in the iit's.
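
    A C sketch of this variant, with hypothetical names throughout
    (`class_offsets', `bar_ivars', `ivar_address'): the table holds one
    offset per class introducing ivars, and the offset of the ivar
    within that block is a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header: one table entry per class that
   introduces instance variables. */
struct object {
    const int *class_offsets;   /* class_offsets[c] = start of class c's ivars */
    int n_classes;
};

/* The ivars introduced by a hypothetical class Bar, laid out as a block. */
struct bar_ivars {
    int width;
    int height;
};

/* Run-time part: look up where the class block starts.
   Compile-time part: ivar_offset, a constant such as
   offsetof(struct bar_ivars, height).  The second `+' is the extra add. */
void *ivar_address(struct object *self, int class_index, size_t ivar_offset)
{
    assert(class_index >= 0 && class_index < self->n_classes);
    return (char *)self + self->class_offsets[class_index] + ivar_offset;
}
```

    For the first ivar of a class the constant is zero, so only the
    other ivars pay for the addition.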

    Actually, if the iit holds an entry per ivar (not per class) and if
    we're not using precomputed iiv's, recompilation of all existing
    methods for a class is not necessary if instance variables are added,
    since the global variables already holding, at run time, the iit's
    will not have changed.  Of course, resolving and linking must still be
    redone.

 12 [Tue Jan 23 17:14:36 1996, tiggr@tom] Another thing about dynamic ivar
    binding: Currently, I expect that all stuff needed by the runtime will
    be created by the resolver in one big C file, and that all selectors
    and iit's will be close to their own species and ordered by class.
    This implies such a locality of reference that it will increase the
    CPU's cache efficiency, compared to the case where all stuff needed
    would be put in the C file resulting from compiling the tom file in
    which the selector or ivar was defined.

 13 [Wed Apr 24 01:35:23 1996, tiggr@tricky.es.ele.tue.nl] Instance-less
    classes can be useful.

 14 [Wed Apr 24 01:35:34 1996, tiggr@tricky.es.ele.tue.nl] Optional parts
    in method names can be useful.  The idea is that every argument can be
    assigned a default value.  This value will be used (at compile time!)
    to complement the argument provided by the caller.  Thus, actual
    shorter selectors will not be used (this raises the question what to
    do with subclasses defining methods with more optional parts).

 15 [Wed Apr 24 01:55:04 1996, tiggr@tricky.es.ele.tue.nl] Names must be
    fully scopable.  Thus, just like class names can be scoped by the
    unit, any element defined for a class must be scopable by the class
    name.  Clashes between instance and class elements become an error.

 16 [Sun Apr 28 00:26:57 1996, tiggr@tricky.es.ele.tue.nl] Suppose you
    have a FileStream---an open file abstraction.  It is a subclass of the
    fundamental stream abstraction, BasicStream.  However, the BasicStream
    has many implementations on different platforms: BasicUnixStream is
    different from BasicVMSStream.  Still, you always want FileStream to
    be a subclass of the actual implementation of the BasicStream.  Posing
    (by the actual implementing class for this environment) solves this
    problem (that FileStream never needs to inherit from a stream for a
    specific environment, but can inherit the generic BasicStream), but
    the question is if it suffices?  Another question is what problem I'm
    trying to solve, and whether every language extension made to solve
    problems like this isn't just an implementation of a particular use
    of cpp...

 17 [Mon Apr 29 23:35:35 1996, tiggr@tricky.es.ele.tue.nl] What if the two
    implicit arguments SELF and CMD are put in a tuple to be a single
    first implicit argument typed `(id, selector)'?

 18 [Tue Jan 28 22:20:37 1997, tiggr@tricky.es.ele.tue.nl] Using units as
    the unit of linkage does not suffice.  For example, when DCE is available
    on HP-UX, the Thread class in the tom unit will provide an abstraction
    to the threads provided by DCE, and every TOM program will need to be
    linked against libdce.  As a result, all kinds of library and system
    calls will come with a significant penalty due to the locking and
    unlocking of critical regions (even when running single-threaded, but
    that shows only part of libdce's braindeadness).

    Units should provide only a namespace.  The unit of linkage should be
    the class or extension, commonly packed in an object file.  Should
    this be implemented, a mechanism is needed to indicate which classes
    and extensions are needed by some class or extension.  Obviously,
    using a class implicitly needs the class' superclasses and invoking a
    method which is implemented by an extension marks that extension as
    needed.

    For example, in the current setup of the TOM unit, the Thread class is
    needed when the method `performInThread' provided by the All instance
    is used.  This suggests that this method should be provided by an
    optional extension of All which is depended upon by the Thread class,
    as the means of starting new threads, and which itself depends on the
    Thread class to provide its functionality.  Furthermore, the Thread
    class should have linkage information that makes libdce linked in when
    the Thread class is.
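
    One way such a mechanism could work is sketched below in C.  The
    `link_item' structure, the `library' field and `mark_needed' are
    assumptions for illustration, not part of the current resolver.
    Every class or extension lists what it needs, and the set to link
    is the transitive closure starting from what the program uses.

```c
#include <assert.h>

#define MAX_DEPS 4

/* Hypothetical link item: a class or an extension in an object file. */
struct link_item {
    const char *name;
    int deps[MAX_DEPS];    /* indices of needed items, terminated by -1 */
    const char *library;   /* external library to link along, or a null pointer */
};

/* Mark `item' and everything it transitively needs. */
void mark_needed(const struct link_item *items, int n_items, int item,
                 int *needed)
{
    int i;
    assert(item >= 0 && item < n_items);
    if (needed[item])
        return;            /* already visited; this also breaks cycles */
    needed[item] = 1;
    for (i = 0; i < MAX_DEPS && items[item].deps[i] >= 0; i++)
        mark_needed(items, n_items, items[item].deps[i], needed);
}
```

    With `All' depending on nothing, and the extension of `All' and the
    Thread class depending on each other, a program that only uses
    `All' marks neither the extension nor Thread, and so never pulls
    in libdce.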