idlcomp/THOUGHTS

   1 5/23/04:
   2
   3 What to do about replacing symbols in later passes?  A complete
   4 replacement will break existing pointers, and a symbolic reference
   5 would require substantial code changes.  I'll probably produce output
   6 after each pass and wipe the symbol table; this is less efficient,
   7 but similar to how the code was originally intended to work.
   8
   9 How to determine if a pass 2 is needed if the symbol lookup succeeded
  10 due to stale external data that would be overwritten?  Perhaps
  11 eliminate the --overwrite option; then it wouldn't be possible,
  12 but it could be annoying to have to wipe everything before each
  13 compilation.  It would pretty much enforce one batch of source IDLs
  14 per output directory, though that's not necessarily a bad thing.
  15
  16 In fact, I think I'll just make --overwrite act on the entire output
  17 directory; if it's not set, you get an error if there's *anything* in
  18 there.
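A minimal sketch of that directory-wide --overwrite behaviour, assuming
C++17 std::filesystem is available.  The function name
prepare_output_dir() and the error strings are made up for illustration;
they are not taken from idlc.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Hypothetical helper: get the output directory ready for a compilation.
// Without `overwrite`, *anything* already in the directory is an error.
// With `overwrite`, the whole directory is wiped before output starts.
void prepare_output_dir(const fs::path &dir, bool overwrite)
{
    if (fs::exists(dir)) {
        if (!fs::is_directory(dir))
            throw std::runtime_error(dir.string() + " is not a directory");

        if (!fs::is_empty(dir)) {
            if (!overwrite)
                throw std::runtime_error(dir.string() +
                                         " is not empty; use --overwrite");
            fs::remove_all(dir);  // removes the directory and its contents
        }
    }

    fs::create_directories(dir);  // no-op if it already exists
}
```

Wiping everything up front means a symbol lookup can never succeed
because of stale output from an earlier run.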
  19
  20 Now, what happens when you get around to declaring a symbol in a
  21 later pass, which has been loaded from the fs already?  If the only
  22 thing the linkage is used for is to generate a name in the output,
  23 then just replace it and let the old version stay refcounted.  Is
  24 there enough in the pass1 output for error checking, other than
  25 constant ranges?  I think so.
  26
  27 When will circular inheritance checks be done?  That'll require the
  28 ability to compare references, meaning we can't just go replacing
  29 things.  Other than that, I think it can be done at the end of pass
  30 2.  So instead of replacing, I'll just add information to existing
  31 objects (which means I get to go fix all the places where that sort
  32 of work is done in the constructor).
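A rough sketch of such an end-of-pass-2 check, to show why stable
object identity matters: the walk compares pointers, so it only works
if symbols are never replaced.  The Interface struct and both function
names are hypothetical, not the real idlc types.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an interface symbol.
struct Interface {
    std::string name;
    std::vector<const Interface *> supers;  // direct superinterfaces
};

// Depth-first walk up the inheritance graph; true if `target` is
// reachable from `from` through one or more superinterface links.
static bool reaches(const Interface *from, const Interface *target,
                    std::set<const Interface *> &visited)
{
    for (const Interface *super : from->supers) {
        if (super == target)  // identity comparison, not name comparison
            return true;
        // Only descend into nodes not seen before, so unrelated cycles
        // elsewhere in the graph cannot cause infinite recursion.
        if (visited.insert(super).second && reaches(super, target, visited))
            return true;
    }
    return false;
}

// True if `iface` (directly or indirectly) inherits from itself.
bool has_circular_inheritance(const Interface &iface)
{
    std::set<const Interface *> visited;
    return reaches(&iface, &iface, visited);
}
```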
  33
  34 5/25/04:
  35
  36 In conclusion on the above, no replacement is done for future passes.
  37 Constructor calls have been replaced with calls to a static declare()
  38 function, which will either call a constructor or return the existing
  39 one (or complain if there's a real conflict (i.e. within this pass or
  40 with an external symbol)), as well as possibly initialize some data.
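A minimal sketch of the declare() idea described above.  Symbol, its
fields, and the Table typedef are assumptions made up for this example;
the real idlc classes carry far more state.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical symbol type, reduced to what the example needs.
struct Symbol {
    std::string name;
    bool external;      // loaded from a previous compilation's output
    int declared_pass;  // pass in which source last declared it

    using Table = std::map<std::string, std::shared_ptr<Symbol>>;

    // Declare `name` during `pass`.  A new symbol is constructed only if
    // none exists; otherwise the existing object is reused, so pointers
    // handed out in earlier passes stay valid.
    static std::shared_ptr<Symbol> declare(Table &table,
                                           const std::string &name, int pass)
    {
        auto it = table.find(name);
        if (it == table.end()) {
            auto sym = std::make_shared<Symbol>(Symbol{name, false, pass});
            table[name] = sym;
            return sym;
        }

        Symbol &old = *it->second;
        // A real conflict: redeclared within this pass, or clashing with
        // a symbol that came from outside this compilation.
        if (old.external || old.declared_pass == pass)
            throw std::runtime_error("conflicting declaration of " + name);

        old.declared_pass = pass;  // add information to the existing object
        return it->second;
    }
};
```

Declaring "Foo" in pass 1 and again in pass 2 returns the same object
both times; declaring it twice in the same pass throws.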
  41
  42 Still to do:
  43
  44 Implement pass 3
  45 Finish implementing output (including sanity check for incomplete
  46 data).
  47 Nuke --overwrite, and complain if anything is in the target
  48 directory.
  49
  50 8/1/04:
  51
  52 Most vtable-related design is done.  The GUID pointers will have to
  53 be dynamically generated somehow (either by the dynamic linker or by
  54 startup code), to make sure the same pointer is used in all
  55 components.
  56
  57 The compiled type representation will break badly on a
  58 case-insensitive filesystem.  This is already seen in the current IDL
  59 files.  Some sort of alternate mapping will be needed.  Also, it
  60 looks like the performance of output generation will be fairly poor
  61 under UFS on OS X; even with the current set of IDL it takes 1/4
  62 second to generate all output.  Not that it was expected to perform
  63 well on legacy filesystems, but still, I wonder how long it will take
  64 on the full set of IDL...
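One possible alternate mapping, purely as an illustration: escape each
uppercase letter so the on-disk name contains no uppercase at all.  The
scheme and the name fs_safe_name() are assumptions, not something idlc
is known to do, and the sketch handles ASCII only.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical escaping scheme:
//   'X' (uppercase) -> "_x"
//   '_'             -> "__"
//   anything else   -> unchanged
// Every '_' in the output starts a two-character escape, so the mapping
// is reversible, and since the output has no uppercase letters, two
// names that differ only by case can no longer collide.
std::string fs_safe_name(const std::string &name)
{
    std::string out;
    for (unsigned char c : name) {
        if (c == '_') {
            out += "__";
        } else if (std::isupper(c)) {
            out += '_';
            out += static_cast<char>(std::tolower(c));
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With this scheme "FileStream" becomes "_file_stream", while
"filestream" stays "filestream".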
  65
  66 9/21/04:
  67
  68 Enum and bitfield inheritance may be useful...
  69
  70 9/22/04:
  71
  72 YYError() should probably be turned into UserError()...
  73
  74 9/25/04:
  75
  76 Or more specifically, into something like RecoverableUserError().
  77
  78 12/7/04:
  79
  80 Arrays need some more thought, specifically multi-dimensional
  81 and inline arrays, and how they interact with typedefs.  Currently,
  82 multi-dimensional arrays are simply not supported, but what happens
  83 if you typedef an array, and then create an array of that?  It's
  84 accepted at the moment, and if you accept that, why not regular
  85 multi-dimensional arrays?  Plus, with out-of-line arrays
  86 multi-dimensional arrays cannot be created simply by multiplying the
  87 sizes of each dimension.  Support should be added.
  88
  89 12/21/04:
  90
  91 A separate type of reference will likely be needed for persistent
  92 entities, as making every reference persistent would cost too much.
  93 This would also allow an entity to be locked against decaching (but
  94 not ordinary swap-out) by acquiring a non-persistent reference.
  95
  96 If one is willing to disallow such locking, persistence could simply
  97 be an attribute of a type, but you'd still have problems with
  98 references to embedded types; a persistent type should be able to
  99 contain non-persistent types (either inline or by reference).
 100
 101 One implementation of persistence would be for a persistent reference
 102 to have two states.  An uncached reference consists of a
 103 non-persistent reference to a storage object (or perhaps a cache
 104 object backed by a storage object).  A cached reference is like a
 105 normal, non-persistent reference.  The state would have to be checked
 106 on every dereference.  If it is found to be uncached, the entity is
 107 retrieved (either from storage, or from cache (it may have gotten
 108 there via another reference)), the state of the reference is changed,
 109 and the reference is added to a list to be swept when trying to
 110 decache the object.  Something would need to be done to prevent races
 111 with asynchronous decaching (perhaps an in-use bit or refcount in the
 112 reference).  However, implementing such a mechanism would be
 113 difficult on top of an ordinary language.
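A rough sketch of the two-state reference in C++, just to make the
moving parts concrete.  PersistentRef, its Loader callback, and
decache() are invented for this example; a shared_ptr use count stands
in for the "in-use bit or refcount", and the sweep list is left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

template <typename T>
class PersistentRef {
public:
    // The loader stands in for "retrieve from storage or from cache".
    using Loader = std::function<std::shared_ptr<T>()>;

    explicit PersistentRef(Loader load) : load_(std::move(load)) {}

    // The state is checked on every dereference.  An uncached reference
    // loads the entity and flips to the cached state.
    std::shared_ptr<T> get()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!cached_)
            cached_ = load_();
        return cached_;  // the caller's copy acts as the in-use count
    }

    // Called by a decaching sweep.  Refuses while anything else still
    // holds the entity; otherwise drops back to the uncached state.
    bool decache()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (cached_ && cached_.use_count() > 1)
            return false;
        cached_.reset();
        return true;
    }

    bool is_cached() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return static_cast<bool>(cached_);
    }

private:
    Loader load_;
    std::shared_ptr<T> cached_;
    mutable std::mutex mutex_;
};
```

Even this toy version pays for a lock and a branch on every
dereference, which is the overhead being worried about above.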
 114
 115 An alternative, which is less "automatic" from a programmer's
 116 perspective, but still much better than the current state of things,
 117 is to have the programmer always acquire an ordinary reference before
 118 dereferencing (essentially, the in-use refcount of the previous
 119 mechanism would be managed manually or by garbage collection).  The
 120 programmer can choose whether to keep the ordinary reference around
 121 (which favors simplicity, determinism, speed) or the storage
 122 reference (which minimizes memory consumption and requires more
 123 programmer and CPU time to acquire a usable reference more often).
 124
 125 The difference between this and simply having serialize/deserialize
 126 methods is that you would receive the same entity address if you
 127 convert a storage reference multiple times.  This causes a problem if
 128 you do this from different address spaces, though.  Shared memory is
 129 a possibility, but it would be unsuitable in many circumstances due
 130 to either races or memory wastage (you'd pretty much need to allocate
 131 a page per entity, so that access can be controlled precisely (you
 132 shouldn't be able to access entity B just because some other process
 133 has it in the same page as entity A to which you do have access
 134 rights, and you can't move one of them to another page without
 135 breaking the other process's references)).
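A minimal single-address-space sketch of the manual scheme, showing the
"same entity address" property.  Store, Entity, and the use of string
keys as storage references are all assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical entity; the string stands in for deserialized state.
struct Entity {
    std::string data;
};

class Store {
public:
    void put(const std::string &key, const std::string &serialized)
    {
        backing_[key] = serialized;
    }

    // Convert a storage reference (here just a key) into an ordinary
    // reference.  While any ordinary reference is alive, every caller
    // gets the same Entity; only when none remain is it deserialized
    // again.  Throws std::out_of_range for an unknown key.
    std::shared_ptr<Entity> acquire(const std::string &key)
    {
        if (auto live = cache_[key].lock())
            return live;

        auto fresh = std::make_shared<Entity>(Entity{backing_.at(key)});
        cache_[key] = fresh;  // weak: does not keep the entity cached
        return fresh;
    }

private:
    std::map<std::string, std::string> backing_;
    std::map<std::string, std::weak_ptr<Entity>> cache_;
};
```

Two acquire() calls for the same key return pointers that compare
equal, which plain serialize/deserialize methods would not give you.
The cross-address-space problem described above is not addressed here.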
 136
 137 12/25/04:
 138
 139 Security issues need some more thought.  In particular, how to handle
 140 the case where the rights of multiple processes are needed to do
 141 something, with no one process fully trusted with all of those
 142 rights.  If you just pass a handle to one process, and don't have any
 143 further restrictions, then it can do other things with that handle,
 144 long after it's returned.  Delegates would allow it to be limited to
 145 one method, and handle revocation would be nice as well.  However, it
 146 could still be more privilege than was intended to be granted.
 147
 148 To be fully secure, one-time-use objects could be created that only
 149 allow a certain, specific operation, but that would have too much
 150 overhead in many cases.
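A small in-process sketch of a delegate that is limited to one method,
can be revoked, and can optionally be one-time-use.  The Delegate class
and its uses_left counter are hypothetical, and it is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

template <typename Sig>
class Delegate;

template <typename R, typename... Args>
class Delegate<R(Args...)> {
    struct State {
        std::function<R(Args...)> fn;
        long uses_left;  // negative means unlimited
    };
    // Shared by all copies, so the grantor can revoke a copy it has
    // already handed to someone else.
    std::shared_ptr<State> state_;

public:
    explicit Delegate(std::function<R(Args...)> fn, long uses = -1)
        : state_(std::make_shared<State>(State{std::move(fn), uses})) {}

    R operator()(Args... args) const
    {
        if (state_->uses_left == 0)
            throw std::runtime_error("delegate revoked or used up");
        if (state_->uses_left > 0)
            --state_->uses_left;
        return state_->fn(std::forward<Args>(args)...);
    }

    void revoke() { state_->uses_left = 0; }
};
```

The holder of a copy can only invoke the one wrapped operation, and
only until revoke() is called or the use count runs out.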
 151
 152 12/28/04:
 153
 154 An ordinary reference has certain rights associated with it, and
 155 these rights are transferred to the callee when the reference is
 156 passed.  For persistent references, only the namespace lookup
 157 permission is bypassed; unserializing (or serializing) the object
 158 requires whatever capability token has been set for the relevant
 159 operation.  I don't think it would be worthwhile to implement a third
 160 type of reference that is unserialized but without privilege grant;
 161 if one wants that, one could make a custom storage object that
 162 doesn't actually serialize anything, but just hands out the
 163 real reference upon presentation of the right capability token.
 164
 165 Access revocation is important for making sure the callee doesn't
 166 hold onto the reference longer than it is supposed to (especially if
 167 the access rights change after the call).  However, it cannot be
 168 determined automatically how long to allow a call-granted reference.
 169 Many calls may only need it for the duration of the call, but some
 170 will need to hold the reference longer.  The reference also must be
 171 revoked if the caller's access to that object is revoked
 172 (technically, it could remain if the callee has another
 173 path-to-privilege, but it may not want to, if the action it takes
 174 assumes that the caller had privilege to carry out the action).
 175
 176 Implementing access revocation requires that we either say fuck-you
 177 to the app and make it unserialize again if it does happen to have an
 178 alternate path-to-privilege (I believe this is what Unix does), or
 179 somehow link the unserialized entity to the persistent reference, and
 180 give it a chance to prove that it's allowed to retain the reference.
 181 I greatly favor the latter approach; though it's more complicated to
 182 implement, going the other way will make lots of apps either buggy or
 183 hideously complicated.
 184
 185 Alternatively, a reference could be more tightly bound to the exact
 186 path-to-privilege, requiring the app to explicitly specify which
 187 source(s) of privilege to consider.  This has benefits in avoiding
 188 odd races where an app would have asked the user for a password to
 189 elevate privilege, but didn't because it happened to have a given
 190 authority already for some other reason, but which got revoked before
 191 the operation completed.  It'd also be nice in general in helping
 192 server processes manage inherited permissions sanely.  It'd open the
 193 multiple-references-per-object can of worms, in that a single address
 194 space could have references to the same object compare unequal (or
 195 else have a more complicated comparison operation than simply
 196 checking the reference pointer).
 197
 198 Aah, fuck it.  If you pass a reference to a task, you're trusting it
 199 not to do bad stuff with it.  If you can't give it that trust, send
 200 it a more limited reference.  The major time you'd really want to do
 201 a revocation is when the access rights to an object change, and the
 202 fuck-you-legitimate-reference-holder approach could be sufficient for
 203 the case where the owner of the object is pretty sure there are no
 204 legitimate references remaining.  Existing OSes don't handle anything
 205 beyond that very well AFAIK, so if I come up with anything better
 206 it'll be a bonus, but I'm not too worried.
 207
 208 The problem with the trust-it approach is that it's harder to know
 209 who you're talking to in a polymorphic OS; all you really know
 210 (without excessive ORB queries) is the interface type.  The trust
 211 level for the implementation will often be zero, and (just about)
 212 anything that can be done to limit leakage of privilege is a good
 213 thing.  Oh well, we'll see how it turns out after further API design.
 214 It might turn out to not be such a big deal, and I need to get on
 215 with making stuff work.
 216
 217 2/1/05: GCC on PPC violates the SYSV ABI by not returning small
 218 structs in registers.  This could have a noticeable performance
 219 impact given that object references are really small structs.
 220 While fixing it for existing OSes is unlikely due to existing
 221 binaries, perhaps it should be fixed for this OS while it still
 222 can be...
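For illustration, a sketch of what "really small structs" could mean
here.  The two-pointer ObjectRef layout is an assumption, not the
actual binding; the static_asserts only check the properties that make
a struct a candidate for register return on ABIs that allow it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical object reference: an object pointer plus a pointer to
// the interface's method table.
struct ObjectRef {
    void *object;
    const void *const *vtable;
};

// Trivially copyable and exactly two pointers wide.
static_assert(std::is_trivially_copyable<ObjectRef>::value,
              "ObjectRef must be trivially copyable");
static_assert(sizeof(ObjectRef) == 2 * sizeof(void *),
              "ObjectRef should be exactly two pointers");

// Returned by value; whether this travels in registers or through a
// hidden memory pointer is decided by the platform ABI and compiler.
ObjectRef make_ref(void *object, const void *const *vtable)
{
    return ObjectRef{object, vtable};
}
```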
 223
 224 3/13/05: Typedefs are not going to be allowed for interfaces.  The
 225 immediate, selfish reason for this is that allowing them would cause
 226 some minor ugliness in the idlc code that I'd rather avoid (it's ugly
 227 enough already).  However, I'm having a hard time thinking of
 228 legitimate uses for them compared to inheritance.  If such a use
 229 comes up, it can be re-allowed later.  Or maybe I'll implement it
 230 soon, and consider it a FIXME until then.
 231
 232 3/19/05: Oops.  There was an ambiguity in the IDL, in that the
 233 double-dot was used both as a global namespace prefix and as a range
 234 specifier.  This wasn't caught before, because idlc wasn't allowing
 235 namespace-qualified constants.  The range specifier is now a
 236 triple-dot.
 237
 238 3/20/05: The memory management scheme is *still* really screwed up;
 239 an interface declared in the namespace of a superinterface (and
 240 similar constructs) will cause reference loops.  I think when it
 241 finally gets to the point that I try to make memory management
 242 actually work right (which probably won't be until this code is made
 243 library-able) I'll just declare the entire tree to be a monolithic
 244 entity, freed in one go when it is no longer needed.  Reference
 245 counting could still be used for things that aren't part of the tree,
 246 like strings and lists.
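A minimal sketch of the monolithic-tree idea, assuming C++14 or later.
Tree and Node are made-up names; the point is only that one owner holds
every node, so links between nodes can loop without leaking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node.  All links are plain non-owning pointers.
struct Node {
    std::string name;
    Node *parent = nullptr;        // enclosing namespace
    Node *super = nullptr;         // superinterface, if any
    std::vector<Node *> children;  // symbols declared inside this one
};

// The tree owns every node.  Destroying the Tree frees them all in one
// go, regardless of how the nodes point at each other.
class Tree {
public:
    Node *create(const std::string &name, Node *parent = nullptr)
    {
        nodes_.push_back(std::make_unique<Node>());
        Node *n = nodes_.back().get();
        n->name = name;
        n->parent = parent;
        if (parent)
            parent->children.push_back(n);
        return n;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};
```

An interface declared inside its own superinterface's namespace gives
parent -> child -> super -> parent, which is a loop under per-node
reference counting but harmless here.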
 247
 248 5/18/05: I'm thinking of allowing actual return values instead of
 249 using only out parameters (even with overloaded return-the-last-
 250 out-parameter features of language bindings).  It would more clearly
 251 express the intent of the programmer to designate one of the out
 252 parameters as a return value, and it would make it easier to take
 253 advantage of an ABI's return value registers (instead of always using
 254 pointers, or continuing the last-out-param hack at the function
 255 pointer level).
 256
 257 Enums in C++ will be typesafe against assigning one initialized enum
 258 to an enum of a different type; however, it doesn't look like they
 259 can be made safe against initializing an enum with a const
 260 initializer from a different enum type, at least not without breaking
 261 things like switch.  Languages such as D should be able to do it
 262 properly with strong typedefs.
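A sketch of one way a C++ binding could get exactly this behaviour.
The wrapper-struct layout below is a guess for illustration, not
necessarily what the real binding generates; Color, Shape, and rank()
are made-up names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical generated enum: constants are a nested plain enum, the
// value itself is a wrapper struct convertible to and from int.
struct Color {
    enum Value { Red, Green };
    int raw;
    Color(int v) : raw(v) {}
    operator int() const { return raw; }
};

struct Shape {
    enum Value { Circle, Square };
    int raw;
    Shape(int v) : raw(v) {}
    operator int() const { return raw; }
};

// Assigning an initialized Color to a Shape would need two user-defined
// conversions (Color -> int -> Shape), which C++ refuses to chain.
static_assert(!std::is_convertible<Color, Shape>::value,
              "Color values must not convert to Shape");

// But a bare constant from the wrong enum still gets through, because
// Color::Red converts to int with a built-in conversion.
static_assert(std::is_convertible<Color::Value, Shape>::value,
              "constants from another enum are not rejected");

// switch keeps working because Color converts implicitly to int.
int rank(Color c)
{
    switch (c) {
    case Color::Red:   return 1;
    case Color::Green: return 2;
    default:           return 0;
    }
}
```

Making the constructor explicit or the constants strongly typed would
close the hole, but at the cost of the implicit conversions that switch
and ordinary initialization rely on.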
 263
 264 GCC is refusing to do CSE on upcasts, even with const all over the
 265 place; this means that repeatedly calling methods from a derived
 266 interface will be less efficient than casting once to the parent
 267 interface and using that.  At some point, this should be resolved,
 268 but that's an optimization issue which can wait (and it may require
 269 compiler changes to let the compiler know that the data *really* is
 270 not going to change, ever, by anyone (apart from initialization which
 271 happens before the code in question runs)).  Maybe LLVM will do
 272 better.  CSE on downcasts would be nice too.
 273
 274 5/20/05: I've decided against method return values.  The efficiency
 275 part is a minor optimization, and would be better handled with an ABI
 276 that can handle out parameters directly (thus giving you many return
 277 registers).  Of course, switching ABIs will be painful, but it's
 278 probably going to happen a few times anyway during the early phases
 279 of development.  As for "express[ing] the intent of the programmer",
 280 it's really not that big of a deal.  Eventually, instead of having
 281 last-out-param hacks like the C++ binding, a language could allow
 282 some keyword or symbol to replace one (or more) arguments, causing
 283 them to be treated as return values.
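For reference, a sketch of what a last-out-param hack can look like in
a C++ binding.  Divider and its divide() methods are invented for this
example and are not taken from the actual binding.

```cpp
#include <cassert>

// Hypothetical interface with two out parameters.
struct Divider {
    // IDL-style form: every result comes back through an out parameter.
    void divide(int num, int den, int *quot, int *rem) const
    {
        *quot = num / den;
        *rem = num % den;
    }

    // Convenience overload: the last out parameter is dropped from the
    // argument list and handed back as the return value instead.
    int divide(int num, int den, int *quot) const
    {
        int rem;
        divide(num, den, quot, &rem);
        return rem;
    }
};
```

Both forms call the same underlying method; the overload is purely
sugar on the language-binding side.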
 284
 285 It has nothing to do with me being lazy.  No, not at all.  *whistling
 286 and walking away*
 287
 288 It should be possible to disable async as an implementation
 289 attribute, so that in-process wrappers can execute directly (e.g.
 290 FileStream's async methods directly sending out an async method to
 291 the file, rather than requiring both steps to be async).
 292
 293 5/23/05: FileStream's methods probably should be async anyway,
 294 and then either call a sync method, or provide its own notifier.
 295 That way, it can keep the position correct if the read or write did
 296 not fully succeed.  It'll also need to keep all operations strictly
 297 ordered, so if async calls are used, it needs a message serializer
 298 object.
 299
 300 See update 7/02/06.
 301
 302 10/04/05: There should be a way to check whether a pointer to a
 303 virtual struct is of the most derived type.
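In plain C++ terms, such a check could look like the sketch below.  It
assumes virtual structs map onto polymorphic C++ classes with RTTI
enabled, which may not match the real binding; is_most_derived() is a
made-up name.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-ins for a virtual struct and a struct derived
// from it.
struct Base {
    virtual ~Base() = default;
};

struct Derived : Base {};

// True if the object's dynamic type is exactly T, i.e. the pointer
// already refers to the most derived type.  Null pointers yield false.
template <typename T>
bool is_most_derived(const T *p)
{
    return p != nullptr && typeid(*p) == typeid(T);
}
```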
 304
 305 7/02/06: FileStream no longer exists as an interface; instead, an
 306 object combining Seekable, HasFile, and the appropriate stream
 307 interface(s) (which have both sync and async methods) should be used.
 308 This object will usually be local, so async isn't an issue, but it
 309 can be used remotely if it's really needed to synchronize the file
 310 offset pointer across multiple address spaces.