5/23/04: What to do about replacing symbols in later passes? A complete replacement will break existing pointers, and a symbolic reference would require substantial code changes. I'll probably produce output after each pass and wipe the symbol table; this is less efficient, but similar to how the code was originally intended to work. How do we determine whether a pass 2 is needed if the symbol lookup succeeded due to stale external data that would be overwritten? Perhaps eliminate the --overwrite option; then it wouldn't be possible, but it could be annoying to have to wipe everything before each compilation. It would pretty much enforce one batch of source IDLs per output directory, though that's not necessarily a bad thing. In fact, I think I'll just make --overwrite act on the entire output directory; if it's not set, you get an error if there's *anything* in there.

Now, what happens when you get around to declaring a symbol in a later pass, after it has already been loaded from the filesystem? If the only thing the linkage is used for is to generate a name in the output, then just replace it and let the old version stay refcounted. Is there enough in the pass 1 output for error checking, other than constant ranges? I think so. When will circular inheritance checks be done? That'll require the ability to compare references, meaning we can't just go replacing things. Other than that, I think it can be done at the end of pass 2. So instead of replacing, I'll just add information to existing objects (which means I get to go fix all the places where that sort of work is done in the constructor).

5/25/04: In conclusion on the above, no replacement is done for future passes. Constructor calls have been replaced with calls to a static declare() function, which will either call a constructor or return the existing one (or complain if there's a real conflict, i.e. within this pass or with an external symbol), as well as possibly initialize some data.
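The declare() idea above can be sketched roughly like this. This is a minimal illustration, not idlc's actual code; Symbol, table(), and the pass-tracking fields are all invented for the example:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: instead of calling a constructor directly, callers
// ask a static declare(), which returns the existing symbol (adding any
// new information to it, so old pointers stay valid) or constructs a
// fresh one.  A redeclaration within the same pass, or a clash with a
// symbol loaded from external output, is a real conflict.
struct Symbol {
    std::string name;
    int declared_in_pass;   // pass in which this symbol was last declared
    bool external;          // loaded from previously generated output?

    static std::map<std::string, Symbol*>& table() {
        static std::map<std::string, Symbol*> t;
        return t;
    }

    static Symbol* declare(const std::string& name, int current_pass) {
        auto it = table().find(name);
        if (it == table().end()) {
            Symbol* s = new Symbol{name, current_pass, false};
            table()[name] = s;
            return s;
        }
        Symbol* s = it->second;
        if (s->declared_in_pass == current_pass || s->external)
            throw std::runtime_error("conflicting declaration of " + name);
        s->declared_in_pass = current_pass;  // add info; don't replace
        return s;
    }
};
```

The important property is that a later pass gets back the same object pointer, so nothing holding a reference from pass 1 breaks.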
Still to do:
- Implement pass 3.
- Finish implementing output (including a sanity check for incomplete data).
- Nuke --overwrite, and complain if anything is in the target directory.

8/1/04: Most vtable-related design is done. The GUID pointers will have to be dynamically generated somehow (either by the dynamic linker or by startup code) to make sure the same pointer is used in all components. The compiled type representation will break badly on a case-insensitive filesystem; this is already seen in the current IDL files. Some sort of alternate mapping will be needed. Also, it looks like the performance of output generation will be fairly poor under UFS on OS X; even with the current set of IDL it takes a quarter second to generate all output. Not that it was expected to perform well on legacy filesystems, but still, I wonder how long it will take on the full set of IDL...

9/21/04: Enum and bitfield inheritance may be useful...

9/22/04: YYError() should probably be turned into UserError()...

9/25/04: Or more specifically, into something like RecoverableUserError().

12/7/04: Arrays need some more thought, specifically multi-dimensional and inline arrays, and how they interact with typedefs. Currently, multi-dimensional arrays are simply not supported, but what happens if you typedef an array and then create an array of that? It's accepted at the moment, and if you accept that, why not regular multi-dimensional arrays? Plus, with out-of-line arrays, multi-dimensional arrays cannot be created simply by multiplying the sizes of each dimension. Support should be added.

12/21/04: A separate type of reference will likely be needed for persistent entities, as the overhead would be too much to always do it. This would also allow an entity to be locked against decaching (but not ordinary swap-out) by acquiring a non-persistent reference.
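On the out-of-line array point from 12/7/04: if each out-of-line array is carried as a (pointer, count) descriptor, a two-dimensional array becomes an array of descriptors rather than one contiguous block of rows*cols elements, so the dimension sizes can't simply be multiplied. A hypothetical sketch (ArrayRef and total_elements are invented names, not idlc's):

```cpp
#include <cstddef>

// An out-of-line array as a (pointer, count) descriptor.  A 2-D
// out-of-line array is then an array of descriptors; the inner arrays
// live elsewhere and may even have different lengths.
template <typename T>
struct ArrayRef {
    T* ptr;
    size_t count;
};

// The total element count has to be computed by walking the outer
// array, not by multiplying fixed dimension sizes.
inline size_t total_elements(ArrayRef<ArrayRef<int>> m) {
    size_t total = 0;
    for (size_t i = 0; i < m.count; i++)
        total += m.ptr[i].count;
    return total;
}
```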
If one is willing to disallow such locking, persistence could simply be an attribute of a type, but you'd still have problems with references to embedded types; a persistent type should be able to contain non-persistent types (either inline or by reference).

One implementation of persistence would be for a persistent reference to have two states. An uncached reference consists of a non-persistent reference to a storage object (or perhaps a cache object backed by a storage object). A cached reference is like a normal, non-persistent reference. The state would have to be checked on every dereference. If it is found to be uncached, the entity is retrieved (either from storage, or from cache, as it may have gotten there via another reference), the state of the reference is changed, and the reference is added to a list to be swept when trying to decache the object. Something would need to be done to prevent races with asynchronous decaching (perhaps an in-use bit or refcount in the reference). However, implementing such a mechanism would be difficult on top of an ordinary language.

An alternative, which is less "automatic" from a programmer's perspective but still much better than the current state of things, is to have the programmer always acquire an ordinary reference before dereferencing (essentially, the in-use refcount of the previous mechanism would be managed manually or by garbage collection). The programmer can choose whether to keep the ordinary reference around (which favors simplicity, determinism, and speed) or the storage reference (which minimizes memory consumption, but requires more programmer and CPU time to acquire a usable reference more often). The difference between this and simply having serialize/deserialize methods is that you would receive the same entity address if you convert a storage reference multiple times. This causes a problem if you do this from different address spaces, though.
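A minimal sketch of the two-state reference described above, assuming a stand-in Entity type and a fetch callback in place of the real storage/cache objects (the in-use bit/refcount against asynchronous decaching is omitted):

```cpp
#include <functional>

struct Entity { int value; };

// Hypothetical two-state persistent reference.  Every dereference checks
// the state; an uncached reference retrieves the entity (from storage,
// or from cache if another reference already pulled it in) and flips to
// Cached, so later derefs skip the lookup.
struct PersistentRef {
    enum class State { Uncached, Cached } state = State::Uncached;
    std::function<Entity*()> fetch;  // stands in for the storage object
    Entity* cached = nullptr;

    explicit PersistentRef(std::function<Entity*()> f) : fetch(f) {}

    Entity* deref() {
        if (state == State::Uncached) {
            cached = fetch();        // may hit storage or the cache
            state = State::Cached;
        }
        return cached;
    }

    // Decaching sweeps references back to the Uncached state; in the
    // real design this would be driven by the per-object sweep list.
    void decache() { state = State::Uncached; cached = nullptr; }
};
```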
Shared memory is a possibility, but it would be unsuitable in many circumstances due to either races or memory wastage (you'd pretty much need to allocate a page per entity, so that access can be controlled precisely; you shouldn't be able to access entity B just because some other process has it in the same page as entity A, to which you do have access rights, and you can't move one of them to another page without breaking the other process's references).

12/25/04: Security issues need some more thought. In particular, how to handle the case where the rights of multiple processes are needed to do something, with no one process fully trusted with all of those rights. If you just pass a handle to one process, and don't have any further restrictions, then it can do other things with that handle, long after it's returned. Delegates would allow it to be limited to one method, and handle revocation would be nice as well. However, it could still be more privilege than was intended to be granted. To be fully secure, one-time-use objects could be created that only allow a certain, specific operation, but that would have too much overhead in many cases.

12/28/04: An ordinary reference has certain rights associated with it, and these rights are transferred to the callee when the reference is passed. For persistent references, only the namespace lookup permission is bypassed; unserializing (or serializing) the object requires whatever capability token has been set for the relevant operation. I don't think it would be worthwhile to implement a third type of reference that is unserialized but carries no privilege grant; if one wants that, one could make a custom storage object that doesn't actually serialize anything, but just hands out the real reference upon presentation of the right capability token. Access revocation is important for making sure the callee doesn't hold onto the reference longer than it is supposed to (especially if the access rights change after the call).
However, it cannot be determined automatically how long a call-granted reference should remain valid. Many calls may only need it for the duration of the call, but some will need to hold the reference longer. The reference also must be revoked if the caller's access to that object is revoked (technically, it could remain if the callee has another path-to-privilege, but it may not want to, if the action it takes assumes that the caller had privilege to carry out the action). Implementing access revocation requires that we either say fuck-you to the app and make it unserialize again if it does happen to have an alternate path-to-privilege (I believe this is what Unix does), or somehow link the unserialized entity to the persistent reference, and give it a chance to prove that it's allowed to retain the reference. I greatly favor the latter approach; though it's more complicated to implement, going the other way will make lots of apps either buggy or hideously complicated.

Alternatively, a reference could be bound more tightly to the exact path-to-privilege, requiring the app to explicitly specify which source(s) of privilege to consider. This has benefits in avoiding odd races where an app would have asked the user for a password to elevate privilege, but didn't because it happened to have the authority already for some other reason, and that authority got revoked before the operation completed. It'd also be nice in general in helping server processes manage inherited permissions sanely. It'd open the multiple-references-per-object can of worms, though, in that a single address space could have references to the same object compare unequal (or else need a more complicated comparison operation than simply checking the reference pointer).

Aah, fuck it. If you pass a reference to a task, you're trusting it not to do bad stuff with it. If you can't give it that trust, send it a more limited reference.
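One form such a more limited reference could take is the one-time-use object mentioned on 12/25/04: a grant good for exactly one invocation of one specific operation, revoked as soon as it's used. A hypothetical sketch (File and truncate() are invented for illustration; a real grant would be an ORB-level object):

```cpp
#include <stdexcept>

struct File {
    bool truncated = false;
    void truncate() { truncated = true; }
};

// A wrapper granting a single call to one specific operation.  The
// grantor can also revoke it early; either way, a second use fails.
class OneShotTruncate {
    File* target;  // null once the grant is used or revoked
public:
    explicit OneShotTruncate(File* f) : target(f) {}

    void invoke() {
        if (!target) throw std::runtime_error("grant revoked");
        File* f = target;
        target = nullptr;  // revoke before acting: strictly single-use
        f->truncate();
    }

    void revoke() { target = nullptr; }
};
```

The per-grant allocation is the overhead complained about above; this only pays off when the operation is sensitive enough to justify it.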
The major time you'd really want to do a revocation is when the access rights to an object change, and the fuck-you-legitimate-reference-holder approach could be sufficient for the case where the owner of the object is pretty sure there are no legitimate references remaining. Existing OSes don't handle anything beyond that very well AFAIK, so if I come up with anything better it'll be a bonus, but I'm not too worried. The problem with the trust-it approach is that it's harder to know who you're talking to in a polymorphic OS; all you really know (without excessive ORB queries) is the interface type. The trust level for the implementation will often be zero, and (just about) anything that can be done to limit leakage of privilege is a good thing. Oh well, we'll see how it turns out after further API design. It might turn out not to be such a big deal, and I need to get on with making stuff work.

2/1/05: GCC on PPC violates the SysV ABI by not returning small structs in registers. This could have a noticeable performance impact, given that object references are really small structs. While fixing it for existing OSes is unlikely due to existing binaries, perhaps it should be fixed for this OS while it still can be...

3/13/05: Typedefs are not going to be allowed for interfaces. The immediate, selfish reason for this is that allowing them would cause some minor ugliness in the idlc code that I'd rather avoid (it's ugly enough already). However, I'm also having a hard time thinking of legitimate uses for them compared to inheritance. If such a use comes up, it can be re-allowed later. Or maybe I'll implement it soon, and consider it a FIXME until then.

3/19/05: Oops. There was an ambiguity in the IDL, in that the double-dot was used both as a global namespace prefix and as a range specifier. This wasn't caught before, because idlc wasn't allowing namespace-qualified constants. The range specifier is now a triple-dot.
3/20/05: The memory management scheme is *still* really screwed up; an interface declared in the namespace of a superinterface (and similar constructs) will cause reference loops. I think when it finally gets to the point that I try to make memory management actually work right (which probably won't be until this code is made library-able), I'll just declare the entire tree to be a monolithic entity, freed in one go when it is no longer needed. Reference counting could still be used for things that aren't part of the tree, like strings and lists.

5/18/05: I'm thinking of allowing actual return values instead of using only out parameters (even with overloaded return-the-last-out-parameter features of language bindings). It would more clearly express the intent of the programmer to designate one of the out parameters as a return value, and it would make it easier to take advantage of an ABI's return-value registers (instead of always using pointers, or continuing the last-out-param hack at the function pointer level).

Enums in C++ will be typesafe against assigning an initialized enum to an enum of a different type; however, it doesn't look like they can be made safe against initializing an enum with a const initializer from a different enum type, at least not without breaking things like switch. Languages such as D should be able to do it properly with strong typedefs.

GCC is refusing to do CSE on upcasts, even with const all over the place; this means that repeatedly calling methods from a derived interface will be less efficient than casting once to the parent interface and using that. At some point this should be resolved, but that's an optimization issue which can wait (and it may require compiler changes to let the compiler know that the data *really* is not going to change, ever, by anyone, apart from initialization which happens before the code in question runs). Maybe LLVM will do better. CSE on downcasts would be nice too.
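The workaround implied above, until the compiler learns to CSE the upcast, is to hoist the cast out of the loop by hand. A generic sketch (not idlc's actual binding layout), with a derived interface struct embedding its parent:

```cpp
// A crude C-style interface layout for illustration: the derived
// interface embeds its parent, so the upcast is just &d->parent.
struct Parent {
    int (*get)(Parent* self);
};
struct Derived {
    Parent parent;
    int extra;
};

// Sample method implementation for the demonstration.
static int get_impl(Parent*) { return 7; }

// Upcast is (conceptually) redone on every call through the derived
// interface; this is what GCC fails to common up in the real bindings.
inline int sum_slow(Derived* d, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += d->parent.get(&d->parent);
    return total;
}

// Cast once to the parent interface and reuse the pointer.
inline int sum_fast(Derived* d, int n) {
    Parent* p = &d->parent;
    int total = 0;
    for (int i = 0; i < n; i++)
        total += p->get(p);
    return total;
}
```

Both give the same result; the point is only where the cast happens.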
5/20/05: I've decided against method return values. The efficiency part is a minor optimization, and would be better handled with an ABI that can handle out parameters directly (thus giving you many return registers). Of course, switching ABIs will be painful, but it's probably going to happen a few times anyway during the early phases of development. As for "express[ing] the intent of the programmer", it's really not that big of a deal. Eventually, instead of having last-out-param hacks like the C++ binding, a language could allow some keyword or symbol to replace one (or more) arguments, causing them to be treated as return values. It has nothing to do with me being lazy. No, not at all. *whistling and walking away*

It should be possible to disable async as an implementation attribute, so that in-process wrappers can execute directly (e.g. FileStream's async methods directly sending out an async method to the file, rather than requiring both steps to be async).

5/23/05: FileStream's methods probably should be async anyway, and then either call a sync method or provide their own notifier. That way, it can keep the position correct if the read or write did not fully succeed. It'll also need to keep all operations strictly ordered, so if async calls are used, it needs a message serializer object. See update 7/02/06.

10/04/05: There should be a way to check whether a pointer to a virtual struct is of the most derived type.

7/02/06: FileStream no longer exists as an interface; instead, an object combining Seekable, HasFile, and the appropriate stream interface(s) (which have both sync and async methods) should be used. This object will usually be local, so async isn't an issue, but it can be used remotely if it's really needed to synchronize the file offset pointer across multiple address spaces.