MSS Code Factory: A Fractal Programming System

MSS Code Factory: Design of the Manufactured Code

During the 9 years of MSS Code Factory development that followed the initial 9 years of algorithm research (yes, 18 years in total), several different architectural approaches to the manufactured code were tried. I was also exposed to many approaches to systems design over those years, and the code manufactured by MSS Code Factory tries to combine the best of them for building scalable applications. When you manufacture your code, you gain all of these architectural benefits without training a single junior programmer to implement them.

One of the first decisions was that using a "framework" methodology was too limiting, regardless of who provided or developed the framework in question. For example, XDoclet2/Hibernate was a very powerful and well documented framework, but it limited the scope of the manufactured code to JEE servers only. As a result, you'll only find the slim library of very basic common code in the LGPLv3 licensed CFLib package. This is the only fragment of code not manufactured by the system itself which has to be imported and linked in order for the code to run; CFCore is optional and only required if you are manufacturing a customized expert system.

Java itself imposes some semantic standards as well, in particular the way getters and setters for attributes are named and typed. As these standards are critical to Java's "introspection" code components, the manufactured code follows this style instead of re-inventing the wheel.

The Importance of the SchemaObj

The SchemaObj is the most important piece of the manufactured code, tying together the table objects and their implementations to the implementation of a data cache by the SchemaObj itself.

The SchemaObj objects produced incorporate management of the database connection used to work with the schema data. You can have as many SchemaObj instances as you like in a Java process; some programmers might even want to code one SchemaObj for each window of a client-server system as an alternative to flushing the cache, so that each window can only be individually stale, rather than the whole system. Be aware that the SchemaObj approach is not ideal for this type of coding, as there is no support for common or shared objects such as lookups at this time, which means every window would reload its referenced lookups separately, slowing down the system and increasing memory use.

The SchemaObj can deal with either a JEE JNDI named resource for database connections, allowing it to leverage the connection pooling of most production JEE systems, or it can use a directly specified JDBC connection configuration from a configuration file for client-server systems.

Caching data is very important for application performance. At a minimum, even a web server system needs to cache the data referenced by an in-flight transaction. Client-server systems retain the data even longer, and only flush and reload their caches when a serious stale data condition is detected.

Whether client-server or JEE transaction processing server, the approach of the manufactured code is the same: all data loaded into the schema is retained as runtime objects, and if any of the data is being edited, the edit buffer objects are "tacked on" to the basic read-only objects. That way, when a schema cache is compacted by invoking the minimizeMemory() methods of the schema table objects, the objects which have active edit buffers are not flushed.

Stale Data and minimizeMemory()

There is no concept of lookup data that should be retained when you invoke the schema-level implementation of minimizeMemory(), because there is no implementation of shared lookup objects or tables in the system. But the programmer can override and customize the implementation to support at least a process-level cache.

Each table provides its own minimizeMemory() implementation. While this may seem inconvenient to the programmer, it was done quite intentionally so that relatively static lookup data would not have to be reloaded after a cache compaction.

When minimizeMemory() is invoked, all objects in the table's cache that are not referenced by edit buffers are released. Edit buffers are retained, even when building a JEE system. Otherwise the application has no buffer to rely on when "undoing" an edit.
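
As a rough illustration of the retention rule described above, the sketch below models a table cache whose minimizeMemory() releases only the objects that have no active edit buffer. CachedObj and TableCache are hypothetical names invented for this example; they are not the manufactured table object API, which is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cached object: read-only data plus an optional edit buffer.
class CachedObj {
    final long id;
    final String value;          // read-only data as last loaded
    private String editBuffer;   // working copy, present only during an edit
    private boolean editing;

    CachedObj(long id, String value) {
        this.id = id;
        this.value = value;
    }

    // "Tack on" an edit buffer initialized from the read-only data.
    void beginEdit() {
        editBuffer = value;
        editing = true;
    }

    // Drop the edit buffer once the edit is finished or abandoned.
    void endEdit() {
        editBuffer = null;
        editing = false;
    }

    boolean hasEditBuffer() {
        return editing;
    }
}

// Hypothetical per-table cache; an assumption made for illustration only.
class TableCache {
    private final Map<Long, CachedObj> cache = new HashMap<>();

    void put(CachedObj obj) { cache.put(obj.id, obj); }
    CachedObj get(long id) { return cache.get(id); }
    int size() { return cache.size(); }

    // Release every cached object that is not pinned by an edit buffer.
    void minimizeMemory() {
        cache.values().removeIf(obj -> !obj.hasEditBuffer());
    }
}
```

With two objects cached and beginEdit() called on only one of them, a call to minimizeMemory() leaves just the object being edited in the cache.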

Edit buffers are a critical aspect of the manufactured code. In order to manipulate a record or object, you must first do a beginEdit() of the object in question to refresh/reread it and to "pin" a copy of the current object into memory. With 2.0, this will be enhanced slightly by having beginEdit() automatically invoke a "SELECT...FOR UPDATE" type read statement instead of the regular reads being used now. This will serve to "pin" the record in the database as well as in cache memory, though it will not be particularly useful for transactional processing systems that implement atomic services. But for client-server coding, it is a long-standing critical aspect of being able to successfully build a client-server system.

WARNING: Invoking the commit() or rollback() methods of the SchemaObj will not automatically post the object edits to the database. You must manually invoke the create(), update(), or delete() methods of the edited objects in order to persist them before committing the transaction.

Furthermore, a rollback does not release or undo the edit buffers, because while that would be desirable for a client-server system, it goes against the fundamental philosophy of a JEE transactionally designed system which has to retain edit buffers across multiple transactions.
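
The behaviour in the warning above can be sketched with a toy model. DemoSchema and DemoEdit are hypothetical stand-ins written for this example (in-memory maps instead of a database), not the manufactured SchemaObj API; the point is only that commit() sees what update() has posted, and that rollback() leaves the edit buffer alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a schema object with one transaction.
class DemoSchema {
    private final Map<Long, String> database = new HashMap<>();
    private final Map<Long, String> posted = new HashMap<>();  // posted but uncommitted

    void post(long id, String value) { posted.put(id, value); }

    // Commit makes posted changes permanent; it knows nothing about edit buffers.
    void commit() {
        database.putAll(posted);
        posted.clear();
    }

    // Rollback discards posted changes; edit buffers are not touched.
    void rollback() { posted.clear(); }

    String stored(long id) { return database.get(id); }
}

// Hypothetical edited object holding an edit buffer.
class DemoEdit {
    private final DemoSchema schema;
    private final long id;
    private String editValue;

    DemoEdit(DemoSchema schema, long id) {
        this.schema = schema;
        this.id = id;
    }

    void setValue(String value) { editValue = value; }
    String getValue() { return editValue; }

    // Explicitly post the edit buffer within the current transaction.
    void update() { schema.post(id, editValue); }
}
```

In this model, calling setValue() and then commit() stores nothing; the value only reaches the "database" after update() followed by commit(). After update() followed by rollback(), getValue() still returns the edited value, so the edit can be posted again in a later transaction.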

Use of class/table hierarchies

There are two key schools of object-relational mapping design.

With one model, each table in the database comprises a complete object in order to improve read performance by being able to retrieve an object with a single database probe.

In the alternative model, each table in the database only has the primary key and the new attributes of the subclass in each table, so you need to do joins in order to read complete objects.

The latter approach was chosen after working with both models several years ago. The ability to read the entire set of objects which derive from a given table or class was determined to have more benefit to the final system's programmability than the marginal performance improvements of the single-object table approach.

More importantly, techniques were discovered and implemented for minimizing the database reads while still expanding sub-objects to their appropriate classes by using unions and joins of multiple reads to fetch the objects for each of the derivations found for a given table/class query.
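
To make the chosen model concrete, here is a minimal in-memory sketch. The Vehicle/Truck classes, the class code values, and the use of maps in place of database tables are all assumptions made for illustration; the manufactured code does the equivalent work with joins and unions in the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Vehicle {
    final long id;
    final String maker;

    Vehicle(long id, String maker) {
        this.id = id;
        this.maker = maker;
    }
}

class Truck extends Vehicle {
    final int axles;

    Truck(long id, String maker, int axles) {
        super(id, maker);
        this.axles = axles;
    }
}

// Hypothetical storage: one map per table in the hierarchy.
class VehicleTables {
    // Base table: every object in the hierarchy has a row here.
    final Map<Long, String[]> vehicle = new LinkedHashMap<>();  // id -> {classCode, maker}
    // Subclass table: primary key plus only the attributes Truck adds.
    final Map<Long, Integer> truck = new LinkedHashMap<>();     // id -> axles

    void insert(Vehicle v) {
        boolean isTruck = v instanceof Truck;
        vehicle.put(v.id, new String[] { isTruck ? "TRK" : "VEH", v.maker });
        if (isTruck) {
            truck.put(v.id, ((Truck) v).axles);
        }
    }

    // Reading the base table returns the whole hierarchy, with each row
    // expanded to its own class by joining in the subclass attributes.
    List<Vehicle> readAllVehicles() {
        List<Vehicle> result = new ArrayList<>();
        for (Map.Entry<Long, String[]> row : vehicle.entrySet()) {
            String[] base = row.getValue();
            if ("TRK".equals(base[0])) {
                result.add(new Truck(row.getKey(), base[1], truck.get(row.getKey())));
            } else {
                result.add(new Vehicle(row.getKey(), base[1]));
            }
        }
        return result;
    }
}
```

The benefit argued for above shows up in readAllVehicles(): a single query against the base table yields every derived object, which is much harder to arrange when each subclass lives in its own complete table.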

Miscellaneous Details

Whenever and wherever possible, final constants are used to help the Java compiler minimize memory usage. Although most Java compilers will rationalize constant strings so that there is only one instance of the string data, as a programmer you cannot count on that. True, this is a trivial amount of memory and adds little to the runtime footprint, but it was a consideration in the design of the manufactured code.

There is overhead to every object instantiated in a system, from memory allocation and initialization to object references in the code itself. Rather than take an older table-record oriented approach, 1.8/1.9 shifted back to an object-buffer oriented approach that Mark Sobkow successfully experimented with about 8-10 years ago. Use of this code style proved to have substantial performance benefits for the system, even if the resulting code is a little less intuitive for those with a table-record programming background instead of experience with object-relational mapping systems.

Fast-Fail Semantics

A "fast fail" architecture is one that protects itself from bad data by validating each field as it's applied, and verifying cross-reference object links as soon as possible. Fast fail architectures throw exceptions like crazy whenever the user or batch job provides "bad data", but they do so without hitting the database with an insert or update that the application code should know will fail.

Anything you can do to avoid unnecessary database probes will improve the performance of the system.

More importantly, for a client-server architecture, fast-fail processing means that you don't have a transaction automatically rolled back by the database in the event of a minor data typo; instead the user can be given the chance to correct the error before the insert or update is posted.

Please note that fast-fail semantics do not avoid the need for the occasional custom data validation code in the Business Logic layers -- it's far from unusual for data validation to require considering several fields of data, or for a field validation to require some calculations or correlation to other information to be evaluated usefully. Fast-fail semantics just take care of simple field and relationship validation.
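
A minimal sketch of the simple field validation described in this section follows. OrderLineEdit, its attributes, and its limits are hypothetical; the manufactured code uses its own buffer classes and exception types. Each setter rejects bad data immediately, so the edit buffer never holds a value the database would refuse.

```java
// Hypothetical edit buffer with fast-fail setters.
class OrderLineEdit {
    static final int MAX_DESCRIPTION_LENGTH = 20;

    private String description = "";
    private int quantity = 1;

    // Validate the field as it is applied, before any database access.
    void setDescription(String value) {
        if (value == null) {
            throw new IllegalArgumentException("description is required");
        }
        if (value.length() > MAX_DESCRIPTION_LENGTH) {
            throw new IllegalArgumentException(
                "description exceeds " + MAX_DESCRIPTION_LENGTH + " characters");
        }
        description = value;
    }

    void setQuantity(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("quantity must be at least 1");
        }
        quantity = value;
    }

    String getDescription() { return description; }
    int getQuantity() { return quantity; }
}
```

A client-server form can catch the exception, show the message next to the offending field, and leave the rest of the edit untouched, since the previous value is still in the buffer.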

Protection from SQL Injection Attacks

SQL injection attacks are common in both web and client-server applications. They occur when a programmer forgets to "wrap" a string in the SQL syntax appropriate to the database being used, allowing a malicious client to "inject" a fragment of SQL that will be executed by the database without any control by the application itself.

MSS Code Factory manufactured code protects from SQL injections by validating and encoding all data in the syntax appropriate to the database being supported. It is impossible to execute an SQL injection attack against the manufactured code.
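
As an illustration of what "encoding data in the syntax appropriate to the database" means for a string, the sketch below applies standard SQL quoting by doubling embedded single quotes. SqlText.quoteString is a hypothetical helper written for this example, not the encoder used by the manufactured code, and it covers only the standard rule; a database with additional escape characters (MySQL's default backslash handling, for example) needs more than this.

```java
// Hypothetical helper showing standard SQL string-literal quoting.
class SqlText {
    // Wrap a value as an SQL string literal, doubling embedded single quotes
    // so the value cannot terminate the literal early.
    static String quoteString(String value) {
        if (value == null) {
            return "NULL";
        }
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String hostile = "x'; DROP TABLE customer; --";
        // The hostile text stays inside one string literal.
        System.out.println("SELECT * FROM customer WHERE name = " + quoteString(hostile));
    }
}
```

Because the quote inside the hostile value is doubled, the database reads the whole value as data rather than as a second statement.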

With manufactured code, there is no chance one of the junior programmers on the team will forget to apply the lessons they've been taught about SQL injections in order to save time and get a deliverable out the door.

Heavy use of the "Factory" pattern

The "Factory" pattern allows the application programmer to "plug in" an alternative implementation of an object, provided that it implements the interface hierarchy specified by the system. The business logic (BL) layers in particular rely on factories to enable the programmer to inject custom application code methods into the system by extending or modifying the BL objects with the necessary custom code, and wiring a replacement factory as appropriate.
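
The wiring described above might look roughly like this. All of the names here (CustomerBL, CustomerBLFactory, DemoSchemaBL, and so on) are hypothetical and chosen for the example; the manufactured interfaces and factories are named and structured by the generator.

```java
// Interface the system expects business logic objects to implement.
interface CustomerBL {
    String displayName(String first, String last);
}

// Default implementation, standing in for manufactured code.
class DefaultCustomerBL implements CustomerBL {
    public String displayName(String first, String last) {
        return first + " " + last;
    }
}

// Factory interface: the single place where instances are created.
interface CustomerBLFactory {
    CustomerBL newCustomerBL();
}

// Hypothetical schema object that only ever instantiates through the factory.
class DemoSchemaBL {
    private CustomerBLFactory customerFactory = DefaultCustomerBL::new;

    // Wire in a replacement factory to plug in custom code.
    void setCustomerFactory(CustomerBLFactory factory) {
        customerFactory = factory;
    }

    CustomerBL newCustomer() {
        return customerFactory.newCustomerBL();
    }
}

// Custom application code extends the default object and overrides behaviour.
class CustomCustomerBL extends DefaultCustomerBL {
    @Override
    public String displayName(String first, String last) {
        return last.toUpperCase() + ", " + first;
    }
}
```

Calling setCustomerFactory(CustomCustomerBL::new) once at startup is enough for every later newCustomer() call to return the customized object, without touching the code that uses the CustomerBL interface.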

Factories are also used for exceptions thrown by the manufactured code, an implementation of hard-coded NLS support that relies on language-specific exception factories instead of reading, parsing, and formatting exception messages using resource strings. Nothing will slow a high-volume application down quite like a disk-based resource string probe when the system is already so heavily bogged down that the JVM is flushing resources it can re-read at a later time.

With a fast-fail exception architecture, it is therefore critical to avoid loading resource strings when throwing exceptions. The performance hit of loading resource strings is just far too great -- any disk I/O is.
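
A small sketch of the idea: each language gets its own exception factory with the message text compiled in, so throwing never touches a resource file. The interface, class names, and message wording below are assumptions made for this example, not the manufactured exception classes.

```java
// Hypothetical factory interface for one kind of exception.
interface DemoExceptionFactory {
    IllegalArgumentException newNullArgument(String methodName, String argName);
}

// English messages are hard-coded; nothing is read from disk.
class EnglishExceptionFactory implements DemoExceptionFactory {
    public IllegalArgumentException newNullArgument(String methodName, String argName) {
        return new IllegalArgumentException(
            methodName + ": argument " + argName + " is null");
    }
}

// A second language is simply another factory implementation.
class GermanExceptionFactory implements DemoExceptionFactory {
    public IllegalArgumentException newNullArgument(String methodName, String argName) {
        return new IllegalArgumentException(
            methodName + ": Argument " + argName + " ist null");
    }
}

// Code that throws goes through whichever factory is currently installed.
class DemoExceptions {
    private static DemoExceptionFactory factory = new EnglishExceptionFactory();

    static void setFactory(DemoExceptionFactory replacement) {
        factory = replacement;
    }

    static IllegalArgumentException nullArgument(String methodName, String argName) {
        return factory.newNullArgument(methodName, argName);
    }
}
```

Switching languages is a single setFactory() call, and the cost of building a message is a string concatenation rather than a resource lookup.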

Multi-object Deletion Quirks

Although multi-record deletion is a bad idea for complex objects that own sub-objects, many business systems are composed of relatively straightforward table buffers, not object-relational hierarchies. To help support the development of such systems, multi-record deleteBy[Key]() methods were added to the system. However, when you invoke these methods, the SchemaObj cache is not cleaned up, because the manipulation is done entirely on the database side by stored procedures.

Locking/pinning data for edits

The lock[Table]() methods either perform a "SELECT...FOR UPDATE" (DB/2, Oracle, PostgreSQL, and MySQL) or update the value of an artificial column (SQL Server) to "pin" the record for update by client-server systems. When you invoke a beginEdit() on an object, this pinning is done automatically to ensure that you are editing a fresh copy of the database information.

In order to support cross-transaction edits, such as with JEE applications, the system also implements record version stamping to detect edit collisions. This is a tried and true technique that has been used on 90% of the systems I worked on during my 30-year career.
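
Record version stamping can be sketched as follows. StampedRow and StampedTable are hypothetical, an in-memory map stands in for the database table, and the choice of IllegalStateException is arbitrary; the manufactured code performs the equivalent check in the database.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable snapshot of a row, including the revision it was read at.
class StampedRow {
    final long id;
    final int revision;
    final String value;

    StampedRow(long id, int revision, String value) {
        this.id = id;
        this.revision = revision;
        this.value = value;
    }

    // An edited copy keeps the revision of the snapshot it was based on.
    StampedRow withValue(String newValue) {
        return new StampedRow(id, revision, newValue);
    }
}

// Hypothetical table that refuses updates based on a stale revision.
class StampedTable {
    private final Map<Long, StampedRow> rows = new HashMap<>();

    void insert(long id, String value) {
        rows.put(id, new StampedRow(id, 1, value));
    }

    StampedRow read(long id) {
        return rows.get(id);
    }

    // Succeeds only if the caller edited the revision that is still current.
    void update(StampedRow edited) {
        StampedRow current = rows.get(edited.id);
        if (current == null || current.revision != edited.revision) {
            throw new IllegalStateException("edit collision on row " + edited.id);
        }
        rows.put(edited.id, new StampedRow(edited.id, current.revision + 1, edited.value));
    }
}
```

If two users read the same row and both try to save, the first update bumps the revision and the second fails with a collision instead of silently overwriting the first user's change.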

Security Features

The system relies on a set of security tables at the cluster layer and another set at the tenant layer to define access permissions using 8-level group and group membership data. As with most of the features of the system, this is something that has been done on virtually every application I ever wrote. One key difference is that the checks for the permissions are pushed to the back end stored procedures, rather than being coded in the client.

Specify the SecurityScope of your objects to control the security code generated by the system. "None" will produce no security code (and thereby maximize performance). "System" allows anyone to read the data, but only the "system" user to update or delete it. "Cluster" uses the cluster SecGroup, SecMemb, and SecInclude objects to define the access privileges for the data (Read[Table], Update[Table], and Delete[Table] are the expected group names). "Tenant" uses the TSecGroup, TSecMemb, and TSecInclude objects in a similar fashion to cluster-enforced security. Be aware that if you use Cluster or Tenant security, the object in question *must* have a relationship to the Cluster or Tenant objects so that it can resolve the security data at runtime.
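
The rules for the four scopes can be summarized in a few lines of Java. This is only a sketch of the decision logic as described above: the enum and method names are hypothetical, the user's groups are passed in as a plain set rather than resolved from the SecGroup or TSecGroup data, and in the manufactured system the checks run in the back end stored procedures rather than in client code like this.

```java
import java.util.Set;

enum DemoSecurityScope { NONE, SYSTEM, CLUSTER, TENANT }

enum DemoOperation {
    READ("Read"), UPDATE("Update"), DELETE("Delete");

    final String prefix;

    DemoOperation(String prefix) {
        this.prefix = prefix;
    }
}

class DemoSecurityCheck {
    // Expected group name, e.g. "ReadCustomer" for reading the Customer table.
    static String groupName(DemoOperation op, String table) {
        return op.prefix + table;
    }

    // userGroups is assumed to hold the groups already resolved for the user
    // at the cluster or tenant level, whichever the scope calls for.
    static boolean isAllowed(DemoSecurityScope scope, DemoOperation op,
                             String table, String user, Set<String> userGroups) {
        switch (scope) {
            case NONE:
                return true;  // no security code at all
            case SYSTEM:
                // anyone may read; only the "system" user may update or delete
                return op == DemoOperation.READ || "system".equals(user);
            default:
                // CLUSTER and TENANT: membership in the matching group
                return userGroups.contains(groupName(op, table));
        }
    }
}
```

For example, a user whose only group is ReadCustomer can read Customer rows under Cluster scope but cannot update them.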

Audit Stamping

Enabling audit stamping produces artificial columns in the base table of an object hierarchy which track the user ids of who created and updated the records, and the timestamps of when those changes were made. There is virtually no runtime cost to enabling this feature on a table or object hierarchy.

History Tables/Logging

Just enable "HasHistory" to produce [Table]_h history tables for your model. There is overhead to producing the history records, but it's not as heavy an impact as the security checks unless you allow your history tables to grow excessively large instead of pruning them on some schedule. Note that unlike the main object hierarchy, the history tables are structured such that the entire object for a class is stored as a single table record, rather than requiring a union of the inherited tables.