Archive for March 29, 2006
I’ve gotten (over my stint offline) a few enquiries about the Data Access model used in Quantum Star SE. Not everyone has come across the concept before, so I decided to take a stab at explaining it, and why I implemented it in the fashion used by QSE (or more accurately its backend, Partholan). This is just an overview – no real details. You can explore the concept better using Google and other sources.
ORM: Object Relational Mapping. ORM is (in this context) a method for mapping Objects to Database Rows. The simplest example is a db_user database table with three fields: user_name, user_id and email_address. At this point we have data fields and data – but how do we represent/use these in a PHP application?
The simplest approach is a direct SQL call, using PHP's native functions to query the database and load the resulting data into an array. The problem is that an array is still just a variable – you need to somehow inject it into functions to be operated on, then you need more functions to change, update or delete the data – not to mention create new row records. At this point the inevitable conclusion is hard-coded SQL operating on data arrays given GLOBAL scope for use in functions (or, just as often, passed into every function as parameters). This is the de facto standard practice in the majority of PHP applications. I’m not saying it’s a bad practice as such – it’s certainly fast, and insecure only insofar as it leans on GLOBALs – but it’s neither reusable nor easily maintained. Why? Because the same SQL queries may be replicated across many files – a bug in one necessitates fixes in many. There are other maintenance pitfalls with the approach – but that’s not the topic of this entry!
However, if you progress to Object Oriented Programming, you learn a valuable lesson: GLOBALs are a necessary evil, but an evil nonetheless… It’s at this point developers might start thinking in new directions. If we have user data, perhaps we should have User objects? Suddenly the OOP approach clicks home. On a side note – I’m not an OOP “purist”. If you are procedurally minded, feel free to stay with that approach – OOP is, however, a very powerful approach to programming, and not half as complex as people think.
It’s here that ORM comes into play. If you intend populating an object with data from a database you have several options:
1. Build SQL queries into object methods (internal handling)
2. Pass data arrays into the object at construction or with a separate import() style method (external handling)
3. Give the object responsibility for managing data (hands-off handling)
The first two, again, are common. They are the most intuitive when you have limited exposure to OOP (and even then they often just make sense). The third has a bad reputation for being needlessly complex and over-designed – a reputation that holds only if all you look at are Propel-style libraries in the context of simple web applications. All three are equally valid – but the third neatly sidesteps manually coded SQL. It also “boils down” the functionality into a standard, predictable set of methods. Here’s a classic example, common to nearly all ORM solutions.
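To make the first two options concrete, here is a minimal sketch of what internal and external handling look like in practice – the class and method names are my own for illustration, not taken from QSE:

```php
<?php
// Option 1: the object builds its own SQL (internal handling).
class UserInternal {
    public $userName;
    // Returns the SQL it would run; a real version would also execute it
    // and populate the object's fields from the result row.
    public function loadQuery($id) {
        return "SELECT user_name, email_address FROM db_user WHERE user_id = " . (int)$id;
    }
}

// Option 2: the data is fetched elsewhere and handed in (external handling).
class UserExternal {
    public $userName;
    public $emailAddress;
    public function import(array $row) {
        $this->userName     = $row['user_name'];
        $this->emailAddress = $row['email_address'];
    }
}
```

The third option – the object managing its own data hands-off – is what the example below demonstrates.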
We have a user (#3) for whom we need to reset an email address. Using the ORM model one could (in their PHP action) code this using:
$user = new User();
$user->load(3);                       // fetch the row for user #3
$user->setEmailAddress('new@address.tld'); // hypothetical new address
$user->save();                        // write the change back
Our User object has suddenly been transformed into a powerful unit for managing data. No SQL, no acrobatics, no blocks of code calling database functions (or database abstraction layers), no error management – nothing but 4 lines of code that do it all for you.
Now if that sounds magically fantastic, I suggest reading up on ORM elsewhere (sorry, but I’m a truly bad technical writer and I’ll just ramble across the topic without digging into minutiae). All this does come at a cost – which differs depending on the ORM approach taken. Since we rely on method-generated, standardised SQL in place of custom, optimised SQL, there is a performance hit. Having no JOINs, UNIONs, etc. has that effect. So does not being able to limit which fields of a row are requested – all of this can be solved, by the way, but my home-brewed ORM was built for simplicity, not speed. See ORM solutions like Propel for something far more complete (and therefore complex).
I think the above serves as a very non-detailed intro to the workings of ORM – if not a full technical debate over its merits. Now on to the implementation of the version in QSE.
To implement ORM you need a few building blocks as basics:
1. Database Table details and metainformation
2. An SQL Generator (and optionally an Optimiser – never added one)
3. A database abstraction layer (to enable rapid switching between database types)
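Building block #1 can be as humble as a static description of each table. This is a hypothetical layout, not QSE's actual metadata format:

```php
<?php
// Table details and metainformation, declared once and shared by the
// SQL generator: primary key plus the ordered list of fields.
$tables = array(
    'db_user' => array(
        'primary_key' => 'user_id',
        'fields'      => array('user_id', 'user_name', 'email_address'),
    ),
);
```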
Many fast and easy implementations of a dead-simple ORM class like ActiveRecord omit all but number 2 – which is usually built in. This is a direction I personally view as dangerous. One ActiveRecord implementation I saw did the following (most follow it closely):
1. Request Table details from MySQL
2. Generate a domain class from this metainfo specific to the Table structure
3. Instantiate the generated class to an object
That makes sense (it works for Ruby on Rails, after all) but it imposes a cost – all that extra querying and class generation happens at runtime, for every request the server handles. Personally, I prefer the pre-generate option. If my database tables are going to remain static (I don’t see many PHP applications changing table structure every few minutes), I pre-generate all those domain objects and store them, as Transfer Objects, for use in the application before it’s released/published. Why not skip all three of those costly steps?
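A pre-generated Transfer Object might look something like this – a hypothetical sketch of the generator's output, written to disk once before release rather than rebuilt on every request:

```php
<?php
// Generated once from the db_user table metadata; nothing here is
// computed at runtime, so instantiation is just a plain "new".
class UserTO {
    public $userId;
    public $userName;
    public $emailAddress;

    // One getter/setter pair per table field.
    public function setUserName($v)     { $this->userName = $v; }
    public function getUserName()       { return $this->userName; }
    public function setEmailAddress($v) { $this->emailAddress = $v; }
    public function getEmailAddress()   { return $this->emailAddress; }
}
```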
This still does not tell anyone where the SQL comes from. Since I aimed for simplicity, I designed a small class which generates SQL based on the structure of the table – most of the SQL follows the simplest route for the CRUD operations (Create, Read, Update, Delete). This Data Access object, in combination with a Transfer Object, enables a very simple ORM layer without the frills and thrills of something like Propel, but also without the performance hit of RoR-style ActiveRecord. Is it feature-rich? No. Is it simple? Yes. Objectives met.
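A minimal sketch of such a generator (my own illustration, not QSE's actual class): given table metadata, it emits the four standard CRUD statements, with `?` placeholders left for the abstraction layer to bind.

```php
<?php
// Emits simple, predictable CRUD SQL from table metadata.
class SqlGenerator {
    private $table;
    private $key;
    private $fields;

    public function __construct($table, $key, array $fields) {
        $this->table  = $table;
        $this->key    = $key;
        $this->fields = $fields;
    }
    public function selectSql() {
        return "SELECT " . implode(", ", $this->fields)
             . " FROM {$this->table} WHERE {$this->key} = ?";
    }
    public function insertSql() {
        $marks = implode(", ", array_fill(0, count($this->fields), "?"));
        return "INSERT INTO {$this->table} ("
             . implode(", ", $this->fields) . ") VALUES ($marks)";
    }
    public function updateSql() {
        $sets = array();
        foreach ($this->fields as $f) { $sets[] = "$f = ?"; }
        return "UPDATE {$this->table} SET " . implode(", ", $sets)
             . " WHERE {$this->key} = ?";
    }
    public function deleteSql() {
        return "DELETE FROM {$this->table} WHERE {$this->key} = ?";
    }
}
```

For the db_user table this produces exactly the "simplest route" queries: `SELECT user_name, email_address FROM db_user WHERE user_id = ?` and friends – nothing clever, nothing optimised, by design.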
The last piece of the puzzle is using ADOdb-Lite as a database abstraction layer in place of PHP’s native library functions (mysql_query() and such). ADOdb-Lite simplifies other aspects of SQL generation a great deal – making even the all-powerful Data Access class incredibly simple.
Basic operation: the user instantiates the selected Transfer Object (e.g. User). This object includes a set of CRUD methods which delegate SQL query operations to the Data Access object. The DAO generates the required SQL and executes it via the ADOdb-Lite database abstraction library. The chain of method calls then returns any resulting data – usually boolean false on failure, or some other value (an array of row data, for example) – back to the parent Transfer Object, which stores the data internally. The stored data is accessible through a set of getters/setters based on the individual field names (e.g. setUserName() maps to the user_name field).
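The whole sequence can be sketched as follows. All names are hypothetical, and the ADOdb-Lite execution step is stubbed out with canned data so the delegation flow is visible without a live database:

```php
<?php
// The Data Access object: owns the SQL, would execute via ADOdb-Lite.
class UserDao {
    public function load($id) {
        $sql = "SELECT user_name, email_address FROM db_user WHERE user_id = ?";
        // Real version would run the query through the abstraction layer;
        // here we stub it and pretend the lookup succeeded.
        return array('user_name' => 'demo', 'email_address' => 'demo@example.com');
    }
}

// The Transfer Object: delegates CRUD to the DAO, stores the row data.
class User {
    private $dao;
    private $data = array();

    public function __construct() { $this->dao = new UserDao(); }

    public function load($id) {
        $row = $this->dao->load($id);     // delegate SQL work to the DAO
        if ($row === false) return false; // boolean false on failure
        $this->data = $row;               // store the row data internally
        return true;
    }
    // Getters/setters named after the underlying fields:
    public function getUserName()   { return $this->data['user_name']; }
    public function setUserName($v) { $this->data['user_name'] = $v; }
}
```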
There ends my overview. Maybe it sheds a little light on what I was up to in QS – then again, maybe not. If anyone needs further clarification, well, you all know my email address from the forums.