So as part of this 'Modern Perl' push, I was looking at KiokuDB for persistence.
I went through the KiokuDB::Tutorial and kept wondering why the DBI backend did it the way it did: Objects encoded into JSPON blobs, stored in a single field in an 'entries' table, with "indexing" in a separate table.
I could understand that that probably works well for the BerkeleyDB backend, but putting relational data into a single, essentially unindexable (doing a JOIN from another table with 'index' data doesn't count) blob field is something you really shouldn't be doing in an RDBMS. Basically it moves data validation and management out of the database where it belongs and into the application (A very Rails-ish way of doing things, and that is a bad thing), and making reporting or other analysis of the data impossible without the additional overhead of a Javascript/JSON library.
So while reading I thought of several alternatives that I would have used instead, mostly PostgreSQL-specific (since that is where I live):
Sadly, I doubt that Kioku's DBI backend will change at this stage, so I will probably have to write it myself, and will probably remain PostgreSQL-specific for a while. That is, if I find myself with the need for KiokuDB-level persistence (I usually go from database to code, not the other way round).
I went through the KiokuDB::Tutorial and kept wondering why the DBI backend did it the way it did: Objects encoded into JSPON blobs, stored in a single field in an 'entries' table, with "indexing" in a separate table.
I could understand that that probably works well for the BerkeleyDB backend, but putting relational data into a single, essentially unindexable (doing a JOIN from another table with 'index' data doesn't count) blob field is something you really shouldn't be doing in an RDBMS. Basically it moves data validation and management out of the database where it belongs and into the application (A very Rails-ish way of doing things, and that is a bad thing), and making reporting or other analysis of the data impossible without the additional overhead of a Javascript/JSON library.
So while reading I thought of several alternatives that I would have used instead, mostly PostgreSQL-specific (since that is where I live):
- Using PL/Perl and JSON::Any to create a set of functions for inspecting the JSPON data, which could then provide real indexing with PostgreSQL's functional indexes and make reporting possible without too much overhead on the client;
- Storing the data in XML in an XML field and using PostgreSQL's XML/XPath functions for indexing/reporting on the data;
- Creating a custom type for each persisted class and creating a separate table for each class containing an ID field and a field of that specific type;
Sadly, I doubt that Kioku's DBI backend will change at this stage, so I will probably have to write it myself, and will probably remain PostgreSQL-specific for a while. That is, if I find myself with the need for KiokuDB-level persistence (I usually go from database to code, not the other way round).
Hi, I'm the author of KiokuDB.
First off, I would be very interested in your XML field approach. I think that this could make a very interesting backend option for KiokuDB. It should be fairly simple to write and could share a lot of code with the DBI backend.
If you could come on IRC or email me (nothingmuch@woobling.org) to discuss this further I would be very grateful. Even if you have no interest in pursuing this yourself I would be very interested in trying for a PostgreSQL specific KiokuDB backend.
Now several points responding to specific comments in the post:
1. KiokuDB doesn't attempt to do things "right" from a DBA standpoint. It essentially treats its backends as opaque key-value stores. Whether you should or shouldn't do that with an RDBMs depends on what you're trying to do.
Since the goal is to store data that is not expressible in the relational model the DB cannot be used to perform validation. Neither graph structures nor open ended polymorphism can be encoded in relational algebra without denormalizing (for instance the way nested set hacks project the transitive closure of the tree's child/parent relationship into 1 dimensional lookup space).
2. For what it's worth, the DBI backend also supports per row values in custom columns, but they are not used for storage, only for lookups. This is similar to what you propose doing in PG with PL/SQL, but it's done before the insert, and is designed to work with any SQL backend obviously.
3. Another reason indexing is done in perl space as opposed to by the database is so that the object's API can be used to extract interesting keys, that may not be available verbatim in the stored data. Since many index keys could be extracted the relationship is inherently N:N. Most databases can't provide this without array types (though if my memory serves me right this should be possible to do in PG). Since the DBI backend also targets SQLite and MySQL (which is arguably not an RDBMS at all) the auxillary table is used instead.
4. The multiple table approach poses problems when you fetch solely based on the ID. Without using name mangling that ensures that the table to query can be extrapolated from the ID you can't know what to query. This is possible in KiokuDB (pretty easy to do, too) but imposes a backend specific implementation detail onto the ID generation.
Keep in mind that the goal of the DBI backend is to work just like the plain file or Berkeley DB backends. The system is not designed to be an ORM. For that I use DBIx::Class =)
Lastly, I reccomend using JSON.pm instead of JSON::Any nowadays. JSON.pm will automatically load JSON::XS if available, but without the risk of using an incompatible implementation (JSON::PC, JSON::Syck and JSON::DWIW all vary considerably in their subtleties).
Cheers,
Yuval
Hi,
I am just now examining docs on Moose+KiokuDB+DBI-Backend.
One may want to be able to work with the rdbms data independently
(if not via an SQL client, then as noted above, by examining
some XML representation of the instance. (However preserving
a meaningful ability to do SQL queries would be good)
If one is using an RDBMS backend but one is not able to use
RDBMS queries or benefit from the RDBM's efficiency/performance,
then why would want to opt for the RDBMS back-end anyway?
It is more than possible to persist objects into and retrieve objects from relational databases (ORMs do it all the time), even without storing values in BLOBs or TEXT or XML or some other non-relational format. It just requires the kind of change control (keeping table fields and object attributes in sync) that too many projects are fine with giving up. sadly.