Had a first serious go today at building a database schema in Postgresql using pgadmin3. It’s a fine tool, although I suspect I might go faster writing CREATE TABLE statements by hand once I get used to Postgres’ flavour of SQL.
What I’m looking at right now is a variant on the table-of-triples approach to sparse and latently-structured metadata. In the table-of-triples approach, you represent the attributes of each item by asserting a series of subject-predicate-object triples with the item as the subject, e.g.:
subject predicate object
------------------------------------------------
dominic likes sausages
dominic hates mashed potato
In the approach I’m considering, you have one table containing predicate-object pairs, each of which has an auto-incrementing integer alias, and another table containing subject-alias(predicate-object) pairs, e.g.
property_id predicate object
----------------------------------------------------
1 likes sausages
2 hates mashed potato
subject property_id
------------------------------------
dominic 1
dominic 2
The point of this is to eliminate some redundancy (assuming that certain predicate-object pairs occur frequently, e.g. lots of people like sausages, or have the surname “Smith”). It also allows us to split a search into two parts - first find the property_id for a given property, then find the subjects that have that property - which means that the results of the first part of the search can be cached, speeding up repeated queries with the same criteria.