SparqlPress explores the addition of an RDF store to the Wordpress weblogging system through PHP-based extensions, providing a basic Personal Semantic Web Aggregator that can integrate interesting data from nearby in the Web, exposing it to local and remote applications via the SPARQL query language and protocol.
See DanBriSlides for some motivations, and SparqlingNaut for a possible UI application.
The primary goal is to populate the local store with an interesting subset of the nearby Semantic Web, through discovery and crawling of RDF data from websites (typically blogs; initially Wordpress blogs running the FOAF/SKOS plugin).
As there is not yet a full PHP SPARQL parser or query engine, this will be prototyped using Redland and Redland's PHP bindings.
Related but different ideas:
generic RDF crawlers (aka 'scutters'). These harvest thousands of documents from the Semantic Web by traversing rdfs:seeAlso references between RDF documents. SparqlPress could use crawling code and will probably need to deal with seeAlso, but it isn't intended to create a massive store; it only wants things like RSS feeds, calendars, SKOSonomies, FOAF, geo, RDF-calendar, photo galleries, etc. from "nearby" sites. In particular, since SparqlPress is designed as an extension for generic Wordpress installations, we have to operate in 'while the user waits' mode (ie. no server daemons, crontabs, ...). This probably means the "crawler" will import one file (or site...) at a time, through some HTML UI.
generic RDF backends for mainstream weblog systems
Kasei's mtredland system uses the Movable Type database abstraction layer to store all MT info in Redland. It allows RDQL (a SPARQL precursor) queries against the entire MT database.
Burningbird's Wordform version of Wordpress, which (I believe) adapts Wordpress to use an RDF database as its backend storage system.
Wordpress (and MT) add-ons to expose RDF information (eg. MortenF's FOAF Plugin, which also now exports SKOS info: user category scheme in RDF, and page/category associations). SparqlPress serves to motivate the use of such add-ons, by providing an easy to use aggregator for this data.
Background Reading
Some things to know about... (todo: add links)
Redland's PHP interface
sparqltool, which exposes Redland's Rasqal engine via HTML forms (using PHP)
Chris Bizer's RAP work; doesn't do SPARQL or named graphs yet (but he gave a talk on SPARQL and named graphs at SWIG F2F in March)
Redland's conventions for Mysql table naming
Wordpress's conventions for Mysql table naming (eg. wp_myblog_foo)
Wordpress's conventions for allowing extensions to create Mysql tables
Wordpress conventions for managing lists of contacts/friends/etc (XFN add-on, blogroll, FOAF plugin etc)
whether a robots.txt impl is available in PHP (not needed if all data-loading is manual)
Jena's ARQ SPARQL engine (Java, not PHP, but good to know about; eg. arq.jj). Howard Katz's article on parsing XQuery might also be of interest, in passing.
MortenF's page on Redland/MySQL utilities, and the mysql for redland documentation.
PHP's dl() function (useful for debugging redland.so library path issues; see also php-config).
Usage Scenarios
SparqlPress should...
allow Wordpress users to load RDF data from the sites of their contacts into a local store
provide manual control over data loading and refresh, with sensible defaults and use of cache/etags to minimise load elsewhere
also collect RDF data from its own site, using the same HTTP interfaces, data formats etc.
expose this aggregated RDF through a clean and simple interface to Wordpress/PHP developers
allow installation into Wordpress 1.5 or later via normal Wordpress plugin conventions
minimise dependencies on non-PHP code (eg. prototype against Redland/Raptor; migrate to RAP?) to allow mass uptake
create a set of appropriately named mysql tables (automatically? on user advice?) suitable for some RDF store
require no central coordinating site; each blog has some information about itself and neighbours. We might use a lightweight bootstrapping mechanism initially, eg. a wiki page listing blogs using the software.
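The "cache/etags" point in the list above could be handled with HTTP conditional requests. A minimal sketch, assuming a hypothetical per-URL cache record with 'etag' and 'last_modified' keys; the function names and the record layout are made up for illustration and are not part of SparqlPress. It is written for current PHP rather than the PHP 4 of the Wordpress 1.5 era.

```php
<?php
// Hypothetical helper: build conditional-GET request headers from what we
// remembered about a URL the last time it was fetched.
function sparqlpress_conditional_headers(array $cached): array
{
    $headers = [];
    if (!empty($cached['etag'])) {
        $headers[] = 'If-None-Match: ' . $cached['etag'];
    }
    if (!empty($cached['last_modified'])) {
        $headers[] = 'If-Modified-Since: ' . $cached['last_modified'];
    }
    return $headers;
}

// Hypothetical helper: does this HTTP status mean "new data, reload the graph"?
// 304 Not Modified means the triples already in the store are still current.
function sparqlpress_needs_reload(int $status): bool
{
    return $status !== 304 && $status >= 200 && $status < 300;
}

// Usage: headers for a feed we fetched before and stored an ETag for.
$headers = sparqlpress_conditional_headers(['etag' => '"abc123"']);
```

The remote site then answers 304 with no body when nothing has changed, which keeps the load on contacts' blogs low even if the user hits "refresh" often.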
Example scenarios we hope to support:
when adding a category to one's Wordpress blog, do so within the context of categories in the blogs of one's contacts (described in W3C SKOS/RDF).
have contacts got identically named categories?
do we already know something about relationships between 'my' categories and those of contacts?
...noting that this app scenario goes beyond SparqlPress core goal, since it is data storage by a local app, not harvesting.
...noting that we'd use SKOS mapping vocabulary for this (@@url?)
...
TODOs
(in no real order...)
example feeds (eg. from morten and danbri's wordpress+plugin'd sites)
migration plan? moving code from Redland to RAP should be relatively straightforward; migrating data might involve dump-to-RDF and restore, since SQL table structures will differ.
make a better list of blog addons and common RDF/FOAF/etc structures we expect to find/store/query
need a function that takes an HTML URI, looks around for likely RDF sources referenced from LINK REL tags (FOAF autodiscovery etc.).
need a function that takes an RDF/XML URI, parses triples, and stores appropriately in our RDF tables (tagged with some context).
need a function that examines our RDF store and gets a list of contexts/graphs
need a function that gets a list of HTML URIs associated with contacts known to this Wordpress installation
need a 'do the right thing' startup wizard which gets the HTML pages of friends, inspects them for RDF, stores each retrieved graph in a separate Redland context, and (perhaps?) runs MortenF's smusher.c code to fold together graph nodes that represent common entities.
need a simple demo page that feeds a SPARQL (or draft-SPARQL) query to Redland-PHP and iterates through results
need a .php page that emits SPARQL result-set format XML given some query
need a demo that generates RDF/XML or resultset markup for consumption by a local FoafNaut (SparqlingNaut) installation
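The "looks around for likely RDF sources referenced from LINK REL tags" item above can be sketched without Redland at all. This is a minimal sketch, not a finished implementation: the function name is hypothetical, it only accepts type="application/rdf+xml" with rel="meta" (the FOAF autodiscovery convention) or rel="alternate", and it returns href values as found, so relative URLs would still need resolving against the page URI before fetching.

```php
<?php
// Hypothetical helper: list candidate RDF documents advertised by an HTML
// page through <link> elements in its markup.
function sparqlpress_find_rdf_links(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world blog HTML is rarely valid; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel can hold several space-separated tokens.
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $type = strtolower(trim($link->getAttribute('type')));
        $href = trim($link->getAttribute('href'));
        $relOk = in_array('meta', $rels, true) || in_array('alternate', $rels, true);
        if ($relOk && $type === 'application/rdf+xml' && $href !== '') {
            $found[] = $href; // may be relative to the page URI
        }
    }
    return array_values(array_unique($found));
}
```

Fetching the page and then loading each discovered URL into the store are separate steps; keeping discovery as a pure function of the HTML makes it easy to try out against saved pages.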
Development Notes
DanBri made a Wordpress weblog installation to hack on; see sparqlpress blog.
Howto from Dave re building/installing the Redland PHP wrapper: irc log. Current headache: my PHP install was made in /usr/local/apache/php/bin/ but I built the Redland stuff against the Debian'd version. Doh.
Test installation of sparqlTool.php.
Next job: make sure my new Redland install (via Debian Testing) knows how to talk to MySQL. Trying Morten's utilities.
OK, we're talking to MySQL... and created a MySQL-backed store. It made these tables in our 'wordpress' database:
| Bnodes |
| Literals |
| Models |
| Resources |
| Statements14932826769506100049 |
These are alongside the various sets of wordpress tables, eg.:
| sp1_categories |
| sp1_comments |
| sp1_linkcategories |
| sp1_links |
| sp1_options |
| sp1_post2cat |
| sp1_postmeta |
| sp1_posts |
| sp1_users |
(where sp1_ is specified in wp-config.php).
So, do we try to relativise Redland to store things as sp1_Models, sp1_Resources, sp1_StatementsBlahBlahBlah? (is that it? or more tables needed ever?).
Probably yes, we should, so installations don't conflict. OK, here is a design: all Wordpress installations within one MySQL database share a common set of Redland MySQL tables, and are kept distinct using differently named storages, where the naming convention follows the one used within each Wordpress installation.
For example, our Wordpress install uses the sp1_ prefix for its Wordpress tables. So within Redland, the scoping is handled using the 'model' mechanism. So we have:
mysql> select * from Models;
+----------------------+-----------+
| ID                   | Name      |
+----------------------+-----------+
| 14932826769506100049 | db4       |
| 17733401352462907705 | sp1_test1 |
+----------------------+-----------+
2 rows in set (0.00 sec)
This was created using the following PHP:
sdrdf_load_module(); # function from sparqlTool.php
$world = librdf_php_get_world();
$storage = librdf_new_storage($world, 'mysql', 'sp1_test1', "new='yes',host='localhost',database='wordpress',user='wpuser',password='pwdhere'");
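The naming convention could be wrapped in a couple of small helpers, so the storage name is always derived from the Wordpress table prefix rather than typed by hand. A minimal sketch; both function names are hypothetical, and the options helper does no quoting, so values containing a single quote would need extra care.

```php
<?php
// Hypothetical helper: derive a Redland storage/model name from the
// Wordpress table prefix (the value set in wp-config.php, eg. 'sp1_').
function sparqlpress_model_name(string $tablePrefix, string $name): string
{
    // Keep only characters that are unproblematic in a name stored in MySQL.
    $clean = preg_replace('/[^A-Za-z0-9_]/', '', $name);
    return $tablePrefix . $clean;
}

// Hypothetical helper: build an options string in the same key='value'
// form as the librdf_new_storage() call above. Values are not escaped.
function sparqlpress_storage_options(array $options): string
{
    $pairs = [];
    foreach ($options as $key => $value) {
        $pairs[] = $key . "='" . $value . "'";
    }
    return implode(',', $pairs);
}

echo sparqlpress_model_name('sp1_', 'test1'), "\n"; // prints sp1_test1
```

The results would then be passed as the third and fourth arguments of librdf_new_storage(), in place of the literal 'sp1_test1' and the hand-written options string.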
So. Let's consider the dataset scoping problem solved. Now, how to take an URL and put data into the store?
Next things:
how to populate a redland storage from some rdf url? how do we get prefix name right?
# here we store some RDF in the database.
# Q? how to do it as a context?
$swpage = 'http://danbri.org/foaf.rdf';
$model = librdf_new_model($world, $storage, '');
$uri = librdf_new_uri($world, $swpage);
# NB: $nulluri is not defined in this snippet; presumably it is set up elsewhere (eg. in sparqlTool.php)
$parser = librdf_new_parser($world, 'raptor', 'application/rdf+xml', $nulluri);
librdf_parser_parse_into_model($parser, $uri, $uri, $model);
Does the 2nd $uri there set an implicit context? What query would list all contexts?
Each Model has a separate 'statements' table, eg. Statements14637794015643685849 (see the Models table to map from the statement table number to the name, eg. sp1_test2). Each time I run the loader, it adds yet more triples to the specified Model. It would be nice to replace rather than add...
The sparqlTool.php script lets us send queries to the store (if you enter appropriate password or configure read-only access account).
Things to figure out:
list all contexts in some storage?
how to check to see if a database exists, if not create it
exception handling in php?
how to replace all the triples in some context with a new bunch of triples from that source
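On the context questions: in Redland's C API the third argument of librdf_parser_parse_into_model() is the base URI used to resolve relative references, not a context, so the loader above does not tag its triples with a source. Contexts are handled by separate librdf_model_context_* calls, and the storage has to be created with contexts='yes' in its options string. The following is only a sketch of "replace all the triples from one source": it assumes the PHP bindings expose those C functions under the same names and accept null for the parser's type URI, neither of which has been checked against this install, and the function name is hypothetical.

```php
<?php
// Sketch only: assumes the Redland PHP bindings expose the C API functions
// used below under the same names, and that $model sits on a storage that
// was created with contexts='yes'.
function sparqlpress_reload_context($world, $model, string $sourceUrl): bool
{
    if (!function_exists('librdf_model_context_remove_statements')) {
        return false; // Redland bindings not loaded (or too old)
    }
    $uri     = librdf_new_uri($world, $sourceUrl);
    // Use the source URL itself as the context (graph name).
    $context = librdf_new_node_from_uri_string($world, $sourceUrl);
    // null for the type URI is an assumption; the loader above passes $nulluri.
    $parser  = librdf_new_parser($world, 'raptor', 'application/rdf+xml', null);

    // The second $uri is the base URI, as in the loader above.
    $stream = librdf_parser_parse_as_stream($parser, $uri, $uri);
    $ok = false;
    if ($stream) {
        // Drop the old triples for this source, then add the fresh ones.
        librdf_model_context_remove_statements($model, $context);
        librdf_model_context_add_statements($model, $context, $stream);
        librdf_free_stream($stream);
        $ok = true;
    }
    librdf_free_parser($parser);
    librdf_free_node($context);
    librdf_free_uri($uri);
    return $ok;
}
```

Using the source URL as the context name also gives a natural answer to "list all contexts": each context is one document we have loaded.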
Problems with Redland:
to report: if there is a graph with no context (ie. context=0), list_contexts will fail. Fixed (thanks Morten).
Some sample data to play with... What I'm using:
The SKOS parts look like this, where skos is
http://www.w3.org/2004/02/skos/core#
<foaf:Document rdf:about="http://www.w3.org/2004/OWL/">
<dc:subject rdf:resource="#c24"/>
</foaf:Document>
<Concept rdf:about="#c30">
<dc:identifier>30</dc:identifier>
<prefLabel>Politics</prefLabel>
<rdfs:label>Politics</rdfs:label>
<inScheme rdf:resource="#scheme"/>
<broader rdf:resource="#c1"/>
</Concept>
It worked first time: test (if you s/xxx/the real passwd/, or if I make a read-only account).
here's the query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?label
WHERE (?node skos:prefLabel ?label)
      (?node rdf:type skos:Concept)
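For the "emit SPARQL result-set format XML" TODO, the bindings from a query like this one could be serialised along the following lines. A minimal sketch: it uses the namespace of the W3C SPARQL Query Results XML Format as later standardised, and the $rows layout (variable name mapped to a 'type' and a 'value') is an assumption for illustration, not what Redland's PHP bindings return.

```php
<?php
// Sketch: serialise query bindings as SPARQL Query Results XML.
// Assumed row layout: ['label' => ['type' => 'literal', 'value' => 'Politics']]
function sparqlpress_results_xml(array $variables, array $rows): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('sparql');
    $w->writeAttribute('xmlns', 'http://www.w3.org/2005/sparql-results#');

    // <head> declares the variables named in the SELECT clause.
    $w->startElement('head');
    foreach ($variables as $name) {
        $w->startElement('variable');
        $w->writeAttribute('name', $name);
        $w->endElement();
    }
    $w->endElement();

    $w->startElement('results');
    foreach ($rows as $row) {
        $w->startElement('result');
        foreach ($variables as $name) {
            if (!isset($row[$name])) {
                continue; // unbound variable: no <binding> element
            }
            $type = $row[$name]['type'];
            if (!in_array($type, ['uri', 'literal', 'bnode'], true)) {
                $type = 'literal'; // fall back rather than emit an unknown element
            }
            $w->startElement('binding');
            $w->writeAttribute('name', $name);
            $w->writeElement($type, $row[$name]['value']); // value is XML-escaped
            $w->endElement();
        }
        $w->endElement();
    }
    $w->endElement(); // results
    $w->endElement(); // sparql
    $w->endDocument();
    return $w->outputMemory();
}

// Usage: one row for the ?label query, using the sample concept above.
$xml = sparqlpress_results_xml(
    ['label'],
    [['label' => ['type' => 'literal', 'value' => 'Politics']]]
);
```

A .php page serving this would send it with a suitable XML content type; filling $rows from a Redland query result is the part that still needs the bindings.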
Ugly hack notes: putting a copy of redland.so where my PHP expects it, rather than where the Redland+bindings I built put it. Probably should have tried telling Redland where my original php-config lived (ie. under apache) so it could have used the right php installation. Anyway it works for now.
mkdir -p /usr/local/apache/php/lib/php/extensions/no-debug-non-zts-20020429/
cp /usr/lib/php4/20020429/redland.so /usr/local/apache/php/lib/php/extensions/no-debug-non-zts-20020429/
SparqlPress is Semantic Web Vapourware for the masses... (if it gets coded up, will be linked from this page...)