How to execute DBPSB (Tutorial)

This tutorial describes how to execute the 2012 DBpedia SPARQL Benchmark.

Choose the triplestores

Choose the triplestores you want to test. The original benchmark tested four triplestores; this tutorial uses the following three:

  • Virtuoso
  • Fuseki
  • Blazegraph

All three triplestores are easy to install (please visit their sites for installation instructions).

Download DBPSB Data

You can download all necessary data here: http://benchmark.dbpedia.org/

We will use the following files:

  • Queries2012.txt (Be aware that we need to edit this file a little bit)
  • benchmark_10.nt.bz2
  • benchmark_100.nt.bz2
  • benchmark_50.nt.bz2

Please download all of them and extract the archives.

Furthermore, in Queries2012.txt you need to replace each %%var%% with %%v%%. This is very important!
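The replacement can be scripted. The following is a minimal sketch, assuming the placeholder appears literally as %%var%% in the file; numbered or otherwise different placeholder variants are not handled. The function name rename_placeholder is made up for this illustration. It writes to a temporary file and then moves it into place instead of using sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/bash
# Hypothetical helper (not part of DBPSB or Iguana): rewrite every literal
# %%var%% placeholder in a query file to %%v%%.
rename_placeholder() {
    local file="$1"
    local tmp
    tmp="$(mktemp)" || return 1
    # Write to a temp file first so a failed sed never truncates the original.
    sed 's/%%var%%/%%v%%/g' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage (assumes the file is in the current directory):
# rename_placeholder Queries2012.txt
```

Keeping an untouched copy of Queries2012.txt before running it makes the edit easy to undo.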

You should also download the warmup queries provided here (you can use other queries, but do not use the Queries2012.txt file).

Download the file and rename it to warmup.txt.

Set up the triplestores

For RAM settings and other store-specific tuning, please consult the developers' sites.

Next, load the dataset. Each of the three triplestores provides bulk-load scripts for loading large datasets.

First, load benchmark_10.nt into each triplestore. Once every triplestore is filled with the data, make a backup.

Create a directory backup_10/ and, inside it, one subdirectory per triplestore, named after the triplestore. Copy all of the store's data (for example, Virtuoso keeps it in a directory called db/) into its subdirectory. Repeat this for each dataset.

Virtuoso backup

  1. Start Virtuoso.
  2. Start isql 1112 (use a different port if you changed it in the configuration).
  3. Execute the following commands: commit(); checkpoint(); commit(); checkpoint();
  4. Close isql.
  5. Copy the whole db folder (assuming you did not change the database folders in the configuration) into the desired location.

Fuseki backup

Assuming you initialized Fuseki with the database DS, there will be a directory called DS in the Fuseki installation path. Copy the whole folder into the desired location.

Blazegraph backup

Simply copy the blazegraph.jnl file to the desired location.

We should now have a directory structure like this:

backup_10 
    fuseki
        DS
    virtuoso
        db
    blazegraph
backup_50 
    fuseki
        DS
    virtuoso
        db
    blazegraph
backup_100
    fuseki
        DS
    virtuoso
        db
    blazegraph
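The copy steps above can be scripted. This is only a sketch: the function name backup_store is hypothetical, and the source paths in the usage comments assume the triplestores are installed in ./virtuoso, ./fuseki and ./blazegraph next to the backup directories. Stop the triplestore (and, for Virtuoso, run the checkpoint steps) before copying.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): copy a stopped triplestore's data
# into backup_<size>/<store>/, matching the layout shown above.
#   $1 = dataset size label (10, 50 or 100)
#   $2 = triplestore name (fuseki, virtuoso or blazegraph)
#   $3 = file or directory to back up (e.g. ./virtuoso/db)
backup_store() {
    local size="$1" store="$2" source="$3"
    local target="backup_${size}/${store}"
    if [ ! -e "$source" ]; then
        echo "nothing to back up at $source" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    # cp -r handles both a directory (db/, DS/) and a single file (blazegraph.jnl).
    cp -r "$source" "$target/"
}

# Usage, assuming the install paths described above:
# backup_store 10 virtuoso   ./virtuoso/db
# backup_store 10 fuseki     ./fuseki/DS
# backup_store 10 blazegraph ./blazegraph/blazegraph.jnl
```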

Create the configuration

Let's start with the root XML:

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

</iguana>

Now let's add our databases, assuming Virtuoso runs on port 8890, Fuseki on 3030 and Blazegraph on 9999. Because the DBPSB queries are query patterns, we also add DBpedia to the connections. The query patterns contain variables (%%var%% and so on) that must be filled with data. Empty results can make queries return faster than they should, so to avoid them we fill the variables with real data. For that we need a reference connection to get the data from; this will be the DBpedia online endpoint.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        <database id="fuseki" type="impl">
            <endpoint uri="http://localhost:3030/ds/sparql" />
        </database>
        <database id="blazegraph" type="impl">
            <endpoint uri="http://localhost:9999/blazegraph/sparql" />
        </database>
        <database id="virtuoso" type="impl">
            <endpoint uri="http://localhost:8890/sparql" />
        </database>
        <database id="dbpedia" type="impl">
            <endpoint uri="http://dbpedia.org/sparql"/>
        </database>
    </databases>
</iguana>

Now we will create our benchmark suite. This will configure the benchmark itself.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>

    </suite>
</iguana>

Now we need to decide which of the previously defined triplestores should be tested, and which one should be the reference triplestore. We gave each triplestore an ID; we use those IDs now.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             <db id="virtuoso"/>
             <db id="fuseki"/>
             <db id="blazegraph"/>
         </test-db>
    </suite>
</iguana>

Next we need to provide the datasets (WARNING: we will use pre- and post-testcase hooks to put the correct dataset into the triplestore). Here we only state which fraction (from 0 to 1.0) of the full dataset each one represents: 100% is 1.0, 50% is 0.5 and 10% is 0.1. Set generate to false, as we do not need to generate our data.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             ...
         </test-db>
         <random-function generate="false">
             <percent value="1.0"/>
             <percent value="0.5"/>
             <percent value="0.1"/>
         </random-function>
    </suite>
</iguana>

Did you load the dataset into the named graph "http://dbpedia.org"? Then you can add the following parameter.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             ...
         </test-db>
         <random-function generate="false">
             ...
         </random-function>
         <graph-uri name="http://dbpedia.org" />
    </suite>
</iguana>

We want to run a warmup with the previously downloaded warmup queries. The warmup should last 20 minutes.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             ...
         </test-db>
         <random-function generate="false">
             ...
         </random-function>
         <graph-uri name="http://dbpedia.org" />
         <warmup time="20" file-name="warmup.txt" />
    </suite>
</iguana>

Now we need to add our testcase. The original DBpedia SPARQL Benchmark used one SPARQL user and zero update users, and ran for one hour. We will also create two scripts, pre.sh and post.sh; pre.sh takes two arguments: the triplestore ID and the dataset size.

The time limit is given in milliseconds, so one hour is 3600000 ms.

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        ...
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             ...
         </test-db>
         <random-function generate="false">
             ...
         </random-function>
         <graph-uri name="http://dbpedia.org" />
         <warmup time="20" file-name="warmup.txt" />
         <testcases testcase-pre="./pre.sh %DBID% %PERCENT%" testcase-post="./post.sh">
             <testcase class="org.aksw.iguana.testcases.StressTestcase">
                 <property name="sparql-user" value="1" />
                 <property name="update-user" value="0" />
                 <property name="queries-path" value="Queries2012.txt" />
                 <property name="is-pattern" value="true" />
                 <property name="timelimit" value="3600000" />
             </testcase>
        </testcases>
    </suite>
</iguana>
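The two durations in this configuration use different units: the warmup is given in minutes (20) and the timelimit in milliseconds. If you want a different run length, the conversion can be sketched as follows; minutes_to_ms is a hypothetical helper name, not something Iguana provides.

```shell
#!/bin/bash
# Hypothetical helper: convert a duration in minutes to milliseconds,
# the unit used by the timelimit property.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

minutes_to_ms 60   # one hour, the DBPSB run length; prints 3600000
```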

Create a pre and post testcase hook

We will do a little scripting here to stop every triplestore, start the one that is next in line to be tested, and feed it the correct dataset backup. The scripts are provided without further explanation. You may still need to adjust the paths to the triplestores for the scripts to work. Please do not forget to chmod +x both scripts.

pre.sh

#!/bin/bash
# Stop any running triplestore before restoring a backup.
./post.sh
if [ "$1" = "virtuoso" ]
then
    if [ "$2" = "1.0" ]
    then
        cp -r ./backup_100/virtuoso/db/* ./virtuoso/db/
    elif [ "$2" = "0.5" ]
    then
        cp -r ./backup_50/virtuoso/db/* ./virtuoso/db/
    elif [ "$2" = "0.1" ]
    then
        cp -r ./backup_10/virtuoso/db/* ./virtuoso/db/
    fi
    # Adjust the path if your virtuoso.ini lives elsewhere.
    ./virtuoso/bin/virtuoso-t +configfile ./virtuoso/virtuoso.ini
elif [ "$1" = "fuseki" ]
then
    if [ "$2" = "1.0" ]
    then
        cp -r ./backup_100/fuseki/DS/* ./fuseki/DS/
    elif [ "$2" = "0.5" ]
    then
        cp -r ./backup_50/fuseki/DS/* ./fuseki/DS/
    elif [ "$2" = "0.1" ]
    then
        cp -r ./backup_10/fuseki/DS/* ./fuseki/DS/
    fi
    cd ./fuseki/
    ./fuseki-server --update --loc=DS /ds > fuseki.out 2>&1 &
    cd ..
elif [ "$1" = "blazegraph" ]
then
    if [ "$2" = "1.0" ]
    then
        cp ./backup_100/blazegraph/blazegraph.jnl ./blazegraph/
    elif [ "$2" = "0.5" ]
    then
        cp ./backup_50/blazegraph/blazegraph.jnl ./blazegraph/
    elif [ "$2" = "0.1" ]
    then
        cp ./backup_10/blazegraph/blazegraph.jnl ./blazegraph/
    fi
    java -server -Xmx16g -jar ./blazegraph/blazegraph.jar > blazegraph.out 2>&1 &
fi

post.sh

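#!/bin/bash
# Kill any running triplestore. The patterns are specific on purpose: a bare
# "fuseki" or "blazegraph" would also match "./pre.sh fuseki ..." and kill pre.sh.
ps -ef | grep "virtuoso-t" | grep -v grep | awk '{print $2}' | xargs kill -9
ps -ef | grep "fuseki-server" | grep -v grep | awk '{print $2}' | xargs kill -9
ps -ef | grep "blazegraph.jar" | grep -v grep | awk '{print $2}' | xargs kill -9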
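The if/elif chains in pre.sh all do the same thing: they map the %PERCENT% argument onto a backup directory. As a sketch of an alternative, that mapping could live in one place. The function name backup_dir_for_percent is hypothetical, and the directory names are the ones used in this tutorial.

```shell
#!/bin/bash
# Hypothetical helper: map the %PERCENT% value passed to pre.sh
# (1.0, 0.5 or 0.1) onto the matching backup directory name.
backup_dir_for_percent() {
    case "$1" in
        1.0) echo "backup_100" ;;
        0.5) echo "backup_50" ;;
        0.1) echo "backup_10" ;;
        *)   echo "unknown dataset size: $1" >&2; return 1 ;;
    esac
}

# Example: restore the Virtuoso backup for the size given as second argument.
# dir="$(backup_dir_for_percent "$2")" && cp -r "./$dir/virtuoso/db/"* ./virtuoso/db/
```

Unlike the if/elif version, an unexpected value produces an error message instead of silently starting the triplestore with whatever data was left over.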

The whole config

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <databases>
        <database id="fuseki" type="impl">
            <endpoint uri="http://localhost:3030/ds/sparql" />
        </database>
        <database id="blazegraph" type="impl">
            <endpoint uri="http://localhost:9999/blazegraph/sparql" />
        </database>
        <database id="virtuoso" type="impl">
            <endpoint uri="http://localhost:8890/sparql" />
        </database>
        <database id="dbpedia" type="impl">
            <endpoint uri="http://dbpedia.org/sparql"/>
        </database>
    </databases>
    <suite>
         <test-db type="choose" reference="dbpedia" >
             <db id="virtuoso"/>
             <db id="fuseki"/>
             <db id="blazegraph"/>
         </test-db>
         <random-function generate="false">
             <percent value="1.0"/>
             <percent value="0.5"/>
             <percent value="0.1"/>
         </random-function>
         <graph-uri name="http://dbpedia.org" />
         <warmup time="20" file-name="warmup.txt" />
         <testcases testcase-pre="./pre.sh %DBID% %PERCENT%" testcase-post="./post.sh">
             <testcase class="org.aksw.iguana.testcases.StressTestcase">
                 <property name="sparql-user" value="1" />
                 <property name="update-user" value="0" />
                 <property name="queries-path" value="Queries2012.txt" />
                 <property name="is-pattern" value="true" />
                 <property name="timelimit" value="3600000" />
             </testcase>
        </testcases>
    </suite>
</iguana>

Save this as config.xml
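A mistyped ID inside test-db is easy to miss, because the file stays well-formed XML. The following is a rough sanity check, not an Iguana feature: the function name check_db_ids is made up, and it relies on plain text matching, so it assumes the id attributes are written exactly as in this tutorial (id as the first attribute, double quotes). It does not check the reference attribute.

```shell
#!/bin/bash
# Hypothetical sanity check: report every <db id="..."> in the suite
# that has no matching <database id="..."> definition.
check_db_ids() {
    local config="$1" id missing=0
    # Pull the id attribute out of each <db .../> element.
    for id in $(grep -o '<db id="[^"]*"' "$config" | sed 's/.*id="\([^"]*\)"/\1/'); do
        if ! grep -q "<database id=\"$id\"" "$config"; then
            echo "undefined database id: $id" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_db_ids config.xml && echo "all test-db ids are defined"
```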

Run Iguana

Now simply run:

java -cp "lib/*" org.aksw.iguana.benchmark.Main config.xml 
