Getting Started

Got Iguana?

If not, here is how to get it: Download

Configuration

Now that we know how to download and start IGUANA, what is the config.xml file and how do you configure it? The config.xml file represents an IGUANA config. An IGUANA config is built from two kinds of objects:

  • Triplestores (Databases) which should be tested
  • Suites (A Suite is the benchmark configuration itself, what should be tested on which triplestores, ...)

WARNING! At the moment IGUANA has a bug with multiple suites. It starts IGUANA for each suite "at the same time" instead of "in a row". Hence it is highly recommended to use only one suite per config.

A Suite is in turn a set of objects:

  • Testcases (which contains several Testcase Objects)
  • The references to the triplestores (Stated above) which should be tested in this suite.
  • A definition of a warmup phase
  • A Set of datasets to benchmark on.

A Testcase is the definition of one benchmark on one triplestore at a time. This benchmark is then run on each triplestore provided by the suite. A suite can have several testcases but needs at least one, embedded in the testcases object. The results of Testcase T in the i-th suite are located in the folder "results_i/T/".

For example:

  • Testcase A is the benchmark which tests queries A and B against a triplestore 10 times.
  • Testcase B is the benchmark which tests queries C and D against a triplestore for one hour.
  • Testcase C is the benchmark which lets 16 threads request queries A - D against a triplestore and also updates the triplestore with changesets I and D for 10 minutes.

The suite is: those testcases + the definition of a warmup phase + the triplestores which should be tested + the datasets which should be loaded into the triplestores (each dataset is tested independently of the other datasets).

The IGUANA config is then written as an XML file.

Root Element

<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  ....
</iguana>

Databases

A set of database (triplestore) definitions:

 <databases>
     <database id="virtuoso" type="impl">
         <endpoint uri="http://localhost:8890/sparql" />
         <update-endpoint uri="http://localhost:8890/sparql-auth" />
         <user value="dba" />
         <pwd value="secret" />
     </database>
     ...
     <database id="dbpedia" type="impl">
         <endpoint uri="http://dbpedia.org/sparql" />
     </database>
     ...
 </databases>

update-endpoint, user and pwd are optional.

Suites

 <suite>
     <graph-uri name="http://dbpedia.org" />
     <sparql-load value="true" />
     <warmup time="10" file-name="warmup.txt" update-path="changesets/" />
     <random-function class="org.aksw.iguana.DataGenerator" generate="false">
         <property name="" value="" />
         ...
         <percent value="1.0" file-name="dbpedia2/" />
         ...
     </random-function>
     <test-db type="choose" reference="dbpedia">
         <db id="virtuoso" />
         ...
     </test-db>
     <testcases>
         ...
     </testcases>
 </suite>

sparql-load and graph-uri are both optional suite-level variables which can be used by the testcases.

warmup defines an optional warmup phase before every testcase. You can specify the time (in minutes), the file with queries, and a directory containing changesets. file-name and update-path are optional attributes.

random-function defines how many datasets will be tested and whether IGUANA should generate datasets. Its attributes are generate, which tells IGUANA whether it should generate the dataset, and class (optional), which tells IGUANA which Data Generator implementation to use. Each percent element configures one dataset (value acts as the id and should be unique!). The file-name can be just a note or an actual initial file which will be passed to the Dataset Generator. The property elements are name-value pairs which will be passed to the Dataset Generator (the DG is an interface and is not part of the getting-started documentation). They are optional.
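As a small sketch of the rule that each percent element is one dataset with a unique value, a suite that benchmarks on two datasets without generating them might be configured as follows. The file-name "dbpedia_50/" is a hypothetical placeholder, and the class attribute is omitted because it is optional.

```xml
<random-function generate="false">
    <!-- One percent element per dataset; value acts as the dataset id and must be unique -->
    <percent value="0.5" file-name="dbpedia_50/" />  <!-- hypothetical file-name -->
    <percent value="1.0" file-name="dbpedia2/" />
</random-function>
```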

test-db defines which triplestores should be tested and which one is the reference triplestore. The reference triplestore can be used by the testcases, for example to instantiate query patterns with real data; it will not be tested itself. You can choose between two types, "all" and "choose": all tests all defined triplestores (except the reference triplestore), while choose tests only those defined in the test-db node. You define a triplestore in the test-db node as follows: <db id="virtuoso" />, where the id is the id of a previously defined triplestore.

testcases are explained in detail below.
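For comparison with the "choose" form, the "all" type could look roughly like this. This is a sketch only: it assumes that with type="all" no db child elements are needed, which the text above implies but does not show explicitly.

```xml
<!-- Assumption: with type="all", every defined triplestore except the reference is tested,
     so no <db> children are listed -->
<test-db type="all" reference="dbpedia" />
```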

Testcase(s)
<testcases testcase-pre="./testcasePre.sh %DBID% %PERCENT% %TESTCASEID%" testcase-post="./testcasePost.sh %DBID% %PERCENT% %TESTCASEID%">
    <testcase class="org.aksw.iguana.testcases.StressTestcase">
        <property name="sparql-user" value="1" />
        <property name="update-user" value="0" />
        ...
    </testcase>
     ...
</testcases>

The testcases element is just a container for several testcase elements and has the attributes testcase-pre and testcase-post. These are pre and post testcase hooks (explained later on).

The testcase itself needs a class attribute to determine which testcase implementation it should use, and has key-value properties which are testcase-specific.
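Putting the pieces described so far together, a minimal config with a single suite might look like the sketch below. It only reuses elements and values from the snippets above; that databases and suite are direct children of the root element is an assumption, and a real run will most likely need further testcase properties (such as queries-path, listed in the property table of the StressTestcase).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!-- Assumption: databases and suite sit directly under the root element -->
    <databases>
        <!-- Triplestore under test -->
        <database id="virtuoso" type="impl">
            <endpoint uri="http://localhost:8890/sparql" />
        </database>
        <!-- Reference triplestore; it is not tested itself -->
        <database id="dbpedia" type="impl">
            <endpoint uri="http://dbpedia.org/sparql" />
        </database>
    </databases>
    <!-- Only one suite, because of the multi-suite bug mentioned in the warning -->
    <suite>
        <random-function generate="false">
            <percent value="1.0" file-name="dbpedia2/" />
        </random-function>
        <test-db type="choose" reference="dbpedia">
            <db id="virtuoso" />
        </test-db>
        <testcases>
            <testcase class="org.aksw.iguana.testcases.StressTestcase">
                <property name="sparql-user" value="1" />
                <property name="update-user" value="0" />
            </testcase>
        </testcases>
    </suite>
</iguana>
```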

Implemented Testcases

StressTestcase

The StressTestcase should simulate a real-world scenario: several users query the SPARQL endpoint while a few update users try to update it. The parameters are explained below.

class name: org.aksw.iguana.testcases.StressTestcase

PROPERTIES

name | value (DOMAIN) | needs | incompatible with/preferred over | optional | Description
sparql-user | 0,1,2... | - | - | yes | number of threads which should simultaneously query against the TS
update-user | 0,1,2... | - | - | yes | number of threads which should simultaneously update against the TS
latency-amountX (X=0,1,2...) | positive number | latency-strategy | - | yes | Simulate network latency (in ms)
latency-strategyX | FIXED, VARIABLE | latency-amountX | - | yes | FIXED = always the same amount of latency, VARIABLE = latency is chosen from a small interval based upon the latency-amount
queries-path | String | sparql-user | - | yes | -
update-path | String | update-user | - | yes | -
is-pattern | true, false | queries-path | - | yes | Due to a small bug, just leave it set to true no matter what (this does no harm!)
timelimit | positive number | - | - | yes | time in ms the testcase should run
no-of-query-mixes | positive number | - | timelimit | yes | instead of a time you can choose to run X query mixes in the testcase
worker-strategyX (X=0,1,2...) | ADDED, REMOVED, NEXT | update-user | - | no if there are multiple update-users | A worker strategy configures the x-th worker to only add, only remove, or take the next (not yet updated) changeset
update-strategy | FIXED, VARIABLE | update-user, timelimit | no-of-query-mixes | yes | Amount of time between two updates; FIXED = always the same amount of time, VARIABLE = the amount is chosen from a Gaussian interval based upon the timelimit
linking-strategy | I, D, ID, DI | update-user | - | yes (default=DI) | If an update worker should first apply all changesets to add and then all changesets to remove, use I (D for the reverse order). If it should alternate between inserts and removes, use ID for one INSERT followed by one DELETION (DI for the reverse order)
number-of-triples | positive number | - | - | yes | The testcase will only upload files with at most this number of triples. Files that are too big are split and each part is uploaded

You can specify several latencies. The only restriction is that their indices must be consecutive, starting at 0; the order in which they are declared does not matter. For example, latency-amount0, latency-amount1, latency-amount3, latency-amount2 is fine and uses all 4 latencies, while latency-amount0, latency-amount4 uses only latency-amount0.

There must be as many worker-strategies as update-users (if update-user is greater than 1).
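To illustrate the two rules above (consecutive latency indices and one worker-strategy per update user), here is a sketch of a StressTestcase with two latencies and two update users. The property names come from the table; all values, including the file name "queries.txt", are hypothetical, and how IGUANA combines several latencies is not specified here.

```xml
<testcase class="org.aksw.iguana.testcases.StressTestcase">
    <property name="sparql-user" value="4" />
    <property name="update-user" value="2" />
    <!-- Hypothetical paths to the query file and the changeset directory -->
    <property name="queries-path" value="queries.txt" />
    <property name="update-path" value="changesets/" />
    <property name="is-pattern" value="true" />
    <!-- 10 minutes, expressed in ms -->
    <property name="timelimit" value="600000" />
    <!-- Latency indices are consecutive (0, 1), so both are used -->
    <property name="latency-amount0" value="20" />
    <property name="latency-strategy0" value="FIXED" />
    <property name="latency-amount1" value="100" />
    <property name="latency-strategy1" value="VARIABLE" />
    <!-- Two update users, so two worker strategies -->
    <property name="worker-strategy0" value="ADDED" />
    <property name="worker-strategy1" value="REMOVED" />
</testcase>
```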

SUITE BASED PARAMETERS

If graph-uri is specified, the testcase will upload/remove all changesets into/from the specified graph.

If sparql-load is true (default=false), the testcase will use LOAD instead of generating INSERT queries.

RESULTS

The Results will be in the following folder "results_i/org.aksw.iguana.StressTestcase/testcaseID/sparql-Users/update-Users/".

There will be several files, each with a descriptive name. For example, "No_of_Queries_SPARQL Worker4.csv" is the metric number of queries successfully executed by the 5th SPARQL worker. The csv files have a header with the ordered query numbers. The first column states which triplestore the row represents.

The calculated subdirectory contains the same metrics, but with the workers summed up (...SPARQL_SUM.csv) or summed up and divided by the number of workers, which represents the average user (...SPARQL_MEAN.csv).

The StressTestcase has the following result metrics: Queries per Second, No of Queries per Time limit, Succeeded Queries, Failed Queries, Total time of queries (for each query, the summed-up time the query took over the whole test), and Min/Max Time of queries (for each query, the minimum as well as the maximum execution time). The time is always in ms.

If the results are per-query results (Qps, Succeeded, Failed, Total Time, Min and Max), the header will be the number of the query. For example, 12 represents the query in the 13th line of the queries file. One row always represents one triplestore (the first cell of the row is the triplestore id), and the value of a cell is the value of the metric for the given column.

FederatedStressTestcase

It is a StressTestcase for a federated system.

class name: org.aksw.iguana.testcases.FederatedStressTestcase

It has the same properties as the StressTestcase, plus one more.

PROPERTIES

name | value (DOMAIN) | needs | incompatible with/preferred over | optional | Description
workerX | id (ref to defined triplestores) | - | - | no | Each update worker must be told which system it should update. Thus you must define, for each worker, the system it should upload to (if you have 3 workers, you must define worker0, worker1 and worker2)
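As a sketch of the workerX property, a FederatedStressTestcase with two update workers might map each worker to a triplestore id like this. The id "fuseki" is a hypothetical second database that would have to be defined in the databases section, and the other values are illustrative only.

```xml
<testcase class="org.aksw.iguana.testcases.FederatedStressTestcase">
    <property name="sparql-user" value="2" />
    <property name="update-user" value="2" />
    <property name="update-path" value="changesets/" />
    <property name="worker-strategy0" value="NEXT" />
    <property name="worker-strategy1" value="NEXT" />
    <!-- One workerX per update worker; the value is the id of a defined triplestore -->
    <property name="worker0" value="virtuoso" />
    <property name="worker1" value="fuseki" />  <!-- hypothetical database id -->
</testcase>
```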

Pre- & Post Testcase Hooks

As stated above, testcases can have pre and post hooks provided by a script. The script receives three values: %DBID%, %PERCENT% and %TESTCASEID%. %DBID% is the id of the defined triplestore which is currently being tested, %PERCENT% is the id (value) of the dataset currently being tested, and %TESTCASEID% is the ordinal number (starting at 0) of the testcase.

For example, if the 4th testcase is started on the 0.5 dataset on the triplestore with the id virtuoso: %DBID% = virtuoso, %PERCENT% = 0.5, %TESTCASEID% = 3.

This can be used to stop and start the triplestore (needed to stop a triplestore after it has finished the benchmark, so that it does not keep running while the next triplestore is tested) or to bulk load the data again, since updates could not be tested twice without these hooks.
