Web Services

The following test web services are available:

  • DataMiningService: a data mining web service based on Weka
  • ODMService: a data mining web service built on Oracle [NOTE: currently not working because the Oracle server is unavailable]

  • A sample dataset is available for testing.


    DataMiningService

    The data mining engine used by DataMiningService is Weka 3.4.6.
    The service is made available through Apache Tomcat 4.1.3/Apache Tomcat 6.0.18 and Axis 1.3/Axis2 1.4.1.
    You can inspect the web service interface by browsing its WSDL definition on the main server or its WSDL definition on the backup server [NOTE_081120: the backup server is off-line at the moment].
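    Since the service is exposed through Axis, a client ultimately sends a SOAP message whose body names one operation and carries its string parameters. The following is a minimal sketch of building such a request with the Python standard library. The service namespace `urn:example:DataMiningService`, the dataset URL, and the unqualified parameter elements are assumptions made for illustration; the real values must be taken from the WSDL definition.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:DataMiningService"  # hypothetical; read the real one from the WSDL


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as str) that calls `operation` with string parameters."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Assumption: parameter elements are unqualified; the WSDL may require otherwise.
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical dataset URL; the operation accepts either arff text or a URL pointing to it.
request_xml = build_soap_request(
    "attributeSelect",
    {"arffTrainingDataset": "http://example.org/train400.arff", "attributeNumber": "20"},
)
print(request_xml)
```

    The resulting string would be sent as the body of an HTTP POST to the service endpoint listed in the WSDL. A SOAP toolkit that generates stubs from the WSDL is usually the more robust choice.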

    OPERATION:   attributeRank
    INPUT:       arffTrainingDataset (string): training dataset in arff format or URL pointing to it, last attribute is class label
    OUTPUT:      attributeRankResponse (string): list of attributes with chi-square ranking values
    DESCRIPTION: Performs attribute ranking on arffTrainingDataset. Outputs the list of all attributes together with their chi-square ranking value.

    OPERATION:   attributeSelect
    INPUT:       arffTrainingDataset (string): training dataset in arff format or URL pointing to it, last attribute is class label
                 attributeNumber (string): max number of attributes to be selected
    OUTPUT:      attributeSelectResponse (string): comma separated list of selected attributes
    DESCRIPTION: Performs attribute selection on arffTrainingDataset. Outputs the list of most important attributes based on chi-square ranking. Max number of output attributes is specified by attributeNumber.

    OPERATION:   datasetFilter
    INPUT:       arffDataset (string): dataset in arff format or URL pointing to it, last attribute is class label
                 attributeList (string): comma separated list of attributes to be kept
    OUTPUT:      datasetFilterResponse (string): filtered dataset in arff format, all attributes not included in attributeList are removed
    DESCRIPTION: Filters arffDataset removing all attributes not included in attributeList. Outputs filtered dataset in arff format.

    OPERATION:   datasetDiscretize
    INPUT:       arffDataset (string): dataset in arff format or URL pointing to it, last attribute is class label
    OUTPUT:      datasetDiscretizeResponse (string): discretized dataset in arff format
    DESCRIPTION: Discretizes numeric attributes in arffDataset according to entropy discretization method proposed by Fayyad and Irani. Outputs discretized dataset in arff format.

    OPERATION:   classifierBuild
    INPUT:       classifierName (string): weka classifier name in dot notation (e.g. "weka.classifiers.lazy.IB1")
                 arffTrainingDataset (string): training dataset in arff format or URL pointing to it, last attribute is class label
    OUTPUT:      classifierBuildResponse (string): hex encoded classifier
    DESCRIPTION: Builds classifier specified by classifierName, using arffTrainingDataset as training dataset. Outputs serialized classifier object (hex encoded).

    OPERATION:   clustererBuild
    INPUT:       clustererName (string): weka clusterer name in dot notation (e.g. "weka.clusterers.SimpleKMeans")
                 arffTrainingDataset (string): training dataset in arff format or URL pointing to it, last attribute is class label
    OUTPUT:      clustererBuildResponse (string): hex encoded clusterer
    DESCRIPTION: Builds clusterer specified by clustererName, using arffTrainingDataset as training dataset. Outputs serialized clusterer object (hex encoded).

    OPERATION:   modelTest
    INPUT:       arffTestDataset (string): test dataset in arff format or URL pointing to it, last attribute is class label
                 encodedModel (string): hex encoded weka classifier or clusterer
    OUTPUT:      modelTestResponse (string): confusion matrix and other measures of model accuracy
    DESCRIPTION: Tests classifier or clusterer specified by encodedModel, using arffTestDataset as test dataset. Outputs confusion matrix and other measures of model accuracy.

    OPERATION:   modelApply
    INPUT:       arffApplyDataset (string): apply dataset in arff format or URL pointing to it, last attribute is class label (class values are not important but must be present)
                 encodedModel (string): hex encoded weka classifier
                 labelList (string): comma separated list of class values (labels) present in the training dataset in the same order
    OUTPUT:      modelApplyResponse (string): instance number, predicted class value and associated probability for each instance in arffApplyDataset
    DESCRIPTION: Applies classifier specified by encodedModel to arffApplyDataset instances. Outputs instance number, predicted class value and associated probability.
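    To make the datasetFilter contract concrete, the sketch below reproduces its described behaviour locally on a small arff string: attributes not named in the comma separated list are dropped from both the header and the data rows. This is an illustration only, not the service's implementation. It assumes dense arff data, unquoted attribute names, and values without embedded commas, and it keeps the class label only if it is listed, since the source does not say whether the service retains it automatically. The function name `filter_arff` and the toy dataset are hypothetical.

```python
def filter_arff(arff_text, attribute_list):
    """Keep only the attributes named in the comma separated attribute_list."""
    keep = {name.strip() for name in attribute_list.split(",")}
    header, names, rows, in_data = [], [], [], False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and arff comments
        lower = stripped.lower()
        if in_data:
            rows.append(stripped.split(","))
        elif lower.startswith("@attribute"):
            name = stripped.split(None, 2)[1]
            names.append(name)
            if name in keep:
                header.append(stripped)
        elif lower.startswith("@data"):
            in_data = True
        else:
            header.append(stripped)  # e.g. the @relation line
    kept_idx = [i for i, name in enumerate(names) if name in keep]
    data = [",".join(row[i] for i in kept_idx) for row in rows]
    return "\n".join(header + ["@data"] + data)


# Toy dataset, invented for illustration.
sample = """@relation toy
@attribute gene1 numeric
@attribute gene2 numeric
@attribute gene3 numeric
@attribute class {MLL,T-ALL}
@data
0.1,2.5,7.0,MLL
0.4,1.5,3.0,T-ALL
"""
filtered = filter_arff(sample, "gene1, gene3, class")
print(filtered)
```

    In a workflow, the attributeList argument would typically be the comma separated string returned by attributeSelect.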

    The following figure shows a snapshot of a workflow built with Triana that uses DataMiningService. Web services are shown in red, Triana components in light blue.



    The same workflow built with Taverna is shown in the figure below. The XML representation of the workflow can be opened in Taverna Workbench and executed with this sample XML input document. The results of the execution are saved in a text file named workflow_output.txt on the machine running Taverna.





    ODMService

    The data mining engine used by ODMService is Oracle 10g.
    The service is made available through Apache Tomcat 4.1.3 and Axis 1.3.
    The web service is currently not working because the Oracle server is unavailable.

    OPERATION:   datasetLoad
    INPUT:       arffDataset (string): dataset in arff format or URL pointing to it, last attribute is class label
                 datasetTable (string): name of Oracle ODM table containing the dataset
    OUTPUT:      aiDatasetLoadResponse (string): datasetTable (if load is successful) or error message
    DESCRIPTION: Loads arffDataset into Oracle ODM table datasetTable.

    OPERATION:   aiBuildAsync
    INPUT:       datasetTable (string): name of the table in the Oracle ODM database containing the dataset
    OUTPUT:      aiBuildAsyncResponse (string): name of Oracle task submitted for execution
    DESCRIPTION: Sets up the execution task to build an attribute importance model based on the dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the Oracle task submitted for execution. NOTE: task execution should be monitored (taskState operation) until it has completed (successfully) before the model can be used.

    OPERATION:   aiAttributeRank
    INPUT:       syncInput (string): any string on this input indicates that attribute importance model build task (operation aiBuildAsync) has completed execution
    OUTPUT:      aiAttributeRankResponse (string): list of attributes with mdl ranking values
    DESCRIPTION: Uses the (most recently built) attribute importance model (operation aiBuildAsync) to perform attribute ranking. Outputs the list of all attributes together with their mdl ranking value.

    OPERATION:   aiAttributeSelect
    INPUT:       attributeNumber (string): max number of attributes to be selected
                 syncInput (string): any string on this input indicates that attribute importance model build task (operation aiBuildAsync) has completed execution
    OUTPUT:      aiAttributeSelectResponse (string): comma separated list of selected attributes
    DESCRIPTION: Uses the (most recently built) attribute importance model (operation aiBuildAsync) to perform attribute selection. Outputs the list of most important attributes based on mdl ranking. Max number of output attributes is specified by attributeNumber.

    OPERATION:   datasetFilter
    INPUT:       datasetTable (string): name of Oracle ODM table containing the dataset
                 attributeList (string): comma separated list of attributes to be kept
    OUTPUT:      datasetFilterResponse (string): name of Oracle ODM table containing the filtered dataset, all attributes not included in attributeList are removed
    DESCRIPTION: Filters dataset in Oracle ODM table datasetTable removing all attributes not included in attributeList. Outputs the name of Oracle ODM table containing filtered dataset.

    OPERATION:   classifierBuildAsync
    INPUT:       classifierName (string): name of Oracle classifier in dot notation (e.g. "classifier.abn")
                 datasetTable (string): name of Oracle ODM table containing the training dataset
    OUTPUT:      classifierBuildAsyncResponse (string): name of Oracle task submitted for execution
    DESCRIPTION: Sets up the execution task to build classifier specified by classifierName, using training dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the Oracle task submitted for execution. NOTE: task execution should be monitored (taskState operation) until it has completed (successfully) before the classifier can be used.

    OPERATION:   modelTestAsync
    INPUT:       datasetTable (string): name of Oracle ODM table containing the test dataset
                 classifierName (string): name of Oracle classifier in dot notation (e.g. "classifier.abn")
                 syncInput (string): any string on this input indicates that classifier build task (operation classifierBuildAsync) has completed execution
    OUTPUT:      modelTestAsyncResponse (string): name of Oracle task submitted for execution
    DESCRIPTION: Sets up the execution task to test (most recently built) classifier specified by classifierName, using test dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the Oracle task submitted for execution. NOTE: task execution should be monitored (taskState operation) until it has completed (successfully) before test results can be used.

    OPERATION:   testConfusionMatrix
    INPUT:       syncInput (string): any string on this input indicates that test classifier task (operation modelTestAsync) has completed execution
    OUTPUT:      testConfusionMatrixResponse (string): confusion matrix and other measures of model accuracy
    DESCRIPTION: Fetches the (most recent) test results (operation modelTestAsync). Outputs confusion matrix and other measures of model accuracy.

    OPERATION:   modelApplyAsync
    INPUT:       datasetTable (string): name of Oracle ODM table containing the apply dataset
                 classifierName (string): name of Oracle classifier in dot notation (e.g. "classifier.abn")
                 syncInput (string): any string on this input indicates that classifier build task (operation classifierBuildAsync) has completed execution
    OUTPUT:      modelApplyAsyncResponse (string): name of Oracle task submitted for execution
    DESCRIPTION: Sets up the execution task to apply (most recently built) classifier specified by classifierName, to the apply dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the Oracle task submitted for execution. NOTE: task execution should be monitored (taskState operation) until it has completed (successfully) before apply results can be used.

    OPERATION:   applyPrediction
    INPUT:       syncInput (string): any string on this input indicates that apply classifier task (operation modelApplyAsync) has completed execution
    OUTPUT:      applyPredictionResponse (string): instance number, predicted class value and associated probability for each instance in apply dataset
    DESCRIPTION: Fetches the (most recent) apply results (operation modelApplyAsync). Outputs instance number, predicted class value and associated probability.
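    The asynchronous operations above all follow the same pattern: submit a task, monitor it with taskState until it completes, then pass any string as syncInput to the operation that fetches the result. A minimal sketch of the monitoring step is given below. The inputs and outputs of taskState are not documented here, so the `task_state` callable and the state strings "COMPLETED" and "FAILED" are assumptions for illustration, as are the function and task names.

```python
import time


def wait_for_task(task_name, task_state, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll task_state(task_name) until it reports success; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = task_state(task_name)
        if state == "COMPLETED":  # hypothetical success value
            return state
        if state == "FAILED":  # hypothetical failure value
            raise RuntimeError(f"task {task_name} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_name} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Stand-in for a real call to the taskState operation.
_states = iter(["QUEUED", "RUNNING", "COMPLETED"])


def fake_task_state(task_name):
    return next(_states)


result = wait_for_task("build_task_1", fake_task_state, poll_seconds=0.0)
print(result)
```

    Once the wait returns, the returned value (or any other string) could be wired into syncInput of aiAttributeRank, modelTestAsync, testConfusionMatrix, and so on, which is how the table describes synchronising dependent steps.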

    The following figure shows a snapshot of a workflow built with Triana that uses ODMService. Web services are shown in red, Triana components in light blue.




    Sample dataset

    Sample dataset: Oracle-MDL filtered
    Acute Lymphoblastic Leukemia datasets. Class labels are BCR-ABL, E2A-PBX1, Hyperdip>50, MLL, T-ALL, TEL-AML1, OTHERS.
    Sample arff training dataset (400 attributes + class label, 215 instances): train400.arff.
    Sample arff test dataset (400 attributes + class label, 112 instances): test400.arff.
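    Before submitting a dataset, it can be useful to check that a downloaded arff file has the expected shape. The sketch below counts attribute declarations and data rows in an arff string. It assumes dense arff data with one instance per line and treats the last attribute as the class label, as the operations above require. The function name `summarize_arff` and the toy input are hypothetical.

```python
def summarize_arff(arff_text):
    """Return (number of attributes excluding the class label, number of instances)."""
    attributes, instances, in_data = 0, 0, False
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and arff comments
        if in_data:
            instances += 1
        elif stripped.lower().startswith("@attribute"):
            attributes += 1
        elif stripped.lower().startswith("@data"):
            in_data = True
    return max(attributes - 1, 0), instances


# Toy input, invented for illustration.
toy = """% toy example
@relation toy
@attribute gene1 numeric
@attribute gene2 numeric
@attribute class {MLL,T-ALL}
@data
0.1,2.5,MLL
0.4,1.5,T-ALL
0.9,0.2,MLL
"""
shape = summarize_arff(toy)
print(shape)
```

    If train400.arff matches the description above, reading its contents into a string and passing it to this function would be expected to report 400 attributes and 215 instances.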

    A package of Triana 3.2 components for building data mining workflows will be available for download in the future.


    More in the near future, stay tuned.