Andrea Bosin - Web Services

Web Services

The following test web services are available:

DataMiningService: a data mining web service based on Weka

ODMService: a data mining web service built on Oracle [NOTE: currently not working because the Oracle server is unavailable]

A sample dataset is available if you wish to run some tests.

DataMiningService

The data mining engine used by DataMiningService is Weka 3.4.6.
The service is made available through Apache Tomcat 4.1.3/Apache Tomcat 6.0.18 and Axis 1.3/Axis2 1.4.1.
You can inspect the web service interface by browsing its WSDL definition on the main server or its WSDL definition on the backup server [NOTE_081120: the backup server is currently offline].

OPERATION	INPUT	OUTPUT	DESCRIPTION
attributeRank	arffTrainingDataset (string): training dataset in arff format or a URL pointing to it, last attribute is the class label	attributeRankResponse (string): list of attributes with their chi-square ranking values	Performs attribute ranking on arffTrainingDataset. Outputs the list of all attributes together with their chi-square ranking values.
attributeSelect	arffTrainingDataset (string): training dataset in arff format or a URL pointing to it, last attribute is the class label; attributeNumber (string): maximum number of attributes to be selected	attributeSelectResponse (string): comma-separated list of selected attributes	Performs attribute selection on arffTrainingDataset. Outputs the list of the most important attributes based on chi-square ranking. The maximum number of output attributes is specified by attributeNumber.
datasetFilter	arffDataset (string): dataset in arff format or a URL pointing to it, last attribute is the class label; attributeList (string): comma-separated list of attributes to be kept	datasetFilterResponse (string): filtered dataset in arff format, with all attributes not included in attributeList removed	Filters arffDataset, removing all attributes not included in attributeList. Outputs the filtered dataset in arff format.
datasetDiscretize	arffDataset (string): dataset in arff format or a URL pointing to it, last attribute is the class label	datasetDiscretizeResponse (string): discretized dataset in arff format	Discretizes the numeric attributes in arffDataset using the entropy-based discretization method proposed by Fayyad and Irani. Outputs the discretized dataset in arff format.
classifierBuild	classifierName (string): Weka classifier name in dot notation (e.g. "weka.classifiers.lazy.IB1"); arffTrainingDataset (string): training dataset in arff format or a URL pointing to it, last attribute is the class label	classifierBuildResponse (string): hex-encoded classifier	Builds the classifier specified by classifierName, using arffTrainingDataset as the training dataset. Outputs the serialized classifier object (hex encoded).
clustererBuild	clustererName (string): Weka clusterer name in dot notation (e.g. "weka.clusterers.SimpleKMeans"); arffTrainingDataset (string): training dataset in arff format or a URL pointing to it, last attribute is the class label	clustererBuildResponse (string): hex-encoded clusterer	Builds the clusterer specified by clustererName, using arffTrainingDataset as the training dataset. Outputs the serialized clusterer object (hex encoded).
modelTest	arffTestDataset (string): test dataset in arff format or a URL pointing to it, last attribute is the class label; encodedModel (string): hex-encoded Weka classifier or clusterer	modelTestResponse (string): confusion matrix and other measures of model accuracy	Tests the classifier or clusterer specified by encodedModel, using arffTestDataset as the test dataset. Outputs the confusion matrix and other measures of model accuracy.
modelApply	arffApplyDataset (string): apply dataset in arff format or a URL pointing to it, last attribute is the class label (class values are not important but must be present); encodedModel (string): hex-encoded Weka classifier; labelList (string): comma-separated list of the class values (labels) present in the training dataset, in the same order	modelApplyResponse (string): instance number, predicted class value and associated probability for each instance in arffApplyDataset	Applies the classifier specified by encodedModel to the instances in arffApplyDataset. Outputs the instance number, predicted class value and associated probability.
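Every operation above takes and returns plain strings, so a client only has to wrap the parameters in a SOAP envelope and read back one string. The Python sketch below assumes a wrapped document/literal style in which the request element is named after the operation and its children after the INPUT parameters; the namespace urn:example:DataMiningService is a placeholder, and the real namespace, binding style and endpoint have to be taken from the WSDL. It also assumes that the hex-encoded model returned by classifierBuild and clustererBuild is a standard Java serialization stream, which always begins with the bytes AC ED 00 05.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Placeholder namespace (assumption): use the targetNamespace from the real WSDL.
SERVICE_NS = "urn:example:DataMiningService"
# Java serialization header: STREAM_MAGIC 0xACED followed by STREAM_VERSION 5.
JAVA_STREAM_MAGIC = b"\xac\xed\x00\x05"

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("dm", SERVICE_NS)


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap string parameters in a SOAP 1.1 request for the given operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Assumption: parameter elements are unqualified and appear in the
        # order of the INPUT column (dicts preserve insertion order).
        ET.SubElement(request, name).text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def extract_return(response: bytes, operation: str) -> str:
    """Return the string carried by <operationResponse>; raise on a SOAP fault."""
    body = ET.fromstring(response).find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        raise ValueError("response has no SOAP Body")
    fault = body.find(f"{{{SOAP_ENV}}}Fault")
    if fault is not None:
        raise RuntimeError(fault.findtext("faultstring", default="SOAP fault"))
    for elem in body.iter():
        # Compare local names so both qualified and unqualified tags match.
        if elem.tag.rsplit("}", 1)[-1] == operation + "Response":
            children = list(elem)
            # The result may sit in a child element or be the element's own text.
            text = children[0].text if children else elem.text
            return text or ""
    raise ValueError(f"no {operation}Response element in the SOAP Body")


def decode_model(hex_model: str) -> bytes:
    """Decode a hex-encoded model and check it looks like Java serialization."""
    raw = bytes.fromhex(hex_model.strip())
    if not raw.startswith(JAVA_STREAM_MAGIC):
        raise ValueError("decoded bytes are not a Java serialization stream")
    return raw
```

A client would POST the bytes returned by build_request to the endpoint listed in the WSDL, with whatever Content-Type and SOAPAction headers the binding requires, and pass the reply to extract_return. decode_model is only a sanity check: the hex string itself is what modelTest and modelApply expect as encodedModel, so it can be forwarded unchanged.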

The following figure shows a snapshot of a workflow built with Triana that uses DataMiningService. Web services are shown in red, Triana components in light blue.

The same workflow built with Taverna is shown in the figure below. The XML representation of the workflow can be opened in Taverna Workbench and executed with this sample XML input document. The results of the execution are saved to a text file named workflow_output.txt on the machine running Taverna.

ODMService

The data mining engine used by ODMService is Oracle 10g.
The service is made available through Apache Tomcat 4.1.3 and Axis 1.3.
The web service is currently not working because the Oracle server is unavailable, sorry.

OPERATION	INPUT	OUTPUT	DESCRIPTION
datasetLoad	arffDataset (string): dataset in arff format or a URL pointing to it, last attribute is the class label; datasetTable (string): name of the Oracle ODM table containing the dataset	aiDatasetLoadResponse (string): datasetTable (if the load is successful) or an error message	Loads arffDataset into the Oracle ODM table datasetTable.
aiBuildAsync	datasetTable (string): name of the table in the Oracle ODM database containing the dataset	aiBuildAsyncResponse (string): name of the Oracle task submitted for execution	Sets up the execution task that builds an attribute importance model from the dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the submitted Oracle task. NOTE: task execution should be monitored (taskState operation) until it has completed successfully before the model can be used.
aiAttributeRank	syncInput (string): any string on this input indicates that the attribute importance model build task (operation aiBuildAsync) has completed execution	aiAttributeRankResponse (string): list of attributes with their MDL ranking values	Uses the most recently built attribute importance model (operation aiBuildAsync) to perform attribute ranking. Outputs the list of all attributes together with their MDL ranking values.
aiAttributeSelect	attributeNumber (string): maximum number of attributes to be selected; syncInput (string): any string on this input indicates that the attribute importance model build task (operation aiBuildAsync) has completed execution	aiAttributeSelectResponse (string): comma-separated list of selected attributes	Uses the most recently built attribute importance model (operation aiBuildAsync) to perform attribute selection. Outputs the list of the most important attributes based on MDL ranking. The maximum number of output attributes is specified by attributeNumber.
datasetFilter	datasetTable (string): name of the Oracle ODM table containing the dataset; attributeList (string): comma-separated list of attributes to be kept	datasetFilterResponse (string): name of the Oracle ODM table containing the filtered dataset, with all attributes not included in attributeList removed	Filters the dataset in Oracle ODM table datasetTable, removing all attributes not included in attributeList. Outputs the name of the Oracle ODM table containing the filtered dataset.
classifierBuildAsync	classifierName (string): name of the Oracle classifier in dot notation (e.g. "classifier.abn"); datasetTable (string): name of the Oracle ODM table containing the training dataset	classifierBuildAsyncResponse (string): name of the Oracle task submitted for execution	Sets up the execution task that builds the classifier specified by classifierName, using the training dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the submitted Oracle task. NOTE: task execution should be monitored (taskState operation) until it has completed successfully before the classifier can be used.
modelTestAsync	datasetTable (string): name of the Oracle ODM table containing the test dataset; classifierName (string): name of the Oracle classifier in dot notation (e.g. "classifier.abn"); syncInput (string): any string on this input indicates that the classifier build task (operation classifierBuildAsync) has completed execution	modelTestAsyncResponse (string): name of the Oracle task submitted for execution	Sets up the execution task that tests the most recently built classifier specified by classifierName, using the test dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the submitted Oracle task. NOTE: task execution should be monitored (taskState operation) until it has completed successfully before the test results can be used.
testConfusionMatrix	syncInput (string): any string on this input indicates that the classifier test task (operation modelTestAsync) has completed execution	testConfusionMatrixResponse (string): confusion matrix and other measures of model accuracy	Fetches the most recent test results (operation modelTestAsync). Outputs the confusion matrix and other measures of model accuracy.
modelApplyAsync	datasetTable (string): name of the Oracle ODM table containing the apply dataset; classifierName (string): name of the Oracle classifier in dot notation (e.g. "classifier.abn"); syncInput (string): any string on this input indicates that the classifier build task (operation classifierBuildAsync) has completed execution	modelApplyAsyncResponse (string): name of the Oracle task submitted for execution	Sets up the execution task that applies the most recently built classifier specified by classifierName to the apply dataset in table datasetTable. The task is submitted to the Oracle DBMS for asynchronous execution. Outputs the name of the submitted Oracle task. NOTE: task execution should be monitored (taskState operation) until it has completed successfully before the apply results can be used.
applyPrediction	syncInput (string): any string on this input indicates that the classifier apply task (operation modelApplyAsync) has completed execution	applyPredictionResponse (string): instance number, predicted class value and associated probability for each instance in the apply dataset	Fetches the most recent apply results (operation modelApplyAsync). Outputs the instance number, predicted class value and associated probability.
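The *Async operations only submit a task, so a client has to poll taskState before invoking the operation that consumes the result. The Python sketch below assumes a task_state callable that wraps the taskState call, takes the task name returned by the *Async operation, and returns the task state as a string; the state names in DONE_STATES and FAILED_STATES are hypothetical placeholders, not values confirmed by the service.

```python
import time
from typing import Callable

# Hypothetical state names: check what the taskState operation actually returns.
DONE_STATES = {"SUCCESS"}
FAILED_STATES = {"ERROR", "STOPPED", "TERMINATED"}


def wait_for_task(task_name: str,
                  task_state: Callable[[str], str],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll task_state(task_name) until the task finishes; return the final state.

    Because any string on syncInput signals completion, the returned state
    can be passed directly as syncInput to the next operation.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = task_state(task_name).strip().upper()
        if state in DONE_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"task {task_name} ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_name} still {state} after {timeout_seconds}s")
        sleep(poll_seconds)
```

For example, the state returned once the classifierBuildAsync task completes can be passed as syncInput to modelTestAsync, and the state returned once that task completes can in turn be passed to testConfusionMatrix. Injecting sleep and clock keeps the loop testable without real waiting.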

The following figure shows a snapshot of a workflow built with Triana that uses ODMService. Web services are shown in red, Triana components in light blue.

Sample dataset

Sample dataset: Oracle-MDL filtered Acute Lymphoblastic Leukemia datasets. The class labels are BCR-ABL, E2A-PBX1, Hyperdip>50, MLL, T-ALL, TEL-AML1 and OTHERS.
Sample arff training dataset (400 attributes + class label, 215 instances): train400.arff.
Sample arff test dataset (400 attributes + class label, 112 instances): test400.arff.
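Both services expect the class label to be the last attribute, and modelApply also needs labelList in training-set order. The following minimal Python sketch reads both from an arff header such as the one in train400.arff. It assumes the class attribute is declared nominal ({...}) and, as a simplification, does not handle commas inside quoted values; parse_arff_header is a hypothetical helper, not part of the services.

```python
import re

# Attribute name may be single-quoted, double-quoted or a bare token.
_ATTRIBUTE = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.*)$", re.IGNORECASE)


def parse_arff_header(arff_text: str) -> tuple[list[str], list[str]]:
    """Return (attribute names, class labels) from an arff header.

    The class is taken to be the last attribute and must be nominal,
    e.g. @attribute class {A,B,C}.
    """
    names: list[str] = []
    last_type = ""
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and comments
        lower = stripped.lower()
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if lower.startswith("@attribute"):
            match = _ATTRIBUTE.match(stripped)
            if match is None:
                raise ValueError(f"malformed attribute line: {line!r}")
            names.append(match.group(1).strip("'\""))
            last_type = match.group(2).strip()
    if not (last_type.startswith("{") and last_type.endswith("}")):
        raise ValueError("last attribute is not a nominal class label")
    # Keep the declaration order: labelList must follow the training dataset.
    labels = [v.strip().strip("'\"") for v in last_type[1:-1].split(",")]
    return names, labels
```

",".join(labels) then yields a labelList value for modelApply, and len(names) - 1 gives the number of non-class attributes (400 for the sample files), a natural upper bound for attributeNumber.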

A package of Triana 3.2 components that can be used to build data mining workflows will be available for download in the future.

More in the near future; stay tuned.