Async Request

Introduction

Async requests are complicated to deal with and are not natively supported by SOAP/WSDL. Fortunately, Taverna supports looping over a service, which allows a doRequest-checkUntil-getResult polling loop.

PyWPS adopted the strategy of splitting services into sync and async. A sync service uses the following Execute syntax (foo is a process name):

request:
 <ExecuteProcess_<foo>>
 <input1>...</input1>
 </ExecuteProcess_<foo>>
response:
 <ExecuteProcess_<foo>Response>
 <output1Result>...</output1Result>
 </ExecuteProcess_<foo>Response>

If the service is async, then:

request:
 <ExecuteProcessAsync_<foo>>
 <input1>...</input1>
 </ExecuteProcessAsync_<foo>>
response:
 <ExecuteProcessAsync_<foo>Response>
 <output1Result>...</output1Result>
 </ExecuteProcessAsync_<foo>Response>

The separation between async and sync occurs only in the WSDL file. A standard WPS DescribeProcess request returns only one process, with storeSupported="true" statusSupported="true". During WSDL generation, the XSLT transformer checks for storeSupported and statusSupported in the process description, and:

 if process (storeSupported=true and statusSupported=true):
      - One service called ExecuteProcess_<foo>
              # Normal input and output
      - Another (one) service called ExecuteProcessAsync_<foo>
              # Normal input
              # statusURL as only output

 if process (storeSupported=false and statusSupported=false):
       - One service called ExecuteProcess_<foo>
              # Normal input and output
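
The naming rule above can be sketched in plain Java. This is only an illustration of the rule, not the XSLT that PyWPS actually runs; the class and method names are hypothetical, and treating the mixed case (only one of the two flags set) as sync-only is an assumption, since the rule above only covers both flags being true or both being false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the naming rule only, not the real XSLT.
class WsdlOperationNames {

    // Returns the WSDL service names expected for one process.
    static List<String> operationsFor(String process,
                                      boolean storeSupported,
                                      boolean statusSupported) {
        List<String> names = new ArrayList<>();
        // Every process gets the sync service (normal input and output).
        names.add("ExecuteProcess_" + process);
        // Assumption: the async service is added only when BOTH flags are true.
        if (storeSupported && statusSupported) {
            // Async service: normal input, statusURL as only output.
            names.add("ExecuteProcessAsync_" + process);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(operationsFor("histogramprocess", true, true));
        System.out.println(operationsFor("foo", false, false));
    }
}
```

With both flags set, the list holds the sync and the async name; with both unset, it holds only the sync name.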

Therefore, the user only has to set the standard WPS parameters during process creation (see: First Process) to enable or disable async calls to the process.

Async sync services.png

In the example above, there are three normal sync processes: ExecuteProcess_dummyprocess, ExecuteProcess_histogramprocess and ExecuteProcess_ultimatequestionprocess.

Two of these, ExecuteProcessAsync_dummyprocess and ExecuteProcessAsync_histogramprocess, are also offered as async processes.

An async service will follow the output structure described in Loading process and XML Splitter, so it needs an XML splitter to output a pure URL:

Async xmlsplitter.png

Looping

PyWPS uses the ultimatequestion process to test async calls. This call doesn't take any input, and returns a statusURL that after some updates will contain the process output, the Answer to the Ultimate Question of Life, the Universe and Everything (see: wikipedia).

So, statusURL has to be polled by the WPS client (in our case Taverna) until it contains the answer. In Taverna this is done by adding a Beanshell service (available in Taverna as Service templates > Beanshell) and configuring it to fetch the XML content at the statusURL and parse it to check whether the process is running or has finished.

Explanation of the code below: there are several mutually-exclusive process status tags defined for WPS XML status: ProcessAccepted, ProcessStarted, ProcessPaused, ProcessFailed, and ProcessSucceeded.

However, in the case of an async process completing successfully, the result is encapsulated as a SOAP response, not as a WPS response, meaning that we will never see ProcessSucceeded. This means that rather than looking for ProcessSucceeded or ProcessFailed to determine whether the process completed, we must look for the absence of ProcessAccepted, ProcessStarted and ProcessPaused.

import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.jdom.Document;
import org.jdom.xpath.XPath;

// statusURL is the service's input port; fetch and parse the XML found there
Document doc=new SAXBuilder().build(statusURL);

//ProcessAccepted,ProcessStarted,ProcessPaused --> Assumed that process is running
//ProcessFailed, ProcessSucceeded --> Assumed conclusion of process

XPath xPath = XPath.newInstance("//*[local-name()='ProcessAccepted' or local-name()='ProcessStarted' or local-name()='ProcessPaused']");
List list = xPath.selectNodes(doc);

if (list.size()>0){
 // at least one "still running" tag was found: keep polling
 status="RUNNING";
} else {
 // no "still running" tag: pass the whole document to the XMLContent output port
 status="COMPLETE";
 outputDoc=new XMLOutputter();
 XMLContent=outputDoc.outputString(doc);
}

The first part of the code:

Document doc=new SAXBuilder().build(statusURL);

This line fetches the XML content at statusURL, which could be something like http://<foo>/wpsoutputs/pywps-128879468789.xml, and builds an XML Document from it.

The next step is to check for the presence of ProcessAccepted, ProcessStarted or ProcessPaused, any of which indicates that the process is still running:

XPath xPath = XPath.newInstance("//*[local-name()='ProcessAccepted' or local-name()='ProcessStarted' or local-name()='ProcessPaused']");
List list = xPath.selectNodes(doc);

If the document contains a ProcessAccepted, ProcessStarted or ProcessPaused element, the number of nodes in the output list will be greater than zero, flagging that the process is running:

if (list.size()>0){

 status="RUNNING";
} else {
 status="COMPLETE";
 outputDoc=new XMLOutputter();
 XMLContent=outputDoc.outputString(doc);
}
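
To try the same check outside Taverna, the sketch below repeats it with JDK classes only (javax.xml instead of JDOM), so it is a stand-in for the Beanshell script rather than the script itself. The class name, the method name and the two sample documents are assumptions made for illustration; the "running" sample is a hand-written, heavily abbreviated WPS status document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-alone version of the status check (JDK only, no JDOM).
class StatusCheck {

    // Same XPath as in the Beanshell script: any "still running" status tag.
    static final String RUNNING_TAGS =
        "//*[local-name()='ProcessAccepted' or local-name()='ProcessStarted'"
        + " or local-name()='ProcessPaused']";

    // Returns "RUNNING" if a running tag is present, otherwise "COMPLETE".
    static String statusOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() sees unprefixed names
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList running = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(RUNNING_TAGS, doc, XPathConstants.NODESET);
            return running.getLength() > 0 ? "RUNNING" : "COMPLETE";
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse status document", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample documents (assumptions, not real server output).
        String running =
            "<wps:ExecuteResponse xmlns:wps=\"http://www.opengis.net/wps/1.0.0\">"
            + "<wps:Status><wps:ProcessStarted/></wps:Status>"
            + "</wps:ExecuteResponse>";
        String finished =
            "<ExecuteProcess_ultimatequestionprocessResponse>"
            + "<answerResult>42</answerResult>"
            + "</ExecuteProcess_ultimatequestionprocessResponse>";
        System.out.println(statusOf(running));
        System.out.println(statusOf(finished));
    }
}
```

Note that, like the Beanshell script, this treats a document containing ProcessFailed as COMPLETE, because only the three "running" tags are tested.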

The if-check sets the status and XMLContent variables. These variables are created by a pop-up wizard when the service is added to the workflow, and they work as the service's outputs. The statusURL variable used above is a service input.

Service's inputs:


Beanshell output.png


Service's outputs:


Beanshell input.png



The variables are created by clicking the Add Port button and giving them a name.

The script tab will contain the code explained above:


Beanshell script.png


The script runs in a loop, setting the service's outputs (status and XMLContent) according to the situation, so all that is missing is a way to tell Taverna what to do when the status output port contains RUNNING or COMPLETE.

This is configured by right-clicking the service and choosing Configure running > Looping:

Looping.png

A wizard will open where the loop structure is configured in accordance with the programmed script.

The first step is to set the service port that contains the status information:


Looping wizard.png

The next step is to indicate the test condition applied to the output port. Taverna will loop, polling the service, until this check is true.


Looping wizard2.png

When the statusURL contains the process response, the status string will be COMPLETE, the loop will stop, and the XMLContent port will contain the response's XML.
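
For readers who want to reason about the loop without opening Taverna, here is a minimal sketch of the same doRequest-checkUntil-getResult idea in plain Java. It is not how Taverna implements looping; the class name, the maxPolls limit and the delay parameter are assumptions added for illustration, and the status check is passed in as a function so that the sketch does not depend on a live server.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical polling helper; Taverna's own looping is configured in its wizard.
class PollingLoop {

    // Calls checkStatus until it returns something other than "RUNNING".
    // maxPolls and delayMillis are illustrative safety settings (assumptions).
    static String pollUntilComplete(Supplier<String> checkStatus,
                                    int maxPolls,
                                    long delayMillis) {
        for (int i = 0; i < maxPolls; i++) {
            String status = checkStatus.get();
            if (!"RUNNING".equals(status)) {
                return status; // loop condition met: stop polling
            }
            try {
                Thread.sleep(delayMillis); // wait before the next poll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("Still RUNNING after " + maxPolls + " polls");
    }

    public static void main(String[] args) {
        // Fake status source standing in for "fetch statusURL and check its tags".
        Iterator<String> fake =
            Arrays.asList("RUNNING", "RUNNING", "COMPLETE").iterator();
        System.out.println(pollUntilComplete(fake::next, 10, 100));
    }
}
```

In a real client, the supplier would fetch the statusURL and apply the status check on every call.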

Parsing XML content (old strategy)

The port will contain a pure WPS response, so WSDL functionality such as the XML splitter is not available and the whole XML has to be parsed to obtain the answer. As explained in the XML Input/Output section, Taverna has an XPath plugin that is easy to use. In the case above, XMLContent is parsed and the XPath returns the output result.


Xpath 42.png


The objective is to reach the content inside <wps:LiteralData>, which can be fetched using the following XPath:

//*[local-name()='Identifier' and text()='answer']/following-sibling::wps:Data/wps:LiteralData/text()

The XPath translates to something like "Get me the element called 'Identifier' that has the text 'answer', then get its sibling called wps:Data, from there go to the child wps:LiteralData and fetch its text content".
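
The expression can also be tried outside Taverna. The sketch below evaluates it with the JDK's javax.xml.xpath classes; because the expression uses the wps: prefix, that prefix has to be bound to a namespace first. The class name and the sample document are assumptions: the sample is a hand-written, abbreviated WPS 1.0.0 response, not real PyWPS output.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-alone evaluation of the "old strategy" XPath.
class OldStrategyXPath {

    static final String WPS_NS = "http://www.opengis.net/wps/1.0.0";

    static final String QUERY =
        "//*[local-name()='Identifier' and text()='answer']"
        + "/following-sibling::wps:Data/wps:LiteralData/text()";

    static String extractAnswer(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed steps
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the wps: prefix used in QUERY to the WPS 1.0.0 namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "wps".equals(prefix) ? WPS_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return WPS_NS.equals(uri) ? "wps" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return WPS_NS.equals(uri)
                        ? Collections.singletonList("wps").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            // Returns the string value of the first matching text node.
            return xpath.evaluate(QUERY, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written, abbreviated sample (assumption, not real server output).
        String sample =
            "<wps:ExecuteResponse xmlns:wps=\"" + WPS_NS + "\""
            + " xmlns:ows=\"http://www.opengis.net/ows/1.1\">"
            + "<wps:ProcessOutputs><wps:Output>"
            + "<ows:Identifier>answer</ows:Identifier>"
            + "<wps:Data><wps:LiteralData>42</wps:LiteralData></wps:Data>"
            + "</wps:Output></wps:ProcessOutputs>"
            + "</wps:ExecuteResponse>";
        System.out.println(extractAnswer(sample)); // prints 42
    }
}
```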

The problem with an async call is the lack of information about the kind of output. A normal sync call at least provides some output name information, but in this case the user has to know the process or run a standard WPS DescribeProcess request.

Parsing XML content (new strategy)

As of SVN release 1150, the XML output will be SOAP compressed, which simplifies the XPath search. Basically, the XML structure is identical to the sync version, so the "new outputs" from ExecuteProcessAsync_ultimatequestion will be identical to those from ExecuteProcess_ultimatequestion.

The XMLContent will be as follows:

<ExecuteProcess_ultimatequestionprocessResponse>
    <answerResult>42</answerResult>
</ExecuteProcess_ultimatequestionprocessResponse>

The Xpath expression to fetch the result (42) will be as simple as:

//answerResult/text() 
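
As a quick check outside Taverna, the sketch below runs this expression against the XMLContent shown above, using only JDK classes. The class and method names are hypothetical. No namespace binding is needed here because the response elements are unprefixed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-alone evaluation of the "new strategy" XPath.
class NewStrategyXPath {

    static String extractAnswer(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Returns the string value of the first matching text node.
            return XPathFactory.newInstance().newXPath()
                .evaluate("//answerResult/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xmlContent =
            "<ExecuteProcess_ultimatequestionprocessResponse>"
            + "<answerResult>42</answerResult>"
            + "</ExecuteProcess_ultimatequestionprocessResponse>";
        System.out.println(extractAnswer(xmlContent)); // prints 42
    }
}
```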

Final workflow

The final workflow can look something like this:


Ultimatequestion workflow.png

--Wikiadmin 17:16, 10 January 2011 (UTC)