AnnaHomolka


This page presents my Google Summer of Code project and collects the weekly reports.

Process Chaining Capabilities for PyWPS

Student Name: Anna Homolka, Friedrich-Schiller-Universität Jena
Mentor Name: Jonas Eberle (backup mentor: Jorge de Jesus)
Title: Process Chaining Capabilities for PyWPS
Repository: GitHub

Abstract

PyWPS already offers a few ways to orchestrate processes, but experience has shown that they are complicated or run into problems, for example when dealing with large datasets. I therefore propose to extend PyWPS so that an orchestration workflow can be described in a simple XML document, which is parsed and executed by a PyWPS process. This orchestration process will support if/else conditions, loops and the retrieval of results from a running process.

Complete Proposal

Project Plan

Period | Task | Revised plan
May 23 | definition of XML structure |
May 30 | definition of XML structure |
June 6 | program which parses the XML document |
June 13 | writing orchestration process |
June 20 | writing orchestration process |
June 27 | extension of PyWPS core, Mid Term Evaluation | goal: working orchestration process
July 4 | enhancing orchestration process |
July 11 | enhancing orchestration process |
July 18 | enhancing orchestration process |
July 25 | implementation of intelligent data transfer | make PyWPS allow URL output, implement try/catch blocks
August 1 | implementation of intelligent data transfer | asynchronous process invocation, error handling
August 8 | testing and writing documentation | testing in complex cases, solving last errors
August 15 | Final Evaluation | writing documentation, structuring code on GitHub

Roadmap

Week 1 (May 19-23)

Definition of the XML structure.

The following functionalities should be supported:

  • start process chain
  • define inputs
  • define static variables
  • try/catch
  • start process
  • if/else
  • for
  • write output
  • terminate process


Week 2 (May 26-30)

The table below shows three example workflows written in different business process languages. The workflow in the language I suggest is much shorter and easier to understand than the workflows in BPEL or XProc. These languages are very flexible in requesting web services, but this flexibility makes them extremely complex and difficult to use, which results in long XML documents.

Advantages of BPEL/XProc

  • XML language independent from OGC WPS, can also be used for non WPS processes
  • XProc provides tags of try/catch, if/else, loops, etc.
  • Processing engines for workflow execution exist already
  • Experiences from other technological fields

Disadvantages of BPEL/XProc

  • Complex because of flexibility
  • Status updates of the workflow process cannot be realized without further HTTP requests, because the workflow is executed outside of the WPS process
  • Hard to read because the description contains complete WPS Execute statements
  • Waiting for asynchronous execution of WPS processes has to be implemented within the workflow description. This can be realized with for-loops, but it is a complex step.

The new workflow description language will be:

  • Simple but flexible with focus on OGC WPS
  • Orchestration engine implemented as WPS process


The three examples follow in this order:

  • My suggestion: workflow which starts a process which downloads some data and processes them. If the first download process fails, it starts a different one.
  • BPEL: workflow which downloads MODIS data and detects fire hotspots. Author: Jonas Eberle, University of Jena
  • XProc: this is just a fragment of a workflow. It sends an Execute request to a WPS and receives the output XML.
 
<workflow>

<!-- inputs of the process workflow with default values, if workflow is stored in PyWPS, these inputs are exposed as process inputs --> 
 <inputs> 
  <input localIdentifier="date" defaultValue="2013-03-05" type="stringValue" title="Date of scene"/>
  <input localIdentifier="area" defaultValue="49.3, 10, 50.8, 11.8, EPSG:4326" title="Bounding box of interested area" type="BoundingBoxValue"/> 
 </inputs> 


 <try> 
  <!-- WPS process with identifier "download_data_1" at http://localhost/cgi-bin/wps1.cgi with inputs "scene_date" and "area"-->
  <startProcess processID="download_data_1" identifier="download_data_1" service="http://localhost/cgi-bin/wps1.cgi">  
   
   <!-- attribute identifier is from WPS process, sourceIdentifier from workflow input or prior process --> 
   <input identifier="scene_date" sourceIdentifier="date"/>  
   <input identifier="area" sourceIdentifier="area"/>
   
   <!-- localIdentifier as variable name to reference in "process_data" process -->  
   <output identifier="image" localIdentifier="image"/>
  </startProcess> 
  
  <catch>
  <!-- If the process fails a different process with identifier "download_data_2" at the same host and with the same input is started. -->  
   <startProcess processID="download_data_2" identifier="download_data_2" service="http://localhost/cgi-bin/wps1.cgi">
    <input identifier="scene_date" sourceIdentifier="date"/>  
    <input identifier="area" sourceIdentifier="area"/>  
    <output identifier="image" localIdentifier="image"/> 

   </startProcess> 
  </catch>
 </try> 

 <!-- check if output result of "image" is null, if true, then terminate workflow with exception message --> 
 <if test="isNull('image')">
  <terminate message="no image available"/>  
 </if>  
	
 <try> 
  <!-- process which processes the downloaded data with the identifier "process_data" at "http://localhost/cgi-bin/wps1.cgi" 
       the input is the image downloaded before referenced by its localIdentifier "image" --> 
  <startProcess processID="process_data" identifier="process_data" service="http://localhost/cgi-bin/wps1.cgi"> 
   <input identifier="data" sourceIdentifier="image"/>  
   <!-- the declaration of output is optional -->  
  </startProcess>  

  <catch>
   <terminate message="unable to execute process data"/> 
  </catch> 
 </try> 

 
 <outputs>
 <!-- defines the outputs from the process "process_data" with the identifier "processed_image". This is the second possibility 
      to reference outputs from processes-->
  <output sourceProcess="process_data" sourceIdentifier="processed_image" asReference="True" />
 </outputs> 

</workflow>
<!-- ModisFire BPEL Process [Generated by the Eclipse BPEL Designer] -->
<bpel:process name="ModisFire"
         targetNamespace="http://pyros.intra.dlr.de"
         suppressJoinFailure="yes"
         xmlns:tns="http://pyros.intra.dlr.de"
         xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
         xmlns:wps="http://www.opengis.net/wps/1.0.0"
         xmlns:ows="http://www.opengis.net/ows/1.1"
         xmlns:om="http://www.opengis.net/om/1.0"
         xmlns:swe="http://www.opengis.net/swe/1.0"
         xmlns:gml="http://www.opengis.net/gml"
         xmlns:un="http://www.uncertml.org"
	   xmlns:modis="http://modis.nasa.org"
         exitOnStandardFault="yes">

    <!-- Import the client WSDL -->
    <bpel:import namespace="http://www.opengis.net/wps/1.0.0"  
     location="WebProcessingServices.wsdl" importType="http://schemas.xmlsoap.org/wsdl/">   
    </bpel:import>
    <bpel:import location="ModisFireArtifacts.wsdl" namespace="http://pyros.intra.dlr.de" 
     importType="http://schemas.xmlsoap.org/wsdl/" />
         
    <!-- ================================================================= -->         
    <!-- PARTNERLINKS                                                      -->
    <!-- List of services participating in this BPEL process               -->
    <!-- ================================================================= -->         
    <bpel:partnerLinks>
        <!-- The 'client' role represents the requester of this service. -->
        <bpel:partnerLink name="client" partnerLinkType="tns:ModisFire" myRole="ModisFireProvider" />
        <bpel:partnerLink name="pyrospywps" partnerLinkType="tns:WebProcessingService"
         partnerRole="WebProcessingServiceProvider"></bpel:partnerLink>
    </bpel:partnerLinks>
  
    <!-- ================================================================= -->         
    <!-- VARIABLES                                                         -->
    <!-- List of messages and XML documents used within this BPEL process  -->
    <!-- ================================================================= -->         
    <bpel:variables>
        <!-- Reference to the message passed as input during initiation -->
        <bpel:variable name="input" messageType="tns:ModisFireRequestMessage"/>
                  
        <!-- Reference to the message that will be returned to the requester -->
        <bpel:variable name="output" messageType="tns:ModisFireResponseMessage"/>
        
        <!-- Reference to the messages used for intermediate service calls -->
        <bpel:variable name="modisDownloadResponse" messageType="wps:OutputMessage"></bpel:variable>
        <bpel:variable name="modisDownloadRequest" messageType="wps:InputMessage"></bpel:variable>
        <bpel:variable name="modisL1bResponse" messageType="wps:OutputMessage"></bpel:variable>
        <bpel:variable name="modisL1bRequest" messageType="wps:InputMessage"></bpel:variable>
        <bpel:variable name="modisMod14Response" messageType="wps:OutputMessage"></bpel:variable>
        <bpel:variable name="modisMod14Request" messageType="wps:InputMessage"></bpel:variable>
    </bpel:variables>

    <!-- ================================================================= -->         
    <!-- ORCHESTRATION LOGIC                                               -->
    <!-- Set of activities coordinating the flow of messages across the    -->
    <!-- services integrated within this business process                  -->
    <!-- ================================================================= -->         
    <bpel:sequence name="main">
        
        <!-- Receive input from requester. Note: This maps to operation defined in ModisFire.wsdl -->
        <bpel:receive name="receiveInput" partnerLink="client" portType="tns:ModisFire" operation="process" variable="input" createInstance="yes"/>
        
        <!-- Generate reply to synchronous request -->
        <bpel:assign validate="no" name="assignModisDownloadRequest">
            <bpel:copy>
                <bpel:from>
                    <bpel:literal xml:space="preserve"><wps:Execute service="WPS" version="1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" 
                     xmlns:wps="http://www.opengis.net/wps/1.0.0" 
                     xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                     xsi:schemaLocation="http://www.opengis.net/wps/1.0.0  
                      http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsExecute_request.xsd">
	<ows:Identifier>modis_download</ows:Identifier>
	<wps:DataInputs>
		<wps:Input>
			<ows:Identifier>l0_url</ows:Identifier>
			<wps:Data>
				<wps:LiteralData></wps:LiteralData>
			</wps:Data>
		</wps:Input>
	</wps:DataInputs>
	<wps:ResponseForm>
		<wps:ResponseDocument>
			<wps:Output>
				<ows:Identifier>basename</ows:Identifier>
			</wps:Output>
		</wps:ResponseDocument>
	</wps:ResponseForm>
</wps:Execute></bpel:literal>
                </bpel:from>
                <bpel:to part="InPart" variable="modisDownloadRequest"></bpel:to>
            </bpel:copy>
            
            <bpel:copy>
                <bpel:from>
                    <![CDATA[$input.payload//modis:url]]>
                </bpel:from>
                <bpel:to>
                    <![CDATA[$modisDownloadRequest.InPart//wps:LiteralData]]>
                </bpel:to>
            </bpel:copy>
        </bpel:assign>
        <bpel:invoke name="invokeModisDownload" partnerLink="pyrospywps" operation="Execute" inputVariable="modisDownloadRequest"       
         outputVariable="modisDownloadResponse"></bpel:invoke>
        <bpel:assign validate="no" name="assignModisL1bRequest">
            <bpel:copy>
                <bpel:from>
                    <bpel:literal xml:space="preserve"><wps:Execute service="WPS" version="1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" 
                     xmlns:wps="http://www.opengis.net/wps/1.0.0" 
                     xmlns:xlink="http://www.w3.org/1999/xlink" 
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0                            
                     http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsExecute_request.xsd">
	<ows:Identifier>modis_l1b</ows:Identifier>
	<wps:DataInputs>
		<wps:Input>
			<ows:Identifier>modis_scene</ows:Identifier>
			<wps:Data>
				<wps:LiteralData></wps:LiteralData>
			</wps:Data>
		</wps:Input>
	</wps:DataInputs>
	<wps:ResponseForm>
		<wps:ResponseDocument>
			<wps:Output asReference="true">
				<ows:Identifier>data_l1b1km</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_l1bhkm</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_l1bqkm</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_l1bgeo</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_l0l1alog</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_l1al1blog</ows:Identifier>
			</wps:Output>
		</wps:ResponseDocument>
	</wps:ResponseForm>
</wps:Execute></bpel:literal>
                </bpel:from>
                <bpel:to part="InPart" variable="modisL1bRequest"></bpel:to>
            </bpel:copy>
            <bpel:copy>
                <bpel:from>
                    <![CDATA[$modisDownloadResponse.OutPart//wps:LiteralData]]>
                </bpel:from>
                <bpel:to>
                    <![CDATA[$modisL1bRequest.InPart//wps:LiteralData]]>
                </bpel:to>
            </bpel:copy>
        </bpel:assign>
        <bpel:invoke name="invokeModisL1bRequest" partnerLink="pyrospywps" operation="Execute" 
         inputVariable="modisL1bRequest" outputVariable="modisL1bResponse"></bpel:invoke>
        <bpel:assign validate="no" name="assignModisMod14Request">
            <bpel:copy>
                <bpel:from>
                    <bpel:literal xml:space="preserve"><wps:Execute service="WPS" version="1.0.0" 
                     xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:wps="http://www.opengis.net/wps/1.0.0" 
                     xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                     xsi:schemaLocation="http://www.opengis.net/wps/1.0.0
                     http://geoserver.itc.nl:8080/wps/schemas/wps/1.0.0/wpsExecute_request.xsd">
	<ows:Identifier>modis_mod14</ows:Identifier>
	<wps:DataInputs>
		<wps:Input>
			<ows:Identifier>modis_scene</ows:Identifier>
			<wps:Data>
				<wps:LiteralData></wps:LiteralData>
			</wps:Data>
		</wps:Input>
	</wps:DataInputs>
	<wps:ResponseForm>
		<wps:ResponseDocument>
			<wps:Output asReference="true">
				<ows:Identifier>data_mod14hdf</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_mod14gml</ows:Identifier>
			</wps:Output>
			<wps:Output asReference="true">
				<ows:Identifier>data_mod14txt</ows:Identifier>
			</wps:Output>
			<wps:Output>
				<ows:Identifier>numberOfFires</ows:Identifier>
			</wps:Output>
		</wps:ResponseDocument>
	</wps:ResponseForm>
</wps:Execute></bpel:literal>
                </bpel:from>
                <bpel:to part="InPart" variable="modisMod14Request"></bpel:to>
            </bpel:copy>
            <bpel:copy>
                <bpel:from>
                    <![CDATA[$modisDownloadResponse.OutPart//wps:LiteralData]]>
                </bpel:from>
                <bpel:to>
                    <![CDATA[$modisMod14Request.InPart//wps:LiteralData]]>
                </bpel:to>
            </bpel:copy>
        </bpel:assign>
        <bpel:invoke name="invokeModisMod14" partnerLink="pyrospywps" operation="Execute" 
         inputVariable="modisMod14Request" outputVariable="modisMod14Response"></bpel:invoke>
        <bpel:assign validate="no" name="assignOutput">
		<bpel:copy>
                <bpel:from>
                    <bpel:literal xml:space="preserve"><tns:ModisFireResponse 
                     xmlns:wps="http://www.opengis.net/wps/1.0.0" 
                     xmlns:gml="http://www.opengis.net/gml" xmlns:om="http://www.opengis.net/om/1.0" 
                     xmlns:p="http://www.opengis.net/swe/1.0.1" xmlns:p1="urn:us:gov:ic:ism:v2" 
                     xmlns:smil20="http://www.w3.org/2001/SMIL20/" xmlns:smil20lang="http://www.w3.org/2001/SMIL20/Language" 
                     xmlns:sml="http://www.opengis.net/sensorML/1.0.1" xmlns:swe="http://www.opengis.net/swe/1.0" 
                     xmlns:tns="http://pyros.intra.dlr.de" xmlns:modis="http://modis.nasa.org" xmlns:un="http://www.uncertml.org" 
                     xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xml="http://www.w3.org/XML/1998/namespace" 
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modis:mod14Fires><modis:fires></modis:fires></modis:mod14Fires>
</tns:ModisFireResponse>
</bpel:literal>
                </bpel:from>
                <bpel:to variable="output" part="payload"></bpel:to>
            </bpel:copy>
            <bpel:copy>
                <bpel:from>
                    <![CDATA[$modisMod14Response.OutPart//wps:LiteralData]]>
                </bpel:from>
                <bpel:to>
                    <![CDATA[$output.payload//modis:fires]]>
                </bpel:to>
            </bpel:copy>
        </bpel:assign>
        <bpel:reply name="replyOutput" partnerLink="client" portType="tns:ModisFire" operation="process" variable="output" />
    </bpel:sequence>
</bpel:process>
<?xml version="1.0" encoding="UTF-8" ?> 
  
<p:pipeline version="1.0" 
 xmlns:p="http://www.w3.org/ns/xproc" 
 xmlns:c="http://www.w3.org/ns/xproc-step" 
 xmlns:l="http://xproc.org/library"> 
  
 <p:serialization port="result" media-type="text/plain" method="text" /> 
  <p:http-request>
   <p:input port="source">
     <p:inline>
      <c:request method="POST"
       href="http://192.168.77.102/cgi-bin/pywps.cgi">
       <c:body content-type="application/xml">
	<wps:Execute service="WPS" version="1.0.0" 
         xmlns:ows="http://www.opengis.net/ows/1.1" 
         xmlns:wps="http://www.opengis.net/wps/1.0.0"        
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd"> 
          <ows:Identifier>dummyprocess2</ows:Identifier>
	  <wps:DataInputs>
           <wps:Input>
	    <ows:Identifier>i3</ows:Identifier>
	    <wps:Data>
	     <wps:LiteralData>5</wps:LiteralData>
            </wps:Data>
           </wps:Input>
           <wps:Input>
            <ows:Identifier>i2</ows:Identifier>
             <wps:Data>
              <wps:LiteralData>9</wps:LiteralData>
	     </wps:Data>
            </wps:Input>
           </wps:DataInputs>
	   <wps:ResponseForm>
            <wps:ResponseDocument lineage="true" storeExecuteResponse="false" status="false">
             <wps:Output asReference="false">
              <ows:Identifier>output</ows:Identifier>
             </wps:Output>
            </wps:ResponseDocument>
           </wps:ResponseForm>
          </wps:Execute>
         </c:body>  
        </c:request>
       </p:inline>
      </p:input>
     </p:http-request>
    </p:pipeline>

Week 3 (June 02-06)

Implementation: The process chain is described in an XML document. When the process "recieveXML" is invoked with this XML as its input, it translates the XML into a PyWPS process. The new process is saved in the "processes" directory and registered in the "__init__.py" file, so that it can be called like a normal PyWPS process.

Week 4 (June 09-13)

Wrote the "recieveXML" process so that it receives the XML and returns, as its output, a PyWPS process which executes the XML. The XML is parsed with SAX, so I wrote "handler.py", which contains the XML handler. The parser itself is located in "recieveXML". At the moment "handler.py" supports an XML document which calls one WPS process with literal input and exposes its results as outputs.
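A minimal sketch of what a SAX handler for the workflow format from Week 2 could look like. The class name WorkflowHandler, the function parse_workflow and the dictionary layout are assumptions made for illustration; they are not taken from the actual "handler.py".

```python
import xml.sax


class WorkflowHandler(xml.sax.ContentHandler):
    """Collects every <startProcess> element with its inputs and outputs."""

    def __init__(self):
        super().__init__()
        self.processes = []   # one dict per <startProcess>, in document order
        self._current = None  # the <startProcess> currently being read

    def startElement(self, name, attrs):
        if name == "startProcess":
            self._current = {
                "identifier": attrs.get("identifier"),
                "service": attrs.get("service"),
                "inputs": {},
                "outputs": {},
            }
        elif name == "input" and self._current is not None:
            # WPS input identifier -> name of the workflow variable to read
            self._current["inputs"][attrs.get("identifier")] = attrs.get("sourceIdentifier")
        elif name == "output" and self._current is not None:
            # WPS output identifier -> name of the workflow variable to write
            self._current["outputs"][attrs.get("identifier")] = attrs.get("localIdentifier")

    def endElement(self, name):
        if name == "startProcess":
            self.processes.append(self._current)
            self._current = None


def parse_workflow(xml_text):
    """Parse a workflow document and return the list of process calls."""
    handler = WorkflowHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.processes
```

Inputs declared at workflow level (inside the inputs element) are skipped in this sketch, because no startProcess element is open when they are read.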

Week 5 (June 16-20)

"handler.py" now supports the invocation of multiple WPS services and the exchange of inputs and outputs between them.
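The exchange of values between processes can be pictured as a dictionary of workflow variables which is filled step by step. The following sketch rests on assumptions: the function run_chain, the layout of the process dictionaries and the caller-supplied execute function are hypothetical and only illustrate the idea.

```python
def run_chain(processes, workflow_inputs, execute):
    """Run processes in order, passing values through named workflow variables.

    `execute` is a caller-supplied (hypothetical) function which sends the
    WPS Execute request and returns {output identifier: value}.
    """
    variables = dict(workflow_inputs)
    for proc in processes:
        # Resolve each WPS input from a workflow input or an earlier output.
        inputs = {wps_id: variables[source]
                  for wps_id, source in proc["inputs"].items()}
        results = execute(proc["service"], proc["identifier"], inputs)
        # Store the requested outputs under their local variable names.
        for wps_id, local_name in proc["outputs"].items():
            variables[local_name] = results[wps_id]
    return variables
```

Each process dictionary is assumed to hold "service", "identifier", "inputs" (WPS input identifier to variable name) and "outputs" (WPS output identifier to variable name), mirroring the sourceIdentifier and localIdentifier attributes of the workflow XML.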

The process "recieveXML" is now able to save the translated process chain in the "processes" directory and to register it in the "__init__.py" file. Two problems came up:

1) The web server needs write access to the "processes" directory in order to change "__init__.py" and to save the new process.
2) Existing processes can be overwritten when a process chain with the same name is registered.
The second problem can be avoided with a password which the user has to send along with the process description. If "recieveXML" is called with a process chain that has the same name as an existing process, the password is checked; if it is wrong, the user is not allowed to register the process chain under this name.
Another solution would be to give each registered process chain a unique, automatically generated name, but this would lead to cryptic names, which are unattractive in my opinion.
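The password check could work roughly as in the following sketch. The function may_register and the in-memory registry are hypothetical, and storing a SHA-256 digest instead of the password itself is an assumption of this example, not a description of the implementation.

```python
import hashlib
import hmac


def may_register(name, password, registry):
    """Return True if a chain called `name` may be stored with `password`.

    `registry` maps chain names to the SHA-256 digest of their password.
    """
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    stored = registry.get(name)
    if stored is None:
        registry[name] = digest  # new name: remember its password digest
        return True
    # Existing name: only the original password may overwrite the chain.
    return hmac.compare_digest(stored, digest)
```

The first registration of a name always succeeds; later registrations under the same name succeed only with the same password.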

I made progress with handling complex inputs and outputs.

Week 6 (June 23-27)

Week 7 (June 30 - July 04)

Implemented the password. At the moment it is of type integer; this will be changed in the future.

I read about XSD.

Week 8 (July 07-11)

The complex input and output now work. The outputs of a process are determined automatically with OWSLib, so they don't have to be named explicitly in the process chain description.
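A sketch of how the output identifiers of a process can be looked up with OWSLib through a DescribeProcess request. The function names output_identifiers and describe_outputs are made up for this example; the way the orchestration process actually uses OWSLib may differ.

```python
def output_identifiers(process_description):
    """Return the identifiers of all outputs of a described WPS process."""
    return [output.identifier for output in process_description.processOutputs]


def describe_outputs(service_url, process_identifier):
    """Ask the WPS at `service_url` which outputs the process offers."""
    # Imported here so that output_identifiers() works without OWSLib installed.
    from owslib.wps import WebProcessingService

    wps = WebProcessingService(service_url)
    return output_identifiers(wps.describeprocess(process_identifier))
```

Calling describe_outputs needs a reachable WPS, for example describe_outputs("http://localhost/cgi-bin/wps1.cgi", "process_data") for the service used in the Week 2 example.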

Week 9 (July 14-18)

Found a bug in the complex output: the output of a PyWPS process can't be written as a URL.

Tested different errors which can occur in the process chain and started implementing try/catch blocks.

Week 10 (July 21-25)

Implemented try/catch blocks.
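The behaviour of a try/catch block in the workflow description can be sketched as follows. The function run_with_fallback is hypothetical; it only illustrates that the body of the catch element runs when the process inside the try element fails.

```python
def run_with_fallback(primary, fallback=None):
    """Run `primary`; if it raises, run `fallback` (the <catch> body) instead."""
    try:
        return primary()
    except Exception as error:
        if fallback is None:
            # No <catch> body given: stop the chain with a message.
            raise RuntimeError("process chain terminated: %s" % error)
        return fallback()
```

In the Week 2 example, primary would start "download_data_1" and fallback would start "download_data_2".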

Changed the _asReferenceOutput function so that a process output can be a URL. This now works if the process is invoked with asReference=true.

Week 11 (July 28 - August 01)

Fixed a problem with the default input values. Started implementing asynchronous process execution.

Week 12 (August 04-08)

Implemented asynchronous process execution. If the process has only literal or bounding box inputs, the user can decide whether the execution is synchronous or asynchronous. If there is also a complex input, the execution will always be asynchronous.
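The rule for choosing the execution mode can be written down as a small function. The name execution_mode and the type labels "literal", "bbox" and "complex" are assumptions of this sketch.

```python
def execution_mode(input_types, prefer_async=False):
    """Pick "sync" or "async" following the rule described above.

    `input_types` holds one of "literal", "bbox" or "complex" per input.
    """
    if "complex" in input_types:
        # A complex input always forces asynchronous execution.
        return "async"
    return "async" if prefer_async else "sync"
```

For example, execution_mode(["literal", "bbox"]) leaves the choice to the user (synchronous by default here), while any list containing "complex" yields asynchronous execution.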

Changed the current version of PyWPS so that it allows outputs from HTTP addresses.

Week 13 (August 11-15)

Wrote the XML description. One small issue now is that every process invocation has to be surrounded by a try/catch block, because the XSD definition doesn't allow an undefined number of different elements to appear unordered within one tag.

Improved manuelparser so that it can be called from the command line.

Fixed minor bugs.

Wrote a tutorial: https://github.com/AnnaHomolka/PyWPS/blob/master/doc/tutorial_process_chaining.pdf