First Process

Here we create a simple process from scratch: a "mirror" process that returns its inputs unchanged as outputs.

Processes as Python module

PyWPS processes are defined in Python modules. The modules are stored in a dedicated directory that is a Python package. This directory must contain an __init__.py file that sets the __all__ list to the names of the available process modules.

This directory can be anywhere on the system; we will put it at the same level as the PyWPS installation:

$ mkdir /usr/local/processes

This path should be set in the PYWPS_PROCESSES environment variable, inside the wrapper script, or in pywps.cfg.

To create a Python module that will contain PyWPS processes, it is necessary to create and edit a __init__.py file in the folder created above:

$ cd /usr/local/processes
$ nano __init__.py

In the example above we use the nano editor, but any other editor (like vim, gedit, emacs) should be okay.

__init__.py will contain the "module information" that will be used by Python - in our case, a list of the scripts in the script directory that contain processes, minus the .py extension. For example, the following line will include any processes contained in returner.py:

__all__=["returner"]

Normally each process has its own file with extension .py but it is possible to define more than one process per file. __all__ lists the files that should be loaded/imported by Python.
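
The role of __all__ can be tried out without PyWPS. The sketch below builds a throwaway package in a temporary directory and imports every module named in its __all__ list. The package name demo_processes and the IDENTIFIER constant are made up for illustration, and the loop is only an assumption about what a loader does with __all__, not the actual PyWPS loading code.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package that mimics the layout described above.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_processes")  # hypothetical package name
os.mkdir(package_dir)

with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write('__all__ = ["returner"]\n')

with open(os.path.join(package_dir, "returner.py"), "w") as handle:
    # Stand-in for a real process module.
    handle.write('IDENTIFIER = "returner"\n')

sys.path.insert(0, root)
importlib.invalidate_caches()
package = importlib.import_module("demo_processes")

# Import every module named in __all__ (module names without the .py extension).
loaded = {}
for name in package.__all__:
    loaded[name] = importlib.import_module("demo_processes." + name)

print(sorted(loaded))                 # ['returner']
print(loaded["returner"].IDENTIFIER)  # returner
```

A module file that is present in the directory but missing from __all__ is simply never imported by this loop.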

Process structure

Each process is a Python class (or an instance of one) with at least two methods:

__init__, which initializes the process; this is where you define the inputs, the outputs and the process metadata
execute, which is called once PyWPS accepts an Execute request for the process

In short, the class constructor declares the inputs, outputs and process metadata, and the execute() method holds the actual algorithm.

Edit the returner.py file so that it contains the following:

from pywps.Process import WPSProcess

class Process(WPSProcess):

    def __init__(self):

        ##
        # Process initialization
        WPSProcess.__init__(self,
            identifier = "returner",
            title="Return process",
            abstract="""This is demonstration process of PyWPS, returns
            the same file, it gets on input, as the output.""",
            version = "1.0",
            storeSupported = True,
            statusSupported = True)

        ##
        # Adding process inputs
        
        self.dataIn = self.addComplexInput(identifier="data",
                    title="Input vector data",
                    formats = [{'mimeType':'text/xml'}])

        self.textIn = self.addLiteralInput(identifier="text",
                    type=type(""),
                    title = "Some width")

        ##
        # Adding process outputs

        self.dataOut = self.addComplexOutput(identifier="output",
                title="Output vector data",
                formats =  [{'mimeType':'text/xml'}])

        self.textOut = self.addLiteralOutput(identifier = "text",
                type=type(""),
                title="Output literal data")

    ##
    # Execution part of the process
    def execute(self):

        # just copy the input values to output values
        self.dataOut.setValue( self.dataIn.getValue() )
        self.textOut.setValue( self.textIn.getValue() )

        return

And now, step by step.

Initialization

The first section of code deals with the initialization of the process class:

from pywps.Process import WPSProcess
class Process(WPSProcess):

    def __init__(self):

        ##
        # Process initialization
        WPSProcess.__init__(self,
            identifier = "returner",
            title="Return process",
            abstract="""This is demonstration process of PyWPS, returns
            the same file, it gets on input, as the output.""",
            version = "1.0",
            storeSupported = True,
            statusSupported = True)

The process inherits from the pywps.Process.WPSProcess class. In Python, you must initialize this parent class first. In the initialization, you have to define:

  • identifier: the process identifier
  • title: a human-readable title
  • abstract: an optional, longer text describing the process

Several other things can be defined:

  • version: the process version
  • statusSupported: indicates whether the process supports asynchronous calls
  • storeSupported: indicates whether the process can store its result on the server for later use

Other parameters, such as language, grassLocation and metadata, can also be passed to the initialization.

Process Inputs

Next, we define what sort of inputs the process has:

        self.dataIn = self.addComplexInput(identifier="data",
                    title="Input vector data",
                    formats = [{'mimeType':'text/xml'}])

        self.textIn = self.addLiteralInput(identifier="text",
                    type=type(""),
                    title = "Some width")

In this case, we create and add one ComplexData input and one LiteralData input. Both have a long list of configuration options, but for now we keep the defaults. The LiteralData input is declared as a string, because the default type (integer) would be misleading for the given identifier name. The ComplexData input accepts only XML-encoded files, ideally GML.

Max File size

PyWPS lets you define the maximum size of input files at two levels:

The PyWPS configuration file sets the maximum file size for the entire instance (for example 3 MB or 3000 MB), and a process can specify a lower value for a specific process input.

Be aware, however, that under some conditions some versions of PyWPS allocate RAM according to the configured maximum before reading the input file. If this maximum is more than the currently available RAM, the process in question will die.

The ComplexDataInput class has a maxmegabites argument that sets the maximum allowed size for that specific WPS input:

 self.indata = self.addComplexInput(identifier="indata",title="Complex in",
                formats=[{"mimeType":"image/tiff"}],maxmegabites=2)

Note: maxmegabites is an int or float giving the limit in megabytes.

Assuming the following:

* pywps cfg with maxfilesize=3mb
* complexRaster process with an indata1 input with maxmegabites=2
* complexRaster process with an indata2 input without maxmegabites

If the indata1 input receives 2.5 megabytes of content, a FileSizeExceeded WPS exception is reported, since 2.5 > 2, even though the content is below the 3 megabyte limit set in the configuration file.

If indata2 receives a 2.5 megabyte file, there is no problem.

If indata2 receives a 4 megabyte file, a FileSizeExceeded WPS exception is reported, since 4 > 3 (the 3 megabyte limit from the PyWPS cfg file).

NB: the per-input limit is 5 megabytes by default, even if the limit specified in the config file is higher; any process that requires a bigger input must specify this using the maxmegabites argument when instantiating the ComplexDataInput class.

Beginners are often confused by FileSizeExceeded errors that report a 5 megabyte limit when the cfg file sets something higher.
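
The interplay of the two limits can be modelled in a few lines of plain Python. This is only a sketch of the rules as described above, not PyWPS code: the function names are made up, and treating a file exactly at the limit as accepted is an assumption.

```python
DEFAULT_INPUT_LIMIT_MB = 5.0  # per-input default described in the text


def effective_limit_mb(config_max_mb, input_max_mb=None):
    """Return the limit that applies to one input, in megabytes."""
    if input_max_mb is None:
        per_input = DEFAULT_INPUT_LIMIT_MB
    else:
        per_input = input_max_mb
    # The instance-wide maximum caps whatever the input asks for.
    return min(config_max_mb, per_input)


def is_accepted(size_mb, config_max_mb, input_max_mb=None):
    """True if a file of size_mb would pass both limits (assumes <= passes)."""
    return size_mb <= effective_limit_mb(config_max_mb, input_max_mb)


# The three cases from the text, with maxfilesize=3mb in the config file.
print(is_accepted(2.5, 3, input_max_mb=2))  # indata1, 2.5 MB: False
print(is_accepted(2.5, 3))                  # indata2, 2.5 MB: True
print(is_accepted(4, 3))                    # indata2, 4 MB: False
# The beginner trap: config allows 100 MB, but the input default is 5 MB.
print(is_accepted(6, 100))                  # False
```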

Process Outputs

In the same way, we define what sort of outputs the process has:

        self.dataOut = self.addComplexOutput(identifier="output",
                title="Output vector data",
                formats = [{'mimeType':'text/xml'}],
                asReference = False)

        self.textOut = self.addLiteralOutput(identifier = "text",
                type=type(""),
                title="Output literal data")

For a complete description, please check the API documentation concerning I/O.

Process Outputs as Reference

WPS specifies that an output is returned as a reference when the client asks for it in the KVP and/or XML request. For example:

...responsedocument=vectorout=@asreference=true;
rasterout=@asreference=true

SVN ref 1271 introduces a new attribute, asReference, on the output classes (LiteralOutput, ComplexOutput and BoundingBoxOutput). When asReference is True, the output is returned as a reference by default, as if the client always sent rasterout=@asreference=true. The attribute default value is False, meaning the normal WPS behaviour.

In any case, the client request overrides asReference=True. For example, with the following request:

...responsedocument=vectorout=@asreference=true;
rasterout=@asreference=false

rasterout is returned in the response document rather than as a reference.

Making reference output the default facilitates web service orchestration, since it allows all the chained services to use URLs even if the services do not request them.
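
The precedence rule described above (an explicit client request wins over the process default) can be sketched as follows. The parser is a simplified illustration written for this page; it is not the PyWPS request parser, and both function names are hypothetical.

```python
def parse_response_document(value):
    """Map each output identifier to its requested asreference flag.

    The flag is True or False when the request states it, None otherwise.
    """
    flags = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        parts = item.split("@")
        identifier = parts[0].rstrip("=")
        flag = None
        for attribute in parts[1:]:
            key, _, val = attribute.partition("=")
            if key.lower() == "asreference":
                flag = val.lower() == "true"
        flags[identifier] = flag
    return flags


def return_as_reference(process_default, requested):
    """The client request overrides the asReference default of the process."""
    if requested is None:
        return process_default
    return requested


requested = parse_response_document(
    "vectorout=@asreference=true;rasterout=@asreference=false")
print(requested)
# A process that declares asReference=True for rasterout is still overridden:
print(return_as_reference(True, requested["rasterout"]))
```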

Execute

Here it comes: this is the actual working method. Here your data are analyzed, imported to GRASS, translated using GDAL, transformed using OGR, interpolated using R. Here it happens. In our case, the “analysis” is pretty simple.

    def execute(self):

        # just copy the input values to output values
        self.dataOut.setValue( self.dataIn.getValue() )
        self.textOut.setValue( self.textIn.getValue() )

        return

We used the getValue() method of the inputs and the setValue() method of the outputs to copy the input values to the output values. The get/set naming will be familiar to Java programmers.

Extra Stuff

Language translation

Multiple languages are supported through the self.lang.strings dictionary of the process. It is indexed first by language code and then by the original string (the process or input/output description), which acts as the dictionary key. For example:

class Process(WPSProcess):
    def __init__(self):
         WPSProcess.__init__(self,
              identifier = "ogrbuffer", # must be the same as the filename
              title="Buffer process using OGR",
              version = "0.1",
              storeSupported = "true",
              statusSupported = "true",
              metadata=[{'title':'buffer' ,'href':"http://foo/bar"}],
              abstract="Process demonstrating how to work with OGR inside PyWPS")
              
         self.data = self.addComplexInput(identifier = "data",
                                            title = "Input vector file",
                                            formats = [{'mimeType': 'text/xml', 'schema': 'http://schemas.opengis.net/gml/2.1.2/feature.xsd', 'encoding': 'UTF-8'}],
                                            metadata=[{'title':'buffer' ,'href':"http://foo/bar"}])
         self.size = self.addLiteralInput(identifier="size", 
                                           title="Buffer area size",
                                           type=type(0.0),
                                           allowedValues = [[0,10000]],
                                           metadata=[{'title':'number','href':'http://integer'}])
         self.output =self.addComplexOutput(identifier="buffer", 
                                            title="Buffered data",
                                            formats = [{'mimeType': 'text/xml', 'schema': 'http://schemas.opengis.net/gml/2.1.2/feature.xsd', 'encoding': 'UTF-8'}],
                                            metadata=[{'title':'bufferOut','href':'http://buffer/out'}],
                                            useMapscript=True)

         self.lang.strings["pt-PT"]["Buffer process using OGR"]="Processo tampao usando OGR"
         self.lang.strings["pt-PT"]["Buffered data"]="Dados tampao"

         self.lang.strings["es-ES"]["Buffer process using OGR"]="Proceso tampon usando OGR"
         self.lang.strings["es-ES"]["Buffered data"]="Datos tampon"

Remember to indicate the available languages in the configuration file: language=en-CA,pt-PT,es-ES
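
The two-level lookup (language code first, original string second) can be illustrated with a plain dictionary. This is a stand-alone sketch, not the PyWPS implementation: the translate() helper is hypothetical, and falling back to the original string when no translation exists is an assumption.

```python
from collections import defaultdict

# Same shape as self.lang.strings: language code -> original string -> translation.
strings = defaultdict(dict)
strings["pt-PT"]["Buffer process using OGR"] = "Processo tampao usando OGR"
strings["pt-PT"]["Buffered data"] = "Dados tampao"


def translate(text, language):
    """Return the translation of text, or text itself if none is defined."""
    return strings.get(language, {}).get(text, text)


print(translate("Buffered data", "pt-PT"))  # Dados tampao
print(translate("Buffered data", "en-CA"))  # Buffered data
```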

Reporting Error

If execute() returns anything other than None, the returned value is treated as an error and an exception is raised:

return "Oups! Something went wrong"

PyWPS also defines the OGC WPS exceptions, so they can be imported and raised if necessary.

from pywps.Exceptions import NoApplicableCode
raise NoApplicableCode("This is a WPS standard NoApplicableCode Exception")
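
The return-value convention can be pictured from the caller's side. The sketch below is not PyWPS code: ProcessFailed and run_process() are hypothetical names used only to show how a non-None return value might be turned into an exception.

```python
class ProcessFailed(Exception):
    """Hypothetical stand-in for the exception raised by the server."""


def run_process(execute):
    """Call execute() and treat any return value other than None as an error."""
    result = execute()
    if result is not None:
        raise ProcessFailed(result)
    return True


def good_execute():
    return None  # same as a bare "return"


def bad_execute():
    return "Oups! Something went wrong"


print(run_process(good_execute))
try:
    run_process(bad_execute)
except ProcessFailed as error:
    print("failed: %s" % error)
```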

Setting Status

WPS defines a status that is reported in the status response document; it can carry information such as a message or a progress calculation. In PyWPS, the status is set from within execute() by calling self.status.set(message, percent), where message is a string and percent is a number. The client can poll the process status as required to monitor progress.

def execute(self):

    self.status.set("Calculating", 0)
    # ... code and calculations ...
    self.status.set("Calculating", 25)
    # ... more code and calculations ...
    self.status.set("Calculating", 75)
    # ... even more code ...
    self.status.set("Finished", 100)
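
When the work splits into equal steps, the percentage can be computed instead of hard-coded. The sketch below uses a small recording class in place of self.status so that it runs without PyWPS; FakeStatus and run_steps() are made-up names, and only the set(message, percent) call mirrors the method described above.

```python
class FakeStatus(object):
    """Records calls the way self.status.set(message, percent) would receive them."""

    def __init__(self):
        self.history = []

    def set(self, message, percent):
        self.history.append((message, percent))


def run_steps(status, steps):
    """Run each step, reporting progress before it starts."""
    total = len(steps)
    for index, step in enumerate(steps):
        status.set("Calculating", int(100.0 * index / total))
        step()
    status.set("Finished", 100)


def do_nothing():
    pass


status = FakeStatus()
run_steps(status, [do_nothing] * 4)
for message, percent in status.history:
    print("%s %d%%" % (message, percent))
```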

Getting process ID

The process ID described here is used in all PyWPS versions up to 3.2. The current SVN version uses a different approach; see: Getting process ID (new)

Each process has a unique ID based on the Unix time. It is normally reported during an asynchronous request, for example:

http://localhost/wpsoutputs/pywps-128773472351.xml

This ID, 128773472351, can be accessed inside the execute method as follows:

def execute(self):
    ......
    # creationTime is a float, so it is better to cast it to int
    id=int(self.status.creationTime)

Please note that when an output is requested with asReference=True, PyWPS uses another ID based on the instance, something like 17277R2vWNz, which is added to the output name as a unique identifier:

<wps:ProcessOutputs>
    <wps:Output>
        <ows:Identifier>buffer</ows:Identifier>
        <ows:Title>Output vector data</ows:Title>
        <wps:Reference xlink:href="http://localhost/wpsoutputs/buffer-17277R2vWNz.xml" mimeType="text/xml"/>
    </wps:Output>
</wps:ProcessOutputs>

Getting process ID (new)

The PyWPS ID is based on Unix time, that is, the number of seconds since 1 January 1970, so there is a risk of ID duplication if a server receives more than one WPS request in the same second. In addition, the WPS outputs contain a random identifier, so it is impossible to relate the outputs to a specific request without reading the status document.

The current SVN trunk replaces the old Unix time with a UUID (Universally Unique IDentifier) that is specific to each process execution, meaning that two WPS requests made to the server at the "same time" will have different IDs.

The new PyWPS ID uses UUID version 1, which generates a unique identifier like this: f998a0a2-7982-11e1-8eda-a4badbfc32f4

For more information on UUIDs, please check the Python documentation for the uuid module and the Wikipedia article on UUIDs.

For example, a status document named like this:

pywps-f998a0a2-7982-11e1-8eda-a4badbfc32f4.xml

will be related to outputs that contain the same UUID, e.g.:

vectorout-f998a0a2-7982-11e1-8eda-a4badbfc32f4.gml
imageout-f998a0a2-7982-11e1-8eda-a4badbfc32f4.png

The PyWPS ID can now be obtained inside the execute() method of the process like this:

uuid=self.pywps.UUID

Finally, the MapServer map file that is generated when useMapscript=True and asReference=True (in the WPS request) uses the same UUID as the process.
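
The difference between the two ID schemes can be reproduced with the standard library alone. This sketch does not touch PyWPS: it simulates two requests arriving within the same second and compares a time-based ID with uuid.uuid1(). The file name pattern is copied from the examples above; the variable names are illustrative.

```python
import time
import uuid

# Two "requests" arriving within the same second.
now = time.time()
old_id_a = int(now)
old_id_b = int(now + 0.0001)

# Version 1 UUIDs generated back to back are still distinct.
new_id_a = uuid.uuid1()
new_id_b = uuid.uuid1()

print(old_id_a == old_id_b)  # the time-based IDs collide
print(new_id_a == new_id_b)  # the UUIDs do not

# Status document and outputs can share the same UUID in their names.
status_name = "pywps-%s.xml" % new_id_a
output_name = "vectorout-%s.gml" % new_id_a
print(status_name)
print(output_name)
```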

Temporary files

Each process that's executed gets a temporary directory created for it as its current working directory (this can be queried using getcwd()).

def execute(self):
    ....
    import os
    tmpFolderPath=os.getcwd()

tmpFolderPath will be something like /tmp/pywps-instancevBd_4i, depending on the temporary folder defined in the pywps.cfg file.

If a process needs to create temporary files, the best strategy is to use the tempfile module to create a file inside this current working directory.

In many cases we want to write a file and then start another process to read the file we have just written. If that process needs exclusive access to the file, we must close the file first, but tempfile's default behaviour is to delete the file on close. Passing delete=False when creating the temporary file requests a 'permanent' temporary file, which still gets cleaned up when our process ends and PyWPS removes the (temporary) current working directory.

We don't need to query the cwd, as we can specify it by passing dir="./" to tempfile; this is equivalent in effect to dir=os.getcwd().

def execute(self):
    ...
    import tempfile
    tmpFile=tempfile.NamedTemporaryFile("w", suffix=".tmp",dir="./", delete=False)

tmpFile.name will be something like: /tmp/pywps-instance80snjx/tmplWtoUE.tmp

delete=False is discussed above. The use of suffix is not necessary, it just looks better :)
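
The write, close, then read-from-another-process pattern can be tried outside PyWPS. In the sketch below, a directory made with tempfile.mkdtemp() stands in for the working directory that PyWPS would create, so dir=work_dir replaces the dir="./" used inside execute(); the reader one-liner is purely illustrative.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the per-process working directory that PyWPS creates.
work_dir = tempfile.mkdtemp(prefix="pywps-instance")

tmp_file = tempfile.NamedTemporaryFile("w", suffix=".tmp", dir=work_dir,
                                       delete=False)
tmp_file.write("hello from the process\n")
tmp_file.close()  # with delete=False the file is still on disk after close()

# A second process reads the file we have just closed.
reader = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
output = subprocess.check_output([sys.executable, "-c", reader, tmp_file.name])
text = output.decode("utf-8").strip()
print(text)
```

Without delete=False, the file would already be gone by the time the second process tries to open it.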

Common mistakes

Process isn't listed in PyWPS GetCapabilities or WSDL response

This is probably because Python can't parse the script file containing the process.

Try running the script from the command line (python returner.py) to see if Python reports any errors. Python newbies: remember that Python is indent-sensitive!
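
The same check can be done from Python itself. The sketch below parses source text with the standard ast module and reports the first syntax or indentation error; the helper name and the sample sources are made up for illustration.

```python
import ast


def syntax_error_in(source, filename="returner.py"):
    """Return a short error description, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as error:  # IndentationError is a subclass
        return "%s:%s: %s" % (filename, error.lineno, error.msg)
    return None


good_source = (
    "class Process(object):\n"
    "    def execute(self):\n"
    "        return\n"
)
bad_source = (
    "class Process(object):\n"
    "def execute(self):\n"  # body of the class is not indented
    "    return\n"
)

print(syntax_error_in(good_source))
print(syntax_error_in(bad_source))
```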

Execute code doesn't run, NoneType error in output

The response reports a NoneType error like the following, with stdout as the process output:

<ows:ExceptionText>
            'Process executed. Failed to build final response for output
[stdout]: coercing to Unicode: need string or buffer, NoneType found'
</ows:ExceptionText>

and any changes to the execute code seem to have no effect.

It could be that def execute(self): is indented to the same level as the WPSProcess.__init__(self, ...) call, so that execute is actually defined inside the class constructor instead of being a method of the class.
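
The effect of this indentation mistake can be shown with plain classes. In the sketch below, Base is a stand-in for WPSProcess, and the assumption that the inherited execute() does nothing and returns None is made only for the demonstration.

```python
class Base(object):
    """Stand-in for WPSProcess (assumed to provide a do-nothing execute)."""

    def execute(self):
        return None


class Broken(Base):
    def __init__(self):
        Base.__init__(self)

        def execute(self):  # indented one level too deep: a local function
            return "custom code ran"


class Fixed(Base):
    def __init__(self):
        Base.__init__(self)

    def execute(self):  # at class level: a real method
        return "custom code ran"


print("execute" in vars(Broken))  # False: the class never gets the method
print(Broken().execute())         # None: the inherited version runs instead
print(Fixed().execute())          # custom code ran
```

Because the nested function is never called, editing its body changes nothing, which matches the symptom described above.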

os.remove(fout.name+".base64") .. No such file or directory

This error occurs when an input is submitted as a URL and its content has not been fetched. PyWPS searches for "http://" and "http%3A%2F%2F" to determine whether the user is sending content as a URL. If for some reason these strings are not present, it continues and later crashes on the base64 transformation.

Therefore things like "www.foo.org/nicePic.tiff" and "http%3A//www.foo.org/nicePic.tiff" will raise an error. Python's urllib has a quote() function, but its default behaviour should be avoided, since it considers "/" a safe character and outputs "http%3A//". The correct solution is to use quote_plus():

import urllib
print urllib.quote_plus("http://www.bbc.co.uk")
http%3A%2F%2Fwww.bbc.co.uk

--Wikiadmin 15:45, 10 January 2011 (UTC)
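
The difference between quote() and quote_plus() described above can be checked directly. The listing in this section uses the Python 2 urllib module; the sketch below assumes Python 3, where the same functions live in urllib.parse. The looks_like_url() helper is hypothetical and only mirrors the two substrings that PyWPS is said to search for.

```python
from urllib.parse import quote, quote_plus

url = "http://www.foo.org/nicePic.tiff"

default_quoted = quote(url)    # "/" is treated as safe and left alone
plus_quoted = quote_plus(url)  # "/" is percent-encoded as well


def looks_like_url(value):
    """Mirror of the check described above (illustrative only)."""
    return "http://" in value or "http%3A%2F%2F" in value


print(default_quoted)                  # http%3A//www.foo.org/nicePic.tiff
print(plus_quoted)                     # http%3A%2F%2Fwww.foo.org%2FnicePic.tiff
print(looks_like_url(default_quoted))  # False
print(looks_like_url(plus_quoted))     # True
```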

v.in.ogr and r.in.gdal Can't find file

See: GRASS