[ASTERIXDB-2894] Update UDF docs

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Update API examples to include type
- Include details about typing and execution model

Change-Id: Id9780d72960f9094c29f7f5766185782069fe7cf
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/11225
Reviewed-by: Ian Maxon <imaxon@uci.edu>
Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
diff --git a/asterixdb/asterix-doc/src/main/user-defined_function/udf.md b/asterixdb/asterix-doc/src/main/user-defined_function/udf.md
index 7ca23bb..655113b 100644
--- a/asterixdb/asterix-doc/src/main/user-defined_function/udf.md
+++ b/asterixdb/asterix-doc/src/main/user-defined_function/udf.md
@@ -19,7 +19,7 @@
 
 ## <a name="introduction">Introduction</a>
 
-Apache AsterixDB supports three languages for writing user-defined functions (UDFs): SQL++, Java and Python
+Apache AsterixDB supports three languages for writing user-defined functions (UDFs): SQL++, Java, and Python.
 A user can encapsulate data processing logic into a UDF and invoke it
 later repeatedly. For SQL++ functions, a user can refer to [SQL++ Functions](sqlpp/manual.html#Functions)
 for their usages. This document will focus on UDFs in languages other than SQL++
@@ -27,8 +27,10 @@
 
 ## <a name="authentication">Endpoints and Authentication</a>
 
-The UDF endpoint is not enabled by default until authentication has been configured properly. To enable it, we
-will need to set the path to the credential file and populate it with our username and password.
+The UDF API endpoint used to deploy functions remains disabled until authentication has been configured properly.
+Even when enabled, the endpoint is accessible only on the loopback interface of each NC, which restricts access.
+
+To enable it, we need to set the path to the credential file and populate it with our username and password.
 
 The credential file is a simple `/etc/passwd` style text file with usernames and corresponding `bcrypt` hashed and salted
 passwords. You can populate this on your own if you would like, but the `asterixhelper` utility can write the entries as
@@ -50,9 +52,7 @@
 ## <a name="installingUDF">Installing a Java UDF Library</a>
 
 To install a UDF package to the cluster, we need to send a Multipart Form-data HTTP request to the `/admin/udf` endpoint
-of the CC at the normal API port (`19002` by default). The request should use HTTP Basic authentication. This means your
-credentials will *not* be obfuscated or encrypted *in any way*, so submit to this endpoint over localhost or a network
-where you know your traffic is safe from eavesdropping. Any suitable tool will do, but for the example here I will use
+of the CC at the normal API port (`19004` by default). Any suitable tool will do, but for the example here I will use
 `curl` which is widely available.
 
 For example, to install a library with the following criteria:
@@ -65,7 +65,7 @@
 
 we would execute
 
-    curl -v -u admin:admin -X POST -F 'data=@./lib.zip' localhost:19002/admin/udf/udfs/testlib
+    curl -v -u admin:admin -X POST -F 'data=@./lib.zip' -F 'type=java' localhost:19004/admin/udf/udfs/testlib
 
 Any response other than `200` indicates an error in deployment.
 
@@ -119,7 +119,7 @@
 
 Then, deploy it the same as the Java UDF was, with the library name `pylib` in `udfs` dataverse
 
-    curl -v -u admin:admin -X POST -F 'data=@./lib.pyz' localhost:19002/admin/udf/udfs/pylib
+    curl -v -u admin:admin -X POST -F 'data=@./lib.pyz' -F 'type=python' localhost:19004/admin/udf/udfs/pylib
 
 With the library deployed, we can define a function within it for use. For example, to expose the Python function
 `sentiment` in the module `sentiment_mod` in the class `sent_model`, the `CREATE FUNCTION` would be as follows
@@ -131,14 +131,14 @@
       AS "sentiment_mod", "sent_model.sentiment" AT pylib;
 
 By default, AsterixDB will treat all external functions as deterministic. It means the function must return the same
-result for the same input, irrespective of when or how many times the function is called on that input. 
-This particular function behaves the same on each input, so it satisfies the deterministic property. 
+result for the same input, irrespective of when or how many times the function is called on that input.
+This particular function behaves the same on each input, so it satisfies the deterministic property.
 This enables better optimization of queries including this function.
-If a function is not deterministic then it should be declared as such by using `WITH` sub-clause:
+If a function is not deterministic then it should be declared as such by using a `WITH` sub-clause:
 
     USE udfs;
 
-    CREATE FUNCTION sentiment(a)
+    CREATE FUNCTION sentiment(text)
       AS "sentiment_mod", "sent_model.sentiment" AT pylib
       WITH { "deterministic": false }
 
@@ -155,6 +155,43 @@
     SELECT t.msg as msg, sentiment(t.msg) as sentiment
     FROM Tweets t;
 
+## <a name="pytpes">Python Type Mappings</a>
+
+Currently only a subset of AsterixDB types are supported in Python UDFs. The supported types are as follows:
+
+- Integer types (int8, int16, int32, int64)
+- Floating point types (float, double)
+- String
+- Boolean
+- Arrays, Sets (cast to lists)
+- Objects (cast to dict)
+
+Unsupported types can first be cast to one of these types in SQL++ so that they can be passed to a Python UDF.
+
+## <a name="execution">Execution Model For UDFs</a>
+
+AsterixDB queries are deployed across the cluster as Hyracks jobs. A Hyracks job has a lifecycle that can be simplified
+for the purposes of UDFs to:
+ - A pre-run phase which allocates resources, `open`
+ - The time during which the job has data flowing through it, `nextFrame`
+ - Cleanup and shutdown, `close`
+
+If a SQL++ function is defined as a member of a class in the library, the class will be instantiated
+during `open`. The instance will exist in memory for the lifetime of the query. Therefore, if your function needs to reference
+files or other data that would be costly to load per call, load them once into a member variable that is initialized in the
+constructor of the object, so that the cost is paid once per instance rather than once per call.
+
+For each function invoked during a query, there will be an independent instance of the function per data partition. This
+means that the function must not rely on any global state or make assumptions about the layout
+of the data. The execution of the function will be parallel to the same degree as the level of data parallelism in the
+cluster.
+
+After initialization, the function bound in the SQL++ function definition is called once per tuple during the query
+execution (i.e. `nextFrame`). Unless the function specifies `null-call` in the `WITH` clause, `NULL` values will be
+skipped.
+
+At the close of the query, the function is torn down and not re-used in any way. All functions should assume that
+nothing will persist in-memory outside of the lifetime of a query, and any behavior contrary to this is undefined.
 
 ## <a id="UDFOnFeeds">Attaching a UDF on Data Feeds</a>
 
@@ -239,7 +276,7 @@
 functions declared with the library are removed. First we'll drop the function we declared earlier:
 
     USE udfs;
-    DROP FUNCTION mysum@2;
+    DROP FUNCTION mysum(a,b);
 
 Then issue the proper `DELETE` request