<!DOCTYPE html>
<!--
 | Generated by Apache Maven Doxia Site Renderer 1.8.1 from target/generated-site/markdown/udf.md at 2018-10-12
 | Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta name="Date-Revision-yyyymmdd" content="20181012" />
    <meta http-equiv="Content-Language" content="en" />
    <title>AsterixDB &#x2013; User-defined Functions</title>
    <link rel="stylesheet" href="./css/apache-maven-fluido-1.7.min.css" />
    <link rel="stylesheet" href="./css/site.css" />
    <link rel="stylesheet" href="./css/print.css" media="print" />
    <script type="text/javascript" src="./js/apache-maven-fluido-1.7.min.js"></script>

  </head>
  <body class="topBarDisabled">
    <div class="container-fluid">
      <div id="banner">
        <div class="pull-left"><a href="./" id="bannerLeft"><img src="images/asterixlogo.png" alt="AsterixDB"/></a></div>
        <div class="pull-right"></div>
        <div class="clear"><hr/></div>
      </div>

      <div id="breadcrumbs">
        <ul class="breadcrumb">
        <li id="publishDate">Last Published: 2018-10-12</li>
        <li id="projectVersion" class="pull-right">Version: 0.9.4</li>
        <li class="pull-right"><a href="index.html" title="Documentation Home">Documentation Home</a></li>
        </ul>
      </div>
      <div class="row-fluid">
        <div id="leftColumn" class="span2">
          <div class="well sidebar-nav">
            <ul class="nav nav-list">
              <li class="nav-header">Get Started - Installation</li>
              <li><a href="ncservice.html" title="Option 1: using NCService"><span class="none"></span>Option 1: using NCService</a></li>
              <li><a href="ansible.html" title="Option 2: using Ansible"><span class="none"></span>Option 2: using Ansible</a></li>
              <li><a href="aws.html" title="Option 3: using Amazon Web Services"><span class="none"></span>Option 3: using Amazon Web Services</a></li>
              <li class="nav-header">AsterixDB Primer</li>
              <li><a href="sqlpp/primer-sqlpp.html" title="Using SQL++"><span class="none"></span>Using SQL++</a></li>
              <li class="nav-header">Data Model</li>
              <li><a href="datamodel.html" title="The Asterix Data Model"><span class="none"></span>The Asterix Data Model</a></li>
              <li class="nav-header">Queries</li>
              <li><a href="sqlpp/manual.html" title="The SQL++ Query Language"><span class="none"></span>The SQL++ Query Language</a></li>
              <li><a href="sqlpp/builtins.html" title="Builtin Functions"><span class="none"></span>Builtin Functions</a></li>
              <li class="nav-header">API/SDK</li>
              <li><a href="api.html" title="HTTP API"><span class="none"></span>HTTP API</a></li>
              <li><a href="csv.html" title="CSV Output"><span class="none"></span>CSV Output</a></li>
              <li class="nav-header">Advanced Features</li>
              <li><a href="aql/externaldata.html" title="Accessing External Data"><span class="none"></span>Accessing External Data</a></li>
              <li><a href="feeds.html" title="Data Ingestion with Feeds"><span class="none"></span>Data Ingestion with Feeds</a></li>
              <li class="active"><a href="#"><span class="none"></span>User Defined Functions</a></li>
              <li><a href="sqlpp/filters.html" title="Filter-Based LSM Index Acceleration"><span class="none"></span>Filter-Based LSM Index Acceleration</a></li>
              <li><a href="sqlpp/fulltext.html" title="Support of Full-text Queries"><span class="none"></span>Support of Full-text Queries</a></li>
              <li><a href="sqlpp/similarity.html" title="Support of Similarity Queries"><span class="none"></span>Support of Similarity Queries</a></li>
              <li class="nav-header">Deprecated</li>
              <li><a href="aql/primer.html" title="AsterixDB Primer: Using AQL"><span class="none"></span>AsterixDB Primer: Using AQL</a></li>
              <li><a href="aql/manual.html" title="Queries: The Asterix Query Language (AQL)"><span class="none"></span>Queries: The Asterix Query Language (AQL)</a></li>
              <li><a href="aql/builtins.html" title="Queries: Builtin Functions (AQL)"><span class="none"></span>Queries: Builtin Functions (AQL)</a></li>
            </ul>
            <hr />
            <div id="poweredBy">
              <div class="clear"></div>
              <div class="clear"></div>
              <div class="clear"></div>
              <div class="clear"></div>
              <a href="./" title="AsterixDB" class="builtBy"><img class="builtBy" alt="AsterixDB" src="images/asterixlogo.png" /></a>
            </div>
          </div>
        </div>
        <div id="bodyColumn" class="span10" >
<!--
 ! Licensed to the Apache Software Foundation (ASF) under one
 ! or more contributor license agreements.  See the NOTICE file
 ! distributed with this work for additional information
 ! regarding copyright ownership.  The ASF licenses this file
 ! to you under the Apache License, Version 2.0 (the
 ! "License"); you may not use this file except in compliance
 ! with the License.  You may obtain a copy of the License at
 !
 !   http://www.apache.org/licenses/LICENSE-2.0
 !
 ! Unless required by applicable law or agreed to in writing,
 ! software distributed under the License is distributed on an
 ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ! KIND, either express or implied.  See the License for the
 ! specific language governing permissions and limitations
 ! under the License.
 !-->
<h1>User-defined Functions</h1>
<div class="section">
<h2><a name="Table_of_Contents"></a><a name="atoc" id="#toc">Table of Contents</a></h2>
<ul>

<li><a href="#introduction">Introduction</a></li>
<li><a href="#installingUDF">Installing a UDF Library</a></li>
<li><a href="#UDFOnFeeds">Attaching a UDF to Data Feeds</a></li>
<li><a href="#udfConfiguration">A Quick Look at the UDF Configuration</a></li>
<li><a href="#uninstall">Uninstalling a UDF Library</a></li>
</ul></div>
<div class="section">
<h2><a name="Introduction"></a><a name="introduction">Introduction</a></h2>
<p>Apache AsterixDB supports two languages for writing user-defined functions (UDFs): SQL++ and Java. A UDF encapsulates data-processing logic so that it can be written once and invoked repeatedly. For SQL++ functions, see <a href="sqlpp/manual.html#Functions">SQL++ Functions</a>. This document focuses on Java UDFs: how to install, invoke, and uninstall a Java function library using the Ansible scripts that we provide.</p></div>
<div class="section">
<h2><a name="Installing_an_UDF_Library"></a><a name="installingUDF">Installing a UDF Library</a></h2>
<p>UDFs have to be installed offline, that is, while the AsterixDB instance is stopped. This section assumes that you have followed the preceding <a href="ansible.html">Ansible installation instructions</a> to deploy an AsterixDB instance on your local machine or cluster. To install a UDF library:</p>
<ul>

<li>

<p>Step 1: Stop the AsterixDB instance if it is ACTIVE.</p>

<div>
<div>
<pre class="source">$ bin/stop.sh
</pre></div></div>
</li>
<li>

<p>Step 2: Deploy the UDF package, replacing <tt>DATAVERSE_NAME</tt>, <tt>LIBRARY_NAME</tt>, and <tt>UDF_PACKAGE_PATH</tt> with the target dataverse, the library name, and the path to the compiled UDF package.</p>

<div>
<div>
<pre class="source">$ bin/udf.sh -m i -d DATAVERSE_NAME -l LIBRARY_NAME -p UDF_PACKAGE_PATH
</pre></div></div>
</li>
<li>

<p>Step 3: Start AsterixDB.</p>

<div>
<div>
<pre class="source">$ bin/start.sh
</pre></div></div>
</li>
</ul>
<p>After AsterixDB starts, you can use the following query to check whether your UDFs have been successfully registered with the system:</p>

<div>
<div>
<pre class="source">SELECT * FROM Metadata.`Function`;
</pre></div></div>
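<p>The same query can also be submitted programmatically through the <a href="api.html">HTTP API</a>. Below is a minimal sketch with <tt>java.net.http.HttpClient</tt> (Java 11 or later). It assumes a local instance whose query service listens on the default port 19002 at <tt>/query/service</tt> and accepts the statement as a form-encoded <tt>statement</tt> parameter; adjust the URI for your deployment. The class name is hypothetical.</p>

<div>
<div>
<pre class="source">
```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class MetadataCheckSketch {
    // Assumed query-service endpoint of a local instance; change host and port for a cluster.
    static final URI QUERY_SERVICE = URI.create("http://localhost:19002/query/service");

    // Form-encodes a SQL++ statement as the "statement" request parameter.
    static String formBody(String statement) {
        return "statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8);
    }

    // POSTs the statement and returns the raw JSON response body.
    static String run(String statement) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(QUERY_SERVICE)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(statement)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```
</pre></div></div>

<p>Calling <tt>MetadataCheckSketch.run(&quot;SELECT * FROM Metadata.`Function`;&quot;)</tt> returns the JSON response as a string; the registered functions should appear in its <tt>results</tt> field.</p>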

<p>The AsterixDB source release includes several sample UDFs that you can try out. To get the compiled UDF package, build AsterixDB from source; the package can be found under the <tt>asterixdb-external</tt> sub-project. Assuming that these UDFs have been installed into the <tt>udfs</tt> dataverse as the <tt>testlib</tt> library, the following example uses the sample UDF <tt>mysum</tt> to compute the sum of two integers:</p>

<div>
<div>
<pre class="source">use udfs;

testlib#mysum(3,4);
</pre></div></div>
</div>
<div class="section">
<h2><a name="Attaching_a_UDF_on_Data_Feeds"></a><a name="UDFOnFeeds" id="UDFOnFeeds">Attaching a UDF to Data Feeds</a></h2>
<p>In <a href="feeds.html">Data Ingestion with Feeds</a>, we introduced an efficient way for users to get data into AsterixDB. In some use cases, users may want to pre-process the incoming data before storing it in the dataset. To meet this need, AsterixDB allows the user to attach a UDF to the ingestion pipeline. Following the example in <a href="feeds.html">Data Ingestion</a>, this section shows how to attach a UDF that extracts the user names mentioned in the incoming Tweet text and stores the processed Tweets in a dataset.</p>
<p>We start by creating the datatype and dataset that will be used for the feed and the UDF. Keep in mind that data flows from the feed to the UDF and then to the dataset. This means that the feed&#x2019;s datatype should be the same as the UDF&#x2019;s input type, and the UDF&#x2019;s output datatype should be the same as the dataset&#x2019;s datatype. Users should therefore make sure that the datatypes in the UDF configuration are consistent. For simplicity, users can also take advantage of open datatypes in AsterixDB by declaring only a minimal description of the data. Here we use an open datatype:</p>

<div>
<div>
<pre class="source">use udfs;

create type TweetType if not exists as open {
    id: int64
};

create dataset ProcessedTweets(TweetType) primary key id;
</pre></div></div>

<p>Because <tt>TweetType</tt> is an open datatype, processed Tweets can still be stored in the dataset after the UDF annotates them with an extra attribute. Given the datatype and dataset above, we can create a Twitter feed with the same datatype. Please refer to <a href="feeds.html">Data Ingestion</a> if you have any trouble creating feeds.</p>

<div>
<div>
<pre class="source">use udfs;

create feed TwitterFeed with {
    &quot;adapter-name&quot;: &quot;push_twitter&quot;,
    &quot;type-name&quot;: &quot;TweetType&quot;,
    &quot;format&quot;: &quot;twitter-status&quot;,
    &quot;consumer.key&quot;: &quot;************&quot;,
    &quot;consumer.secret&quot;: &quot;************&quot;,
    &quot;access.token&quot;: &quot;**********&quot;,
    &quot;access.token.secret&quot;: &quot;*************&quot;
};
</pre></div></div>

<p>After creating the feed, we attach the UDF to the feed pipeline and start the feed with the following statements:</p>

<div>
<div>
<pre class="source">use udfs;

connect feed TwitterFeed to dataset ProcessedTweets apply function udfs#addMentionedUsers;

start feed TwitterFeed;
</pre></div></div>

<p>You can check the annotated Tweets by querying the <tt>ProcessedTweets</tt> dataset:</p>

<div>
<div>
<pre class="source">SELECT * FROM ProcessedTweets LIMIT 10;
</pre></div></div>
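<p>The consistency rule stated at the start of this section (the feed&#x2019;s datatype must match the UDF&#x2019;s input type, and the UDF&#x2019;s output type must match the dataset&#x2019;s datatype) can be written down as a small check. The helper below is hypothetical and not part of AsterixDB; it compares datatype names as plain strings and does not model how AsterixDB itself matches types.</p>

<div>
<div>
<pre class="source">
```java
class FeedTypeCheckSketch {
    // Throws if data could not flow feed -> UDF -> dataset with matching datatype names.
    static void requireConsistent(String feedType, String udfArgumentType,
                                  String udfReturnType, String datasetType) {
        // The records the feed produces are the UDF's input.
        if (!feedType.equals(udfArgumentType)) {
            throw new IllegalArgumentException(
                    "feed type " + feedType + " does not match UDF argument type " + udfArgumentType);
        }
        // The UDF's output is what gets stored in the dataset.
        if (!udfReturnType.equals(datasetType)) {
            throw new IllegalArgumentException(
                    "UDF return type " + udfReturnType + " does not match dataset type " + datasetType);
        }
    }
}
```
</pre></div></div>

<p>In this example, the feed&#x2019;s <tt>type-name</tt>, the UDF&#x2019;s <tt>argument_type</tt> and <tt>return_type</tt> in its configuration, and the datatype of <tt>ProcessedTweets</tt> are all <tt>TweetType</tt>, so <tt>requireConsistent(&quot;TweetType&quot;, &quot;TweetType&quot;, &quot;TweetType&quot;, &quot;TweetType&quot;)</tt> returns without throwing.</p>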
</div>
<div class="section">
<h2><a name="A_quick_look_of_the_UDF_configuration"></a><a name="udfConfiguration">A Quick Look at the UDF Configuration</a></h2>
<p>AsterixDB uses an XML configuration file to describe UDFs. Users can use this file to define their compiled UDFs and reuse them for different purposes. Here is a snippet of the configuration used in our <a href="#UDFOnFeeds">previous example</a>:</p>

<div>
<div>
<pre class="source">&lt;libraryFunction&gt;
    &lt;name&gt;addMentionedUsers&lt;/name&gt;
    &lt;function_type&gt;SCALAR&lt;/function_type&gt;
    &lt;argument_type&gt;TweetType&lt;/argument_type&gt;
    &lt;return_type&gt;TweetType&lt;/return_type&gt;
    &lt;definition&gt;org.apache.asterix.external.library.AddMentionedUsersFactory&lt;/definition&gt;
    &lt;parameters&gt;text&lt;/parameters&gt;
&lt;/libraryFunction&gt;
</pre></div></div>

<p>The fields of the configuration file are as follows:</p>

<div>
<div>
<pre class="source">name:          The name used to invoke the function.
function_type: The type of the function (SCALAR in the example above).
argument_type: The datatype of the arguments passed in. If there is more than one parameter, separate them with commas, e.g., AINT32,AINT32.
return_type:   The datatype of the returned value.
definition:    A reference to the function factory.
parameters:    The parameters passed into the function.
</pre></div></div>
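<p>Because <tt>argument_type</tt> packs one datatype per argument into a single comma-separated value, a tool that inspects configuration files has to split it. The sketch below is hypothetical and only illustrates the format described above; it also tolerates whitespace around the commas, which AsterixDB itself may not accept, so writing the list without spaces, as in <tt>AINT32,AINT32</tt>, is the safer choice.</p>

<div>
<div>
<pre class="source">
```java
class ArgumentTypesSketch {
    // Splits an argument_type value such as "AINT32,AINT32" into one datatype name per argument.
    static String[] parse(String argumentType) {
        String trimmed = argumentType.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Allow optional whitespace around each comma.
        return trimmed.split("\\s*,\\s*");
    }

    // Hypothetical sanity check: one declared datatype per argument passed at invocation time.
    static boolean arityMatches(String argumentType, int argumentCount) {
        return parse(argumentType).length == argumentCount;
    }
}
```
</pre></div></div>

<p>For the feed example, <tt>parse(&quot;TweetType&quot;)</tt> yields a single datatype, while <tt>parse(&quot;AINT32,AINT32&quot;)</tt> yields two entries, one per argument.</p>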

<p>In our feeds example, we passed in <tt>&quot;text&quot;</tt> as a parameter so that the function knows which field holds the Tweet text. If the Twitter API were to change its field names in the future, we could accommodate the change by simply modifying the configuration file instead of recompiling the whole UDF package. The same mechanism helps in other use cases, for instance when a user has a machine learning algorithm with several different trained model files and wants to choose among them without recompiling. If you are interested, you can find more examples <a class="externalLink" href="https://github.com/apache/asterixdb/tree/master/asterixdb/asterix-external-data/src/test/java/org/apache/asterix/external/library">here</a>.</p></div>
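<p>The effect of the <tt>parameters</tt> field can be illustrated with a minimal plain-Java sketch: the function object receives the field name from the configuration and reads that field from each incoming record, so the compiled code never hard-codes <tt>text</tt>. This is not AsterixDB&#x2019;s external-function API, and it is not the code behind <tt>AddMentionedUsersFactory</tt>; the class name, the <tt>FieldReader</tt> interface standing in for a record, and the mention syntax (&quot;@&quot; followed by letters, digits, or underscores) are all hypothetical.</p>

<div>
<div>
<pre class="source">
```java
import java.util.regex.Pattern;

class AddMentionsSketch {
    // Hypothetical stand-in for an incoming record: returns a named string field, or null if absent.
    interface FieldReader {
        String read(String fieldName);
    }

    // Assumed mention syntax: "@" followed by one or more letters, digits, or underscores.
    private static final Pattern MENTION = Pattern.compile("@(\\w+)");

    // Name of the field holding the Tweet text, taken from the configuration's parameters element.
    private final String textField;

    AddMentionsSketch(String textField) {
        this.textField = textField;
    }

    // Returns the mentioned user names (without "@") in order of appearance.
    String[] evaluate(FieldReader tweet) {
        String text = tweet.read(textField);
        if (text == null) {
            return new String[0];
        }
        return MENTION.matcher(text).results()   // Matcher.results() requires Java 9+
                .map(match -> match.group(1))    // keep only the captured user name
                .toArray(String[]::new);
    }
}
```
</pre></div></div>

<p>For example, <tt>new AddMentionsSketch(&quot;text&quot;)</tt> applied to a record whose <tt>text</tt> field is <tt>RT @alice: hi @bob_2!</tt> yields <tt>alice</tt> and <tt>bob_2</tt>. If the source renamed that field, only the constructor argument, that is, the configuration value, would change. This simple pattern would also match the domain part of an e-mail address, which a real implementation would need to handle.</p>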
<div class="section">
<h2><a name="Unstalling_an_UDF_Library"></a><a name="uninstall">Uninstalling a UDF Library</a></h2>
<p>To uninstall a UDF library, put AsterixDB into <tt>INACTIVE</tt> mode (for example, by stopping the instance with <tt>bin/stop.sh</tt>) and run the following command:</p>

<div>
<div>
<pre class="source">$ bin/udf.sh -m u -d DATAVERSE_NAME -l LIBRARY_NAME
</pre></div></div></div>
        </div>
      </div>
    </div>
    <hr/>
    <footer>
      <div class="container-fluid">
        <div class="row-fluid">
<div class="row-fluid">Apache AsterixDB, AsterixDB, Apache, the Apache
        feather logo, and the Apache AsterixDB project logo are either
        registered trademarks or trademarks of The Apache Software
        Foundation in the United States and other countries.
        All other marks mentioned may be trademarks or registered
        trademarks of their respective owners.
        </div>
        </div>
      </div>
    </footer>
  </body>
</html>