Thursday, November 12, 2015

Simple User Defined Functions (UDF) in Hive

INTRODUCTION

In this article, we are going to see how we can create our own UDF in Hive.
The Hive API lets us create our own functions by extending its classes.

In this post, I am going to write a UDF using the org.apache.hadoop.hive.ql.exec.UDF class.

SOFTWARES & TOOLS

  1. Eclipse IDE (Mars2)
  2. Java 7
  3. Maven

DATABASE & TABLES

Create a new database
Query
CREATE DATABASE IF NOT EXISTS ranjith;

Create a new external table pointing to the HDFS location "/ranjith/hive/data/emp/empinfo/"
Query
USE ranjith;
DROP TABLE IF EXISTS empinfo;
CREATE EXTERNAL TABLE empinfo(empid STRING, firstname STRING, lastname STRING, dob STRING, designation STRING, doj STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/ranjith/hive/data/emp/empinfo/";

Check the available data.

USECASE

Calculate the years of experience of each employee using their joining date (doj).
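As a rough illustration of the calculation itself, assume the doj column is stored as text in yyyy-MM-dd format (an assumption; adjust the pattern to whatever your data actually uses). The class and method names below are purely illustrative:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class ExperienceCalc {

 // Assumed date pattern; change it to match how doj is stored in empinfo.
 private static final String DATE_PATTERN = "yyyy-MM-dd";

 public static int yearsOfExperience(String doj) throws ParseException {
  SimpleDateFormat sdf = new SimpleDateFormat(DATE_PATTERN);
  Calendar joined = Calendar.getInstance();
  joined.setTime(sdf.parse(doj));

  Calendar now = Calendar.getInstance();
  int years = now.get(Calendar.YEAR) - joined.get(Calendar.YEAR);
  // Subtract one year if the joining anniversary has not yet occurred this year.
  if (now.get(Calendar.DAY_OF_YEAR) < joined.get(Calendar.DAY_OF_YEAR)) {
   years--;
  }
  return years;
 }
}

For example, with today's date in November 2015, a doj of 2010-06-15 would give 5 years of experience.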

IMPLEMENTATION

Create UDF


import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class EmpExperienceUDFString extends UDF {

 public String evaluate(Text empid, Text doj) {
  // write your logic here and return the computed experience
  return null;
 }
}

Note the following points:
  1. Your UDF class should extend org.apache.hadoop.hive.ql.exec.UDF.
  2. Your UDF class must have an evaluate method, since Hive looks for this method.
  3. You can return a String, Map, or List from the evaluate method.
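Putting these points together, here is a minimal sketch of a possible implementation. The package and class names (jbr.hiveudf.EmpExperienceUDFString) match the ones used in the Hive commands later in this post; the yyyy-MM-dd date pattern and the exact output string are assumptions you may want to change:

package jbr.hiveudf;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class EmpExperienceUDFString extends UDF {

 // Assumed date pattern for the doj column; adjust to match your data.
 private static final String DATE_PATTERN = "yyyy-MM-dd";

 public String evaluate(Text empid, Text doj) {
  if (empid == null || doj == null) {
   return null;
  }
  try {
   SimpleDateFormat sdf = new SimpleDateFormat(DATE_PATTERN);
   Calendar joined = Calendar.getInstance();
   joined.setTime(sdf.parse(doj.toString()));

   Calendar now = Calendar.getInstance();
   int years = now.get(Calendar.YEAR) - joined.get(Calendar.YEAR);
   // Subtract one year if the joining anniversary has not yet occurred this year.
   if (now.get(Calendar.DAY_OF_YEAR) < joined.get(Calendar.DAY_OF_YEAR)) {
    years--;
   }
   return empid.toString() + "\t" + years;
  } catch (ParseException e) {
   // Return null for rows whose doj cannot be parsed.
   return null;
  }
 }
}

Here the UDF returns the empid and the computed years of experience as a tab-separated string; you could just as well return only the number, or a Map or List.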

Create JAR & Copy to HDFS

  1. Create the jar using the Eclipse IDE, or
  2. Go to the project folder and run the command: mvn jar:jar
  3. Assume your jar name is HiveUDF-1.0.jar.
  4. Copy the jar to the HDFS location /ranjith/hive/jars/.
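It can also help to sanity-check the evaluate method locally with a plain main method before deploying the jar. This is a quick check against the sketch above, not a required step, and the sample values are made up:

import org.apache.hadoop.io.Text;

import jbr.hiveudf.EmpExperienceUDFString;

public class EmpExperienceUDFStringCheck {

 public static void main(String[] args) {
  // Hypothetical sample values; use an empid and doj that match your data.
  EmpExperienceUDFString udf = new EmpExperienceUDFString();
  System.out.println(udf.evaluate(new Text("101"), new Text("2010-06-15")));
 }
}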

Run UDF

Now we will see how to run it. First, enter the commands below:

ADD JAR /ranjith/hive/jars/HiveUDF-1.0.jar;
CREATE TEMPORARY FUNCTION emp_exp_string AS 'jbr.hiveudf.EmpExperienceUDFString';
SELECT emp_exp_string(empid,doj) from empinfo;

Line 1: adds the jar to the classpath.
Line 2: creates a temporary function name for your UDF.
Line 3: gets the output of the UDF by calling the temporary function name.

A MapReduce job will now run and display the output for each row.

