Tuesday, November 24, 2015

Insert values directly into Hive tables

INTRODUCTION

Usually, we load the data into a Hive table from the HDFS file system while creating the table.
This post shows how to insert values into a Hive table directly.

ARRAY COLUMN

Create Table Query
DROP TABLE IF EXISTS student;
CREATE TABLE student(id STRING, name STRING, class STRING, subjects ARRAY<STRING>);

Recent Hive versions do not support inserting complex types such as ARRAY directly with the query below:
INSERT INTO student (id,name,class,subjects) VALUES ('100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physics','Chemistry'));

However, we can use the STACK function to insert these rows.
INSERT INTO TABLE student
SELECT STACK(3,
'100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physics','Chemistry'),
'200','Sekar','Computer Science',ARRAY('Mathematics','Computer Science','Physics','Chemistry'),
'300','Bala','Commerce',ARRAY('Political Science','Accounts','Commerce')
)
FROM empinfo
LIMIT 3;
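Once the rows are in, the ARRAY column can be read back as a whole or element by element. A quick check, assuming the student table defined above:

```sql
-- whole array per student
SELECT id, name, subjects FROM student;

-- a single element, by zero-based index
SELECT name, subjects[0] AS first_subject FROM student;
```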

Result

MAP COLUMN

Create Table Query
DROP TABLE IF EXISTS stud_mark1;
CREATE TABLE stud_mark1(id STRING, name STRING, class STRING, mark MAP<STRING,INT>);

Insert query.
INSERT INTO TABLE stud_mark1
SELECT STACK(2,
'100','Ranjith','Mathematics',map("Mathematics",78,"Biology",88,"Physics",89,"Chemistry",90),
'200','Sekar','Science',map("Botany",71,"Zoology",98,"Physics",79,"Chemistry",89)
)
FROM empinfo
LIMIT 2;
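Similarly, individual marks can be looked up by key from the MAP column. A quick check, using the keys inserted above:

```sql
-- look up one subject's mark per student
SELECT name, mark['Physics'] AS physics_mark FROM stud_mark1;
```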

Result

Thursday, November 12, 2015

Simple User Defined Functions (UDF) in Hive

INTRODUCTION

In this article, we are going to see how we can create our own UDF in Hive.
The Hive API enables us to create our own functions by extending its API classes.

In this post, I am going to write a UDF using the org.apache.hadoop.hive.ql.exec.UDF class.

SOFTWARES & TOOLS

  1. Eclipse IDE (Mars2)
  2. Java 7
  3. Maven

DATABASE & TABLES

Create a new database
Query
CREATE DATABASE IF NOT EXISTS ranjith;

Create a new table pointing to the HDFS location: "/ranjith/hive/data/emp/empinfo/"
Query
USE ranjith;
DROP TABLE IF EXISTS empinfo;
CREATE EXTERNAL TABLE empinfo(empid STRING, firstname STRING, lastname STRING, dob STRING, designation STRING, doj STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/ranjith/hive/data/emp/empinfo/";

Check the available data.

USECASE

Calculate the year of experience of the employee using their joining date.

IMPLEMENTATION

Create UDF


import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class EmpExperienceUDFString extends UDF {

 public String evaluate(Text empid, Text doj) {
  // write your logic here
  return null;
 }
}

Note the points below:
  1. Your UDF class should extend org.apache.hadoop.hive.ql.exec.UDF.
  2. Your UDF class must have an evaluate method, since Hive looks for this method.
  3. You can return a String, Map, or List from the evaluate method.

Create JAR & Copy to HDFS

  1. Create the jar using the Eclipse IDE, or
  2. Go to the project folder and run the command: mvn jar:jar
  3. Assume your jar name is: HiveUDF-1.0.jar
  4. Copy the jar to the HDFS location: /ranjith/hive/jars/

Run UDF

Now let us see how to run it. First, enter the commands below.

ADD JAR /ranjith/hive/jars/HiveUDF-1.0.jar;
CREATE TEMPORARY FUNCTION emp_exp_string AS 'jbr.hiveudf.EmpExperienceUDFString';
SELECT emp_exp_string(empid,doj) from empinfo;

Line 1: adds the jar to the classpath.
Line 2: creates a temporary function name for your UDF.
Line 3: gets the output of the UDF by calling the temporary function name.
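A temporary function disappears when the session ends. From Hive 0.13 onward, the function can also be registered permanently with CREATE FUNCTION; a sketch, assuming the jar sits at the HDFS path used above:

```sql
-- registered in the metastore, so it survives across sessions
CREATE FUNCTION emp_exp_string
  AS 'jbr.hiveudf.EmpExperienceUDFString'
  USING JAR 'hdfs:///ranjith/hive/jars/HiveUDF-1.0.jar';
```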

Now a MapReduce job will run and display the output as below.


Monday, November 9, 2015

Hive Commands

1. View table or schema details

Command
hive> DESCRIBE FORMATTED <tablename>;
hive> DESCRIBE EXTENDED <tablename>;

2. Alter Table

Command
hive> ALTER TABLE employee RENAME TO emp;
hive> ALTER TABLE emp ADD COLUMNS(salary INT);
hive> ALTER TABLE emp CHANGE salary sal INT;
hive> ALTER TABLE emp CHANGE sal sal STRING;

3. Add Jars into Hive Classpath

Command
hive> ADD JAR <jarname.jar>; -- looks in the current directory
hive> ADD JAR /test/jars/<jarname.jar>;
hive> LIST JARS; -- displays all the jars in the classpath

4. Load data to hive table from subdirectories

Command
hive> SET hive.mapred.supports.subdirectories=true;
hive> SET mapred.input.dir.recursive=true;

Create Table in Hive

INTRODUCTION

Hive is a data warehouse system on top of Hadoop that lets us query the data stored in HDFS (Hadoop Distributed File System). In this article, we are going to see how we can create tables in Hive.
There are two types of tables in Hive:
  1. Internal Tables
  2. External Tables
Before we see more details on external and internal tables, let me create a database. The syntax of Hive commands is similar to MySQL.
CREATE DATABASE IF NOT EXISTS ranjith;
USE ranjith;

INTERNAL TABLES

  • Internal (managed) tables are created as a directory inside Hive's warehouse directory; dropping the table also deletes the data.
Example
CREATE TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING);
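Data can then be loaded into the internal table with LOAD DATA; a sketch, using a hypothetical HDFS input path:

```sql
-- INPATH moves the file from its HDFS location into the table's warehouse directory
LOAD DATA INPATH '/user/ranjith/hive/data/emp.txt' INTO TABLE emp;
```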

EXTERNAL TABLES

  1. External tables are created by specifying a location in HDFS.
  2. The table is created with the 'EXTERNAL' keyword; dropping it removes only the metadata, not the data.
Example: Use HDFS Location
CREATE EXTERNAL TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/user/ranjith/hive/data/09Nov2015/";

Example: Use AWS S3 Location
... LOCATION "s3://user/ranjith/hive/data/09Nov2015/";
or
... LOCATION "s3n://user/ranjith/hive/data/09Nov2015/";