Tuesday, November 24, 2015

Insert values directly into Hive tables

INTRODUCTION

Usually, we load the data into a Hive table from the HDFS file system while creating the table.
This post shows how to insert values into a Hive table directly.

ARRAY COLUMN

Create Table Query
DROP TABLE IF EXISTS student;
CREATE TABLE student(id STRING, name STRING, class STRING, subjects ARRAY<STRING>);

Recent Hive versions do not support inserting complex types such as ARRAY directly with the query below:
INSERT INTO student (id,name,class,subjects) VALUES ('100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physics','Chemistry'));

However, we can use the STACK function to insert these rows.
INSERT INTO TABLE student
SELECT STACK(3,
'100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physics','Chemistry'),
'200','Sekar','Computer Science',ARRAY('Mathematics','Computer Science','Physics','Chemistry'),
'300','Bala','Commerce',ARRAY('Political Science','Accounts','Commerce')
)
FROM empinfo
LIMIT 3;
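Once the rows are in, the ARRAY column can be read back as a whole or element by element. A quick check, assuming the student table defined above:

```sql
-- whole array per student
SELECT id, name, subjects FROM student;

-- a single element, by zero-based index
SELECT name, subjects[0] AS first_subject FROM student;
```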

Result

MAP COLUMN

Create Table Query
DROP TABLE IF EXISTS stud_mark1;
CREATE TABLE stud_mark1(id STRING, name STRING, class STRING, mark MAP<STRING,INT>);

Insert query.
INSERT INTO TABLE stud_mark1
SELECT STACK(2,
'100','Ranjith','Mathematics',map("Mathematics",78,"Biology",88,"Physics",89,"Chemistry",90),
'200','Sekar','Science',map("Botany",71,"Zoology",98,"Physics",79,"Chemistry",89)
)
FROM empinfo
LIMIT 2;
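Similarly, individual marks can be looked up by key from the MAP column. A quick check, using the keys inserted above:

```sql
-- look up one subject's mark per student
SELECT name, mark['Physics'] AS physics_mark FROM stud_mark1;
```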

Result

Thursday, November 12, 2015

Simple User Defined Functions (UDF) in Hive

INTRODUCTION

In this article, we are going to see how we can create our own UDF in Hive.
The Hive API enables us to create our own functions by extending its API classes.

In this post, I am going to write a UDF using the org.apache.hadoop.hive.ql.exec.UDF class.

SOFTWARES & TOOLS

  1. Eclipse IDE (Mars2)
  2. Java 7
  3. Maven

DATABASE & TABLES

Create a new database
Query
CREATE DATABASE IF NOT EXISTS ranjith;

Create a new table pointing to the HDFS location: "/ranjith/hive/data/emp/empinfo/"
Query
USE ranjith;
DROP TABLE IF EXISTS empinfo;
CREATE EXTERNAL TABLE empinfo(empid STRING, firstname STRING, lastname STRING, dob STRING, designation STRING, doj STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/ranjith/hive/data/emp/empinfo/";

Check the available data.

USECASE

Calculate the year of experience of the employee using their joining date.

IMPLEMENTATION

Create UDF


import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class EmpExperienceUDFString extends UDF {

 public String evaluate(Text empid, Text doj) {
  // write your logic here
  return null;
 }
}

Note the points below:
  1. Your UDF class should extend org.apache.hadoop.hive.ql.exec.UDF.
  2. Your UDF class must have an evaluate method, since Hive looks for this method.
  3. You can return a String, Map, or List from the evaluate method.

Create JAR & Copy to HDFS

  1. Create the jar using the Eclipse IDE, or
  2. Go to the project folder and run the command: mvn jar:jar
  3. Assume your jar name is: HiveUDF-1.0.jar
  4. Copy the jar to the HDFS location: /ranjith/hive/jars/

Run UDF

Now let us see how to run it. First, enter the commands below.

ADD JAR /ranjith/hive/jars/HiveUDF-1.0.jar;
CREATE TEMPORARY FUNCTION emp_exp_string AS 'jbr.hiveudf.EmpExperienceUDFString';
SELECT emp_exp_string(empid,doj) from empinfo;

Line 1: adds the jar to the classpath.
Line 2: creates a temporary function name for your UDF.
Line 3: gets the output of the UDF by calling the temporary function name.
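A temporary function disappears when the session ends. From Hive 0.13 onward, the function can also be registered permanently with CREATE FUNCTION; a sketch, assuming the jar sits at the HDFS path used above:

```sql
-- registered in the metastore, so it survives across sessions
CREATE FUNCTION emp_exp_string
  AS 'jbr.hiveudf.EmpExperienceUDFString'
  USING JAR 'hdfs:///ranjith/hive/jars/HiveUDF-1.0.jar';
```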

Now a MapReduce job will run and display the output as below.


Monday, November 9, 2015

Hive Commands

1. View table or schema details

Command
hive> DESCRIBE FORMATTED <tablename>;
hive> DESCRIBE EXTENDED <tablename>;

2. Alter Table

Command
hive> ALTER TABLE employee RENAME TO emp;
hive> ALTER TABLE emp ADD COLUMNS(salary INT);
hive> ALTER TABLE emp CHANGE salary sal INT;
hive> ALTER TABLE emp CHANGE sal sal STRING;

3. Add Jars into Hive Classpath

Command
hive> ADD JAR <jarname.jar>; -- looks in the current directory
hive> ADD JAR /test/jars/<jarname.jar>;
hive> LIST JARS; -- displays all the jars in the classpath

4. Load data to hive table from subdirectories

Command
hive> SET hive.mapred.supports.subdirectories=true;
hive> SET mapred.input.dir.recursive=true;

Create Table in Hive

INTRODUCTION

Hive is a data warehouse system on top of Hadoop that lets us query the data stored in HDFS (Hadoop Distributed File System). In this article, we are going to see how we can create tables in Hive.
There are two types of tables in Hive:
  1. Internal Tables
  2. External Tables
Before we see more details on external and internal tables, let me create a database. The syntax of Hive commands is similar to MySQL.
CREATE DATABASE IF NOT EXISTS ranjith;
USE ranjith;

INTERNAL TABLES

  • Internal (managed) tables are created as a directory inside Hive's warehouse directory; dropping the table also deletes the data.
Example
CREATE TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING);
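Data can then be loaded into the internal table with LOAD DATA; a sketch, using a hypothetical HDFS input path:

```sql
-- INPATH moves the file from its HDFS location into the table's warehouse directory
LOAD DATA INPATH '/user/ranjith/hive/data/emp.txt' INTO TABLE emp;
```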

EXTERNAL TABLES

  1. External tables are created by specifying a location in HDFS.
  2. The table is created with the 'EXTERNAL' keyword; dropping it removes only the metadata, not the data.
Example: Use HDFS Location
CREATE EXTERNAL TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/user/ranjith/hive/data/09Nov2015/";

Example: Use AWS S3 Location
... LOCATION "s3://user/ranjith/hive/data/09Nov2015/";
or
... LOCATION "s3n://user/ranjith/hive/data/09Nov2015/";