Tuesday, November 24, 2015

insert values directly to hive tables

INTRODUCTION

Usually, we will be loading the data into hive table from HDFS file system while creating the table.
This post will help to insert values into hive table directly.

ARRAY COLUMN

Create Table Query
DROP TABLE IF EXISTS student;
CREATE TABLE student(id STRING, name STRING, class STRING, subjects ARRAY<STRING>);

The latest hive version will not support adding the values directly using below query.
INSERT INTO student (id,name,class,subjects) VALUES ('100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physiscs','Chemistry')

But we can use STACK function to insert here.
INSERT INTO TABLE student
SELECT STACK(3,
'100','Ranjith','Mathematics',ARRAY('Mathematics','Biology','Physiscs','Chemistry'),
'200','Sekar','Computer Science',ARRAY('Mathematics','Computer Science','Physiscs','Chemistry'),
'300','Bala','Commerce',ARRAY('Political Science','Accounts','Commerce')
)
FROM empinfo
LIMIT 3;

Result

MAP COLUMN

Create Table Query
DROP TABLE IF EXISTS stud_mark1;
CREATE TABLE stud_mark1(id STRING, name STRING, class STRING, mark MAP<STRING,INT>);

Insert query.
INSERT INTO TABLE stud_mark1
SELECT STACK(2,
'100','Ranjith','Mathematics',map("Mathematics",78,"Biology",88,"Physics",89,"Chemistry",90),
'200','Sekar','Science',map("Botany",71,"Zoolagy",98,"Physics",79,"Chemistry",89)
)
FROM empinfo
LIMIT 2;

Result

Thursday, November 12, 2015

Simple User Defined Functions (UDF) in Hive

INTRODUCTION

In this article, we are going see how can we create our own UDF in Hive.
Hive API enables to create our own function by means of extending its API Classes.

In this post, I am going to write a UDF using org.apache.hadoop.hive.ql.exec.UDF Class.

SOFTWARES & TOOLS

  1. Eclipse IDE (Mars2)
  2. Java 7
  3. Maven

DATABASE & TABLES

Create a new database
Query
CREATE DATABASE IF NOT EXISTS ranjith;

Create a new table by pointing the HDFS Location: "/ranjith/hive/data/emp/empinfo/"
Query
USE ranjith;
DROP TABLE IF EXISTS empinfo;
CREATE EXTERNAL TABLE empinfo(empid STRING, firstname STRING, lastname STRING, dob STRING, designation STRING, doj STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/ranjith/hive/data/emp/empinfo/";

Check the available data.

USECASE

Calculate the year of experience of the employee using their joining date.

IMPLEMENTATION

Create UDF


import org.apache.hadoop.hive.ql.exec.UDF
public class EmpExperienceUDFString extends UDF {

 public String evaluate(Text empid, Text doj) {
 // write you logic here
 }
}

Note the below points
  1. Your UDF class should extend
  2. Your UDF class should have mandatory evaluate method since hive will look for this method.
  3. You can return any String, Map or List from the evaluate method.

Create JAR & Copy to HDFS

  1. Create Jar using eclipse IDE or
  2. Go the project folder and run command: mvn jar:jar
  3. Assume your jar name is: HiveUDF-1.0.jar
  4. Copy the Jar to HDFS location: /ranjith/hive/jars/

Run UDF

Now we will how to run it, first enter the below commands

ADD JAR ranjith/hive/jars/HiveUDF-1.0.jar;
CREATE TEMPORARY FUNCTION emp_exp_string AS 'jbr.hiveudf.EmpExperienceUDFString';
SELECT emp_exp_string(empid,doj) from empinfo;

Line 1: add the jar to the classpath
Line 2: creating a temporary function name for your UDF.
Line 3: Get the output of the UDF by calling the temporary function name.

Now Map-Reduce job will run and display the output as below


Monday, November 9, 2015

Hive Commands

1. View table or schema details

Command
hive> DESCRIBE FORMATTED <tablename>;
hive> DESCRIBE EXTENDED <tablename>;

2. Alter Table

Command
hive> ALTER TABLE employee RENAME TO emp;
hive> ALTER TABLE emp ADD COLUMNS(salary INT);
hive> ALTER TABLE emp CHANGE salary sal INT;
hive> ALTER TABLE emp CHANGE sal sal STRING;

3. Add Jars into Hive Classpath

Command
hive> ADD JAR <jarname.jar>; - look for current directory
hive> ADD JAR /test/jars/<jarname.jar>;
hive> LIST JARS; - displays all the jars in the classpath

4. Load data to hive table from subdirectories

Command
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
hive> SET mapred.input.dir.recursive=true;

Create Table in Hive

INTRODUCTION

Hive is a Meta-store where we can query and see the data which is stored in HDFS (Hadoop Distributed File System). In this article, we are going to see how we can create tables in Hive.
There are two types tables can be created in Hive. They are
  1. Internal Tables
  2. External Tables
Before we see more details on these external and internal tables, let me create a database. The syntax of Hive commands is same as MySQL.
CREATE DATABASE IF NOT EXISTS ranjith;
USE ranjith;

INTERNAL TABLES

  • Internal tables are created as a directory into the warehouse.
Example
CREATE TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING);

EXTERNAL TABLES

  1. External tables are created by specifying the location in HDFS.
  2. Table will be created with ‘EXTERNAL’ keyword
Example: Use HDFS Location
CREATE EXTERNAL TABLE emp(empid STRING, firstname STRING,
   lastname STRING, designation STRING, salary STRING,
   temporary_address STRING, permanent_address STRING,
   phone STRING, mobileno1 STRING, mobileno2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION "/user/ranjith/hive/data/09Nov2015/";

Example: Use AWS S3 Location
... LOCATION "s3://user/ranjith/hive/data/09Nov2015/";
or
... LOCATION "s3n://user/ranjith/hive/data/09Nov2015/";

Tuesday, October 27, 2015

Eclipse MARS Plugins Installation


PHP - Plugin Installation

  1. Help->Install New Software
  2. Select Mars - http://download.eclipse.org/releases/mars and type PHP in text box
  3. Select checkbox on Programming languages->PHP Development Tools(PDT)
  4. Next -> Next -> Accept and Finish
         Installing Software.....................
  1. Restart Eclipse

Python - Plugin Installation

  1. Help->Install New Software->
Name: Python
Location: http://pydev.org/updates
  1. Click Next and Finish.
Now you should be fine to run a first Python Program.
  1. Go to Window->PyDev->Interpreters->Python Interpreter
  2. Click New Button at top right
Interpreter Name: Python
Interpreter Executable: C:\Python27\python.exe
  1. Click OK

could not resolve archetype org.apache.maven.archetypes maven-archetype-quickstart


INTRODUCTION

This post will help you to resolve the error which will occur during the creation of Project using Maven.
Error
could not resolve archetype org.apache.maven.archetypes maven-archetype-quickstart

ENVIRONMENT VARIABLES SETTINGS

Add the below environment variables.
  1. MAVEN_HOME=c:\maven-3.3.3
  2. M2_HOME==c:\maven-3.3.3
  3. CLASSPATH = %MAVEN_HOME%\lib;
  4. PATH=%MAVEN_HOME%\bin;

PROXY SETTINGS

Set this if you are connected with any vpn.
  1. Go to c:\maven-3.3.3\conf\settings.xml
  2. Add Your proxy details
<proxy>
<id>myproxy</id>
<active>true</active>
<protocol>http</protocol>
<host>www.myweb.com</host>
<port>80</port>
</proxy>

ECLIPSE SETTINGS

  1. Open Run Command and type . (just dot), you will be directed to your local user path like c:\Users\<username>
  2. Delete .m2 folder permanently
  3. Open Eclipse
  4. Go to Windows->Preferences->Maven->User Settings
    1. Global Settings: c:\maven-3.3.3\conf\settings.xml
    2. User Settings: c:\maven-3.3.3\conf\settings.xml
  5. Click Update Settings & Reindex
  6. Restart the Eclipse
Now try to create a Maven Eclipse and Maven Project will be created successfully. Please reply your comments if you still facing any error.

Read Content of Bzip file

INTRODUCTION

This post will help to unzip bz2 file which contains a list of XML files and generate JSON out of it.

Consider the content in the XML file as single line string.

SOFTWARES & TOOLS

  1. Eclipse.
  2. Commons-compress-1.10.jar
  3. Jackson-all-1.9.0.jar

My Input files are: Employees1.xml.bz2 & Employees2.xml.bz2
My Output files are: Employees1.json & Employees2.json

IMPLEMENTATION


  1. package jbr.json_xmlbzip;  
  2.   
  3. import java.io.BufferedInputStream;  
  4. import java.io.BufferedReader;  
  5. import java.io.File;  
  6. import java.io.FileInputStream;  
  7. import java.io.FileNotFoundException;  
  8. import java.io.FileOutputStream;  
  9. import java.io.IOException;  
  10. import java.io.InputStreamReader;  
  11. import java.util.ArrayList;  
  12. import java.util.List;  
  13. import java.util.Map;  
  14. import java.util.TreeMap;  
  15.   
  16. import javax.xml.parsers.DocumentBuilder;  
  17. import javax.xml.parsers.DocumentBuilderFactory;  
  18. import javax.xml.parsers.ParserConfigurationException;  
  19. import javax.xml.xpath.XPath;  
  20. import javax.xml.xpath.XPathConstants;  
  21. import javax.xml.xpath.XPathExpressionException;  
  22. import javax.xml.xpath.XPathFactory;  
  23.   
  24. import org.apache.commons.compress.compressors.CompressorException;  
  25. import org.apache.commons.compress.compressors.CompressorInputStream;  
  26. import org.apache.commons.compress.compressors.CompressorStreamFactory;  
  27. import org.codehaus.jackson.JsonGenerationException;  
  28. import org.codehaus.jackson.map.JsonMappingException;  
  29. import org.codehaus.jackson.map.ObjectMapper;  
  30. import org.w3c.dom.Document;  
  31. import org.w3c.dom.Node;  
  32. import org.w3c.dom.NodeList;  
  33. import org.xml.sax.SAXException;  
  34.   
  35. public class JsonFromBzipXml {  
  36.   
  37.   public static FileInputStream fis;  
  38.   public static DocumentBuilderFactory docBuilderFactory;  
  39.   public static DocumentBuilder docBuilder;  
  40.   public static Document xmlDocument;  
  41.   public static XPath xpathObj;  
  42.   
  43.   public void loadConfig(File inputFile) {  
  44.     try {  
  45.       fis = new FileInputStream(inputFile);  
  46.       docBuilderFactory = DocumentBuilderFactory.newInstance();  
  47.       docBuilder = docBuilderFactory.newDocumentBuilder();  
  48.       xmlDocument = docBuilder.parse(fis);  
  49.       xpathObj = XPathFactory.newInstance().newXPath();  
  50.     } catch (FileNotFoundException fe) {  
  51.       fe.printStackTrace();  
  52.     } catch (ParserConfigurationException pe) {  
  53.       pe.printStackTrace();  
  54.     } catch (IOException ie) {  
  55.       ie.printStackTrace();  
  56.     } catch (SAXException se) {  
  57.       se.printStackTrace();  
  58.     }  
  59.   }  
  60.   
  61.   /** 
  62.    * Start the Json Conversion 
  63.    *  
  64.    * @param args 
  65.    * @throws IOException 
  66.    * @throws CompressorException 
  67.    */  
  68.   public static void main(String[] args) throws IOException, CompressorException, XPathExpressionException {  
  69.     JsonFromBzipXml converter = new JsonFromBzipXml();  
  70.     converter.generateJson();  
  71.   }  
  72.   
  73.   public void generateJson() throws XPathExpressionException {  
  74.     Map<String, List<Map<String, String>>> employees = new TreeMap<>();  
  75.     Map<String, String> employee = null;  
  76.   
  77.     File xmlFile = null;  
  78.     FileInputStream fileInputStream = null;  
  79.     BufferedInputStream bufferedInputStream = null;  
  80.     CompressorInputStream compressorInputStream = null;  
  81.     BufferedReader bufferedReader = null;  
  82.   
  83.     try {  
  84.       File[] bzipFiles = new File(JsonConstants.INPUT_BZIP_DIRECTORY).listFiles();  
  85.   
  86.       for (File bzipFile : bzipFiles) {  
  87.   
  88.         List<Map<String, String>> empList = new ArrayList<>();  
  89.   
  90.         // Unzip bzip Files  
  91.         fileInputStream = new FileInputStream(bzipFile);  
  92.         bufferedInputStream = new BufferedInputStream(fileInputStream);  
  93.         compressorInputStream = new CompressorStreamFactory().createCompressorInputStream(bufferedInputStream);  
  94.         bufferedReader = new BufferedReader(new InputStreamReader(compressorInputStream));  
  95.   
  96.         String line = null;  
  97.   
  98.         // System.out.println(bzipFile);  
  99.   
  100.         while ((line = bufferedReader.readLine()) != null) {  
  101.           employee = new TreeMap<>();  
  102.           xmlFile = new File(bzipFile.getName().replace(".bz2"""));  
  103.           xmlFile.createNewFile();  
  104.           FileOutputStream fileOutputStream = new FileOutputStream(xmlFile);  
  105.           fileOutputStream.write(line.getBytes());  
  106.           // System.out.println(line);  
  107.   
  108.           // Load the xml file  
  109.           loadConfig(xmlFile);  
  110.   
  111.           System.out.println("\n===Retrieve all Elements of an Employee====");  
  112.           String xpath = "/Employees/Employee";  
  113.           System.out.println("xpath: " + xpath);  
  114.           Node node = (Node) xpathObj.compile(xpath).evaluate(xmlDocument, XPathConstants.NODE);  
  115.           if (null != node) {  
  116.             NodeList nodeList = node.getChildNodes();  
  117.   
  118.             for (int i = 0null != nodeList && i < nodeList.getLength(); i++) {  
  119.               Node subNode = nodeList.item(i);  
  120.               if (subNode.getNodeType() == Node.ELEMENT_NODE)  
  121.                 System.out.println(nodeList.item(i).getNodeName() + " : " + subNode.getFirstChild().getNodeValue());  
  122.               employee.put(nodeList.item(i).getNodeName(), subNode.getFirstChild().getNodeValue());  
  123.             }  
  124.           }  
  125.   
  126.           fileOutputStream.close();  
  127.         } // end while  
  128.   
  129.         empList.add(employee);  
  130.         employees.put("employees", empList);  
  131.         toJsonFromMap(employees, JsonConstants.OUTPUT_JSON_DIRECTORY + xmlFile.getName().replace(".xml"".json"));  
  132.   
  133.       } // end for  
  134.   
  135.     } catch (IOException ioException) {  
  136.       ioException.printStackTrace();  
  137.     } catch (CompressorException ce) {  
  138.       ce.printStackTrace();  
  139.     }  
  140.   }  
  141.   
  142.   public static void toJsonFromMap(Map map, String outputFile) {  
  143.     try {  
  144.       new ObjectMapper().writeValue(new File(outputFile), map);  
  145.     } catch (JsonGenerationException e) {  
  146.       e.printStackTrace();  
  147.     } catch (JsonMappingException e) {  
  148.       e.printStackTrace();  
  149.     } catch (IOException e) {  
  150.       e.printStackTrace();  
  151.     }  
  152.   }  
  153.   
  154.   public static String toJsonFromList1(List list) {  
  155.     String value = "";  
  156.     try {  
  157.       value = new ObjectMapper().writeValueAsString(list);  
  158.     } catch (JsonGenerationException e) {  
  159.       e.printStackTrace();  
  160.     } catch (JsonMappingException e) {  
  161.       e.printStackTrace();  
  162.     } catch (IOException e) {  
  163.       e.printStackTrace();  
  164.     }  
  165.   
  166.     return value;  
  167.   }  
  168.   
  169.   public interface JsonConstants {  
  170.     String INPUT_BZIP_DIRECTORY = "input/bzip";  
  171.     String OUTPUT_JSON_DIRECTORY = "output/bzip-jsons/";  
  172.   }  
  173. }  



OUTPUT


Console


===Retrieve all Elements of an Employee====
xpath: /Employees/Employee
firstname : Ranjith
lastname : Sekar
age : 30
address : Chennai
email : ranjith@gmail.com

===Retrieve all Elements of an Employee====
xpath: /Employees/Employee
firstname : Ranjith
lastname : Sekar
age : 30
address : Chennai
email : ranjith@gmail.com

JSON File


{"employees":[{"address":"Chennai","age":"30","email":"ranjith@gmail.com","firstname":"Ranjith","lastname":"Sekar"}]}