How to Use Sqoop in Java Program

How to use Sqoop in Java Program?

You can run sqoop from inside your java code by including the sqoop jar in your classpath and calling the Sqoop.runTool() method. You would have to create the required parameters to sqoop programmatically as if it were the command line (e.g. --connect etc.).

Please pay attention to the following:

  • Make sure that the sqoop tool name (e.g. import/export etc.) is the first parameter.
  • Pay attention to classpath ordering - The execution might fail because sqoop requires version X of a library and you use a different version. Ensure that the libraries that sqoop requires are not overshadowed by your own dependencies. I've encountered such a problem with commons-io (sqoop requires v1.4) and had a NoSuchMethod exception since I was using commons-io v1.2.
  • Each argument needs to be on a separate array element. For example, "--connect jdbc:mysql:..." should be passed as two separate elements in the array, not one.
  • The sqoop parser knows how to accept double-quoted parameters, so use double quotes if you need to (I suggest always). The only exception is the fields-delimited-by parameter which expects a single char, so don't double-quote it.
  • I'd suggest splitting the command-line-arguments creation logic and the actual execution so your logic can be tested properly without actually running the tool.
  • It would be better to use the --hadoop-home parameter, in order to prevent dependency on the environment.
  • The advantage of Sqoop.runTool() as opposed to Sqoop.Main() is the fact that runTool() return the error code of the execution.

Hope that helps.

final int ret = Sqoop.runTool(new String[] { ... });
if (ret != 0) {
throw new RuntimeException("Sqoop failed - return code " + Integer.toString(ret));
}

RL

Java Program imports data using sqoop

It's writing to the local file system instead of HDFS because the default file system is local unless otherwise configured. You can configure this to be HDFS using SqoopOptions - see this question / answer for an example:

  • How can I execute Sqoop in Java?

Specifically you need to locate and pass the location of your clusters core-site and hdfs-site xml files:

Configuration config = new Configuration(); 
config.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
config.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));

how to import data using Sqoop java api?

I was needing to use Sqoop to import data from MySQL to Hive using Scala inside a Cloudera CDH 5.7 cluster, so I started by following this answer.

The problem was that it was not getting the right configurations when it was being executed on the server.

Executing Sqoop manually was something like this:

sqoop import --hive-import --connect "jdbc:mysql://host/db" \
--username "username" --password "password" --table "viewName" \
--hive-table "outputTable" -m 1 --check-column "dateColumnName" \
--last-value "lastMinDate" --incremental append

So at the end I chose to execute it as an external process using Scala's sys.process.ProcessBuilder. It does not require any SBT dependency for running this way. Finally the runner was implemented this way:

import sys.process._

def executeSqoop(connectionString: String, username: String, password: String,
viewName: String, outputTable: String,
dateColumnName: String, lastMinDate: String) = {
// To print every single line the process is writing into stdout and stderr respectively
val sqoopLogger = ProcessLogger(
normalLine => log.debug(normalLine),
errorLine => errorLine match {
case line if line.contains("ERROR") => log.error(line)
case line if line.contains("WARN") => log.warning(line)
case line if line.contains("INFO") => log.info(line)
case line => log.debug(line)
}
)

// Create Sqoop command, every parameter and value must be a separated String into the Seq
val command = Seq("sqoop", "import", "--hive-import",
"--connect", connectionString,
"--username", username,
"--password", password,
"--table", viewName,
"--hive-table", outputTable,
"-m", "1",
"--check-column", dateColumnName,
"--last-value", lastMinDate,
"--incremental", "append")

// result will contain the exit code of the command
val result = command ! sqoopLogger
if (result != 0) {
log.error("The Sqoop process did not finished successfully")
} else {
log.info("The Sqoop process finished successfully")
}
}


Related Topics



Leave a reply



Submit