Reading/Writing files to HDFS from Windows server - linux

I want to write files to HDFS from a Windows server. The Hadoop cluster is on Linux.
I researched everywhere, but all I found was Java code that can be run using "hadoop jar".
Can somebody help me understand how I can run Java code that writes files to HDFS from Windows? What is required on the Windows box? Even a pointer to a proper link will do.

You only need to write a simple Java program and run it like a normal .jar file.
In the project you need to import the Hadoop library.
This is a working example Maven project (I tested it on my cluster):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
public class WriteFileToHdfs {
public static void main(String[] args) throws IOException, URISyntaxException {
String dataNameLocation = "hdfs://[your-namenode-ip]:[the-port-where-hadoop-is-listening]/";
Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get( new URI( dataNameLocation ), configuration );
Path file = new Path(dataNameLocation+"/myFile.txt");
FSDataOutputStream out = hdfs.create(file);
out.writeUTF("Some text ...");
out.close();
hdfs.close();
}
}
Remember to add the dependencies to your pom.xml, along with the configuration that writes the main class into the manifest file:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<mainClass>your.cool.package.WriteFileToHdfs</mainClass>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<phase>install</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>lib/</classpathPrefix>
<mainClass>${mainClass}</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
</plugins>
</build>
Just launch the program with the command:
java -jar nameOfTheJarFile.jar
Of course, you need to edit the code with your package name and NameNode IP address.
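One detail of the example above is worth knowing: FSDataOutputStream extends java.io.DataOutputStream, so writeUTF prefixes the text with a two-byte length and encodes it as modified UTF-8. If the file should contain only the plain text, writing the encoded bytes is one alternative. The sketch below illustrates the difference without a cluster; the class name WriteUtfDemo is hypothetical, and a ByteArrayOutputStream stands in for the HDFS stream as an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a ByteArrayOutputStream stands in for the HDFS stream.
class WriteUtfDemo {

    // Bytes produced by writeUTF: a 2-byte length prefix followed by modified UTF-8.
    static byte[] withWriteUtf(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Bytes produced by writing the UTF-8 encoded text directly: no prefix.
    static byte[] withPlainBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "Some text ...";
        System.out.println("writeUTF byte count:    " + withWriteUtf(text).length);
        System.out.println("plain bytes byte count: " + withPlainBytes(text).length);
    }
}
```

In the HDFS program the same choice would apply to the out variable: out.write(text.getBytes(StandardCharsets.UTF_8)) instead of out.writeUTF(text), if a reader other than DataInputStream.readUTF is going to consume the file.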

Cucumber No features found with JUnit5

I am trying to set up Cucumber in my project. I am following the same configuration as in my previous projects, but I still have issues running the tests. I am starting to suspect that the issue might be that this project uses JUnit 5 instead of 4. I have also added JUnit 4 to the build options to be able to use the @RunWith annotation, but I still get the same error (No features found at classpath).
The runner class is as follows:
import io.cucumber.junit.Cucumber;
import io.cucumber.junit.CucumberOptions;
import io.cucumber.junit.CucumberOptions.SnippetType;
import org.junit.runner.RunWith;
@RunWith(Cucumber.class)
@CucumberOptions(features = "classpath:resources", plugin = {"pretty", "html:target/reports/cucumber/html",
"json:target/cucumber.json", "usage:target/usage.jsonx",
"junit:target/junit.xml"}, snippets = SnippetType.CAMELCASE)
public class TestCucumberRunner {
}
The structure of the folders is as follows:
Here is the pom configuration:
As far as I can see, the @RunWith annotation is imported from JUnit 4 and not 5, so why is this issue happening?
I also tried putting the feature file in the same folder as the runner, as well as giving the exact path in the features option, but I still get the same error.
You can run Cucumber tests with JUnit 5 via Maven. I searched a lot before finding the right configuration.
The important steps:
add maven-surefire-plugin to the plugins in your pom, so Cucumber tests can be run from mvn test
use the same structure for features in your test resources as for your Cucumber Java steps (if your test class is in com.example.usecase, put your feature in resources/com/example/usecase)
add a Cucumber launcher in the root folder of your Java tests. It can be annotated with just @Cucumber
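The second step above amounts to mirroring the package name as a directory path under the test resources. A minimal sketch of that mapping follows; the class FeatureLocation and the method featureDirFor are hypothetical names, and src/test/resources is assumed to be the test resource root (the Maven default).

```java
// Hypothetical helper: maps a test package to the matching feature directory.
class FeatureLocation {

    // Assumes the default Maven layout, where test resources live in src/test/resources.
    static String featureDirFor(String packageName) {
        return "src/test/resources/" + packageName.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(featureDirFor("com.example.usecase"));
    }
}
```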
Credit to https://github.com/bonigarcia; I found out how to make it work thanks to his repository https://github.com/bonigarcia/mastering-junit5/tree/master/junit5-cucumber
With JUnit 5, you just need to write a runner like the one below:
@Suite
@SelectClasspathResource("Features Folder")
public class Runner {
}
To use tags, you can put the tags property in junit-platform.properties.
For the pom dependencies, you can refer to https://github.com/cucumber/cucumber-java-skeleton/blob/main/pom.xml
I was facing a lot of issues. I followed the above and could run my Cucumber tests with JUnit 5 without any problems.
There might be a problem with the step definitions as well (I can't tell exactly from the information given); it looks like Cucumber cannot find the step definitions for your feature file.
Please have a look at the Cucumber documentation.
You need to specify the path to your step definitions (glue path) correctly.
Usually Cucumber-JVM searches in the package (or sub-packages) of the runner class. However, you can also specify it explicitly in the following way:
@CucumberOptions(glue = {"", "", ""})
Setting up Cucumber with JUnit 5 has not been documented very well (yet). The trick is to use the cucumber-junit-platform-engine as described in https://cucumber.io/docs/cucumber/api/.
For example:
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>6.6.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>6.6.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit-platform-engine</artifactId>
<version>6.6.1</version>
<scope>test</scope>
</dependency>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<properties>
<configurationParameters>
cucumber.plugin=pretty,html:target/site/cucumber-pretty.html
cucumber.publish.quiet=true
cucumber.publish.enabled=false
</configurationParameters>
</properties>
</configuration>
</plugin>
</plugins>
</build>
Now use the maven-surefire-plugin to inject Cucumber parameters, since the 'old' JUnit 4 @CucumberOptions annotation no longer has any effect.
More Cucumber configuration options can be found here: https://github.com/cucumber/cucumber-jvm/tree/main/junit-platform-engine#configuration-options
Your Java entry point for your Cucumber tests will now look like this:
@RunWith(Cucumber.class)
public class BDDEntryPointTest {
/*
Entry point class for Cucumber test.
It will automatically scan for
1. *.feature files in src/test/resources
2. Step definitions in java files under src/test/java
*/
}
I had similar issues with JUnit 5 and resolved them by removing these three dependencies from the pom:
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>5.7.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-junit</artifactId>
<version>${cucumber.version}</version>
<scope>test</scope>
</dependency>
and by keeping these ones
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>5.7.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>${cucumber.version}</version>
<scope>test</scope>
</dependency>
and then your runner class will be just
@Cucumber
public class AcceptanceIT {
}
and the step definitions would look like this, with no @Test annotations:
@Given("I log {string}")
public void logSomething(String teststr ) {
System.out.println("sample text:"+ teststr);
}
Note that I am using maven-failsafe here. The runner class name might be different if you use another plugin like maven-surefire or any other mechanism. Here is my maven-failsafe config:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-failsafe-plugin</artifactId>
<version>3.0.0-M5</version>
<executions>
<execution>
<id>integration-test</id>
<goals>
<goal>integration-test</goal>
</goals>
</execution>
<execution>
<id>verify</id>
<goals>
<goal>verify</goal>
</goals>
</execution>
</executions>
<configuration>
<failIfNoTests>false</failIfNoTests>
<testSourceDirectory>${project.basedir}/src/test/java</testSourceDirectory>
<includes>
<include>**/*IT.java</include>
</includes>
</configuration>
</plugin>

Maven: how to set thread count for testng

I'm using TestNG to run tests in parallel. The XML file contains the thread-count parameter.
<suite name="Lalala" parallel="tests" thread-count="3" preserve-order="true">
But I want to set the thread-count value from POM file. I tried
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.3.1</version>
</dependency>
and
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.19</version>
<configuration>
<parallel>classes</parallel>
<threadCount>10</threadCount>
<suiteXmlFiles>
<suiteXmlFile>src/test/resources/${suite}.xml</suiteXmlFile>
</suiteXmlFiles>
<workingDirectory>target/</workingDirectory>
</configuration>
</plugin>
But the thread count still equals 1.
Is there some way to set thread-count from the POM file?
You may need to remove thread-count from your suite definition in your XML file as it will override any -threadcount parameter that Maven Surefire is passing to TestNG (see Command Line Parameters under Running TestNG).
From local testing it appears that threadCount and suiteXmlFiles aren't compatible, and the Maven Surefire Plugin documentation for suiteXmlFiles states:
Note that suiteXmlFiles is incompatible with several other parameters of this plugin, like includes/excludes.
I believe that threadCount is another of the incompatible "other parameters".
Some of the same options available in TestNG XML files are also available when configuring the Maven Surefire Plugin so it looks like you will have to "port" your TestNG XML to Maven Surefire Plugin Configuration XML.
In my local testing I found that I could simply omit suiteXmlFiles and the plugin found and ran my tests with the specified threadCount. Depending on your TestNG XML your solution might take a bit more work.
I haven't tried this myself, but this configuration should work.
I'm not sure, but to use this you should use the Surefire plugin version 2.19 or later. I also recommend not using Surefire-specific element names in the <configuration> section (like <parallel>, <threadCount>, <groups>, etc.) when you use TestNG. The better choice is a <properties> section with a set of <property> values. Those values are passed to the TestNG command line. The behavior of such properties is clearly described in the TestNG documentation.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.19</version>
<dependencies>
<dependency>
<groupId>org.apache.maven.surefire</groupId>
<artifactId>surefire-testng</artifactId>
<version>2.19</version>
</dependency>
</dependencies>
<configuration>
<suiteXmlFiles>
<suiteXmlFile>suites/my-suite.xml</suiteXmlFile>
</suiteXmlFiles>
<!-- DONT USE THIS
<parallel>methods</parallel>
<threadCount>5</threadCount>
-->
<properties>
<property>
<name>parallel</name>
<value>methods</value>
</property>
<property>
<name>threadcount</name>
<value>5</value>
</property>
<property>
<name>dataproviderthreadcount</name>
<value>3</value>
</property>
</properties>
</configuration>
</plugin>

Using maven jaxb2 plugin, modular compilation using episode and catalog throws Malformed URL error

My project contains A.xsd, which imports a schema from B.xsd, which is part of another project, as follows:
<xsd:import namespace="http://com.test.schema/common/Context" schemaLocation="http://com.test.schema/common/Context/B.xsd"/>
I am trying to use the episode from the project that contains B.xsd so that the classes related to B.xsd are not regenerated when A.xsd is parsed. I referred to this and this to come up with the following configuration:
Here is the pom.xml:
<dependencies>
<dependency>
<groupId>com.bar.foo</groupId>
<artifactId>schema-b</artifactId>
<version>1.2</version>
</dependency>
</dependencies>
.
.
.
.
.
<build>
<plugins>
<plugin>
<groupId>org.jvnet.jaxb2.maven2</groupId>
<artifactId>maven-jaxb2-plugin</artifactId>
<version>0.13.0</version>
<executions>
<execution>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<extension>true</extension>
<episodes>
<episode>
<groupId>com.bar.foo</groupId>
<artifactId>schema-b</artifactId>
</episode>
</episodes>
<catalog>src/main/resources/catalog.cat</catalog>
<schemas>
<schema>
<fileset>
<directory>${basedir}/src/main/schemas</directory>
<includes>
<include>A.xsd</include>
<include>...</include>
<include>...</include>
</includes>
</fileset>
</schema>
</schemas>
<bindingDirectory>${basedir}/src/main/schemas</bindingDirectory>
<bindingIncludes>
<include>*.xjb</include>
</bindingIncludes>
<args>
<arg>-Xannotate</arg>
</args>
<plugins>
<plugin>
<groupId>org.jvnet.jaxb2_commons</groupId>
<artifactId>jaxb2-basics</artifactId>
<version>0.6.0</version>
</plugin>
<plugin>
<groupId>org.jvnet.jaxb2_commons</groupId>
<artifactId>jaxb2-basics-annotate</artifactId>
<version>0.6.0</version>
</plugin>
</plugins>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
Here is the catalog file:
PUBLIC "http://com.test.schema/common/Context" "maven:com.bar.foo:schema-a:jar::1.2!"
There is some configuration in the xjb file to make sure that XmlRootElement is written into some generated classes:
<jxb:bindings xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
jxb:version="2.1" xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
xmlns:annox="http://annox.dev.java.net" extensionBindingPrefixes="xjc">
<jxb:globalBindings>
<xjc:simple />
</jxb:globalBindings>
<jxb:bindings
schemaLocation="A.xsd">
<jxb:bindings node="//xsd:complexType[@name='ADataType']">
<jxb:class name="AData" />
<annox:annotate>
<annox:annotate annox:class="javax.xml.bind.annotation.XmlRootElement"
name="AData" />
</annox:annotate>
</jxb:bindings>
</jxb:bindings>
</jxb:bindings>
In spite of providing the episode to the xjc execution and the location of the schema for B.xsd in the catalog file, the classes for B.xsd are still being generated.
The issue is that the maven artifact referred to by the catalog file is not being picked up. I see the following error in the maven build logs:
Malformed URL on system identifier: maven:com.bar.foo:schema-b:jar::1.2!
PUBLIC: http://com.test.schema/common/Context
maven:com.bar.foo:schema-a:jar::1.2!
Can anyone tell me why I am hitting this malformed URL error for the artifact that contains B.xsd? Any help will be really appreciated.
Disclaimer: I'm the author of the maven-jaxb2-plugin.
First, you're probably mixing A and B here. You're saying A imports B but then you're using the schema-a artifact as episode. If B is imported, you should use schema-b as episode to not regenerate B stuff when compiling A.
But I think this is probably just a minor mistake in the question.
You have two aspects here - episodes and catalogs.
Episodes allow you to skip generation of classes you've generated somewhere else already. So if you use schema-b artifact when compiling schema-a then XJC should not generate classes for schema-b. You don't need catalogs for that, it's independent.
Sometimes XJC still generates a few leftovers, even if you use an episode. I often get ObjectFactory and maybe some enums or top-level elements generated. I believe this is an issue in XJC; there's nothing I can do about it in the maven-jaxb2-plugin.
So as a workaround I just use maven-antrun-plugin to delete the unnecessary generated things.
If you get all of the B stuff generated, then you should check whether the schema-b artifact really has the episode file generated. Check that you have META-INF/sun-jaxb.episode inside the JAR. See this answer for some trivia on the episode file.
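The check for the episode file can also be done programmatically. The following is a minimal sketch using only java.util.jar from the JDK; the class name EpisodeCheck and the method hasEpisode are hypothetical, and the path to the schema-b JAR is whatever you pass on the command line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical helper: reports whether a JAR carries a JAXB episode file.
class EpisodeCheck {

    // Entry name taken from the answer above.
    static final String EPISODE_ENTRY = "META-INF/sun-jaxb.episode";

    static boolean hasEpisode(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(EPISODE_ENTRY) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java EpisodeCheck <path-to-jar>");
            return;
        }
        Path jar = Paths.get(args[0]);
        System.out.println(jar + " contains episode: " + hasEpisode(jar));
    }
}
```

Running it against the schema-b JAR in your local repository should tell you quickly whether the episode was packaged at all.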
So if you correctly configure the correct episode artifact you should not get B things generated, you don't need catalogs for this.
What you need the catalog for is to avoid downloading http://com.test.schema/common/Context/B.xsd when compiling. You can use catalogs to point to another location. I think your problem here is that you map http://com.test.schema/common/Context to maven:com.bar.foo:schema-a:jar::1.2!, which obviously does not point to the schema resource.
If you have an import like
<xsd:import namespace="http://com.test.schema/common/Context" schemaLocation="http://com.test.schema/common/Context/B.xsd"/>
Then you should probably write the catalog entry as follows:
PUBLIC "http://com.test.schema/common/Context" "maven:com.bar.foo:schema-b:jar::1.2!/common/Context/B.xsd"
This assumes the schema-b artifact contains your schema under /common/Context/B.xsd. Note that it maps the namespace, not the schema location.
You can also use REWRITE_SYSTEM to rewrite schema location. For example:
REWRITE_SYSTEM "http://com.test.schema" "maven:com.bar.foo:schema-b:jar::1.2!"
If you have a URL like http://com.test.schema/common/Context/B.xsd, it will be rewritten to maven:com.bar.foo:schema-b:jar::1.2!/common/Context/B.xsd. This will point to the resource /common/Context/B.xsd inside your schema-b JAR.
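The effect of a REWRITE_SYSTEM entry can be pictured as a plain prefix substitution on the system identifier. The sketch below only illustrates that string mapping; it is not the catalog resolver's actual implementation, and the class RewriteSystemDemo and the method rewriteSystem are hypothetical names.

```java
// Hypothetical illustration of the prefix substitution behind REWRITE_SYSTEM.
class RewriteSystemDemo {

    // If the system identifier starts with the prefix, swap the prefix for the
    // replacement; otherwise return the identifier unchanged.
    static String rewriteSystem(String systemId, String prefix, String replacement) {
        if (systemId.startsWith(prefix)) {
            return replacement + systemId.substring(prefix.length());
        }
        return systemId;
    }

    public static void main(String[] args) {
        String rewritten = rewriteSystem(
                "http://com.test.schema/common/Context/B.xsd",
                "http://com.test.schema",
                "maven:com.bar.foo:schema-b:jar::1.2!");
        // prints maven:com.bar.foo:schema-b:jar::1.2!/common/Context/B.xsd
        System.out.println(rewritten);
    }
}
```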
Another hint - if you use schema-b as dependency in your project, you can omit the version.
Here's an example of catalog from a real world project:
https://github.com/highsource/ogc-schemas/blob/master/schemas/src/main/resources/ogc/catalog.cat
It contains rewrites like:
REWRITE_SYSTEM "http://schemas.opengis.net" "maven:org.jvnet.ogc:ogc-schemas:jar::!/ogc"

appassembler maven plugin doesn't set "execute" permissions on generated script

The AppAssembler Maven plugin does a great job of generating a distribution for me. One last problem is that the generated shell script does not have execute permissions, so I need to set them manually.
I am on Red Hat Linux.
Does anybody know of a clean way to set them automatically?
The only way to do this is to process the file with another maven plugin like Antrun or Assembly after running AppAssembler.
This issue (see link below) has been brought up on the AppAssembler project issue tracker and it was rejected as Won't Fix.
Issue: MAPPASM-54
I think it can be set in your assembly.xml, in the fileSet tag:
<fileSets>
<fileSet>
<directory>src/resources/bin</directory>
<lineEnding>keep</lineEnding>
<useDefaultExcludes>true</useDefaultExcludes>
<outputDirectory>bin</outputDirectory>
<includes>
<include>*.bat</include>
<include>*.sh</include>
</includes>
<fileMode>744</fileMode>
</fileSet>
...
Since Maven 3.0.3, plugin executions bound to the same phase run in the order in which they appear in your pom.xml. So setting the executable flag in a platform-independent manner is as easy as using the maven-enforcer-plugin right after your appassembler plugin.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.3.1</version>
<executions>
<execution>
<id>enforce-beanshell</id>
<phase>package</phase>
<goals>
<goal>enforce</goal>
</goals>
<configuration>
<rules>
<evaluateBeanshell>
<condition>
import java.io.File;
print("set executable for file ${basedir}/dist/bin/mql");
new File("${basedir}/dist/bin/mql").setExecutable(true,false);
true;
</condition>
</evaluateBeanshell>
</rules>
<fail>false</fail>
</configuration>
</execution>
</executions>
</plugin>
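The BeanShell condition above relies on java.io.File.setExecutable(boolean executable, boolean ownerOnly). The same call as a standalone Java sketch, which can be handy for trying it outside the build; the class name MakeExecutable is hypothetical, and the script path is whatever you pass on the command line.

```java
import java.io.File;

// Hypothetical standalone version of the call used in the BeanShell rule.
class MakeExecutable {

    // executable = true, ownerOnly = false: grant execute permission to everybody.
    // Returns false if the permission could not be changed.
    static boolean makeExecutable(File script) {
        return script.setExecutable(true, false);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java MakeExecutable <path-to-script>");
            return;
        }
        File script = new File(args[0]);
        System.out.println("set executable for " + script + ": " + makeExecutable(script));
    }
}
```

Note that the enforcer rule ignores this return value and always evaluates to true, so a failed permission change would not be reported by the build.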

Where to get a full list of pre-defined variables in gmaven-plugin?

Where can I get a complete list of the variables available in Groovy scripts executed under gmaven-plugin in Maven? Also, does anyone know where to find GMaven documentation?
I'm aware of project and settings. I assume there are some others.
The page http://docs.codehaus.org/display/GMAVEN/Executing+Groovy+Code lists:
Default Variables
By default a few variables are bound into the scripts environment:
project The maven project, with auto-resolving properties
pom Alias for project
session The executing MavenSession
settings The executing Settings
log A SLF4J Logger instance
ant An AntBuilder instance for easy access to Ant tasks
fail() A helper to throw MojoExecutionException
This snippet in your pom should give you a better idea of what's available while running the script. Most of the interesting bits are probably in the binding.project, an instance of MavenProject.
<build>
<plugins>
<plugin>
<groupId>org.codehaus.groovy.maven</groupId>
<artifactId>gmaven-plugin</artifactId>
<executions>
<execution>
<phase>generate-resources</phase>
<goals>
<goal>execute</goal>
</goals>
<configuration>
<properties>
<hello>world</hello>
</properties>
<source>
println this.binding.variables
println project.properties
println settings.properties
</source>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
