Overriding whoAmI in HDFS - linux

I'm going through Hadoop: The Definitive Guide. In it the author describes how one can override the whoami mechanism by defining a hadoop.job.ugi property. Well, I'm not getting anywhere with it...
I'm using hduser@ubuntu (my box name).
I have created a localhost.conf to override the default conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
      at. If "local", then jobs are run in-process as a single map
      and reduce task.
    </description>
  </property>
  <property>
    <name>hadoop.job.ugi</name>
    <value>user, supergroup</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
but when I run hadoop fs -conf localhost.conf -mkdir newDir followed by hadoop fs -conf localhost.conf -ls . I see that the directory newDir is created by hduser, not by user.
I must be missing a setting...
Thanks in advance.
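For what it's worth, hadoop.job.ugi was only honored by older, pre-security Hadoop releases and was removed when Kerberos support was added; on newer releases running with simple authentication, the client identity can typically be overridden via the HADOOP_USER_NAME environment variable instead. A minimal check under that assumption:
# Claim the identity "user" through the environment, then list the result.
HADOOP_USER_NAME=user hadoop fs -conf localhost.conf -mkdir newDir2
hadoop fs -conf localhost.conf -ls .
# newDir2 should show up as owned by "user" rather than "hduser".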

Related

How to fix issue with nutch readseg not dumping any content?

I am using Nutch 1.4 locally on iOS to crawl a website, and nutch readseg dump does not return any relevant information. What am I missing?
I am trying to extract 'category' as new metadata from the url. I am using the index-replace plugin to extract a substring from the url. I am able to run the code and index the documents in Google Cloud Search, but it is not capturing the category.
To debug this end to end, I would like to verify that the correct value is extracted by nutch into the category metadata. I verified that the regex is correct with a regex tester. I want to log the metadata values - url, category - in the log or stdout. I do not see any pertinent information in hadoop.log, even at DEBUG.
$ bin/nutch readseg -dump TestCrawl/segments/* segmentAllContent
SegmentReader: dump segment: TestCrawl/segments/20190128171825
SegmentReader: done
logs/hadoop.log -
2019-01-29 11:40:02,275 INFO segment.SegmentReader - SegmentReader: dump segment: TestCrawl/segments/20190128171825 .
2019-01-29 11:40:02,463 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.
log4j.properties
log4j.logger.org.apache.nutch=DEBUG
nutch-site.xml
<property>
  <name>index.replace.regexp</name>
  <value>
    urlmatch=.*mycompany\.com\/([a-zA-Z0-9-]+)
    url:category=$1
  </value>
</property>
<property>
  <name>urlmeta.tags</name>
  <value>title,category</value>
  <description>test</description>
</property>
<property>
  <name>index.parse.md</name>
  <value>*</value>
  <description>test</description>
</property>
The readseg -dump command only writes everything contained in the segment as plain text to the output directory segmentAllContent. It does not run the indexer and consequently does not call the index-replace plugin. You can use the command bin/nutch indexchecker to check whether the plugin is configured properly.
Please note that the index-replace plugin is not available in Nutch 1.4; it was added in Nutch 1.11.
Example of how to use the indexchecker to check the index-replace plugin:
% bin/nutch indexchecker \
-Dplugin.includes='protocol-okhttp|parse-html|index-(basic|replace|static)' \
-Dindexingfilter.order='org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.staticfield.StaticFieldIndexer org.apache.nutch.indexer.replace.ReplaceIndexer' \
-Dindex.static='category:unknown' \
-Dindex.replace.regexp=$'hostmatch=localhost\ncategory=/.+/intranet/' \
http://localhost/
...
host : localhost
id : http://localhost/
title : Apache2 Ubuntu Default Page: It works
category : intranet
url : http://localhost/
...
The plugin index-static is configured to add a field "category" with value "unknown".
The plugin index-replace changes the value to "intranet" if the hostname is "localhost" (the $'...' notation expands \n).
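If the indexchecker output looks right, the same settings can be made permanent in nutch-site.xml instead of being passed as -D flags; a sketch, with the values taken from the example above:
<property>
  <name>plugin.includes</name>
  <value>protocol-okhttp|parse-html|index-(basic|replace|static)</value>
</property>
<property>
  <name>index.static</name>
  <value>category:unknown</value>
</property>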

Maven: how to set thread count for testng

I'm using TestNG to run tests in parallel. The suite XML file contains a thread-count parameter:
<suite name="Lalala" parallel="tests" thread-count="3" preserve-order="true">
But I want to set the thread-count value from the POM file. I tried
<dependency>
  <groupId>org.testng</groupId>
  <artifactId>testng</artifactId>
  <version>6.3.1</version>
</dependency>
and
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.19</version>
  <configuration>
    <parallel>classes</parallel>
    <threadCount>10</threadCount>
    <suiteXmlFiles>
      <suiteXmlFile>src/test/resources/${suite}.xml</suiteXmlFile>
    </suiteXmlFiles>
    <workingDirectory>target/</workingDirectory>
  </configuration>
</plugin>
But the thread count still equals 1.
Is there some way to set thread-count from the POM file?
You may need to remove thread-count from your suite definition in your XML file, as it will override any -threadcount parameter that Maven Surefire passes to TestNG (see Command Line Parameters under Running TestNG).
From local testing it appears that threadCount and suiteXmlFiles aren't compatible, and the Maven Surefire Plugin documentation for suiteXmlFiles states:
Note that suiteXmlFiles is incompatible with several other parameters of this plugin, like includes/excludes.
I believe that threadCount is another of the incompatible "other parameters".
Some of the same options available in TestNG XML files are also available when configuring the Maven Surefire Plugin, so it looks like you will have to "port" your TestNG XML to Maven Surefire Plugin configuration XML.
In my local testing I found that I could simply omit suiteXmlFiles, and the plugin found and ran my tests with the specified threadCount. Depending on your TestNG XML, your solution might take a bit more work.
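A minimal sketch of that approach, reusing the version and values from the question, with suiteXmlFiles simply left out so that threadCount takes effect:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.19</version>
  <configuration>
    <!-- Without suiteXmlFiles, Surefire discovers the tests itself
         and honors parallel/threadCount. -->
    <parallel>classes</parallel>
    <threadCount>10</threadCount>
  </configuration>
</plugin>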
I haven't tried this myself, but this configuration should work.
I'm not sure, but to use this you should use the Surefire plugin at version 2.19+. I also recommend not using Surefire-specific element names in the configuration section (like <parallel>, <threadCount>, <groups>, etc.) when you use TestNG. The better choice is to use a <properties> section with a set of <property> values. Those values will be passed to the TestNG command line. The behavior of such properties is clearly described in the TestNG documentation.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.19</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.maven.surefire</groupId>
      <artifactId>surefire-testng</artifactId>
      <version>2.19</version>
    </dependency>
  </dependencies>
  <configuration>
    <suiteXmlFiles>
      <suiteXmlFile>suites/my-suite.xml</suiteXmlFile>
    </suiteXmlFiles>
    <!-- DON'T USE THIS
    <parallel>methods</parallel>
    <threadCount>5</threadCount>
    -->
    <properties>
      <property>
        <name>parallel</name>
        <value>methods</value>
      </property>
      <property>
        <name>threadcount</name>
        <value>5</value>
      </property>
      <property>
        <name>dataproviderthreadcount</name>
        <value>3</value>
      </property>
    </properties>
  </configuration>
</plugin>

How do I enable page compression for JBoss EAP 6.2

How can I enable page/gzip compression for JBoss EAP 6.2?
I have found information online for older JBoss AS, but nothing for EAP.
You can do this by adding system properties.
File: standalone.xml (JBoss AS 7+)
Position: right after the closing </extensions> tag:
<system-properties>
  <property name="org.apache.coyote.http11.Http11Protocol.COMPRESSION" value="on"/>
  <property name="org.apache.coyote.http11.Http11Protocol.COMPRESSION_MIME_TYPES" value="text/javascript,text/css,text/html,text/xml,text/json"/> <!-- add other content types you want to gzip -->
</system-properties>
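To check that compression is actually applied, one quick test (assuming the default HTTP port 8080; the path is illustrative):
# Request a resource with gzip accepted and print only the response headers.
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://localhost:8080/myapp/index.html
# A "Content-Encoding: gzip" response header confirms compression is on.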

Configuring log4j with jboss-as-7.1.1

I have been reading a lot on this forum, in the JBoss docs, and on the internet to get a log4j configuration to work with JBoss AS 7.1.1. I do not want to use the logging subsystem in JBoss; I want to use my own log4j configuration. My JBoss server is configured in standalone mode. The following is what I did to get log4j configured, based on the docs:
Defined a jboss-deployment-structure.xml as per https://docs.jboss.org/author/display/AS71/How+To#HowTo-HowdoIuselog4j.propertiesorlog4j.xmlinsteadofusingtheloggingsubsystemconfiguration%3F, and added it in the META-INF directory of my EAR.
Added a log4j.xml as is, packaged inside a jar in the lib directory of my ear.
Removed the logging subsystem and the extension module="org.jboss.as.logging" from standalone.xml.
I did not change the logging.properties that is provided as a startup parameter in startup.sh, as I've read that is the logging the JBoss server uses before the subsystem kicks in.
In spite of doing all of this, I cannot get the application to log as per my log4j configuration.
My reason for using my own log4j configuration instead of the logging subsystem is to be able to use a custom rolling file appender for size-rotating-file-handler, as I want the rotated files to have a timestamp attached to the file name.
Help appreciated.
OK, so I created a class MyHandler.java that extends SizeRotatingFileHandler.java and overrides the preWrite method. MyHandler.java is in a package a.b.c. I created a subdirectory a/b/c under modules, and inside the c directory I added a jar that has just the MyHandler.class file. I added a module.xml:
<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.0" name="a.b.c">
<resources>
<resource-root path="RotatingHandler.jar"/>
<!-- Insert resources here -->
</resources>
<dependencies>
<module name="org.jboss.logging"/>
</dependencies>
</module>
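For reference, a minimal sketch of what such a handler might look like, assuming JBoss LogManager's API (the preWrite body here is illustrative):
package a.b.c;

import org.jboss.logmanager.ExtLogRecord;
import org.jboss.logmanager.handlers.SizeRotatingFileHandler;

// Custom handler hooking into preWrite(), which runs before each record
// is written and is where the size check/rotation happens.
public class MyHandler extends SizeRotatingFileHandler {

    @Override
    protected void preWrite(final ExtLogRecord record) {
        // Illustrative: adjust the record or rotation behavior here,
        // then fall back to the default size-based rotation logic.
        super.preWrite(record);
    }
}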
Then in the standalone.xml I added an entry for the custom handler:
<custom-handler name="FILE3" class="a.b.c.MyHandler" module="a.b.c">
  <level name="INFO"/>
  <formatter>
    <pattern-formatter pattern="%d{HH:mm:ss,SSS} %-5p [%c] (%t) %s%E%n"/>
  </formatter>
  <file relative-to="jboss.server.log.dir" path="srveree.log"/>
</custom-handler>
When I start JBoss it says it can't find the class a.b.c.MyHandler. How do I resolve this error?
UPDATE: I resolved this error. There was a problem in the package structure inside the module. However, I am still going back to the original question of configuring log4j with jboss-as-7.1.1.Final.
I was able to configure log4j with JBoss. It turns out that you need to add exclusions separately for each of the sub-deployments inside your main deployment. For example, I have an ear that has jar and war files bundled inside, so I added separate entries for each of them in the jboss-deployment-structure.xml and it worked.
<sub-deployment name="your-subdeployment.jar">
  <exclusions>
    <module name="org.apache.log4j"/>
    <module name="org.slf4j"/>
    <module name="org.apache.commons.logging"/>
    <module name="org.log4j"/>
    <module name="org.jboss.logging"/>
  </exclusions>
</sub-deployment>
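Put together, the whole jboss-deployment-structure.xml would look roughly like this (a sketch; the sub-deployment name is a placeholder and the exclusion list mirrors the one above):
<?xml version="1.0" encoding="UTF-8"?>
<jboss-deployment-structure>
  <!-- Exclusions for the top-level EAR deployment. -->
  <deployment>
    <exclusions>
      <module name="org.apache.log4j"/>
      <module name="org.slf4j"/>
      <module name="org.apache.commons.logging"/>
      <module name="org.log4j"/>
      <module name="org.jboss.logging"/>
    </exclusions>
  </deployment>
  <!-- Repeat for every jar/war bundled inside the EAR. -->
  <sub-deployment name="your-subdeployment.jar">
    <exclusions>
      <module name="org.apache.log4j"/>
      <module name="org.slf4j"/>
      <module name="org.apache.commons.logging"/>
      <module name="org.log4j"/>
      <module name="org.jboss.logging"/>
    </exclusions>
  </sub-deployment>
</jboss-deployment-structure>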
One thing you didn't indicate is whether or not you added a log4j library in your EAR/lib folder as well. I assume you probably have, though, or you would be seeing other errors.
I'm not sure whether or not log4j would pick up a log4j.xml inside a JAR inside your EAR, to be honest. I would think that EAR/META-INF would be a more appropriate place for your log4j configuration file.
There is no real reason to remove the logging subsystem in this case. Also, I'm not trying to convince you to use it, but you could create a custom handler fairly easily to do what you're looking to do. You could base it on the SizeRotatingFileHandler and then just add a suffix on the rename.
Consider the configuration in your standalone.xml under
<subsystem xmlns="urn:jboss:domain:logging:1.1">
Here you can set
<console-handler name="CONSOLE">
  <level name="DEBUG"/>
...
<logger category="com.myCompany">
  <level name="DEBUG"/>
</logger>
...
This means: loggers in your classes in the package com.myCompany will log from level DEBUG upwards.
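For orientation, a fuller sketch of that subsystem fragment (the shape follows the default AS 7.1 standalone.xml; handler names and levels are illustrative):
<subsystem xmlns="urn:jboss:domain:logging:1.1">
  <console-handler name="CONSOLE">
    <level name="DEBUG"/>
    <formatter>
      <pattern-formatter pattern="%d{HH:mm:ss,SSS} %-5p [%c] (%t) %s%E%n"/>
    </formatter>
  </console-handler>
  <logger category="com.myCompany">
    <level name="DEBUG"/>
  </logger>
  <root-logger>
    <level name="INFO"/>
    <handlers>
      <handler name="CONSOLE"/>
    </handlers>
  </root-logger>
</subsystem>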

BlazeDS: where is log file stored on server?

If I have the following in my services-config.xml file for setting up the BlazeDS log file on a Linux server, where does it save the log file? Or does the output show up by default in Flash Builder 4.6 (e.g. no further info in a log file)?
I've been trying to figure this out reading
http://livedocs.adobe.com/blazeds/1/blazeds_devguide/help.html?content=services_logging_3.html
but haven't been able to figure it out. I must be missing something obvious. Any advice appreciated.
<logging>
  <target class="flex.messaging.log.ConsoleTarget" level="Error">
    <properties>
      <prefix>[BlazeDS] </prefix>
      <includeDate>true</includeDate>
      <includeTime>true</includeTime>
      <includeLevel>true</includeLevel>
      <includeCategory>true</includeCategory>
    </properties>
    <filters>
      <pattern>Endpoint.*</pattern>
      <pattern>Service.*</pattern>
      <pattern>Configuration</pattern>
    </filters>
  </target>
</logging>
Is there a way I can specify a location for the log file to be written?
Taken from the link you provided:
Setting the logging target
By default, the server writes log messages to System.out. In the class attribute of the target element, you can specify flex.messaging.log.ConsoleTarget (default) to log messages to the standard output, or the flex.messaging.log.ServletLogTarget to log messages to the default logging mechanism for servlets for your application server.
So you either have to configure logging in your application server (for Tomcat: http://tomcat.apache.org/tomcat-7.0-doc/logging.html) or use something like log4j in your servlet.
services-config.xml should then look something like this:
<logging>
  <target class="flex.messaging.log.ServletLogTarget" level="warn">
    <properties>
      <prefix>[BlazeDS] </prefix>
      <includeDate>true</includeDate>
      <includeTime>true</includeTime>
      <includeLevel>true</includeLevel>
      <includeCategory>true</includeCategory>
    </properties>
    <filters>
      <pattern>Endpoint.*</pattern>
      <pattern>Service.*</pattern>
      <pattern>Message.*</pattern>
      <pattern>DataService.*</pattern>
      <pattern>Configuration</pattern>
    </filters>
  </target>
</logging>
Sidenote: We use log4j and spring-flex, which provides org.springframework.flex.core.CommonsLoggingTarget to handle BlazeDS output.
services-config.xml
<logging>
  <target class="org.springframework.flex.core.CommonsLoggingTarget" level="debug">
    <properties>
      <categoryPrefix>blazeds</categoryPrefix>
    </properties>
  </target>
</logging>
log4j.properties
# Note: log4j 1.x file appenders need a layout; without one the appender logs a
# warning and produces no usable output. The pattern below is illustrative.
log4j.appender.myAppLog=org.apache.log4j.RollingFileAppender
log4j.appender.myAppLog.File=${catalina.base}/logs/myAppLog.txt
log4j.appender.myAppLog.layout=org.apache.log4j.PatternLayout
log4j.appender.myAppLog.layout.ConversionPattern=%d %-5p [%c] %m%n
log4j.appender.myBlazeLog=org.apache.log4j.RollingFileAppender
log4j.appender.myBlazeLog.File=${catalina.base}/logs/myBlazeLog.txt
log4j.appender.myBlazeLog.layout=org.apache.log4j.PatternLayout
log4j.appender.myBlazeLog.layout.ConversionPattern=%d %-5p [%c] %m%n
log4j.rootLogger=DEBUG,myAppLog
log4j.logger.blazeds=ALL,myBlazeLog
