I am new to Spark and have a task where I need to read a bunch of files from S3 that contain XML content.
These files are gzip-compressed but do not have a .gz extension.
I have read some questions here where people suggest extending the default codec in Spark to force a different extension.
But in my case there is no extension at all: the files are named in some 16-digit UUID format, such as 2c7358ca472ad91057da84adfba.
You can use newAPIHadoopFile (instead of textFile) with a custom/modified TextInputFormat which forces the use of the GzipCodec.
Instead of calling sparkContext.textFile,
// gzip compressed but no .gz extension:
sparkContext.textFile("s3://mybucket/uuid")
we can use the underlying sparkContext.newAPIHadoopFile which allows us to specify how to read the input:
import org.apache.hadoop.mapreduce.lib.input.GzipInputFormatWithoutExtention
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
sparkContext
.newAPIHadoopFile(
"s3://mybucket/uuid",
classOf[GzipInputFormatWithoutExtention], // This is our custom reader
classOf[LongWritable],
classOf[Text],
new Configuration(sparkContext.hadoopConfiguration)
)
.map { case (_, text) => text.toString }
The usual way of calling newAPIHadoopFile would be with TextInputFormat. This is the part which wraps how the file is read and where the compression codec is chosen based on the file extension.
Let's call it GzipInputFormatWithoutExtention and implement it as follows, as an extension of TextInputFormat (this is a Java file; let's put it in src/main/java/org/apache/hadoop/mapreduce/lib/input):
package org.apache.hadoop.mapreduce.lib.input;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import com.google.common.base.Charsets;
public class GzipInputFormatWithoutExtention extends TextInputFormat {
public RecordReader<LongWritable, Text> createRecordReader(
InputSplit split,
TaskAttemptContext context
) {
String delimiter =
context.getConfiguration().get("textinputformat.record.delimiter");
byte[] recordDelimiterBytes = null;
if (null != delimiter)
recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
// Here we use our custom `GzipWithoutExtentionLineRecordReader`
// instead of `LineRecordReader`:
return new GzipWithoutExtentionLineRecordReader(recordDelimiterBytes);
}
@Override
protected boolean isSplitable(JobContext context, Path file) {
return false; // gzip isn't a splittable codec (as opposed to bzip2)
}
}
In fact we have to go one level deeper and also replace the default LineRecordReader (Java) with our own (let's call it GzipWithoutExtentionLineRecordReader).
As it's quite difficult to inherit from LineRecordReader, we can copy LineRecordReader (in src/main/java/org/apache/hadoop/mapreduce/lib/input) and slightly modify (and simplify) the initialize(InputSplit genericSplit, TaskAttemptContext context) method by forcing the usage of the Gzip codec:
(the only changes compared to the original LineRecordReader have been given a comment explaining what's happening)
package org.apache.hadoop.mapreduce.lib.input;
import java.io.IOException;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Seekable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@InterfaceAudience.LimitedPrivate({"MapReduce", "Pig"})
@InterfaceStability.Evolving
public class GzipWithoutExtentionLineRecordReader extends RecordReader<LongWritable, Text> {
private static final Logger LOG =
LoggerFactory.getLogger(GzipWithoutExtentionLineRecordReader.class);
public static final String MAX_LINE_LENGTH =
"mapreduce.input.linerecordreader.line.maxlength";
private long start;
private long pos;
private long end;
private SplitLineReader in;
private FSDataInputStream fileIn;
private Seekable filePosition;
private int maxLineLength;
private LongWritable key;
private Text value;
private boolean isCompressedInput;
private Decompressor decompressor;
private byte[] recordDelimiterBytes;
public GzipWithoutExtentionLineRecordReader(byte[] recordDelimiter) {
this.recordDelimiterBytes = recordDelimiter;
}
public void initialize(
InputSplit genericSplit,
TaskAttemptContext context
) throws IOException {
FileSplit split = (FileSplit) genericSplit;
Configuration job = context.getConfiguration();
this.maxLineLength = job.getInt(MAX_LINE_LENGTH, Integer.MAX_VALUE);
start = split.getStart();
end = start + split.getLength();
final Path file = split.getPath();
// open the file and seek to the start of the split
final FileSystem fs = file.getFileSystem(job);
fileIn = fs.open(file);
// This line is modified to force the use of the GzipCodec:
// CompressionCodec codec = new CompressionCodecFactory(job).getCodec(file);
CompressionCodecFactory ccf = new CompressionCodecFactory(job);
CompressionCodec codec = ccf.getCodecByClassName(GzipCodec.class.getName());
// This part has been extremely simplified as we don't have to handle
// all the different codecs:
isCompressedInput = true;
decompressor = CodecPool.getDecompressor(codec);
if (start != 0) {
throw new IOException(
"Cannot seek in " + codec.getClass().getSimpleName() + " compressed stream"
);
}
in = new SplitLineReader(
codec.createInputStream(fileIn, decompressor), job, this.recordDelimiterBytes
);
filePosition = fileIn;
if (start != 0) {
start += in.readLine(new Text(), 0, maxBytesToConsume(start));
}
this.pos = start;
}
private int maxBytesToConsume(long pos) {
return isCompressedInput
? Integer.MAX_VALUE
: (int) Math.max(Math.min(Integer.MAX_VALUE, end - pos), maxLineLength);
}
private long getFilePosition() throws IOException {
long retVal;
if (isCompressedInput && null != filePosition) {
retVal = filePosition.getPos();
} else {
retVal = pos;
}
return retVal;
}
private int skipUtfByteOrderMark() throws IOException {
int newMaxLineLength = (int) Math.min(3L + (long) maxLineLength,
Integer.MAX_VALUE);
int newSize = in.readLine(value, newMaxLineLength, maxBytesToConsume(pos));
pos += newSize;
int textLength = value.getLength();
byte[] textBytes = value.getBytes();
if ((textLength >= 3) && (textBytes[0] == (byte)0xEF) &&
(textBytes[1] == (byte)0xBB) && (textBytes[2] == (byte)0xBF)) {
LOG.info("Found UTF-8 BOM and skipped it");
textLength -= 3;
newSize -= 3;
if (textLength > 0) {
textBytes = value.copyBytes();
value.set(textBytes, 3, textLength);
} else {
value.clear();
}
}
return newSize;
}
public boolean nextKeyValue() throws IOException {
if (key == null) {
key = new LongWritable();
}
key.set(pos);
if (value == null) {
value = new Text();
}
int newSize = 0;
while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
if (pos == 0) {
newSize = skipUtfByteOrderMark();
} else {
newSize = in.readLine(value, maxLineLength, maxBytesToConsume(pos));
pos += newSize;
}
if ((newSize == 0) || (newSize < maxLineLength)) {
break;
}
LOG.info("Skipped line of size " + newSize + " at pos " +
(pos - newSize));
}
if (newSize == 0) {
key = null;
value = null;
return false;
} else {
return true;
}
}
@Override
public LongWritable getCurrentKey() {
return key;
}
@Override
public Text getCurrentValue() {
return value;
}
public float getProgress() throws IOException {
if (start == end) {
return 0.0f;
} else {
return Math.min(1.0f, (getFilePosition() - start) / (float)(end - start));
}
}
public synchronized void close() throws IOException {
try {
if (in != null) {
in.close();
}
} finally {
if (decompressor != null) {
CodecPool.returnDecompressor(decompressor);
decompressor = null;
}
}
}
}
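As a quick local sanity check, independent of Spark and Hadoop, it can help to confirm that an extension-less file really is gzip before wiring up the custom input format. The following is only a sketch using the JDK's java.util.zip classes; the class name GzipProbe and its helper methods are made up for illustration, and the payload is generated in memory rather than read from S3.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Spark or Hadoop.
final class GzipProbe {

    // Gzip streams start with the two magic bytes 0x1f 0x8b,
    // whatever the file happens to be called.
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xff) == 0x1f
            && (data[1] & 0xff) == 0x8b;
    }

    // Decompresses the stream and returns its lines, decoded as UTF-8.
    static List<String> readLines(InputStream raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Builds a gzip payload in memory so the sketch needs no external file.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = gzip("<a>1</a>\n<b>2</b>\n");
        System.out.println("gzip magic present: " + looksLikeGzip(payload));
        System.out.println(readLines(new ByteArrayInputStream(payload)));
    }
}
```

The point of the check is that the compression format is a property of the bytes, not of the file name; only Hadoop's codec lookup depends on the extension, which is why the input format above has to force the codec.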
I configured Apache Spark with one master node and two worker nodes.
I run the jar file on the dataset with the following command:
./bin/spark-submit --class dmlab.main.MainDriver --master local[2] PAMAE-Spark.jar USCensus1990.csv 10 4000 5 4 1
Everything up to job 3 completes successfully, but an error occurs when job 4 starts.
Please kindly help me.
package dmlab.main;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.rdd.RDD;
import org.apache.spark.storage.StorageLevel;
import dmlab.main.Algorithms;
import dmlab.main.FloatPoint;
import dmlab.main.PAMAE;
import scala.Tuple2;
public final class PAMAE {
public static String eleDivider = ",";
public static JavaRDD<FloatPoint> readFile(JavaSparkContext sc, String
inputPath, int numOfCores)
{
JavaRDD<String> lines = sc.textFile(inputPath,numOfCores);
JavaRDD<FloatPoint> dataSet = lines.map(new
PAMAE.ParsePoint()).persist(StorageLevel.MEMORY_AND_DISK_SER());
return dataSet;
}
public static List<FloatPoint> PHASE_I(JavaSparkContext sc,
JavaRDD<FloatPoint> dataSet, int numOfClusters, int numOfSampledObjects, int
numOfSamples, int numOfCores)
{
List<FloatPoint> samples = dataSet.takeSample(true,
numOfSampledObjects*numOfClusters*10);
JavaRDD<FloatPoint> sampleSet = sc.parallelize(samples);
List<Tuple2<Integer, List<FloatPoint>>> candidateSet =
sampleSet.mapToPair(new PAMAE.Sampling(numOfClusters))
.groupByKey().mapValues(new PAMAE.PAM(numOfSampledObjects, numOfClusters))
.collect();
JavaRDD<Tuple2<Integer, List<FloatPoint>>> candidateSetRDD =
sc.parallelize(candidateSet).persist(StorageLevel.MEMORY_AND_DISK_SER());
List<Tuple2<Integer, Double>> costList = cluteringError(sc, dataSet,
candidateSetRDD);
List<Tuple2<Integer, List<FloatPoint>>> candidateList =
candidateSetRDD.collect();
int finalKey = -1;
double phaseIError = Double.MAX_VALUE;
for(int i=0; i<costList.size(); i++)
{
if(phaseIError > costList.get(i)._2())
{
phaseIError = costList.get(i)._2();
finalKey = costList.get(i)._1();
}
}
List<FloatPoint> bestSeed = null;
for(int i=0; i<candidateList.size(); i++)
{
if(candidateList.get(i)._1() == finalKey)
bestSeed = candidateList.get(i)._2();
}
System.out.println("PHASE I CLUSTERING ERROR : " + phaseIError+"\n");
candidateSetRDD.unpersist();
return bestSeed;
}
public static List<FloatPoint> PHASE_II(JavaSparkContext sc,
JavaRDD<FloatPoint> dataSet, List<FloatPoint> bestSeed, int numOfClusters,
int numOfSampledObjects, int numOfSamples, int numOfCores)
{
JavaRDD<FloatPoint> bestSeedRDD = sc.parallelize(bestSeed);
bestSeedRDD.persist(StorageLevel.MEMORY_AND_DISK_SER());
List<Tuple2<Integer, List<FloatPoint>>> temp = dataSet.mapToPair(new
PAMAE.AssignPoint(numOfCores))
.groupByKey().mapValues(new PAMAE.ModifiedWeiszfeld(bestSeedRDD)).collect();
ArrayList<FloatPoint> finalMedoids = new ArrayList<FloatPoint>();
for(int i=0; i<numOfClusters; i++)
{
int index = -1;
double minCost = Double.MAX_VALUE;
for(int j=0; j<numOfCores; j++)
{
if(minCost > temp.get(j)._2().get(i).getCost())
{ index = j;
minCost = temp.get(j)._2().get(i).getCost();}
}
finalMedoids.add(temp.get(index)._2().get(i));
System.out.println(temp.get(index)._2().get(i).toString());
}
bestSeedRDD.unpersist();
return finalMedoids;
}
public static List<Tuple2<Integer, Double>> cluteringError(JavaSparkContext
sc, JavaRDD<FloatPoint> dataSet, JavaRDD<Tuple2<Integer, List<FloatPoint>>>
medoids)
{
List<Tuple2<Integer, Double>> Error = dataSet.flatMapToPair(new
PAMAE.CostCaculator(medoids)).
reduceByKey(new
Function2<Double, Double, Double>() {
public Double call(Double x, Double y) throws Exception {
return x+y;
}
}).collect();
return Error;
}
public static double FinalError(JavaSparkContext sc, JavaRDD<FloatPoint>
dataSet, List<FloatPoint> finalMedoids)
{
Tuple2<Integer,List<FloatPoint>> medoids = new Tuple2<Integer,
List<FloatPoint>>(1,finalMedoids);
List<Tuple2<Integer,List<FloatPoint>>> temp = new
ArrayList<Tuple2<Integer,List<FloatPoint>>>();
temp.add(medoids);
JavaRDD<Tuple2<Integer, List<FloatPoint>>> finalSetRDD =
sc.parallelize(temp).persist(StorageLevel.MEMORY_AND_DISK_SER());
List<Tuple2<Integer, Double>> finalError = cluteringError(sc, dataSet,
finalSetRDD);
return finalError.get(0)._2;
}
public static class ParsePoint implements Function<String, FloatPoint> {
@Override
public FloatPoint call(String line) {
String[] toks = line.toString().split(eleDivider);
FloatPoint pt = new FloatPoint(toks.length,-1);
for(int j=0; j<toks.length; j++)
pt.getValues()[j] = (Float.parseFloat(toks[j]));
return pt;
}
}
public static class Sampling implements PairFunction<FloatPoint, Integer,
FloatPoint> {
private int parallel = 0;
public Sampling(int parallel) {
// TODO Auto-generated constructor stub
this.parallel = parallel;
}
@Override
public Tuple2<Integer, FloatPoint> call(FloatPoint value) throws
Exception {
int key = (int)(Math.random()*parallel);
value.setKey(key);
return new Tuple2(key,value);
}
}
public static class PAM implements Function<Iterable<FloatPoint>,
List<FloatPoint>>{
private int sampleNumber = 0;
private int K = 0;
public PAM( int sampleNumber, int K) {
// TODO Auto-generated constructor stub
this.sampleNumber = sampleNumber;
this.K = K;
}
@Override
public List<FloatPoint> call(Iterable<FloatPoint> values) throws
Exception {
List<FloatPoint> sampledDataSet = new ArrayList<FloatPoint>();
for(FloatPoint pt : values)
sampledDataSet.add(pt);
HashSet<Integer> sampleIndex = new HashSet<Integer>();
int dataSize = sampledDataSet.size();
while(sampleIndex.size() != sampleNumber)
sampleIndex.add((int)(Math.random()*dataSize));
List<FloatPoint> samplePoints = new ArrayList<FloatPoint>();
for(Integer sample : sampleIndex)
samplePoints.add(sampledDataSet.get(sample));
sampledDataSet.clear();
float[][] preCalcResult = Algorithms.PreCalculate(samplePoints);
List<FloatPoint> sampleInit =
Algorithms.chooseInitialMedoids(samplePoints, K, preCalcResult, 0.5f);
Algorithms.PAM(samplePoints, sampleInit, preCalcResult);
return sampleInit;
}
}
public static class CostCaculator implements PairFlatMapFunction<FloatPoint,
Integer, Double>{
List<Tuple2<Integer, List<FloatPoint>>> Candidates;
public CostCaculator(JavaRDD<Tuple2<Integer, List<FloatPoint>>>
candidateSetRDD) {
Candidates = candidateSetRDD.collect();
}
@Override
public Iterable<Tuple2<Integer, Double>> call(FloatPoint pt)
throws Exception {
List<Tuple2<Integer, Double>> output = new ArrayList<Tuple2<Integer,
Double>>();
for(int i=0; i<Candidates.size(); i++)
{
int key = Candidates.get(i)._1();
List<FloatPoint> pts = Candidates.get(i)._2();
double min = Double.MAX_VALUE;
for(int j=0; j<pts.size(); j++)
{
double newCost = FunctionSet.distance(pt, pts.get(j));
if(min > newCost)
min = newCost;
}
output.add(new Tuple2<Integer, Double>(key, min));
}
return output;
}
}
public static class AssignPoint implements PairFunction<FloatPoint, Integer,
FloatPoint> {
private int coreNum=-1;
public AssignPoint(int coreNum) {
// TODO Auto-generated constructor stub
this.coreNum = coreNum;
}
@Override
public Tuple2<Integer, FloatPoint> call(FloatPoint value) throws
Exception {
int key = (int)(Math.random()*coreNum);
value.setKey(key);
return new Tuple2(key,value);
}
}
public static class ModifiedWeiszfeld implements
Function<Iterable<FloatPoint>, List<FloatPoint>>{
private List<FloatPoint> medoids = null;
public ModifiedWeiszfeld(JavaRDD<FloatPoint> bestSeed) {
this.medoids = bestSeed.collect();
}
@Override
public List<FloatPoint> call(Iterable<FloatPoint> values) throws
Exception {
List<FloatPoint> localDataSet = new ArrayList<FloatPoint>();
for(FloatPoint pt : values)
localDataSet.add(pt);
for(FloatPoint pt: medoids)
localDataSet.add(pt);
List<FloatPoint> finalMedoid = null;
finalMedoid = Algorithms.refinement(localDataSet, medoids, 0.01);
return finalMedoid;
}
}
}
I ran this source code. It finishes job 3, but an error occurs when job 4 starts.
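To make it easier to test the cost logic outside of Spark, here is a minimal standalone sketch of what cluteringError and the selection loop in PHASE_I appear to compute: for each candidate medoid set, sum the distance from every point to its nearest medoid, then keep the candidate with the lowest total. It uses plain float arrays instead of FloatPoint and assumes Euclidean distance, because FunctionSet.distance is not shown in the question; the class MedoidCost and its method names are made up for illustration.

```java
import java.util.List;

// Hypothetical stand-in for the Spark job; no cluster is needed to run it.
final class MedoidCost {

    // Assumption: Euclidean distance (the original FunctionSet.distance is not shown).
    static double distance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Distance from one point to its closest medoid (the inner loop of CostCaculator).
    static double nearestMedoidCost(float[] point, List<float[]> medoids) {
        double min = Double.MAX_VALUE;
        for (float[] medoid : medoids) {
            min = Math.min(min, distance(point, medoid));
        }
        return min;
    }

    // Sum of nearest-medoid distances over the data set (what reduceByKey adds up per key).
    static double totalCost(List<float[]> data, List<float[]> medoids) {
        double total = 0.0;
        for (float[] point : data) {
            total += nearestMedoidCost(point, medoids);
        }
        return total;
    }

    // Index of the candidate set with the lowest total cost, or -1 if there are none.
    static int bestCandidate(List<float[]> data, List<List<float[]>> candidates) {
        int best = -1;
        double bestCost = Double.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            double cost = totalCost(data, candidates.get(i));
            if (cost < bestCost) {
                bestCost = cost;
                best = i;
            }
        }
        return best;
    }
}
```

Running the same input through this sketch and through the Spark job on a small sample may help narrow down whether the failure comes from the algorithm itself or from how the functions are shipped to the executors.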
I use GML 3.1.1 XSDs in the XSD for my application. I want to download all GML 3.1.1 XSDs as, for example, a zip file. In other words: the base XSD is here, and I want to download it together with all of its imports as a zip file or something similar. Is there any application that supports this?
I've found this downloader, but it doesn't work for me (I think it does not support the relative paths in imports that occur in gml.xsd 3.1.1). Any ideas?
QTAssistant's XSR (I am associated with it) has an easy to use function that allows one to automatically import and refactor XSD content as local files from all sorts of sources. In the process it'll update schema location references, etc.
I've made a simple screen capture of the steps involved in achieving a task like this which should demonstrate its usability.
Based on mschwehl's solution, I made an improved class to fetch the schemas. It fits this question well. See https://github.com/mfalaize/schema-fetcher
You can achieve this using SOAP UI.
Follow these steps:
1. Create a project using the WSDL.
2. Choose your interface and open it in the interface viewer.
3. Navigate to the tab 'WSDL Content'.
4. Use the last icon under the tab 'WSDL Content': 'Export the entire WSDL and included/imported files to a local directory'.
5. Select the folder where you want the XSDs to be exported to.
Note: SOAPUI will remove all relative paths and will save all XSDs to the same folder.
I have written a simple Java main class that does the job and changes the schema locations to relative URLs:
package dl;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class SchemaPersister {
private static final String EXPORT_FILESYSTEM_ROOT = "C:/export/xsd";
// some caching of the http-responses
private static Map<String,String> _httpContentCache = new HashMap<String,String>();
public static void main(String[] args) {
try {
new SchemaPersister().doIt();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private void doIt() throws Exception {
// // if you need an in-house proxy
// final String authUser = "xxxxx";
// final String authPassword = "xxxx";
//
// System.setProperty("http.proxyHost", "xxxxx");
// System.setProperty("http.proxyPort", "xxxx");
// System.setProperty("http.proxyUser", authUser);
// System.setProperty("http.proxyPassword", authPassword);
//
// Authenticator.setDefault(
// new Authenticator() {
// public PasswordAuthentication getPasswordAuthentication() {
// return new PasswordAuthentication(authUser, authPassword.toCharArray());
// }
// }
// );
//
Set <SchemaElement> allElements = new HashSet<SchemaElement>() ;
// URL url = new URL("file:/C:/xauslaender-nachrichten-administration.xsd");
URL url = new URL("http://www.osci.de/xauslaender141/xauslaender-nachrichten-bamf-abh.xsd");
allElements.add ( new SchemaElement(url));
for (SchemaElement e: allElements) {
System.out.println("processing " + e);
e.doAll();
}
System.out.println("done!");
}
class SchemaElement {
private URL _url;
private String _content;
public List <SchemaElement> _imports ;
public List <SchemaElement> _includes ;
public SchemaElement(URL url) {
this._url = url;
}
public void checkIncludesAndImportsRecursive() throws Exception {
InputStream in = new ByteArrayInputStream(downloadContent() .getBytes("UTF-8"));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(in);
List<Node> includeNodeList = null;
List<Node> importNodeList = null;
includeNodeList = getXpathAttribute(doc,"/*[local-name()='schema']/*[local-name()='include']");
_includes = new ArrayList <SchemaElement> ();
for ( Node element: includeNodeList) {
Node sl = element.getAttributes().getNamedItem("schemaLocation");
if (sl == null) {
System.out.println(_url + " defines one include but no schemaLocation");
continue;
}
String asStringAttribute = sl.getNodeValue();
URL url = buildUrl(asStringAttribute,_url);
SchemaElement tmp = new SchemaElement(url);
tmp.setSchemaLocation(asStringAttribute);
tmp.checkIncludesAndImportsRecursive();
_includes.add(tmp);
}
importNodeList = getXpathAttribute(doc,"/*[local-name()='schema']/*[local-name()='import']");
_imports = new ArrayList <SchemaElement> ();
for ( Node element: importNodeList) {
Node sl = element.getAttributes().getNamedItem("schemaLocation");
if (sl == null) {
System.out.println(_url + " defines one import but no schemaLocation");
continue;
}
String asStringAttribute = sl.getNodeValue();
URL url = buildUrl(asStringAttribute,_url);
SchemaElement tmp = new SchemaElement(url);
tmp.setSchemaLocation(asStringAttribute);
tmp.checkIncludesAndImportsRecursive();
_imports.add(tmp);
}
in.close();
}
private String schemaLocation;
private void setSchemaLocation(String schemaLocation) {
this.schemaLocation = schemaLocation;
}
// http://stackoverflow.com/questions/10159186/how-to-get-parent-url-in-java
private URL buildUrl(String asStringAttribute, URL parent) throws Exception {
if (asStringAttribute.startsWith("http")) {
return new URL(asStringAttribute);
}
if (asStringAttribute.startsWith("file")) {
return new URL(asStringAttribute);
}
// relative URL
URI parentUri = parent.toURI().getPath().endsWith("/") ? parent.toURI().resolve("..") : parent.toURI().resolve(".");
return new URL(parentUri.toURL().toString() + asStringAttribute );
}
public void doAll() throws Exception {
System.out.println("READ ELEMENTS");
checkIncludesAndImportsRecursive();
System.out.println("PRINTING DEPENDENCYS");
printRecursive(0);
System.out.println("GENERATE OUTPUT");
patchAndPersistRecursive(0);
}
public void patchAndPersistRecursive(int level) throws Exception {
File f = new File(EXPORT_FILESYSTEM_ROOT + File.separator + this.getXDSName() );
System.out.println("FILENAME: " + f.getAbsolutePath());
if (_imports.size() > 0) {
for (int i = 0; i < level; i++) {
System.out.print(" ");
}
System.out.println("IMPORTS");
for (SchemaElement kid : _imports) {
kid.patchAndPersistRecursive(level+1);
}
}
if (_includes.size() > 0) {
for (int i = 0; i < level; i++) {
System.out.print(" ");
}
System.out.println("INCLUDES");
for (SchemaElement kid : _includes) {
kid.patchAndPersistRecursive(level+1);
}
}
String contentTemp = downloadContent();
for (SchemaElement i : _imports ) {
if (i.isHTTP()) {
contentTemp = contentTemp.replace(
"<xs:import schemaLocation=\"" + i.getSchemaLocation() ,
"<xs:import schemaLocation=\"" + i.getXDSName() );
}
}
for (SchemaElement i : _includes ) {
if (i.isHTTP()) {
contentTemp = contentTemp.replace(
"<xs:include schemaLocation=\"" + i.getSchemaLocation(),
"<xs:include schemaLocation=\"" + i.getXDSName() );
}
}
FileOutputStream fos = new FileOutputStream(f);
fos.write(contentTemp.getBytes("UTF-8"));
fos.close();
System.out.println("File written: " + f.getAbsolutePath() );
}
public void printRecursive(int level) {
for (int i = 0; i < level; i++) {
System.out.print(" ");
}
System.out.println(_url.toString());
if (this._imports.size() > 0) {
for (int i = 0; i < level; i++) {
System.out.print(" ");
}
System.out.println("IMPORTS");
for (SchemaElement kid : this._imports) {
kid.printRecursive(level+1);
}
}
if (this._includes.size() > 0) {
for (int i = 0; i < level; i++) {
System.out.print(" ");
}
System.out.println("INCLUDES");
for (SchemaElement kid : this._includes) {
kid.printRecursive(level+1);
}
}
}
String getSchemaLocation() {
return schemaLocation;
}
/**
* removes http:// and replaces / with _
* @return the flattened name used for the local file
*/
private String getXDSName() {
String tmp = schemaLocation;
// Root on local File-System -- just grab the last part of it
if (tmp == null) {
tmp = _url.toString().replaceFirst(".*/([^/?]+).*", "$1");
}
if ( isHTTP() ) {
tmp = tmp.replace("http://", "");
tmp = tmp.replace("/", "_");
} else {
tmp = tmp.replace("/", "_");
tmp = tmp.replace("\\", "_");
}
return tmp;
}
private boolean isHTTP() {
return _url.getProtocol().startsWith("http");
}
private String downloadContent() throws Exception {
if (_content == null) {
System.out.println("reading content from " + _url.toString());
if (_httpContentCache.containsKey(_url.toString())) {
this._content = _httpContentCache.get(_url.toString());
System.out.println("Cache hit! " + _url.toString());
} else {
System.out.println("Download " + _url.toString());
Scanner scan = new Scanner(_url.openStream(), "UTF-8");
if (isHTTP()) {
this._content = scan.useDelimiter("\\A").next();
} else {
this._content = scan.useDelimiter("\\Z").next();
}
scan.close();
if (this._content != null) {
_httpContentCache.put(_url.toString(), this._content);
}
}
}
if (_content == null) {
throw new NullPointerException("Content of " + _url.toString() + " is null");
}
return _content;
}
private List<Node> getXpathAttribute(Document doc, String path) throws Exception {
List <Node> returnList = new ArrayList <Node> ();
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
{
XPathExpression expr = xpath.compile(path);
NodeList nodeList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET );
for (int i = 0 ; i < nodeList.getLength(); i++) {
Node n = nodeList.item(i);
returnList.add(n);
}
}
return returnList;
}
@Override
public String toString() {
if (_url != null) {
return _url.toString();
}
return super.toString();
}
}
}
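For the relative-URL case, an alternative to concatenating strings as buildUrl does is to let java.net.URI.resolve do the work, since it collapses "../" segments against the parent document's location. The sketch below is only an illustration: the class name SchemaLocationResolver is made up, and the example.org URLs are placeholders rather than real schema locations.

```java
import java.net.URI;

// Hypothetical helper showing how relative schemaLocation values can be resolved.
final class SchemaLocationResolver {

    // Resolves a schemaLocation attribute against the URI of the schema that contains it.
    // Absolute locations are returned unchanged; relative ones (including "../")
    // are resolved against the parent's directory and normalized.
    static URI resolve(URI parent, String schemaLocation) {
        return parent.resolve(schemaLocation).normalize();
    }

    public static void main(String[] args) {
        // Placeholder URL, assumed for illustration only.
        URI parent = URI.create("http://example.org/schemas/gml/3.1.1/base/gml.xsd");
        System.out.println(resolve(parent, "feature.xsd"));
        System.out.println(resolve(parent, "../smil/smil20.xsd"));
        System.out.println(resolve(parent, "http://example.org/xlink/xlinks.xsd"));
    }
}
```

Because the result is normalized, the same schema reached through different relative paths maps to one URL, which also makes a cache keyed by URL (like _httpContentCache) more effective.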
I created a python tool to recursively download XSDs with relative paths in import tags (eg: <import schemaLocation="../../../../abc)
https://github.com/n-a-t-e/xsd_download
After downloading the schema you can use xmllint to validate an XML document.
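If you would rather stay in Java for the validation step, the JDK's javax.xml.validation API can do the same job. This is a minimal sketch, not the tool's own behavior: the class name XsdCheck is made up, and the tiny schema used in main is invented purely so the example runs without any downloaded files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML string against an XSD string.
final class XsdCheck {

    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself could not be compiled; report that separately.
            throw new IllegalArgumentException("Invalid schema", e);
        }
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Without a custom ErrorHandler, validation errors surface as SAXException.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Invented toy schema, for illustration only.
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                   + "<xs:element name=\"count\" type=\"xs:int\"/>"
                   + "</xs:schema>";
        System.out.println(isValid(xsd, "<count>42</count>"));
        System.out.println(isValid(xsd, "<count>many</count>"));
    }
}
```

For schemas downloaded to disk, passing `new StreamSource(new java.io.File(path))` instead of a string reader gives the source a system ID, which lets relative imports and includes between the local files resolve.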
I am using org.apache.xmlbeans.impl.tool.SchemaResourceManager from the xmlbeans project. This class is quick and easy to use.
For example:
SchemaResourceManager manager = new SchemaResourceManager(new File(dir));
manager.process(schemaUris, emptyArray(), false, true, true);
manager.writeCache();
This class has a main method that documents the different options available.
I have a composite column (Int32Type,BytesType,AsciiType) that I need to read its value (based on criteria), modify it and save it back (something like manual counter column).
The composite column that I'm querying might exist or it may not.
What is the best way to do that in Hector?
I cannot vouch that the following solution is the best, but it covers the basic functionality, such as creating composite columns. It reads and writes a composite column, which is essentially in line with "I need to read its value (based on criteria), modify it and save it back (something like manual counter column)". I think this sample code can serve as a base to improve on. :-) I will test it thoroughly when I have free time. With that said, the following is my suggestion.
package com.hector.dataTypes;
import java.util.Iterator;
import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ColumnSliceIterator;
import me.prettyprint.cassandra.service.ThriftCluster;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.AbstractComposite.ComponentEquality;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ColumnType;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;
import org.apache.cassandra.db.marshal.Int32Type;
import org.apache.cassandra.utils.ByteBufferUtil;
import com.google.common.base.Joiner;
/**
*
* @author jasonw
*
*/
public class CompositeExample
{
private String m_node;
private String m_keyspace;
private String m_column_family;
private ThriftCluster m_cassandraCluster;
private CassandraHostConfigurator m_cassandraHostConfigurator;
private Mutator<String> mutator;
private SliceQuery<String, Composite, String> sliceQuery;
public CompositeExample(String p_node, String p_keyspace, String p_column_family, String p_cluster)
{
m_node = p_node;
m_keyspace = p_keyspace;
m_column_family = p_column_family;
m_cassandraHostConfigurator = new CassandraHostConfigurator(m_node);
m_cassandraCluster = new ThriftCluster(p_cluster, m_cassandraHostConfigurator);
Cluster cluster = HFactory.getOrCreateCluster(p_cluster, m_cassandraHostConfigurator);
Keyspace keyspace = HFactory.createKeyspace(m_keyspace, cluster);
mutator = HFactory.createMutator(keyspace, StringSerializer.get());
sliceQuery = HFactory.createSliceQuery(keyspace, StringSerializer.get(), CompositeSerializer.get(), StringSerializer.get());
}
public boolean createCompositeColumn(String... p_new_columns)
{
try
{
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition(m_keyspace, m_column_family, ComparatorType.COMPOSITETYPE);
cfDef.setColumnType(ColumnType.STANDARD);
cfDef.setComparatorTypeAlias("(".concat(Joiner.on(",").join(p_new_columns)).concat(")"));
cfDef.setKeyValidationClass("UTF8Type");
cfDef.setDefaultValidationClass("UTF8Type");
m_cassandraCluster.addColumnFamily(cfDef, true);
return true;
}
catch (HectorException e)
{
e.printStackTrace();
}
return false;
}
public boolean saveColumn(String p_field_one, String p_field_two, String p_field_three)
{
try
{
Composite c = new Composite();
c.addComponent(Int32Type.instance.fromString(p_field_one), ByteBufferSerializer.get());
c.addComponent(ByteBufferUtil.bytes(p_field_two), ByteBufferSerializer.get());
c.addComponent(p_field_three, StringSerializer.get());
HColumn<Composite, String> col = HFactory.createColumn(c, "composite_value", CompositeSerializer.get(), StringSerializer.get());
mutator.addInsertion("key", m_column_family, col);
mutator.execute();
return true;
}
catch (HectorException e)
{
e.printStackTrace();
}
return false;
}
public boolean readColumn(String p_key, int p_column_number, ComponentEquality p_equality, int p_value)
{
if (p_column_number < 0 || p_column_number > 2)
{
return false;
}
try
{
sliceQuery.setColumnFamily(m_column_family);
sliceQuery.setKey(p_key);
Composite start = new Composite();
start.addComponent(0, p_value, p_equality);
Composite end = new Composite();
end.addComponent(0, p_value, ComponentEquality.GREATER_THAN_EQUAL);
sliceQuery.setRange(start, end, false, 1000);
QueryResult<ColumnSlice<Composite, String>> qr = sliceQuery.execute();
System.out.println("size = " + qr.get().getColumns().size());
Iterator<HColumn<Composite, String>> iter = qr.get().getColumns().iterator();
while (iter.hasNext())
{
HColumn<Composite, String> column = iter.next();
System.out.print(column.getName().get(0, IntegerSerializer.get()));
System.out.print(":");
System.out.print(column.getName().get(1, StringSerializer.get()));
System.out.print(":");
System.out.print(column.getName().get(2, StringSerializer.get()));
System.out.println("=" + column.getValue());
}
return true;
}
catch (HectorException e)
{
e.printStackTrace();
}
catch (Exception why)
{
why.printStackTrace();
}
return false;
}
public static void main(String[] args)
{
boolean isSuccess = false;
String node_ip = "192.168.0.1";
String keyspace_name = "mykeyspace";
String column_family_name = "compositecf";
String cluster_name = "Test Cluster";
CompositeExample test1 = new CompositeExample(node_ip, keyspace_name, column_family_name, cluster_name);
isSuccess = test1.createCompositeColumn("Int32Type", "BytesType", "AsciiType");
if (!isSuccess)
{
System.err.println("failed to create cf");
System.exit(-1);
}
isSuccess = test1.saveColumn("1027", "blablabla", "this is ascii field");
if (!isSuccess)
{
System.err.println("failed to write");
System.exit(-1);
}
isSuccess = test1.readColumn("key", 0, ComponentEquality.EQUAL, 1027);
if (!isSuccess)
{
System.err.println("failed to read");
System.exit(-1);
}
}
}
// Look up a single column by its full composite name
Composite col = new Composite(yourInt, yourBytes, yourString);
ColumnSlice<Composite, Integer> result = HFactory.createSliceQuery(keyspace, keySerializer, compositeSerializer, intSerializer)
.setColumnFamily(columnFamily)
.setKey(key)
.setRange(col, col, false, 1)
.execute()
.get();
if (result.getColumns().isEmpty()) {
// do whatever you need to do if there's no value
} else {
int value = result.getColumns().get(0).getValue();
int newValue = value; // apply your modification to value here
Mutator<keyType> mutator = HFactory.createMutator(keyspace, keySerializer);
// generic type arguments must be reference types, so Integer rather than int
HColumn<Composite, Integer> column = HFactory.createColumn(col, newValue, compositeSerializer, intSerializer);
mutator.addInsertion(key, columnFamily, column);
mutator.execute();
}
I have two cell phones and I want to exchange files between them.
Device A launches a Java app that scans for Bluetooth devices in range and shows them in a list; the user can then select one device and click Send.
I have written the code below, but it is not working.
package hello;
import java.io.*;
import java.util.Vector;
import javax.bluetooth.*;
import javax.microedition.io.*;
import javax.microedition.io.StreamConnection.*;
import javax.microedition.lcdui.*;
import javax.microedition.midlet.MIDlet;
import javax.obex.*;
import javax.obex.ResponseCodes;
public class MyMidlet extends MIDlet implements CommandListener, DiscoveryListener
{
public Command cmdSend;
public Command cmdScan;
public TextBox myText;
public List devList;
public Form myForm;
private LocalDevice localDev;
private DiscoveryAgent dAgent;
private ServiceRecord servRecord;
private Vector myVector;
private ClientSession connection = null;
private String url = null;
private Operation op = null;
private boolean cancelInvoked = false;
public MyMidlet()
{
cmdSend = new Command("Send", 2, 0);
cmdScan = new Command("Scan", 5, 0);
}
public void startApp()
{
if(myText == null)
{
myText = new TextBox("Dummy Text", "Hello", 10, 0);
myText.addCommand(cmdScan);
myText.setCommandListener(this);
Display.getDisplay(this).setCurrent(myText);
}
}
public void pauseApp(){}
public void destroyApp(boolean flag) { }
public void commandAction(Command command, Displayable displayable)
{
if(command == cmdScan)
{
if(myForm == null) { myForm = new Form("Scanning"); }
else {
myForm.deleteAll(); // deleting by ascending index skips every other item
}
myForm.append("Scanning for bluetooth devices..");
Display.getDisplay(this).setCurrent(myForm);
if(devList == null)
{
devList = new List("Devices", 3);
devList.addCommand(cmdSend);
devList.setCommandListener(this);
} else
{
devList.deleteAll(); // deleting by ascending index skips every other item
}
if(myVector == null) myVector = new Vector();
else myVector.removeAllElements();
try
{
if(localDev == null)
{
localDev = LocalDevice.getLocalDevice();
localDev.setDiscoverable(DiscoveryAgent.GIAC);
dAgent = localDev.getDiscoveryAgent();
}
dAgent.startInquiry(DiscoveryAgent.GIAC, this);
}
catch(BluetoothStateException bluetoothstateexception)
{
myForm.append("Please check that Bluetooth is turned on");
}
}
if(command == cmdSend)
{
myForm.setTitle("Sending");
myForm.deleteAll(); // deleting by ascending index skips every other item
myForm.append("Sending application..");
Display.getDisplay(this).setCurrent(myForm);
try
{
RemoteDevice remotedevice = (RemoteDevice)myVector.elementAt(devList.getSelectedIndex());
dAgent.searchServices(null, new UUID[] {new UUID(4358L)}, remotedevice, this);
return;
}
catch(BluetoothStateException bluetoothstateexception1)
{
myForm.append("could not open bluetooth: " + bluetoothstateexception1.toString());
}
}
}
public void deviceDiscovered(RemoteDevice remotedevice, DeviceClass deviceclass)
{
try
{
devList.append(remotedevice.getFriendlyName(false), null);
}
catch(IOException _ex)
{
devList.append(remotedevice.getBluetoothAddress(), null);
}
myVector.addElement(remotedevice);
}
public void servicesDiscovered(int i, ServiceRecord aservicerecord[])
{
servRecord = aservicerecord[0];
}
public void serviceSearchCompleted(int i, int j)
{
if(j != 1) myForm.append("service search not completed: " + j);
try
{
byte[] fileContent = "Raxit Sheth -98922 38248".getBytes();
String s=servRecord.getConnectionURL(0, false);
myForm.append("Debug 0");
connection = (ClientSession) Connector.open(s);
myForm.append("Debug1");
HeaderSet headerSet = connection.connect(null);
myForm.append("Debug1.1");
headerSet.setHeader(HeaderSet.NAME, "a.txt");
headerSet.setHeader(HeaderSet.TYPE, "text/plain");
headerSet.setHeader(HeaderSet.LENGTH, new Long(fileContent.length));
myForm.append("Debug1.2");
//op = connection.put(headerSet); // throws java.lang.IllegalArgumentException
op = connection.put(null);
myForm.append("Debug1.2.1");
op.sendHeaders(headerSet);
myForm.append("Debug1.3");
OutputStream out = op.openOutputStream();
myForm.append("Debug2");
//sending data
myForm.append("Debug3");
out.write(fileContent);
myForm.append("Debug4");
//int responseCode = op.getResponseCode();
//myForm.append("resp code="+responseCode);
out.close();
op.close();
connection.close();
myForm.append("Done");
// I was expecting this to send a file a.txt with the content "Raxit Sheth -98922 38248"
// to the remote device's inbox/gallery/bluetooth folder
}
catch(Exception ex) { myForm.append(ex.toString()); }
}
public void inquiryCompleted(int i)
{
Display.getDisplay(this).setCurrent(devList);
}
}
Your problem is almost certainly that you're starting your Bluetooth scanning in the commandAction() method. This is a UI event callback invoked by the system, and it needs to return quickly. Attempting to perform a blocking operation (such as Bluetooth scanning) in this thread could tie up resources which the handset needs to do other things, such as the actual scanning!
Refactor so that the scanning is performed in a new thread, then try again.
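A minimal sketch of that refactoring, written in plain Java so it can run outside a handset. The names BackgroundScanSketch, ScanListener, blockingScan and startScanInBackground are hypothetical and not part of the original MIDlet, and blockingScan only sleeps as a stand-in for the real Bluetooth calls. The point is the shape: the event handler starts a worker thread and returns at once, and the result arrives later through a callback.

```java
// Hypothetical sketch: hand slow work from an event callback to a worker thread.
// None of these names come from the original MIDlet.
class BackgroundScanSketch {

    /** Receives the result once the background work has finished. */
    interface ScanListener {
        void onScanFinished(String result);
    }

    /** Stand-in for a slow Bluetooth operation; here it only sleeps. */
    static String blockingScan() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "scan-interrupted";
        }
        return "scan-complete";
    }

    /**
     * What an event handler such as commandAction() would call:
     * it starts the worker and returns immediately.
     */
    static Thread startScanInBackground(final ScanListener listener) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                // The slow work runs here, off the event thread.
                listener.onScanFinished(blockingScan());
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = startScanInBackground(new ScanListener() {
            public void onScanFinished(String result) {
                System.out.println("worker finished: " + result);
            }
        });
        System.out.println("event handler returned");
        worker.join();
    }
}
```

In the MIDlet, the body of run() would hold the Bluetooth calls from commandAction() (and possibly the OBEX transfer from serviceSearchCompleted()). UI updates made from the worker could be handed back to the event thread with Display.callSerially().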