How to pass a string as a value in the mapper?

I am trying to pass a string as the value in my mapper, but I am getting an error that it is not Writable. How do I resolve this?
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String TempString = value.toString();
    String[] SingleRecord = TempString.split("\t");
    // using Integer.parseInt to calculate profit
    int Amount = Integer.parseInt(SingleRecord[7]);
    int Asset = Integer.parseInt(SingleRecord[8]);
    int SalesPrice = Integer.parseInt(SingleRecord[9]);
    int Profit = Amount * (SalesPrice - Asset);
    String ValueProfit = String.valueOf(Profit);
    String ValueOne = String.valueOf(one);
    custID.set(SingleRecord[2]);
    data.set(ValueOne + ValueProfit);
    context.write(custID, data);
}

Yahoo's tutorial says:
Objects which can be marshaled to or from files and across the network must obey a particular interface, called Writable, which allows Hadoop to read and write the data in a serialized form for transmission.
From the Cloudera site:
The key and value classes must be serializable by the framework and hence must implement the Writable interface. Additionally, the key classes must implement the WritableComparable interface to facilitate sorting.
So you need an implementation of Writable to write it as a value in the context. Hadoop ships with a few stock classes such as IntWritable. The String counterpart you are looking for is the Text class. It can be used as:
context.write(custID, new Text(data));
OR
Text outValue = new Text();
outValue.set(data);
context.write(custID, outValue);
In case you need specialized functionality in the value class, you may implement Writable yourself (not a big deal after all). However, it seems like Text is enough for you.
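If it helps, here is a minimal sketch of what the complete mapper could look like with the output key and value declared as Text fields; the class name ProfitMapper and the "1" prefix (standing in for the one variable) are assumptions, not from the original post:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProfitMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text custID = new Text();
    private final Text data = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        int amount = Integer.parseInt(fields[7]);
        int asset = Integer.parseInt(fields[8]);
        int salesPrice = Integer.parseInt(fields[9]);
        int profit = amount * (salesPrice - asset);
        custID.set(fields[2]);
        // the value is a Text, which Hadoop knows how to serialize
        data.set("1" + String.valueOf(profit));
        context.write(custID, data);
    }
}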

You haven't set data as a Text in the map function according to the imports above, and TextWritable is wrong; just use Text.

Related

Map to hold multiple sets of key and values

I have a map1 which holds the information as
[40256942,6] [60246792,5]
Now I want to prepare a map2 that holds information such as
itemNo, 40256942
qty, 6
itemNo, 60246792
qty, 5
to prepare the final information as JSON:
"partialArticlesInfo": [{itemNo: "40256942", availQty: "6"}, {itemNo: "60246792", availQty: "5"}]
I am trying to iterate over map1 to retrieve the values and set them against the keys, but I am getting only one entry, which is the last one. Is there any way to get a new map with entries such as those mentioned above?
Map<String, String> partialArticlesInfo = new HashMap<String, String>();
Map<String, String> partialArticlesTempMap = null;
for (Map.Entry<String, String> entry : partialStockArticlesQtyMap.entrySet())
{
    partialArticlesTempMap = new HashMap<String, String>();
    partialArticlesTempMap.put("itemNo", entry.getKey());
    partialArticlesTempMap.put("availQty", entry.getValue());
    partialArticlesInfo.putAll(partialArticlesTempMap);
}
In Java (I'm assuming you're using Java, in the future it would be helpful to specify that) and every other language I know of, a map holds mappings between keys and values. Only one mapping is allowed per key. In your "map2", the keys are "itemNo" and "availQty". So what is happening is that your for loop sets the values for the first entry, and then is overwriting them with the data from the second entry, which is why that is the only one you see. Look at Java - Map and Map - Java 8 for more info.
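To make that concrete, here is a tiny standalone sketch (not taken from your code) showing why only the last entry survives:

Map<String, String> map2 = new HashMap<>();
map2.put("itemNo", "40256942");
map2.put("availQty", "6");
map2.put("itemNo", "60246792");  // replaces the previous "itemNo" mapping
map2.put("availQty", "5");       // replaces the previous "availQty" mapping
System.out.println(map2.size()); // 2 - only one value per key remains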
I don't understand why you are trying to put the data into a map, you could just put it straight into JSON with something like this:
JSONArray partialArticlesInfo = new JSONArray();
for (Map.Entry<String, String> entry : partialStockArticlesQtyMap.entrySet()) {
    JSONObject stockEntry = new JSONObject();
    stockEntry.put("itemNo", entry.getKey());
    stockEntry.put("availQty", entry.getValue());
    partialArticlesInfo.put(stockEntry);
}
JSONObject root = new JSONObject();
root.put("partialArticlesInfo", partialArticlesInfo);
This will take "map1" (partialStockArticlesQtyMap in your code) and create a JSON object exactly like your example - no need to have map2 as an intermediate step. It loops over each entry in map1, creates a JSON object representing it and adds it to a JSON array, which is finally added to a root JSON object as "partialArticlesInfo".
The exact code may be slightly different depending on which JSON library you are using - check the docs for the specifics.
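For example, assuming the org.json library (the question doesn't say which one is in use), getting the final string is just:

String json = root.toString();
// e.g. {"partialArticlesInfo":[{"itemNo":"40256942","availQty":"6"},{"itemNo":"60246792","availQty":"5"}]}
// (key order inside each object may vary)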
I agree with Brendan. Another solution would be to store the data in a Set or List of objects like the following.
class Item {
    Long itemNo;
    int quantity;

    @Override
    public int hashCode() {
        return Long.hashCode(itemNo) + Integer.hashCode(quantity);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Item)) {
            return false;
        }
        Item that = (Item) other;
        return itemNo.equals(that.itemNo) && quantity == that.quantity;
    }
}
Then you can use the JSONArray method described by him to get the JSON string as output.
This means that adding new fields to the object won't require any extra effort to generate the JSON.
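A rough sketch of that approach, again assuming the org.json library and the partialStockArticlesQtyMap variable from the question:

List<Item> items = new ArrayList<>();
for (Map.Entry<String, String> entry : partialStockArticlesQtyMap.entrySet()) {
    Item item = new Item();
    item.itemNo = Long.valueOf(entry.getKey());
    item.quantity = Integer.parseInt(entry.getValue());
    items.add(item);
}

// Serialize the list the same way as in the JSONArray example above
JSONArray array = new JSONArray();
for (Item item : items) {
    array.put(new JSONObject()
            .put("itemNo", String.valueOf(item.itemNo))
            .put("availQty", String.valueOf(item.quantity)));
}
JSONObject root = new JSONObject().put("partialArticlesInfo", array);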

NodaTime UnparsableValueException due to usage of "Z" in pattern

I am exchanging JSON messages between Java and C# (and vice-versa).
In Java I use a java.time.Instant (JSR-310) to represent a point in time on the global timeline. In order to create a human readable date/time string in JSON, I convert my Instant as follows:
private static final DateTimeFormatter FORMATTER = ofPattern("yyyy-MM-dd'T'HH:mm:ssZ").withZone(ZoneId.systemDefault());
which generates the following output:
2017-04-28T19:54:44-0500
Now, on the message consumer side of things (C#) I wrote a custom Newtonsoft.Json.JsonConverter, which extends the abstract JsonCreationConvert class that contains the following overridden ReadJson() method:
public override object ReadJson(JsonReader reader, Type objectType, object existingValue,
    JsonSerializer serializer)
{
    if (reader.TokenType == JsonToken.Null)
    {
        return null;
    }
    if (reader.TokenType == JsonToken.StartArray)
    {
        return JToken.Load(reader).ToObject<string[]>();
    }
    reader.DateParseHandling = DateParseHandling.None; // read NodaTime string Instant as is
    serializer.Converters.Add(NodaConverters.InstantConverter);
    // Load JObject from stream
    var jObject = JObject.Load(reader);
    // Create target object based on JObject
    T target = Create(objectType, jObject);
    // Populate the object properties
    var writer = new StringWriter();
    serializer.Serialize(writer, jObject);
    using (var newReader = new JsonTextReader(new StringReader(writer.ToString())))
    {
        newReader.Culture = reader.Culture;
        newReader.DateParseHandling = reader.DateParseHandling;
        newReader.DateTimeZoneHandling = reader.DateTimeZoneHandling;
        newReader.FloatParseHandling = reader.FloatParseHandling;
        serializer.Populate(newReader, target);
    }
    return target;
}
Create() is an abstract method.
When I now convert this JSON string into a NodaTime.Instant (v2.0.0) by calling:
InstantPattern.General.Parse(creationTime).Value;
I get this exception:
NodaTime.Text.UnparsableValueException: The value string does not match a quoted string in the pattern. Value being parsed: '2017-04-28T19:54:44^-0500'. (^ indicates error position.)
If I emit a literal "Z" instead (so no "-0500" offset is appended and Z is interpreted as a zero offset), NodaTime.Serialization.JsonNet.NodaConverters.InstantConverter reads the value correctly without throwing an exception.
Looking into the GeneralPatternImpl I see:
internal static readonly InstantPattern GeneralPatternImpl = InstantPattern.CreateWithInvariantCulture("uuuu-MM-ddTHH:mm:ss'Z'");
Why does an InstantConverter require the offset to be a text literal? Is this happening because an Instant is agnostic to an offset? If this is the case, then why doesn't the InstantConverter just ignore the offset instead of throwing an exception? Do I need to write a custom converter to get around this problem?
That's like asking for 2017-04-28T19:54:44 to be parsed as a LocalDate - there's extra information that we'd silently be dropping. Fundamentally, your conversion from Instant to String in Java is "adding" information which isn't really present in the original instant. What you're ending up with is really an OffsetDateTime, not an Instant - it has more information than an Instant does.
You should decide what information you really care about. If you only care about the instant in time, then change your Java serialization to use UTC, and it should end up with Z in the serialized form, and all will be well. This is what I suggest you do - propagating irrelevant information is misleading, IMO.
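On the Java side that could be as simple as formatting in UTC rather than the system default zone. A sketch (the 'X' pattern letter prints 'Z' for a zero offset; adjust to taste):

private static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX").withZone(ZoneOffset.UTC);

// e.g. 2017-04-29T00:54:44Z, which InstantPattern.General can parse
String creationTime = FORMATTER.format(Instant.now());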
If you actually care about the offset in the system default time zone, which your call to .withZone(ZoneId.systemDefault()) implies you do, then you should parse it as an OffsetDateTime on the .NET side of things. You can convert that to an Instant afterwards if you want to (just call ToInstant()).

Cassandra BoundStatement with Multiple Parameters and Multi-Partition Query

After reading the "Asynchronous queries with the Java driver" article on the DataStax blog, I was trying to implement a solution similar to the one in the section called 'Case study: multi-partition query, a.k.a. "client-side SELECT...IN"'.
I currently have code that looks something like this:
public Future<List<ResultSet>> executeMultipleAsync(final BoundStatement statement, final Object... partitionKeys) {
    List<Future<ResultSet>> futures = Lists.newArrayListWithExpectedSize(partitionKeys.length);
    for (Object partitionKey : partitionKeys) {
        Statement bs = statement.bind(partitionKey);
        futures.add(executeWithRetry(bs));
    }
    return Futures.successfulAsList(futures);
}
But, I'd like to improve on that. In the cql query this BoundStatement holds, I'd like to have something that looks like this:
SELECT * FROM <column_family_name> WHERE <param1> = :p1_name AND <param2> = :p2_name AND <partition_key_name> = ?;
I'd like the clients of this method to give me a BoundStatement with the parameters already bound (two parameters in this case) and a list of partition keys. In that case, all I need to do is bind the partition keys and execute the queries. Unfortunately, when I bind the key to this statement it fails with the error com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 0 of CQL type varchar, expecting class java.lang.String but class java.lang.Long provided. The problem is that it tries to bind the key to the first parameter rather than the last, and the first is a string, not a long.
I can solve this either by giving the partition parameter a name, but then I'd have to receive the name via a method parameter, or by specifying its index, which again requires an additional method parameter. Either way, whether I use the name or the index, I have to bind it with a specific type, for instance bs.setLong("<key_name>", partitionKey);. For some reason, I can't leave it to the BoundStatement to infer the type of the last parameter.
I'd like to avoid passing the parameter name explicitly and bypass the type problem. Is there anything that can be done?
Thanks!
I've posted the same question on the 'DataStax Java Driver for Apache Cassandra User Mailing List' and got an answer saying the functionality that I'm missing may be added in the next version (2.2) of the DataStax Java driver.
In JAVA-721 (to be introduced in 2.2) we are tentatively planning on adding the following methods to BoundStatement:
public <V> BoundStatement setObject(int i, V v)
public <V> BoundStatement setObject(String name, V v)
and
You can emulate setObject in 2.1:
void setObject(BoundStatement bs, int position, Object object,
        ProtocolVersion protocolVersion) {
    DataType type = bs.preparedStatement().getVariables().getType(position);
    ByteBuffer buffer = type.serialize(object, protocolVersion);
    bs.setBytesUnsafe(position, buffer);
}
To avoid passing the parameter name, one thing you could do is look
for a position that isn't bound yet:
int findUnsetPosition(BoundStatement bs) {
    int size = bs.preparedStatement().getVariables().size();
    for (int i = 0; i < size; i++) {
        if (!bs.isSet(i)) {
            return i;
        }
    }
    throw new IllegalArgumentException("found no unset position");
}
I don't recommend it though, because it's ugly and unpredictable if
the user forgot to bind one of the non-PK variables.
The way I would do it is require the user to pass a callback that sets
the PK:
interface PKBinder<T> {
    void bind(BoundStatement bs, T pk);
}

public <T> Future<List<ResultSet>> executeMultipleAsync(final BoundStatement statement, PKBinder<T> pkBinder,
        final T... partitionKeys)
As a bonus, this will also work with composite partition keys.
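For example, a sketch of how the callback version could be called (the column name "id" and the use of setString are assumptions about the schema, not from the mailing list answer):

Future<List<ResultSet>> results = executeMultipleAsync(
        boundStatement,
        (bs, pk) -> bs.setString("id", pk),  // the caller knows the key's name and type
        "key1", "key2", "key3");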

How to override string serialization in ServiceStack.Text?

How come the following works to override Guid formatting:
ServiceStack.Text.JsConfig<Guid>.SerializeFn = guid => guid.ToString();
But doing this to force null strings to empty strings doesn't?
ServiceStack.Text.JsConfig<string>.SerializeFn = str => str ?? string.Empty;
I have this enabled:
ServiceStack.Text.JsConfig.IncludeNullValues = true;
I have also tried the String class rather than the string keyword, and the raw version, RawSerializeFn.
Is there a different work around?
Strings are specially handled in ServiceStack.Text, so you can't override their behavior with configuration.
Given you can't override it, the only solution I can see (other than submitting a pull-request) is to reflect over the model and populate null properties with empty strings.

Using *.resx files to store string value pairs

I have an application that requires mappings between string values, so essentially a container that can hold key-value pairs. Instead of using a dictionary or a name-value collection, I used a resource file that I access programmatically in my code. I understand resource files are used in localization scenarios for multi-language implementations and the like. However, I like their strongly typed nature, which ensures that if a resource name is changed the application no longer compiles.
However I would like to know if there are any important cons of using a *.resx file for simple key-value pair storage instead of using a more traditional programmatic type.
There are two cons I can think of off the top of my head:
it requires an I/O operation to read a key/value pair, which may result in a significant performance decrease,
if you let the standard .NET logic resolve resource loading, it will always try to find the file corresponding to the CultureInfo.CurrentUICulture property; this could be problematic if you decide that you actually want to have multiple resx files (i.e. one per language); this could result in even further performance degradation.
BTW, couldn't you just create a helper class or structure containing properties, like this:
public static class GlobalConstants
{
    private const int _SomeInt = 42;
    private const string _SomeString = "Ultimate answer";

    public static int SomeInt
    {
        get
        {
            return _SomeInt;
        }
    }

    public static string SomeString
    {
        get
        {
            return _SomeString;
        }
    }
}
You can then access these properties exactly the same way as resource files (I am assuming that you're used to this style):
textBox1.Text = GlobalConstants.SomeString;
textBox1.Top = GlobalConstants.SomeInt;
Maybe it is not the best thing to do, but I firmly believe this is still better than using a resource file for that...
