Hadoop... Text.toString() conversion problems - text

I'm writing a simple program for enumerating triangles in directed graphs for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab symbol serves as the delimiter) I want my map function to output the following pairs ([a, to_b], [b, from_a], [a_b, -1]):
public void map(LongWritable key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
String line = value.toString();
String [] tokens = line.split(" ");
output.collect(new Text(tokens[0]), new Text("to_"+tokens[1]));
output.collect(new Text(tokens[1]), new Text("from_"+tokens[0]));
output.collect(new Text(tokens[0]+"_"+tokens[1]), new Text("-1"));
}
Now my reduce function is supposed to cross join all pairs that have both to_'s and from_'s
and to simply emit any other pairs whose keys contain "_".
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
String key_s = key.toString();
if (key_s.indexOf("_")>0)
output.collect(key, new Text("completed"));
else {
HashMap <String, ArrayList<String>> lists = new HashMap <String, ArrayList<String>> ();
while (values.hasNext()) {
String line = values.next().toString();
String[] tokens = line.split("_");
if (!lists.containsKey(tokens[0])) {
lists.put(tokens[0], new ArrayList<String>());
}
lists.get(tokens[0]).add(tokens[1]);
}
for (String t : lists.get("to"))
for (String f : lists.get("from"))
output.collect(new Text(t+"_"+f), key);
}
}
And this is where the most exciting stuff happens: tokens[1] throws an ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this point the iterator should yield values like "to_a", "from_b", "to_b", etc. When I just output these values, everything looks OK and I get "to_a", "from_b". But split() doesn't work at all; moreover, line.length() is always 1 and indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY for keys, where we have pairs whose keys contain "_" and look like "a_b", "b_c".
I'm really puzzled by all this. MapReduce is supposed to save lives by making everything simple. Instead I spent several hours just to localize this.

Not sure if that's the problem, but try changing this:
String [] tokens = line.split(" ");
to this:
String [] tokens = line.split("\t");
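To see why the delimiter matters, here is a minimal standalone sketch (no Hadoop involved). The class name, the helper splitArc and the sample line are made up for illustration; the only assumption is that an input line is two node names separated by a single tab, as the question describes.

```java
import java.util.Arrays;

class TabSplitDemo {
    // Splits one input arc on the tab delimiter described in the question.
    static String[] splitArc(String line) {
        return line.split("\t");
    }

    public static void main(String[] args) {
        String line = "a\tb"; // hypothetical input line: "a", a tab, "b"

        // Splitting on a space finds no delimiter, so the whole line
        // comes back as a single token and tokens[1] does not exist.
        String[] bySpace = line.split(" ");
        System.out.println("split on space: " + bySpace.length + " token(s)");

        // Splitting on the tab yields the two node names.
        String[] byTab = splitArc(line);
        System.out.println("split on tab:   " + Arrays.toString(byTab));
    }
}
```

With a one-element array, any access to tokens[1] throws ArrayIndexOutOfBoundsException, so it is worth checking tokens.length before indexing in both map and reduce.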

Related

Grails convert String to Map with comma in string values

I want to convert a String to a Map in Grails. I already have a function for the conversion. Here's the code:
static def StringToMap(String reportValues){
Map result=[:]
result=reportValues.replace('[','').replace(']','').replace(' ','').split(',').inject([:]){map,token ->
List tokenizeStr=token.split(':');
tokenizeStr.size()>1?tokenizeStr?.with {map[it[0]?.toString()?.trim()]=it[1]?.toString()?.trim()}:tokenizeStr?.with {map[it[0]?.toString()?.trim()]=''}
map
}
return result
}
But I have a String with commas in the values, so the above function doesn't work for me. Here's my String:
[program_type:, subsidiary_code:, groupName:, termination_date:, effective_date:, subsidiary_name:ABC, INC]
My function returns only ABC, not ABC, INC. I googled it but couldn't find any concrete help.
Generally speaking, if I have to convert a stringified Map to a Map object, I try to make use of Eval.me. Your example String isn't quite in the right form for that, though. If you had the following, it would "just work":
// Note I have added '' around the values.
String a = "[program_type:'', subsidiary_code:'', groupName:'', termination_date:'', effective_date:'', subsidiary_name:'ABC']"
Map b = Eval.me(a)
// returns b = [program_type:, subsidiary_code:, groupName:, termination_date:, effective_date:, subsidiary_name:ABC]
If you control how the String is produced and can generate it in this pattern, I suspect that would be the easiest solution.
If you cannot change the input string, here is a longer and less clean option. It splits on the colons rather than on the commas, so commas inside a value survive.
String reportValues = "[program_type:, subsidiary_code:, groupName:, termination_date:, effective_date:, subsidiary_name:ABC, INC]"
reportValues = reportValues[1..-2]
def m = reportValues.split(":")
def map = [:]
def length = m.size()
m.eachWithIndex { v, i ->
if(i != 0) {
List l = m[i].split(",")
if (i == length-1) {
map.put(m[i-1].split(",")[-1], l.join(","))
} else {
map.put(m[i-1].split(",")[-1], l[0..-2].join(","))
}
}
}
map.each {key, value -> println "key: " + key + " value: " + value}
BTW: Only use eval on trusted input, AFAIK it executes everything.
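The same "treat the key as the real boundary" idea can also be written with a regular expression. This is a different technique from the code above, sketched in plain Java: it splits only on commas that are followed by something that looks like a key (word characters and a colon). The class and method names are hypothetical, and it assumes keys consist of letters, digits and underscores and that no value contains a comma followed by such a "word:" sequence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReportValuesParser {
    // Parses "[k1:v1, k2:v2, ...]" where a value may itself contain commas.
    static Map<String, String> parse(String reportValues) {
        String body = reportValues.trim();
        // Drop the surrounding brackets if both are present.
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> result = new LinkedHashMap<>();
        // Split only on commas that are followed by "word:", i.e. by the next key.
        for (String entry : body.split(",\\s*(?=\\w+:)")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue; // assumption: fragments without a key are ignored
            }
            String key = entry.substring(0, colon).trim();
            String value = entry.substring(colon + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[program_type:, subsidiary_code:, groupName:, "
                + "termination_date:, effective_date:, subsidiary_name:ABC, INC]";
        parse(input).forEach((k, v) -> System.out.println("key: " + k + " value: " + v));
    }
}
```

Unlike Eval.me, this never executes the input, so it is also usable on untrusted strings.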
You could try messing around with this bit of code:
String tempString = "[program_type:11, 'aa':'bb', subsidiary_code:, groupName:, termination_date:, effective_date:, subsidiary_name:ABC, INC]"
List StringasList = tempString.tokenize('[],')
def finalMap=[:]
StringasList?.each { e->
def f = e?.split(':')
finalMap."${f[0]}"= f.size()>1 ? f[1] : null
}
println """-- tempString: ${tempString.getClass()} StringasList: ${StringasList.getClass()}
finalMap: ${finalMap.getClass()} \n Results\n finalMap ${finalMap}
"""
Above produces:
-- tempString: class java.lang.String StringasList: class java.util.ArrayList
finalMap: class java.util.LinkedHashMap
Results
finalMap [program_type:11, 'aa':'bb', subsidiary_code:null, groupName:null, termination_date:null, effective_date:null, subsidiary_name:ABC, INC:null]
It tokenizes the String, then iterates over the resulting ArrayList, splitting each token on : and putting the pieces into a map. It also has to check that the split produced more than one element; otherwise it would break on f[1].

How to collect a string to a stack of characters in Java 8? [duplicate]

I would like to convert the string containing abc to a list of characters and a hashset of characters. How can I do that in Java ?
List<Character> charList = new ArrayList<Character>("abc".toCharArray());
In Java8 you can use streams I suppose.
List of Character objects:
List<Character> chars = str.chars()
.mapToObj(e->(char)e).collect(Collectors.toList());
And set could be obtained in a similar way:
Set<Character> charsSet = str.chars()
.mapToObj(e->(char)e).collect(Collectors.toSet());
You will have to either use a loop, or create a collection wrapper like Arrays.asList which works on primitive char arrays (or directly on strings).
List<Character> list = new ArrayList<Character>();
Set<Character> unique = new HashSet<Character>();
for(char c : "abc".toCharArray()) {
list.add(c);
unique.add(c);
}
Here is an Arrays.asList like wrapper for strings:
public List<Character> asList(final String string) {
return new AbstractList<Character>() {
public int size() { return string.length(); }
public Character get(int index) { return string.charAt(index); }
};
}
This one is an immutable list, though. If you want a mutable list, use this with a char[]:
public List<Character> asList(final char[] string) {
return new AbstractList<Character>() {
public int size() { return string.length; }
public Character get(int index) { return string[index]; }
public Character set(int index, Character newVal) {
char old = string[index];
string[index] = newVal;
return old;
}
};
}
Analogous to this you can implement this for the other primitive types.
Note that using this normally is not recommended, since for every access you
would do a boxing and unboxing operation.
The Guava library contains similar List wrapper methods for several primitive array classes, like Chars.asList, and a wrapper for String in Lists.charactersOf(String).
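To make the "wrapper" behaviour concrete, here is a small sketch that reuses the char[]-backed AbstractList from above and shows that it is a view: writes through the list change the array and vice versa. The class name and the sample data are made up for illustration.

```java
import java.util.AbstractList;
import java.util.List;

class CharArrayViewDemo {
    // Fixed-size List view over a char[]; no elements are copied.
    static List<Character> asList(final char[] chars) {
        return new AbstractList<Character>() {
            @Override public int size() { return chars.length; }
            @Override public Character get(int index) { return chars[index]; }
            @Override public Character set(int index, Character newVal) {
                char old = chars[index];
                chars[index] = newVal;
                return old;
            }
        };
    }

    public static void main(String[] args) {
        char[] data = "abc".toCharArray();
        List<Character> view = asList(data);

        view.set(0, 'x');                     // writes through to the array
        System.out.println(new String(data));

        data[2] = 'z';                        // array changes show up in the list
        System.out.println(view);

        // view.add('d') would throw UnsupportedOperationException,
        // because AbstractList does not support changing the size by default.
    }
}
```

If you need an independent, growable list, copy the view: new ArrayList<Character>(view).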
The lack of a good way to convert between a primitive array and a collection of its corresponding wrapper type is solved by some third party libraries. Guava, a very common one, has a convenience method to do the conversion:
List<Character> characterList = Chars.asList("abc".toCharArray());
Set<Character> characterSet = new HashSet<Character>(characterList);
Use a Java 8 Stream.
myString.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
Breakdown:
myString
.chars() // Convert to an IntStream
.mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
.collect(Collectors.toList()); // Collect in a List<Character>
(I have absolutely no idea why String#chars() returns an IntStream.)
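If you need a specific collection type (the title asks for a stack), the same stream can be collected with Collectors.toCollection instead of toList. A minimal sketch; the class and method names are hypothetical, and using ArrayDeque as the stack is my choice, not something the question prescribes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

class CharCollectDemo {
    // Collects the characters into a Deque in their original order.
    static Deque<Character> toDeque(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toCollection(ArrayDeque::new));
    }

    // Collects the distinct characters, keeping first-seen order.
    static Set<Character> toOrderedSet(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        Deque<Character> stack = toDeque("abc");
        // Elements were appended at the tail, so the tail is the "top".
        System.out.println("top of stack: " + stack.peekLast());
        System.out.println("distinct: " + toOrderedSet("abca"));
    }
}
```

Because the collector appends at the tail, use peekLast/pollLast to treat the last character as the top of the stack.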
The most straightforward way is to use a for loop to add elements to a new List:
String abc = "abc";
List<Character> charList = new ArrayList<Character>();
for (char c : abc.toCharArray()) {
charList.add(c);
}
Similarly, for a Set:
String abc = "abc";
Set<Character> charSet = new HashSet<Character>();
for (char c : abc.toCharArray()) {
charSet.add(c);
}
List<String> result = Arrays.asList("abc".split(""));
Create an empty list of Character and then make a loop to get every character from the array and put them in the list one by one.
List<Character> characterList = new ArrayList<Character>();
char arrayChar[] = abc.toCharArray();
for (char aChar : arrayChar)
{
characterList.add(aChar); // autoboxing
}
You can do this without boxing if you use Eclipse Collections:
CharAdapter abc = Strings.asChars("abc");
CharList list = abc.toList();
CharSet set = abc.toSet();
CharBag bag = abc.toBag();
Because CharAdapter is an ImmutableCharList, calling collect on it will return an ImmutableList.
ImmutableList<Character> immutableList = abc.collect(Character::valueOf);
If you want to return a boxed List, Set or Bag of Character, the following will work:
LazyIterable<Character> lazyIterable = abc.asLazy().collect(Character::valueOf);
List<Character> list = lazyIterable.toList();
Set<Character> set = lazyIterable.toSet();
Bag<Character> bag = lazyIterable.toBag();
Note: I am a committer for Eclipse Collections.
IntStream can be used to access each character and add them to the list.
String str = "abc";
List<Character> charList = new ArrayList<>();
IntStream.range(0,str.length()).forEach(i -> charList.add(str.charAt(i)));
Using Java 8 - Stream Function:
Converting a String into a Character List:
List<Character> characterList = givenStringVariable
.chars()
.mapToObj(c -> (char) c)
.collect(Collectors.toList());
Converting a Character List into a String:
String givenStringVariable = characterList
.stream()
.map(String::valueOf)
.collect(Collectors.joining());
To get a list of Characters / Strings -
List<String> stringsOfCharacters = string.chars().
mapToObj(i -> (char)i).
map(c -> c.toString()).
collect(Collectors.toList());

Convert String To Nullable Integer List

I want to parse a string into a nullable int list in C#.
I'm able to convert it to an int list, but not a nullable one:
string data = "1,2";
List<int> TagIds = data.Split(',').Select(int.Parse).ToList();
When data is empty, I want to handle that case as well.
Thanks
You can use following extension method:
public static int? TryGetInt32(this string item)
{
int i;
bool success = int.TryParse(item, out i);
return success ? (int?)i : (int?)null;
}
Then it's simple:
List<int?> TagIds = data.Split(',')
.Select(s => s.TryGetInt32())
.ToList();
I always use that extension method in LINQ queries when the format can be invalid; it's better than using a local variable and int.TryParse (E. Lippert gave an example, follow link).
Apart from that, it may be better to use data.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries) instead, which omits empty strings in the first place.

Parsing a CSV string and using elements of the string for computation- homework

I am very new to java, and this is homework. Any direction would be appreciated.
The assignment is to read an external text file and then parse that file to produce a new file.
The external file looks something like this:
2 //number of lines in the file
3,+,4,*,2,-
5,*,2,T,1,+
I have to read this file and produce an output that takes the preceding int value and prints the following character (skipping the comma). So the output would look like this:
+++****--
*****TT+
I have tried to setup my code using two methods. The first to read the external file (passed as a parameter) which, as long as there is a next line, will call a second method, processLine, to parse the line. This is where I am lost. I can't figure out how this method should be structured so it reads the line and interprets the token values as either ints or chars, and then executes code based on those values.
I am only able to use what we have covered in class, so no external libraries, just the basics.
public static void numToImageRep(File input, File output) //rcv file
throws FileNotFoundException {
Scanner read = new Scanner(input);
while(read.hasNextLine()){ //read file line by line
String data = read.nextLine();
processLine(data); //pass line for processing
}
}
public static void processLine(String text){ //incomplete, all falls apart here.
Scanner process = new Scanner(text);
while(process.hasNext()){
if(process.hasNextInt()){
int multi = process.nextInt();
}
if(process.hasNext()==','){
}
}
This method is a simple example that can do the job:
public static String processLine(String text){
String result = "";
String[] splitted = text.split(",");
int remaining = 0;
for(int i=0;i<splitted.length;i+=2)
{
remaining = (Integer.parseInt(splitted[i]));
while( remaining-- >0)
result += splitted[i+1];
}
return result;
}
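For completeness, here is a sketch of how that could fit into the whole task using only Scanner and String basics. It is one possible arrangement, not the required one: the class name and the convert helper are made up, it works on the file contents as a String to stay self-contained, and it assumes the first line holds only the line count and can be skipped.

```java
import java.util.Scanner;

class NumToImageSketch {
    // Expands one line such as "3,+,4,*" into "+++****".
    static String processLine(String text) {
        StringBuilder result = new StringBuilder();
        String[] tokens = text.trim().split(",");
        // Tokens come in (count, symbol) pairs; an unpaired trailing token is ignored.
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            int count = Integer.parseInt(tokens[i].trim());
            for (int n = 0; n < count; n++) {
                result.append(tokens[i + 1]);
            }
        }
        return result.toString();
    }

    // Converts the whole input, one output line per data line.
    static String convert(String fileContents) {
        StringBuilder out = new StringBuilder();
        Scanner read = new Scanner(fileContents);
        if (read.hasNextLine()) {
            read.nextLine(); // assumption: first line is just the line count
        }
        while (read.hasNextLine()) {
            String line = read.nextLine();
            if (!line.trim().isEmpty()) {
                out.append(processLine(line)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("2\n3,+,4,*,2,-\n5,*,2,T,1,+\n"));
    }
}
```

In the real assignment the Scanner would wrap the input File and the result would be written with a PrintWriter on the output File; the per-line logic stays the same.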

Linq to split/analyse substrings

I have got a List of strings like:
String1
String1.String2
String1.String2.String3
Other1
Other1.Other2
Test1
Stuff1.Stuff1
Text1.Text2.Text3
Folder1.Folder2.FolderA
Folder1.Folder2.FolderB
Folder1.Folder2.FolderB.FolderC
Now I would like to group this into:
String1.String2.String3
Other1.Other2
Test1
Stuff1.Stuff1
Text1.Text2.Text3
Folder1.Folder2.FolderA
Folder1.Folder2.FolderB.FolderC
If
"String1" is in the next item "String1.String2" I will ignore the first one
and if the second item is in the third I will only take the third "String1.String2.String3"
and so on (n items). The string is structured like a node/path and could be split by a dot.
As you can see for the Folder example Folder2 has got two different Subfolder items so I would need both strings.
Do you know how to handle this with Linq? I would prefer VB.Net but C# is also ok.
Regards Athu
Dim r = input.Where(Function(e, i) i = input.Count - 1 OrElse Not input(i + 1).StartsWith(e + ".")).ToList()
The condition inside Where checks whether the element is the last one in input, or is not followed by an element that starts with the current one plus a dot.
This solution relies on input being a List(Of String), so Count and input(i + 1) are available in O(1) time.
LINQ isn't really the correct approach here, because you need to access more than one item at a time.
I would go with something like this:
public static IEnumerable<string> Filter(this IEnumerable<string> source)
{
string previous = null;
foreach(var current in source)
{
if(previous != null && !current.Contains(previous))
yield return previous;
previous = current;
}
yield return previous;
}
Usage:
var result = strings.Filter();
Pretty simple one. Try this:
var lst = new List<string> { /*...*/ };
var sorted =
from item in lst
where lst.Last() == item || !lst[lst.IndexOf(item) + 1].Contains(item)
select item;
The following simple line can do the trick; I'm not sure about the performance cost, though:
List<string> someStuff = new List<string>();
// Code to add the strings goes here, omitted for brevity
IEnumerable<string> result = someStuff.Where(s => someStuff.Count(x => x.StartsWith(s)) == 1);
