Groovy: indexes of substrings? - groovy

How do I find the indexes of all the occurances of a substring in a large string -
(so basically ,and extension of the "indexOf" function) . Any ideas?
Current situation:
def text = " --- --- bcd -- bcd ---"
def sub = "bcd"
text.indexOf(sub)
// = 9
I want something like:
def text = " --- --- bcd -- bcd ---"
def sub = "bcd"
text.indexesOf(sub)
// = [9,15]
Is there such a function? How should I implement it otherwise? (in a non trivial way)

You could write a new addition to the String metaClass like so:
String.metaClass.indexesOf = { match ->
def ret = []
def idx = -1
while( ( idx = delegate.indexOf( match, idx + 1 ) ) > -1 ) {
ret << idx
}
ret
}
def text = " --- --- bcd -- bcd ---"
def sub = "bcd"
text.indexesOf(sub)
There is nothing I know of that exists in groovy currently that gets you this for free though

This is a relatively easy approach:
String.metaClass.indexesOf = { arg ->
def result = []
int index = delegate.indexOf(arg)
while (index != -1) {
result.add(index);
index = delegate.indexOf(arg, index+1);
}
return result;
}
Note that this will find overlapping instances (i.e. "fooo".indexesOf("oo") will return [1, 2]). If you don't want this, replace index+1 with index+arg.length().

Related

Groovy script for streamsets to parse string of about 1500 characters

This is for streamsets, I am trying to write groovy script.
I have string of length 1500 chars. No delimiter. The pattern is first 4 characters are some code, next 4 characters are length of word followed by the word. Again it as 4 chars of some code and 4 chars of lenght of word followed by the word.
e.g.
22010005PHONE00010002IN00780004ROSE
When you decode,it will be like
2201 - code
0005 - Length of the word
PHONE - Word
0001 - code
0002 - Length of the word
IN - Word
0078 - code
0004 - Length of the word
ROSE - Word
and so on..
I need help on groovy script to create string if the code starts with 00.
Thus the final string would be INROSE.
I am trying using while loop and str:substring.
Any help is very much appreciated.
Thanks
def dtx_buf = record.value['TXN_BUFFER']
def fieldid = []
def fieldlen = []
def dtx_out = []
def i = 13
def j = 0
while (i < dtx_buf.size())
{
// values = record.value['TXN_BUFFER']
fieldid[j] = str.substring(values,j,4)
output.write(record)
}
Expected result "INROSE"
One way would be to write an Iterator that contains the rules for parsing the input:
class Tokeniser implements Iterator {
String buf
String code
String len
String word
// hasNext is true if there's still chars left in `buf`
boolean hasNext() { buf }
Object next() {
// Get the code and the remaining string
(code, buf) = token(buf)
// Get the length and the remaining string
(len, buf) = token(buf)
// Get the word (of the given length), and the remaining string
(word, buf) = token(buf, len as Integer)
// Return a map of the code and the word
[code: code, word: word]
}
// This splits the string into the first `length` chars, and the rest
private token(String input, int length = 4) {
[input.take(length), input.drop(length)]
}
}
Then, we can use this to do:
def result = new Tokeniser(buf: '22010005PHONE00010002IN00780004ROSE')
.findAll { it.code.startsWith('00') }
.word
.join()
And result is INROSE
Take 2
We can try another iterative method without an internal class, to see if that works any better in your environment:
def input = '22010005PHONE00010002IN00780004ROSE'
def pos = 0
def words = []
while (pos < input.length() - 8) {
def code = input.substring(pos, pos + 4)
def len = input.substring(pos + 4, pos + 8) as Integer
def word = input.substring(pos + 8, pos + 8 + len)
if (code.startsWith('00')) {
words << word
}
pos += 8 + len
}
def result = words.join()

little problem on code for finding substring within string scala

I am currently working on a small code that should allow to tell if a given substring is within a string. I checked all the other similar questions but everybody is using predefined functions. I need to build it from scratch… could you please tell me what I did wrong?
def substring(s: String, t: String): Boolean ={
var i = 0 // position on substring
var j = 0 // position on string
var result = false
var isSim = true
var n = s.length // small string size
var m = t.length // BIG string size
// m must always be bigger than n + j
while (m>n+j && isSim == true){
// j grows with i
// stopping the loop at i<n
while (i<n && isSim == true){
// if characters are similar
if (s(i)==t(j)){
// add 1 to i. So j will increase by one as well
// this will run the loop looking for similarities. If not, exit the loop.
i += 1
j = i+1
// exciting the loop if no similarity is found
}
// answer given if no similarity is found
isSim = false
}
}
// printing the output
isSim
}
substring("moth", "ramathaaaaaaa")
The problem consists of two subproblems of same kind. You have to check whether
there exists a start index j such that
for all i <- 0 until n it holds that substring(i) == string(j + i)
Whenever you have to check whether some predicate holds for some / for all elements of a sequence, it can be quite handy if you can short-circuit and exit early by using the return keyword. Therefore, I'd suggest to eliminate all variables and while-loops, and use a nested method instead:
def substring(s: String, t: String): Boolean ={
val n = s.length // small string size
val m = t.length // BIG string size
def substringStartingAt(startIndex: Int): Boolean = {
for (i <- 0 until n) {
if (s(i) != t(startIndex + i)) return false
}
true
}
for (possibleStartIndex <- 0 to m - n) {
if (substringStartingAt(possibleStartIndex)) return true
}
false
}
The inner method checks whether all s(j + i) == t(i) for a given j. The outer for-loop checks whether there exists a suitable offset j.
Example:
for (
(sub, str) <- List(
("moth", "ramathaaaaaaa"),
("moth", "ramothaaaaaaa"),
("moth", "mothraaaaaaaa"),
("moth", "raaaaaaaamoth"),
("moth", "mmoth"),
("moth", "moth"),
)
) {
println(sub + " " + " " + str + ": " + substring(sub, str))
}
output:
moth ramathaaaaaaa: false
moth ramothaaaaaaa: true
moth mothraaaaaaaa: true
moth raaaaaaaamoth: true
moth mmoth: true
moth moth: true
If you were allowed to use built-in methods, you could of course also write
def substring(s: String, t: String): Boolean = {
val n = s.size
val m = t.size
(0 to m-n).exists(j => (0 until n).forall(i => s(i) == t(j + i)))
}
I offer the following slightly more idiomatic Scala code, not because I think it will perform better than Andrey's code--I don't--but simply because it uses recursion and is, perhaps, slightly easier to read:
/**
* Method to determine if "sub" is a substring of "string".
*
* #param sub the candidate substring.
* #param string the full string.
* #return true if "sub" is a substring of "string".
*/
def substring(sub: String, string: String): Boolean = {
val p = sub.toList
/**
* Tail-recursive method to determine if "p" is a subsequence of "s"
*
* #param s the super-sequence to be tested (part of the original "string").
* #return as follows:
* (1) "p" longer than "s" => false;
* (2) "p" elements match the corresponding "s" elements (starting at the start of "s") => true
* (3) recursively invoke substring on "p" and the tail of "s".
*/
#tailrec def substring(s: Seq[Char]): Boolean = p.length <= s.length && (
s.startsWith(p) || (
s match {
case Nil => false
case _ :: z => substring(z)
}
)
)
p.isEmpty || substring(string.toList)
}
If you object to using the built-in method startsWith then we could use something like:
(p zip s forall (t => t._1==t._2))
But we have to draw the line somewhere between creating everything from scratch and using built-in functions.

Incomprehensible technical interview

This was a question asked in a recent programming interview.
Given a string "str" and pair of "N" swapping indices, generate a lexicographically largest string. Swapping indices can be reused any number times.
Eg:
String = "abdc"
Indices:
(1,4)
(3,4)
Answer:
cdba, cbad, dbac,dbca
You should print only "dbca" which is lexicographically largest.
This might sound naive, but I completely fail to follow the question. Can someone please help me understand what the question means?
I think it's saying that, given the string mystring = "abdc", you are instructed to switch characters at the specified index pairs such that you produce the lexicographically "largest" string (i.e. such that if you lex-sorted all possible strings, it would end up at the last index). So you have two valid operations: (1) switch mystring[1] with mystring[4] ("abdc" --> "cbda"), and (2) switch mystring[3] with mystring[4] ("abdc" --> "abcd"). Also, you can multiply chain operations: either operation (1) followed by (2) ("abdc" --> "cbda" --> "cbad"), or vice versa ("abdc" --> "abcd" --> "dbca"), and so on and so forth ("abdc" --> "cbda" --> "cbad" --> "dbac").
Then you (reverse) lex-sort these and pop off the top index:
>>> allPermutations = ['abcd', 'cbad', 'abdc', 'cbda', 'dbca', 'dbac']
>>> lexSorted = sorted(allPermutations, reverse=True) # ['dbca', 'dbac', 'cbda', 'cbad', 'abdc', 'abcd']
>>> lexSorted.pop(0)
'dbca'
Based on the clarification by #ncemami I came up with this solution.
public static String swap(String str, Pair<Integer, Integer> p1, Pair<Integer, Integer> p2){
TreeSet<String> set = new TreeSet<>();
String s1 = swap(str, p1.getKey(), p1.getValue());
set.add(s1);
String s2 = swap(s1, p2.getKey(), p2.getValue());
set.add(s2);
String s3 = swap(str, p2.getKey(), p2.getValue());
set.add(s3);
String s4 = swap(s3, p1.getKey(), p1.getValue());
set.add(s4);
return set.last();
}
private static String swap(String str, int a, int b){
StringBuilder sb = new StringBuilder(str);
char temp1 = str.charAt(a);
char temp2 = str.charAt(b);
sb.setCharAt(a, temp2);
sb.setCharAt(b, temp1);
return sb.toString();
}
Here my Java solution:
String swapLexOrder(String str, int[][] pairs) {
Map<Integer, Set<Integer>> neighbours = new HashMap<>();
for (int[] pair : pairs) {
// It contains all the positions that are reachable from the index present in the pairs
Set<Integer> reachablePositionsL = neighbours.get(pair[0]);
Set<Integer> temp = neighbours.get(pair[1]); // We use it just to merge the two sets if present
if (reachablePositionsL == null) {
reachablePositionsL = (temp == null ? new TreeSet<>() : temp);
} else if (temp != null) {
// Changing the reference so every addition to "reachablePositionsL" will reflect on both positions
for (Integer index: temp) {
neighbours.put(index, reachablePositionsL);
}
reachablePositionsL.addAll(temp);
}
reachablePositionsL.add(pair[0]);
reachablePositionsL.add(pair[1]);
neighbours.put(pair[0], reachablePositionsL);
neighbours.put(pair[1], reachablePositionsL);
}
StringBuilder result = new StringBuilder(str);
for (Set<Integer> set : neighbours.values()) {
Iterator<Character> orderedCharacters = set.stream()
.map(i -> str.charAt(i - 1))
.sorted(Comparator.reverseOrder())
.iterator();
set.forEach(i -> result.setCharAt(i - 1, orderedCharacters.next()));
}
return result.toString();
}
Here an article that explain my the problem.
String = "abcd"
co_ord = [(1,4),(3,4)]
def find_combinations(co_ord, String):
l1 = []
for tup_le in co_ord:
l1.extend(tup_le)
l1 = [x-1 for x in l1]
l1 = list(set(l1))
l2 = set(range(len(String)))-set(l1)
return l1,int(''.join(str(i) for i in l2))
def perm1(lst):
if len(lst) == 0:
return []
elif len(lst) == 1:
return [lst]
else:
l = []
for i in range(len(lst)):
x = lst[i]
xs = lst[:i] + lst[i+1:]
for p in perm1(xs):
l.append([x]+p)
return l
lx, ly = find_combinations(co_ord, String)
final = perm1(lx)
print(final)
temp = []
final_list=[]
for i in final:
for j in i:
temp.append(String[j])
final_list.append(''.join(temp))
temp=[]
final_list = [ i[:ly] + String[ly] + i[ly:] for i in final_list]
print(sorted(final_list,reverse=True)[0])

How I can find the contents of inner nesting?

Lets say I have a string like this :
string = [+++[>>[--]]]abced
Now I want a someway to return a list that has: [[--],[>>],[+++]]. That is the contents of the deepest [ nesting followed by other nesting. I came up with this solution like this :
def string = "[+++[>>[--]]]"
loop = []
temp = []
string.each {
bool = false
if(it == "["){
temp = []
bool = true
}
else if( it != "]")
temp << it
if(bool)
loop << temp
}
println loop.reverse()
But this indeed takes the abced string after the last ] and put into the result!. But what I want is only [[--],[>>],[+++]]
Are there any groovy way of solving this?
You can use this, if you wouldn't mind using recursion
def sub(s , list){
if(!s.contains('[') && !s.contains('['))
return list
def clipped = s.substring(s.lastIndexOf('[')+1, s.indexOf(']'))
list.add(clipped)
s = s - "[$clipped]"
sub(s , list)
}
Calling
sub('''[+++[>>[--]]]abced''' , [])
returns a list of all subportions enclosed between braces.
['--', '>>', '+++']
If your brackets are symmetrical, you could just introduce a counter variable that holds the depth of the bracket nesting. Only depth levels above 0 are allowed in the output:
def string = "[+++[>>[--]]]abc"
loop = []
temp = []
depth = 0;
string.each {
bool = false
if(it == "["){
temp = []
bool = true
depth++;
}
else if (it == "]"){
depth--;
}
else if (depth > 0){
temp << it
}
if(bool){
loop << temp
}
}
println loop.reverse()
class Main {
private static final def pattern = ~/([^\[]*)\[(.+?)\][^\]]*/
static void main(String[] args) {
def string = "[+++[>>[--]]]abced"
def result = match(string)
println result
}
static def match(String val) {
def matcher = pattern.matcher(val);
if (matcher.matches()) {
return matcher.group(1) ? match(matcher.group(2)) + matcher.group(1) : match(matcher.group(2))
}
[val]
}
}
System.out
[--, >>, +++]
The capturing of the first group in the regex pattern could probably be improved. Right now the first group is any character that is not [ and if there are nothing in front of the first [ then the first group will contain an empty string.

Create comma separated string from 2 lists the groovy way

What I have so far is:
def imageColumns = ["products_image", "procuts_subimage1", "products_subimage2", "prodcuts_subimage3", "products_subimage4"]
def imageValues = ["1.jpg","2.jpg","3.jpg"]
def imageColumnsValues = []
// only care for columns with values
imageValues.eachWithIndex { image,i ->
imageColumnsValues << "${imageColumns[i]} = '${image}'"
}
println imageColumnValuePair.join(", ")
It works but I think it could be better. Wish there was a collectWithIndex ... Any suggestions?
There's no collectWithIndex, but you can achieve the same result with a little effort:
def imageColumns = ["products_image", "procuts_subimage1", "products_subimage2", "prodcuts_subimage3", "products_subimage4"]
def imageValues = ["1.jpg","2.jpg","3.jpg"]
def imageColumnsValues = [imageValues, 0..<imageValues.size()].transpose().collect { image, i ->
"${imageColumns[i]} = '${image}'"
}
println imageColumnsValues.join(", ")
This takes the list of items and a range of numbers from 0 size(list) - 1, and zips them together with transpose. Then you can just collect over that result.

Resources