I have been learning Groovy for a week, and I have a problem: I need to map words to the sums of their ASCII values. For example, suppose the sum of the ASCII values of the characters in the word "the" is 10 (not the real value, but it will do for the example).
There are, obviously, other words whose ASCII values also sum to 10. I want a data structure where I can look up the value 10 and get back all the words whose ASCII values sum to 10. I can't do this with a plain Groovy Map, because keys must be unique. How can I do this in Groovy?
You could do something like this as well:
def words = [ 'the', 'het', 'love', 'groovy' ]
words.groupBy { ( it as char[] ).collect { it as int }.sum() }
That gives you the map:
[321:[the, het], 438:[love], 678:[groovy]]
Here's a small example that uses the withDefault method to create a Map that returns an empty List whenever a key is requested for the first time:
def map = [:].withDefault { [] }
map[10] << 'the'
map[10] << 'as'
map[20] << 'from'
assert map[10] == ['the', 'as']
assert map[20] == ['from']
A map of Int to List of String should fix this. Simply append the words having the same sum to the list corresponding to that key.
You need a multi-map data structure. One can be found in Apache Commons, another one in Guava.
In Groovy you can use a simple Map of int -> list, with default value for convenience.
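For illustration, here's a minimal sketch of that idea, combining withDefault with the ASCII-sum computation from the groupBy answer above (the sums match the ones shown there, e.g. 321 for "the" and "het"):

// Sketch: a map with a default empty list, keyed by each word's ASCII sum
def bySum = [:].withDefault { [] }
['the', 'het', 'love'].each { word ->
    def sum = (word as char[]).collect { it as int }.sum()
    bySum[sum] << word
}
assert bySum[321] == ['the', 'het']
assert bySum[438] == ['love']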
I want to write code that treats strings with the same characters in a different order as equal. For example, suppose $a = "ksv"; whenever somebody inputs the string "svk" or "kvs", I want my code to treat those strings as equivalent to $a. Here is an example,
#ans=("ksv", "kvs", "svk", "vsk",......);
if (#input[1] ~~ #ans) {
return 'EXACT_ANS';
}
$input[1] is the string the user enters. At first, I listed all of the possible orderings in an array (just like the example), so that if one of the elements in the array matches $input[1], I return it as the correct answer. However, this becomes long and tedious work for a much longer string. Please give me any advice on this. Thank you^^
You want something of the form
if (normalize_string($input) eq normalize_string('ksv')) {
    ...
}
where normalize_string is a sub that returns the same string for all equivalent inputs, and returns different strings for inputs that aren't equivalent.
The exact definition of normalize_string will vary based on what you consider equivalent.
If you want to ignore duplicate characters (abbc is equivalent to abc):
sub normalize_string {
    my %h;
    ++$h{$_} for split //, $_[0];
    return join '', sort keys %h;
}
If the number of instances of each character is pertinent (abbc isn't equivalent to abc):
sub normalize_string {
    return join '', sort split //, $_[0];
}
Of course, you can inline the normalized form when the parameter is a constant.
if (normalize_string($input) eq 'ksv') {
    ...
}
I am trying to write a function that takes a string txt and returns an int built from the ASCII numbers of that string's characters. It also takes a second argument, n, an int that specifies the number of digits each character should translate to. The default value of n is 3. n is always >= 3 and the string input is always non-empty.
Example outputs:
string_to_number('fff')
102102102
string_to_number('ABBA', n = 4)
65006600660065
My current strategy is to split txt into its characters by converting it into a list. Then I convert the characters into their ord values and append them to a new list. I then try to combine the elements in this new list into a number (e.g. I would go from ['102', '102', '102'] to ['102102102']). Then I try to convert the first element of this list (aka the only element) into an integer. My current code looks like this:
def string_to_number(txt, n=3):
    characters = list(txt)
    ord_values = []
    for character in characters:
        ord_values.append(ord(character))
    joined_ord_values = ''.join(ord_values)
    final_number = int(joined_ord_values[0])
    return final_number
The issue is that I get a TypeError. I can write code that successfully returns the integer for a single-character string, but for strings that contain more than one character I can't, because of this type error. Is there any way of fixing this? Thank you, and apologies if this is quite long.
Try this:
def string_to_number(text, n=3):
    return int(''.join('{:0>{}}'.format(ord(c), n) for c in text))
print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
Output:
102102102
65006600660065
Edit: without a list comprehension, as the OP asked in the comments
def string_to_number(text, n=3):
    l = []
    for c in text:
        l.append('{:0>{}}'.format(ord(c), n))
    return int(''.join(l))
Useful link(s):
string formatting in python: contains pretty much everything you need to know about string formatting in python
The join method expects an iterable of strings, so you'll need to convert your ASCII codes into strings. This almost gets it done:
ord_values.append(str(ord(character)))
except that it doesn't respect your number-of-digits requirement.
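For completeness, a small sketch of that fix with the padding added, using str.zfill to pad each code with leading zeros up to n digits:

# Sketch building on the fix above: convert each ord value to a zero-padded string of n digits
def string_to_number(txt, n=3):
    ord_values = []
    for character in txt:
        ord_values.append(str(ord(character)).zfill(n))
    return int(''.join(ord_values))

print(string_to_number('fff'))        # 102102102
print(string_to_number('ABBA', n=4))  # 65006600660065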
I want to find out if two strings are anagrams or not.
I thought of sorting them and then checking them element by element, but are there any algorithms for sorting strings? Or another way to do it? (Simple ideas or code, please, because I am a beginner.) Thanks
Strings are lists of characters in Haskell, so the standard sort simply works.
> import Data.List
> sort "hello"
"ehllo"
Your idea of sorting and then comparing sounds fine for checking anagrams.
I can give you an idea (as I am not that well acquainted with Haskell).
Take an array with 26 slots.
Now, for each character in the first string, increment the corresponding position in the array.
If the array is A[26] = {0, 0, ..., 0},
then if you find 'a', set A[1] = A[1] + 1;
if 'b', then A[2] = A[2] + 1.
Now, for the second string, decrease the value for each character found, in the same array (if you find 'a', decrease A[1]: A[1] = A[1] - 1).
At last, check whether all the array elements are 0. If they are all 0, the strings are definitely anagrams; otherwise they are not.
Note: You may extend this similarly for capital letters.
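A minimal Haskell sketch of this counting idea (assuming lowercase letters only, as the note above suggests extending it for capitals):

-- Sketch of the counting approach: tally each letter 'a'..'z' in both strings and compare
counts :: String -> [Int]
counts s = [ length (filter (== c) s) | c <- ['a' .. 'z'] ]

isAnagramByCounts :: String -> String -> Bool
isAnagramByCounts a b = counts a == counts b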
It is not necessary to count the occurrences of each letter.
Simply, you can sort your strings and then compare the two lists element by element.
For example, you have this
"cinema" and "maneci"
It would be helpful to make your string into a list of characters.
['c','i','n','e','m','a'] and ['m','a','n','e','c','i']
Then you can sort these lists and compare them character by character.
Note that you will have these cases:
example [] [] = True
example [] a = False
example a [] = False
example (h1:t1) (h2:t2) = if h1 == h2 then example t1 t2 else False
In the Joy of Haskell book "Finding Success and Failure", pp. 11-14, the authors offer the following code, which works:
import Data.List
isAnagram :: String -> String -> Bool
isAnagram word1 word2 = (sort word1) == (sort word2)
After importing your module (I imported practice.hs into Clash), you can enter two strings; if they are anagrams, it will return True:
*Practice> isAnagram "julie" "eiluj"
True
For any given String, for instance
val s = "abde"
how do I insert a character c: Char at position 2, after the b?
Update
Which Scala collection should I consider for multiple efficient insertions and deletions at random positions? (Assuming that a String may be transformed into that collection.)
We can use the patch method on Strings in order to insert a String at a specific index:
"abde".patch(2, "c", 0)
// "abcde"
This:
drops 0 (third parameter) elements at index 2
inserts "c" at index 2
which in other words means patching 0 elements at index 2 with the string "c".
Try this
val (fst, snd) = s.splitAt(2)
fst + 'c' + snd
The rope data structure is a valid alternative to String and StringBuffer for heavy manipulation of (very) large strings, especially with regard to insertions and deletions.
Scalaz includes class Rope[A] (see API and Rope.scala) and class WrappedRope[A] (see API) with a plethora of operations on rope strings.
Implementations in Java include http://ahmadsoft.org/ropes/. A benchmarking study for this Java implementation may be found at http://www.ibm.com/developerworks/library/j-ropes/.
A publication on ropes as an alternative to strings may be found at http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf
This is a coding exercise. Suppose I have to decide if one string is created by a cyclic shift of another. For example: cab is a cyclic shift of abc but cba is not.
Given two strings s1 and s2 we can do that as follows:
if (s1.length() != s2.length())
    return false;
for (int i = 0; i < s1.length(); i++)
    if ((s1.substring(i) + s1.substring(0, i)).equals(s2))
        return true;
return false;
Now what if I have an array of strings and want to find all strings that are cyclic shift of one another? For example: ["abc", "xyz", "yzx", "cab", "xxx"] -> ["abc", "cab"], ["xyz", "yzx"], ["xxx"]
It looks like I have to check all pairs of the strings. Is there a "better" (more efficient) way to do that?
As a start, you can tell whether a string s1 is a rotation of a string s2 with a single call to contains(), like this:
public boolean isRotation(String s1, String s2){
    String s2twice = s2 + s2;
    return s2twice.contains(s1);
}
Namely, if s1 is "rotation" and s2 is "otationr", the concat gives you "otationrotationr", which contains s1 indeed.
Now, even if we assume this is linear, or close to it (which is not impossible using Rabin-Karp, for instance), you are still left with O(n^2) pair comparisons, which may be too much.
What you could do is build a hashtable where the sorted word is the key, and the posting list contains all the words from your list that, if sorted, give that key (i.e. key("bca") and key("cab") should both return "abc"):
private Map<String, List<String>> index;

/* ... */

public void buildIndex(String[] words){
    for (String word : words) {
        String sortedWord = sortWord(word);
        if (!index.containsKey(sortedWord)) {
            index.put(sortedWord, new ArrayList<String>());
        }
        index.get(sortedWord).add(word);
    }
}
CAVEAT: The hashtable will contain, for each key, all the words that have exactly the same letters occurring the same number of times (not just the rotations; i.e. "abba" and "baba" will have the same key, but isRotation("abba", "baba") will return false).
But once you have built this index, you can significantly reduce the number of pairs you need to consider: if you want all the rotations for "bca" you just need to sort("bca"), look it up in the hashtable, and check (using the isRotation method above, if you want) if the words in the posting list are the result of a rotation or not.
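For illustration, a hedged sketch of that lookup step, reusing the sortWord helper assumed by buildIndex and the isRotation method from above:

// Sketch: look up the sorted form in the index, then keep only true rotations
public List<String> findRotations(String word) {
    List<String> rotations = new ArrayList<String>();
    List<String> candidates = index.get(sortWord(word));
    if (candidates != null) {
        for (String candidate : candidates) {
            if (isRotation(candidate, word)) {
                rotations.add(candidate);
            }
        }
    }
    return rotations;
}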
If strings are short compared to the number of strings in the list, you can do significantly better by rotating all strings to some normal form (lexicographic minimum, for example). Then sort lexicographically and find runs of the same string. That's O(n log n), I think... neglecting string lengths. Something to try, maybe.
Concerning the way to find the pairs in the table, there could be many better ways, but what came to me as a first thought is to sort the table and apply the check to each adjacent pair.
This is much better and simpler than checking every string against every other string in the table.
Consider building an automaton for each string against which you wish to test.
Each automaton should have one entry point for each possible character in the string, and transitions for each character, plus an extra transition from the end to the start.
You could improve performance even further if you amalgamated the automata.
I think a combination of the answers by Patrick87 and savinos would make a fair amount of sense. Specifically, in a Java-esque pseudo-code:
List<String> inputs = ["abc", "xyz", "yzx", "cab", "xxx"];
Map<String, List<String>> uniques = new Map<String, List<String>>();
for (String value : inputs) {
    String normalized = normalize(value);
    if (!uniques.containsKey(normalized)) {
        uniques.put(normalized, new List<String>());
    }
    uniques.get(normalized).add(value);
}
// you now have a Map of normalized strings to every string in the input
// that is "equal to" that normalized version
Normalizing the string, as stated by Patrick87, might be best done by picking the rotation of the string that results in the lowest lexicographic ordering.
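For illustration, a hedged sketch of such a normalize function; this simple version is O(s^2) per word, while Booth's algorithm (mentioned in the answer below) achieves O(s):

// Sketch: normalize a string to its lexicographically smallest rotation
static String normalize(String s) {
    String best = s;
    for (int i = 1; i < s.length(); i++) {
        String rotated = s.substring(i) + s.substring(0, i);
        if (rotated.compareTo(best) < 0) {
            best = rotated;
        }
    }
    return best;
}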
It's worth noting, however, that the "best" algorithm probably relies heavily on the inputs... the number of strings, the length of those strings, how many duplicates there are, etc.
You can rotate all the strings to a normalized form using Booth's algorithm (https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation) in O(s) time, where s is the length of the string.
You can then use the normalized form as a key in a HashMap (where the value is the set of rotations seen in the input). You can populate this HashMap in a single pass over the data, i.e., for each string:
calculate the normalized form
check if the HashMap contains the normalized form as a key - if not insert the empty Set at this key
add the string to the Set in the HashMap
You then just need to output the values of the HashMap. This makes the total runtime of the algorithm O(n * s) - where n is the number of words and s is the average word length. The total space usage is also O(n * s).
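A hedged sketch of that single pass, with minimalRotation standing in as a hypothetical helper for Booth's algorithm (the simpler normalize sketch above would also work, at a higher cost per word):

// Sketch: group the input words by their normalized (minimal-rotation) form in one pass
Map<String, Set<String>> groups = new HashMap<String, Set<String>>();
for (String word : words) {
    String key = minimalRotation(word);   // hypothetical helper implementing Booth's algorithm
    if (!groups.containsKey(key)) {
        groups.put(key, new HashSet<String>());
    }
    groups.get(key).add(word);
}
// The values of 'groups' are the sets of strings that are cyclic shifts of one another.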