Haskell: how to remove equivalent entries and concatenate two lists

I am a beginner in Haskell. I want to remove entries that are equivalent across the lists and then concatenate the remaining lists together.
For example:
db1 = ["David","worksfor.isa", "IBM" ]
db2 = ["David","isa'.worksfor'", "IBM"]
db3 = ["Tom","worksfor.isa", "IBM" ]
The program should recognize that "isa'.worksfor'" and "worksfor.isa" are the same string, and then use concat to build the new db from db1 = ["David","worksfor.isa","IBM"] and the others, e.g. db3 = ["Tom","worksfor.isa","IBM"].
(map (\(a,b,c) -> concat (map(\(a',b',c') -> if ( a b == b' a') then [] else [(a,b ++ "." ++ b',c')])))) ??????
I want to "split the string, if there are ' characters, reverse it, then remove ' characters and check for equivalence"

This should be a comment, but it is far too long:
I assume you find it hard to express yourself in English. I can relate to that; I find it hard myself. However, beyond English there are two other ways to communicate here:
1. Using precise technical terms.
2. Using several, diverse examples. A single example will not suffice, and several examples which are too similar give little information.
As for option 1, you are using the wrong terminology. It is not easy for me to see how a list with 3 items can be considered a database (as hinted by the names db1, db2). Perhaps you wanted to use a list of triples?
[ ("David","isa'.worksfor'", "IBM") ]
You are not specific about what exactly you want to concatenate, but the term concatenation always refers to an operation that must be "additive", i.e. length (x ++ y) == length x + length y. This does not seem to be the case in your question.
Do you want a union of two databases (lists of triples) up to equivalence?
You want the program to understand that
"isa'.worksfor'" and "worksfor.isa" are the same string
But they are not. They might be equivalent strings. You can generally do that using a map operation, like you tried, but you should note that the character ' is not an operation over strings. So a b == b' a' does nothing close to what you want: it calls the function a on the variable b, and compares this with the result of calling the function b' on the variable a'. I can only assume you want something like "split the string; if there are ' characters, reverse it, then remove the ' characters and check for equivalence", but this is complete guesswork.
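If that guess is right, a minimal sketch might look like the following (normalize and equivalent are illustrative names, and the splitting rule is an assumption, not something stated in the question):
import Data.List (intercalate)

-- Assumed rule: split on '.'; if the string carries ' characters,
-- strip them and reverse the segment order, so that "isa'.worksfor'"
-- normalises to "worksfor.isa".
normalize :: String -> String
normalize s
  | '\'' `elem` s = intercalate "." (reverse (map (filter (/= '\'')) (segments s)))
  | otherwise     = s
  where
    segments str = case break (== '.') str of
      (p, [])     -> [p]
      (p, _:rest) -> p : segments rest

equivalent :: String -> String -> Bool
equivalent a b = normalize a == normalize b

-- equivalent "isa'.worksfor'" "worksfor.isa" == True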
To conclude:
Please explain in detail the general problem you are trying to solve. Try to find the precise terms; it is difficult, but this way you can learn.
Please add several different examples of input and output.
Please try to explain what you have tried and where you are stuck.
As a last tip, maybe you want to solve this problem in a more forgiving language than Haskell (such as JavaScript, Python, Ruby, etc.)

Related

Prolog - List into String

I am done searching here for a solution, but found nothing, so I decided to ask.
I have a query that returns a list like this:
List = [6-"read",3-"magazines",3-"music",1-"sport"].
And I need to perform a transformation so that I can get the list like this:
List = [read,magazines,music,sport].
or
List = ["read","magazines","music","sport"].
For that I think I should first convert each pair to a string to take out the numbers and the '-'. But I am struggling with that.
Hope someone can help me! Thanks
This looks like homework, so I will not give you the full implementation. What you are looking for is pattern matching in a rule head:
fst_pair(X, pair(X,Y)).
The - operator is just an infix function symbol, so you could just as well write
fst(X, X-Y).
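For example, loading that clause and querying it extracts the first component of the pair:
?- fst(X, 1-2).
X = 1.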
Using this kind of pattern matching it should be easy to write a recursive predicate over the list. It must have a base case for the empty list and a step case for a head followed by the tail of the list:
fsts_list([], []).
fsts_list([ ... | TailFirsts ], [ ... | Pairs ]) :-   % replace ... by some terms
    % possibly insert some predicates here, depending on what you do above
    fsts_list(TailFirsts, Pairs).
Happy solving!

Erlang: DRYing up string+binary concatenation

I am currently forming strings from strings and binaries like this:
X = string:join(io_lib:format("~s~s~s", ["something1", "something2",<<"something3">>]), "") %X is now something1something2something3
This seems painful and messy, because in order to DRY this up with another such string with a different number of "~s" directives:
Y = string:join(io_lib:format("~s~s", ["something1", <<"something2">>]), "")
I essentially have to write a function that counts the size of the argument list, forms "~s" that many times, and plugs it into the format string.
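Such a helper might look like this sketch (fmt_concat is an illustrative name, and it assumes every element is something "~s" accepts, i.e. a string, atom, or binary):
fmt_concat(List) ->
    Fmt = lists:flatten(lists:duplicate(length(List), "~s")),
    lists:flatten(io_lib:format(Fmt, List)).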
Is there a better way to be doing this?
Eshell V8.0.2 (abort with ^G)
1> F = <<"asdf">>,
1> string:join(io_lib:format("~s~s~s", ["something1", "something2", F]),"").
"something1something2asdf"
2> lists:flatten(["something1", "something2", F]).
[115,111,109,101,116,104,105,110,103,49,115,111,109,101,116,
104,105,110,103,50,<<"asdf">>]
3>
I'm confused as to why you need the call to io_lib:format at all. It's not doing any work in this case.
string:join(["something1","something2","something3"], "").
Would give you the same result. You can simplify even further if there's really no separator character (and taking advantage of the fact that strings are just lists in Erlang):
lists:flatten(["something1", "something2", "something3"]).
Update
I see now that you're working with a list of different data types. While one-liners may look pretty, you can see that they're not always flexible. In your case, I would create some mapper functions to handle mapping the different types to strings. Maybe something like:
-module(string_utils).
-export([concat/1]).

to_string(Value) when is_binary(Value) -> binary_to_list(Value);
to_string(Value) -> Value.

concat(List) ->
    lists:flatten(lists:map(fun to_string/1, List)).
And then your calling code would be:
string_utils:concat(["something1", "something2", <<"something3">>]).

What makes a good name for a helper function?

Consider the following problem: given a list of length three of tuples (String,Int), is there a pair of elements having the same "Int" part? (For example, [("bob",5),("gertrude",3),("al",5)] contains such a pair, but [("bob",5),("gertrude",3),("al",1)] does not.)
This is how I would implement such a function:
import Data.List (sortBy)
import Data.Function (on)
hasPair :: [(String, Int)] -> Bool
hasPair = napkin . sortBy (compare `on` snd)
  where
    napkin [(_, a), (_, b), (_, c)]
      | a == b    = True
      | b == c    = True
      | otherwise = False
I've used pattern matching to bind names to the "Int" part of the tuples, but I want to sort first (in order to group like members), so I've put the pattern-matching function inside a where clause. But this brings me to my question: what's a good strategy for picking names for functions that live inside where clauses? I want to be able to think of such names quickly. For this example, "hasPair" seems like a good choice, but it's already taken! I find that pattern comes up a lot - the natural-seeming name for a helper function is already taken by the outer function that calls it. So I have, at times, called such helper functions things like "op", "foo", and even "helper" - here I have chosen "napkin" to emphasize its use-it-once, throw-it-away nature.
So, dear Stackoverflow readers, what would you have called "napkin"? And more importantly, how do you approach this issue in general?
General rules for locally-scoped variable naming.
f, k, g, h for super-simple local, semi-anonymous things
go for (tail) recursive helpers (precedent)
n, m, i, j for length, size, and other numeric values
v for results of map lookups and other dictionary types
s and t for strings
a:as and x:xs and y:ys for lists
(a,b,c,_) for tuple fields
These generally apply only to arguments of HOFs. For your case, I'd go with something like k or eq3.
Use apostrophes sparingly, for derived values.
I tend to call boolean valued functions p for predicate. pred, unfortunately, is already taken.
In cases like this, where the inner function is basically the same as the outer function, but with different preconditions (requiring that the list is sorted), I sometimes use the same name with a prime, e.g. hasPair'.
However, in this case, I would rather try to break down the problem into parts that are useful by themselves at the top level. That usually also makes naming them easier.
import Data.List (sort)

hasPair :: [(String, Int)] -> Bool
hasPair = hasDuplicate . map snd

hasDuplicate :: Ord a => [a] -> Bool
hasDuplicate = not . isStrictlySorted . sort

isStrictlySorted :: Ord a => [a] -> Bool
isStrictlySorted xs = and $ zipWith (<) xs (tail xs)
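With the examples from the question, this gives hasPair [("bob",5),("gertrude",3),("al",5)] == True and hasPair [("bob",5),("gertrude",3),("al",1)] == False.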
My strategy follows Don's suggestions fairly closely:
If there is an obvious name for it, use that.
Use go if it is the "worker" or otherwise very similar in purpose to the original function (see the sketch after this list).
Follow personal conventions based on context, e.g. step and start for args to a fold.
If all else fails, just go with a generic name, like f.
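For illustration, the go convention from the second point usually looks like this (a made-up example, not taken from any answer above):
-- The exported function delegates to a local worker named 'go',
-- which threads an accumulator through the recursion.
count :: [a] -> Int
count = go 0
  where
    go acc []     = acc
    go acc (_:xs) = go (acc + 1) xs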
There are two techniques that I personally avoid. One is using the apostrophe version of the original function, e.g. hasPair' in the where clause of hasPair. It's too easy to accidentally write one when you meant the other; I prefer to use go in such cases. But this isn't a huge deal as long as the functions have different types. The other is using names that might connote something, but not anything that has to do with what the function actually does. napkin would fall into this category. When you revisit this code, this naming choice will probably baffle you, as you will have forgotten the original reason that you named it napkin. (Because napkins have 4 corners? Because they are easily folded? Because they clean up messes? They're found at restaurants?) Other offenders are things like bob and myCoolFunc.
If you have given a function a name that is more descriptive than go or h, then you should be able to look at either the context in which it is used, or the body of the function, and in both situations get a pretty good idea of why that name was chosen. This is where my point #3 comes in: personal conventions. Much of Don's advice applies. If you are using Haskell in a collaborative situation, then coordinate with your team and decide on certain conventions for common situations.

Find all possible variations of a string of letters

Newbie programmer here. I'm most familiar with Python, but I am also learning C and Java, so any of the three would be fine.
What I have is a string of letters, say:
ABXDEYGH
However, say:
X can be either M or N.
Y can be either P or Q.
In this example, I would basically like to print all possible variations of this string of letters.
Like:
ABMDEPGH
ABNDEPGH
ABMDEQGH
ABNDEQGH
Any help would be appreciated. Thanks in advance
This boils down to a simple problem of permutations. What you care about is the part of the text that can change: the variables. The rest can be ignored until you want to display it.
So your question can be more simply stated: What are all the possible permutations of 1 item from set X and another item from set Y? This is known as a cross-product, sometimes also simply called a product.
Here's a possible Python solution:
import itertools
x = set(['M', 'N'])
y = set(['P', 'Q'])
for items in itertools.product(x, y):
    print('AB{0}DE{1}GH'.format(*items))
Note that the ''.format() call uses the "unpack arguments" notation described here.
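The same idea generalises to any number of variable positions; here is a sketch (template and choices are illustrative names, not from the answer):
import itertools

# One candidate set per variable slot in the template.
template = 'AB{0}DE{1}GH'
choices = [('M', 'N'), ('P', 'Q')]

for items in itertools.product(*choices):
    print(template.format(*items))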
Why don't you write two loops: one to replace all possible characters for X, and one for Y?
foreach (char c in charSet1) {
    // replaces X
    foreach (char ch in charSet2) {
        // replaces Y
    }
}
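In Python, those two loops might look like this sketch (drawing the candidate characters from the question):
for c in ('M', 'N'):          # replaces X
    for ch in ('P', 'Q'):     # replaces Y
        print('AB{0}DE{1}GH'.format(c, ch))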

Efficient mass string search problem

The Problem: A large static list of strings is provided, along with a pattern string comprised of data and wildcard elements (* and ?). The idea is to return all the strings that match the pattern - simple enough.
Current Solution: I'm currently using a linear approach of scanning the large list and globbing each entry against the pattern.
My Question: Are there any suitable data structures in which I can store the large list such that the search's complexity is less than O(n)?
Perhaps something akin to a suffix trie? I've also considered using bi- and tri-grams in a hashtable, but the logic required in evaluating a match based on a merge of the list of words returned and the pattern is a nightmare; furthermore, I'm not convinced it's the correct approach.
I agree that a suffix trie is a good idea to try, except that the sheer size of your dataset might make its construction use up just as much time as its usage would save. They're best if you've got to query them multiple times to amortize the construction cost, perhaps a few hundred queries.
Also note that this is a good excuse for parallelism. Cut the list in two and give it to two different processors and have your job done twice as fast.
You could build a regular trie and add wildcard edges. Then your complexity would be O(n), where n is the length of the pattern. You would have to replace runs of ** with * in the pattern first (also an O(n) operation).
If the list of words were I am an ox then the trie would look a bit like this:
(I ($ [I])
 a (m ($ [am])
    n ($ [an])
    ? ($ [am an])
    * ($ [am an]))
 o (x ($ [ox])
    ? ($ [ox])
    * ($ [ox]))
 ? ($ [I]
    m ($ [am])
    n ($ [an])
    x ($ [ox])
    ? ($ [am an ox])
    * ($ [I am an ox]
       m ($ [am]) ...)
 * ($ [I am an ox]
    I ...
    ...
And here is a sample Python program:
import sys

def addWord(root, word):
    add(root, word, word, '')

def add(root, word, tail, prev):
    if tail == '':
        addLeaf(root, word)
    else:
        head = tail[0]
        tail2 = tail[1:]
        add(addEdge(root, head), word, tail2, head)
        add(addEdge(root, '?'), word, tail2, head)
        if prev != '*':
            for l in range(len(tail) + 1):
                add(addEdge(root, '*'), word, tail[l:], '*')

def addEdge(root, char):
    if char not in root:
        root[char] = {}
    return root[char]

def addLeaf(root, word):
    if '$' not in root:
        root['$'] = []
    leaf = root['$']
    if word not in leaf:
        leaf.append(word)

def findWord(root, pattern):
    prev = ''
    for p in pattern:
        if p == '*' and prev == '*':
            continue
        prev = p
        if p not in root:
            return []
        root = root[p]
    if '$' not in root:
        return []
    return root['$']

def run():
    print("Enter words, one per line; terminate with a . on a line by itself")
    root = {}
    while 1:
        line = sys.stdin.readline()[:-1]
        if line == '.':
            break
        addWord(root, line)
    print(repr(root))
    print("Now enter search patterns. Do not use multiple sequential '*'s")
    while 1:
        line = sys.stdin.readline()[:-1]
        if line == '.':
            break
        print(findWord(root, line))

run()
If you don't care about memory and you can afford to pre-process the list, create a sorted array of every suffix, pointing to the original word, e.g., for ['hello', 'world'], store this:
[('d' , 'world'),
('ello' , 'hello'),
('hello', 'hello'),
('ld' , 'world'),
('llo' , 'hello'),
('lo' , 'hello'),
('o' , 'hello'),
('orld' , 'world'),
('rld' , 'world'),
('world', 'world')]
Use this array to build sets of candidate matches using pieces of the pattern.
For instance, if the pattern is *or*, find the candidate match ('orld' , 'world') using a binary chop on the substring or, then confirm the match using a normal globbing approach.
If the wildcard is more complex, e.g. h*o, build sets of candidates for h and o and find their intersection before the final linear glob.
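Here is a sketch of that binary chop, assuming suffixes is the sorted list of (suffix, word) pairs built above (candidates is an illustrative name):
import bisect

def candidates(suffixes, piece):
    # Binary-search for the first suffix >= piece, then walk forward
    # while suffixes still start with piece, collecting source words.
    keys = [s for s, _ in suffixes]
    i = bisect.bisect_left(keys, piece)
    found = set()
    while i < len(keys) and keys[i].startswith(piece):
        found.add(suffixes[i][1])
        i += 1
    return found
For the pattern *or*, candidates(suffixes, 'or') returns {'world'} for the array above.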
You say you're currently doing a linear search. Does this give you any data on the most frequently performed query patterns? For example, is blah* much more common than bl?h (which I'd assume it is) among your current users?
With that kind of prior knowledge you can focus your indexing efforts on the commonly used cases and get them down to O(1), rather than trying to solve the much more difficult, and yet much less worthwhile, problem of making every possible query equally fast.
You can achieve a simple speedup by keeping counts of the characters in your strings. A string with no bs or a single b can never match the query abba*, so there is no point in testing it. This works much better on whole words, if your strings are made of those, since there are many more words than characters; plus, there are plenty of libraries that can build the indexes for you. On the other hand, it is very similar to the n-gram approach you mentioned.
If you do not use a library that does it for you, you can optimize queries by looking up the most globally infrequent characters (or words, or n-grams) first in your indexes. This allows you to discard more non-matching strings up front.
In general, all speedups will be based on the idea of discarding things that cannot possibly match. What and how much to index depends on your data. For example, if the typical pattern length is near to the string length, you can simply check to see if the string is long enough to hold the pattern.
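As a sketch of that counting idea (might_match is an illustrative name): a string is only worth globbing if it has at least as many of each literal character as the pattern demands.
from collections import Counter

def might_match(pattern, s):
    # Wildcards impose no character requirement; count literals only.
    need = Counter(c for c in pattern if c not in '*?')
    have = Counter(s)
    return all(have[c] >= n for c, n in need.items())

# might_match('abba*', 'aba')   -> False (only one 'b')
# might_match('abba*', 'abbax') -> True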
There are plenty of good algorithms for multi-string search. Google "Navarro string search" and you'll see a good analysis of multi-string options. A number of algorithms are extremely good for "normal" cases (search strings that are fairly long: Wu-Manber; search strings with characters that are modestly rare in the text to be searched: parallel Horspool). Aho-Corasick is an algorithm that guarantees a (tiny) bounded amount of work per input character, no matter how the input text is tuned to create worst-case behaviour in the search. For programs like Snort, that's really important in the face of denial-of-service attacks. If you are interested in how a really efficient Aho-Corasick search can be implemented, take a look at ACISM - an Aho-Corasick Interleaved State Matrix.
