I really need help in writing this function in Haskell, I don't even know where to start. Here are the specs:
Define a function flagpattern that takes a positive Int value greater than or equal to five and returns a String that can be displayed as the following `flag' pattern of dimension n, e.g.
Main> putStr (flagpattern 7)
#######
##   ##
# # # #
#  #  #
# # # #
##   ##
#######
Assuming you want a "X" enclosed in 4 lines, you need to write a function that given a coordinate (x,y) returns what character should be at that position:
coordinate n x y = if x == 0 then 'X' else ' '
(This version outputs only the leftmost X's; modify it, and remember that indices start at 0.)
Now you want them nicely arranged in a matrix, use a list comprehension, described in the linked text.
You should start from your problem definition:
main :: IO ()
main = putStr . flagPattern $ 7
Then, you should ask yourself how many dots the flag has:
flagPattern :: Int -> String
flagPattern = magic $ [1..numberOfDots]
Then, the (hard) part of the magic function should decide for each dot whether it is " " or "#":
partOfMagic ...
| ... = "#" -- or maybe even "#\n" in some cases?
| otherwise = " "
Then, you can concatenate parts into one string and get the answer.
Start with the type signature.
flagpattern :: Int -> String
Now break the problem into subproblems. For example, suppose I told you to produce row 2 of a size 7 flag pattern. You would write:
XX   XX
Or row 3 of a size 7 flag pattern would be
X X X X
So suppose we had a function that could produce a given row. Then we'd have
flagpattern :: Int -> String
flagpattern size = unlines (??? flagrow ???)
flagrow :: Int -> Int -> String
flagrow row size = ???
unlines takes a list of Strings and turns it into a single String with newlines between each element of the list. See if you can define flagrow, and get it working correctly for any given row and size. Then see if you can use flagrow to define flagpattern.
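The row-by-row decomposition above can be cross-checked with a small sketch. It is written in Python rather than Haskell, purely as an illustration; the names flag_row and flag_pattern are hypothetical, and it assumes a cell is filled when it lies on the border or on either diagonal.

```python
def flag_row(row: int, size: int) -> str:
    """Build one row: '#' on the border and on both diagonals, ' ' elsewhere."""
    last = size - 1
    return "".join(
        "#" if row in (0, last) or col in (0, last) or col == row or col == last - row
        else " "
        for col in range(size)
    )


def flag_pattern(size: int) -> str:
    """Join all rows, ending each with a newline (like Haskell's unlines)."""
    return "".join(flag_row(row, size) + "\n" for row in range(size))


if __name__ == "__main__":
    print(flag_pattern(7), end="")
```

The same two-function split carries over directly to flagrow and flagpattern in Haskell.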
Today I was working on this code which outputs the average value of a series of "arrays", the data is inputted in this format:
3 #Number of arrays to get the average from
2 3 0 #First array
4 5 0 #Second array
1 4 5 0 #Third array
I worked on a code that outputs the data, but realized that it prints it like this:
2 #Average (Int) of the 1st array
4 #Average (Int) of the 2nd array
3 #Average (Int) of the 3rd array
(Take into account that the 0 at the end of every array is not used when calculating the average; it only exists to indicate the end of the array.)
My question is, How can I properly change my code so that I can output the data like this? :
2 4 3
Here is the code I've been working in:
sumList :: [Int] -> Int
sumList [] = 0
sumList (u:v) = u + sumList v

funavg :: Int -> IO ()
funavg numitint =
  if numitint == 0
    then return ()
    else do
      arrs <- getLine
      let arrnum = (map read (words arrs) :: [Int])
      let total = sumList arrnum
      let avg = div total ((length arrnum) - 1)
      print avg
      funavg (numitint - 1)

main :: Prelude.IO ()
main = do
  numits <- getLine
  let numitint = read numits :: Int
  funavg numitint
I've searched many documents and websites, but can't come up with an ideal answer.
Using recursion is mandatory.
Any help is highly appreciated :D
print is equivalent to putStrLn . show and is provided for convenience to print a single value of any Show type.
print does not have a standard library companion which omits the newline, but putStrLn does: it's called putStr. Instead of print avg, consider the following, which also appends a space so that consecutive averages stay separated:
putStr $ show avg ++ " "
I am trying to convert a string of varchar to ascii. Then i'm trying to make it so any number that's not 3 digits has a 0 in front of it. then i'm trying to add a 1 to the very beginning of the string and then i'm trying to make it a large number that I can apply math to it.
I've tried a lot of different coding techniques. The closest I've gotten is below:
s = 'Ak'
for c in s:
    mgk = (''.join(str(ord(c)) for c in s))
num = [mgk]
var = 1
num.insert(0, var)
mgc = lambda num: int(''.join(str(i) for i in num))
num = mgc(num)
print(num)
With this code I get the output: 165107
It's almost doing exactly what I need it to do, but it's taking out the 0 from the ord(A), which is 65. I want it to be 1065. Everything else seems to be working great. I'm using '%03d'% to insert the 0.
How I want it to work is:
Get the ord() value from a string of numbers and letters.
if the ord() value is less than 100 (ex: A = 65, add a 0 to make it a 3 digit number)
take the ord() values and combine them into 1 number. The 0 needs to stay in front of 65. Then add a one to the front of the list. So basically the output will look like:
1065107
I want to make sure I can take that number and apply math to it.
I have this code too:
s = 'Ak'
for c in s:
    s = ord(c)
    s = '%03d'%s
    mgk = (''.join(str(s)))
    s = [mgk]
    var = 1
    s.insert(0, var)
    mgc = lambda s: int(''.join(str(i) for i in s))
    s = mgc(s)
    print(s)
but then it counts each letter as its own element and it will not combine them and I only want the one in front of the very first number.
When the number is converted to an integer, it
Is this what you want? I am kinda confused:
a = 'Ak'
result = '1' + ''.join(str(f'{ord(char):03d}') for char in a)
print(result) # 1065107
# to make it a number just do:
my_int = int(result)
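As a slightly fuller sketch of the same idea, the encoding can be wrapped in a function and paired with a decoder that checks the leading 1 and the zero padding survive the round trip. The names encode_ascii and decode_ascii are hypothetical, and the sketch assumes every character has a code point below 1000 so that three digits are enough.

```python
def encode_ascii(text: str) -> int:
    """Zero-pad each code point to 3 digits, prefix a 1, and return an int."""
    digits = "".join(f"{ord(ch):03d}" for ch in text)
    return int("1" + digits)


def decode_ascii(number: int) -> str:
    """Drop the leading 1 and read the remaining digits in groups of three."""
    digits = str(number)[1:]
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


if __name__ == "__main__":
    value = encode_ascii("Ak")
    print(value)            # an int, so arithmetic works on it
    print(decode_ascii(value))
```

The leading 1 is what keeps the padding zero of the first character from being lost when the string becomes an integer.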
Here I am trying to find the index of '-' followed by '}' in a String.
For an input like substrIndex "abcd -} sad" it gives me an output of 10,
which is giving me the entire string length.
Also, if I do something like substrIndex "abcd\n -} sad" it gives me 6.
Why is that so with \n? What am I doing wrong? Please correct me, I'm a noob.
substrIndex :: String -> Int
substrIndex ""=0
substrIndex (s:"") = 0
substrIndex (s:t:str)
| s== '-' && t == '}' = 0
| otherwise = 2+(substrIndex str)
Your program has a bug. You are checking every two characters. But, what if the - and } are in different pairs, for example S-}?
It will first check S and - are equal to - and } respectively.
Since they don't match, it will move on with } alone.
So, you just need to change the logic a little bit, like this
substrIndex (s:t:str)
| s == '-' && t == '}' = 0
| otherwise = 1 + (substrIndex (t:str))
Now, if the current pair doesn't match -}, then just skip the first character and proceed with the second character, substrIndex (t:str). So, if S- doesn't match, your program will proceed with -}. Since we dropped only one character we add only 1, instead of 2.
This can be shortened and written clearly, as suggested by user2407038, like this
substrIndex :: String -> Int
substrIndex [] = 0
substrIndex ('-':'}':_) = 0
substrIndex (_:xs) = 1 + substrIndex xs
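Note that the final Haskell version returns the length of the string when -} never occurs. As a cross-check of the one-character-at-a-time scan, here is a minimal sketch in Python (used only for illustration; substr_index is a hypothetical name) that returns None in that case instead.

```python
from typing import Optional


def substr_index(text: str) -> Optional[int]:
    """Return the index of the first '-' immediately followed by '}', or None."""
    # Advance one character at a time so that a match straddling an
    # even/odd boundary (as in "S-}") is not skipped.
    for i in range(len(text) - 1):
        if text[i] == "-" and text[i + 1] == "}":
            return i
    return None


if __name__ == "__main__":
    print(substr_index("abcd -} sad"))
    print(substr_index("S-}"))
```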
I want to find the pattern from any position in any given string such that the pattern repeats for a threshold number of times at least.
For example for the string "a0cc0vaaaabaaaabaaaabaa00bvw" the pattern should come out to be "aaaab". Another example: for the string "ff00f0f0f0f0f0f0f0f0000" the pattern should be "0f".
In both cases threshold has been taken as 3 i.e. the pattern should be repeated for at least 3 times.
If someone can suggest an optimized method in R for finding a solution to this problem, please do share with me. Currently I am achieving this by using 3 nested loops, and it's taking a lot of time.
Thanks!
Use regular expressions, which are made for this type of stuff. There may be more optimized ways of doing it, but in terms of easy to write code, it's hard to beat. The data:
vec <- c("a0cc0vaaaabaaaabaaaabaa00bvw","ff00f0f0f0f0f0f0f0f0000")
The function that does the matching:
find_rep_path <- function(vec, reps) {
  regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse="")
  match <- regmatches(vec, regexpr(regexp, vec, perl=TRUE))
  substr(match, 1, nchar(match) / reps)
}
And some tests:
sapply(vec, find_rep_path, reps=3L)
# a0cc0vaaaabaaaabaaaabaa00bvw ff00f0f0f0f0f0f0f0f0000
# "aaaab" "0f0f"
sapply(vec, find_rep_path, reps=5L)
# $a0cc0vaaaabaaaabaaaabaa00bvw
# character(0)
#
# $ff00f0f0f0f0f0f0f0f0000
# [1] "0f"
Note that with a threshold of 3, the actual longest pattern for the second string is 0f0f, not 0f (it reverts to 0f at threshold 5). To do this, I use back references (\\1) and repeat them as many times as necessary to reach the threshold. I then need to substr the result because, annoyingly, base R doesn't have an easy way to get just the captured sub-expressions when using Perl-compatible regular expressions. There is probably a not-too-hard way to do this, but the substr approach works well in this example.
Also, as per the discussion in @G. Grothendieck's answer, here is the version with a cap on the length of the pattern, which just adds the limit argument and a slight modification of the regexp.
find_rep_path <- function(vec, reps, limit) {
  regexp <- paste0(c("(.{1,", limit,"})", rep("\\1", reps - 1L)), collapse="")
  match <- regmatches(vec, regexpr(regexp, vec, perl=TRUE))
  substr(match, 1, nchar(match) / reps)
}
sapply(vec, find_rep_path, reps=3L, limit=3L)
# a0cc0vaaaabaaaabaaaabaa00bvw ff00f0f0f0f0f0f0f0f0000
# "a" "0f"
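For comparison, the same back-reference idea can be sketched in Python, whose re module returns the captured group directly, so no substr step is needed. The function name find_rep_pattern and its optional limit argument are hypothetical; like the R version, it reports the leftmost match, not necessarily the longest pattern in the string.

```python
import re
from typing import Optional


def find_rep_pattern(text: str, reps: int, limit: Optional[int] = None) -> Optional[str]:
    """Return the leftmost unit that repeats `reps` times back to back, or None."""
    # The unit is any non-empty run, optionally capped at `limit` characters.
    unit = "(.+)" if limit is None else "(.{1,%d})" % limit
    # Each extra repetition is one more back-reference to group 1.
    match = re.search(unit + r"\1" * (reps - 1), text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for s in ("a0cc0vaaaabaaaabaaaabaa00bvw", "ff00f0f0f0f0f0f0f0f0000"):
        print(find_rep_pattern(s, 3), find_rep_pattern(s, 3, limit=3))
```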
find.string finds substring of maximum length subject to (1) substring must be repeated consecutively at least th times and (2) substring length must be no longer than len.
reps <- function(s, n) paste(rep(s, n), collapse = "") # repeat s n times
find.string <- function(string, th = 3, len = floor(nchar(string)/th)) {
  for(k in len:1) {
    pat <- paste0("(.{", k, "})", reps("\\1", th-1))
    r <- regexpr(pat, string, perl = TRUE)
    if (attr(r, "capture.length") > 0) break
  }
  if (r > 0) substring(string, r, r + attr(r, "capture.length")-1) else ""
}
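The loop in find.string tries the longest allowed pattern length first and stops at the first length that matches. A minimal Python sketch of that search order follows; find_longest_repeat is a hypothetical name, and it assumes the text contains no newlines, since . does not match them by default.

```python
import re
from typing import Optional


def find_longest_repeat(text: str, threshold: int = 3, max_len: Optional[int] = None) -> str:
    """Return the longest unit repeated `threshold` times consecutively, or ""."""
    if max_len is None:
        max_len = len(text) // threshold
    # Try fixed lengths from longest to shortest; the first hit is the longest.
    for k in range(max_len, 0, -1):
        pattern = "(.{%d})" % k + r"\1" * (threshold - 1)
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return ""


if __name__ == "__main__":
    print(find_longest_repeat("a0cc0vaaaabaaaabaaaabaa00bvw"))
    print(find_longest_repeat("ff00f0f0f0f0f0f0f0f0000"))
```

Fixing the length in each attempt is what lets a len cap bound the work on very long inputs.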
and here are some tests. The last test processes the entire text of James Joyce's Ulysses in 1.4 seconds on my laptop:
> find.string("a0cc0vaaaabaaaabaaaabaa00bvw")
[1] "aaaab"
> find.string("ff00f0f0f0f0f0f0f0f0000")
[1] "0f0f"
>
> joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt")
> joycec <- paste(joyce, collapse = " ")
> system.time(result <- find.string(joycec, len = 25))
user system elapsed
1.36 0.00 1.39
> result
[1] " Hoopsa boyaboy hoopsa!"
ADDED
Although I developed my answer before having seen BrodieG's, as he points out they are very similar to each other. I have added some features of his to the above to get the solution below and tried the tests again. Unfortunately when I added the variation of his code the James Joyce example no longer works although it does work on the other two examples shown. The problem seems to be in adding the len constraint to the code and may represent a fundamental advantage of the code above (i.e. it can handle such a constraint and such constraints may be essential for very long strings).
find.string2 <- function(string, th = 3, len = floor(nchar(string)/th)) {
  pat <- paste0(c("(.", "{1,", len, "})", rep("\\1", th-1)), collapse = "")
  r <- regexpr(pat, string, perl = TRUE)
  ifelse(r > 0, substring(string, r, r + attr(r, "capture.length")-1), "")
}
> find.string2("a0cc0vaaaabaaaabaaaabaa00bvw")
[1] "aaaab"
> find.string2("ff00f0f0f0f0f0f0f0f0000")
[1] "0f0f"
> system.time(result <- find.string2(joycec, len = 25))
user system elapsed
0 0 0
> result
[1] "w"
REVISED The James Joyce test that was supposed to be testing find.string2 was actually using find.string. This is now fixed.
This function is not optimized (though it is fast), but I think it is a more R-like way to do this.
1. Get all patterns of a certain length > threshold: vectorized using mapply and substr.
2. Get the occurrences of these patterns and extract the one with the maximum occurrence: vectorized using str_locate_all.
3. Repeat steps 1-2 for all lengths and take the one with the maximum occurrence.
Here is my code. I am creating 2 functions (one for steps 1-2 and one for step 3):
library(stringr)
ss = "ff00f0f0f0f0f0f0f0f0000"
ss <- "a0cc0vaaaabaaaabaaaabaa00bvw"
find_pattern_length <-
  function(length=1,ss){
    patt = mapply(function(x,y) substr(ss,x,y),
                  1:(nchar(ss)-length),
                  (length+1):nchar(ss))
    res = str_locate_all(ss,unique(patt))
    ll = unlist(lapply(res,length))
    list(patt = patt[which.max(ll)],
         rep = max(ll))
  }
get_pattern_threshold <-
  function(ss,threshold =3 ){
    res <-
      sapply(seq(threshold,nchar(ss)),find_pattern_length,ss=ss)
    res[,which.max(res['rep',])]
  }
some tests:
get_pattern_threshold('ff00f0f0f0f0f0f0f0f0000',5)
$patt
[1] "0f0f0"
$rep
[1] 6
> get_pattern_threshold('ff00f0f0f0f0f0f0f0f0000',2)
$patt
[1] "f0"
$rep
[1] 18
Since you want at least three repetitions, there is a nice O(n^2) approach.
For each possible pattern length d cut string into parts of length d. In case of d=5 it would be:
a0cc0
vaaaa
baaaa
baaaa
baa00
bvw
Now look at each pair of subsequent strings A[k] and A[k+1]. If they are equal, then there is a pattern of at least two repetitions. Then go further (k+2, k+3) and so on. Finally, you also check whether the suffix of A[k-1] and the prefix of A[k+n] fit (where k+n is the first string that doesn't match).
Repeat this for each d, starting from some upper bound (at most n/3).
You have at most n/3 possible lengths, and for each d there are n/d strings of length d to compare, which costs O((n/d) d) = O(n) per length. Over all lengths that gives a complexity of O(n^2).
Maybe not optimal but I found this cutting idea quite neat ;)
For a bounded pattern (i.e not huge) it's best I think to just create all possible substrings first and then count them. This is if the sub-patterns can overlap. If not change the step fun in the loop.
pat="a0cc0vaaaabaaaabaaaabaa00bvw"
len=nchar(pat)
thr=3
reps=floor(len/2)
# all possible strings up to half the length of the pattern
library(stringr)
library(zoo) # provides rollapply
pat=str_split(pat, "")[[1]][-1]
str.vec=vector()
for(win in 2:reps)
{
  str.vec= c(str.vec, rollapply(data=pat,width=win,FUN=paste0, collapse=""))
}
# the max length string repeated at least 3 times
tbl=table(str.vec)
tbl=tbl[tbl>=3]
tbl[which.max(nchar(names(tbl)))]
aaaabaa
3
NB Whilst I'm lazy and append/grow the str.vec here in a loop, for a larger problem I'm pretty sure the actual length of str.vec is predetermined by the length of the pattern if you care to work it out.
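The count-all-substrings idea can also be sketched compactly in Python using collections.Counter. The function name longest_frequent_substring is hypothetical; as in the R code, window widths run from 2 to half the string length and overlapping occurrences are counted.

```python
from collections import Counter
from typing import Optional


def longest_frequent_substring(text: str, threshold: int = 3) -> Optional[str]:
    """Longest substring (width 2..len/2) seen at least `threshold` times."""
    # Slide a window of every allowed width over the text and count each slice.
    counts = Counter(
        text[start:start + width]
        for width in range(2, len(text) // 2 + 1)
        for start in range(len(text) - width + 1)
    )
    frequent = [sub for sub, count in counts.items() if count >= threshold]
    return max(frequent, key=len) if frequent else None


if __name__ == "__main__":
    print(longest_frequent_substring("a0cc0vaaaabaaaabaaaabaa00bvw"))
```

Because occurrences may overlap, the result here is longer than the consecutive-repeat pattern found by the regex answers.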
Here is my solution. It's not optimized (it grows the vector with patterns <- c(); patterns <- c(patterns, x), for example) and can be improved, but I think it is simpler than yours.
I can't tell exactly which pattern should be returned (I just return the most frequent one), but you can adjust the code to what you want.
str <- "a0cc0vaaaabaaaabaaaabaa00bvw"
findPatternMax <- function(str){
  nb <- nchar(str):1
  length.patt <- rev(nb)
  patterns <- c()
  for (i in 1:length(nb)){
    for (j in 1:nb[i]){
      patterns <- c(patterns, substr(str, j, j+(length.patt[i]-1)))
    }
  }
  patt.max <- names(which(table(patterns) == max(table(patterns))))
  return(patt.max)
}
findPatternMax(str)
> findPatternMax(str)
[1] "a"
EDIT :
Maybe you want the returned pattern have a min length ?
then you can add a nchar.patt parameter for example :
nchar.patt <- 2 #For a pattern of 2 char min
nb <- nb[length.patt >= nchar.patt]
length.patt <- length.patt[length.patt >= nchar.patt]
If we have string A of length N and string B of length M, where M < N, can I quickly compute the minimum number of letters I have to remove from string A so that string B does not occur as a substring in A?
If we have tiny string lengths, this problem is pretty easy to brute force: you just iterate a bitmask from 0 to 2^N and see if B occurs as a substring in this subsequence of A. However, when N can go up to 10,000 and M can go up to 1,000, this algorithm obviously falls apart quickly. Is there a faster way to do this?
Example: A=ababaa, B=aba. Answer=1. Removing the second a in A will result in abbaa, which does not contain B.
Edit: User n.m. posted a great counter example: aabcc and abc. We want to remove the single b, because removing any a or c will create another instance of the string abc.
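For reference, the brute force described above can be sketched as follows. It is only usable for tiny strings, but it gives a ground truth against which faster ideas can be checked. The name min_removals_brute_force is hypothetical, the sketch assumes B is non-empty, and it enumerates index subsets by increasing size instead of iterating a bitmask.

```python
from itertools import combinations


def min_removals_brute_force(a: str, b: str) -> int:
    """Smallest number of characters to delete from a so that b is not a substring."""
    # Try every set of k positions to delete, for k = 0, 1, 2, ...
    for k in range(len(a) + 1):
        for removed in combinations(range(len(a)), k):
            dropped = set(removed)
            kept = "".join(ch for i, ch in enumerate(a) if i not in dropped)
            if b not in kept:
                return k
    return len(a)


if __name__ == "__main__":
    print(min_removals_brute_force("ababaa", "aba"))
    print(min_removals_brute_force("aabcc", "abc"))
```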
Solve it with dynamic programming. Let dp[i][j] be the minimum number of removals needed so that A[0...i-1] ends with B[0...j-1] and does not contain B; dp[i][j] = infinity indicates that this state is impossible. Then
if (A[i-1] == B[j-1])
dp[i][j] = min(dp[i-1][j-1], dp[i-1][j])
else dp[i][j] = dp[i-1][j]
return min(dp[N][0], dp[N][1], ..., dp[N][M-1]);
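As written, the recurrence above appears to leave out two details: a removal should add 1 to the cost, and after a mismatch the matched prefix of B may fall back to a shorter prefix instead of staying the same. One way to fill these in, offered as a sketch and not as the answer's exact method, is to run the DP over the states of a KMP-style matching automaton for B. The name min_removals is hypothetical and B is assumed to be non-empty.

```python
def min_removals(a: str, b: str) -> int:
    """Minimum deletions from a so that b no longer occurs as a substring."""
    if not b:
        raise ValueError("b must be non-empty")
    m = len(b)

    # KMP failure function: fail[i] = length of the longest proper border of b[:i+1].
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and b[i] != b[k]:
            k = fail[k - 1]
        if b[i] == b[k]:
            k += 1
        fail[i] = k

    # nxt[state][ch] = matched-prefix length after reading ch in `state` (< m).
    nxt = [dict() for _ in range(m)]
    for state in range(m):
        for ch in set(a):
            if b[state] == ch:
                nxt[state][ch] = state + 1
            elif state == 0:
                nxt[state][ch] = 0
            else:
                nxt[state][ch] = nxt[fail[state - 1]][ch]

    inf = float("inf")
    dp = [inf] * m          # dp[state] = fewest deletions to be in `state`
    dp[0] = 0
    for ch in a:
        new = [cost + 1 for cost in dp]          # option 1: delete ch
        for state, cost in enumerate(dp):
            if cost == inf:
                continue
            target = nxt[state][ch]              # option 2: keep ch
            if target < m and cost < new[target]:
                new[target] = cost               # never enter the full-match state
        dp = new
    return min(dp)
```

With N up to 10,000 and M up to 1,000 this performs on the order of N times M state updates, which is far below the exponential brute force.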
Can you do a graph search on the string A. This is probably too slow for large N and special input but it should work better than an exponential brute force algorithm. Maybe a BFS.
I'm not sure this question is still of interest to anyone, but I have an idea that might work.
Once we decide that the problem is not to find the substring but to decide which letter is most convenient to remove from string A, the solution appears pretty simple to me: if you find an occurrence of string B in A, the best thing you can do is remove a char that is inside that occurrence, close to its right boundary, say the one before the last. The reason is that if B ends the way it starts, removing a char at the beginning removes only one occurrence of B, while you could actually remove two at once.
Algorithm in pseudocode:
String A, B;
int occ_posit = 0;
N = B.length();
occ_posit = A.getOccurrencePosition(B); // pseudo function that gets the first occurrence of B in A and returns the offset (first byte = 1), or 0 if there is no occurrence
while (occ_posit > 0) // while B still occurs in A
{
    if (B.firstchar == B.lastchar) // if B starts as it ends
    {
        if (A.charat[occ_posit] == A.charat[occ_posit+1])
            A.remove[occ_posit - 1]; // no reason to remove A[occ_posit] here
        else
            A.remove[occ_posit]; // here we remove the last char, so we could remove 2 occurrences at the same time
    }
    else
    {
        int x = occ_posit + N - 1;
        while (A.charat[x + 1] == A.charat[x])
            x--; // find the first char different from the last one
        A.remove[x]; // B does not end as it starts, so if there are overlapping instances they overlap by more than one char. Removing the first char that is not equal to the char following the B instance kills both occurrences at once.
    }
    occ_posit = A.getOccurrencePosition(B); // search again in the shortened A
}
Let's explain with an example:
A = "123456789000987654321"
B = "890"
read this as a table:
occ_posit:  123456789012345678901
A =        "123456789000987654321"
B =               "890"
The first occurrence is at occ_posit = 8. B does not end as it starts, so it goes into the else branch:
int x = 8 + 3 - 1 = 10;
while (A.charat[x + 1] == A.charat[x])
    x--; // find the first char different from the last one
A.remove[x];
The while loop finds that A.charat[11] matches A.charat[10] (= "0"), so x becomes 9, and then the loop exits because A.charat[10] does not match A.charat[9]. A then becomes:
A = "12345678000987654321"
with no more occurrences in it.
Let's try with another:
A = "abccccccccc"
B = "abc"
The first occurrence is at occ_posit = 1. B does not end as it starts, so it goes into the else branch:
int x = 1 + 3 - 1 = 3;
while (A.charat[x + 1] == A.charat[x])
    x--; // find the first char different from the last one
A.remove[x];
The while loop finds that A.charat[4] matches A.charat[3] (= "c"), so x becomes 2, and then the loop exits because A.charat[3] does not match A.charat[2]. A then becomes:
A = "accccccccc"
let's try with overlapping:
A = "abcdabcdabff"
B = "abcdab"
The algorithm results in A = "abcdacdabff", which has no more occurrences.
finally, one letter overlap:
A = "abbabbabbabba"
B = "abba"
B ends as it starts, so it enters the first if:
if (A.charat[occ_posit] == A.charat[occ_posit+1])
    A.remove[occ_posit - 1]; // no reason to remove A[occ_posit] here
else
    A.remove[occ_posit]; // here we remove the last char, so we could remove 2 occurrences at the same time
which causes the last "a" of the B instance to be removed. So:
1° step: A= "abbbbabbabba"
2° step: A= "abbbbabbbba" and we are done.
Hope this helps
EDIT: please note that the algorithm must be corrected a little so that it does not give an error when the search gets close to the end of A, but this is just an easy programming issue.
Here's a sketch I've come up with.
First, if A contains any symbols that are not found in B, split up A into a bunch of smaller strings containing only those characters found in B. Apply the algorithm on each of the smaller strings, then glue them back together to get the total result. This really functions as an optimization.
Next, check if A contains any of B. If there isn't, you're done. If A = B, then delete all of them.
I think a relatively greedy algorithm may work.
First, mark all of the symbols in A which belong to at least one occurrence of B. Let A = aabcbccabcaa, B = abc. Bolding indicates these marked characters:
a abc bcc abc aa. If there's an overlap, mark all possible. This operation is naively approximately (A-B) operations, but I believe it can be done in around (A/B) operations.
Consider the deletion of each marked letter in A: a abc bcc abc aa.
Check whether the deletion of that marked letter decreases the number of marked letters. You only need to check the substrings which could possibly be affected by the deletion of the letter. If B has a length of 4, only the substrings starting at the following locations would need to be deleted if x were being checked:
-------x------
^^^^
Any further left or right will exist regardless of the presence of x.
For instance:
Marking the [a] in the following string: a [a]bc bcc abc aa.
Its deletion yields abcbccabcaa, which when marked produces abc bcc abc aa, which has an equal number of marked characters. Since only the relative number is required for this operation, it can be done in approximately 2B time for each selected letter. For each, assign the relative difference between the two. Pick an arbitrary one which is maximal and delete it. Repeat until done. Each pass is roughly up to 2AB operations, for a maximum of A passes, giving a total time of about 2A^2 B.
In the above example, these values are assigned:
aabcbccabcaa
033 333
So arbitrarily deleting the first marked b gives you: aacbccabcaa. If you repeat the process, you get:
aacbccabcaa
333
The final result is done.
I believe the algorithm is correctly minimal. I think it is true that whenever A requires only one deletion, the algorithm must be optimal. In that case, the letter which reduces the most possible matches (ie: all of them) should be best. I can come up with no such proof, though. I'd be interested in finding any counter-examples to optimality.
Find the indices of each occurrence of the substring in the main string.
Then, using a dynamic programming algorithm (so memoize intermediate values), remove each letter that is part of an occurrence from the main string, add 1 to the count, and repeat.
You can find the letters because they lie between each match index and that index + length of B.
A = ababaa
B = aba
count = 0
indeces = (0, 2)
A = babaa, aabaa, abbaa, abbaa, abaaa, ababa
B = aba
count = 1
(2nd abbaa is memoized)
indeces = (1), (1), (), (), (0), (0, 2)
answer = 1
You can take it a step further, and try to memoize the substring match indeces of substrings, but that might not actually be a performance gain.
Not sure on the exact bounds, but shouldn't take too long computationally.