Using BitTorrent's DHT to perform real-time keyword searches


I have an idea to implement a real-time keyword-based torrent search mechanism using the existing BitTorrent DHT, and I would like to know if it is feasible and realistic.
We have a torrent, and we would like to be able to find it from a keyword using the DHT only.
H is a hash function with a 20-byte output
infohash is the info_hash of the torrent (20 bytes)
sub(hash, i) returns 2 bytes of hash starting at byte i (for example, sub(0x62616463666568676a696c6b6e6d706f72717473, 2) = 0x6463)
announce_peer(hash, port) publishes a fake peer associated with a fake info_hash hash. The IP of the fake peer is irrelevant and we use the port number to store data (2 bytes).
get_peers(hash) retrieves fake peers associated with fake info_hash hash. Let's consider that this function returns a list of port numbers only.
a ++ b means concatenate a and b (for example, 0x01 ++ 0x0203 = 0x010203)
Publication
id <- sub(infohash, 0)
announce_peer( H( 0x0000 ++ 0x00 ++ keyword ), id )
announce_peer( H( id ++ 0x01 ++ keyword ), sub(infohash, 2 ))
announce_peer( H( id ++ 0x02 ++ keyword ), sub(infohash, 4 ))
announce_peer( H( id ++ 0x03 ++ keyword ), sub(infohash, 6 ))
announce_peer( H( id ++ 0x04 ++ keyword ), sub(infohash, 8 ))
announce_peer( H( id ++ 0x05 ++ keyword ), sub(infohash, 10))
announce_peer( H( id ++ 0x06 ++ keyword ), sub(infohash, 12))
announce_peer( H( id ++ 0x07 ++ keyword ), sub(infohash, 14))
announce_peer( H( id ++ 0x08 ++ keyword ), sub(infohash, 16))
announce_peer( H( id ++ 0x09 ++ keyword ), sub(infohash, 18))
Search
ids <- get_peers(H( 0x0000 ++ 0x00 ++ keyword ))
foreach (id : ids)
{
part1 <- get_peers(H( id ++ 0x01 ++ keyword ))[0]
part2 <- get_peers(H( id ++ 0x02 ++ keyword ))[0]
part3 <- get_peers(H( id ++ 0x03 ++ keyword ))[0]
part4 <- get_peers(H( id ++ 0x04 ++ keyword ))[0]
part5 <- get_peers(H( id ++ 0x05 ++ keyword ))[0]
part6 <- get_peers(H( id ++ 0x06 ++ keyword ))[0]
part7 <- get_peers(H( id ++ 0x07 ++ keyword ))[0]
part8 <- get_peers(H( id ++ 0x08 ++ keyword ))[0]
part9 <- get_peers(H( id ++ 0x09 ++ keyword ))[0]
result_infohash <- id ++ part1 ++ part2 ++ ... ++ part9
print("search result:" ++ result_infohash)
}
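As a minimal sketch of the byte handling only (not of the DHT calls), the following Haskell models a hash as a list of bytes. The type Hash and the helpers split and reassemble are hypothetical names introduced for illustration; sub mirrors the function defined above.

```haskell
import Data.Word (Word8)

-- A hash modelled as a plain list of bytes (an assumption for illustration).
type Hash = [Word8]

-- sub hash i: the 2 bytes of hash starting at byte i.
sub :: Hash -> Int -> Hash
sub hash i = take 2 (drop i hash)

-- Split a 20-byte info_hash into the id followed by the nine 2-byte parts.
split :: Hash -> [Hash]
split infohash = [sub infohash i | i <- [0, 2 .. 18]]

-- Rebuild the info_hash from the id and the nine parts, in order.
reassemble :: [Hash] -> Hash
reassemble = concat
```

Each of the ten chunks fits in the 2-byte port field, which is why one torrent needs ten announce_peer calls under this scheme.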
I know there would be collisions with id (2 bytes only), but with relatively specific keywords it should work...
We could also build more specific keywords by concatenating several words in alphanumeric order. For example, if we have words A, B and C associated with a torrent, we could publish keywords A, B, C, A ++ B, A ++ C, B ++ C and A ++ B ++ C.
So, is this awful hack feasible :D ? I know that Retroshare is using BitTorrent's DHT.

It is unlikely to be practical because it does not even try to be efficient (number of lookups) or reliable (failure rate multiplied by number of lookups). And that is for a single keyword, not boolean queries which would blow up the lookup complexity even further.
Not to mention that it doesn't even solve the hard problems of distributed searching such as avoiding spam and censoring.
Additional problems are that each node could only publish one torrent under a keyword and it would require multiple nodes to somehow coordinate what they publish under which keyword before they run into the collision problem.
Of course you might be able to make it work in a handful of instances, but that is irrelevant, because uses of p2p protocols should be designed so that they still work in the case that all nodes use the feature in a similar fashion. Clearly a (m * n * 10)-fold [m = torrents per keyword, n = number of search terms] blowup of network traffic is not acceptable.
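To put rough numbers on this, here is a small Haskell sketch. The function names are hypothetical, and the reliability estimate assumes that lookups succeed independently with the same probability p, which is a simplification rather than a measured property of the DHT.

```haskell
-- Lookups needed for one search, using the m * n * 10 estimate:
-- m = torrents per keyword, n = number of search terms.
lookupsPerSearch :: Int -> Int -> Int
lookupsPerSearch m n = m * n * 10

-- Probability that all k lookups succeed, ASSUMING each lookup succeeds
-- independently with probability p.
allSucceed :: Double -> Int -> Double
allSucceed p k = p ^ k
```

For example, lookupsPerSearch 5 2 gives 100 lookups, and even with a 99% per-lookup success rate, allSucceed 0.99 100 is well below one half.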
If you are seriously interested in distributed keyword search, I recommend that you search Google Scholar and arXiv for existing research; it is a non-trivial topic.
For BitTorrent specifically you should also look beyond BEP 5. BEP 44 provides arbitrary data storage; BEPs 46, 49 and 51 describe additional building blocks and abstractions. But I would consider none of them sufficient for a real-time distributed multi-keyword search of the kind one would expect from a local database or an indexing website.

Related

My first haskell: best inline way to make a "natural language" listing of items? (like "1, 2, 3 and 4")

For my first line of Haskell I thought it'd be a nice case to produce a "natural listing" of items (of which the type supports show to get a string representation). By "natural listing" I mean summing up all items separated with , except the last one, which should read and lastitem. Ideally, I'd also like to not have a , before the "and".
To spice it up a bit (to show off the compactness of haskell), I wanted to have an "inline" solution, such that I can do
"My listing: " ++ ... mylist ... ++ ", that's our listing."
(Obviously for "production" making a function for that would be better in all ways, and allow for recursion naturally, but that's the whole point of my "inline" restriction for this exercise.)
For now I came up with:
main = do
    -- hello
    nicelist
nicelist = do
    let is = [1..10]
    putStrLn $ "My listing: " ++ concat [ a++b | (a,b) <- zip (map show is) (take (length is -1) $ repeat ", ") ++ [("and ", show $ last is)]] ++ ", that's our listing."
    let cs = ["red", "green", "blue", "yellow"]
    putStrLn $ "My listing: " ++ concat [ a++b | (a,b) <- zip (map show cs) (take (length cs -1) $ repeat ", ") ++ [("and ", show $ last cs)]] ++ ", that's our listing."
but this hardly seems optimal or elegant.
I'd love to hear your suggestions for a better solution.
EDIT:
Inspired by the comments and answer, I dropped the inline requirement and came up with the following, which seems pretty sleek. Would that be about as "haskellic" as we can get, or would there be improvements?
main = do
    putStrLn $ "My listing: " ++ myListing [1..10] ++ ", that's the list!"
    putStrLn $ "My listing: " ++ myListing ["red", "green", "blue", "yellow"] ++ ", that's the list!"
myListing :: (Show a) => [a] -> String
myListing [] = "<nothing to list>"
myListing [x] = "only " ++ (show x)
myListing [x, y] = (show x) ++ " and " ++ (show y)
myListing (h:t) = (show h) ++ ", " ++ myListing t

Here's how I would write it:
import Data.List
niceShow' :: [String] -> String
niceShow' [] = "<empty>"
niceShow' [a] = a
niceShow' [a, b] = a ++ " and " ++ b
niceShow' ls = intercalate ", " (init ls) ++ ", and " ++ last ls
niceShow :: [String] -> String
niceShow ls = "My listing: " ++ niceShow' ls ++ ", that's our listing."
niceList :: IO ()
niceList = do
    putStrLn $ niceShow $ show <$> [1..10]
    putStrLn $ niceShow ["red", "green", "blue", "yellow"]
Steps:
Create niceShow to create your string
Replace list comprehensions with good old function calls
Know about intercalate and init
Add type signatures to top levels
Format nicely
niceShow can only be inlined if you know the size of the list beforehand, otherwise, you'd be skipping the edge cases.

Another way to state the rules for punctuating a list (without an Oxford comma) is this:
Append a comma after every element except the last two
Append “and” after the second-to-last element
Leave the final element unchanged
This can be implemented by zipping the list with a “pattern” list containing the functions to perform the modifications, which repeats on one end. We want something like:
repeat (<> ",") <> [(<> " and"), id]
But of course this is just an infinite list of the comma function, so it will never get past the commas and on to the “and”. One solution is to reverse both the pattern list and the input list, and use zipWith ($) to combine them. But we can avoid the repeated reversals by using foldr to zip “in reverse” (actually, just right-associatively) from the tail end of the input. Then the result is simple:
punctuate :: [String] -> [String]
punctuate = zipBack
    $ [id, (<> " and")] <> repeat (<> ",")
zipBack :: [a -> b] -> [a] -> [b]
zipBack fs0 = fst . foldr
    (\ x (acc, f : fs) -> (f x : acc, fs))
    ([], fs0)
Example uses:
> test = putStrLn . unwords . punctuate . words
> test "this"
this
> test "this that"
this and that
> test "this that these"
this, that and these
> test "this that these those them"
this, that, these, those and them
There are several good ways to generalise this:
zipBack is partial—it assumes the function list is infinite, or at least as long as the string list; consider different ways you could make it total, e.g. by modifying fs0 or the lambda
The punctuation and conjunction can be made into parameters, so you could use e.g. semicolons and “or”
zipBack could work for more general types of lists, Foldable containers, and functions (i.e. zipBackWith)
String could be replaced with an arbitrary Semigroup or Monoid
There’s also a cute specialisation possible—if you want to add the option to include an Oxford comma, its presence in the “pattern” (function list) depends on the length of the final list, because it should not be included for lists of 2 elements. Now, if only we could refer to the eventual result of a computation while computing it…
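As a hedged sketch of the first two generalisations, here is one possible total zipBack together with a parameterised punctuateWith (a hypothetical name). The choice to drop leading elements when the function list runs out is an assumption, made by analogy with how zip truncates.

```haskell
-- Hypothetical generalisation: the separator and conjunction are parameters.
punctuateWith :: String -> String -> [String] -> [String]
punctuateWith sep conj =
    zipBack ([id, (<> (" " <> conj))] <> repeat (<> sep))

-- A total variant of zipBack: if the function list is exhausted before the
-- input, the remaining (leftmost) elements are dropped instead of crashing.
zipBack :: [a -> b] -> [a] -> [b]
zipBack fs0 = fst . foldr step ([], fs0)
  where
    step x (acc, f : fs) = (f x : acc, fs)
    step _ (acc, [])     = (acc, [])
```

With this, unwords (punctuateWith ";" "or" (words "a b c")) evaluates to "a; b or c", and punctuateWith "," "and" behaves like punctuate above.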

Histogram counting apostrophes as a word

I am to create a histogram which counts the top 20 most common words in a text, excluding the 20 most common words in English. Below are my code and the result I get:
import Data.List(sort, group, sortBy)
toWordList = words
countCommonWords wordList = length (filter isCommon wordList)
    where isCommon word = elem word commonWords
dropCommonWords wordList = filter isUncommon wordList
    where isUncommon w = notElem w commonWords
commonWords = ["the","and","have","not","as","be","a","I","on", "you","to","in","it","with","do","of","that","for","he","at"]
countWords wordList = map (\w -> (head w, length w)) $ group $ sort wordList
compareTuples (w1, n1) (w2, n2) = if n1 < n2 then LT else if n1> n2 then GT else EQ
sortWords wordList = reverse $ sortBy compareTuples wordList
toAsteriskBar x = (replicate (snd x) '*') ++ " -> " ++ (fst x) ++ "\n"
makeHistogram wordList = concat $ map toAsteriskBar (take 20 wordList)
--Do word list
text = "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only. there were a king with a large jaw and a queen with a plain face, on the throne of England; there were a king with a large jaw and a queen with a fair face, on the throne of France. In both countries it was clearer than crystal to the lords of the State preserves of loaves and fishes, that things in general were settled for ever of."
main = do
    let wordlist = toWordList text
    putStrLn "Report:"
    putStrLn ("\t" ++ (show $ length wordlist) ++ " words")
    putStrLn ("\t" ++ (show $ countCommonWords wordlist) ++ " common words")
    putStrLn "\nHistogram of the most frequent words (excluding common words):\n"
    putStr $ makeHistogram $ sortWords $ countWords $ dropCommonWords $ wordlist
Result:
Report:
186 words
71 common words
Histogram of the most frequent words (excluding common words):
************ -> was
***** -> were
**** -> we
** -> us,
** -> times,
** -> throne
** -> there
** -> season
** -> queen
** -> large
** -> king
** -> jaw
** -> its
** -> had
** -> going
** -> face,
** -> epoch
** -> direct
** -> before
** -> all
Does anybody know why the counter is counting any word with punctuation attached (e.g. the comma in "us,") as a whole word?

In Brief
toWordList = words
This is the function I'd modify to sanitize your words. For example, toWordList = map (filter isAlpha) . words so you get only those characters in words that are alphabetical instead of all blocks of characters that are divided by spaces (which is what words does). EDIT: isAlpha is from the Data.Char module which you'd need to import. Edited the above snippet to add map too.
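A minimal sketch of that change is below. Dropping empty strings is an addition of mine, not part of the suggestion above: a token made only of punctuation, such as "--", would otherwise become an empty word.

```haskell
import Data.Char (isAlpha)

-- Split on whitespace, keep only alphabetic characters in each token,
-- and discard tokens that end up empty (e.g. a lone "--").
toWordList :: String -> [String]
toWordList = filter (not . null) . map (filter isAlpha) . words
```

Note that this also removes punctuation inside a token, so "it's" becomes "its" and "way--in" becomes "wayin"; whether that is acceptable depends on the text.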
Epilog
Moving forward, I'm just going to make some code comments because why not.
import Data.List(sort, group, sortBy)
Yay, using pre-existing code. You will probably also want comparing from Data.Ord.
countCommonWords wordList = length (filter isCommon wordList)
    where isCommon word = elem word commonWords
dropCommonWords wordList = filter isUncommon wordList
    where isUncommon w = notElem w commonWords
These operations are O(n * m), where n is the length of wordList and m is the length of commonWords. You could make this faster by using a Set if you desire.
commonWords = ["the","and","have","not","as","be","a","I"
,"on","you","to","in","it","with","do","of","that"
,"for","he","at"]
countWords wordList = map (\w -> (head w, length w)) $ group $ sort wordList
A similar performance comment here. A common method is to use Data.Map.insertWith to keep a counter for each word.
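One way this could look, as a sketch assuming the containers package (which ships with GHC) is available:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Count occurrences by inserting each word with count 1 and adding to the
-- existing count when the word is already present.
countWords :: [String] -> [(String, Int)]
countWords = Map.toList . foldl' bump Map.empty
  where
    bump counts w = Map.insertWith (+) w 1 counts
```

Map.toList returns the pairs in ascending key order, so the result has the same shape as the sort/group version.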
compareTuples (w1, n1) (w2, n2) = if n1 < n2 then LT else if n1> n2 then GT else EQ
This is more easily spelled compareTuples = comparing snd, since the comparison is on the count (the second component of each tuple).
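Going one step further, the reverse in sortWords can be avoided by sorting on the count in descending order. This is a sketch rather than a drop-in replacement: sortBy is stable, so words with equal counts keep their incoming order instead of being reversed.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Sort (word, count) pairs by count, largest first.
sortWords :: [(String, Int)] -> [(String, Int)]
sortWords = sortBy (comparing (Down . snd))
```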

Datatypes that are represented by a choice of one of many other datatypes

I am attempting to generate a deck for a toy implementation of Exploding Kittens.
Say I had the following types:
data ActionCard = Skip
                | Attack
                | Shuffle
                | Nope
                | Future
                | Defuse
                | Favor
                deriving Enum
data BasicCard = TacoCat
               | MommaCat
               | Catermelon
               | BearCat
               | PotatoCat
               | BikiniCat
               | RainboRalphingCat
               deriving Enum
data Card = ActionCard | BasicCard | BombCard
type Deck = [Card]
and a deck generator function like so:
generateDeck :: Int -> Deck
generateDeck players = (concat (map (replicate 5) [TacoCat ..]))
    ++ (replicate 2 Nope)
    ++ (replicate 4 Skip)
    ++ (replicate 4 Attack)
    ++ (replicate 4 Shuffle)
    ++ (replicate 4 Future)
    ++ (replicate 1 Defuse)
    ++ (replicate 4 Favor)
    ++ (replicate (players + 1) BombCard)
This fails with:
Couldn't match expected type ‘[BasicCard]’
with actual type ‘a7 -> [a7]’
Probable cause: ‘replicate’ is applied to too few arguments
In the first argument of ‘(+)’, namely
‘replicate (length $ _players state)’
In the second argument of ‘(++)’, namely
‘(replicate (length $ _players state) + 1 BombCard)’
(and similar errors for the other non-basic cards)
That makes sense on one level, as (concat (map (replicate 5) [TacoCat ..])) returns a [BasicCard]; however, I would have expected the function signature to force a more generic type?
How do I allow for Card to be either an ActionCard, a BasicCard, or a BombCard?

data Card = ActionCard | BasicCard | BombCard
This creates a new datatype Card with three constructors called ActionCard, BasicCard and BombCard. This has nothing to do with the other two datatypes that are called ActionCard or BasicCard; the namespace of types and constructors is distinct.
What you want to do is to define Card as either being an Action comprising an ActionCard, or a Basic BasicCard, or a BombCard:
data Card = Action ActionCard | Basic BasicCard | BombCard
then you can make your Deck by wrapping each card type in its correct constructor:
generateDeck :: Int -> Deck
generateDeck players = basics ++ actions ++ bombs
  where
    basics = concatMap (replicate 5 . Basic) [TacoCat ..]
    actions = map Action . concat $
        [ replicate 2 Nope
        , replicate 4 Skip
        , replicate 4 Attack
        , replicate 4 Shuffle
        , replicate 4 Future
        , replicate 1 Defuse
        , replicate 4 Favor
        ]
    bombs = replicate (players + 1) BombCard
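To show how the wrapped type is consumed, here is a small self-contained sketch that pattern matches on the new constructors. The helper functions isAction and countBombs are hypothetical, and the extra Show and Eq instances are added only for convenience.

```haskell
data ActionCard = Skip | Attack | Shuffle | Nope | Future | Defuse | Favor
    deriving (Show, Eq, Enum)

data BasicCard = TacoCat | MommaCat | Catermelon | BearCat
               | PotatoCat | BikiniCat | RainboRalphingCat
    deriving (Show, Eq, Enum)

-- Each constructor wraps the card type it stands for; BombCard carries nothing.
data Card = Action ActionCard | Basic BasicCard | BombCard
    deriving (Show, Eq)

-- True only for cards built with the Action constructor.
isAction :: Card -> Bool
isAction (Action _) = True
isAction _          = False

-- Count the bombs in a deck by matching on the BombCard constructor.
countBombs :: [Card] -> Int
countBombs deck = length [() | BombCard <- deck]
```

Because Action and Basic are ordinary functions (ActionCard -> Card and BasicCard -> Card), they can be passed to map, which is what generateDeck relies on.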

mapping multiple functions in haskell

I'm working on a way of representing memory in Haskell that looks like this...
data MemVal = Stored Value | Unbound
    deriving Show
type Memory = ([Ide],Ide -> MemVal)
As an identifier is called, it's added to the list of identifiers. If an error occurs in the program, I want to be able to recall the identifiers used up to that point. So far I have this...
display :: Memory -> String
display m = "Memory = " ++ show (map (snd m) (fst m)) ++ " "
But was wondering if there were a way to map the name of the identifier to (fst m) as well as the function (snd m) so the output will be similar to...
Memory = [sum = stored Numeric 1, x = stored Boolean true]
Thank you.

You probably want something like this
display :: Memory -> String
display (ides, mem) =
    "Memory = [" ++ unwords (map (\x -> x ++ "=" ++ show (mem x)) ides) ++ "]"

I'm guessing this is what you are after:
import Data.List (intercalate)
display (ids, f) = "Memory = [" ++ (intercalate ", " assigns) ++ "]"
    where assigns = [ show i ++ " = " ++ show (f i) | i <- ids ]
Here assigns is a list like:
[ "sum = stored Numeric 1", "x = stored Boolean true", ...]
and intercalate ", " assigns joins the strings together.
I've used destructuring to avoid having to refer to fst ... and snd ...
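For a version that compiles on its own, the sketch below fills in the types the question does not show. Ide = String and the Value constructors are assumptions made purely for illustration, and identifiers are printed without show to avoid quoting them.

```haskell
import Data.List (intercalate)

type Ide = String                        -- assumption: identifiers are strings
data Value = Numeric Int | Boolean Bool  -- assumption: not given in the question
    deriving Show

data MemVal = Stored Value | Unbound
    deriving Show

type Memory = ([Ide], Ide -> MemVal)

-- Pair each recorded identifier with the value the lookup function gives it.
display :: Memory -> String
display (ids, f) = "Memory = [" ++ intercalate ", " assigns ++ "]"
  where
    assigns = [i ++ " = " ++ show (f i) | i <- ids]
```

With the derived Show instances, values print as Stored (Numeric 1) rather than stored Numeric 1; getting the exact format from the question would need hand-written Show instances.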

Is there a better way of doing this in Haskell?

I have written the following to assist my grandkids with their home-schooling work and to keep my mind working by learning how to program (I thought Haskell sounded awesome).
main :: IO ()
main = do
    putStrLn "Please enter the dividend :"
    inputx <- getLine
    putStrLn "Please enter the divisor :"
    inputy <- getLine
    let x = (read inputx) :: Int
    let y = (read inputy) :: Int
    let z = x `div` y
    let remain = x `mod` y
    putStrLn ( "Result: " ++ show x ++ " / " ++ show y ++ " = " ++ show z ++ " remainder " ++ show remain )
    putStrLn ( "Proof: (" ++ show y ++ " x " ++ show z ++ ") = " ++ show (y * z) ++ " + " ++ show remain ++ " = " ++ show ((y * z) + remain))
    putStrLn ( "Is this what you had? ")
Is there a neater/nicer/better/more compact way of doing this?

It would benefit from a key principle: separate your pure code from your IO as much as possible. This will let your programs scale up and keep main brief. Lots of let in a big main isn't a very functional approach and tends to get much messier as your code grows.
Using a type signature and readLn which is essentially fmap read getLine helps cut down some cruft. (If you're not familiar with fmap, visit the question How do functors work in haskell?. fmap is a very flexible tool indeed.)
getInts :: IO (Int, Int)
getInts = do
    putStrLn "Please enter the dividend :"
    x <- readLn
    putStrLn "Please enter the divisor :"
    y <- readLn
    return (x,y)
Now the processing. If I were doing more with this kind of data, or more frequently, I'd be using a record type to store the dividend, divisor, quotient and remainder, so bear that in mind for the future, but it's an overkill here.
I'm hackishly returning a list rather than a tuple, so I can use map to show them all:
sums :: (Int, Int) -> [Int]
sums (x,y) = [x, y, q, r, y * q, y * q + r] where
    q = x `div` y
    r = x `mod` y
The final piece of the jigsaw is the output. Again I prefer to generate this outside IO and then I can just mapM_ putStrLn on it later to print each line. I'd prefer this to take the record type, but I'm tolerating a list of strings as input instead since I'm assuming I've already shown them all.
explain :: [String] -> [String]
explain [x,y,q,r,yq,yq_r] =
    [ concat ["Result: ", x, " / ", y, " = ", q, " remainder ", r]
    , concat ["Proof: (", y, " x ", q, ") + ", r, " = ", yq, " + ", r, " = ", yq_r]
    , "Is this what you had? "]
Now we can write main as
main = do (x,y) <- getInts
          let ns = map show ( sums (x,y) )
              es = explain ns
          mapM_ putStrLn es
or even more succinctly, by piping together the functions explain . map show . sums, and applying that to the output of getInts using fmap:
main :: IO ()
main = fmap (explain . map show . sums) getInts
    >>= mapM_ putStrLn
You might notice that I added a + r in the proof to make = always mean =, which is the correct mathematical usage, and mirrors Haskell's meaning for =.
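For comparison, the record type mentioned earlier as an alternative to the list might look like the sketch below. The names Division, divide and explainDivision are hypothetical, and returning Nothing for a zero divisor is an extra safety choice that the code above does not make.

```haskell
-- One record holding everything the report needs.
data Division = Division
    { dividend, divisor, quotient, remainder :: Int }
    deriving (Show, Eq)

-- Pure calculation; a zero divisor yields Nothing instead of a runtime error.
divide :: Int -> Int -> Maybe Division
divide _ 0 = Nothing
divide x y = Just (Division x y q r)
  where
    (q, r) = x `divMod` y

-- Build the report lines from the record rather than from a list of strings.
explainDivision :: Division -> [String]
explainDivision (Division x y q r) =
    [ concat ["Result: ", show x, " / ", show y, " = ", show q, " remainder ", show r]
    , concat ["Proof: (", show y, " x ", show q, ") + ", show r, " = ", show (y * q + r)]
    ]
```

The named fields remove the need to remember the order of the six list elements, and the pattern match in explainDivision can no longer fail on a list of the wrong length.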
