How to search Lua table values - search

I have a project that calls for a relational-database-like structure in an environment where an actual database isn't possible. The language is restricted to Lua, which is far from my strongest language. I've got a table of tables with a structure like this:
table={
m:r={
x=1
y=1
displayName="Red"
}
m:y={
x=1
y=2
displayName="Yellow"
}
}
Building, storing and retrieving the table is straightforward enough. Where I'm running into issues is searching it. For the sake of clarity, if I could use SQL I'd do this:
SELECT * FROM table WHERE displayName="Red"
Is there a Lua function that will let me search this way?

The straightforward way is to iterate through all elements and find one that matches your criteria:
local t = {
  r = {
    x = 1,
    y = 1,
    displayName = "Red",
  },
  y = {
    x = 1,
    y = 2,
    displayName = "Yellow",
  },
}
for key, value in pairs(t) do
  if value.displayName == 'Red' then
    print(key)
  end
end
This should print 'r'.
This may be quite slow on large tables. To speed up this process, you may keep track of the references in a hash that will provide much faster access. Something like this may work:
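local cache = {}
local function findValue(key)
  if cache[key] == nil then
    local value
    -- do a linear search iterating through table elements searching for 'key'
    -- (here: the displayName of the entries in t from the example above)
    for _, entry in pairs(t) do
      if entry.displayName == key then
        value = entry
        break
      end
    end
    -- store the result if found
    cache[key] = value
  end
  return cache[key]
end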
If the elements in the table change their values, you'll need to invalidate the cache when the values are updated or removed.
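As a rough illustration of the caching idea and of the invalidation step, here is a minimal sketch in Python rather than Lua. The names records, find_by_display_name and update_display_name are made up for this example; the point is only that every write has to drop the cache entries it could make stale.

```python
# Hypothetical data mirroring the table in the question.
records = {
    "r": {"x": 1, "y": 1, "displayName": "Red"},
    "y": {"x": 1, "y": 2, "displayName": "Yellow"},
}

_cache = {}  # displayName -> key of the first matching record


def find_by_display_name(name):
    """Return the key of a record with this displayName, or None."""
    if name not in _cache:
        # Linear search, done only on a cache miss.
        for key, record in records.items():
            if record["displayName"] == name:
                _cache[name] = key
                break
    return _cache.get(name)


def update_display_name(key, new_name):
    """Change a record and drop the cache entries it could make stale."""
    old_name = records[key]["displayName"]
    records[key]["displayName"] = new_name
    _cache.pop(old_name, None)
    _cache.pop(new_name, None)


print(find_by_display_name("Red"))      # r
update_display_name("r", "Crimson")
print(find_by_display_name("Red"))      # None
print(find_by_display_name("Crimson"))  # r
```

Misses are deliberately not cached in this sketch, so a record added later is still found by the next search.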

There are no built-in functions for searching tables. There are many ways to go about it which vary in complexity and efficiency.
local t = {
r={displayname="Red", name="Ruby", age=15, x=4, y=10},
y={displayname="Blue", name="Trey", age=22, x=3, y=2},
t={displayname="Red", name="Jack", age=20, x=2, y=3},
h={displayname="Red", name="Tim", age=25, x=2, y=33},
v={displayname="Blue", name="Bonny", age=10, x=2, y=0}
}
Programming in Lua recommends building a reverse table for efficient lookups.
local revDisplayName = {}
for k, v in pairs(t) do
  if revDisplayName[v.displayname] then
    table.insert(revDisplayName[v.displayname], k)
  else
    -- first row with this display name: start a new list of keys
    revDisplayName[v.displayname] = {k}
  end
end
Then you can match display names easily:
for _, rowname in pairs(revDisplayName["Red"]) do
  print(t[rowname].x, t[rowname].y)
end
There is code for creating SQL-like queries in Lua, on Lua tables, in Beginning Lua Programming if you want to build complex queries.
If you just want to search through a few records for matches, you can abstract the searching using an iterator in Lua
function allmatching(tbl, kvs)
  return function(t, key)
    local row -- keep row local instead of leaking a global
    repeat
      key, row = next(t, key)
      if key == nil then
        return
      end
      for k, v in pairs(kvs) do
        if row[k] ~= v then
          row = nil
          break
        end
      end
    until row ~= nil
    return key, row
  end, tbl, nil
end
which you can use like so:
for k, row in allmatching(t, {displayname="Red", x=2}) do
print(k, row.name, row.x, row.y)
end
which prints the following (the order may differ, because next does not guarantee a traversal order):
h Tim 2 33
t Jack 2 3

Related

Why is my merge sort algorithm not working?

I am implementing the merge sort algorithm in Python. I previously implemented the same algorithm in C, where it works fine, but my Python implementation outputs an unsorted array.
I've already rechecked the algorithm and code, but to my knowledge the code seems to be correct.
I think the issue is related to the scope of variables in Python, but I don't have any clue for how to solve it.
from random import shuffle
# Function to merge the arrays
def merge(a,beg,mid,end):
    i = beg
    j = mid+1
    temp = []
    while(i<=mid and j<=end):
        if(a[i]<a[j]):
            temp.append(a[i])
            i += 1
        else:
            temp.append(a[j])
            j += 1
    if(i>mid):
        while(j<=end):
            temp.append(a[j])
            j += 1
    elif(j>end):
        while(i<=mid):
            temp.append(a[i])
            i += 1
    return temp
# Function to divide the arrays recursively
def merge_sort(a,beg,end):
    if(beg<end):
        mid = int((beg+end)/2)
        merge_sort(a,beg,mid)
        merge_sort(a,mid+1,end)
        a = merge(a,beg,mid,end)
    return a
a = [i for i in range(10)]
shuffle(a)
n = len(a)
a = merge_sort(a, 0, n-1)
print(a)
To make it work, you need to change the body of merge_sort slightly:
def merge_sort(a,beg,end):
    if(beg<end):
        mid = int((beg+end)/2)
        merge_sort(a,beg,mid)
        merge_sort(a,mid+1,end)
        a[beg:end+1] = merge(a,beg,mid,end) # < this line changed
    return a
Why:
temp holds only the end-beg+1 merged elements, but a is the full original list. If you managed to replace all of a with temp, the list would be corrupted quickly. Therefore we take a "slice" of a and replace only the values in that slice.
Why not:
Luckily, your a was not getting replaced, because of how Python handles names. That is a bit tricky to explain, but I'll try.
Every variable in Python is a reference. a is a reference to a list whose elements a[i] are in turn references to immutable values in memory.
When you pass a to a function, the function gets a new local variable a that points to the same list. That means that when you reassign it as a = ..., you only change where the local a points. You can only pass changes to the outside via "slices" (or item assignment) or via a return statement.
Why "slices" work:
Slices are tricky. As I said, a points to a list of other references (basically a[i]), which in turn refer to immutable data in memory. When you assign to a slice, Python goes through the slice element by element and changes where those individual references point. Because a inside the function and a outside it still point to the same list, the changes are visible to the caller.
Hope it makes sense.
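The difference between rebinding a name and assigning to a slice can be seen in isolation. This is a small sketch; the function names rebind and assign_slice are made up for the demonstration and are not part of the code in the question.

```python
def rebind(a):
    # Rebinds the local name only; the caller's list is untouched.
    a = sorted(a)


def assign_slice(a):
    # Writes into the list object that the caller also refers to.
    a[:] = sorted(a)


data = [3, 1, 2]
rebind(data)
print(data)  # [3, 1, 2]
assign_slice(data)
print(data)  # [1, 2, 3]
```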
You don't use the results of the recursive merges, so you essentially report the result of the merge of the two unsorted halves.
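One way to act on this is to have each recursive call return its sorted half and to merge those returned lists. The following is only a sketch of that variant: it builds new lists instead of sorting in place, and the names merge_sorted and merge_sort_copy are hypothetical, not taken from the question.

```python
def merge_sorted(left, right):
    """Merge two already sorted lists into a new sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these two tails is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort_copy(a):
    """Return a sorted copy of a; the input list is not modified."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    # The results of both recursive calls are kept and merged.
    return merge_sorted(merge_sort_copy(a[:mid]), merge_sort_copy(a[mid:]))


print(merge_sort_copy([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```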

Scala string manipulation

I have the following Scala code :
val res = for {
i <- 0 to 3
j <- 0 to 3
if (similarity(z(i),z(j)) < threshold) && (i<=j)
} yield z(j)
z here represents Array[String] and similarity(z(i),z(j)) calculates similarity between two strings.
The similarity is calculated between the 1st string and all the other strings, then between the 2nd string and all the other strings except the 1st, then for the 3rd string, and so on.
My requirement is that if the 1st string matches the 3rd, 4th and 8th strings, then those 3 strings shouldn't participate in any further loops, and the loop should jump to the 2nd string, then the 5th string, the 6th string, and so on.
I am stuck at this step and don't know how to proceed further.
I am presuming that your intent is to keep the first String of two similar Strings (eg. if 1st String is too similar to 3rd, 4th, and 8th Strings, keep only the 1st String [out of these similar strings]).
I have a couple of ways to do this. They both work, in a sense, in reverse: for each String, if it is too similar to any later Strings, then that current String is filtered out (not the later Strings). If you first reverse the input data before applying this process, you will find that the desired outcome is produced (although in the first solution below the resulting list is itself reversed - so you can just reverse it again, if order is important):
1st way (likely easier to understand):
def filterStrings(z: Array[String]) = {
  val revz = z.reverse
  val filtered = for {
    // "until" excludes revz.length, so revz(i) never goes out of bounds
    i <- 0 until revz.length if !revz.drop(i+1).exists(zz => similarity(zz, revz(i)) < threshold)
  } yield revz(i)
  filtered.reverse // re-reverses output if order is important
}
The 'drop' call is to ensure that each String is only checked against later Strings.
2nd option (fully functional, but harder to follow):
val filtered = z.reverse.foldLeft((List.empty[String],z.reverse)) { case ((acc, zt), zz) =>
(if (zt.tail.exists(tt => similarity(tt, zz) < threshold)) acc else zz :: acc, zt.tail)
}._1
I'll try to explain what is going on here (in case you - or any readers - aren't used to following folds):
This uses a fold over the reversed input data, starting from an empty List of Strings (to accumulate results) and the (reverse of the) remaining input data (to compare against - I labeled it zt for "z-tail").
The fold then cycles through the data, checking each entry against the tail of the remaining data (so it doesn't get compared to itself or any earlier entry)
If there is a match, just the existing accumulator (labelled acc) will be allowed through, otherwise, add the current entry (zz) to the accumulator. This updated accumulator is paired with the tail of the "remaining" Strings (zt.tail), to ensure a reducing set to compare against.
Finally, we end up with a pair of lists: the required remaining Strings, and an empty list (no Strings left to compare against), so we take the first of these as our result.
If I understand correctly, you want to loop through the elements of the array, comparing each element to later elements, and removing ones that are too similar as you go.
You can't (easily) do this within a simple loop. You'd need to keep track of which items had been filtered out, which would require another array of booleans, which you update and test against as you go. It's not a bad approach and is efficient, but it's not pretty or functional.
So you need to use a recursive function, and this kind of thing is best done using an immutable data structure, so let's stick to List.
def removeSimilar(xs: List[String]): List[String] = xs match {
  case Nil => Nil
  // a similarity below the threshold means "too similar", so those entries are dropped
  case y :: ys => y :: removeSimilar(ys filterNot {x => similarity(y, x) < threshold})
}
It's a simple-recursive function. Not much to explain: if xs is empty, it returns the empty list, else it adds the head of the list to the function applied to the filtered tail.
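For comparison, the loop-based approach mentioned earlier in this answer, with a separate list of booleans that records which items have been filtered out, could look roughly like this in Python. It is a sketch only: similarity and threshold are passed in as parameters, and length_gap is a stand-in metric invented for the demo, since the real similarity function is not shown in the question. As in the question, a value below the threshold counts as a match.

```python
def remove_similar(items, similarity, threshold):
    """Keep each item unless an earlier kept item matched it."""
    removed = [False] * len(items)
    kept = []
    for i, item in enumerate(items):
        if removed[i]:
            continue
        kept.append(item)
        # Flag every later item that matches the one just kept.
        for j in range(i + 1, len(items)):
            if not removed[j] and similarity(item, items[j]) < threshold:
                removed[j] = True
    return kept


# Stand-in metric for the demo: difference in string length.
def length_gap(a, b):
    return abs(len(a) - len(b))


print(remove_similar(["aa", "bbbbb", "cc", "dddddd", "e"], length_gap, 2))
# ['aa', 'bbbbb']
```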

Most efficient interval type search in Elixir

I am starting my journey with Elixir and am looking for some advice on how best to approach a particular problem.
I have a data set that needs to be searched as quickly as possible. The data consists of two numbers that form an enclosed band and some meta data associated with each band.
For example:
From,To,Data
10000,10999,MetaData1
11000,11999,MetaData2
12000,12499,MetaData3
12500,12999,MetaData4
This data set could have upwards of 100,000 entries.
I have a struct defined that models the data, along with a parser that creates an Elixir list in-memory representation.
defmodule Band do
defstruct from: 0, to: 0, metadata: 0
end
The parser returns a list of the Band struct. I defined a find method that uses a list comprehension
defp find_metadata(bands, number) do
  match? = fn(x) -> x.from <= number and x.to >= number end
  [match | _ ] = for band <- bands, match?.(band), do: band
  { :find, match }
end
Based on my newbie knowledge, using the list comprehension will require a full traversal of the list. To avoid scanning the full list, I have used search trees in other languages.
Is there an algorithm/mechanism/approach available in Elixir that would be a more efficient approach for this type of search problem?
Thank you.
If the bands are mutually exclusive you could structure them into a tree sorted by from. Searching through that tree should take O(log n) time as long as the tree is reasonably balanced. Something like the following should work:
defmodule Tree do
  defstruct left: nil, right: nil, key: nil, value: nil

  def empty do
    nil
  end

  # value is a {from, to} tuple; the tree is keyed on from
  def insert(tree, value = {key, _}) do
    cond do
      tree == nil -> %Tree{left: empty(), right: empty(), key: key, value: value}
      key < tree.key -> %{tree | left: insert(tree.left, value)}
      true -> %{tree | right: insert(tree.right, value)}
    end
  end

  def find_interval(tree, value) do
    cond do
      tree == nil -> nil
      value < tree.key -> find_interval(tree.left, value)
      between(tree.value, value) -> tree.value
      true -> find_interval(tree.right, value)
    end
  end

  def between({left, right}, value) do
    value >= left and value <= right
  end
end
Note that you can also use Ranges to store the "bands" as you call them. Also note that the tree isn't balanced. A simple scheme to (probably) achieve a balanced tree is to shuffle the intervals before inserting them. Otherwise you'd need a more complex implementation that balances the tree. You can look at Erlang's gb_trees for inspiration.
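If the bands never change after loading, a related option is a binary search over a list sorted by from, which avoids the balancing question altogether. The sketch below is in Python, not Elixir, and uses the standard bisect module; it assumes, like the tree above, that the bands do not overlap. The sample rows are the ones from the question, and the list layout is an assumption made for the example.

```python
from bisect import bisect_right

# Bands from the question, sorted by their lower bound.
bands = [
    (10000, 10999, "MetaData1"),
    (11000, 11999, "MetaData2"),
    (12000, 12499, "MetaData3"),
    (12500, 12999, "MetaData4"),
]
starts = [band[0] for band in bands]


def find_metadata(number):
    """Return the metadata of the band containing number, or None."""
    # Index of the last band whose lower bound is <= number.
    i = bisect_right(starts, number) - 1
    if i >= 0 and bands[i][0] <= number <= bands[i][1]:
        return bands[i][2]
    return None


print(find_metadata(12345))  # MetaData3
print(find_metadata(9999))   # None
```

Each lookup inspects O(log n) entries, at the cost of sorting the list once up front.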

Ordered iteration in map string string

In the Go blog, this is how to print the map in order.
http://blog.golang.org/go-maps-in-action
import "sort"
var m map[int]string
var keys []int
for k := range m {
	keys = append(keys, k)
}
sort.Ints(keys)
for _, k := range keys {
	fmt.Println("Key:", k, "Value:", m[k])
}
But what if I have string keys, as in var m map[string]string?
I can't figure out how to print the strings in order (not sorted, but in the order in which they were added to the map).
The example is at my playground http://play.golang.org/p/Tt_CyATTA3
As you can see, it keeps printing the strings in a jumbled order, so I tried mapping integer values to the map[string]string, but I still could not figure out how to map each element of the map[string]string.
http://play.golang.org/p/WsluZ3o4qd
Well, the blog mentions that iteration order is randomized:
"...When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next"
The solution is kind of trivial, you have a separate slice with the keys ordered as you need:
"...If you require a stable iteration order you must maintain a separate data structure that specifies that order."
So, to make it work as you expect, create an extra slice with the correct order, then iterate over that slice and print the results in that order.
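var order = []string{"i", "we", "he" /* , .... */}
func String(result map[string]string) string {
	var sb strings.Builder // needs the "strings" package
	for _, k := range order {
		// if present in result, print it ("key: value" is just an illustrative format)
		if v, ok := result[k]; ok {
			sb.WriteString(k + ": " + v + "\n")
		}
	}
	// ... print all the Non-Defined at the end
	return sb.String()
}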
See it running here: http://play.golang.org/p/GsDLXjJ0-E

whats another way to write python3 zip [closed]

Closed 10 years ago.
I've been working on code that reads the lines of a file and then organizes them. However, I got stuck at one point and my friend told me what I could use. The code works, but I don't understand what he is doing on lines 7 and 8 from the bottom. I used #### so you know which lines they are.
So, essentially, how can you rewrite those 2 lines of code, and why do they work? I don't seem to understand dictionaries.
from sys import argv
filename = input("Please enter the name of a file: ")
file_in=(open(filename, "r"))
print("Number of times each animal visited each station:")
print("Animal Id Station 1 Station 2")
animaldictionary = dict()
for line in file_in:
if '\n' == line[-1]:
line = line[:-1]
(a, b, c) = line.split(':')
ac = (a,c)
if ac not in animaldictionary:
animaldictionary[ac] = 0
animaldictionary[ac] += 1
alla = []
for key, value in animaldictionary:
if key not in alla:
alla.append(key)
print ("alla:",alla)
allc = []
for key, value in animaldictionary:
if value not in allc:
allc.append(value)
print("allc", allc)
for a in sorted(alla):
print('%9s'%a,end=' '*13)
for c in sorted(allc):
ac = (a,c)
valc = 0
if ac in animaldictionary:
valc = animaldictionary[ac]
print('%4d'%valc,end=' '*19)
print()
print("="*60)
print("Animals that visited both stations at least 3 times: ")
for a in sorted(alla):
x = 'false'
for c in sorted(allc):
ac = (a,c)
count = 0
if ac in animaldictionary:
count = animaldictionary[ac]
if count >= 3:
x = 'true'
if x is 'true':
print('%6s'%a, end=' ')
print("")
print("="*60)
print("Average of the number visits in each month for each station:")
#(alla, allc) =
#for s in zip(*animaldictionary.keys()):
# (alla,allc).append(s)
#print(alla, allc)
(alla,allc,) = (set(s) for s in zip(*animaldictionary.keys())) ##### how else can you write this
##### how else can you rewrite the next code
print('\n'.join(['\t'.join((c,str(sum(animaldictionary.get(ac,0) for a in alla for ac in ((a,c,),))//12)))for c in sorted(allc)]))
print("="*60)
print("Month with the maximum number of visits for each station:")
print("Station Month Number")
print("1")
print("2")
The two lines you indicated are indeed rather confusing. I'll try to explain them as best I can, and suggest alternative implementations.
The first one computes values for alla and allc:
(alla,allc,) = (set(s) for s in zip(*animaldictionary.keys()))
This is nearly equivalent to the loops you've already done above to build your alla and allc lists. You can skip it completely if you want. However, let's unpack what it's doing, so you can actually understand it.
The innermost part is animaldictionary.keys(). This returns an iterable object that contains all the keys of your dictionary. Since the keys in animaldictionary are two-valued tuples, that's what you'll get from the iterable. It's actually not necessary to call keys when dealing with a dictionary in most cases, since operations on the keys view are usually identical to doing the same operation on the dictionary directly.
Moving on, the keys get wrapped up by a call to the zip function using zip(*keys). There are two things happening here. First, the * syntax unpacks the iterable from above into separate arguments. So if animaldictionary's keys were ("a1", "c1"), ("a2", "c2"), ("a3", "c3"), this would call zip with those three tuples as separate arguments. Now, what zip does is turn several iterable arguments into a single iterable, yielding a tuple with the first value from each, then a tuple with the second value from each, and so on. So zip(("a1", "c1"), ("a2", "c2"), ("a3", "c3")) would return an iterator yielding ("a1", "a2", "a3") followed by ("c1", "c2", "c3").
The next part is a generator expression that passes each value from the zip expression into the set constructor. This serves to eliminate any duplicates. set instances can also be useful in other ways (e.g. finding intersections) but that's not needed here.
Finally, the two sets of a and c values get assigned to variables alla and allc. They replace the lists you already had with those names (and the same contents!).
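The steps described above can be tried on a small dictionary. The keys and counts below are invented for the illustration; only the shape (two-element tuple keys) matches the dictionary in the question.

```python
# Hypothetical contents with two-element tuple keys.
animaldictionary = {("a1", "c1"): 2, ("a2", "c1"): 5, ("a1", "c2"): 1}

# zip(*keys) regroups the tuples by position.
columns = list(zip(*animaldictionary.keys()))
print(columns)  # [('a1', 'a2', 'a1'), ('c1', 'c1', 'c2')]

# Wrapping each column in set() removes the duplicates.
alla, allc = (set(column) for column in columns)
print(sorted(alla))  # ['a1', 'a2']
print(sorted(allc))  # ['c1', 'c2']
```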
You've already got an alternative to this, where you calculate alla and allc as lists. Using sets may be slightly more efficient, but it probably doesn't matter too much for small amounts of data. Another, clearer way to do it would be:
alla = set()
allc = set()
for key in animaldictionary: # note, iterating over a dict yields the keys!
    a, c = key # unpack the tuple key
    alla.add(a)
    allc.add(c)
The second line you were asking about does some averaging and combines the results into a giant string which it prints out. It is really bad programming style to cram so much into one line. And in fact, it does some needless stuff which makes it even more confusing. Here it is, with a couple of line breaks added to make it all fit on the screen at once.
print('\n'.join(['\t'.join((c,str(sum(animaldictionary.get(ac,0)
for a in alla for ac in ((a,c,),))//12)
)) for c in sorted(allc)]))
The innermost piece of this is for ac in ((a,c,),). This is silly, since it's a loop over a 1-element tuple. It's a way of renaming the tuple (a,c) to ac, but it is very confusing and unnecessary.
If we replace the one use of ac with the tuple explicitly written out, the new innermost piece is animaldictionary.get((a,c),0). This is a special way of writing animaldictionary[(a, c)] but without running the risk of causing a KeyError to be raised if (a, c) is not in the dictionary. Instead, the default value of 0 (passed in to get) will be returned for nonexistent keys.
That get call is wrapped up in this: (getcall for a in alla). This is a generator expression that gets all the values from the dictionary with a given c value in the key (with a default of zero if the value is not present).
The next step is taking the average of the values in the previous generator expression: sum(genexp)//12. This is pretty straightforward, though you should note that using // for division always rounds down to the next integer. If you want a more precise floating point value, use just /.
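A quick check of the two operators, using an arbitrary total of 30 as the example value:

```python
total = 30
print(total // 12)  # 2
print(total / 12)   # 2.5
print(-30 // 12)    # -3, because floor division rounds toward negative infinity
```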
The next part is a call to '\t'.join, with an argument that is a single (c, avg) tuple. This is an awkward construction that could be more clearly written as c+"\t"+str(avg) or "{}\t{}".format(c, avg). All of these result in a string containing the c value, a tab character and the string form of the average calculated above.
The next step is a list comprehension, [joinedstr for c in sorted(allc)] (where joinedstr is the join call in the previous step). Using a list comprehension here is a bit odd, since there's no need for a list (a generator expression would do just as well).
Finally, the list comprehension is joined with newlines and printed: print("\n".join(listcomp)). This is straightforward.
Anyway, this whole mess can be rewritten in a much clearer way, by using a few variables and printing each line separately in a loop:
for c in sorted(allc):
    total_values = sum(animaldictionary.get((a,c),0) for a in alla)
    average = total_values // 12
    print("{}\t{}".format(c, average))
To finish, I have some general suggestions.
First, your data structure may not be optimal for the uses you are making of your data. Rather than having animaldictionary be a dictionary with (a,c) keys, it might make more sense to have a nested structure, where you index each level separately. That is, animaldictionary[a][c]. It might even make sense to have a second dictionary containing the same values indexed in the reverse order (e.g. one is indexed [a][c] while another is indexed [c][a]). With this approach you might not need the alla and allc lists for iterating (you'd just loop over the contents of the main dictionary directly).
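A nested layout along those lines might be sketched as follows. The input lines are invented, in the a:b:c shape that the script splits on, and the names counts and counts_reversed are placeholders; collections.defaultdict is used so that missing levels are created on first access.

```python
from collections import defaultdict

# Hypothetical input lines in the "a:b:c" shape the script splits on.
lines = ["a1:x:c1", "a1:x:c1", "a2:x:c1", "a1:x:c2"]

counts = defaultdict(lambda: defaultdict(int))           # counts[a][c]
counts_reversed = defaultdict(lambda: defaultdict(int))  # counts_reversed[c][a]

for line in lines:
    a, _, c = line.split(":")
    counts[a][c] += 1
    counts_reversed[c][a] += 1

for a in sorted(counts):
    print(a, dict(counts[a]))
# a1 {'c1': 2, 'c2': 1}
# a2 {'c1': 1}

# Total for one c value, without building alla or allc first.
print(sum(counts_reversed["c1"].values()))  # 3
```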
My second suggestion is about code style. Many of your variables are named poorly, either because their names don't have any meaning (e.g. c) or where the names imply a meaning that is incorrect. The most glaring issue is your key and value variables, which in fact unpack two pieces of the key (AKA a and c). In other situations you can get keys and values together, but only when you are iterating over a dictionary's items() view rather than on the dictionary directly.
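The difference between the two kinds of iteration is easy to see on a small dictionary. The contents below are made up for the example.

```python
animaldictionary = {("a1", "c1"): 2, ("a2", "c2"): 5}

# Iterating over the dictionary yields keys; each key unpacks into its parts.
for a, c in animaldictionary:
    print(a, c)
# a1 c1
# a2 c2

# Iterating over items() yields (key, value) pairs.
for (a, c), count in animaldictionary.items():
    print(a, c, count)
# a1 c1 2
# a2 c2 5
```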
