Searchable tagged tables in lua? - search

So I need to search tables based on a number of tags.
It needs to create lists of matches based on the number of tags.
So if we had a match with 4 tags that would be a list, 3 tags another list, 2 another.
How would you implement this with lua tables?
I don't want anything too complicated. If there is a library or an interface to a database that is not complicated to set up, fine.
But if not I can accept a native solution at the cost of speed and memory.
What I mean by tags is this:
Let's say we have a table
T[1] = {cat, mouse, bat, fly, car, airplane, glider}
A few of those terms, like bat, fly, airplane, and glider, will have the tag flyable.
Another tag could be machine, for the car and airplane.
Another tag is animals: cat, mouse, bat, fly.
So if you search with both tags flyable + machine you get airplane.
If you search animal + flyable you get bat and fly.
So what I need is a structure to contain this tag information that lets me easily search.

Sets might be the type of data structure you are looking for. Have a look at Set as explained in the Programming in Lua book.

-- Return true if value appears in tbl as either a key or a value.
-- The parameter is named tbl so it does not shadow the table library.
function findtable(tbl, value)
    for k, v in pairs(tbl) do
        if (v == value) or (k == value) then
            return true
        end
    end
    return false
end
function tagged_flyable(value)
    local flyable_table = {'bat', 'fly', 'airplane', 'glider'}
    return findtable(flyable_table, value)
end
function tagged_animals(value)
    local animals_table = {'cat', 'mouse', 'bat', 'fly'}
    return findtable(animals_table, value)
end
function tagged_machines(value)
    local machines_table = {'car', 'airplane'}
    return findtable(machines_table, value)
end
-- main process
local T_1 = {'cat', 'mouse', 'bat', 'fly', 'car', 'airplane', 'glider'}
local search_results = {}
-- search for tag: flyable+machine
-- (the length operator # replaces the deprecated table.getn)
for i = 1, #T_1 do
    if tagged_machines(T_1[i]) and tagged_flyable(T_1[i]) then
        table.insert(search_results, T_1[i])
        print("found :", T_1[i])
    end
end
-- search for tag: flyable+animals
search_results = {}
for i = 1, #T_1 do
    if tagged_animals(T_1[i]) and tagged_flyable(T_1[i]) then
        table.insert(search_results, T_1[i])
        print("found :", T_1[i])
    end
end

The easiest would be a table of tags, plus two results tables. Table of tags:
T = {cat={"animal", "legs"}, bat={"animal", "wings"}, ...}
A result table is just a regular table listing which objects have a certain tag:
res1 = get(T, "wings")
print(res1) -- prints bat plane
res2 = get(T, "machine")
print(res2) -- prints car train plane
Then a function that finds intersection of both results:
bothTags = getIntersection(res1,res2)
getIntersection() just has to loop over the first table res1 and test whether res2[itemFromFirstTable] is nil; if it is not nil, the item is in both tables. This assumes the result tables are keyed by item name.

Related

Iterating through an SRT file until index is found

This might sound like "Iterate through file until condition is met" question (which I have already checked), but it doesn't work for me.
Given a SRT file (any) as srtDir, I want to go to the index choice and get timecode values and caption values.
I did the following, which is supposed to iterate through the SRT file until the condition is met:
import os
srtDir = "./media/srt/001.srt"
index = 100  # Index. Number is an example
found = False
with open(srtDir, "r") as SRT:
    print(srtDir)
    content = SRT.readlines()
    content = [x.strip() for x in content]
    for x in content:
        print(x)
        if x == index:
            print("Found")
            found = True
            break
    if not found:
        print("Nothing was found")
As said, it is supposed to iterate until the index is found, but it prints "Nothing was found", which is weird, because I can see the number printed on screen.
What did I do wrong?
(I have checked libraries, AFAIK, there's no one that can return timecode and captions given the index)
You have a type mismatch in your code: index is an int but x in your loop is a str. In Python, 100 == "100" evaluates to False. The solution to this kind of bug is to adopt a well-defined data model and write library methods that apply it consistently.
However, with something like this, it's best not to reinvent the wheel and let other people do the boring work for you.
import srt
# Sample SRT file
raw = '''\
1
00:31:37,894 --> 00:31:39,928
OK, look, I think I have a plan here.

2
00:31:39,931 --> 00:31:41,931
Using mainly spoons,

3
00:31:41,933 --> 00:31:43,435
we dig a tunnel under the city and release it into the wild.
'''
# Parse and get index
subs = list(srt.parse(raw))
def get_index(n, subs_list):
    for i in subs_list:
        if i.index == n:
            return i
    return None
s = get_index(2, subs)
print(s)
See:
https://github.com/cdown/srt
https://srt.readthedocs.io/en/latest/quickstart.html
https://srt.readthedocs.io/en/latest/api.html
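If adding a dependency is not an option, the type mismatch itself can be avoided by comparing strings with strings. The following is only a minimal sketch, not a full SRT parser: find_cue is a hypothetical helper name, and it assumes a well-formed file in which cues are separated by blank lines and each cue number is followed by a timecode line containing "-->".

```python
def find_cue(lines, index):
    """Return (timecode, caption_lines) for the cue numbered `index`, or None."""
    target = str(index)  # compare str with str, never int with str
    stripped = [line.strip() for line in lines]
    for pos, line in enumerate(stripped):
        # A cue number is the first line or follows a blank line,
        # and the line after it must look like a timecode.
        is_start = pos == 0 or stripped[pos - 1] == ""
        has_timecode = pos + 1 < len(stripped) and "-->" in stripped[pos + 1]
        if is_start and has_timecode and line == target:
            caption = []
            for text in stripped[pos + 2:]:
                if text == "":  # a blank line ends the cue
                    break
                caption.append(text)
            return stripped[pos + 1], caption
    return None


sample = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nWorld\nagain\n"
).splitlines()
cue = find_cue(sample, 2)
```

Checking that the number starts a cue keeps a caption that happens to be just "2" from being mistaken for an index.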

(Beginner Python assignment help) Search input list

I have just started learning Python and I have been given an assignment to create a list of players and stats using different loops.
I can't work out how to create a function that searches the player list and outputs the player's name and the player's stat.
Here is the assignment:
Create an empty list called players
Use two input() statements inside a for loop to collect the name
and performance of each player (the name will be in the form of a
string and the performance as an integer from 0 – 100.) Add both
pieces of information to the list (so in the first iteration of the
loop players[0] will contain the name of the first player and
players[1] will contain their performance.) You are not required to
validate this data.
Use a while loop to display all the player information in the
following form:
Player : Performance
Use a loop type of your choice to copy the performance values from
the players list and store these items in a new list called results
Write a function that accepts the values “max” or “min” and
returns the maximum or minimum values from the results list
Write a function called find_player() that accepts a player name
and displays their name and performance from the players list, or an
error message if the player is not found.
Here is what I have so far:
print ("Enter 11 Player names and stats")
# Create player list
playerlist = []
# Create results list
results = []
# for loop setting amount of players and collecting input/appending list
for i in range(11):
    player = (input("Player name: "))
    playerlist.append(player)
    stats = int(input("Player stats: "))
    playerlist.append(stats)
# While loop printing player list
whileLoop = True
while whileLoop == True:
    print (playerlist)
    break
# for loop append results list, [start:stop:step]
for i in range(11):
    results.append(playerlist[1::2])
    break
# max in a custom function
def getMax(results):
    results = (playerlist[1::2])
    return max(results)
print ("Max Stat",getMax(results))
# custom function to find player
def find_player(playerlist):
    list = playerlist
    name = str(input("Search keyword: "))
    return (name)
    for s in list:
        if name in str(s):
            return (s)
print (find_player(playerlist))
I have tried many different ways to create the find player function without success.
I think I am having problems because my list consists of strings and integers eg. ['john', 6, 'bill', 8]
I would like it to display the player that was searched for and the stats ['John', 6]
Any help would be greatly appreciated.
PS:
I know there is no need for all these loops but that is what the assignment seems to be asking for.
Thank you
I cut down on the fat and made a "dummy list", but your find_player function seems to work well, once you remove the first return statement! Once you return something, the function just ends.
All it needs is to also display the performance like so:
# Create player list
playerlist = ["a", 1, "b", 2, "c", 3]
# custom function to find player
def find_player(playerlist):
    name = str(input("Search keyword: "))
    searchIndex = 0
    for s in playerlist:
        try:
            if name == str(s):
                return ("Player: '%s' with performance %d" % (name, playerlist[searchIndex+1]))
        except Exception as e:
            print(e)
        searchIndex += 1
print (find_player(playerlist))
>>Search keyword: a
>>Player: 'a' with performance 1
I also added a try/except in case something goes wrong.
Also: NEVER USE "LIST" AS A VARIABLE NAME! It shadows Python's built-in list type.
Besides, you already have an internal name for it, so why assign it another name. You can just use playerlist inside the function.
Your code didn't work because you read a key and immediately returned it. For the code to work, you must use the key to find the value. In this task, the list has the format ['key1', value1, 'key2', value2, ...]. In the function, index is a variable that stores the position of the key, which the loop finds. The function then returns playerlist[index + 1], the value corresponding to the key.
playerlist = []
def find_player(playerlist):
    name = str(input("Search keyword: "))
    index = 0
    for s in playerlist:
        if name == str(s):
            return ("This keyword's value: %d" % (playerlist[index+1]))
        index += 1
print (find_player(playerlist))
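Since names and performances alternate in the list, the search can also step through the name positions only. This is a rough sketch under two assumptions that differ from the code above: the function takes the name as a parameter, as the assignment text describes, instead of calling input() itself, and it returns the text to display rather than printing it.

```python
def find_player(players, name):
    """Return 'name : performance' for `name`, or an error message."""
    # Names sit at even indexes; each performance follows at the next index.
    for i in range(0, len(players) - 1, 2):
        if players[i] == name:
            return f"{players[i]} : {players[i + 1]}"
    return f"Player '{name}' not found"


players = ["john", 6, "bill", 8]
print(find_player(players, "bill"))
print(find_player(players, "anna"))
```

Stepping by two means a search term such as "6" is never matched against a performance value.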

How to make loop or another function to repeat the same for loop again and again?

I am scraping content from the Dmoz website. I have written many functions like the one below, and I am sharing one to show how zip pairs the values of names and finder. I don't want to repeat the same for loop in every function. How can I write it once and reuse it?
def parse_about(self, response):
    # do your stuff on second page
    items = response.meta['items']
    names = {'name1': 'Headings',
             'name2': 'Paragraphs',
             'name3': '3 Projects',
             'name4': 'About Dmoz',
             'name5': 'Languages',
             'name6': 'You can make a differnce',
             'name7': 'Further Information'
             }
    finder = {'find1': 'h2::text , #mainContent h1::text',
              'find2': 'p::text',
              'find3': 'li~ li+ li b a::text , li:nth-child(1) b a::text',
              'find4': '.nav ul a::text , li:nth-child(2) b a::text',
              'find5': '.nav~ .nav a::text',
              'find6': 'dd::text , #about-contribute::text',
              'find7': 'li::text , #about-more-info a::text'
              }
    for name, find in zip(names.values(), finder.values()):
        items[name] = response.css(find).extract()
    yield items
Assuming the current code works as expected and the other functions differ only by the contents of the names and finder dictionaries, you can abstract out the loop with a function like this:
def find_items(response, names, finder):
    items = response.meta['items']
    for name, find in zip(names.values(), finder.values()):
        items[name] = response.css(find).extract()
    yield items
Then, if you are using Python 3.3 or higher, you can use a yield from statement in the calling functions like so:
yield from find_items(response, names1, finder1)
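To see the delegation run outside a spider, here is a self-contained sketch. FakeResponse and FakeSelection are hypothetical stand-ins written for this example, not Scrapy classes; they mimic only the .meta, .css() and .extract() calls used above. The sketch also assumes a single dict mapping each item name to its selector, which removes the need to zip two dicts and rely on their ordering.

```python
class FakeSelection:
    """Hypothetical stand-in for the object returned by response.css()."""

    def __init__(self, values):
        self._values = values

    def extract(self):
        return list(self._values)


class FakeResponse:
    """Hypothetical stand-in for a response object; not the real class."""

    def __init__(self, data, items):
        self._data = data
        self.meta = {"items": items}

    def css(self, query):
        return FakeSelection(self._data.get(query, []))


def find_items(response, selectors):
    # One dict maps each item name to its CSS selector.
    items = response.meta["items"]
    for name, query in selectors.items():
        items[name] = response.css(query).extract()
    yield items


def parse_about(response):
    selectors = {"Headings": "h2::text", "Paragraphs": "p::text"}
    # Delegate to the shared generator instead of repeating the loop.
    yield from find_items(response, selectors)


response = FakeResponse({"h2::text": ["About"], "p::text": ["One", "Two"]}, {})
result = list(parse_about(response))
```

Each parse function then only has to define its own selectors dict.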

How do I store the value of an indexed list in a global variable and call at it through a formatted function?

How do I store the value of an index and then use it in a formatted exec call to print the second value of each list in the dog list under class Animal()? That is what I expect it to print. A simplified version of my problem follows, with further clarification below:
class Global():
    str_list = []
    current_word = ""
    adj_word = 'poofy'
    adj_int = 0
    size = 0
    pounds = 0
    dog_years = 0
class Animal():
    ##### Formatted like so:[[visual size],[pounds],[age in dog years],[almost dead]] #####
    dog = [['small', 'poofy'],[7, 45],[18, 101],[0, 1]]
input = 'dog'
def done():
    print(Global.adj_int)
    print(str(Global.size), str(Global.pounds), str(Global.dog_years))
def split_str():
    Global.str_list = input.split()
split_str()
def analyze():
    Global.current_word = Global.str_list.pop(0)
    exec(f"""if Global.adj_word in Animal.{Global.current_word}[0]:
    Global.adj_int = Animal.{Global.current_word}[0].index('{Global.adj_word}')
    Global.size = Animal.{Global.current_word}[1][{Global.adj_int}]
    Global.pounds = Animal.{Global.current_word}[2][{Global.adj_int}]
    Global.dog_years = Animal.{Global.current_word}[3][{Global.adj_int}]""")
    if len(Global.str_list) == 0:
        done()
analyze()
it returns:
1
7 18 0
I expect it to return "45 101 1" for size, pounds, dog_years, because I store the .index value of 'poofy' in the Animal.dog list in Global.adj_int, which in this case is 1. I then format the code so that it uses that value to pick the second value of each list, but the exec(f"""...""") call under def analyze() does not produce the expected results. This is a much simpler version of my original code, but it produces exactly the same result. The formatted code behaves as if adj_int = 0 when it is really 1 (I know it is stored as 1 because I print adj_int at the end to check). Am I not able to format the code this way? Either way, I need a workaround.
The problem is that the f-string passed to exec is formatted before exec runs, so {Global.adj_int} is replaced by its value at that moment, which is still 0. So, when you call exec, this is what actually gets executed:
exec("""if Global.adj_word in Animal.dog[0]:
    Global.adj_int = Animal.dog[0].index('poofy')
    Global.size = Animal.dog[1][0]
    Global.pounds = Animal.dog[2][0]
    Global.dog_years = Animal.dog[3][0]""")
Only after this does Global.adj_int become 1. The control flow and structure of your code are much more complex than the task requires, so I would carefully rethink its design, but for a quick fix, you probably want to first execute the part that sets adj_int and then the rest, like this:
exec(f"""if Global.adj_word in Animal.{Global.current_word}[0]:
    Global.adj_int = Animal.{Global.current_word}[0].index('{Global.adj_word}')""")
exec(f"""if Global.adj_word in Animal.{Global.current_word}[0]:
    Global.size = Animal.{Global.current_word}[1][{Global.adj_int}]
    Global.pounds = Animal.{Global.current_word}[2][{Global.adj_int}]
    Global.dog_years = Animal.{Global.current_word}[3][{Global.adj_int}]""")
Even after stating that your program is unnecessarily complex, let me additionally point out that using a global, mutable state is a really really bad practice, which makes your programs hard to follow, maintain & debug. The same concern relates to using exec.
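As one possible direction for such a redesign, the attribute lookup can be done with the built-in getattr instead of exec, and the results returned instead of stored in shared state. This is a sketch only: lookup is a hypothetical helper name, and the Animal class is trimmed to the single dog list from the question.

```python
class Animal:
    # First row holds the adjectives; rows 1-3 hold the values per adjective.
    dog = [['small', 'poofy'], [7, 45], [18, 101], [0, 1]]


def lookup(animal_name, adjective):
    """Return the three values matching `adjective`, or None if absent."""
    rows = getattr(Animal, animal_name)  # replaces the f-string + exec lookup
    if adjective not in rows[0]:
        return None
    i = rows[0].index(adjective)  # the index is used right away, no global needed
    return rows[1][i], rows[2][i], rows[3][i]


values = lookup('dog', 'poofy')
print(values)
```

Because the index is computed and used inside one function call, the ordering problem with the formatted string does not arise.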

How to Transpose an rdd in pyspark (list is not a matrix)

I have an RDD whose list of strings is ['abc', 'ccd', 'xyz'...'axd'].
When I run print rdd.take(2), I expect it to return ['abc', 'ccd'], but instead it gives me everything. I am very new to Spark and Python, so please forgive me if this is a dumb question. Is there a way to transpose this list to rows?
Eventually I need to convert this into a DataFrame and insert it into a Hive table.
Here is a piece of my code:
domainsrdd = zonerdd.reduceByKey(lambda x,y: x + ' ' + y).map(lambda a: (a[0], a[1].split(' ')))
print domainsrdd.take(2)
[(u'COOL', [u'shirtmaker.cool', u'videocandy.cool', u'the-happy-factory.cool', u'vic.cool', u'atl.cool',...... u'booze.cool'])]
def sampler(l, tldvar):
    tld = l[0]
    domain_data = l[1]
    domains = []
    ct = tldvar.value[tld]
    for item in domain_data:
        domains.extend([item])
        if len(domains) == ct:
            break
    return domains
domainslist = domainsrdd.map(lambda l: sampler(l, tldvar))
print domainslist.take(2) # still returns everything
[[u'shirtmaker.cool', u'videocandy.cool', u'the-happy-factory.cool',...])]
Long story short, I am trying to loop through a set of domains grouped by TLD and produce a sample of those domain names. tldvar holds a dictionary with the number of domains I need to return for a specific TLD (TLD = com, net, org, etc.).
domainslist is of type RDD[Array[String]], so when you do a take, you get an Array[Array[String]]. In your case each inner array is never cut down, which from what you describe means len(domains) == ct is never true.
This is resolved: I used flatMap instead of map. This is what worked:
domainslist = domainsrdd.flatMap(lambda l: sampler(l, tldvar))
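The difference between the two calls can be illustrated without Spark. The sketch below is a plain-Python analogy, not PySpark code: list comprehensions and itertools stand in for map and flatMap, slicing stands in for take(2), a plain dict stands in for the tldvar value, and the sample records are made up for the example.

```python
from itertools import chain, islice


def sampler(record, limits):
    """Return at most limits[tld] domains for one (tld, domains) record."""
    tld, domains = record
    return list(islice(domains, limits[tld]))


# Made-up sample data; limits stands in for the dictionary held by tldvar.
records = [("COOL", ["a.cool", "b.cool", "c.cool"]), ("NET", ["x.net", "y.net"])]
limits = {"COOL": 2, "NET": 1}

# Like map: one element per key, and each element is a whole list.
mapped = [sampler(r, limits) for r in records]
# Like flatMap: the per-key lists are flattened into one element per domain.
flat = list(chain.from_iterable(mapped))

first_two_mapped = mapped[:2]  # two lists, so every sampled domain shows up
first_two_flat = flat[:2]      # two individual domains
```

With map, taking two elements returns two whole lists; after flattening, it returns two domain names.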

Resources