Iterating through an SRT file until index is found - python-3.x

This might sound like the "Iterate through file until condition is met" question (which I have already checked), but that solution doesn't work for me.
Given any SRT file as srtDir, I want to go to a chosen index and get its timecode and caption values.
I did the following, which is supposed to iterate through the SRT file until the condition is met:
import os
srtDir = "./media/srt/001.srt"
index = 100 #Index. Number is an example
found = False
with open(srtDir, "r") as SRT:
    print(srtDir)
    content = SRT.readlines()
    content = [x.strip() for x in content]
    for x in content:
        print(x)
        if x == index:
            print("Found")
            found = True
            break
    if not found:
        print("Nothing was found")
As said, it is supposed to iterate until the index is found, but it prints "Nothing was found", which is weird, because I can see the number printed on screen.
What did I do wrong?
(I have checked libraries; AFAIK, there is none that can return the timecode and captions given the index.)

You have a type mismatch in your code: index is an int but x in your loop is a str. In Python, 100 == "100" evaluates to False. The solution to this kind of bug is to adopt a well-defined data model and write library methods that apply it consistently.
However, with something like this, it's best not to reinvent the wheel; let an existing library do the boring work for you.
import srt
# Sample SRT file
raw = '''\
1
00:31:37,894 --> 00:31:39,928
OK, look, I think I have a plan here.

2
00:31:39,931 --> 00:31:41,931
Using mainly spoons,

3
00:31:41,933 --> 00:31:43,435
we dig a tunnel under the city and release it into the wild.
'''
# Parse and get index
subs = list(srt.parse(raw))
def get_index(n, subs_list):
    for i in subs_list:
        if i.index == n:
            return i
    return None
s = get_index(2, subs)
print(s)
See:
https://github.com/cdown/srt
https://srt.readthedocs.io/en/latest/quickstart.html
https://srt.readthedocs.io/en/latest/api.html
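For readers who want to see the type mismatch fixed without a library, here is a minimal sketch. It assumes a small in-memory sample instead of the real file, and the helper name find_block is hypothetical. It compares strings with strings and relies on the blank line that separates SRT blocks.

```python
# Assumption: a tiny in-memory SRT sample stands in for the real file.
sample = """\
1
00:31:37,894 --> 00:31:39,928
OK, look, I think I have a plan here.

2
00:31:39,931 --> 00:31:41,931
Using mainly spoons,
"""


def find_block(lines, index):
    """Return (timecode, caption_lines) for a subtitle index, or None."""
    lines = [line.strip() for line in lines]
    for pos, line in enumerate(lines):
        # Compare str with str: in Python, 100 == "100" is False.
        # Requiring a "-->" line next avoids matching a caption that is only a number.
        if line == str(index) and pos + 1 < len(lines) and "-->" in lines[pos + 1]:
            caption = []
            for text in lines[pos + 2:]:
                if not text:  # a blank line ends the block
                    break
                caption.append(text)
            return lines[pos + 1], caption
    return None


result = find_block(sample.splitlines(), 2)
print(result)
```

With a real file, the same function could be called on the lines read from the open file handle.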

Related

How to check if strings in two lists are almost equal using python

I'm trying to find the strings in two lists that almost match. Suppose there are two lists as below:
string_list_1 = ['apple_from_2018','samsung_from_2017','htc_from_2015','nokia_from_2010','moto_from_2019','lenovo_decommision_2017']
string_list_2 = ['apple_from_2020','samsung_from_2021','htc_from_2015','lenovo_decommision_2017']
Expected output:
Similar = ['apple_from_2018','samsung_from_2017','htc_from_2015','lenovo_decommision_2017']
Not Similar =['nokia_from_2010','moto_from_2019']
I tried the implementation below, but it does not give the proper result:
similar = []
not_similar = []
for item1 in string_list_1:
    for item2 in string_list_2:
        if SequenceMatcher(a=item1,b=item2).ratio() > 0.90:
            similar.append(item1)
        else:
            not_similar.append(item1)
The output of the above implementation is not as expected. I would appreciate it if someone could identify the missing part so that I get the required result.
You may make use of the following function in order to find similarity between two given strings
from difflib import SequenceMatcher
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
print(similar("apple_from_2018", "apple_from_2020"))
Output :
0.8666666666666667
Using this function, you can select the strings whose similarity ratio exceeds a threshold. Note that you may need to reduce your threshold from 0.90 to about 0.85 or lower to get the expected output, since the ratio above is below 0.90.
Thus the following code should work fine for you
from difflib import SequenceMatcher
string_list_1 = ['apple_from_2018','samsung_from_2017','htc_from_2015','nokia_from_2010','moto_from_2019','lenovo_decommision_2017']
string_list_2 = ['apple_from_2020','samsung_from_2021','htc_from_2015','lenovo_decommision_2017']
similar = []
not_similar = []
for item1 in string_list_1:
    # Set the state as false
    found = False
    for item2 in string_list_2:
        if SequenceMatcher(None, a=item1,b=item2).ratio() > 0.80:
            similar.append(item1)
            found = True
            break
    if not found:
        not_similar.append(item1)
print("Similar : ", similar)
print("Not Similar : ", not_similar)
Output :
Similar : ['apple_from_2018', 'samsung_from_2017', 'htc_from_2015', 'lenovo_decommision_2017']
Not Similar : ['nokia_from_2010', 'moto_from_2019']
Breaking out of the inner loop at the first match avoids redundant comparisons and duplicate appends. I have also reduced the similarity threshold to 0.80, since 0.90 was too high for this data. Feel free to tweak the value.
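As an alternative sketch, the standard library's difflib.get_close_matches can replace the inner loop. It returns the candidates whose similarity ratio is at least the cutoff, so an empty result means "not similar". The 0.8 cutoff mirrors the threshold used above and is an assumption to tune, not a recommendation.

```python
from difflib import get_close_matches

string_list_1 = ['apple_from_2018', 'samsung_from_2017', 'htc_from_2015',
                 'nokia_from_2010', 'moto_from_2019', 'lenovo_decommision_2017']
string_list_2 = ['apple_from_2020', 'samsung_from_2021', 'htc_from_2015',
                 'lenovo_decommision_2017']

similar = []
not_similar = []
for item in string_list_1:
    # n=1: one sufficiently close candidate is enough to call the item similar.
    if get_close_matches(item, string_list_2, n=1, cutoff=0.8):
        similar.append(item)
    else:
        not_similar.append(item)

print("Similar:", similar)
print("Not similar:", not_similar)
```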

Converting a csv file containing pixel values to its equivalent images

This is my first time working with such a dataset.
I have a .csv file containing pixel values (48x48 = 2304 columns) of images, with their labels in the first column and the pixels in the subsequent ones, as below:
[Image: a glimpse of the dataset]
I want to convert these pixels into their images, and store them into different directories corresponding to their respective labels. Now I have tried the solution posted here but it doesn't seem to work for me.
Here's what I've tried to do:
labels = ['Fear', 'Happy', 'Sad']
with open('dataset.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    fear = 0
    happy = 0
    sad = 0
    # skip headers
    next(csv_reader)
    for row in csv_reader:
        pixels = row[1:] # without label
        pixels = np.array(pixels, dtype='uint8')
        pixels = pixels.reshape((48, 48))
        image = Image.fromarray(pixels)
        if csv_file['emotion'][row] == 'Fear':
            image.save('C:\\Users\\name\\data\\fear\\im'+str(fear)+'.jpg')
            fear += 1
        elif csv_file['emotion'][row] == 'Happy':
            image.save('C:\\Users\\name\\data\\happy\\im'+str(happy)+'.jpg')
            happy += 1
        elif csv_file['emotion'][row] == 'Sad':
            image.save('C:\\Users\\name\\data\\sad\\im'+str(sad)+'.jpg')
            sad += 1
However, upon running the above block of code, the following is the error message I get:
Traceback (most recent call last):
  File "<ipython-input-11-aa928099f061>", line 18, in <module>
    if csv_file['emotion'][row] == 'Fear':
TypeError: '_io.TextIOWrapper' object is not subscriptable
I referred to a bunch of posts that solved the above error (like this one), but the people there were working on a rather different problem from mine, and other posts I couldn't understand.
This may well be a very trivial question, but as I mentioned earlier, this is my first time working with such a dataset. Kindly tell me what I am doing wrong and how I can fix my code.
Try -
if str(row[0]) == 'Fear':
And in a similar way for the other conditions:
elif str(row[0]) == 'Happy':
elif str(row[0]) == 'Sad':
(a good practice is to just save the first value of the array as a variable)
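To illustrate why row[0] works where csv_file['emotion'][row] does not: csv.reader yields each row as a plain list of strings, so the label is simply the first element. The sketch below uses a tiny in-memory sample (four made-up pixel values per row instead of 2304) as an assumption, so it runs without the real dataset.

```python
import csv
import io

# Assumption: a tiny in-memory CSV stands in for dataset.csv.
sample = io.StringIO(
    "emotion,p0,p1,p2,p3\n"
    "Fear,0,255,255,0\n"
    "Happy,10,20,30,40\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

counts = {}
for row in reader:
    label = row[0]                      # first column holds the label
    pixels = [int(v) for v in row[1:]]  # remaining columns hold the pixels
    counts[label] = counts.get(label, 0) + 1
    print(label, pixels)
```

Storing the label in a variable once, as suggested above, keeps the if/elif chain short.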
The first problem was that the first row contained only the column names. To take care of this, I used the skiprows parameter like so:
raw = pd.read_csv('dataset.csv', skiprows = 1)
Secondly, for my own convenience, I moved the labels column from the first position to the end.
Thirdly, after all the preparations were done, the loop did not iterate over whole rows; it only took the value in the first row and first column, which caused an error when reshaping. So I used df.itertuples() instead, like so:
for row in data.itertuples(index = False, name = 'Pandas'):
Lastly, thanks to @HadarM's suggestions, I was able to get it to work.
Here is the modified version of the problematic block:
for row in data.itertuples(index = False, name = 'Pandas'):
    pixels = row[:-1] # without label
    pixels = np.array(pixels, dtype='uint8')
    pixels = pixels.reshape((48, 48))
    image = Image.fromarray(pixels)
    if str(row[-1]) == 'Fear':
        image.save('C:\\Users\\name\\data\\fear\\im'+str(fear)+'.jpg')
        fear += 1
    elif str(row[-1]) == 'Happy':
        image.save('C:\\Users\\name\\data\\happy\\im'+str(happy)+'.jpg')
        happy += 1
    elif str(row[-1]) == 'Sad':
        image.save('C:\\Users\\name\\data\\sad\\im'+str(sad)+'.jpg')
        sad += 1
print('done')

Produce the most unique elements with least lists

I am new here. I will give an example first and then explain briefly.
example1: What is your name?
example1: Where are you from?
example1: How are you doing?
example2: What is your name?
example2: Where are you from?
example2: How are you doing?
example2: When did you move here?
example9: What is your name?
example3: Where are you from?
example23: Who gave you this book?
In the above example, I would like to print every unique question while using as few examples as possible. So I am aiming for something like this:
Expected output:
example2: What is your name?
example2: Where are you from?
example2: How are you doing?
example2: When did you move here?
example23: Who gave you this book?
Here, I am searching for the unique questions in a file while using the fewest examples.
I played around with it and came up with the code below.
import collections
s = collections.defaultdict(list)
u_s = set()
with open ('file.txt', 'r') as s1:
    for line in s1:
        data = line.split(':', maxsplit=1)
        start = data[0]
        end = data[-1]
        if end not in u_s:
            u_s.add(end)
            s[start] += [end]
for start, ends in s.items():
    print(start, ends[0])
    for end in ends[1:]:
        print(start, end)
Result that I am getting:
example1 What is your name?
example1 Where are you from?
example1 How are you doing?
example2 When did you move here?
example23 Who gave you this book?
Here, instead of printing example1, I want to use example2, because it covers more questions.
I tried sorting the lines by how often each line is repeated, but I couldn't get past that. I appreciate your help. Thanks.
Your code prints all unique questions, but it cannot compare the example sets with each other or print a question set as a whole.
Rather than sorting, I would formulate the problem as comparing combinations of example sets and selecting the combination that covers the most unique questions with the fewest sets, so to me your question is more about the algorithm.
import collections
def calculate_contrib(values, seen):
    '''To calculate the contribution on the unique questions' number, based on values to add.
    values: the list of question set to choose.
    seen: the already-added question set.'''
    contrib = 0
    for value in values:
        if value not in seen:
            contrib += 1
    return contrib
def print_result(x):
    '''To print the result, x, as a dictionary, without repetition.'''
    u_s = set()
    for key, values in x.items():
        for value in values:
            if value not in u_s:
                print(key,value)
                u_s.add(value)
s = collections.defaultdict(list)
# get all questions in examples
with open('file.txt', 'r') as s1:
    for line in s1:
        data = line.split(':', maxsplit=1)
        start = data[0]
        # strip whitespace so the same question compares equal on every line
        end = data[-1].strip()
        s[start] += [end]
# Get the initial contribution on the unique questions' number for each example set
contrib = dict()
u_s = set()
result = dict()
for key,values in s.items():
    contrib.update({key: calculate_contrib(s[key], u_s)})
# Execute the while loop when there are unique questions to add to u_s
while not(all([x == 0 for x in contrib.values()])):
    # Add the example set with maximum contribution
    max_contrib = 0
    max_key = ""
    for key, value in contrib.items():
        if max_contrib < value:
            max_key = key
            max_contrib = value
    result.update({max_key: s[max_key]})
    u_s.update(s[max_key])
    del s[max_key]
    del contrib[max_key]
    # Recompute the contribution of every remaining example set
    for key, values in s.items():
        contrib[key] = calculate_contrib(values, u_s)
# print the result
print_result(result)
The above is a straightforward implementation: at each step it adds the example set that contributes the most new unique questions, until no unique question remains.
Further improvements are possible. I hope this gives you some insight.
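The same greedy idea can be sketched more compactly with in-memory data. This is an illustrative sketch, not the only way to do it: the function name greedy_cover is hypothetical, and a greedy choice is a heuristic that is not guaranteed to find the smallest possible number of example sets in every case.

```python
def greedy_cover(groups):
    """Greedily pick groups until every question is covered.

    groups: dict mapping a group name to a list of questions.
    Returns a list of (name, newly_covered_questions) in pick order.
    """
    covered = set()
    remaining = dict(groups)
    picks = []
    while remaining:
        # Choose the group that adds the most not-yet-covered questions.
        best = max(remaining, key=lambda name: len(set(remaining[name]) - covered))
        new = [q for q in remaining[best] if q not in covered]
        if not new:
            break  # no remaining group adds anything
        picks.append((best, new))
        covered.update(new)
        del remaining[best]
    return picks


groups = {
    "example1": ["What is your name?", "Where are you from?", "How are you doing?"],
    "example2": ["What is your name?", "Where are you from?", "How are you doing?",
                 "When did you move here?"],
    "example9": ["What is your name?"],
    "example3": ["Where are you from?"],
    "example23": ["Who gave you this book?"],
}

picks = greedy_cover(groups)
for name, questions in picks:
    for question in questions:
        print(f"{name}: {question}")
```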

Please help me to fix the "list index out of range" error

I wrote a program to calculate the ratio of the minor (under 20 years of age) population in each prefecture of Japan, and it keeps producing this error: list index out of range, at line 19: ratio =(agerange[1]+agerange[2]+agerange[3]+agerange[4])/population*100.0
Link to csv: https://drive.google.com/open?id=1uPSMpgHw0csRx1UgAJzRLit9p6NrztFY
f=open("population.csv","r")
header=f.readline()
header=header.rstrip("\r\n")
while True:
    line=f.readline()
    if line=="":
        break
    line=line.rstrip("\r\n")
    field=line.split(sep=",")
    population=0
    ratio=0
    agerange=[ "pref" ]
    for age in range(1, len(field)):
        agerange.append(int(field[age]))
        population+=int(field[age])
    ratio =(agerange[1]+agerange[2]+agerange[3]+agerange[4])/population*100.0
    print(field[0],ratio)
On line 17, I assume you meant to write the following:
ratio =(agerange[0]+agerange[1]+agerange[2]+agerange[3])/population*100.0
Next time, please describe your error in more detail.
What you could do instead is get the sums of populations in the required age ranges and then perform the ratio calculation.
In Python, you can use the map function to convert the values in an iterable to ints, and make that into a list.
Once you have the list, you can use the sum function on it, or a part of it.
So, I came up with:
f = open("population.csv","r")
header = f.readline()
header = header.rstrip("\r\n")
while True:
    line = f.readline()
    if line == "":
        break
    line = line.rstrip("\r\n")
    field = line.split(sep=",")
    popData = list(map(int, field[1:]))
    youngPop = sum(popData[:4])
    oldPop = sum(popData[4:])
    ratio = youngPop / (youngPop + oldPop)
    print(field[0].ljust(12), ratio)
f.close()
Which outputs (just showing a portion here):
Hokkaido     0.1544532130777903
Aomori       0.1564945226917058
Iwate        0.16108452950558214
Miyagi       0.16831683168316833
Akita        0.14357429718875503
Yamagata     0.16515426497277677
Fukushima    0.16586921850079744
(I don't really know Python, so there could be some "better" or more conventional way.)
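One more conventional variant, offered as a sketch: the csv module handles the splitting, and skipping empty rows guards against a trailing blank line. The sample numbers and the function name minor_ratios are made up for illustration; the layout (name first, then age-range counts, with the first four ranges being minors) is assumed from the code above.

```python
import csv
import io

# Assumption: made-up numbers in the same layout as population.csv.
sample = io.StringIO(
    "pref,0-4,5-9,10-14,15-19,20-24,25-29\n"
    "A,10,10,10,10,30,30\n"
    "B,5,5,5,5,40,40\n"
    "\n"
)


def minor_ratios(handle, minor_columns=4):
    """Return {prefecture: percentage of population in the first age ranges}."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    ratios = {}
    for row in reader:
        if not row:  # csv.reader yields [] for a blank line
            continue
        counts = [int(value) for value in row[1:]]
        ratios[row[0]] = 100.0 * sum(counts[:minor_columns]) / sum(counts)
    return ratios


ratios = minor_ratios(sample)
for name, ratio in ratios.items():
    print(name.ljust(12), ratio)
```

With the real file, the handle would come from a with open("population.csv", newline="") as f: block, which also closes the file automatically.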

Indexes and ranges in python

I have this code:
def main():
    if (len(sys.argv) > 2) :
        P=list()
        f= open('Trace.txt' , 'w+')
        Seed = int(sys.argv[1])
        for i in range(2, len(sys.argv)):
            P[i-2] = int(sys.argv[i])
        for j in range(0, len(sys.argv)-1) :
            Probability=P[j]
            for Iteration in (K*j, K*(j+1)):
                Instruction= generateInstruction(Seed, Probability)
                f.write(Instruction)
        f.close()
    else:
        print('Params Error')
if __name__ == "__main__":
    main()
The idea is that I am passing some parameters through the command line. The first is the seed, and I want to store the rest in a list that I parse later, processing each parameter accordingly.
I keep receiving this error:
    P[i-2] = int(sys.argv[i])
IndexError: list assignment index out of range
What am I doing wrong?
PS: K and generateInstruction() are defined in a previous part of the code.
The error you see is related to a list being indexed with an invalid index.
Specifically, the problem is that P is an empty list at the time it is indexed in that line, so P[0] is indeed not accessible. Perhaps what you want is to actually add the element to the list; this can be achieved, for example, by replacing:
P[i-2] = int(sys.argv[i])
with:
P.append(int(sys.argv[i]))
Note also that argument parsing in Python is typically handled more conveniently with the standard module argparse than by parsing sys.argv manually.
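A minimal argparse sketch for this kind of command line might look as follows. The argument names seed and probabilities and the wrapper function parse_args are assumptions for illustration; only the argparse calls themselves come from the standard library.

```python
import argparse


def parse_args(argv=None):
    """Parse a seed followed by one or more integer parameters."""
    parser = argparse.ArgumentParser(description="Generate a trace file.")
    parser.add_argument("seed", type=int, help="random seed")
    parser.add_argument("probabilities", type=int, nargs="+",
                        help="one or more integer parameters")
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    return parser.parse_args(argv)


args = parse_args(["42", "10", "20", "30"])
print(args.seed, args.probabilities)
```

With nargs="+", argparse itself reports an error when no parameter follows the seed, which replaces the manual length check.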
It looks like you might be referencing a list item that does not exist.
I haven't used Python in quite a while but I'm pretty sure that if you want to add a value to the end of a list you can use someList.append(foo)
The problem is that you are assigning a value to an index which does not yet exist.
You need to replace
P[i-2] = int(sys.argv[i])
with
P.append(int(sys.argv[i]))
Furthermore, after the first loop P holds len(sys.argv) - 2 items, and indexing starts at 0, so the second loop runs one index too far. You need to change:
for j in range(0, len(sys.argv)-1) :
to
for j in range(0, len(sys.argv)-2):
Otherwise you will run into a list index out of range error.
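The index arithmetic can be checked with a short sketch. It uses a hypothetical argv list in place of sys.argv so that it runs anywhere; the script name and values are made up.

```python
# Hypothetical command line: script name, seed, then three parameters.
argv = ["trace.py", "7", "10", "20", "30"]

seed = int(argv[1])
P = []
for i in range(2, len(argv)):  # i takes the values 2, 3 and 4
    P.append(int(argv[i]))     # append grows the list; P[i-2] = ... would fail on an empty list

# P has len(argv) - 2 items, so range(len(P)) yields exactly the valid indices.
for j in range(len(P)):
    print(j, P[j])
```

Iterating with range(len(P)), or directly with for value in P, avoids having to repeat the len(sys.argv) arithmetic at all.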

Resources