Python For Loop enumerate control - python-3.x

Hello, I have this code:
loss = list(range(1,10))
lists_fru = ['apple','banana','strawberry','erdberry','mango']
for index, i in enumerate(loss):
    if i > len(lists_fru):
        print('larg')
    else:
        print(lists_fru[index])
The result:
apple
banana
strawberry
erdberry
mango
larg
larg
larg
larg
What I'm trying to do:
When lists_fru runs out, I want the loop to continue from the beginning of the list, like this:
apple
banana
strawberry
erdberry
mango
apple
banana
strawberry
erdberry

You can do what you want using the modulo operator, %.
loss = list(range(1,10))
lists_fru = ['apple','banana','strawberry','erdberry','mango']
for index, i in enumerate(loss):
    print(lists_fru[index % len(lists_fru)])
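As a variation on the modulo approach, here is a minimal sketch that compares it with itertools.cycle from the standard library. The names wrapped and cycled are illustrative and do not appear in the original post.

```python
from itertools import cycle, islice

loss = list(range(1, 10))
lists_fru = ['apple', 'banana', 'strawberry', 'erdberry', 'mango']

# Modulo approach: the index wraps back to 0 once it reaches len(lists_fru).
wrapped = [lists_fru[index % len(lists_fru)] for index, _ in enumerate(loss)]

# Alternative: cycle() repeats the list endlessly; islice() stops after len(loss) items.
cycled = list(islice(cycle(lists_fru), len(loss)))

for fruit in wrapped:
    print(fruit)
```

Both lists contain the same nine items, so either form should work; the modulo version is handy when the index itself is still needed.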


Understanding the sed N command

The sed manual states about the N command:
N
Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
Now, from what I know, sed reads each line of input, applies the script(s) on it, and (if -n is not specified) prints it as the output stream.
So, given the following example file:
Apple
Banana
Grapes
Melon
Running this:
$ sed '/Banana/N;p'
From what I understand, sed should process each line of input: Apple, Banana, Grapes and Melon.
So, I would think that the output will be:
Apple
Apple
Banana
Grapes # since it read the next line with N
Banana
Grapes
Grapes (!)
Grapes (!)
Melon
Melon
Explanation:
Apple is read into the pattern space. It doesn't match the /Banana/ regex, so only p is applied. It's printed twice: once for the p command, and once because sed prints the pattern space by default.
Next, Banana is read into the pattern space. It matches the regex, so the N command is applied: it appends the next line, Grapes, to the pattern space, and then p prints it: Banana\nGrapes. Next, the pattern space is printed again due to the default behavior.
Now, I would expect that Grapes will be read to the pattern space, so that Grapes will be printed twice, same as for Apple and Melon.
But in reality, this is what I get:
Apple
Apple
Banana
Grapes
Banana
Grapes
Melon
Melon
It seems that once Grapes was read as part of the N command that was applied to Banana, it will no longer be read as a line of its own.
Is that so? And if so, why isn't it emphasized in the docs?
This might explain it (GNU sed):
sed '/Banana/N;p' file --d
SED PROGRAM:
/Banana/ N
p
INPUT: 'file' line 1
PATTERN: Apple
COMMAND: /Banana/ N
COMMAND: p
Apple
END-OF-CYCLE:
Apple
INPUT: 'file' line 2
PATTERN: Banana
COMMAND: /Banana/ N
PATTERN: Banana\nGrapes
COMMAND: p
Banana
Grapes
END-OF-CYCLE:
Banana
Grapes
INPUT: 'file' line 4
PATTERN: Melon
COMMAND: /Banana/ N
COMMAND: p
Melon
END-OF-CYCLE:
Melon
Here --d is short for --debug.
The INPUT: lines go 1, 2, 4 because the second cycle also consumes input line 3 with the N command.
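The cycle shown in the debug output can be imitated in a few lines of Python. This is only a rough sketch of sed '/Banana/N;p', not of sed in general: the function name simulate_banana_n_p is hypothetical, and the end-of-input branch assumes GNU sed's behavior of printing the pattern space before exiting.

```python
def simulate_banana_n_p(lines):
    """Rough model of `sed '/Banana/N;p'` over a list of input lines."""
    output = []
    source = iter(lines)
    for pattern_space in source:              # each cycle starts by reading one line
        if "Banana" in pattern_space:         # address /Banana/ selects the N command
            following = next(source, None)    # N consumes the next input line
            if following is None:
                output.append(pattern_space)  # assumption: GNU sed autoprints, then exits
                break
            pattern_space += "\n" + following
        output.append(pattern_space)          # the p command
        output.append(pattern_space)          # default print at the end of the cycle
    return "\n".join(output).split("\n")

for line in simulate_banana_n_p(["Apple", "Banana", "Grapes", "Melon"]):
    print(line)
```

Because N pulls Grapes from the same iterator that feeds the cycles, the next cycle starts at Melon, which is why Grapes never gets a cycle of its own.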
To debug this, I added = to your script so that you can see what's being emitted at each iteration. The line numbers conveniently demarcate the output from each. Then to identify the default print action at the end of each iteration, I added s/.*/==&==/ so you can see what was printed by sed because you did not specify -n.
sed '/Banana/N;=;p;=;s/.*/==&==/' <<\:
> Apple
> Banana
> Grapes
> Melon
> :
1
Apple
1
==Apple==
3
Banana
Grapes
3
==Banana
Grapes==
4
Melon
4
==Melon==
So, the pattern space containing Banana and Grapes was printed twice, and the first and last lines were printed in isolation twice.

Fuzzy String Matching using Python

I have a training dataset, for example:
Letter  Word
A       Apple
B       Bat
C       Cat
D       Dog
E       Elephant
and I need to check a dataframe such as:
AD      Apple Dog
AE      Applet Elephant
DC      Dog Cow
EB      Elephant Bag
AED     Apple Elephant Dog
D       Door
ABC     All Bat Cat
The rows AD, AE and EB are close enough to count as matches (Apple and Applet are considered close to each other, and likewise Bat and Bag), but DC doesn't match.
Required output:
Letters  Words               Status
AD       Apple Dog           Accept
AE       Applet Elephant     Accept
DC       Dog Cow             Reject
EB       Elephant Bag        Accept
AED      Apple Elephant Dog  Accept
D        Door                Reject
ABC      All Bat Cat         Accept
ABC is accepted because 2 of its 3 words match.
A word counts as matched at 70% similarity (fuzzy match), though the threshold is subject to change.
How can I find these matches using Python?
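You can use thefuzz to solve your problem:
# Python env: pip install thefuzz
# Conda env: conda install thefuzz
import numpy as np
from thefuzz import fuzz
THRESHOLD = 70
# df1 is the reference table (Letter, Word); df2 is the table to check (Letters, Words).
# Split each Letters string into single letters, look each one up in df1,
# then join the reference words back into one string per original row.
df2['Others'] = (df2['Letters'].apply(list).explode().reset_index()
                 .merge(df1, left_on='Letters', right_on='Letter')
                 .groupby('index')['Word'].agg(' '.join))
# Compare the whole Words string with the whole reference string.
df2['Ratio'] = df2.apply(lambda x: fuzz.ratio(x['Words'], x['Others']), axis=1)
df2['Status'] = np.where(df2['Ratio'] > THRESHOLD, 'Accept', 'Reject')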
Output:
>>> df2
  Letters               Words              Others  Ratio  Status
0      AD           Apple Dog           Apple Dog    100  Accept
1      AE     Applet Elephant      Apple Elephant     97  Accept
2      DC             Dog Cow             Dog Cat     71  Accept
3      EB        Elephant Bag        Elephant Bat     92  Accept
4     AED  Apple Elephant Dog  Apple Dog Elephant     78  Accept
5       D                Door                 Dog     57  Reject
6     ABC         All Bat Cat       Apple Cat Bat     67  Reject
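The question describes a per-word rule (ABC is accepted because 2 of 3 words match), which a whole-string ratio does not capture. Below is a minimal sketch of that per-word idea using difflib from the standard library instead of thefuzz. The names reference, word_score and status are hypothetical. It also assumes that the words appear in the same order as the letters and that a row is accepted when strictly more than half of its words reach the threshold; the question does not state the exact rule.

```python
from difflib import SequenceMatcher

# Reference table from the question: one word per letter.
reference = {"A": "Apple", "B": "Bat", "C": "Cat", "D": "Dog", "E": "Elephant"}

def word_score(candidate, expected):
    """Similarity of two words on a 0-100 scale."""
    return 100 * SequenceMatcher(None, candidate, expected).ratio()

def status(letters, words, threshold=70):
    """Accept when more than half of the words match (assumed rule)."""
    pairs = zip(letters, words.split())
    hits = sum(word_score(word, reference[letter]) >= threshold
               for letter, word in pairs)
    return "Accept" if hits * 2 > len(letters) else "Reject"

for letters, words in [("AD", "Apple Dog"), ("DC", "Dog Cow"),
                       ("D", "Door"), ("ABC", "All Bat Cat")]:
    print(letters, words, status(letters, words))
```

Under this sketch Bag scores about 67 against Bat, so EB is rejected at a threshold of 70 and accepted at 65; the threshold would need tuning to reproduce the required output exactly.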

Getting string output embedded with \n characters

While scraping data from a website, I am getting the output below:
['1 tablespoon\nvegetable or coconut oil\n1 tablespoon\npeeled and minced fresh ginger (from a 1-inch piece)\n2 cloves\ngarlic, minced\n3 tablespoons\nvegan Thai red curry paste, such as Thai Kitchen\n2\nmedium sweet potatoes (about 1 pound total), peeled and cut into 1/2-inch cubes\n1 (15-ounce) can\nchickpeas, drained and rinsed\n1 (13- to 14-ounce) can\nfull-fat coconut milk\n1/2 cup\nwater\n1 teaspoon\nkosher salt\n1/4 teaspoon\nfreshly ground black pepper\n1 (5-ounce) bag\nbaby spinach (about 5 packed cups)\nJuice from 1 medium lime (about 2 tablespoons)\nCooked rice, for serving (optional)']
Here the first ingredient is 1 tablespoon\nvegetable or coconut oil, the second is
1 tablespoon\npeeled and minced fresh ginger (from a 1-inch piece)
So the individual ingredients are separated by \n, and each ingredient can also contain \n itself. I am confused about how to build a list of the individual ingredients with no \n in it, like:
['1 tablespoon vegetable or coconut oil, 1 tablespoon peeled and minced fresh ginger (from a 1-inch piece), 2 cloves garlic, minced, 3 tablespoons vegan Thai red curry paste, such as Thai Kitchen, Juice from 1 medium lime (about 2 tablespoons), Cooked rice, for serving (optional)']
There is no simple pattern to rely on, such as a \n directly preceding an integer, because there is also a \n before Cooked rice, for serving (optional).
If we replace every \n, then all occurrences are treated the same way. I need to remove the \n inside each ingredient and replace the \n separating two ingredients with ", ", as in the expected output.
Actual output:
['1 tablespoon\nvegetable or coconut oil\n1 tablespoon\npeeled and minced fresh ginger (from a 1-inch piece)\n2 cloves\ngarlic, minced\n3 tablespoons\nvegan Thai red curry paste, such as Thai Kitchen\n2\nmedium sweet potatoes (about 1 pound total), peeled and cut into 1/2-inch cubes\n1 (15-ounce) can\nchickpeas, drained and rinsed\n1 (13- to 14-ounce) can\nfull-fat coconut milk\n1/2 cup\nwater\n1 teaspoon\nkosher salt\n1/4 teaspoon\nfreshly ground black pepper\n1 (5-ounce) bag\nbaby spinach (about 5 packed cups)\nJuice from 1 medium lime (about 2 tablespoons)\nCooked rice, for serving (optional)']
Expected output:
['1 tablespoon vegetable or coconut oil, 1 tablespoon peeled and minced fresh ginger (from a 1-inch piece), 2 cloves garlic, minced, 3 tablespoons vegan Thai red curry paste, such as Thai Kitchen, Juice from 1 medium lime (about 2 tablespoons), Cooked rice, for serving (optional)']
I got something close to what you want; I hope it helps.
I found three separate cases to replace in the string:
when a line break is followed by a digit, replace it with ", (digit)"
when a line break is followed by an uppercase letter, replace it with ", (letter)"
when a line break fits neither of these cases, replace it with " "
import re
text = "['1 tablespoon\nvegetable or coconut oil\n1 tablespoon\npeeled and minced fresh ginger (from a 1-inch piece)\n2 cloves\ngarlic, minced\n3 tablespoons\nvegan Thai red curry paste, such as Thai Kitchen\n2\nmedium sweet potatoes (about 1 pound total), peeled and cut into 1/2-inch cubes\n1 (15-ounce) can\nchickpeas, drained and rinsed\n1 (13- to 14-ounce) can\nfull-fat coconut milk\n1/2 cup\nwater\n1 teaspoon\nkosher salt\n1/4 teaspoon\nfreshly ground black pepper\n1 (5-ounce) bag\nbaby spinach (about 5 packed cups)\nJuice from 1 medium lime (about 2 tablespoons)\nCooked rice, for serving (optional)']"
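# Raw strings keep the backslashes intact for the regex engine.
text = re.sub(r"\n(\d)", r", \g<1>", text)     # line break before a digit
text = re.sub(r"\n([A-Z])", r", \g<1>", text)  # line break before an uppercase letter
text = re.sub(r"\n", " ", text)                # any remaining line break
print(text)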
output: ['1 tablespoon vegetable or coconut oil, 1 tablespoon peeled and minced fresh ginger (from a 1-inch piece), 2 cloves garlic, minced, 3 tablespoons vegan Thai red curry paste, such as Thai Kitchen, 2 medium sweet potatoes (about 1 pound total), peeled and cut into 1/2-inch cubes, 1 (15-ounce) can chickpeas, drained and rinsed, 1 (13- to 14-ounce) can full-fat coconut milk, 1/2 cup water, 1 teaspoon kosher salt, 1/4 teaspoon freshly ground black pepper, 1 (5-ounce) bag baby spinach (about 5 packed cups), Juice from 1 medium lime (about 2 tablespoons), Cooked rice, for serving (optional)']
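If a real Python list with one item per ingredient is more useful than a single joined string, the same heuristic can be sketched with re.split. The function name split_ingredients and the shortened sample string are illustrative only, and the sketch shares the assumption above that a new ingredient always starts with a digit or an uppercase letter.

```python
import re

def split_ingredients(raw):
    """Split on line breaks that are followed by a digit or an uppercase letter."""
    # The lookahead (?=...) checks the next character without consuming it.
    items = re.split(r"\n(?=[0-9A-Z])", raw)
    # Any line break left inside an item separates quantity from description.
    return [item.replace("\n", " ") for item in items]

sample = ("1 tablespoon\nvegetable or coconut oil\n"
          "2 cloves\ngarlic, minced\n"
          "Juice from 1 medium lime (about 2 tablespoons)\n"
          "Cooked rice, for serving (optional)")

for ingredient in split_ingredients(sample):
    print(ingredient)
```

Keeping the ingredients as separate list items avoids the ambiguity of joining with ", " when an ingredient such as "garlic, minced" already contains a comma.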

Linux, awk and how to count and print consecutive lines in a file?

For example, I have a file like:
apple
apple
strawberry
What I want to achieve, using awk, is to print each line that repeats consecutively (apple) together with how many times it repeats in a row (2), like this: apple-2.
My code so far is below; however, it prints apple1-apple1 instead.
awk '{current = $NF;
      getline;
      if($NF == current) i++;
      printf ("%s-%d",current,i) }' $file
Thank you in advance.
How about uniq -c and awk for filtering:
$ uniq -c foo|awk '$1>1'
2 apple
Given:
$ cat file
apple
apple
strawberry
mango
apple
strawberry
strawberry
strawberry
You can do:
$ awk '$1==last{seen[$1]++}
       {last=$1}
       END{for (e in seen)
             print seen[e]+1, e}' file
2 apple
3 strawberry
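For comparison, the same run counting can be sketched in Python with itertools.groupby, which groups only adjacent equal items. The function name consecutive_runs is hypothetical. Unlike the awk script above, this sketch reports each run separately instead of adding up all runs of the same word.

```python
from itertools import groupby

def consecutive_runs(lines):
    """Return (line, count) for every run of two or more identical adjacent lines."""
    runs = []
    for key, group in groupby(lines):
        count = sum(1 for _ in group)  # length of this run
        if count > 1:
            runs.append((key, count))
    return runs

lines = ["apple", "apple", "strawberry", "mango",
         "apple", "strawberry", "strawberry", "strawberry"]

for word, count in consecutive_runs(lines):
    print(f"{word}-{count}")
```

With the sample file from the answer this prints apple-2 and strawberry-3, in the order the runs occur.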

sed: How do I replace all lines containing a certain string?

Say I have a file containing these lines:
tomatoes
bananas
tomatoes with apples
pears
pears with apples
How do I replace every line containing the word "apples" with just "oranges"? This is what I want to end up with:
tomatoes
bananas
oranges
pears
oranges
You can use sed 's/.*apples.*/oranges/'.
You can also use awk:
$ awk '/apples/{$0="oranges"}1' file
tomatoes
bananas
oranges
pears
oranges
This searches for apples and, on a match, replaces the whole record with "oranges"; the trailing 1 is an always-true pattern that prints every record.
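The whole-line replacement can also be sketched in Python for readers who want to do the same thing outside the shell. The function name replace_matching_lines and its parameters are hypothetical; the sketch uses a plain substring test, not a regular expression.

```python
def replace_matching_lines(lines, needle="apples", replacement="oranges"):
    """Replace every line that contains `needle` with `replacement`."""
    return [replacement if needle in line else line for line in lines]

lines = ["tomatoes", "bananas", "tomatoes with apples", "pears", "pears with apples"]

for line in replace_matching_lines(lines):
    print(line)
```

Lines without a match pass through unchanged, mirroring what the sed and awk one-liners do.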
