Starting out with Python Program 4-9 - python-3.x

I'm trying to figure out how to design a count-controlled loop with the for statement. I have copied the example program in the text perfectly (or so I thought). I tried to run this program, but I keep getting a syntax error. Can someone tell me what I'm doing wrong?
# This program converts the speeds 60 kph
# through 130 kph (in 10 kph increments)
# to mph.
START_SPEED = 60            # Starting speed
END_SPEED = 131             # Ending speed
INCREMENT = 10              # Speed increment
CONVERSION_FACTOR = 0.6214  # Conversion factor

# Print the table headings.
print('KPH\tMPH')
print('--------------')

# Print the speeds.
for kph in range(START_SPEED, END_SPEED, INCREMENT)
    mph = kph * CONVERSION_FACTOR
    print(kph, '\t', format(mph, '.1f'))

You're lacking a : at the end of the for statement:
...
for kph in range(START_SPEED, END_SPEED, INCREMENT):
    mph = kph * CONVERSION_FACTOR
...
With the : added, I get:
KPH MPH
--------------
60 37.3
70 43.5
80 49.7
90 55.9
100 62.1
110 68.4
120 74.6
130 80.8
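As an aside, note why END_SPEED is 131 rather than 130: range() stops one step before its stop value, so the stop has to be past the last speed you want printed. A quick check in the REPL:
>>> list(range(60, 131, 10))
[60, 70, 80, 90, 100, 110, 120, 130]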


How to get sum of probabilities of rolling two dice where one is fair and the other one is unfair?

I am writing a little program and wanted to ask how I can add the logic of having an unfair die in the game. Right now, my code produces the sum of probabilities of rolling 2 dice with 6 faces each for i times. However, it is treating both dice as having a 1/6 probability of rolling a given number. How do I tweak it so that the unfair die ONLY shows numbers in the range 2-5, never 1 or 6? The output should be the sum of probabilities for all numbers in the range 2-12, given the fair and unfair dice.
import random
from collections import defaultdict

def main():
    dice = 2
    sides = 6
    rolls = int(input("Enter the number of rolls to simulate: "))
    result = roll(dice, sides, rolls)
    maxH = 0
    for i in range(dice, dice * sides + 1):
        if result[i] / rolls > maxH: maxH = result[i] / rolls
    for i in range(dice, dice * sides + 1):
        print('{:2d}{:10d}{:8.2%} {}'.format(i, result[i], result[i] / rolls, '#' * int(result[i] / rolls / maxH * 40)))

def roll(dice, sides, rolls):
    d = defaultdict(int)
    for _ in range(rolls):
        d[sum(random.randint(1, sides) for _ in range(dice))] += 1
    return d

main()
Output
Enter the number of rolls to simulate: 10000
2 265 2.65% ######
3 567 5.67% #############
4 846 8.46% ####################
5 1166 11.66% ############################
6 1346 13.46% ################################
7 1635 16.35% ########################################
8 1397 13.97% ##################################
9 1130 11.30% ###########################
10 849 8.49% ####################
11 520 5.20% ############
12 279 2.79% ######
Given that the logic of which results are possible is currently being controlled by the line
random.randint(1, sides)
that's the line to change if you want to roll with different bounds. For example, to get 2-5, you could generalize the function:
def main():
    dice = 2
    sides = 6
    unfair_min = 2
    unfair_max = 5
    rolls = int(input("Enter the number of rolls to simulate: "))
    result_unfair = roll_biased(dice, sides, rolls, min_roll=unfair_min, max_roll=unfair_max)
    maxH = max(result_unfair.values()) / rolls
    for i in range(dice, dice * sides + 1):
        print('{:2d}{:10d}{:8.2%} {}'.format(i, result_unfair[i], result_unfair[i] / rolls,
                                             '#' * int(result_unfair[i] / rolls / maxH * 40)))

def roll_biased(dice, sides, rolls, min_roll=1, max_roll=None):
    if max_roll is None:
        max_roll = sides
    d = defaultdict(int)
    for _ in range(rolls):
        d[sum(random.randint(min_roll, max_roll) for _ in range(dice))] += 1
    return d
Which could print:
Enter the number of rolls to simulate: 10000
2 0 0.00%
3 0 0.00%
4 632 6.32% ##########
5 1231 12.31% ###################
6 1851 18.51% #############################
7 2480 24.80% ########################################
8 1873 18.73% ##############################
9 1296 12.96% ####################
10 637 6.37% ##########
11 0 0.00%
12 0 0.00%
You could also generalize this to arbitrary choices (or arbitrary weights) using random.choices(), like so:
def roll_from_choices(dice, sides, rolls, allowed_rolls=None):
    if allowed_rolls is None:
        allowed_rolls = list(range(1, sides+1))
    d = defaultdict(int)
    for _ in range(rolls):
        d[sum(random.choices(allowed_rolls, k=dice))] += 1
    return d
which you can call as:
result_unfair = roll_from_choices(dice, sides, rolls, allowed_rolls=[2, 3, 4, 5])
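random.choices() also accepts a weights argument, so the same idea extends to a die that is merely loaded rather than restricted. A minimal sketch (reusing the imports above; roll_weighted and its 2-to-1 weighting of the 4 are my own illustration, not from the question):
def roll_weighted(dice, rolls, faces, weights):
    # Like roll_from_choices(), but each face gets a relative weight.
    d = defaultdict(int)
    for _ in range(rolls):
        d[sum(random.choices(faces, weights=weights, k=dice))] += 1
    return d

result = roll_weighted(2, 10000, [1, 2, 3, 4, 5, 6], [1, 1, 1, 2, 1, 1])
Each entry in weights gives the relative likelihood of the face at the same position, so here a 4 is twice as likely as any other face.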
I would start with a single function that returns the result of a die(a) roll, where the options available can be tailored to exclude the impossible, something like:
import random

def throw_die(options):
    return random.choice(options)
Then I would code for the generalised case where you can have any number of dice, each of varying abilities (to be passed as the options when throwing the die). In your particular case, that would be two dice with the second excluding 1 and 6(b):
dice = [
    [1, 2, 3, 4, 5, 6],
    [   2, 3, 4, 5   ]
]
Then allocate enough storage for the results (I've wasted a small amount of space here to ensure code for collecting data is much simpler):
min_res = sum([min(die) for die in dice]) # Smallest possible result,
max_res = sum([max(die) for die in dice]) # and largest.
count = [0] * (max_res + 1) # Allocate space + extra.
Your data collection is then relatively simple (I've hard-coded the roll count here rather than use input, but you can put that back in):
rolls = 10000  # rolls = int(input("How many rolls? "))
for _ in range(rolls):
    # Throw each die, sum the results, then increment correct count.
    result = sum([throw_die(die) for die in dice])
    count[result] += 1
And the data output can be done as follows (rounding rather than truncating so that the highest count gets forty hashes - that's just my CDO(c) nature kicking in):
hash_mult = 40 / max(count)
for i in range(min_res, max_res + 1):
    per_unit = count[i] / rolls
    hashes = "#" * int(count[i] * hash_mult + 0.5)
    print(f"{i:2d}{count[i]:10d}{per_unit:8.2%} {hashes}")
The complete program then becomes:
import random

# Throw a single die.
def throw_die(options):
    return random.choice(options)

# Define all dice here as list of lists.
# No zero/negative number allowed, will
# probably break code :-)
dice = [
    [1, 2, 3, 4, 5, 6],
    [   2, 3, 4, 5   ]
]

# Get smallest/largest possible result.
min_res = sum([min(die) for die in dice])
max_res = sum([max(die) for die in dice])

# Some elements wasted (always zero) to simplify later code.
# Example: throwing three normal dice cannot give 0, 1, or 2.
count = [0] * (max_res + 1)

# Do the rolls and collect results.
rolls = 10000
for _ in range(rolls):
    result = sum([throw_die(die) for die in dice])
    count[result] += 1

# Output all possible results.
hash_mult = 40 / max(count)
for i in range(min_res, max_res + 1):
    per_unit = count[i] / rolls
    hashes = "#" * int(count[i] * hash_mult + 0.5)
    print(f"{i:2d}{count[i]:10d}{per_unit:8.2%} {hashes}")
and a few sample runs to see it in action:
pax:/mnt/c/Users/Pax/Documents/wsl> python prog.py
3 418 4.18% #########
4 851 8.51% ####################
5 1266 12.66% ##############################
6 1681 16.81% ########################################
7 1606 16.06% ######################################
8 1669 16.69% #######################################
9 1228 12.28% #############################
10 867 8.67% ####################
11 414 4.14% #########
pax:/mnt/c/Users/Pax/Documents/wsl> python prog.py
3 450 4.50% ##########
4 825 8.25% ###################
5 1206 12.06% ############################
6 1655 16.55% #######################################
7 1679 16.79% ########################################
8 1657 16.57% #######################################
9 1304 13.04% ###############################
10 826 8.26% ###################
11 398 3.98% #########
pax:/mnt/c/Users/Pax/Documents/wsl> python prog.py
3 394 3.94% #########
4 838 8.38% ####################
5 1271 12.71% ##############################
6 1617 16.17% ######################################
7 1656 16.56% #######################################
8 1669 16.69% ########################################
9 1255 12.55% ##############################
10 835 8.35% ####################
11 465 4.65% ###########
Footnotes:
(a) Using the correct nomenclature for singular die and multiple dice, in case any non-English speakers are confused.
(b) You could also handle cases like [1, 2, 3, 4, 4, 5, 6], where you're twice as likely to get a 4 as any other number. Anything much more complex than that would probably be better handled with a tuple representing each possible result and its relative likelihood. That's probably a little too complex to put in a footnote (given it's not a requirement of the question), but you can always ask about this in a separate question if you're interested.
(c) Just like OCD but in the Correct Damn Order :-)

Overflow when unpacking long - Pytorch

I am running the following code
from __future__ import print_function
import torch

x = torch.empty(5, 3)
print(x)
on an Ubuntu machine in CPU mode, which gives me the following error. What would be the reason, and how do I fix it?
x = torch.empty(5, 3)
----> print(x)
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __repr__(self)
55 # characters to replace unicode characters with.
56 if sys.version_info > (3,):
---> 57 return torch._tensor_str._str(self)
58 else:
59 if hasattr(sys.stdout, 'encoding'):
/usr/local/lib/python3.6/dist-packages/torch/_tensor_str.py in _str(self)
216 suffix = ', dtype=' + str(self.dtype) + suffix
217
--> 218 fmt, scale, sz = _number_format(self)
219 if scale != 1:
220 prefix = prefix + SCALE_FORMAT.format(scale) + ' ' * indent
/usr/local/lib/python3.6/dist-packages/torch/_tensor_str.py in _number_format(tensor, min_sz)
94 # TODO: use fmod?
95 for value in tensor:
---> 96 if value != math.ceil(value.item()):
97 int_mode = False
98 break
RuntimeError: Overflow when unpacking long
Since torch.empty() returns uninitialized memory, you may or may not get an absurdly large value from it, and printing such a value is what triggers the overflow error. Try
x = torch.rand(5, 3)
print(x)
which will print a properly initialized tensor.
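If you want defined contents rather than random values, any initialized constructor works just as well; a couple of standard options, as a sketch:
x = torch.zeros(5, 3)  # all zeros
x = torch.ones(5, 3)   # all ones
Both fill the allocated memory with defined values, so printing them is safe.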

'and', 'or' for python unequal statement between strings

I am reading a text file using Python. Each line will be read if its first character is neither '#' nor '@'. The content of the text file is pasted at the end.
This works:
if line[0] != '#' and line[0] != '@':
    value=float(line[3:8])
But this does not work:
if line[0] != ('#' and '@'):
    value=float(line[3:8])
ValueError: could not convert string to float: 'his f'
This also does not work:
if line[0] != ('#' or '@'):
    value=float(line[3:8])
ValueError: could not convert string to float: ' tit'
So why does neither or nor and work?
The content of the text file:
# This file was created Wed Feb 21 12:32:25 2018
# Created by:
# :-) GROMACS - gmx do_dssp, VERSION 5.1.1 (-:
#
# Executable: /shared/ucl/apps/gromacs/5.1.1/intel-2015-update2/bin//gmx
# Data prefix: /shared/ucl/apps/gromacs/5.1.1/intel-2015-update2
# Command line:
# gmx do_dssp -f md_0_1_noPBC.xtc -s md_0_1.tpr -ssdump ssdump.dat -map ss.map -o ss.xpm -sc scount.xvg -a area.xpm -ta totarea.xvg -aa averarea.xvg -tu ns
# gmx do_dssp is part of G R O M A C S:
#
# Gnomes, ROck Monsters And Chili Sauce
#
@ title "Secondary Structure"
@ xaxis label "Time (ns)"
@ yaxis label "Number of Residues"
@TYPE xy
@ subtitle "Structure = A-Helix + B-Sheet + B-Bridge + Turn"
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "Structure"
@ s1 legend "Coil"
@ s2 legend "B-Sheet"
@ s3 legend "B-Bridge"
@ s4 legend "Bend"
@ s5 legend "Turn"
@ s6 legend "A-Helix"
@ s7 legend "3-Helix"
@ s8 legend "5-Helix"
0 278 106 199 10 44 60 9 15 0
0.1 267 118 203 7 52 47 10 6 0
0.2 245 144 175 8 54 51 11 0 0
0.3 264 118 192 8 58 56 8 3 0
line[0] not in ['#', '@'] will do the trick.
It checks whether line[0] exists in the given list.
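As for why your attempts fail: and and or don't distribute over comparisons; the parenthesized expression evaluates to one of its operands first, which you can see in a REPL:
>>> '#' and '@'   # and returns its last truthy operand
'@'
>>> '#' or '@'    # or returns its first truthy operand
'#'
So line[0] != ('#' and '@') only compares against '@' (letting '#' lines through to float()), and line[0] != ('#' or '@') only compares against '#' (letting '@' lines through), which matches the two ValueErrors you saw.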

How do I rightly code linear regression with gradient descent in Python?

import pandas as pd
import matplotlib.pyplot as plt
# I'm trying to code the utter basic func of LinearRegression
# from sklearn.linear_model import LinearRegression
dataframe = pd.read_fwf('brain_body.txt') # link given below
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
lr = LinearRegression(0.0001, 10) # sending learning_rate and iterations
lr.fit(x_values, y_values)
# commenting out because the values are insane
# plt.scatter(x_values, y_values)
# plt.plot(x_values, clf.predict(x_values))
# plt.show()
Link to brain_body.txt
Here's the class I've written
class LinearRegression:
    def __init__(self, learning_rate, iterations):
        self.b = 0  # b as in y=mx+b
        self.m = 0  # m as in y=mx+b
        self.learning_rate = learning_rate
        self.iterations = iterations

    def get_y(self, x):
        return self.m * float(x) + self.b

    def step_gradient(self, x_values, y_values):
        print()
        print("Values before: m =", self.m, " b =", self.b)
        m_gradient = 0
        b_gradient = 0
        N = float(len(x_values.iloc[:, 0]))
        print('%11s' % "d(m)", '%11s' % "m_gradient", '%11s' % "d(b)", '%11s' % "b_gradient")
        for i in range(int(N)):
            x = x_values.iloc[i][0]
            y = y_values.iloc[i][0]
            # EDIT: I missed a * -1 here
            # But that wouldn't just fix everything, adjusting learning rate does
            pm = (y - self.get_y(x)) * x   # partial derivative of m
            pb = (y - self.get_y(x)) * -1  # partial derivative of b
            m_gradient += pm * 2 / N
            b_gradient += pb * 2 / N
            print('%11s' % pm, '%11s' % m_gradient, '%11s' % pb, '%11s' % b_gradient)
        self.m -= self.learning_rate * m_gradient  # adjust current m
        self.b -= self.learning_rate * b_gradient  # adjust current b
        print("Values after: m =", self.m, " b =", self.b)
        print()

    def fit(self, x_values, y_values):  # equivalent to train_model
        for i in range(self.iterations):
            self.step_gradient(x_values, y_values)
        return

    def predict(self, x_values):  # equivalent to get_output
        predictions = []
        for x in x_values.iloc[:, 0]:
            predictions.append(self.get_y(x))
        return predictions
I watched Siraj Raval's How to do Linear Regression the right way and followed it almost exactly. I did learn what partial derivatives and gradient descent are, but I do not know what the values of the partial derivatives should be (or how to sanity-check them). And the numbers go crazy in the very first iteration:
Values before: m = 0 b = 0
d(m) m_gradient d(b) b_gradient
150.6325 4.85911290323 -44.5 -1.43548387097
7.44 5.09911290323 -15.5 -1.93548387097
10.935 5.45185483871 -8.1 -2.19677419355
196695.0 6350.45185484 -423.0 -15.8419354839
4341.435 6490.49814516 -119.5 -19.6967741935
3180.9 6593.10782258 -115.0 -23.4064516129
1456.306 6640.08543548 -98.2 -26.5741935484
5.72 6640.26995161 -5.5 -26.7516129032
243.02 6648.10930645 -58.0 -28.6225806452
2.72 6648.19704839 -6.4 -28.8290322581
0.404 6648.21008065 -4.0 -28.9580645161
5.244 6648.37924194 -5.7 -29.1419354839
6.6 6648.59214516 -6.6 -29.3548387097
0.0007 6648.59216774 -0.14 -29.3593548387
0.06 6648.59410323 -1.0 -29.3916129032
37.8 6649.81345806 -10.8 -29.74
24.6 6650.60700645 -12.3 -30.1367741935
10.71 6650.95249032 -6.3 -30.34
11723841.0 384839.371845 -4603.0 -178.823870968
0.0069 384839.372068 -0.3 -178.833548387
78394.9 387368.23981 -419.0 -192.349677419
341255.0 398376.465616 -655.0 -213.478709677
2.7475 398376.554245 -3.5 -213.591612903
1150.0 398413.651019 -115.0 -217.301290323
84.48 398416.376181 -25.6 -218.127096774
1.0 398416.408439 -5.0 -218.288387097
24.675 398417.204406 -17.5 -218.852903226
359720.0 410021.075374 -680.0 -240.788387097
84042.0 412732.107632 -406.0 -253.88516129
27625.0 413623.236665 -325.0 -264.369032258
9.225 413623.534245 -12.3 -264.765806452
81840.0 416263.534245 -1320.0 -307.346451613
38007648.0 1642316.69554 -5712.0 -491.604516129
13.65 1642317.13586 -3.9 -491.730322581
1217.2 1642356.40037 -179.0 -497.504516129
1960.0 1642419.62618 -56.0 -499.310967742
68.85 1642421.84715 -17.0 -499.859354839
0.12 1642421.85102 -1.0 -499.891612903
0.0092 1642421.85132 -0.4 -499.904516129
0.0025 1642421.8514 -0.25 -499.912580645
17.5 1642422.41591 -12.5 -500.315806452
122500.0 1646374.02882 -490.0 -516.122258065
30.25 1646375.00462 -12.1 -516.512580645
9712.5 1646688.31107 -175.0 -522.157741935
15700.0 1647194.76269 -157.0 -527.222258065
22950.4 1647935.09817 -440.0 -541.415806452
1893.725 1647996.18607 -179.5 -547.206129032
1.32 1647996.22865 -2.4 -547.283548387
4860.0 1648153.00285 -81.0 -549.896451613
75.6 1648155.44156 -21.0 -550.573870968
168.0896 1648160.8638 -39.2 -551.838387097
0.532 1648160.88096 -1.9 -551.899677419
0.09 1648160.88387 -1.2 -551.938387097
0.366 1648160.89567 -3.0 -552.03516129
0.01584 1648160.89619 -0.33 -552.045806452
34560.0 1649275.73489 -180.0 -557.852258065
75.0 1649278.15425 -25.0 -558.658709677
27040.0 1650150.41231 -169.0 -564.110322581
2.34 1650150.4878 -2.6 -564.194193548
18.468 1650151.08354 -11.4 -564.561935484
0.26 1650151.09193 -2.5 -564.642580645
213.444 1650157.97722 -50.4 -566.268387097
Values after: m = -165.015797722 b = 0.0566268387097
Values after 10 iteration: m = -1.76899770934e+22 b = 4.21166966984e+18
How do I rightly do LinearRegression from scratch?
This might not be a true answer as it's using R (I could probably figure this out in python, but it would take me longer). I think your issue is in the size of your learning_rate. I'm taking this machine learning class at the moment and so I'm familiar with what you're doing and attempted to implement it myself. Here was my code:
library(ggplot2)

## create test data
data <- data.frame(x = 1:10, y = 1:10)
n <- nrow(data)

## initialize values
m <- 0
b <- 0
alpha <- 0.01
iters <- 100

results <- data.frame(i = 1:iters,
                      pm = 1:iters,
                      pb = 1:iters,
                      m = 1:iters,
                      b = 1:iters)

for (i in 1:iters) {
    y_hat <- (m * data$x) + b
    pm <- (1/n) * sum((y_hat - data$y) * data$x)
    pb <- (1/n) * sum(y_hat - data$y)
    m <- m - (alpha * pm)
    b <- b - (alpha * pb)

    ## uncomment if you want; shows "animated" change
    ## p <- ggplot(data, aes(x = x, y = y)) + geom_point()
    ## p <- p + geom_abline(intercept = b, slope = m)
    ## print(p)

    ## this turned out to be key for looking at output
    results[i, 2:5] <- c(pm, pb, m, b)
}
Now, note the end of results with a big alpha, 0.1:
> tail(results)
i pm pb m b
95 95 -2.864612e+45 -4.114745e+44 2.135518e+44 3.067470e+43
96 96 8.390457e+45 1.205210e+45 -6.254938e+44 -8.984628e+43
97 97 -2.457567e+46 -3.530062e+45 1.832073e+45 2.631600e+44
98 98 7.198218e+46 1.033956e+46 -5.366146e+45 -7.707961e+44
99 99 -2.108360e+47 -3.028460e+46 1.571745e+46 2.257664e+45
100 100 6.175391e+47 8.870365e+46 -4.603646e+46 -6.612702e+45
See how m and b are flip-flopping? The learning rate alpha is so high that each alpha * derivative step jumps right over the minimum, so the values diverge instead of settling. The linked class shows this in the gradient descent videos.
Look at the results using alpha = 0.01:
> tail(results)
i pm pb m b
95 95 -0.003483741 0.02425319 0.9834438 0.1152615
96 96 -0.003476426 0.02420226 0.9834785 0.1150195
97 97 -0.003469127 0.02415144 0.9835132 0.1147780
98 98 -0.003461842 0.02410073 0.9835478 0.1145370
99 99 -0.003454573 0.02405012 0.9835824 0.1142965
100 100 -0.003447319 0.02399962 0.9836169 0.1140565
It's slow, but we're homing in on m = 1 and b = 0, as expected. With your real data, I had a similar issue. The main code body is the same, with this replacing the data <- data.frame() line at the beginning:
data <- read.table(file = "https://raw.githubusercontent.com/llSourcell/linear_regression_demo/master/brain_body.txt",
                   header = T, sep = "", stringsAsFactors = F)
names(data) <- c("y", "x")
Everything else is the same, except that I played with alpha and iters. Here's what I found!
## your learning rate; diverging/flip-flopping
## alpha <- 0.0001
> tail(results)
i pm pb m b
95 95 -3.842565e+190 -1.167811e+187 3.801319e+186 1.155276e+183
96 96 3.541406e+192 1.076285e+189 -3.503393e+188 -1.064732e+185
97 97 -3.263851e+194 -9.919315e+190 3.228817e+190 9.812842e+186
98 98 3.008048e+196 9.141894e+192 -2.975760e+192 -9.043766e+188
99 99 -2.772294e+198 -8.425404e+194 2.742537e+194 8.334966e+190
100 100 2.555018e+200 7.765068e+196 -2.527592e+196 -7.681718e+192
## 1/10 as big; still diverging!
## alpha <- 0.00001
> tail(results)
i pm pb m b
95 95 -2.453089e+92 -7.455293e+88 2.189776e+87 6.655047e+83
96 96 2.040052e+93 6.200012e+89 -1.821074e+88 -5.534508e+84
97 97 -1.696559e+94 -5.156089e+90 1.514452e+89 4.602638e+85
98 98 1.410902e+95 4.287936e+91 -1.259457e+90 -3.827672e+86
99 99 -1.173342e+96 -3.565957e+92 1.047397e+91 3.183190e+87
100 100 9.757815e+96 2.965541e+93 -8.710418e+91 -2.647222e+88
## even smaller; that's better!
## alpha <- 0.000001
> tail(results)
i pm pb m b
95 95 -0.01579109 51.95899 0.8856351 -0.004667159
96 96 -0.01579107 51.95894 0.8856352 -0.004719118
97 97 -0.01579106 51.95889 0.8856352 -0.004771077
98 98 -0.01579104 51.95885 0.8856352 -0.004823036
99 99 -0.01579103 51.95880 0.8856352 -0.004874995
100 100 -0.01579102 51.95875 0.8856352 -0.004926953
With this final result, I plotted the fit, which looks reasonable:
p <- ggplot(data, aes(x = x, y = y)) + geom_point()
p <- p + geom_abline(intercept = b, slope = m)
print(p)
So, to wrap up:
I didn't verify/check your python code
I did implement my understanding of gradient descent in R and try with a test to verify behavior
I re-tried this with your actual data to find it appears to work
thus, my recommendation would be to re-try your method with simplified data (sounds like you already might have) and then look at the initial steps with a very small learning rate to see if that fixes it. If not, there may still be something wrong with your math?
Hope that helps!
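If it helps, here is a rough Python translation of the R loop above (a sketch only; fit_line is my own naming and I haven't run it against your class):
import numpy as np

def fit_line(x, y, alpha=0.000001, iters=100):
    # Batch gradient descent for y = m*x + b, same update rule as the R code.
    m, b, n = 0.0, 0.0, len(x)
    for _ in range(iters):
        y_hat = m * x + b
        pm = (1 / n) * np.sum((y_hat - y) * x)  # partial derivative w.r.t. m
        pb = (1 / n) * np.sum(y_hat - y)        # partial derivative w.r.t. b
        m -= alpha * pm
        b -= alpha * pb
    return m, b

Called with numpy arrays, e.g. m, b = fit_line(dataframe['Brain'].values, dataframe['Body'].values).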

Python3 Extracting lines between first instance of two markers

I imported a text file from a URL and want to process it. The file looks as below. There are two instances each of " innings " and "Extras ". I want to extract the lines between the FIRST instance of " innings " and the FIRST instance of "Extras ". The code that I wrote extracts ALL such blocks. How do I resolve this?
Toss: Sri Lanka Umpires: M Erasmus (South Africa) and NJ Llong
(England) TV umpire: S Ravi (India) Match referee: DC Boon
(Australia) Reserve umpire: SD Fry (Australia) Player of the
match: CJ Anderson New Zealand innings (50 overs maximum)
R M B 4 6 MJ Guptill c Sangakkara b Lakmal
49 94 62 5 0 CJ Anderson c Lakmal b
Kulasekara 75 77 46 8 2
+L Ronchi not out 29 29 19 4 0
Extras (lb 2, w 8, nb 3) 13 Total (6 wickets, 50 overs, 226 mins) 331
Sri Lanka innings (target: 332 runs from 50 overs) R
M B 4 6 HDRL Thirimanne b Boult
65 90 60 8 0 RAS Lakmal not out
7 21 17 0 0
Extras (w 10, nb 1) 11 Total (all out, 46.1 overs, 210 mins) 233
Here is my code:
flag = 1
for line in data:
    if " innings " in line:
        flag = 0
        print('')
    if line.startswith("Extras "):
        flag = 1
        print('')
    if not flag and not " innings " in line:
        print(line)
Your program must stop on the first occurrence of Extras:
active = False  # A variable `flag` is not very precisely named;
                # better to call it `active`, make it boolean,
                # and flip the values
for line in data:
    if " innings " in line:
        active = True  # now we want to do things
        print('')
        continue       # but not in this loop
    if line.startswith("Extras "):
        print('')
        break  # now we're done!
        # alternative method:
        # active = False
    if active:
        print(line)
If you want to store all occurrences:
active = False
stored = []
for line in data:
    if " innings " in line:
        tmp = []
        active = True  # now we want to do things
        continue       # but not in this loop
    if line.startswith("Extras "):
        stored.append(tmp)
        active = False
        continue
    if active:
        tmp.append(line)
You'll end up with a list of lists of lines for further processing.
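For example, assuming data is a list of lines (say from open('scorecard.txt').read().splitlines(), filename mine):
for i, block in enumerate(stored, start=1):
    print('innings', i, 'has', len(block), 'lines')
stored[0] then holds exactly the lines between the first " innings " marker and the first "Extras " line.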
