Set in python replacing number 10 from a file with 0,1 - python-3.x

I have a file with the line: 1 2 3 4 5 10. When I add this line to a set in Python, I get {1,2,3,4,5,0} instead of {1,2,3,4,5,10}. How do I code this so that the 10 ends up inside the set instead of being recognized as a 1 and a 0?
EDIT: This was the code I wrote:
states = set()
line = open("filepath", "r").readlines()[0]
states.add(line)
print (states)
Input file content:
1 2 3 4 5 10

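A set cannot hold the same element twice. When a set is built from a string, Python iterates over the string character by character, so the 10 is seen as the characters 1 and 0: the 1 duplicates the earlier 1 and is dropped, and the 0 is added as a separate element.
To fix it, split the line into tokens first. Calling split() with no argument splits on any whitespace and also discards the trailing newline, so no separate strip is needed:
line = open("filepath", "r").readlines()[0]
tokens = line.split()  # split on whitespace -> ['1', '2', '3', '4', '5', '10']
number_set = set(tokens)  # tokens is a list after splitting, so each token is one element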
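If integers rather than strings are wanted in the set, each token can be converted after splitting. This is only a minimal sketch: the helper name parse_states is hypothetical, and it takes the line as a string so that it can be tried without a file.

```python
def parse_states(line):
    """Return the set of integers found on one whitespace-separated line."""
    # split() with no argument splits on any whitespace and ignores the
    # trailing newline, so "10" stays a single token.
    return {int(token) for token in line.split()}


states = parse_states("1 2 3 4 5 10\n")
print(states == {1, 2, 3, 4, 5, 10})  # True
```

With a real file, the same helper could be applied to the first line read from it.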

Related

Generating Binary Encoded Symbols Python Program

I have an assignment that has instructions as follows:
Write a program that reads in 4 sets of 4 dashed lines and outputs the four binary symbols that each set of four lines represents.
Input consists of 16 lines in total, each consisting of any number of dashes and spaces.
The first four lines represent a symbol, the next four lines represent the next symbol, and so on.
Print out the four binary-encoded symbols represented by the 16 lines in total.
Each binary symbol should be on its own line.
This is based upon a previous program that I wrote where input is a single line of text consisting of any number of spaces and dashes. If there is an even number of dashes in the line, output 0. Otherwise, output 1.
This is the code for the above:
line = input()
num_dashes = line.count("-")
mod = num_dashes % 2
if mod == 0:
    print("0")
else:
    print("1")
Please may someone assist me?
Thank you.
The code you have for processing one line is fine, although you could replace the if...else with just:
print(mod)
Now to extend this to multiple lines, it might be better not to call print like that, but to collect the output in a variable, and only output that variable when all 16 lines have been processed. This way the output does not get mixed with the input from the console.
So for instance, it could happen like this:
output = []
for part in range(4):  # one pass per symbol
    digits = ""
    for _ in range(4):  # four input lines per symbol
        line = input()
        num_dashes = line.count("-")
        mod = num_dashes % 2
        digits += str(mod)  # collect the digit
    output.append(digits)  # append the 4-digit symbol to the list
print("\n".join(output))  # print the symbols, one per line
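The same logic can also be kept separate from input() so it is easy to check against known lines. The following is a sketch only: the function names parity_bit and encode_symbols and the sample lines are made up for illustration.

```python
def parity_bit(line):
    """Return "1" if the line has an odd number of dashes, else "0"."""
    return str(line.count("-") % 2)


def encode_symbols(lines, group_size=4):
    """Turn each group of `group_size` lines into one binary symbol."""
    bits = [parity_bit(line) for line in lines]
    return ["".join(bits[i:i + group_size])
            for i in range(0, len(bits), group_size)]


# 16 made-up input lines: four groups of four.
sample = ["-", "--", "---", "- -",
          "", "-", " ", "--- ",
          "--", "--", "--", "--",
          "-", "-", "-", "-"]
print("\n".join(encode_symbols(sample)))
```

To use it with console input, the list could be built with [input() for _ in range(16)] before calling encode_symbols.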

How to skip N central lines when reading file?

I have an input file.txt like this:
3
2
A
4
7
B
1
9
5
2
0
I'm trying to read the file and
when A is found, print the line that is 2 lines below
when B is found, print the line that is 4 lines below
My current code and current output are like below:
with open('file.txt') as f:
    for line in f:
        if 'A' in line: ### Skip 2 lines!
            f.readline() ### Skipping one line
            line = f.readline() ### Locate on the line I want
            print(line)
        if 'B' in line: ## Skip 4 lines
            f.readline() ### Skipping one line
            f.readline() ### Skipping two lines
            f.readline() ### Skipping three lines
            line = f.readline() ### Locate on the line I want
            print(line)
'4\n'
7
'1\n'
'9\n'
'5\n'
2
>>>
It prints the values I want, but it also prints 4\n, 1\n, and so on. Besides that, I need to write several f.readline() calls, which is not practical.
Is there a better way to do this?
My expected output is like this:
7
2
Here is some much simpler code for you:
lines = open("file.txt", "r").read().splitlines()
# print(str(lines))
for i in range(len(lines)):
    if 'A' in lines[i]:
        print(lines[i + 2])  # show 2 lines down
    elif 'B' in lines[i]:
        print(lines[i + 4])  # show 4 lines down
This reads the entire file as an array in which each element is one line of the file. Then it just goes through the array and directly changes the index by 2 (for A) and 4 (for B) whenever it finds the line it is looking for.
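The index-based approach can be generalized so the markers and their offsets live in one dictionary, and so a marker near the end of the file does not raise an IndexError. This is a sketch under those assumptions; the function name lines_after_markers is hypothetical, and the list stands in for the file contents shown in the question.

```python
def lines_after_markers(lines, offsets):
    """Collect the line found `offset` lines below each marker line."""
    found = []
    for i, line in enumerate(lines):
        for marker, offset in offsets.items():
            target = i + offset
            # Skip markers whose target would fall past the end of the file.
            if marker in line and target < len(lines):
                found.append(lines[target])
    return found


# Same content as file.txt in the question, already split into lines.
sample = ["3", "2", "A", "4", "7", "B", "1", "9", "5", "2", "0"]
print("\n".join(lines_after_markers(sample, {"A": 2, "B": 4})))
```

For the sample data this prints 7 and then 2, matching the expected output in the question.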
If you don't like the repeated readline calls, wrap them in a function so the rest of the code stays clean:
def skip_ahead(it, elems):
    assert elems >= 1, "can only skip positive integer number of elements"
    for i in range(elems):
        value = next(it)
    return value
with open('file.txt') as f:
    for line in f:
        if 'A' in line:
            line = skip_ahead(f, 2)
            print(line)
        if 'B' in line:
            line = skip_ahead(f, 4)
            print(line)
As for the extra output: when the code you have provided is run as a script by a standard Python interpreter, only the print calls cause output, so there are no extra lines like '1\n'. Those extra lines are a feature of interactive contexts such as the IPython shell, which echo the value of a bare expression statement. Here f.readline() is alone on its own line, so its return value is displayed. To suppress this you can usually assign the result to a throwaway name: _ = <expr>.
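As a variation on the skip_ahead helper, itertools.islice can do the skipping without an explicit loop. This is only a sketch: io.StringIO is used here as a stand-in for the opened file so the example is self-contained, and returning None when the input runs out is a design choice of this sketch, not part of the answer above.

```python
import io
from itertools import islice


def skip_ahead(it, elems):
    """Advance the iterator `it` by `elems` items and return the last one."""
    # islice discards the first elems - 1 items; next() then returns the
    # elems-th item, or None if the iterator runs out first.
    return next(islice(it, elems - 1, None), None)


# io.StringIO stands in for the opened file.txt from the question.
f = io.StringIO("3\n2\nA\n4\n7\nB\n1\n9\n5\n2\n0\n")
found = []
for line in f:
    if 'A' in line:
        found.append(skip_ahead(f, 2))
    elif 'B' in line:
        found.append(skip_ahead(f, 4))
print(found)  # ['7\n', '2\n']
```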

Python 3.x - don't count carriage returns with len

I'm writing the following code as part of my practice:
input_file = open('/home/me/01vshort.txt', 'r')
file_content = input_file.read()
input_file.close()
file_length_question = input("Count all characters (y/n)? ")
if file_length_question in ('y', 'Y', 'yes', 'Yes', 'YES'):
    print("\n")
    print(file_content, ("\n"), len(file_content) - file_content.count(" "))
It's counting carriage returns in the output, so for the following file (01vshort.txt), I get the following terminal output:
Count all characters (y/n)? y
0
0 0
1 1 1
9
...or...
Count all characters (y/n)? y
0
00
111
9
In both cases, the answer should be 6, as there are 6 characters, but I'm getting 9 as the result.
I've made sure the code is omitting whitespace, and have tested this with my input file by deliberately adding whitespace and running the code with and without the line:
- file_content.count(" ")
Can anyone assist here as to why the result is 9 and not 6?
Perhaps it isn't carriage returns at all?
I'm also curious as to why the result of 9 is indented by 1 whitespace? The input file simply contains the following (with a blank line at the end of the file, line numbers indicated in the example):
1. 0
2. 0 0
3. 1 1 1
4.
...or...
1. 0
2. 00
3. 111
4.
Thanks.
If you want to ignore all whitespace characters, including tabs, newlines, and carriage returns:
print(sum(not c.isspace() for c in file_content))
will give you the 6 you expect.
Alternatively you can take advantage of the fact the .split() method with no argument will split a string on any whitespace character. So split it into non-space chunks and then join them all back together again without the whitespace characters:
print(len(''.join(file_content.split())))
You're getting 9 because the content of the file is effectively:
file_content = "0\n0 0\n1 1 1\n"
and you're only subtracting the spaces (file_content.count(" ")), so the three newline characters are still counted along with the six digits.
To count only the visible characters, you could either:
read the file line by line, or
use a regexp to match all whitespace.
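As for the indentation of the 9: print separates its comma-separated arguments with a single space by default (sep=' '), so a space is written after the "\n" argument and before the count.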
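The two options listed in this answer (line by line, or a regexp) could look roughly like this. It is a sketch using the sample content from the question as a string literal rather than reading the actual file; note that the line-by-line variant only removes plain spaces, so tabs would still be counted there.

```python
import re

file_content = "0\n0 0\n1 1 1\n"  # sample content from the question

# Option 1: go line by line; splitlines() drops the line endings, and the
# spaces inside each line are removed before counting.
per_line = sum(len(line.replace(" ", ""))
               for line in file_content.splitlines())

# Option 2: remove every whitespace character with a regular expression.
with_regex = len(re.sub(r"\s", "", file_content))

print(per_line, with_regex)  # 6 6
```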

Pulling a list of lines out of a string

Beginning
Line 2
Line 3
Line 4
Line 5
Line 6
End
Trying to pull off line 2 through line 6. Can't do it to save my soul.
a is the saved string I'm searching through.
b = re.findall(r'Beginning(.*?)End', a)
Doesn't give me a thing, just a blank b. I know it's because of the newlines, but how do I go about detecting them and moving past them? I've tried MULTILINE and DOTALL, without knowing exactly how I'm supposed to use them. Nothing changed.
How do I go about getting it to put lines 2 through 6 in b?
To add: this will occur multiple times in the same file, so I need to perform this search-and-pull technique repeatedly. I have no other easy way of doing this, since the information in Lines 2-6 needs to be looked through further to pull out data that will be put into a csv file. Some of the data contains hours and some doesn't (it says Unavailable instead), so I need to be able to pull out and differentiate between the two cases.
string = """Beginning
Line 2
Line 3
Line 4
Line 5
Line 6
End
"""
lines = string.splitlines()
answer = []
flag = False
for line in lines:
    line = line.strip()
    if not line: continue
    if line == "Beginning":
        flag = True
        continue
    if line == "End": flag = False
    if not flag: continue
    answer.append(line)
Output:
In [209]: answer
Out[209]: ['Line 2', 'Line 3', 'Line 4', 'Line 5', 'Line 6']
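Since the question mentions DOTALL and blocks that occur several times in one file, here is a hedged sketch of the regex route. It assumes each block starts with a line reading Beginning and ends with a line reading End; the sample text, including the Unavailable block, is made up for illustration.

```python
import re

# Made-up sample with two blocks; the real data may look different.
text = """Beginning
Line 2
Line 3
End
some other text
Beginning
Unavailable
End
"""

# re.DOTALL lets "." match newlines too, so the lazy group (.*?) can span
# several lines and stops at the first "End" that follows a newline.
matches = re.findall(r"Beginning\n(.*?)\nEnd", text, flags=re.DOTALL)

# Split every captured block back into its individual lines.
blocks = [block.splitlines() for block in matches]
print(blocks)  # [['Line 2', 'Line 3'], ['Unavailable']]
```

Each inner list can then be inspected separately, for example to tell a block containing hours from one that only says Unavailable.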
You could make a function that takes a multi-line string, then a starting line number and an ending line number (both counted from 1, inclusive).
def Function(string, starting_line, ending_line):
    if "\n" not in string:  # Checks whether or not string is multi-line
        return "The string given isn't a multiline string!"
    if ending_line < starting_line:  # Checks that the range is not reversed
        return "ending_line is less than starting_line!"
    lines = string.splitlines()  # Splits the string into a list of lines
    # Line numbers are 1-based, so shift the start by one for slicing
    return "\n".join(lines[starting_line - 1:ending_line])
print(Function(a, 2, 6))
With the string from the question, this code should return:
Line 2
Line 3
Line 4
Line 5
Line 6

unix - automatically determine field separator and record (EOL) separator?

Say you have 20 files and you don't want to look at each one, but would rather have a script determine the format of each file,
i.e. bash findFileFormat direcName
It then loops through each file in a directory and prints out the filename, plus whether it has a delimiter (in which case, is it a comma, pipe, or something else) or fixed width for the field separator, and then what the record separator is, e.g. CR, LF, Ctrl+Z character, etc.
I was thinking that, because some files may have a lot of pipes and commas in the data, it could use a count of each character per line to determine what the delimiter is: if this process does not produce a consistent count of the character per line, it is safe to assume that the file uses fixed-width fields.
Is there a command or script that can be used to determine these 2 bits of info for each file?
Here's a small python script that will do as a starting point for what you need:
import sys
from functools import reduce  # reduce is not a builtin in Python 3
separators = [',', '|']
file_name = sys.argv[1]
def sep_cnt(line):
    # count each candidate separator on one line
    return {sep: line.count(sep) for sep in separators}
with open(file_name, 'r') as inf:
    lines = inf.readlines()
cnts = [sep_cnt(line) for line in lines]
print(cnts)
def cnts_red(a, b):
    # keep only separators whose non-zero count is identical in a and b
    c = {}
    for k, v in a.items():
        if v > 0 and v == b[k]:
            c[k] = v
    return c
final = reduce(cnts_red, cnts[1:], cnts[0])
if len(final) == 0:
    ftype = 'fixed'
else:
    ftype = 'sep by ' + str(next(iter(final)))
print(ftype)
Name the above heur_sep.py and run this somewhere safe (e.g. /tmp):
# Prepare
rm *.txt
# Commas
cat >f1.txt <<e
a,a,a,a
b,b,b,b
c,c,c,c
e
# Pipes
cat >f2.txt <<e
a|a|a|a
b|b|b|b
c|c|c|c
e
# Fixed width
cat >f3.txt <<e
1 2 3
1 2 3
1 2 3
e
# Fixed width with commas
cat >f4.txt <<e
1, 2 3
1 2, 3
1 2, 3,
e
for i in *.txt; do
    echo "--- $i"
    python heur_sep.py "$i"
done
You would have to do some more work to make this resistant to different kinds of errors, but should be a good starting point. Hope this helps.
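The script above only looks at the field separator. For the record separator part of the question, one possible heuristic is to read the file in binary mode and count the line-ending byte sequences. The following is a sketch, not a robust detector: the function name detect_record_separator is hypothetical, and it only distinguishes CRLF, LF, and CR.

```python
def detect_record_separator(data):
    """Guess the dominant line ending in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair also contains one LF and one CR, so subtract those
    # to count only the lone occurrences.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    counts = {"CRLF": crlf, "LF": lf, "CR": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


print(detect_record_separator(b"a,a,a\r\nb,b,b\r\n"))  # CRLF
print(detect_record_separator(b"a|a|a\nb|b|b\n"))      # LF
```

For a real file, the bytes could be obtained with open(file_name, 'rb').read() and passed to the function.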
