So while reading a CSV file into Python, some of the variables have the following structure:
'"variable"'
I stored them in a list of tuples.
Now, some of these variables have to be compared to each other, as they are numeric.
But I can't seem to find a way to compare them. For example:
counter = 0
if '"120000"' < '"130000"':
    counter += 1
However, the counter remains at 0.
Any advice on how to work with these kinds of data structures?
I tried converting them to integers, but this gives me a ValueError.
The original file has the following layout:
Date,"string","string","string","string","integer"
I read the file as follows:
with open(dataset, mode="r") as flight_information:
    flight_information_header = flight_information.readline()
    flight_information = flight_information.read()
    flight_information = flight_information.splitlines()

flight_information_list = []
for lines in flight_information:
    lines = lines.split(",")
    flight_information_tuple = tuple(lines)
    flight_information_list.append(flight_information_tuple)
For people in the future, the following solved my problem:
Since tuples are immutable, I now remove the double quotes around my numerical values while loading the CSV file:
Example:
with open(dataset, mode="r") as flight_information:
    flight_information_header = flight_information.readline()
    flight_information = flight_information.read()
    flight_information = flight_information.splitlines()

flight_information_list = []
for lines in flight_information:
    lines = lines.replace('"', '').split(",")
    flight_information_tuple = tuple(lines)
    flight_information_list.append(flight_information_tuple)
Note this line in particular:
lines = lines.replace('"', '').split(",")
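For the record, the csv module handles this quoting for you, so the numeric column can then be converted with int() directly. A minimal sketch, assuming the same file layout and the dataset variable from above:

import csv

with open(dataset, mode="r", newline="") as flight_information:
    reader = csv.reader(flight_information)
    flight_information_header = next(reader)          # header row as a list
    flight_information_list = [tuple(row) for row in reader]

# The quotes are stripped by csv.reader, so a numeric comparison now works, e.g.:
# int(flight_information_list[0][-1]) < int(flight_information_list[1][-1])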
I have a binary file and limited knowledge of the structure of the file. I'd like to unpack the contents of the file, make a change to a value, and then re-pack the modified contents into a new binary file. If I can complete the unpacking successfully, I certainly can modify one of the values; and then I believe I will be able to handle the re-packing to create a new binary file. However, I am having trouble completing the unpacking. This is what I have so far
import struct

image = None
one = two = three = four = five = 0
with open(my_file, 'rb') as fil:
    one = struct.unpack('i', fil.read(4))[0]
    two = struct.unpack('i', fil.read(4))[0]
    three = struct.unpack('d', fil.read(8))[0]
    four = struct.unpack('d', fil.read(8))[0]
    five = struct.unpack('iiii', fil.read(16))
    image = fil.read(920)
When I set a breakpoint below the section of code displayed above, I can see that the type of the image variable above is <class 'bytes'>. The type of fil is <class 'io.BufferedReader'>. How can I unpack the data in this image variable?
The recommendation from @Stanislav directly led me to the solution to this problem. Ultimately, I did not need struct pack/unpack to reach my goal. The code below roughly illustrates the solution.
with open(my_file, 'rb') as fil:
    data = bytearray(fil.read())

mylist = list(data)
mylist[8] = mylist[8] + 2    # modify some fields
mylist[9] = mylist[9] + 2
mylist[16] = mylist[16] + 3
data = bytearray(mylist)

another_file = open("other_file.bin", "wb")
another_file.write(data)
another_file.close()
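If struct were still needed (for example, to modify a multi-byte field rather than a single byte), a bytearray also supports in-place editing via struct.unpack_from and struct.pack_into. A rough sketch, with the offsets assumed from the original 'i i d d iiii' layout:

import struct

with open(my_file, 'rb') as fil:
    data = bytearray(fil.read())

one = struct.unpack_from('i', data, 0)[0]   # first int lives at offset 0
struct.pack_into('i', data, 0, one + 1)     # write a modified value back in place

with open("other_file.bin", "wb") as another_file:
    another_file.write(data)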
I am developing a program which works with a semicolon-separated CSV.
When I try to execute the following code
def accomodate(fil, targets):
    l = fil
    io = []
    ret = []
    for e in range(len(l)):
        io.append(l[e].split(";"))
    for e in io:
        ter = []
        for theta in range(len(e)):
            if targets.count(theta) > 0:
                ter.append(e[theta])
        ret.append(ter)
    return ret
where fil is the list of rows read from the CSV file and targets is a list containing the indices of the columns to be chosen. When the split is applied to the CSV file, it raises the following error: "name 'l' is not defined", even though, as far as I can see, the variable l has already been defined.
Does anyone know why this happens? Thanks in advance.
edit
As many of you have requested, I shall provide an example.
I shall post an example CSV, not a shard of the original one. It comes already as a list:
k = ["Cookies;Brioche;Pudding;Pie","Dog;Cat;Bird;Fish","Boat;Car;Plane;Skate"]
accomodate(k, [1,2]) = [[Brioche, Pudding], [Cat, Bird], [Car, Plane]]
You should copy the content of the fil list:
l = fil.copy()
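As a side note, the whole function can be written more compactly with list comprehensions. This is only a sketch based on the example above, not the original code:

def accomodate(fil, targets):
    # keep, for every ';'-separated row, only the columns whose index is in targets
    rows = [line.split(";") for line in fil]
    return [[row[i] for i in targets] for row in rows]

k = ["Cookies;Brioche;Pudding;Pie", "Dog;Cat;Bird;Fish", "Boat;Car;Plane;Skate"]
print(accomodate(k, [1, 2]))  # [['Brioche', 'Pudding'], ['Cat', 'Bird'], ['Car', 'Plane']]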
I am currently trying to compare two text files, to see if they have any words in common.
The text files are as follows:
ENGLISH.TXT
circle
table
year
competition
FRENCH.TXT
bien
competition
merci
air
table
My current code gets them to print; I've removed all the unnecessary curly brackets and so on, but I can't get them to print on different lines.
List = open("english.txt").readlines()
List2 = open("french.txt").readlines()
anb = set(List) & set(List2)
anb = str(anb)
anb = (str(anb)[1:-1])
anb = anb.replace("'","")
anb = anb.replace(",","")
anb = anb.replace('\\n',"")
print(anb)
The output is expected to separate both results onto new lines.
Currently Happening:
Competition Table
Expected:
Competition
Table
Thanks in advance!
- Xphoon
Hi, I'd suggest you try two things as good practice:
1) Use "with" for opening files
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    ## your Python operations for the files
2) Try to use the "f-String" opportunity if you're using Python 3:
print(f"Hello\nWorld!")
File read using "open()" vs "with open()"
This post explains very well why to use the "with" statement :)
And additionally to the f-strings, if you want to print out variables, do it like this:
print(f"{variable[index]}\n{variable2[index2]}")
This should print out:
Hello and World! on separate lines
Here is one solution including converting between sets and lists:
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    english_words = englishfile.readlines()
    english_words = [word.strip('\n') for word in english_words]
    french_words = frenchfile.readlines()
    french_words = [word.strip('\n') for word in french_words]
    anb = set(english_words) & set(french_words)
    anb_list = [item for item in anb]
    for item in anb_list:
        print(item)
Here is another solution by keeping the words in lists:
with open('english.txt', 'r') as englishfile, open('french.txt', 'r') as frenchfile:
    english_words = englishfile.readlines()
    english_words = [word.strip('\n') for word in english_words]
    french_words = frenchfile.readlines()
    french_words = [word.strip('\n') for word in french_words]
    for english_word in english_words:
        for french_word in french_words:
            if english_word == french_word:
                print(english_word)
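For completeness, the same result can be reached in a few lines by building the sets while reading and joining with newlines. This is a sketch assuming the same file names as above:

with open('english.txt') as englishfile, open('french.txt') as frenchfile:
    common = {line.strip() for line in englishfile} & {line.strip() for line in frenchfile}

print('\n'.join(sorted(common)))  # one common word per line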
I'm getting completely confused with string encodings in Python. I read a number of other answers, but none show what is really going on in the last three lines of the code below:
filename = "/path/to/file.txt" #textfile contains only the string "\bigcommand"
with open(filename,'r') as f:
file = list(f)
val = file[0] #val = '\\bigcommand\n'
valnew = val.encode('unicode-escape') #valnew = b'\\\\bigcommand\\n'
valnewnew = str(valnew,'utf-8') #valnewnew = '\\\\bigcommand\\n'
Why is the valnew variable suddenly a bytestring? I thought it would be the same as before - but just with the escape characters doubled?
Is there a shorter way to do this, than the convoluted last three lines, in order to get the output of valnewnew?
This will get you the output of valnewnew:
val = file[0].encode('unicode-escape').decode()
with open('t', 'r') as f:
    file = list(f)
    val = file[0].encode('unicode-escape').decode()  # value: '\\\\bigcommand\\n'
When you encode a string in Python 3.x, you encode it into bytes, which then need to be decoded to get a string back.
If you give some insight into what you're trying to do, I can expand on this.
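To illustrate the round trip without any file handling, here is a small sketch with the value hard-coded (the string itself is taken from the question):

s = '\\bigcommand\n'                  # one backslash, "bigcommand", then a newline
b = s.encode('unicode-escape')        # bytes: b'\\\\bigcommand\\n'
print(type(b))                        # <class 'bytes'>
print(repr(b.decode()))               # '\\\\bigcommand\\n' (a str again)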
I tried to digest lines of a DictReader object after reading in a 60 MB CSV file. I asked the question here: how to chunk a csv (dict)reader object in python 3.2? (Code repeated below.)
Now I realize that chunking up the original text file might do the trick just as well (and do the DictReader and the line-by-line digest later on). However, I found no I/O tool that multiprocessing.Pool could use.
Thanks for any thoughts!
import csv
import time
from multiprocessing import Pool

import networkx as nx

source = open('/scratch/data.txt', 'r')

def csv2nodes(r):
    strptime = time.strptime
    mktime = time.mktime
    l = []
    ppl = set()
    for row in r:
        cell = int(row['cell'])
        id = int(row['seq_ei'])
        st = mktime(strptime(row['dat_deb_occupation'], '%d/%m/%Y'))
        ed = mktime(strptime(row['dat_fin_occupation'], '%d/%m/%Y'))
        # collect list
        l.append([(id, cell, {1: st, 2: ed})])
        # collect separate sets
        ppl.add(id)
    return (l, ppl)

def csv2graph(source):
    r = csv.DictReader(source, delimiter=',')
    MG = nx.MultiGraph()
    l = []
    ppl = set()
    # Remember that I use integers for edge attributes, to save space! Dict above.
    # start: 1
    # end: 2
    p = Pool(processes=4)
    node_divisor = len(p._pool) * 4
    # chunks() is assumed to be a helper (from the linked question) that slices a sequence into pieces
    node_chunks = list(chunks(r, int(len(r) / int(node_divisor))))
    num_chunks = len(node_chunks)
    pedgelists = p.map(csv2nodes,
                       zip(node_chunks))
    ll = []
    for l in pedgelists:
        ll.append(l[0])
        ppl.update(l[1])
    MG.add_edges_from(ll)
    return (MG, ppl)
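Since the goal is just to split the raw text file before the DictReader step, one way is to read the lines once and slice the list into chunks for Pool.map. This is only a sketch; the chunk size, the digest worker, and the division of labor are placeholders, not the original code:

from multiprocessing import Pool

def chunks(seq, size):
    # yield successive slices of roughly equal length
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def digest(lines):
    # placeholder worker: parse/digest one chunk of raw CSV lines
    return len(lines)

if __name__ == '__main__':
    with open('/scratch/data.txt') as source:
        header = source.readline()
        lines = source.readlines()
    pool = Pool(processes=4)
    chunk_size = max(1, len(lines) // 16)   # 4 processes * 4 chunks each
    results = pool.map(digest, list(chunks(lines, chunk_size)))
    pool.close()
    pool.join()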