I would like to write strings to a binary file as a header.
However, I can only write type 'bytes' to the binary file.
Here is my code:
header1 = str.encode("1\n")
header2 = str.encode("2\n")
print(type(header1))
with open("abc.bin", 'wb') as f_test:
    f_test.write(header1)
    f_test.write(header2)
Here are my questions:
1. When I open abc.bin in Notepad, I can see "1" and "2", but they are not on separate lines. Why does \n seem to have no effect?
2. In the .bin file, what is the format of "1" and "2"? Are they strings?
3. I tried pickle and marshal too. However, when I opened the .bin file, I found something in front of "1" and "2" (for example, marshal.dump(header1, f_test) gave me ?1?2). What are these '?' characters and where do they come from?
This is not originally from me; I got the solution from a comment on this post:
https://pythonconquerstheuniverse.wordpress.com/2011/05/08/newline-conversion-in-python-3/
To sum up, the newline needs to be written as a byte, i.e. b"\n".
If you try the following, the output will contain a new line:
header1 = str.encode("1")
header2 = str.encode("2")
print(type(header1))
with open("abc.bin", 'wb') as f_test:
    f_test.write(header1 + b"\n")
    f_test.write(header2 + b"\n")
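For what it's worth, str.encode("1\n") and b"1\n" produce the same bytes, so the line feed does reach the file in 'wb' mode either way. A likely reason it stays invisible is that older versions of Windows Notepad only break lines on \r\n; that is an assumption about the viewer, not something the question confirms. The sketch below also hints at question 3: pickle wraps the payload in its own framing bytes, which a text editor shows as unprintable characters. The temporary file location is made up for illustration.

```python
import os
import pickle
import tempfile

header1 = "1\n".encode()   # same bytes as b"1\n"
header2 = "2\n".encode()

# Hypothetical location for the demo file
path = os.path.join(tempfile.mkdtemp(), "abc.bin")
with open(path, "wb") as f_test:
    f_test.write(header1)
    f_test.write(header2)

with open(path, "rb") as f_test:
    raw = f_test.read()    # binary mode: bytes come back untranslated

# Windows-style line endings, which older Notepad versions expect (assumption)
crlf = raw.replace(b"\n", b"\r\n")

# pickle adds protocol and length bytes around the payload
pickled = pickle.dumps(header1)

print(raw, crlf, pickled)
```

In the file itself "1" and "2" are simply bytes (here the ASCII codes 0x31 and 0x32); whether they count as "strings" depends on how the reader decodes them.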
I'm trying to write a simple header line in Intel Fortran (containing commas as actual content) to an Excel csv. What I'd like to see in the first two columns is:
FMG(1,1) FMG(2,1)
Enclosing each term in quotes "FMG(i,j)" worked when I did it line by line:
Code: write (*,*) "FMG(1,1), kg/s (O2): ", FMG(1,1)
Output: FMG(1,1), kg/s (O2): 0.129000000000000
Some of the things I've tried include:
code: write (10,*) "FMG(1,1)","FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1)
code: write (10,*) "FMG(1,1)" , "FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1) (same thing)
code: write (10,*) " FMG(1,1)," "FMG(2,1)"
csv column output: FMG(1 1) FMG(2,1)
(this got the second one right)
CSV stands for Comma-Separated Values. If you output "FMG(1,1),FMG(1,2)" without quoting, the text is split at every comma and you get
FMG(1
1)
FMG(1
2)
which is what you are seeing. To include the commas, the strings need to be enclosed in quotes. If you write
write (10,*) '"FMG(1,1)","FMG(2,1)"'
it might achieve what you are looking for.
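The quoting rule itself is not specific to Fortran. As a rough illustration in Python (used here only to demonstrate the CSV convention, not as a replacement for the Fortran write statement), the standard csv module quotes any field that contains the delimiter, and a reader then recovers two fields instead of four:

```python
import csv
import io

buf = io.StringIO()
# Default quoting is QUOTE_MINIMAL: only fields that need quotes get them
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["FMG(1,1)", "FMG(2,1)"])
line = buf.getvalue()

# Reading the line back yields two fields, commas intact
fields = next(csv.reader(io.StringIO(line)))

print(line, fields)
```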
I have a csv file named Qid-NamedEntityMapping.csv having data like this:
Q1000070 b'Myron V. George'
Q1000296 b'Fred (footballer, born 1979)'
Q1000799 b'Herbert Greenfield'
Q1000841 b'Stephen A. Northway'
Q1001203 b'Buddy Greco'
Q100122 b'Kurt Kreuger'
Q1001240 b'Buddy Lester'
Q1001867 b'Fyodor Stravinsky'
The second column is 'ascii' encoded, and when I read the file using the following code, it is still not read properly:
import chardet
import pandas as pd
def find_encoding(fname):
    r_file = open(fname, 'rb').read()
    result = chardet.detect(r_file)
    charenc = result['encoding']
    return charenc
my_encoding = find_encoding('datasets/KGfacts/Qid-NamedEntityMapping.csv')
df = pd.read_csv('datasets/KGfacts/Qid-NamedEntityMapping.csv', error_bad_lines=False, encoding=my_encoding)
But the output looks like this:
I also tried encoding='UTF-8', but the output is still the same.
What can be done to read it properly?
Looks like you have an improperly saved TSV file. Once you circumvent the TAB problem (as suggested in my comment), you can convert the column with names to a more suitable representation.
Let's assume that the second column of the dataframe is called "names". The b'XXX' thing is probably a bytes [mis]representation of a string. Convert it to a bytes object with ast.literal_eval and then decode to a string:
import ast
df["names"].apply(ast.literal_eval).apply(bytes.decode)
#0 Myron...
#1 Fred...
Last but not least, your problem has almost nothing to do with encodings or charsets.
Your issue looks like the CSV is actually tab separated; so you need to have sep='\t' in the read_csv function. It's reading everything else as a single column, except "born 1979" in the first row, as that is the only cell with a comma in it.
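Both suggestions (split on TAB, then turn the b'...' text back into a real string) can be sketched with the standard library alone. This is a minimal sketch that assumes the two columns really are separated by a TAB character; the sample rows are copied from the question. With pandas, the delimiter step corresponds to passing sep='\t' to read_csv.

```python
import ast
import csv
import io

# Two rows from the question, with an assumed TAB between the columns
sample = (
    "Q1000070\tb'Myron V. George'\n"
    "Q1000296\tb'Fred (footballer, born 1979)'\n"
)

# Splitting on TAB keeps the comma inside the second name intact
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# literal_eval turns the text "b'...'" into a bytes object; decode gives str
names = [ast.literal_eval(name).decode("ascii") for _qid, name in rows]

print(names)
```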
I have a program that I created with two sections.
The first one copies a text file with an integer in the middle of the file name in this format.
file = "Filename" + "str(int)" + ".txt"
The user can create as many copies of the file as they would like.
The second part of the program is what I am having the problem with. There is an integer at the very bottom of the file that is supposed to correspond to the integer in the file name. After the first part is done, I open each file one at a time in "r+" read/write mode so that I can file.seek(1000) to roughly where the integer is in the file.
In my opinion the next part should be easy: I should simply have to write str(int) into the file at that position. It worked fine that way on Linux at home, but at work on Windows it proved difficult. What I ended up having to do after file.seek(1000) is write to the file using Unicode UTF-8, which I accomplished with the code snippet below (taken from the rest of the program and commented so that it is clear what is going on). Instead of writing this in Unicode, I would love to be able to write it in plain ASCII characters. Eventually this program will be expanded to include a lot more data at the bottom of each file, and having to write that data in Unicode is going to make things extremely difficult. If I write the data without turning it into Unicode, the string that is supposed to say #2 =1534 instead says #2 =ㄠ㌵433.
If someone can show me what I am doing wrong that would be great. I would love to just use something like file.write('1534') to write the data to the file instead of having to do it in Unicode UTF-8.
while a1 < d1 :
    file = "file" + str(a1) + ".par"
    f = open(file, "r+")
    f.seek(1011)
    data = f.read() #reads the data from that point in the file into a variable.
    numList= list(str(a1)) # "a1" is the integer in the file name. I had to turn the integer into a list to accomplish the next task.
    replaceData = '\x00' + numList[0] + '\x00' + numList[1] + '\x00' + numList[2] + '\x00' + numList[3] + '\x00' #This line turns the integer into Utf 8 Unicode. I am by no means a Unicode expert.
    currentData = data #probably didn't need to be done now that I'm looking at this.
    data = data.replace(currentData, replaceData) #replaces the Utf 8 string in the "data" variable with the new Utf 8 string in "replaceData."
    f.seek(1011) # Return to where I need to be in the file to write the data.
    f.write(data) # Write the new Unicode data to the file
    f.close() #close the file
    f.close() #make sure the file is closed (sometimes it seems that this fails in Windows.)
    a1 += 1 #advances the integer, and then return to the top of the loop
This is an example of writing to a file in ASCII. You need to open the file in byte mode, and using the .encode method for strings is a convenient way to get the end result you want.
s = '12345'
ascii = s.encode('ascii')
with open('somefile', 'wb') as f:
    f.write(ascii)
You can obviously also open in rb+ (read and write byte mode) in your case if the file already exists.
with open('somefile', 'rb+') as f:
    existing = f.read()
    f.write(b'ascii without encoding!')
You can also just pass string literals with the b prefix, and they will be encoded with ascii as shown in the second example.
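If the target file is plain ASCII, the byte-mode approach above is all that is needed. The garbled output in the question and the NUL-interleaved workaround are, however, consistent with a file stored as UTF-16 rather than UTF-8. That is an inference, not something the question confirms. Under that assumption, a minimal sketch of patching the number in place looks like this; the file name, the prefix and the ENCODING constant are hypothetical stand-ins, and the offset is computed here rather than hard-coded as in the question.

```python
import os
import tempfile

ENCODING = "utf-16-le"  # assumption: inferred from the NUL bytes, not confirmed

# Build a small stand-in file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "file1534.par")  # hypothetical name
prefix = "#2 ="
with open(path, "wb") as f:
    f.write((prefix + "0000").encode(ENCODING))

# Byte offset where the number starts (two bytes per character in UTF-16)
offset = len(prefix.encode(ENCODING))

with open(path, "rb+") as f:
    f.seek(offset)
    # Same number of characters as before, so this overwrites in place
    f.write(str(1534).encode(ENCODING))

with open(path, "rb") as f:
    patched = f.read().decode(ENCODING)

print(patched)
```

Letting str.encode produce the bytes avoids building the '\x00' padding by hand, and switching ENCODING to "ascii" gives the plain single-byte behaviour if the file turns out not to be UTF-16.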
I have written a script to convert the delimiter in a csv file from comma to pipe, but it does not remove the extra quotation marks added by the csv format.
The script is as follows:
import csv
filename = "sample.csv"
with open(filename,mode='rU') as fin,open('c:\\files\\sample.txt',mode='w') as fout:
    reader= csv.DictReader(fin)
    writer = csv.DictWriter(fout,reader.fieldnames,delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
case 1:
For example, if one of the fields in the csv contains "hi" hows you,good, then the csv stores it as """hi" hows you,good"" and Python writes it as """hi" hows you,good"" to the text file instead of "hi" hows you,good
case 2:
Whereas a field like hi hows,you is stored by the csv as "hi hows,you", and after running the script it is saved as hi hows,you in the text file, which is correct.
Please could you help me to solve case 1.
example csv file when you open it in notepad:-
ID,IDN,DESC,TNO
A019,1,"""Pins "" is dangerous",2
B020,1,"""ache"",headache/fever-like",3
C021,2,stomach cancer,1
D231,3,"hair,""fall""",1
after script result:
ID|IDN|DESC|TNO
A019|1|"""Pins "" is dangerous"|2
B020|1|"""ache"",headache/fever-like"|3
C021|2|stomach cancer|1
D231|3|"hair,""fall"""|1
i want the result as :
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1
This works:
writer = csv.DictWriter(fout,reader.fieldnames,delimiter='|',quoting=csv.QUOTE_NONE,quotechar="")
defining the quoting as "no quoting": quoting=csv.QUOTE_NONE
defining the quote char as "no quote char": quotechar=""
result
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1
Note that quoting is useful, so disabling it exposes you to the joy of "delimiters in the fields". It's up to you to make sure that does not happen.
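A self-contained way to try this out on the sample data, using in-memory buffers instead of the files from the question. One deliberate difference: the sketch passes quotechar=None rather than quotechar="", because as far as I can tell recent Python 3 releases reject an empty string there; treat that as an assumption about your Python version. The lineterminator="\n" argument is only there to keep the output easy to compare.

```python
import csv
import io

source = (
    'ID,IDN,DESC,TNO\n'
    'A019,1,"""Pins "" is dangerous",2\n'
    'B020,1,"""ache"",headache/fever-like",3\n'
    'C021,2,stomach cancer,1\n'
    'D231,3,"hair,""fall""",1\n'
)

fin = io.StringIO(source)
fout = io.StringIO()

reader = csv.DictReader(fin)
writer = csv.DictWriter(
    fout,
    reader.fieldnames,
    delimiter='|',
    quoting=csv.QUOTE_NONE,  # never add quotes on output
    quotechar=None,          # so embedded " is written as-is, not escaped
    lineterminator='\n',
)
writer.writeheader()
writer.writerows(reader)

result = fout.getvalue().splitlines()
print("\n".join(result))
```

If any field ever contains a pipe character, the writer will raise an error because no escapechar is set, which is the risk the note above refers to.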
I have this code:
f=open('myfile.txt','r')
name=[]
for line in f:
    name.append(line)
for i in range (len(name)):
    print("hola"+name[i]+".txt".format((name[i]).strip("\r\n")))
myfile has two rows separated by newline, like this:
Da
Df
And I would like this output:
holaDa.txt
holaDf.txt
But instead, I have this:
holaD4I5M4
.txt
holaD4i5J8.txt
I tried several things to avoid the newline before the ".", but nothing seems to work.
Thanks for your help! I am very new to Python, sorry!
You're stripping the string correctly, but instead of outputting the stripped value, you're outputting the original value. In your version, .format() is applied only to the literal ".txt", which has no placeholder, so the stripped result is discarded. Using the "hola{0}.txt" format string instead of concatenating "hola"+name[i]+".txt" will output the correct, stripped string:
for i in range (len(name)):
    print("hola{0}.txt".format((name[i]).strip("\r\n")))
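The same fix can be written without the intermediate list or the index loop. A minimal sketch, in which the helper name hola_names and the temporary demo file are made up for illustration:

```python
import os
import tempfile


def hola_names(path):
    """Return 'hola<name>.txt' for every line of the file at `path`."""
    with open(path, "r") as f:
        # strip("\r\n") removes the trailing line ending before formatting
        return ["hola{0}.txt".format(line.strip("\r\n")) for line in f]


# Demo file with the two rows from the question
demo_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(demo_path, "w") as f:
    f.write("Da\nDf\n")

for filename in hola_names(demo_path):
    print(filename)
```

Using a with block also closes the file automatically, which the original loop never does.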