Python: Keep trailing zeroes while concatenating two hex strings - python-3.x

I have a csv file which contains data in two columns. The data is in decimal format.
I am trying to convert the data into hexadecimal format and then concatenate it.
I am able to convert and concatenate when the data in Column2 is non-zero.
For example: Column1 = 52281 and Column2 = 49152 gives me CC39C000 (hex(52281) = CC39 and hex(49152) = C000).
However, if the data in Column2 is zero:
Column1 = 52281 and Column2 = 0 gives me CC390 instead of CC390000.
Following is my code snippet:
import csv

file = open(inputFile, 'r')
reader = csv.reader(file)
for line in reader:
    col1, col2 = int(line[0]), int(line[1])
    newstr = '{:x}'.format(col1) + '{:x}'.format(col2)
When the data in Column2 is 0, I am expecting to get 0000.
How can I modify my code to achieve this?

If you have
a = 52281
b = 0
you can convert to hex and calculate the longest string to fill with zeros:
hex_a = hex(a)[2:]  # [2:] removes the leading 0x; you might want [3:] if you have negative numbers
hex_b = hex(b)[2:]
longest = max(len(hex_a), len(hex_b))
Then you can pad with zeros using the zfill method:
print(hex_a.zfill(longest) + hex_b.zfill(longest))
If you only need 4 characters you can do zfill(4).
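As an alternative, a format specifier with an explicit field width does the padding in one step. A minimal sketch; the width of 4 is an assumption based on the 16-bit values in the question:
a, b = 52281, 0
newstr = '{:04x}{:04x}'.format(a, b)  # use '{:04X}' instead for uppercase hex
print(newstr)  # cc390000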
Adapting your code (hard to test, because I do not have access to the file):
file = open(inputFile, 'r')
reader = csv.reader(file)
for line in reader:
    col1, col2 = int(line[0]), int(line[1])
    hex_col1 = hex(col1)[2:] if col1 >= 0 else hex(col1)[3:]
    hex_col2 = hex(col2)[2:] if col2 >= 0 else hex(col2)[3:]
    longest = max(len(hex_col1), len(hex_col2))
    newstr = hex_col1.zfill(longest) + hex_col2.zfill(longest)

Related

Store 2 different encoded data in a file in python

I have 2 types of encoded data:
ibm037 encoded - a single delimiter variable - the value is ###
UTF-8 encoded - a pandas dataframe with hundreds of columns.
Example dataframe:
Date Time
1 2
My goal is to write this data to a file from Python. The format should be:
### 1 2
In this way I need to have all the rows of the dataframe in the file, where every line starts with ###.
I tried to store this delimiter as a new first column of the pandas dataframe and then write to the file, but it throws an error saying that two different encodings can't be written to one file.
Tried another way to write it:
# df_orig_data is the pandas dataframe, Record_Header is the encoded delimiter
f = open("_All_DelimiterOfRecord.txt", "a")
for row in df_orig_data.itertuples(index=False):
    f.write(Record_Header)
    f.write(str(row))
f.close()
It also doesn't work.
Is this kind of data write even possible? How can I write these two encodings to one file?
Edit:
from io import StringIO
import pandas as pd

StringData = StringIO(
    """Date,Time
1,2
1,2
"""
)
df_orig_data = pd.read_csv(StringData, sep=",")
Record_Header = "2 "
f = open("_All_DelimiterOfRecord.txt", "a")
for index, row in df_orig_data.iterrows():
    f.write(
        "\t".join(
            [
                str(Record_Header.encode("ibm037")),
                str(row["Date"]),
                str(row["Time"]),
            ]
        )
    )
f.close()
I would suggest doing the encoding yourself and writing a bytes object to the file. This isn't a situation where you can rely on the built-in encoding to do it.
That means that the program opens the file in binary append mode ('ab'), all of the constants are byte-strings, and it works with byte-strings whenever possible.
The question doesn't say, but I assumed you probably wanted a UTF8 newline after each line, rather than an IBM newline.
I also replaced the file handling with a context manager, since that makes it impossible to forget to close a file after you're done.
import io
import pandas as pd

StringData = io.StringIO(
    """Date,Time
1,2
1,2
"""
)
df_orig_data = pd.read_csv(StringData, sep=",")
Record_Header = "2 "

with open("_All_DelimiterOfRecord.txt", "ab") as f:
    for index, row in df_orig_data.iterrows():
        f.write(Record_Header.encode("ibm037"))
        row_bytes = [str(cell).encode('utf8') for cell in row]
        f.write(b'\t'.join(row_bytes))
        # Note: this is a UTF-8 newline, not an IBM newline.
        f.write(b'\n')
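If you later need to read the file back, something along these lines should work. This is a minimal sketch, assuming every line was written by the loop above, so each one starts with the same fixed-length ibm037 delimiter:
header_len = len(Record_Header.encode("ibm037"))
with open("_All_DelimiterOfRecord.txt", "rb") as f:
    for raw in f:
        raw = raw.rstrip(b"\n")
        delimiter = raw[:header_len].decode("ibm037")        # the ibm037-encoded part
        cells = raw[header_len:].decode("utf8").split("\t")  # the UTF-8 part
        print(delimiter, cells)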

Loop to write strings from specific rows of a pandas dataframe into a PDF

I have a blank pdf template and a pandas dataframe with some data. I want to search one column of the df for a specific pattern and if that pattern is found, take the data from that row and write it into a pdf. My search works and is able to find all of the rows with matches and I am generating a new pdf file for each row with a match. However, the second, third, etc. files still contain the data from the previous rows. I'm not sure why these strings are not being overwritten each time I go over the loop. I also tried setting each variable to None at the start of the loop but that did not help.
My df has the format:
Title  Type      H1  H2  H3
s1     blank     --  --  --
s2     261.1_1X  1   2   3
s3     262.1_1X  4   5   6
s4     blank     --  --  --
My code is able to find the patterns (###.#_#X) and take the data from those rows, but the second file would still contain the data from the first row.
Here is a snippet of my code...
df = upload_spreadsheet(file_path, active_sheet_only=True)
text = ' '.join([str(x) for x in combine_dataframe(df)])
pattern = r'((\d+\.)(.*)(X))'
matches = text_search(text, pattern)
head1 = df.iloc[0,2]
head2 = df.iloc[0,3]
for i in matches:  # This searches through the second column to match the samples
    matchedRow = df[1].str.match(str(i))
    rows = matchedRow[matchedRow==True]
    val1 = df.iloc[rows.index[0],2]
    val2 = df.iloc[rows.index[0],3]
    newPDFname = str(df.iloc[rows.index[0],1])
    pdf2 = FPDF()
    pdf2.add_page()
    pdf2.set_font('Arial', 'B', 16)
    xoffset = pdf2.x + 20
    pdf2.x = xoffset
    pdf2.setfillcolor = (0,0,255)
    pdf2.multi_cell(0, 10, str(head1)+' '+str(val1), 0, 'L', fill=False)
    pdf2.x = xoffset
    pdf2.multi_cell(0, 10, str(head2)+' '+str(val2), 0, 'L', fill=False)
    pdf2.output('temp.pdf', 'F')
    pdf2 = PdfFileReader('temp.pdf')
    first_page1 = pdf1.getPage(0)
    first_page2 = pdf2.getPage(0)
    first_page1.mergePage(first_page2)
    pdf_writer = PyPDF2.PdfFileWriter()
    pdf_writer.addPage(first_page1)
    with open(newPDFname+'.pdf', "wb") as filehandle_output:
        pdf_writer.write(filehandle_output)
    os.remove("temp.pdf")
I'm not sure you are selecting rows properly. Your code looks complicated.
You can select rows based on string pattern match with this line:
foo = df[df['Type'].str.match(pattern)]
pandas.Series.str.match
Pandas: Select rows that match a string
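A minimal sketch of that approach, built from the example table in the question. The column names and the exact pattern are assumptions here, so adjust them to your sheet:
import pandas as pd

df = pd.DataFrame({
    'Title': ['s1', 's2', 's3', 's4'],
    'Type': ['blank', '261.1_1X', '262.1_1X', 'blank'],
    'H1': ['--', 1, 4, '--'],
    'H2': ['--', 2, 5, '--'],
    'H3': ['--', 3, 6, '--'],
})

pattern = r'\d+\..*X'  # same idea as the question's pattern
matched = df[df['Type'].str.match(pattern, na=False)]

for row in matched.itertuples(index=False):
    # each iteration only sees this row's values, so nothing carries over between files
    print(row.Title, row.Type, row.H1, row.H2, row.H3)
The FPDF/PyPDF2 work from the question would then go inside that loop, using only the current row's values, which keeps each output file limited to its own match.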

From CSV list to XLSX: numbers recognised as text, not as numbers

I am working with a CSV data file.
From this file I took some specific data and put it into a list that contains word strings but also numbers (saved as strings, sigh!).
Like this:
data_of_interest = ["string1", "string2", "242", "765", "string3", ...]
I create a new XLSX file (it has to be this format) into which this data is pasted.
The script does the work, but in the new XLSX file the numbers (float and int) are pasted in as text.
I could convert their format manually in Excel, but it would be time consuming.
Is there a way to do it automatically when writing the new XLSX file?
Here is the extract of code I used:
## import library and create the excel file and the working sheet
import xlsxwriter
workbook = xlsxwriter.Workbook("newfile.xlsx")
sheet = workbook.add_worksheet('Sheet 1')

## take the data from the list (data_of_interest) taken from the csv file
## and paste it into the excel file, in rows and columns
column = 0
row = 0
for value in data_of_interest:
    if type(value) is float:
        sheet.write_number(row, column, value)
    elif type(value) is int:
        sheet.write_number(row, column, value)
    else:
        sheet.write(row, column, value)
    column += 1
row += 1
column = 0
workbook.close()
Is the problem related to the fact that the numbers are already of str type in the original list, so the code cannot recognise that they are float or int (and so it doesn't write them as numbers)?
Thank you for your help!
Try int(value) or float(value) before the if block.
All the data you read are strings, so you have to try to convert them to float or int first.
Example:
for value in data_of_interest:
    try:
        value = value.replace(',', '.')  # note that this might change commas to dots in strings which are not numbers
        value = float(value)
    except ValueError:
        pass
    if type(value) is float:
        sheet.write_number(row, column, value)
    else:
        sheet.write(row, column, value)
    column += 1
row += 1
column = 0
workbook.close()
The best way to do this with XlsxWriter is to use the strings_to_numbers constructor option:
import xlsxwriter

workbook = xlsxwriter.Workbook("newfile.xlsx", {'strings_to_numbers': True})
sheet = workbook.add_worksheet('Sheet 1')

data_of_interest = ["string1", "string2", "242", "765", "string3"]

column = 0
row = 0
for value in data_of_interest:
    sheet.write(row, column, value)
    column += 1
workbook.close()
Output: note that there aren't any warnings about numbers stored as strings.

Python: How to Remove range of Characters \x91\x87\xf0\x9f\x91\x87 from File

I have this file with some lines that contain some unicode literals like:
"b'Who\xe2\x80\x99s he?\n\nA fan rushed the field to join the Cubs\xe2\x80\x99 celebration after Jake Arrieta\xe2\x80\x99s no-hitter."
I want to remove those \xe2\x80\x99-like characters.
I can remove them if I declare a string that contains these characters, but my solutions don't work when reading from a CSV file. I used pandas to read the file.
SOLUTIONS TRIED
1. Regex
2. Decoding and Encoding
3. Lambda
Regex Solution
line = "b'Who\xe2\x80\x99s he?\n\nA fan rushed the field to join the Cubs\xe2\x80\x99 celebration after Jake Arrieta\xe2\x80\x99s no-hitter."
code = (re.sub(r'[^\x00-\x7f]',r'', line))
print (code)
LAMBDA SOLUTION
stripped = lambda s: "".join(i for i in s if 31 < ord(i) < 127)
code2 = stripped(line)
print(code2)
ENCODING SOLUTION
code3 = (line.encode('ascii', 'ignore')).decode("utf-8")
print(code3)
HOW FILE WAS READ
df = pandas.read_csv('file.csv', encoding="utf-8")
for index, row in df.iterrows():
    print(stripped(row['text']))
    print(re.sub(r'[^\x00-\x7f]', r'', row['text']))
    print(row['text'].encode('ascii', 'ignore').decode("utf-8"))
SUGGESTED METHOD
df = pandas.read_csv('file.csv', encoding="utf-8")
for index, row in df.iterrows():
    en = row['text'].encode()
    print(type(en))
    newline = en.decode('utf-8')
    print(type(newline))
    print(repr(newline))
    print(newline.encode('ascii', 'ignore'))
    print(newline.encode('ascii', 'replace'))
Your string is valid utf-8. Therefore it can be directly converted to a python string.
You can then encode it to ascii with str.encode(). It can ignore non-ascii characters with 'ignore'.
Also possible: 'replace'
line_raw = b'Who\xe2\x80\x99s he?'
line = line_raw.decode('utf-8')
print(repr(line))
print(line.encode('ascii', 'ignore'))
print(line.encode('ascii', 'replace'))
'Who’s he?'
b'Whos he?'
b'Who?s he?'
To come back to your original question, your 3rd method was correct. It was just in the wrong order.
code3 = line.decode("utf-8").encode('ascii', 'ignore')
print(code3)
To finally provide a working pandas example, here you go:
import pandas

df = pandas.read_csv('test.csv', encoding="utf-8")
for index, row in df.iterrows():
    print(row['text'].encode('ascii', 'ignore'))
There is no need to do decode('utf-8'), because pandas does that for you.
Finally, if you have a python string that contains non-ascii characters, you can just strip them by doing
text = row['text'].encode('ascii', 'ignore').decode('ascii')
This converts the text to ascii bytes, strips all the characters that cannot be represented as ascii, and then converts back to text.
You should look up the difference between python3 strings and bytes, that should clear things up for you, I hope.
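For example, applied to the sample sentence from the question, that round trip simply drops the curly apostrophe:
text = 'Who’s he?'
clean = text.encode('ascii', 'ignore').decode('ascii')
print(clean)  # Whos he?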

Convert and concatenate data from two columns of a csv file

I have a csv file which contains data in two columns, as follows:
40500 38921
43782 32768
55136 49651
63451 60669
50550 36700
61651 34321
and so on...
I want to convert each value into its hex equivalent, then concatenate them, and write them into a column in another csv file.
For example: hex(40500) = 9E34, and hex(38921) = 9809.
So, in output csv file, element A1 would be 9E349809
So, i am expecting column A in output csv file to be:
9E349809
AB068000
D760C1F3
F7DBECFD
C5768F5C
F0D38611
I referred to sample code that concatenates two columns, but I am struggling with converting them to hex before concatenating. Following is the code:
import csv

inputFile = 'input.csv'
outputFile = 'output.csv'

with open(inputFile) as f:
    reader = csv.reader(f)
    with open(outputFile, 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [''.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
How can I convert the data in each column to its hex equivalent, then concatenate the values and write them to another file?
You could do this in 4 steps:
Read the lines from the input csv file
Use formatting options to get the hex values of each number
Perform string concatenation to get your result
Write to new csv file.
Sample Code:
with open(outputFile, 'w') as outfile:
    with open(inputFile, 'r') as infile:
        for line in infile:  # Iterate through each line
            left, right = int(line.split()[0]), int(line.split()[1])  # split left and right blocks
            newstr = '{:x}'.format(left) + '{:x}'.format(right)  # create new string using hex values, excluding '0x'
            outfile.write(newstr)  # write to output file
print('Conversion completed')
print('Closing output file')
Sample Output:
In[44] line = '40500 38921'
Out[50]: '9e349809'
ParvBanks' solution is good (clear and functional); I would simplify it a little like this:
with open(inputFile, 'r') as infile, open(outputFile, 'w+') as outfile:
    for line in infile:
        outfile.write("".join(["{:x}".format(int(v)) for v in line.split()]))
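Note that both snippets write the values back to back, without newlines. If the goal is one concatenated value per row of the output csv, as the expected output suggests, a small variant like this should do it (a sketch, reusing the same file names):
import csv

with open(inputFile, 'r') as infile, open(outputFile, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        writer.writerow([''.join('{:x}'.format(int(v)) for v in line.split())])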
