Very specific output format Python - python-3.x

I am trying to figure out how to print to the terminal in a very specific format, with columns separated by '&' and rows ending in '\\', and then write the result to a .txt file.
My Attempt:
I have been using "tabulate" for a while, so my natural go-to attempt was to use that library. Here is my code:
from tabulate import tabulate
import numpy as np

lista = np.zeros(4)

print(tabulate([lista, lista], headers=['n', r'$\phi_{n}$', 'a_{n}', 'e_{n}'],
               numalign="center"))

with open('table.txt', 'w') as f:
    f.write(tabulate([lista, lista], headers=['n', r'$\phi_{n}$', 'a_{n}', 'e_{n}'],
                     numalign="center"))
The above code generates the following result:
Which is nice, but not what I want. I tried deleting the 'headers' parameter, but it still gives me a table with a header-like structure. Furthermore, the output contains neither the '&' separators I need nor the '\\' line endings. I suspect I might need to do it manually somehow.
Thanks in advance, Lucas
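
The '&' separators and '\\' row endings suggest the target is a LaTeX tabular. If so, tabulate can emit that directly through its tablefmt parameter. A minimal sketch using tablefmt="latex_raw", which leaves the header strings unescaped so the $\phi_{n}$ math survives (wrapping the other headers in $...$ is an addition here, for illustration):

from tabulate import tabulate
import numpy as np

lista = np.zeros(4)

# "latex_raw" produces & column separators and \\ row endings
# without escaping LaTeX markup in the headers
table = tabulate([lista, lista],
                 headers=['n', r'$\phi_{n}$', r'$a_{n}$', r'$e_{n}$'],
                 tablefmt="latex_raw", numalign="center")

print(table)
with open('table.txt', 'w') as f:
    f.write(table)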

Related

Raw output data frame manipulation in python

Using Python 3, I need to process qPCR sequencing raw data outputs by searching for the first occurrence of a user-defined string and then making a new data frame from all lines after that string. I have been trying to find a solution in the pandas docs, so far unsuccessfully.
Below is an excerpt of the raw output .csv file I need to process (I couldn't paste the complete csv as it exceeds the character limit; this is lines 40-50, which I hope is enough to be useful). I need to tell pandas to create a new data frame that 1. starts at the line containing the first occurrence of the string "Sample Name", using that line as the header and including all lines that follow, and 2. keeps only the columns "Sample Name", "Target Name", and "CT".
Could someone please help me so that I can use python to analyze biological data?
Many thanks,
Luke
40,Quantification Cycle Method,Ct,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
41,Signal Smoothing On,true,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
42,Stage where Melt Analysis is performed,Stage3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
43,Stage/ Cycle where Ct Analysis is performed,"Stage2, Step2",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
44,User Name,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
45,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
46,Well,Well Position,Omit,Sample Name,Target Name,Task,Reporter,Quencher,Quantity,Quantity Mean,SE,RQ,RQ Min,RQ Max,CT,Ct Mean,Ct SD,Delta Ct,Delta Ct Mean,Delta Ct SD,Delta Ct SE,Delta Delta Ct,Automatic Ct Threshold,Ct Threshold,Automatic Baseline,Baseline Start,Baseline End,Amp Status,Comments,Cq Conf,CQCONF,HIGHSD,OUTLIERRG,Tm1,Tm2,Tm3,Tm4
47,1,A1,False,WT1,AtTubulin,UNKNOWN,SYBR,None,,,,,,,23.357698440551758,23.4766845703125,0.5336655378341675,,,,,,True,20959.612776965325,True,3,17,Amp,,0.9588544573203085,N,Y,N,81.40960693359375,,,
48,2,A2,False,WT1,AtTubulin,UNKNOWN,SYBR,None,,,,,,,24.05980110168457,23.4766845703125,0.5336655378341675,,,,,,True,20959.612776965325,True,3,15,Amp,,0.9592687354496955,N,Y,N,81.40960693359375,,,
49,3,A3,False,WT1,AtTubulin,UNKNOWN,SYBR,None,,,,,,,23.012556076049805,23.4766845703125,0.5336655378341675,,,,,,True,20959.612776965325,True,3,16,Amp,,0.9592714462250367,N,Y,N,81.40960693359375,,,
50,4,A4,False,fla11fla12-1,AtTubulin,UNKNOWN,SYBR,None,,,,,,,23.803699493408203,24.419523239135742,0.5669151544570923,,,,,,True,20959.612776965325,True,3,17,Amp,,0.9671570584141241,N,Y,N,81.40960693359375,,,
This is the code that I have so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_excel("2019-02-27_161601 AtWAKL8 different version expressions.xls", sheet_name='Results').fillna(0)
data.to_csv('df1.csv', index=True)
df1 = pd.read_csv("df1.csv")
You are having trouble with quoting. Also, grep is a better fit for .csv files than for .xlsx.

You are forking off a shell subprocess with a filename argument without correctly quoting the spaces in the filename. It would be simplest to rename the file, turning spaces into dashes, e.g. 2019-02-27_161601-AtWAKL8-different-version-expressions.xls. As it stands, you are trying to grep the string "Position" from a file named 2019-02-27_161601, from a 2nd file named AtWAKL8, from a 3rd named different, and so on, which is unlikely to work.

An .xlsx spreadsheet is not the line-oriented text format that grep expects. You will be happier if you export or Save As .csv format within Excel, or if you execute data.to_csv('expressions.csv')
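
Following on from that last suggestion: once the data is in plain .csv form, the header-detection part of the question needs only pandas plus a small text scan. A minimal sketch, assuming the file has been exported as expressions.csv (the column names are taken from the excerpt in the question):

import pandas as pd

# Scan the raw text for the line that carries the real column names.
with open('expressions.csv') as f:
    for header_line, line in enumerate(f):
        if 'Sample Name' in line:
            break

# Re-read from that line on, using it as the header row,
# then keep only the three columns of interest.
df = pd.read_csv('expressions.csv', skiprows=header_line)
df = df[['Sample Name', 'Target Name', 'CT']]
print(df.head())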

How to write this type of data to csv

I'm a newbie in python. I want to get some data from a csv with pandas and then write a new csv file with extra data, in the format
"type";"currency";"amount";"comment"
"type1";"currency1";"amount1";"comment1"
etc
import pandas as pd
import csv

req = pd.read_csv('/Users/user/web/python/Bookcopy.csv')
type = "type"
comment = "2week"
i = 0
while i < 3:
    Currency = req['Currency'].values[i]
    ReqAmount = req['Request'].values[i]
    r = round(ReqAmount, -1)
    i += 1
    data = [type, Currency, r, comment]
    #print(data)
csv_file = open('data2.csv', 'w')
with csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(data)
    print("DONE")
    writer.writerows(data)
_csv.Error: iterable expected, not numpy.float64
I have multiple things to criticize here; I hope it doesn't come across as mean, but rather educational. Most of these points don't affect whether your code runs, but it is good coding style to follow them.
Variables should not start with a capital letter. Currency and ReqAmount should be currency and reqAmount.
Don't shadow built-in names. type is a python built-in function, and assigning to it hides the built-in.
Make sure your formatting doesn't get destroyed when posting here. Especially for python, which relies on tab formatting. Read here for more information: https://stackoverflow.com/editing-help#code
That said, let me try to go through your code and give you tips and tricks:
Don't run code at the top level of the module. Wrap it in a main() function instead; it's just better coding style.
When looping, don't use the i=0; while i<3; i+=1 construct, rather use for i in range(3). While it works, it is not very pythonic and a lot harder to read.
Never assume anything that cannot be guaranteed in your code. In this case, you assume that the csv-file has at least 3 lines, otherwise your program would crash. Instead, read the number of lines from the csv file, with len(req).
data =[type,Currency,r,comment] keeps overwriting your data variable. You could either append to data and then write everything to the output file at the end, or directly write to the output file in every iteration.
Don't use open to create a variable (unless absolutely necessary). Instead, use open in a with statement; this ensures that the file gets closed properly. You do use the with statement, but it is usually written as with open(...) as variable_name:.
CSV files usually start with the column names. Therefore you should write the column names before you write data.
I won't fix this here, because it would change the appearance of the program completely, but normally you shouldn't mix different libraries: if you use pandas for csv reading, also use it for writing, and if you use the csv library for writing, also use it for reading. While mixing them isn't wrong, it is bad style and creates more dependencies than necessary. A pandas-only variant is sketched after the code below.
I don't really understand what your code is supposed to do, so I just guess and hope it goes in the right direction.
When fixing all those points, you might have something like that:
import pandas as pd
import csv

def main():
    req = pd.read_csv('/Users/user/web/python/Bookcopy.csv')
    transferType = "type"
    comment = "2week"
    with open('data2.csv', 'w') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["type", "currency", "amount", "comment"])
        for i in range(len(req)):
            currency = req['Currency'].values[i]
            reqAmount = req['Request'].values[i]
            r = round(reqAmount, -1)
            data = [transferType, currency, r, comment]
            #print(data)
            writer.writerow(data)
    print("DONE")

# Whenever you run a program, __name__ will be set to '__main__' in the initial
# script. This makes it easier later when you work with multiple code files.
if __name__ == '__main__':
    main()
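
For completeness, here is the pandas-only variant mentioned above, as a sketch. It assumes the same Bookcopy.csv layout with 'Currency' and 'Request' columns, and uses a ';' separator with full quoting to match the "type";"currency";... format asked for in the question:

import csv
import pandas as pd

def main():
    req = pd.read_csv('/Users/user/web/python/Bookcopy.csv')
    out = pd.DataFrame({
        'type': 'type',                      # scalar values are broadcast to every row
        'currency': req['Currency'],
        'amount': req['Request'].round(-1),  # round to the nearest ten, as in the question
        'comment': '2week',
    })
    # QUOTE_ALL plus sep=';' reproduces the quoted, semicolon-separated layout
    out.to_csv('data2.csv', sep=';', index=False, quoting=csv.QUOTE_ALL)

if __name__ == '__main__':
    main()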

How to remove special usual characters from a pandas dataframe using Python

I have a file with some crazy stuff in it. It looks like this:
I attempted to get rid of it using this:
df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])
But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>
I got that example from another question, and this seems to be the Python 3 method for doing this, but I'm not sure what I'm doing wrong.
Edit: For some odd reason someone thinks that this has something to do with getting a map to return a list. The central issue is getting rid of non-UTF-8 characters; whether or not I'm even doing that correctly has yet to be established.
As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique, or is map the correct way? And if it is, why am I getting the output I've indicated?
Edit 2: For some reason, my machine wouldn't let me create an example earlier. I can now. This is what I'm dealing with; all those weird characters need to go.
import pandas as pd
data = [['🦎Ale','Αλέξανδρα'],['��Grain','Girl🌾'],['Đỗ Vũ','ên Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])
print(df)
Edit 3: I tried doing this using a regex, and for some reason it still didn't work.
df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')
This regex works FINE in another process, but here, it still leaves the ugly characters.
Edit 4: It turns out that it's image data that we're looking at.
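
Two details in Edit 3 are worth flagging: pandas' Series.replace treats a plain string as an exact value to match rather than a regex (Series.str.replace with regex=True is the usual tool), and the character class [A-z] also matches the punctuation characters that sit between 'Z' and 'a' in ASCII. A minimal sketch of the mechanics, run against the example frame from Edit 2; note that stripping to ASCII also wipes out legitimately non-Latin names such as Αλέξανδρα, so this shows how to do it, not whether to:

import pandas as pd

data = [['🦎Ale', 'Αλέξανδρα'], ['��Grain', 'Girl🌾'], ['Đỗ Vũ', 'ên Anh'], ['Don', 'Johnson']]
df = pd.DataFrame(data, columns=['firstname', 'lastname'])

# Option 1: corrected character class ([a-zA-Z], not [a-zA-z]), applied as a real regex.
df['firstname'] = df['firstname'].str.replace(r'[^a-zA-Z\s]', '', regex=True)

# Option 2: round-trip through ASCII, silently dropping non-encodable characters.
df['lastname'] = df['lastname'].str.encode('ascii', 'ignore').str.decode('ascii')

print(df)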

How To Compare two .ico (icon) files in python?

Although I am new to python, I just want to compare two .ico extension files.
Can anyone with the expertise tell me how I can do that?
Is there any package or library readily available in python to do so?
Thanks for reading the question. Your suggestions will be appreciated.
What I am currently doing is as follows, but it is not giving me what I expect:
import cv2
import numpy as np
Original = cv2.imread("1.ico")
Edited = cv2.imread("chrome.ico")
diff = cv2.subtract(Original, Edited)
cv2.imwrite("diff.jpg", diff)
If you just want to check whether the files have changed, you can use Python's hashlib. The code below computes a file's hash:
import hashlib

h = hashlib.md5()
with open('ico_file.ico', 'rb') as f:
    buffer = f.read()
    h.update(buffer)

print(buffer)  # May not be needed
print(h.hexdigest())
Run the above code for each of the two files you want to compare, then compare the output hashes. If they are the same, the files are almost certainly identical; if they differ, the files are definitely different.
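
Wrapped into a helper, the comparison might look like the following sketch (the filenames are taken from the question's code):

import hashlib

def file_md5(path):
    """Return the MD5 hex digest of a file's raw bytes."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        h.update(f.read())
    return h.hexdigest()

# Equal digests mean the files are byte-for-byte identical
# (barring an MD5 collision, which will not happen by accident).
print(file_md5('1.ico') == file_md5('chrome.ico'))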

How to import .dta via pandas and describe data?

I am new to python and have a simple problem. In a first step, I want to load some sample data I created in Stata. In a second step, I would like to describe the data in python - that is, I'd like a list of the imported variable names. So far I've done this:
from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()
I get the following error:
anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
warnings.warn("'data' is deprecated, use 'read' instead")
What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?
Using pandas.io.stata.StataReader.data to read from a stata file was deprecated in pandas version 0.18.1, which is why you are getting that warning.
Instead, you must use pandas.read_stata to read the file as shown:
df = pd.read_stata('sample_data.dta')
df.dtypes ## Return the dtypes in this object
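
As for the second part of the question: dir() only lists Python names in the current scope, not the variables inside the dataset. A short sketch of the usual ways to inspect the imported frame (the filename comes from the question):

import pandas as pd

df = pd.read_stata('sample_data.dta')

print(df.columns.tolist())  # the imported variable names
print(df.dtypes)            # one dtype per variable
print(df.describe())        # summary statistics for the numeric variables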
Sometimes this did not work for me, especially when the dataset is large, so what I propose here is a two-step approach (Stata and Python).
In Stata, write the following commands:
export excel Cevdet.xlsx, firstrow(variables)
and to copy the variable labels write the following
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore
This will generate two files for you, Cevdet.xlsx and myfile.xlsx.
Now go to your jupyter notebook:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')
This will read the file into jupyter (python 3); the same pd.read_excel call works for myfile.xlsx.
My advice is to save this data frame (especially if it is big):
df.to_pickle('Cevdet')
The next time you open jupyter, you can simply run:
df=pd.read_pickle("Cevdet")