GNU Parallel with Python Script - command line variables not working - python-3.x

This is the first time I am trying to run a Python script with GNU Parallel.
I have the Python script below, and I am trying to run it in parallel with a text document supplying the variables, one per line.
I execute the below script with this code:
parallel --bar -a PairNames.txt python3 CreateDataTablePythonScriptv2.py
Here is the python script being executed:
import sqlite3
import sys
PairName = sys.argv[1]
print(PairName)
DTBLocation = '//mnt//c//Users//Jonathan//OneDrive - Mazars in Oman//Trading//Systems//FibMatrix//Testing Trade Analysis//SQLite//Trade Analysis.db'
connection = sqlite3.connect(DTBLocation)
cursor = connection.cursor()
TableName = PairName+'_DATA'
print(TableName)
cursor.execute("""CREATE TABLE IF NOT EXISTS {}
(
Date_Time INTEGER,
Open REAL,
Max_60m_Box REAL
)""".format(TableName))
connection.commit()
connection.close()
It handles the first value just fine. For the remaining values, print(PairName) shows the correct name, but print(TableName) displays the following:
GBPUSD
_DATAD
USDCHF
_DATAF
NZDJPY
_DATAY
It's strange to me that PairName prints correctly on its own, but it does not show up once it is concatenated into TableName.
Also, it's strange that an extra letter gets added to the end of DATA each time. The extra letter appears to be the last letter of the input variable. I don't know why it chops off the first 5 letters or how that letter ends up at the end of DATA.
I printed the table name.
I watched this video: https://www.youtube.com/watch?v=OpaiGYxkSuQ&ab_channel=OleTange
I tried moving the TableName concatenation to right under the PairName line.
I printed the type of PairName, and it is a string.
I tried separating the variables in the text document with tabs and commas instead of newlines.
I tried assigning "_DATA" to a variable and then concatenating the two objects, but it had the same result:
TableEnd = '_DATA'
TableName = PairName + TableEnd
If I drop the concatenation PairName+'_DATA' and use PairName alone as the TableName, it works correctly.
Sorry if the answer is simple, but I cannot figure it out, especially since there are few docs or tutorials for a newbie on GNU Parallel in this situation. Thanks for the help!

Is the input file in DOS format (i.e., do the lines end in CRLF rather than just LF)? I checked this using the file command:
$ file test.txt
test.txt: ASCII text, with CRLF line terminators
$
Since it was CRLF (DOS format), I converted it using tr:
tr -d '\r' < input.file > output.file
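To see where `_DATAD` comes from, the sketch below replays what a terminal does with a carriage return, and shows a fix inside the script as an alternative to converting the file with tr. The helper names `render_on_terminal`, `make_table_name` and `create_pair_table` are hypothetical, and the "letters and digits only" rule for pair names is an assumption, not something the original script enforced.

```python
import re
import sqlite3


def render_on_terminal(text):
    """Roughly mimic how a terminal shows one line containing a carriage return.

    A carriage return moves the cursor back to column 0 without erasing,
    so the characters that follow overwrite what was already printed.
    """
    screen = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0
            continue
        if col < len(screen):
            screen[col] = ch  # overwrite an existing character
        else:
            screen.append(ch)
        col += 1
    return "".join(screen)


def make_table_name(pair_name):
    """Build '<PAIR>_DATA', dropping a stray carriage return from a CRLF file."""
    pair = pair_name.strip()  # removes '\r', '\n' and surrounding spaces
    # Assumption: pair names are plain letters/digits. Table names cannot be
    # passed as '?' parameters, so validate before formatting them into SQL.
    if not re.fullmatch(r"[A-Za-z0-9]+", pair):
        raise ValueError(f"unexpected pair name: {pair_name!r}")
    return pair + "_DATA"


def create_pair_table(connection, pair_name):
    """Create the per-pair table (same columns as the question) and return its name."""
    table = make_table_name(pair_name)
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(Date_Time INTEGER, Open REAL, Max_60m_Box REAL)"
    )
    connection.commit()
    return table
```

For example, `render_on_terminal("GBPUSD\r_DATA")` returns `"_DATAD"`, which is exactly the garbled output shown in the question, while `make_table_name("GBPUSD\r")` returns `"GBPUSD_DATA"`. In the script this would mean `TableName = make_table_name(sys.argv[1])`; the validation also turns any other hidden character into a clear error instead of a confusing table name.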

Related

sed gives unknown command error: char 1: unknown command: `''

I am trying to use sed to do some text processing in a file called host:
cluster_ip = "10.223.10.21"
srv_domain = "service_domain.svc"
cmd = f"'/^.*{srv_domain}/!p;$a'{cluster_ip}'\t{srv_domain}'"
Then I call it like this:
subprocess.call(["/usr/bin/sed", "-i", "-ne", cmd, "host"])
But I am getting this error:
/usr/bin/sed: -e expression #1, char 1: unknown command: `''
Could someone please explain what I am doing wrong?
Thank you
I also tried using fileinput, but I am unable to get print(f"{cluster_ip}\t{srv_domain}\n") to write to the file; instead, it goes to the console.
import fileinput
import re

cluster_ip = "123.234.45.5"
srv_domain = "service_domain.svc"

def main():
    pattern = '^.*service_domain.svc'
    filename = "host1"
    matched = re.compile(pattern).search
    with fileinput.FileInput(filename, inplace=1) as file:
        for line in file:
            if not matched(line):  # save lines that do not match
                print(line, end='')  # this goes to filename due to inplace=1
        # this is getting printed in console
        print(f"{cluster_ip}\t{srv_domain}\n")

main()
I suppose you want to remove the line containing srv_domain and append a new last line. You don't need to quote the arguments yourself: with a list, subprocess passes each element straight to sed without a shell, so sed receives your single quotes literally.
Quick fix:
cmd = f"/^.*{srv_domain}/!p;$a{cluster_ip}\t{srv_domain}"
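The quoting problem is easy to reproduce without sed. In this sketch the current Python interpreter stands in for sed (an assumption made only so it runs anywhere), and the child process prints exactly the argument it received; `echo_argument` is a hypothetical helper name.

```python
import subprocess
import sys


def echo_argument(arg):
    """Return the argument exactly as a child process receives it.

    The child simply prints sys.argv[1]; no shell is involved because
    the command is given as a list, so nothing removes quotes.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

`echo_argument("'/^.*service_domain.svc/!p'")` comes back with the single quotes still attached, which is why sed reported `'` as an unknown command at char 1.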
Better: use Python itself instead of calling sed, which makes your script more complex and non-portable. You don't even need regexes here, just a substring search (regexes could tighten it to avoid partial matches, but the original sed expression has the same problem).
First read your file, drop the line where srv_domain is defined, and add your last line.
Something like this, using a temporary file to hold the modified contents, then overwriting it:
import os

with open("hosts") as fr, open("hosts2", "w") as fw:
    for line in fr:
        if srv_domain not in line:
            fw.write(line)
    fw.write(f"{cluster_ip}\t{srv_domain}\n")
# os.replace overwrites "hosts" in one step, on Windows too
os.replace("hosts2", "hosts")
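As for the fileinput attempt in the question: with inplace=1, fileinput redirects standard output into the file only while that file is being processed. As soon as the loop reads past the last line, fileinput closes the rewritten file and restores sys.stdout, so a print after the loop lands in the console. A minimal sketch that keeps fileinput and appends the new entry afterwards (`replace_entry` is a hypothetical name):

```python
import fileinput


def replace_entry(filename, cluster_ip, srv_domain):
    """Drop lines mentioning srv_domain, then append a fresh entry."""
    with fileinput.FileInput(filename, inplace=True) as file:
        for line in file:
            if srv_domain not in line:
                print(line, end="")  # redirected into the file during the loop
    # The loop has finished, so stdout points at the console again;
    # append the new line with an ordinary file write instead.
    with open(filename, "a") as f:
        f.write(f"{cluster_ip}\t{srv_domain}\n")
```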

How to print string with embedded whitespace in python3

Assume the following string:
s = '\r\nthis is the second line\r\n\tthis line is indented\r\n'
If I run Python (v. 3.8.6) interactively, printing this string works as expected:
>>> print(s)
this is the second line
this line is indented
>>>
But when I print this string to a file (i.e., print(s, file = "string.txt")), the embedded whitespace is not interpreted and the text file contains the string literally (with "\t" instead of a tab, etc.).
How can I get the same interactive output written to file? Attempts using str(), f-strings, and format() were unsuccessful.
The file argument of print() has to be an open file object, not a filename. This worked for me:
with open('file.txt','w') as file:
    print('\r\nthis is the second line\r\n\tthis line is indented\r\n',file=file)
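The difference that matters is between a string's characters and its repr(). In this small sketch, print(..., file=f) writes real tab and carriage-return characters, while repr() shows backslash escapes (that is what you see for strings inside a printed list, or for a bare expression at the >>> prompt). Opening with newline="" is an added assumption: it disables newline translation, so on Windows the \r\n in this string is not written as \r\r\n. `write_and_read_back` is a hypothetical helper.

```python
def write_and_read_back(path, text):
    """Write text with print() and return exactly what ended up in the file."""
    # newline="" turns off newline translation when writing and reading
    with open(path, "w", newline="") as f:
        print(text, file=f, end="")  # print needs a file object, not a filename
    with open(path, newline="") as f:
        return f.read()
```

With s from the question, `write_and_read_back("string.txt", s) == s` holds, and the file contains an actual tab character rather than a backslash followed by t.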

Remove that extra line (called a newline) when printing in python3

I'm a newbie in Python 3 and stuck trying to remove \n from the output of the code below. I want to print two random lines without the \n and without the square brackets [ ]. What should I do?
The code is:
import random

def head():
    f = open("quotes.txt")
    quotes = f.readlines()
    f.close()
    last=18
    print(random.sample(quotes,2))

if __name__== "__main__":
    head()
When I run this file, it selects two random lines, which is fine, but the output looks like this, including the \n:
['IMPOSSIBLE says itself I M POSSIBLE\n', 'Never stops to Learning till dead end\n']
You are getting results like ['IMPOSSIBLE says itself I M POSSIBLE\n', 'Never stops to Learning till dead end\n'] because random.sample returns a list, and printing a list directly shows each element's repr, with brackets, quotes, and the \n escapes.
Solution
Replace print(random.sample(quotes,2)) with the following code:
tmp = random.sample(quotes,2)
for i in tmp:
    print(i,end="")
This solves the problem. The end="" argument is needed because each quote already ends with a newline, so it stops print from adding an extra \n.
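An alternative sketch strips the newline from each picked line and joins the picks into one plain string, so there are no brackets or quotes to begin with. Passing in the random generator is an assumption added only so results can be reproduced, and `format_quotes` is a hypothetical name.

```python
import random


def format_quotes(lines, k=2, rng=random):
    """Return k distinct random lines, one per line, without list syntax."""
    picked = rng.sample(lines, k)
    # readlines() keeps each trailing '\n'; strip it and join with our own newlines
    return "\n".join(line.rstrip("\n") for line in picked)
```

Usage would be along the lines of `with open("quotes.txt") as f: print(format_quotes(f.readlines()))`.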
It's resolved!!!
I had been running the code with the python command, which invoked Python 2.7 and returned this kind of junk output; it works fine when run with the python3 command.

str.format places last variable first in print

The purpose of this script is to parse a text file (sys.argv[1]), extract certain strings, and print them in columns. I start by printing the header. Then I open the file and scan through it line by line. I check that the line has a specific prefix or contains a specific string, then I use a regex to extract the value.
The matching and extraction work fine.
My final print statement doesn't work properly.
import re
import sys

print("{}\t{}\t{}\t{}\t{}".format("#query", "target", "e-value",
                                  "identity(%)", "score"))
with open(sys.argv[1], 'r') as blastR:
    for line in blastR:
        if line.startswith("Query="):
            queryIDMatch = re.match('Query= (([^ ])+)', line)
            queryID = queryIDMatch.group(1)
            queryID.rstrip
        if line[0] == '>':
            targetMatch = re.match('> (([^ ])+)', line)
            target = targetMatch.group(1)
            target.rstrip
        if "Score = " in line:
            eValue = re.search(r'Expect = (([^ ])+)', line)
            trueEvalue = eValue.group(1)
            trueEvalue = trueEvalue[:-1]
            trueEvalue.rstrip()
            print('{0}\t{1}\t{2}'.format(queryID, target, trueEvalue), end='')
The problem occurs when I try to print the columns. When I print the first 2 columns, it works as expected (except that it's still printing new lines):
#query target e-value identity(%) score
YAL002W Paxin1_129011
YAL003W Paxin1_167503
YAL005C Paxin1_162475
YAL005C Paxin1_167442
The 3rd column is a number in scientific notation, like 2e-34.
But when I add the 3rd column, eValue, it breaks down:
#query target e-value identity(%) score
YAL002W Paxin1_129011
4e-43YAL003W Paxin1_167503
1e-55YAL005C Paxin1_162475
0.0YAL005C Paxin1_167442
0.0YAL005C Paxin1_73182
I have removed all newlines, as far as I know, using the rstrip() method.
At least three problems:
1) queryID.rstrip and target.rstrip are missing the call parentheses (), so the method is never actually called.
2) A call like trueEvalue.rstrip() doesn't mutate the string (strings are immutable); it returns a new one, so you would need
trueEvalue = trueEvalue.rstrip()
if you want to keep the change.
3) This might be a problem, but without seeing your data I can't be 100% sure. The r in rstrip stands for "right". If trueEvalue is 4e-43\n, then it is true that trueEvalue.rstrip() would be free of newlines. But your values seem to be something like \n4e-43. If you simply use .strip(), newlines will be removed from both sides.
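Putting these points together, one likely source of the stray newline is the regex itself: [^ ] matches any character except a space, including \n, so on a line like "> Paxin1_129011\n" the captured target keeps the newline. The sketch below strips each line once and uses \S+ (non-whitespace) so no newline can end up in a value. It assumes the `Query= `, `> ` and `Expect = ` layouts shown in the question and that the character the original [:-1] removed is a trailing comma; `parse_hits` and `print_table` are hypothetical names.

```python
import re


def parse_hits(lines):
    """Yield (query, target, evalue) tuples from BLAST-style report lines.

    Assumes the 'Query= ', '> ' and 'Expect = ' layouts from the question.
    """
    query_id = target = None
    for line in lines:
        line = line.strip()  # remove '\n' (and spaces) once, up front
        if line.startswith("Query="):
            m = re.match(r"Query=\s*(\S+)", line)
            if m:
                query_id = m.group(1)
        elif line.startswith(">"):
            m = re.match(r">\s*(\S+)", line)
            if m:
                target = m.group(1)
        elif "Score = " in line:
            m = re.search(r"Expect = (\S+)", line)
            if m:
                # assumption: the [:-1] in the question was dropping a trailing comma
                yield query_id, target, m.group(1).rstrip(",")


def print_table(path):
    """Print one tab-separated row per hit, each ending in exactly one newline."""
    print("#query\ttarget\te-value")
    with open(path) as blast_file:
        for row in parse_hits(blast_file):
            print("{}\t{}\t{}".format(*row))
```

In the script this would be called as `print_table(sys.argv[1])`.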

python : How to allow python to not add quotation marks while changing delimiter in the csv file

I have written a script to convert the delimiter in a CSV file from comma to pipe, but it doesn't remove the extra quotation marks that the CSV format adds.
The script is as follows:
import csv

filename = "sample.csv"
# newline='' is what the csv module expects; the old 'rU' mode was removed in Python 3.11
with open(filename, newline='') as fin, open('c:\\files\\sample.txt', mode='w', newline='') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
case 1:
Now, for example, if one of the fields contains "hi" hows you,good, then the CSV file stores it as """hi" hows you,good"", and Python writes it as """hi" hows you,good"" to the text file instead of "hi" hows you,good.
case 2:
Whereas for fields like hi hows, you, the CSV file stores them as "hi hows,you", and after running the script the text file contains hi hows,you, which is correct.
Could you please help me solve case 1?
Example CSV file as it appears in Notepad:
ID,IDN,DESC,TNO
A019,1,"""Pins "" is dangerous",2
B020,1,"""ache"",headache/fever-like",3
C021,2,stomach cancer,1
D231,3,"hair,""fall""",1
Result after running the script:
ID|IDN|DESC|TNO
A019|1|"""Pins "" is dangerous"|2
B020|1|"""ache"",headache/fever-like"|3
C021|2|stomach cancer|1
D231|3|"hair,""fall"""|1
I want the result to be:
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1
This works:
writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|', quoting=csv.QUOTE_NONE, quotechar=None)
Setting the quoting to "no quoting": quoting=csv.QUOTE_NONE
Setting the quote char to "no quote char": quotechar=None
Result:
ID|IDN|DESC|TNO
A019|1|"Pins " is dangerous|2
B020|1|"ache",headache/fever-like|3
C021|2|stomach cancer|1
D231|3|hair,"fall"|1
Note that quoting is useful: disabling it exposes you to the joy of "delimiters in the fields". It's up to you to make sure that doesn't happen.
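A self-contained sketch of the same conversion working on strings via io.StringIO, so it can be tried without files (`comma_to_pipe` is a hypothetical name). It uses quotechar=None to mean "no quote character". It also makes the risk concrete: with QUOTE_NONE and no escapechar, a field that contains | makes the writer raise csv.Error rather than silently write an ambiguous row.

```python
import csv
import io


def comma_to_pipe(csv_text):
    """Re-emit comma-separated text as pipe-separated text with no quoting."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(
        out,
        reader.fieldnames,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        quotechar=None,          # '"' is ordinary data, not a special character
    )
    writer.writeheader()
    writer.writerows(reader)  # raises csv.Error if a field contains '|'
    return out.getvalue()
```

Fed the sample file above, it produces the five pipe-separated lines shown in the desired result (each terminated by the writer's default \r\n).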
