How to convert a SAP .txt extraction into a .csv file - python-3.x

I have a .txt file as in the example reported below. I would like to convert it into a .csv table, but I'm not having much success.
Mack3 Line Item Journal Time 14:22:33 Date 03.10.2015
Panteni Ledger 1L TGEPIO00/CANTINAOAS Page 20.001
--------------------------------------------------------------------------------------------------------------------------------------------
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S|
|------------------------------------------------------------------------------------------------------------------------------------------|
| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | |
| 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | |
| 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | |
| 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | |
| 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
|------------------------------------------------------------------------------------------------------------------------------------------|
The file in question is structured as a report from SAP. Practicing with Python and looking at other posts, I found this code:
with open('file.txt', 'rb') as f_input:
    for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1].isalpha(), f_input):
        header = [cols.strip() for cols in next(csv.reader(StringIO(line), delimiter='|', skipinitialspace=True))][1:-1]
        break

with open('file.txt', 'rb') as f_input, open(str(ii + 1) + 'output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] != '-' and not x[1].isalpha(), f_input):
        csv_input = csv.reader(StringIO(line), delimiter='|', skipinitialspace=True)
        csv_output.writerow(csv_input)
Unfortunately it does not work in my case: it creates empty .csv files and it seems not to read csv_input properly.
Any possible solution?

Your input file can be treated as CSV once we filter out a few lines, namely the ones that do not start with a pipe symbol '|' followed by a space ' ', which would leave us with this:
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S|
| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | |
| 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | |
| 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | |
| 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | |
| 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
Your output is mostly empty because that x[1].isalpha() check is never true on this data: the character at position 1 of each line is always a space, never alphabetic.
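You can see this on one of the data rows (a quick standalone check, using a row copied from your file):

```python
# one data row copied from the input file
line = "| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640   |tARFooWMOND |EUR  |        0,85 |  |20140107   | |"

print(repr(line[1]))      # ' ' -- always a space
print(line[1].isalpha())  # False, so the original filter never matches
print(line[1] == ' ')     # True -- this is the test that does match
```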
It's not necessary to open the input file multiple times; we can read, filter and write to the output in one go:
import csv

ii = 0
with open('file.txt', 'r', encoding='utf8', newline='') as f_input, \
        open(str(ii + 1) + 'output.csv', 'w', encoding='utf8', newline='') as f_output:
    input_lines = filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] == ' ', f_input)
    csv_input = csv.reader(input_lines, delimiter='|')
    csv_output = csv.writer(f_output)
    for row in csv_input:
        csv_output.writerow(col.strip() for col in row[1:-1])
Notes:
You should not use binary mode when reading text files. Use r and w modes, respectively, and explicitly declare the file encoding. Choose the encoding that is right for your files.
For working with the csv module, open files with newline='' (which lets the csv module pick the correct line endings).
You can wrap multiple files in the with statement using the \ at the end of the line.
StringIO is completely unnecessary.
I'm not using skipinitialspace=True because some of the columns also have spaces at the end. Therefore I'm calling .strip() manually on each value when writing the row.
The [1:-1] is necessary to get rid of the superfluous empty columns (before the first and after the last | in the input).
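The effect of that slicing can be verified on a single line (a standalone check, independent of the script above):

```python
import csv
from io import StringIO

line = "| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019|    |81|  |  |60532640   |tARFooWMOND |EUR  |        0,85 |  |20140107   | |"
row = next(csv.reader(StringIO(line), delimiter='|'))

print(repr(row[0]), repr(row[-1]))             # '' '' -- from the leading/trailing '|'
print([col.strip() for col in row[1:-1]][:5])  # ['07.01.2014', '07.02.2014', '4919005298', '36', '07.01.2019']
```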
The output is as follows:
Pstng Date,Entry Date,DocumentNo,Itm,Doc..Date,BusA,PK,SG,Sl,Account,User Name,LCurr,Amount in LC,Tx,Assignment,S
07.01.2014,07.02.2014,4919005298,36,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"0,85",,20140107,
07.01.2014,07.02.2014,4919065298,29,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"2,53",,20140107,
07.01.2014,07.02.2014,4919235298,30,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"30,00",,20140107,
07.01.2014,07.02.2014,4119005298,32,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"1,00",,20140107,
07.01.2014,07.02.2014,9019005298,34,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"11,10",,20140107,


What is wrong in my script in this Python 3.x program?

import random
word_list = ["elma", "armut", "kalem"]
chosen_word = random.choice(word_list)
stages = ['''
  +---+
  |   |
  O   |
 /|\  |
 / \  |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|\  |
 /    |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|\  |
      |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|   |
      |
      |
=========''', '''
  +---+
  |   |
  O   |
  |   |
      |
      |
=========
''', '''
  +---+
  |   |
  O   |
      |
      |
      |
=========
''', '''
  +---+
  |   |
      |
      |
      |
      |
=========
''']
map_1 = []
count = 0
end = False
for x in range(len(chosen_word)):
    map_1.append("_")
if "_" not in map_1:
    end = True
    Print("YOU WON")
if count == 6 and "_" in map_1:
    print("YOU LOST")
    end = True
while end == False:
    guess = input("Guess a letter ").lower()
    count += 1
    print(stages[- count])
    for letter in chosen_word:
        if guess == letter:
            a = chosen_word.index(letter)
            map_1[a] = guess
            print(map_1)
        else:
            continue
    print(map_1)
It does not stop when you guess all the letters, and it gives IndexError: list index out of range every time I execute the program. I haven't written the part for what happens when you make a wrong guess yet, so for now I make the correct guess every time, but it still ends like this. I just started learning; I am stuck on this part and don't know how to solve it.
You need to move the conditions into the while loop; otherwise the program never exits the loop (end is always False).
while end == False:
    guess = input("Guess a letter ").lower()
    count += 1
    print(stages[- count])
    for letter in chosen_word:
        if guess == letter:
            a = chosen_word.index(letter)
            map_1[a] = guess
            print(map_1)
        else:
            continue
    # end of for loop
    if "_" not in map_1:
        end = True
        print("YOU WON")
    if count == 6 and "_" in map_1:
        print("YOU LOST")
        end = True

invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected -> Spark SQL

The error I am getting:
invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected
Spark SQL:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
val df = spark.sql(sql)
df.show(10, false)
You added the s prefix, which means you want the string to be interpolated: every token prefixed with $ is replaced with the local variable of the same name. From your code it looks like you do not use this feature, so you can simply remove the s prefix from the string:
val sql =
"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
Otherwise, if you really need the interpolation, you have to escape the $ sign by doubling it, like this:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin

Write pyspark dataframe to file keeping nested quotes, but not "outer" ones?

Is there a way to preserve nested quotes in a pyspark dataframe value when writing to a file (in my case, a TSV), while also getting rid of the "outer" ones (i.e. those that denote a string value in a column)?
>>> dff = sparkSession.createDataFrame([(10,'this is "a test"'), (14,''), (16,'')], ["age", "comments"])
>>> dff.show()
+---+----------------+
|age| comments|
+---+----------------+
| 10|this is "a test"|
| 14| |
| 16| |
+---+----------------+
>>> dff.write\
.mode('overwrite')\
.option("sep", "\t")\
.option("quoteAll", "false")\
.option("emptyValue", "").option("nullValue", "")\
.csv('/tmp/test')
then
$ cat /tmp/test/part-000*
10 "this is \"a test\""
14
16
# what I'd want to see is
10 this is "a test"
14
16
# because I am later parsing based only on TAB characters, so the quote sequences are not a problem in that regard
Is there any way to write the dataframe in this desired format?
* as an aside, more info about the args used can be found here
Set the escapeQuotes option to false:
>>> dff = spark.createDataFrame([(10,'this is "a test"'), (14,''), (16,'')], ["age", "comments"])
>>> dff.show()
+---+----------------+
|age| comments|
+---+----------------+
| 10|this is "a test"|
| 14| |
| 16| |
+---+----------------+
>>> dff.write\
... .mode('overwrite')\
... .option("sep", "\t")\
... .option("quoteAll", "false")\
... .option("emptyValue", "").option("nullValue", "")\
... .option("escapeQuotes", "false").csv('/tmp/test')
>>>
➜ ~ cd /tmp/test
➜ test ls
_SUCCESS part-00001-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv part-00003-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
part-00000-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv part-00002-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
➜ test cat part*
10 this is "a test"
14
16
➜ test

OpenCV - Thin Plate Spline

How can I convert an image from one shape to another using a thin plate spline in OpenCV with Python 3? In C++ we have the shape transformer class; how can we implement it in OpenCV with Python 3?
The thin plate spline transformer does indeed exist for OpenCV in Python 3.
You can use the help function to get more info on which functions exist and how to use them, like this:
>>> help(cv2.createThinPlateSplineShapeTransformer()) ## () braces matter !!
Help on ThinPlateSplineShapeTransformer object:
class ThinPlateSplineShapeTransformer(ShapeTransformer)
| Method resolution order:
| ThinPlateSplineShapeTransformer
| ShapeTransformer
| Algorithm
| builtins.object
|
| Methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate
signature.
|
| __repr__(self, /)
| Return repr(self).
|
| getRegularizationParameter(...)
| getRegularizationParameter() -> retval
|
| setRegularizationParameter(...)
| setRegularizationParameter(beta) -> None
|
| ----------------------------------------------------------------------
| Methods inherited from ShapeTransformer:
|
| applyTransformation(...)
| applyTransformation(input[, output]) -> retval, output
|
| estimateTransformation(...)
| estimateTransformation(transformingShape, targetShape, matches) ->
None
|
| warpImage(...)
| warpImage(transformingImage[, output[, flags[, borderMode[,
borderValue]]]]) -> output
|
| ----------------------------------------------------------------------
| Methods inherited from Algorithm:
|
| clear(...)
| clear() -> None
|
| getDefaultName(...)
| getDefaultName() -> retval
|
| save(...)
| save(filename) -> None
Source

Looking for ways to improve my hangman code

Just getting into Python, so I decided to make a hangman game. It works well, but I was wondering if there are any optimizations I could make or ways to clean up the code. Also, if anyone could recommend a project that I could do next, that'd be cool.
import sys
import codecs
import random


def printInterface(lst, attempts):
    """ Prints user interface which includes:
        - hangman drawing
        - word updater """
    for update in lst:
        print(update, end='')
    if attempts == 1:
        print("\n\n\n\n\n\n\n\n\n\n\n\t\t _____________")
    elif attempts == 2:
        print("""
                  |
                  |
                  |
                  |
                  |
                  |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 3:
        print("""
            ______
                  |
                  |
                  |
                  |
                  |
                  |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 4:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
                  |
                  |
                  |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 5:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
            |     |
            |     |
            |     |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 6:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
            |     |
           /|     |
            |     |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 7:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
            |     |
           /|\    |
            |     |
                  |
                  |
                  |
            ______|______""")
    elif attempts == 8:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
            |     |
           /|\    |
            |     |
           /      |
                  |
                  |
            ______|______""")
    elif attempts == 9:
        print("""
            ______
            |     |
            |     |
          (x_X)   |
            |     |
           /|\    |
            |     |
           / \    |
                  |
                  |
            ______|______""")


def main():
    try:
        wordlist = codecs.open("words.txt", "r")
    except Exception as ex:
        print(ex)
        print("\n**Could not open file!**\n")
        sys.exit(0)
    rand = random.randint(1, 5)
    i = 0
    for word in wordlist:
        i += 1
        if i == rand:
            break
    word = word.strip()
    wordlist.close()
    lst = []
    for h in word:
        lst.append('_ ')
    attempts = 0
    printInterface(lst, attempts)
    while True:
        guess = input("Guess a letter: ").strip()
        i = 0
        for letters in lst:
            if guess not in word:
                print("No '{0}' in the word, try again!".format(guess))
                attempts += 1
                break
            if guess in word[i] and lst[i] == "_ ":
                lst[i] = (guess + ' ')
            i += 1
        printInterface(lst, attempts)
        x = lst.count('_ ')
        if x == 0:
            print("You win!")
            break
        elif attempts == 9:
            print("You suck! You iz ded!")
            break


if __name__ == '__main__':
    while True:
        main()
        again = input("Would you like to play again? (y/n): ").strip()
        if again.lower() == "n":
            sys.exit(1)
        print('\n')
I didn't try the code, but here are some random tips:
Try to format your code according to PEP 8 (use i += 1 instead of i+=1). PEP 8 is the standard style guide for Python.
Use
lst = ['_ '] * len(word)
instead of the for-loop.
Use enumerate as in:
for i, word in enumerate(wordlist)
instead of manually keeping track of i in the loop.
The default mode for opening files is 'r'; there's no need to specify it. Are you using codecs.open instead of the built-in open in order to get Unicode strings back? Also, try to catch a more specific exception than Exception -- probably IOError.
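Putting the enumerate tip together with the word-picking loop, it could look like this (a sketch; the list here stands in for the open words.txt file):

```python
import random

wordlist = ["alpha", "bravo", "charlie", "delta", "echo"]  # stands in for words.txt
rand = random.randint(1, 5)

# enumerate hands out the line counter, so no manual i = 0 / i += 1 is needed
for i, word in enumerate(wordlist, start=1):
    if i == rand:
        break

print(word)
```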
First idea: ASCII art
The Python-specific ingredients here are the regular expression syntax and the range() function, as well as the [xxx for yyy in zzz] list comprehension.
import re

def ascii_art(attempt):
    return re.sub(r'\d', '', re.sub('[0{0}].' \
        .format(''.join([str(e) for e in range(attempt + 1, 10)])), ' ', """
 3_3_3_3_3_3_
 4|    2|
 4|    2|
4(4x4_4X4) 2|
 5|    2|
6/5|7\   2|
 5|    2|
8/ 9\   2|
      2|
      2|
1_1_1_1_1_1_1|1_1_1_1_1_1_
"""))

for i in range(1, 10):
    print(ascii_art(i))
Second idea: loops
Use enumerate for the word-reading loop. Use
for attempt in range(1, 10):
    # inside main loop
    ...
print('you suck!')
as the main loop. The break statement should be used with care, and not as a replacement for for!
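A runnable sketch of that shape (play, word and guesses are hypothetical names introduced here, not taken from the original code):

```python
def play(word, guesses):
    """Return True if every letter of word is guessed within 9 attempts."""
    found = set()
    guesses = iter(guesses)
    for attempt in range(1, 10):
        guess = next(guesses)
        if guess in word:
            found.add(guess)
        if set(word) <= found:
            return True       # won -- leaves the loop early
    print('you suck!')        # reached only when all 9 attempts are used up
    return False

print(play("elma", "elma"))       # True
print(play("elma", "xxxxxxxxx"))  # False (after printing 'you suck!')
```

The losing branch needs no flag variable: it is simply the code after the for loop ends naturally.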
Unless I'm missing something, the structure of
for letters in lst:
    if guess not in word:
        ...
        break
    if guess in word[i]:
        ...
will be more transparent as
if guess not in word:
    ...
else:
    index = word.find(guess)
    ...
I would use a list instead of the if .. else chain in printInterface.
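Something along these lines (a sketch with shortened placeholder drawings; the real triple-quoted stage strings from printInterface would fill the list):

```python
# index 0 is a dummy entry so that STAGES[attempts] lines up with the attempt count
STAGES = [
    "",
    "\t\t _____________",
    "          |\n    ______|______",
    "    ______\n          |\n    ______|______",
]

def printInterface(lst, attempts):
    print(''.join(lst))
    if 0 < attempts < len(STAGES):
        print(STAGES[attempts])

printInterface(["_ ", "a ", "_ "], 2)
```

Looking the drawing up by index replaces the entire if/elif chain with a single line.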
