How to convert a SAP .txt extraction into a .csv file - python-3.x
I have a .txt file as in the example reported below. I would like to convert it into a .csv table, but I'm not having much success.
Mack3 Line Item Journal Time 14:22:33 Date 03.10.2015
Panteni Ledger 1L TGEPIO00/CANTINAOAS Page 20.001
--------------------------------------------------------------------------------------------------------------------------------------------
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S|
|------------------------------------------------------------------------------------------------------------------------------------------|
| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | |
| 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | |
| 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | |
| 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | |
| 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
|------------------------------------------------------------------------------------------------------------------------------------------|
The file in question is structured as a report from SAP. Practicing with Python and looking at other posts, I found this code:
import csv
from io import StringIO

with open('file.txt', 'rb') as f_input:
    for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1].isalpha(), f_input):
        header = [cols.strip() for cols in next(csv.reader(StringIO(line), delimiter='|', skipinitialspace=True))][1:-1]
        break

with open('file.txt', 'rb') as f_input, open(str(ii + 1) + 'output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    for line in filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] != '-' and not x[1].isalpha(), f_input):
        csv_input = csv.reader(StringIO(line), delimiter='|', skipinitialspace=True)
        csv_output.writerow(csv_input)
Unfortunately it does not work in my case: it creates empty .csv files and does not seem to read csv_input properly.
Any possible solution?
Your input file can be treated as CSV once we filter out a few lines, namely the ones that do not start with a pipe symbol '|' followed by a space ' ', which would leave us with this:
| Pstng Date|Entry Date|DocumentNo|Itm|Doc..Date |BusA|PK|SG|Sl|Account |User Name |LCurr| Amount in LC|Tx|Assignment |S|
| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 0,85 | |20140107 | |
| 07.01.2014|07.02.2014|4919065298| 29|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 2,53 | |20140107 | |
| 07.01.2014|07.02.2014|4919235298| 30|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 30,00 | |20140107 | |
| 07.01.2014|07.02.2014|4119005298| 32|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 1,00 | |20140107 | |
| 07.01.2014|07.02.2014|9019005298| 34|07.01.2019| |81| | |60532640 |tARFooWMOND |EUR | 11,10 | |20140107 | |
Your output is empty mainly because the x[1].isalpha() check is never true on this data: the character at position 1 of each line is always a space, never a letter.
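To illustrate, here is a quick check (a minimal sketch using a shortened copy of one of the data lines; the variable name is only for demonstration):

# A shortened copy of one data line from the report:
line = '| 07.01.2014|07.02.2014|4919005298| 36|07.01.2019|'

print(line[0] == '|')      # True
print(line[1])             # ' '  -- a space, never a letter
print(line[1].isalpha())   # False -> this filter rejects every data line
print(line[1] == ' ')      # True  -> this is the test that actually keeps them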
It's not necessary to open the input file multiple times; we can read, filter and write the output in one go:
import csv

ii = 0
with open('file.txt', 'r', encoding='utf8', newline='') as f_input, \
     open(str(ii + 1) + 'output.csv', 'w', encoding='utf8', newline='') as f_output:
    # keep only the lines that start with '| ' (the header and the data rows)
    input_lines = filter(lambda x: len(x) > 2 and x[0] == '|' and x[1] == ' ', f_input)
    csv_input = csv.reader(input_lines, delimiter='|')
    csv_output = csv.writer(f_output)
    for row in csv_input:
        # drop the empty leading/trailing columns and strip the padding spaces
        csv_output.writerow(col.strip() for col in row[1:-1])
Notes:
You should not use binary mode when reading and writing text files. Use the r and w modes, respectively, and explicitly declare the file encoding; choose the encoding that is right for your files.
When working with the csv module, open files with newline='' (this lets the csv module handle line endings itself).
You can open multiple files in a single with statement by putting \ at the end of the line to continue it.
StringIO is completely unnecessary here; csv.reader can consume the filtered lines directly.
I'm not using skipinitialspace=True because some of the columns also have spaces at the end, so I'm calling .strip() manually on each value when writing the row.
The [1:-1] is necessary to get rid of the superfluous empty columns that csv.reader produces for the text before the first and after the last | of each line.
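As a small demonstration of the last two notes (a minimal sketch using a shortened line, not the full report row):

import csv

row = next(csv.reader(['| 07.01.2014|4919005298| 0,85 |'], delimiter='|'))
print(row)
# ['', ' 07.01.2014', '4919005298', ' 0,85 ', '']
# -> empty first and last fields (hence row[1:-1]) and padding spaces (hence .strip())

print([col.strip() for col in row[1:-1]])
# ['07.01.2014', '4919005298', '0,85']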
The output is as follows:
Pstng Date,Entry Date,DocumentNo,Itm,Doc..Date,BusA,PK,SG,Sl,Account,User Name,LCurr,Amount in LC,Tx,Assignment,S
07.01.2014,07.02.2014,4919005298,36,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"0,85",,20140107,
07.01.2014,07.02.2014,4919065298,29,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"2,53",,20140107,
07.01.2014,07.02.2014,4919235298,30,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"30,00",,20140107,
07.01.2014,07.02.2014,4119005298,32,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"1,00",,20140107,
07.01.2014,07.02.2014,9019005298,34,07.01.2019,,81,,,60532640,tARFooWMOND,EUR,"11,10",,20140107,
Related
What is wrong with my script in this Python 3.x program?
import random

word_list = ["elma", "armut", "kalem"]
chosen_word = random.choice(word_list)

stages = ['''
  +---+
  |   |
  O   |
 /|\  |
 / \  |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|\  |
 /    |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|\  |
      |
      |
=========
''', '''
  +---+
  |   |
  O   |
 /|   |
      |
      |
=========''', '''
  +---+
  |   |
  O   |
  |   |
      |
      |
=========
''', '''
  +---+
  |   |
  O   |
      |
      |
      |
=========
''', '''
  +---+
  |   |
      |
      |
      |
      |
=========
''']

map_1 = []
count = 0
end = False

for x in range(len(chosen_word)):
    map_1.append("_")

if "_" not in map_1:
    end = True
    Print("YOU WON")

if count == 6 and "_" in map_1:
    print("YOU LOST")
    end = True

while end == False:
    guess = input("Guess a letter ").lower()
    count += 1
    print(stages[- count])
    for letter in chosen_word:
        if guess == letter:
            a = chosen_word.index(letter)
            map_1[a] = guess
            print(map_1)
        else:
            continue
    print(map_1)

It does not stop when you guess all the letters, and it gives IndexError: list index out of range every time I execute the program. I haven't written the part for what happens when you make a wrong guess yet; for now I make the correct guess every time, but it still ends like this. I just started learning, I am stuck at this part and I don't know how to solve it.
You need to move the conditions into the while loop; otherwise the program never exits the loop (end always stays False).

while end == False:
    guess = input("Guess a letter ").lower()
    count += 1
    print(stages[- count])
    for letter in chosen_word:
        if guess == letter:
            a = chosen_word.index(letter)
            map_1[a] = guess
            print(map_1)
        else:
            continue
    # end of the for loop
    if "_" not in map_1:
        end = True
        print("YOU WON")
    if count == 6 and "_" in map_1:
        print("YOU LOST")
        end = True
invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected -> Spark SQL
The error I am getting:

invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected

Spark SQL:

val sql = s"""
  |SELECT
  |   ,CAC.engine
  |   ,CAC.user_email
  |   ,CAC.submit_time
  |   ,CAC.end_time
  |   ,CAC.duration
  |   ,CAC.counter_name
  |   ,CAC.counter_value
  |   ,CAC.usage_hour
  |   ,CAC.event_date
  |FROM
  |   xyz.command AS CAC
  |   INNER JOIN
  |   (
  |     SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
  |     FROM xyz.metadata
  |   ) AS QCM
  |   ON QCM.account_id = CAC.account_id
  |WHERE
  |   CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
  |""".stripMargin

val df = spark.sql(sql)
df.show(10, false)
You added the s prefix, which means you want the string to be interpolated: every token prefixed with $ will be replaced by the local variable of the same name. From your code it looks like you do not use this feature, so you can simply remove the s prefix from the string:

val sql = """
  |SELECT
  |   ,CAC.engine
  |   ,CAC.user_email
  |   ,CAC.submit_time
  |   ,CAC.end_time
  |   ,CAC.duration
  |   ,CAC.counter_name
  |   ,CAC.counter_value
  |   ,CAC.usage_hour
  |   ,CAC.event_date
  |FROM
  |   xyz.command AS CAC
  |   INNER JOIN
  |   (
  |     SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
  |     FROM xyz.metadata
  |   ) AS QCM
  |   ON QCM.account_id = CAC.account_id
  |WHERE
  |   CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
  |""".stripMargin

Otherwise, if you really need the interpolation, you have to escape the $ sign like this:

val sql = s"""
  |SELECT
  |   ,CAC.engine
  |   ,CAC.user_email
  |   ,CAC.submit_time
  |   ,CAC.end_time
  |   ,CAC.duration
  |   ,CAC.counter_name
  |   ,CAC.counter_value
  |   ,CAC.usage_hour
  |   ,CAC.event_date
  |FROM
  |   xyz.command AS CAC
  |   INNER JOIN
  |   (
  |     SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$$.configuration.name'), '_')[1], 'acc', '') AS account_id
  |     FROM xyz.metadata
  |   ) AS QCM
  |   ON QCM.account_id = CAC.account_id
  |WHERE
  |   CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
  |""".stripMargin
Write pyspark dataframe to file keeping nested quotes, but not "outer" ones?
Is there a way to preserve nested quotes in a pyspark dataframe value when writing to a file (in my case, a TSV), while also getting rid of the "outer" ones (i.e. those that denote a string value in a column)?

>>> dff = sparkSession.createDataFrame([(10,'this is "a test"'), (14,''), (16,'')], ["age", "comments"])
>>> dff.show()
+---+----------------+
|age|        comments|
+---+----------------+
| 10|this is "a test"|
| 14|                |
| 16|                |
+---+----------------+
>>> dff.write\
        .mode('overwrite')\
        .option("sep", "\t")\
        .option("quoteAll", "false")\
        .option("emptyValue", "").option("nullValue", "")\
        .csv('/tmp/test')

then

$ cat /tmp/test/part-000*
10      "this is \"a test\""
14
16

# what I'd want to see is
10      this is "a test"
14
16
# because I am later parsing based only on TAB characters, so the quote sequences are not a problem in that regard

Is there any way to write the dataframe in this desired format?

* as an aside, more info about the args used can be found here
Set the escapeQuotes option to false:

>>> dff = spark.createDataFrame([(10,'this is "a test"'), (14,''), (16,'')], ["age", "comments"])
>>> dff.show()
+---+----------------+
|age|        comments|
+---+----------------+
| 10|this is "a test"|
| 14|                |
| 16|                |
+---+----------------+
>>> dff.write\
...     .mode('overwrite')\
...     .option("sep", "\t")\
...     .option("quoteAll", "false")\
...     .option("emptyValue", "").option("nullValue", "")\
...     .option("escapeQuotes", "false").csv('/tmp/test')
>>>

➜  ~ cd /tmp/test
➜  test ls
_SUCCESS
part-00000-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
part-00001-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
part-00002-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
part-00003-f702e661-15c2-4ab9-aef2-8dad5d923412-c000.csv
➜  test cat part*
10      this is "a test"
14
16
➜  test
OpenCV - Thin Plate Spline
How can I convert an image from one shape to another using a thin plate spline in OpenCV with Python 3? In C++ we have the shape transformer class; how can we implement it in OpenCV for Python 3?
Thin plate spline does indeed exist for OpenCV in Python 3. You can use the help function to get more info on which functions exist and how to use them, like this:

>>> help(cv2.createThinPlateSplineShapeTransformer())  ## the () braces matter !!
Help on ThinPlateSplineShapeTransformer object:

class ThinPlateSplineShapeTransformer(ShapeTransformer)
 |  Method resolution order:
 |      ThinPlateSplineShapeTransformer
 |      ShapeTransformer
 |      Algorithm
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  getRegularizationParameter(...)
 |      getRegularizationParameter() -> retval
 |
 |  setRegularizationParameter(...)
 |      setRegularizationParameter(beta) -> None
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from ShapeTransformer:
 |
 |  applyTransformation(...)
 |      applyTransformation(input[, output]) -> retval, output
 |
 |  estimateTransformation(...)
 |      estimateTransformation(transformingShape, targetShape, matches) -> None
 |
 |  warpImage(...)
 |      warpImage(transformingImage[, output[, flags[, borderMode[, borderValue]]]]) -> output
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from Algorithm:
 |
 |  clear(...)
 |      clear() -> None
 |
 |  getDefaultName(...)
 |      getDefaultName() -> retval
 |
 |  save(...)
 |      save(filename) -> None

Source
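To go from the help text to working code, here is a minimal usage sketch. The control points, the match construction and the file names are made up for illustration, and the estimation direction should be verified against your OpenCV version (warpImage is commonly fed a transformation estimated from target to source):

import cv2
import numpy as np

# Hypothetical control points: where features are in the input image (source)
# and where they should end up (target). Point sets are float32 arrays of shape (1, N, 2).
source_pts = np.array([[50, 50], [200, 50], [50, 200], [200, 200]], dtype=np.float32).reshape(1, -1, 2)
target_pts = np.array([[60, 40], [210, 60], [40, 210], [190, 190]], dtype=np.float32).reshape(1, -1, 2)

# One DMatch per point pair, pairing index i in one set with index i in the other.
matches = [cv2.DMatch(i, i, 0) for i in range(source_pts.shape[1])]

tps = cv2.createThinPlateSplineShapeTransformer()

# Note: for warpImage the transformation is usually estimated in the reverse
# direction (target -> source), so that warped pixels land on the target points.
tps.estimateTransformation(target_pts, source_pts, matches)

img = cv2.imread('input.png')          # 'input.png' is a placeholder file name
warped = tps.warpImage(img)
cv2.imwrite('warped.png', warped)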
Looking for ways to improve my hangman code
Just getting into Python, I decided to make a hangman game. It works well, but I was wondering if there are any optimizations I could make or ways to clean up the code. Also, if anyone could recommend a project that I could do next, that'd be cool.

import sys
import codecs
import random

def printInterface(lst, attempts):
    """
    Prints user interface which includes:
        - hangman drawing
        - word updater
    """
    for update in lst:
        print (update, end = '')
    if attempts == 1:
        print ("\n\n\n\n\n\n\n\n\n\n\n\t\t _____________")
    elif attempts == 2:
        print ("""
                 |
                 |
                 |
                 |
                 |
                 |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 3:
        print ("""
            ______
                 |
                 |
                 |
                 |
                 |
                 |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 4:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
                 |
                 |
                 |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 5:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
           |     |
           |     |
           |     |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 6:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
           |     |
          /|     |
           |     |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 7:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
           |     |
          /|\    |
           |     |
                 |
                 |
                 |
           ______|______""")
    elif attempts == 8:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
           |     |
          /|\    |
           |     |
          /      |
                 |
                 |
           ______|______""")
    elif attempts == 9:
        print ("""
            ______
           |     |
           |     |
         (x_X)   |
           |     |
          /|\    |
           |     |
          / \    |
                 |
                 |
           ______|______""")

def main():
    try:
        wordlist = codecs.open("words.txt", "r")
    except Exception as ex:
        print (ex)
        print ("\n**Could not open file!**\n")
        sys.exit(0)

    rand = random.randint(1,5)
    i = 0
    for word in wordlist:
        i+=1
        if i == rand:
            break
    word = word.strip()
    wordlist.close()

    lst = []
    for h in word:
        lst.append('_ ')

    attempts = 0
    printInterface(lst,attempts)

    while True:
        guess = input("Guess a letter: ").strip()
        i = 0
        for letters in lst:
            if guess not in word:
                print ("No '{0}' in the word, try again!".format(guess))
                attempts += 1
                break
            if guess in word[i] and lst[i] == "_ ":
                lst[i] = (guess + ' ')
            i+=1
        printInterface(lst,attempts)
        x = lst.count('_ ')
        if x == 0:
            print ("You win!")
            break
        elif attempts == 9:
            print ("You suck! You iz ded!")
            break

if __name__ == '__main__':
    while True:
        main()
        again = input("Would you like to play again? (y/n): ").strip()
        if again.lower() == "n":
            sys.exit(1)
        print ('\n')
I didn't try the code, but here are some random tips (a short sketch applying them follows the list):

- Try to format your code according to PEP 8 (use i += 1 instead of i+=1). PEP 8 is the standard style guide for Python.
- Use lst = ['_ '] * len(word) instead of the for-loop.
- Use enumerate, as in for i, word in enumerate(wordlist), instead of manually keeping track of i in the loop.
- The default mode for opening files is 'r', there's no need to specify it. Are you using codecs.open instead of the built-in open in order to get Unicode strings back?
- Also, try to catch a more specific exception than Exception -- probably IOError.
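A minimal sketch of those tips applied to the question's code. This is an assumption-laden rewrite, not the answerer's own code: it reuses the names from the question, swaps the randint counting loop for random.choice, and assumes a plain-text words.txt with one word per line.

import random

# Catch a specific exception instead of the generic Exception:
try:
    with open("words.txt") as wordlist:          # 'r' is already the default mode
        words = [line.strip() for line in wordlist]
except IOError as ex:
    print(ex)
    print("\n**Could not open file!**\n")
    raise SystemExit

# random.choice picks a word directly, replacing the randint counting loop:
word = random.choice(words)

# One placeholder per letter, without a for-loop:
lst = ['_ '] * len(word)

# enumerate yields the index and the value together:
for i, letter in enumerate(word):
    print(i, letter)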
First idea: ASCII art

The things special to Python here are the regular expression syntax and the range() function, as well as the [xxx for yyy in zzz] list comprehension used to fill the pattern.

import re

def ascii_art(attempt):
    return re.sub(r'\d', '', re.sub('[0{0}].' \
        .format(''.join([str(e) for e in range(attempt + 1, 10)])), ' ', """
3_3_3_3_3_3_
4|          2|
4|          2|
4(4x4_4X4)  2|
 5|         2|
6/5|7\      2|
 5|         2|
8/ 9\       2|
            2|
            2|
1_1_1_1_1_1_1|1_1_1_1_1_1_
"""))

for i in range(1, 10):
    print(ascii_art(i))

Second idea: loops

Use enumerate for the word reading loop. Use

for attempt in range(1, 10):   # inside main loop
    ...
print ('you suck!')

as the main loop. The break operator should be used with care and not as a replacement for for! Unless I miss something, the structure of

for letters in lst:
    if guess not in word:
        ...
        break
    if guess in word[i]:
        ...

will be more transparent as

if guess not in word:
    ...
else:
    index = word.find (guess)
    ...
I would use a list instead of the if .. else statements in printInterface.
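For instance, something like this rough sketch, with the drawings shortened to placeholders (the real entries would hold the multi-line pictures from the question):

# Index the drawings by attempt number instead of walking an if/elif chain.
HANGMAN_STAGES = [""] + ["<drawing for attempt %d>" % n for n in range(1, 10)]

def printInterface(lst, attempts):
    for update in lst:
        print(update, end='')
    print(HANGMAN_STAGES[attempts])

# Example call, pretending two letters of a five-letter word were guessed:
printInterface(['h ', '_ ', '_ ', 'm ', '_ '], 3)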