Insert a newline between designated delimiters in Python [duplicate] - python-3.x

The file has 3 columns, the header line is correct, and the column delimiter is "<|>". The mismatched lines are caused by newlines that were accidentally inserted in the middle of a record.
Consider the following CSV file,
PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05
I would like every record to end up on a single line, so that the broken row becomes AAAAA<|>S1<|>2021/01/03.
The first thing I did was to remove all whitespace from the CSV file.
import re
your_string ="""PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05"""
print(re.sub(r'\s{1,}','',your_string.strip()))
After this step I get one long, tape-like string:
PERSON_ID<|>DEPT_ID<|>DATE_JOINEDAAAAA<|>S1<|>2021/01/03BBBBBB<|>S2<|>2021/02/03CCCCC<|>S1<|>2021/03/05
Now I need to insert a newline at the right place in "2021/01/03BBBBBB", between the date and the next PERSON_ID.
Since the total number of columns is 3, a newline has to go somewhere between:
the 2nd delimiter to 3rd delimiter,
the 4th delimiter to 5th delimiter,
the 6th delimiter to 7th delimiter...and so on.
Assuming the date always has a fixed length of 10 characters, the newline belongs 10 characters after the first delimiter of each of those pairs.
Assuming the header will not change, I can insert the first newline 33 characters from the beginning of the file.
Then, finally, I get my data on the correct lines. The rows of the CSV would look like this:
PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05
After this, I can split each line on the delimiter, which completes the restoration of the mismatched lines.
So my question is: how do I insert a newline 10 characters after each designated delimiter?
Thanks!

What about getting the lines of fields directly? Like this:
sep = '<|>'
your_data = [line.strip().split(sep) for line in your_string.strip().split('\n') if sep in line]
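This gives the following. Note that the continuation line "/03" contains no delimiter, so it is dropped and the first date stays truncated: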
[['PERSON_ID', 'DEPT_ID', 'DATE_JOINED'], ['AAAAA', 'S1', '2021/01'], ['BBBBBB', 'S2', '2021/02/03'], ['CCCCC', 'S1', '2021/03/05']]
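If the stray fragment has to be kept, one option is to follow the plan from the question: flatten the data rows, split on the delimiter, and cut every third field after 10 characters. The following is only a sketch under the question's own assumptions (intact header on the first line, 3 columns, a last column that is always 10 characters long); `restore_rows` and its parameters are hypothetical names, not part of any library.

```python
import re

your_string = """PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05"""


def restore_rows(text, sep="<|>", n_cols=3, last_len=10):
    """Rebuild records broken by stray newlines (hypothetical helper)."""
    # Assumption: the header occupies the first physical line and is intact.
    header, _, body = text.strip().partition("\n")
    # Remove every whitespace character, stray newlines included.
    flat = re.sub(r"\s+", "", body)
    rows, row = [], []
    for part in flat.split(sep):
        row.append(part)
        if len(row) == n_cols:
            # The last field is fused with the first field of the next
            # record; a fixed length of last_len characters separates them.
            row[-1], rest = part[:last_len], part[last_len:]
            rows.append(row)
            row = [rest] if rest else []
    return [header.split(sep)] + rows


rows = restore_rows(your_string)
for r in rows:
    print("<|>".join(r))
```

Because the split relies on the fixed date length, a record whose date is shorter or longer than 10 characters would be cut in the wrong place, so this only suits data that really meets that assumption.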

Related

Editing data in a text file in python for a given condition

I have a text file with the following contents:
1 a 20
2 b 30
3 c 40
I need to check whether the first character of a line is 2 and, if so, change its final two characters to 12 and write the data back to the file. The new file should look like this:
1 a 20
2 b 12
3 c 40
I need help doing this in Python 3; I couldn't figure it out.
To modify contents of a file with python you will need to open the file in read mode to extract the contents of the file. You can then make changes on the extracted contents. To make your changes permanent, you have to write the contents back to the file.
The whole process looks something like this:
from pathlib import Path
# Define path to your file
your_file = Path("your_file.txt")
# Read the data in your file
with your_file.open('r') as f:
    lines = f.readlines()
# Edit lines that start with a "2"
for i in range(len(lines)):
    if lines[i].startswith("2"):
        lines[i] = lines[i][:-3] + "12\n"
# Write data back to file
with your_file.open('w') as f:
    f.writelines(lines)
Note that to change the last two visible characters of a line, you actually replace the two characters just before its final character. That final character is the newline, \n, which marks the end of the line, so the slice [:-3] drops the two old characters together with the newline, and the replacement adds "12\n" back. If you leave the \n out of the replacement string, the following line will be appended directly after your replacement.
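One caveat with a fixed [:-3] slice is the last line of a file, which may not end in a newline. A small sketch of a variant that strips the newline first and puts back whatever was there; `set_last_two` is a hypothetical helper name, and the list of lines stands in for the result of readlines():

```python
def set_last_two(line, new="12"):
    """Replace the last two visible characters, keeping any trailing newline."""
    body = line.rstrip("\n")       # the text without the newline
    newline = line[len(body):]     # "\n" if present, otherwise ""
    return body[:-2] + new + newline


# Stand-in for f.readlines(); note the last line has no trailing newline.
lines = ["1 a 20\n", "2 b 30\n", "3 c 40"]
edited = [set_last_two(l) if l.startswith("2") else l for l in lines]
print("".join(edited))
```

The edited list can then be passed to writelines() exactly as in the code above.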

Reading from file returns 2 dictionaries

data = [line.strip('\n') for line in file3]
# print(data)
data2 = [line.split(',') for line in data]
data_dictionary = {t[0]:t[1] for t in data2}
print(data_dictionary)
I'm reading content from a file under the assumption that there is no whitespace at the beginning of each line and no blank lines anywhere.
When I read this file, I first strip the newline character and then split each line on ',', because that is the separator used in the file. But when I build the dictionary, it returns two dictionaries instead of one, and the same happens with other files where I use this procedure. How do I fix this?

Trouble writing a header line with a comma to an Excel csv file

I'm trying to write a simple header line from Intel Fortran to a CSV file for Excel, where the header text itself contains commas. What I'd like to see in the first two columns is:
FMG(1,1) FMG(2,1)
Enclosing each term in quotes, as in "FMG(i,j)", worked when I did it line by line:
Code: write (*,*) "FMG(1,1), kg/s (O2): ", FMG(1,1)
Output: FMG(1,1), kg/s (O2): 0.129000000000000
Some of the things I've tried include:
code: write (10,*) "FMG(1,1)","FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1)
code: write (10,*) "FMG(1,1)" , "FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1) (same thing)
code: write (10,*) " FMG(1,1)," "FMG(2,1)"
csv column output: FMG(1 1) FMG(2,1)
That got the second one right.
CSV stands for Comma-Separated Values. If you output "FMG(1,1),FMG(1,2)", the text is split at the commas and you get
FMG(1
1)
FMG(1
2)
which is what you are seeing. To keep the commas, each string needs to be enclosed in double quotes inside the file itself. If you write
write (10,*) '"FMG(1,1)","FMG(2,1)"'
it might achieve what you are looking for.
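The quoting rule itself is independent of Fortran. As an illustration only, here is a Python sketch using the standard csv module, which quotes a field automatically when it contains the delimiter; the in-memory buffer is just a stand-in for the output file:

```python
import csv
import io

# Write the two header cells; csv.writer quotes fields that contain a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(["FMG(1,1)", "FMG(2,1)"])
header_line = buffer.getvalue()
print(header_line)

# Reading the quoted line back yields two cells again.
quoted_cells = next(csv.reader([header_line]))

# Without the quotes, the same text is split into four cells.
unquoted_cells = next(csv.reader(["FMG(1,1),FMG(2,1)"]))
print(quoted_cells, unquoted_cells)
```

The line produced by the writer has the same shape as the string in the Fortran write statement above: each cell wrapped in double quotes, with a bare comma between cells.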

How to split a csv line that has commas for formatting numbers

I download a CSV file using requests, and I need to split each line, but some number fields contain formatting commas, like:
line='2019-07-05,sitename.com,"14,740","14,559","7,792",$11.47'
When I try to split it:
data = line.split(',')
I get this value:
['2019-07-05', 'sitename.com', '"14', '740"', '"14', '559"',
'"7','792"', '$11.47']
I need:
['2019-07-05', 'sitename.com', '14740', '14559', '7792', '$11.47']
I need to solve this in Python 3.7.
Any help is welcome.
I usually don't like using regex but there may be no other option here. Try this - it basically removes the inside ,s in two steps:
import re
line='2019-07-05,sitename.com,"14,740","14,559","7,792",$11.47'
new_line = re.sub(r',(?!\d)', r"xxx", line).replace(',','').replace('xxx',',')
print(new_line)
Output
2019-07-05,sitename.com,"14740","14559","7792",$11.47
You can now use:
data = new_line.split(',')
Explanation:
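The regex ,(?!\d) matches every , in line that is not immediately followed by a digit, which in this line means the field delimiters. The .sub replaces those (temporarily) with xxx. The next .replace deletes the remaining ,s, which are the ones inside numbers, by replacing them with nothing and, finally, the last .replace restores the , delimiters by replacing the temporary xxx with ,. Note that this relies on no field starting with a digit right after a delimiter, and that the quotes around the numbers are still present after the split.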
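An alternative that avoids regex is to let the standard csv module handle the quoted fields and then drop the thousands separators. This is a sketch that assumes every comma left inside a field is a number-formatting comma; if some quoted fields contain ordinary text with commas, the replacement would have to be limited to the numeric columns.

```python
import csv

line = '2019-07-05,sitename.com,"14,740","14,559","7,792",$11.47'

# csv.reader accepts any iterable of lines and honours the double quotes.
fields = next(csv.reader([line]))

# Remove the formatting commas that remain inside the parsed fields.
data = [field.replace(",", "") for field in fields]
print(data)
```

This also removes the surrounding quotes, which matches the list asked for in the question.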

Best way to fix inconsistent csv file in python

I have a csv file which is not consistent. It looks like this where some have a middle name and some do not. I don't know the best way to fix this. The middle name will always be in the second position if it exists. But if a middle name doesn't exist the last name is in the second position.
john,doe,52,florida
jane,mary,doe,55,texas
fred,johnson,23,maine
wally,mark,david,44,florida
Let's say that you have ① wrong.csv and want to produce ② fixed.csv.
You want to read a line from ①, fix it, and write the fixed line to ②. This can be done like this:
with open('wrong.csv') as infile, open('fixed.csv', 'w') as outfile:
    for line in infile:
        line = fix(line)
        outfile.write(line)
Now we want to define the fix function...
Each line has either 4 or 5 fields, separated by commas. So what we want to do is split the line using the comma as a delimiter, return the unmodified line if the number of fields is 4, and otherwise join field 0 and field 1 (Python counts from zero...), reassemble the output line and return it to the caller.
def fix(line):
    items = line.split(',') # items is a list of strings
    if len(items) == 4: # no middle name, the line is OK as it stands
        return line
    # join first and middle name
    first_middle = ' '.join((items[0], items[1]))
    # we want to return a "fixed" line,
    # i.e., a string not a list of strings
    # we have to join the new name with the remaining info
    return ','.join([first_middle] + items[2:])
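The same idea can be checked on the sample rows without touching the disk. The sketch below uses the standard csv module and in-memory buffers in place of wrong.csv and fixed.csv; `fix_row` is a hypothetical helper, and it assumes a row with a middle name always has exactly 5 fields.

```python
import csv
import io


def fix_row(row):
    """Merge first and middle name when a row has 5 fields (hypothetical helper)."""
    if len(row) == 5:
        return [f"{row[0]} {row[1]}"] + row[2:]
    return row


# In-memory stand-ins for wrong.csv and fixed.csv.
wrong = io.StringIO(
    "john,doe,52,florida\n"
    "jane,mary,doe,55,texas\n"
    "fred,johnson,23,maine\n"
    "wally,mark,david,44,florida\n"
)
fixed = io.StringIO()

writer = csv.writer(fixed, lineterminator="\n")
for row in csv.reader(wrong):
    writer.writerow(fix_row(row))

result = fixed.getvalue()
print(result)
```

Every output row then has the same four columns: name, last name, age, state.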
