vim how to combine lines containing same id [closed] - vim

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a CSV file and I want to combine lines that have the same ID. Example:
ID,Name
1113,Firefox
1114,Chrome
1113,InternetExplorer
and it should look like:
ID,Name
1113,"Firefox,InternetExplorer"
1114,Chrome
Thanks

Awk might be a solution for you. Something like this might be sufficient:
:2,$!awk -F, '{ a[$1] = (a[$1] ? a[$1] "," : "") $2 } END { for (p in a) print p "," a[p] }'
It would join lines on their first column and concatenate all the second columns with a comma:
ID,Name
1113,Firefox,InternetExplorer
1114,Chrome
The second column isn't quoted in the output, and the order of the output lines isn't guaranteed.
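If quoting and a stable order matter, the same grouping can be sketched in Python. This is only a minimal sketch, not part of the answer above: the function name combine_rows is hypothetical, and it assumes the file has a header row and exactly two columns. It relies on the csv module quoting any field that contains a comma and on dict preserving insertion order.

```python
import csv
import io


def combine_rows(text):
    """Group rows by their first column, joining second columns with ','."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    groups = {}  # dict keeps keys in first-seen order
    for row in reader:
        if not row:
            continue  # skip blank lines
        groups.setdefault(row[0], []).append(row[1])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for key, names in groups.items():
        # A joined value containing ',' is quoted automatically by csv.writer.
        writer.writerow([key, ",".join(names)])
    return out.getvalue()


sample = "ID,Name\n1113,Firefox\n1114,Chrome\n1113,InternetExplorer\n"
print(combine_rows(sample))
```

IDs come out in the order they first appear, so no separate sort step is needed.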

For a Vim approach:
Sort from the second line to the end of the file.
:2,$sort
Record macro 'a', starting at the beginning of the second line.
qa ---> Record macro 'a'
vw"1y ---> Copy the column 1 value, including the ",", to register '1'
V ---> Select the whole line
G?<C-R>1<Enter> ---> Go to the end of the file, then search backward for the last occurrence of
register '1' to select all lines with the same column 1 value
:'<,'>join ---> Join the selected lines
:.s/ <C-R>1/,/ge ---> Replace each register '1' value that has a space in front of it with ","
0wa ---> Go to the beginning of the line, move to the ",", then append
" ---> Append "
<ESC>$a ---> Back to normal mode, go to the end of the line, then append
" ---> Append "
<ESC>j0 ---> Back to normal mode, move one line down and go to the start of the line
q ---> Finish recording the macro
Then play macro 'a' any number of times, for example:
1000@a ---> Execute 1000 times
Constraint:
Will not work when column 2 contains the value of column 1.

Related

Editing data in a text file in python for a given condition

I have a text file with the following contents:
1 a 20
2 b 30
3 c 40
I need to check if the first character of a particular line is 2 and edit its final two characters to 12, and rewrite the data into the file. New file should look something like this:
1 a 20
2 b 12
3 c 40
Need help doing this in python 3.
Couldn't figure it out. Help.
To modify contents of a file with python you will need to open the file in read mode to extract the contents of the file. You can then make changes on the extracted contents. To make your changes permanent, you have to write the contents back to the file.
The whole process looks something like this:
from pathlib import Path
# Define path to your file
your_file = Path("your_file.txt")
# Read the data in your file
with your_file.open('r') as f:
    lines = f.readlines()
# Edit lines that start with a "2"
for i in range(len(lines)):
    if lines[i].startswith("2"):
        lines[i] = lines[i][:-3] + "12\n"
# Write data back to file
with your_file.open('w') as f:
    f.writelines(lines)
Note that to change the last two visible characters of a line, you actually replace the two characters before the final one. This is because each line ends with a newline character, which indicates that the line has ended and that the following characters belong on the next line. The \n you see after 12 is that newline character. If you leave it out of your replacement string, the next line will be appended directly after your replacement.
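The slice [:-3] only works when the line ends with exactly one newline and the value to replace is two characters wide. As a hedged alternative, the sketch below replaces the last space-separated field and keeps whatever line ending was there. The helper name set_last_field is hypothetical, and it assumes the line contains at least one space.

```python
def set_last_field(line, new_value):
    """Replace the last space-separated field, preserving the line ending."""
    body = line.rstrip("\n")
    ending = line[len(body):]  # "\n" or "" for a final line without newline
    head, _, _ = body.rpartition(" ")
    return f"{head} {new_value}{ending}"


lines = ["1 a 20\n", "2 b 30\n", "3 c 40"]
lines = [set_last_field(l, "12") if l.startswith("2") else l for l in lines]
print(lines)
```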

Insert a next line between each designated delimiters in Python [duplicate]

This question already exists:
Restore the order of mismatched lines of CSV file in Python
Closed 1 year ago.
Given that the number of columns is 3, the header row is correct, and the column delimiter is "<|>", the mismatched lines are caused by newlines that were inserted by accident.
Consider the following CSV file,
PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05
I want the output to have one complete record per line (the expected rows are listed further down).
The first thing I did was remove all whitespace from the CSV data.
import re
your_string ="""PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05"""
print(re.sub(r'\s{1,}','',your_string.strip()))
After this step I get one long, tape-like string:
PERSON_ID<|>DEPT_ID<|>DATE_JOINEDAAAAA<|>S1<|>2021/01/03BBBBBB<|>S2<|>2021/02/03CCCCC<|>S1<|>2021/03/05
Now I need to insert a line break in the right places, for example inside "2021/01/03BBBBBB".
Since the total number of columns is 3, a line break has to go between:
the 2nd and 3rd delimiters,
the 4th and 5th delimiters,
the 6th and 7th delimiters, and so on.
Assuming the date always has a fixed length of 10, I need to insert a line break 10 characters after each of those designated delimiters.
Assuming the header will not change, I can insert a line break 33 characters from the beginning of the file.
Then, finally, I get my data back on correct lines. The rows in the CSV would look like this:
PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05
After this, I can split each row on the delimiter, which completes the restoration of the mismatched lines.
So how do I insert a line break 10 characters after each designated delimiter?
Thanks!
What about getting the lines of fields directly? Like this:
sep = '<|>'
your_data = [line.strip().split(sep) for line in your_string.strip().split('\n') if sep in line]
This gives:
[['PERSON_ID', 'DEPT_ID', 'DATE_JOINED'], ['AAAAA', 'S1', '2021/01'], ['BBBBBB', 'S2', '2021/02/03'], ['CCCCC', 'S1', '2021/03/05']]
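Note that the list above drops the "/03" fragment, so the first date stays truncated. A possible variation, sketched under the assumption that a stray newline only ever splits off a fragment that contains no delimiter (as "/03" does in the sample), is to glue such fragments back onto the previous row before splitting. The function name restore_rows is hypothetical.

```python
def restore_rows(text, sep="<|>"):
    """Re-attach delimiter-free fragments to the previous row, then split."""
    rows = []
    for line in text.strip().split("\n"):
        line = line.strip()
        if not line:
            continue
        if sep in line or not rows:
            rows.append(line)   # a line with a delimiter starts a new row
        else:
            rows[-1] += line    # a bare fragment continues the previous row
    return [row.split(sep) for row in rows]


your_string = """PERSON_ID<|>DEPT_ID<|>DATE_JOINED
AAAAA<|>S1<|>2021/01
/03
BBBBBB<|>S2<|>2021/02/03
CCCCC<|>S1<|>2021/03/05"""

for fields in restore_rows(your_string):
    print(fields)
```

If a newline can also fall inside the first or second column, this rule is not enough and the rows would have to be rebuilt by counting delimiters instead.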

Python3 - Problem during removing a line from a text file

I am trying to delete a line from a text file after opening it, without storing the contents in a list variable using f.readlines() or anything like that.
I don't have the option of reading the file into a variable, changing it, and writing it to another file or back to the same file. The file is constantly being appended to by another program, so I cannot do anything of that kind.
I am using f.seek() to reset the pointer to the beginning of the file, and f.readline() together with f.tell() to find the length of the first line. After that I try to replace each character with a blank space using a while loop.
pos=0
eol = 0
ll=0
with open('file1.txt','rb+') as f:
    f.seek(pos,1) #position at the beginning of the file
    print(f.readline()) #reading the first line
    pos = f.tell() #storing the length of first line
#the while loop will run from 0 to pos and replace every character with blank space
while eol != pos:
    with open('file1.txt','rb+') as f:
        f.seek(eol,1)
        f.write(b' ')
    eol += 1 #incrementing the eol variable to move the file pointer to next character
The code works, but with one problem that I can't figure out.
For example, if this is the original file
file1.txt
this is line 1
this is line 2
this is line 3
after running the program , my output is
this is line 2
this is line 3
The first line is deleted, but there is a bunch of white space in front of the 2nd line.
Maybe I am missing a simple logic here.
Any help will be appreciated.
Thank you
Update :
If I have understood correctly, I have changed the code as shown below, writing '\r' (a carriage return) instead of b' ', which resulted in this:
the code :
while eol != pos-1:
    with open('file1.txt','rb+') as f:
        f.seek(eol,0)
        f.write(b'\r')
    eol += 1
the result :
original :
this is line 1
this is line 2
this is line 3
after execution
this is line 2
this is line 3
you see the 1st line is removed but followed with '\r'
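Overwriting bytes with spaces or '\r' never makes a file shorter, which is why the padding stays in front of the second line. One way to really drop the first line without holding the whole file in memory is to shift the remaining bytes forward in small chunks and then truncate. The sketch below is only an illustration: the function name drop_first_line and the chunk_size parameter are hypothetical, and it is not safe while another program is appending to the same file, because bytes written during the shift can be lost or misplaced.

```python
def drop_first_line(path, chunk_size=64 * 1024):
    """Remove the first line by shifting the rest forward, then truncating."""
    with open(path, "rb+") as f:
        f.readline()            # skip past the first line
        read_pos = f.tell()     # where the remaining data starts
        write_pos = 0           # where it should end up
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos = f.tell()
            f.seek(write_pos)
            f.write(chunk)
            write_pos = f.tell()
        f.truncate(write_pos)   # cut off the now-duplicated tail
```

Only one chunk is held in memory at a time, and the write position always trails the read position, so no unread data is overwritten.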

Python: remember line index when reading lines from text file

I'm extracting data in a loop from a text file between two strings with Python 3.6. I've got multiple pairs of strings, and I would like to extract the data between each pair; see the code below:
for i in range(0,len(strings1)):
    with open('infile.txt','r') as infile, open('outfile.txt', 'w') as outfile:
        copy = False
        for line in infile:
            if line == strings1[i]:
                copy = True
            elif line == strings2[i]:
                copy = False
            elif copy:
                outfile.write(line)
            continue
To decrease the processing time of the loop, I would like to modify my code so that after it has extracted the data between two strings, let's say strings1[1] and strings2[1], it remembers the line index of strings2[1] and starts the next iteration of the loop at that line index. That way it doesn't have to read the whole file during each iteration. The string lists are built such that previous strings never occur after the current string, so modifying my code this way won't break the loop.
Does anyone know how to do this?
===========================================================================
EDIT:
I've got a file in a format such as:
the first line
bla bla bla
FIRST some string 1
10 10
15 20
5 2.5
SECOND some string 2
bla bla bla
bla bla bla
FIRST some string 3
10 10
15 20
5 2.5
SECOND some string 4
The file goes on like this for many lines.
I want to extract the data between 'FIRST some string 1' and 'SECOND some string 2', and plot this data. When that is done, I want to do the same for the data between 'FIRST some string 3' and 'SECOND some string 4' (thus also plot the data). All the 'FIRST some string ..' are stored in strings1 list and all the 'SECOND some string ..' are stored in strings2 list.
To decrease computation time, I would like to modify the code so that after the first iteration it knows it can start from the line with the string 'SECOND some string 2' rather than from 'the first line', and also that during the first iteration it can stop as soon as it has found 'SECOND some string 2'.
Does anyone know how to do this? Please let me know if something is unclear.
The key issue is that you're reopening your files inside the for loop, so of course each iteration reads the file from the beginning again. I wouldn't open the files in a for loop; that's horribly inefficient. You can load the file into memory first and then loop through strings1.
There are some other issues, namely here:
copy = False
for line in infile:
    if line == strings1[i]:
        copy = True
    elif line == strings2[i]:
        copy = False
    elif copy:
        outfile.write(line)
    continue
The elif copy: branch cannot run until line == strings1[i] has matched, because copy is only set to True there. From that point on, every line from infile is written to outfile until line == strings2[i] sets copy back to False. Also note that for line in infile yields each line with its trailing newline, so these comparisons only succeed if the entries in strings1 and strings2 end with \n as well. Unless this is precisely what you're trying to achieve, the logic doesn't work.
Without a full context it's hard to understand what exactly you're looking for.
But maybe what you want to do instead is simply this:
with open('infile.txt','r') as infile, open('outfile.txt', 'w') as outfile:
    for line in infile.readlines():
        if line.rstrip('\n') in strings1:
            outfile.write(line)
What this code is doing:
1.) Open both files (readlines() then loads the lines of infile into memory).
2.) Iterate through the lines of the infile.
3.) Check if the iterated line, stripping the trailing newline character is in the list strings1, assuming your strings1 is a list that doesn't have any trailing newline characters. If each item in strings1 already has a trailing \n, then don't rstrip the line.
4.) If line occurs in strings1, write the line to outfile.
This looks to be the gist of what you're attempting.
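For the original goal of not re-reading the file for every pair of markers, one possible approach is to iterate over the file once and share a single iterator across all pairs, so each search resumes where the previous one stopped. The sketch below assumes, as stated in the question, that the markers occur in order and that strings1 and strings2 hold the marker lines without trailing newlines. The function name extract_blocks is hypothetical.

```python
import io


def extract_blocks(lines, starts, ends):
    """Collect the lines between each (start, end) marker pair in one pass."""
    it = iter(lines)  # shared iterator: position is remembered between pairs
    blocks = []
    for start, end in zip(starts, ends):
        block = []
        copying = False
        for line in it:
            text = line.rstrip("\n")
            if not copying:
                if text == start:
                    copying = True
            elif text == end:
                break  # stop here; the next pair resumes after this line
            else:
                block.append(text)
        blocks.append(block)
    return blocks


sample = io.StringIO(
    "the first line\nbla bla bla\n"
    "FIRST some string 1\n10 10\n15 20\n5 2.5\nSECOND some string 2\n"
    "bla bla bla\nbla bla bla\n"
    "FIRST some string 3\n10 10\n15 20\n5 2.5\nSECOND some string 4\n"
)
strings1 = ["FIRST some string 1", "FIRST some string 3"]
strings2 = ["SECOND some string 2", "SECOND some string 4"]
blocks = extract_blocks(sample, strings1, strings2)
print(blocks)
```

With a real file, passing the open file object instead of the StringIO works the same way, and each block can be plotted as soon as it has been collected.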

Is there a way to keep the changes .append makes when closing a program? [duplicate]

This question already has answers here:
Writing a list to a file with Python, with newlines
(26 answers)
Closed 8 years ago.
So when I add a string to a list via .append and then close that window of the program, is there a way for it to actually alter the code?
(Python noob, so sorry if I'm being dumb)
Many thanks
From my experience with Python, I don't believe so. You could write the contents of your list to a file; whatever you appended would then still be there when you repopulate your list from that file. An example of writing to a file would be:
#!/usr/bin/python3
# Open an existing file for reading and writing ("rw+" is not a valid mode)
fo = open("foo.txt", "r+")
print("Name of the file:", fo.name)
# Assuming file has following 5 lines
# This is 1st line
# This is 2nd line
# This is 3rd line
# This is 4th line
# This is 5th line
text = "This is 6th line"
# Write a line at the end of the file.
fo.seek(0, 2)
fo.write(text)
# Now read complete file from beginning.
fo.seek(0, 0)
for index, line in enumerate(fo):
    print("Line No %d - %s" % (index, line))
# Close opened file
fo.close()
This example is adapted from the tutorial at http://www.tutorialspoint.com/python/file_write.htm
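Applied to the question itself, a minimal sketch of keeping a list between runs could store one item per line and reload it on start-up. The helper names load_items and save_items and the file name items.txt are hypothetical, and the sketch assumes the items are strings that contain no newlines.

```python
from pathlib import Path


def load_items(path):
    """Return the saved list, or an empty list if nothing was saved yet."""
    p = Path(path)
    if not p.exists():
        return []
    return p.read_text().splitlines()


def save_items(path, items):
    """Write the list back to disk, one item per line."""
    Path(path).write_text("".join(f"{item}\n" for item in items))
```

A program would call items = load_items("items.txt") when it starts, use items.append(...) as usual, and call save_items("items.txt", items) before it exits.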

Resources