python txt file to multiple txt files - python-3.x

I have a single txt file that contains multiple data samples in the following form:
ID-001
some data
ID-002
some other data
ID-003
some more data
and so on. Each ID-i line starts a new data set; the IDs are unique, and the file is about 2000 lines long.
I want to create a Python script that opens this file and creates MULTIPLE txt files, each containing everything between ID-(i-1) and ID-i, no matter how many data samples the file contains.
Any ideas?

You could use a regex like so:
import re
pat=r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:  # ur_file is the path to your input file
    for m in re.finditer(pat, f.read(), flags=re.M):
        print(m.group(2))
Prints:
some data
some other data
some more data
m.group(1) holds the ID (ID-xxx), which you could use as the file name when writing each block to its own file.
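A minimal sketch of that idea, reusing the pattern above. It assumes each block should be saved as `<ID>.txt` inside an output folder; the helper name `split_by_id` is hypothetical.

```python
import re
from pathlib import Path

# Same pattern as above: group 1 is the ID line, group 2 is everything up to
# (but not including) the next ID line or the end of the text.
PATTERN = re.compile(r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)', flags=re.M)


def split_by_id(text, out_dir):
    """Write each ID block of `text` to out_dir/<ID>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for m in PATTERN.finditer(text):
        path = out_dir / f"{m.group(1)}.txt"
        # strip("\n") drops the newline after the ID line and any blank lines
        # before the next ID; a single trailing newline is added back
        path.write_text(m.group(2).strip("\n") + "\n")
        written.append(path)
    return written
```

For example, `split_by_id(Path("data.txt").read_text(), "output")` would create `output/ID-001.txt`, `output/ID-002.txt` and so on, where `data.txt` and `output` are placeholder names.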
Or you can split the file content into data blocks like so:
import re
with open(ur_file) as f:
    print([b for b in re.split(r'^ID-\d+', f.read(), flags=re.M) if b])
Prints:
['\nsome data\n\n', '\nsome other data\n\n', '\nsome more data\n']
Or you can use re.findall like so:
import re
pat=r'^(ID-\d+)$([\s\S]+?)(?=(?:^ID-\d+)|\Z)'
with open(ur_file) as f:
    print(re.findall(pat, f.read(), flags=re.M))
Prints:
[('ID-001', '\nsome data\n\n'), ('ID-002', '\nsome other data\n\n'), ('ID-003', '\nsome more data\n')]
Again, you can use those (ID, data) tuples to write each block into a separate file.
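If you would rather not read the whole file into one string, here is a line-by-line sketch (a different approach from the regex answers above): it opens a new output file each time it meets an ID line. Naming each file `<ID>.txt`, skipping blank lines, and the helper name `split_file_line_by_line` are all assumptions.

```python
import re
from pathlib import Path

# An ID line is "ID-" followed by digits and nothing else
# (surrounding whitespace is stripped before matching).
ID_LINE = re.compile(r'ID-\d+')


def split_file_line_by_line(in_path, out_dir):
    """Stream in_path, starting a new out_dir/<ID>.txt at every ID line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out = None
    names = []
    try:
        with open(in_path) as src:
            for line in src:
                stripped = line.strip()
                if ID_LINE.fullmatch(stripped):
                    if out is not None:
                        out.close()
                    out = open(out_dir / f"{stripped}.txt", "w")
                    names.append(stripped)
                elif out is not None and stripped:
                    # copy non-blank data lines; lines before the first ID are skipped
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return names
```

For a file of about 2000 lines either approach is fast; the streaming version mainly helps when the input is too large to read at once.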

Related

How to write a list of floats to csv in columns?

I have been searching everywhere for a way to write a list of floats to a CSV file, but it must be in column format.
My code for writing the CSV is as follows:
import csv
csvfile=open('Test.csv','w', newline='')
obj=csv.writer(csvfile)
obj.writerow(list_dis_B1_avg)
csvfile.close()
It turns out that the floats are all written in a single row.
I have a list of floats stored in "list_dis_B1_avg".
How can I write it as a single column instead?
You don't need the csv module to do that:
with open("Test.csv", "w") as f:  # use with to close the file in any case
    f.write("\n".join(map(str, list_dis_B1_avg)))  # newline between the elements; join needs strings, so convert the floats first
More about the with keyword: https://www.geeksforgeeks.org/with-statement-in-python/
More about str.join(): https://www.programiz.com/python-programming/methods/string/join
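If you prefer to keep the csv module from the question, the fix is to write one row per value instead of one row for the whole list. A minimal sketch, assuming `values` is any list of floats such as `list_dis_B1_avg`; the helper name `write_column` and its optional `header` parameter are hypothetical:

```python
import csv


def write_column(path, values, header=None):
    """Write each value on its own row, producing a single CSV column."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        if header is not None:
            writer.writerow([header])
        # writerows expects an iterable of rows, so wrap every float in a one-item list
        writer.writerows([value] for value in values)
```

Usage would be `write_column('Test.csv', list_dis_B1_avg)`. Opening the file with `newline=''` avoids the blank lines the csv module otherwise produces on Windows.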

how to remove List brackets in a csv file which was generated from a dictionary?

I am trying something similar:
sample={"name":["age","number","email"]}
#dictionary stores the relevant data in above format
with open('selected.csv','w') as csvf:
    [csvf.write('{0},{1}\n'.format(key,value)) for key,value in sample.items()]
#writing data in a csv file with my formatting
"Aaron",[21,020303030,"Aaron#blahblah.com"]
#csv file sample entry
Everything works fine, but the CSV file shows the list brackets. How can I remove them?
This isn't difficult if you use the standard library csv module:
import csv
sample={"name":["age","number","email"]}
with open('selected.csv', 'w', newline='') as csvf:
    writer = csv.writer(csvf)
    for k, v in sample.items():
        writer.writerow([k, *v])  # unpack v into a list with k
This would produce a file with one line:
name,age,number,email
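The same idea applied to an entry like the "Aaron" one in the question, with a header row, can be checked in memory with `io.StringIO`. This is a sketch; the helper name `dict_to_csv_text` and the email address are hypothetical. The phone number is kept as a string: the literal `020303030` is a SyntaxError in Python 3, and an integer would lose the leading zero anyway.

```python
import csv
import io


def dict_to_csv_text(data, header=None):
    """Render {key: [fields...]} as CSV text, one row per key, without list brackets."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header is not None:
        writer.writerow(header)
    for key, fields in data.items():
        writer.writerow([key, *fields])  # key followed by the unpacked list items
    return buf.getvalue()


people = {"Aaron": [21, "020303030", "aaron@example.com"]}
text = dict_to_csv_text(people, header=["name", "age", "number", "email"])
```

To write to a file instead, pass a handle opened with `open('selected.csv', 'w', newline='')` to `csv.writer` in place of the `StringIO` buffer.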

Merge line in csv file python

I have this in a CSV file:
Titre,a,b,c,d,e
01,jean,paul,,
01,,,jack,
02,jeanne,jack,,
02,,,jean
and I want:
Titre,a,b,c,d,e
01,jean,paul,jack,
02,jeanne,jack,,jean
Can you help me?
In general, a good approach is to read the CSV file and iterate through the rows using Python's csv module.
csv.reader returns an iterator that lets you loop through your file like this:
import csv
with open('your filename.csv', 'r', newline='') as infile:
    reader = csv.reader(infile)
    for line in reader:
        for value in line:
            pass  # Do your thing
You're going to need to construct a new data set that has different properties. The requirements you described:
Ignore any empty cells
Any time you encounter a row that has a new index number, add a new row to your new data set
Any time you encounter a row that has an index number you've seen before, add it to the row that you already created (except for that index number value itself)
I'm not writing that part of the code for you because you need to learn and grow. It's a good task for a beginner.
Once you've constructed that data set, it will look like this:
example_processed_data = [["Titre","a","b","c","d","e"],
["01","jean","paul","jack"],
["02","jeanne","jack","","jean"]]
You can then create a CSV writer and write your output file by iterating over that data, similarly to how you iterated over the input file:
with open('outfile.csv', 'w', newline='') as outfile:  # newline='' avoids blank lines between rows on Windows
    writer = csv.writer(outfile)
    for line in example_processed_data:
        writer.writerow(line)
print("Done! Wrote", len(example_processed_data), "lines to outfile.csv.")
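For checking your own solution afterwards, here is one possible sketch of the merge step. It assumes rows sharing the same first cell should be combined column by column, keeping the first non-empty value in each column; `merge_rows` and `merge_csv` are hypothetical helper names. Because it merges by column position, it only reproduces the desired output when each value already sits in its intended column.

```python
import csv


def merge_rows(rows):
    """Merge rows that share the same first cell, filling empty cells column by column.

    The first row is treated as the header and copied unchanged.
    """
    header, *body = rows
    merged = {}  # dicts keep insertion order, so output follows first appearance of each ID
    for row in body:
        if not row:
            continue  # csv.reader yields [] for blank lines
        key = row[0]
        target = merged.setdefault(key, [key])
        for i, value in enumerate(row[1:], start=1):
            if i >= len(target):
                target.append("")  # grow the merged row to fit this column
            if value and not target[i]:
                target[i] = value  # fill only cells that are still empty
    return [header, *merged.values()]


def merge_csv(in_path, out_path):
    with open(in_path, newline="") as infile:
        rows = list(csv.reader(infile))
    with open(out_path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(merge_rows(rows))
```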

Sort excel worksheet using python

I have an Excel sheet like this:
I would like to output the data into an Excel file like this:
Basically, for rows that share the same values in columns 2, 3 and 4, I want to concatenate the values in the 5th column.
Please suggest how I could do this.
The easiest way to approach an issue like this is to export the spreadsheet to CSV first, which helps ensure that it imports correctly into Python.
Using a defaultdict, you can create a dictionary that has unique keys and then iterate through lines adding the final column's values to a list.
Finally you can write it back out to a CSV format:
from collections import defaultdict
results = defaultdict(list)
with open("in_file.csv") as f:
    header = f.readline()
    for line in f:
        cols = line.rstrip("\n").split(",")  # drop the newline so it doesn't end up in the 5th value
        key = ",".join(cols[0:4])
        results[key].append(cols[4])
with open("out_file.csv", "w") as f:
    f.write(header)
    for k, v in results.items():  # items() in Python 3; iteritems() was Python 2 only
        line = '{},"{}",\n'.format(k, ", ".join(v))
        f.write(line)
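Splitting on "," by hand breaks if a cell itself contains a quoted comma. A sketch of the same grouping with the csv module, assuming the exported file has a header row and, as in the code above, that the first four columns form the key and the 5th holds the value; `concat_fifth_column` is a hypothetical helper name:

```python
import csv
from collections import defaultdict


def concat_fifth_column(in_path, out_path):
    """Group rows by their first four columns and join the 5th-column values with ", "."""
    results = defaultdict(list)
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            if len(row) < 5:
                continue  # skip blank or short rows
            results[tuple(row[:4])].append(row[4])
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for key, values in results.items():
            # csv.writer quotes a field automatically when it contains a comma
            writer.writerow([*key, ", ".join(values)])
```

The output CSV can then be opened in Excel or converted back to a workbook.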

Print A Pandas Data Frame to a Text File (Python 3)

I have a large data file like this:
Words
One
Two
Three
....
Threethousand
I am trying to print this list to a text file with this code:
df1 = df[['Words']]
with open('atextfile.txt', 'w', encoding='utf-8') as outfile:
    print(df1, file=outfile)
But it doesn't print out the whole DataFrame; the output ends up looking like this:
Words
One
Two
Three
....
Threethousand
Fourthousand
Fivethousand
How can I print out the whole DataFrame?
I would use to_string for this; unlike print, it doesn't abbreviate the output:
df['Words'].to_string('atextfile.txt')
# or
df[['Words']].to_string('atextfile.txt')
