NETCDF4 file doesn't grow beyond 2GB - python-3.x

I have a NetCDF4 file that doesn't grow beyond 2 GB.
I am converting over 200 txt files into a single NetCDF4 file, using sample data like this:
STATIONS_ID;MESS_DATUM; QN;FF_10;DD_10;eor
3660;201912150000; 3; 4.6; 170;eor
3660;201912150010; 3; 4.2; 180;eor
3660;201912150020; 3; 4.3; 190;eor
3660;201912150030; 3; 5.2; 190;eor
3660;201912150040; 3; 5.1; 190;eor
3660;201912150050; 3; 4.8; 190;eor
The code looks like:
import os
import netCDF4
import numpy as np
import pandas as pd

files = [f for f in os.listdir('.') if os.path.isfile(f)]
count = 0
for f in files:
    filecp = open(f, "r", encoding="ISO-8859-1")
    # NC file setup
    mydata = netCDF4.Dataset('v5.nc', 'w', format='NETCDF4')
    mydata.description = 'Measurement Data'
    mydata.createDimension('STATION_ID', None)
    mydata.createDimension('MESS_DATUM', None)
    mydata.createDimension('QN', None)
    mydata.createDimension('FF_10', None)
    mydata.createDimension('DD_10', None)
    STATION_ID = mydata.createVariable('STATION_ID', np.short, ('STATION_ID'))
    MESS_DATUM = mydata.createVariable('MESS_DATUM', np.long, ('MESS_DATUM'))
    QN = mydata.createVariable('QN', np.byte, ('QN'))
    FF_10 = mydata.createVariable('FF_10', np.float64, ('FF_10'))
    DD_10 = mydata.createVariable('DD_10', np.short, ('DD_10'))
    STATION_ID.units = ''
    MESS_DATUM.units = 'Central European Time yyyymmddhhmi'
    QN.units = ''
    FF_10.units = 'meters per second'
    DD_10.units = "degree"
    txtdata = pd.read_csv(filecp, delimiter=';').values
    #txtdata = np.genfromtxt(filecp, dtype=None, delimiter=';', names=True, encoding=None)
    if len(txtdata) > 0:
        df = pd.DataFrame(txtdata)
        sh = txtdata.shape
        print("txtdata shape is ", sh)
        mydata['STATION_ID'][:] = df[0]
        mydata['MESS_DATUM'][:] = df[1]
        mydata['QN'][:] = df[2]
        mydata['FF_10'][:] = df[3]
        mydata['DD_10'][:] = df[4]
    mydata.close()
    filecp.close()
    count += 1

Your problem is that you re-create the same file on every pass through the loop, so each iteration overwrites the previous one and the final file size is bounded by the largest single input file.
Open the file once, before the loop, and append each new chunk of data to the end of the netCDF data arrays.
If you get 124 values from the first file, you put:
mydata['STATION_ID'][0:124] = df[0]
and if you get 224 from the second file, you put:
mydata['STATION_ID'][124:124+224] = df[0]
So, assuming the data files have been downloaded from https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/10_minutes/wind/recent/ to <text file path>:
import netCDF4
import codecs
import pandas as pd
import os
import numpy as np

mydata = netCDF4.Dataset('v5.nc', 'w', format='NETCDF4')
mydata.description = 'Wind Measurement Data'
mydata.createDimension('STATION_ID', None)
mydata.createDimension('MESS_DATUM', None)
mydata.createDimension('QN', None)
mydata.createDimension('FF_10', None)
mydata.createDimension('DD_10', None)
STATION_ID = mydata.createVariable('STATION_ID', np.short, ('STATION_ID'))
MESS_DATUM = mydata.createVariable('MESS_DATUM', np.long, ('MESS_DATUM'))
QN = mydata.createVariable('QN', np.byte, ('QN'))
FF_10 = mydata.createVariable('FF_10', np.float64, ('FF_10'))
DD_10 = mydata.createVariable('DD_10', np.short, ('DD_10'))
STATION_ID.units = ''
MESS_DATUM.units = 'Central European Time yyyymmddhhmi'
QN.units = ''
FF_10.units = 'meters per second'
DD_10.units = "degree"

fpath = <text file path>
files = [f for f in os.listdir(fpath)]
count = 0
mydata_startindex = 0
for f in files:
    filecp = open(fpath + f, "r", encoding="ISO-8859-1")
    txtdata = pd.read_csv(filecp, delimiter=';')
    chunksize = len(txtdata)
    if len(txtdata) > 0:
        mydata['STATION_ID'][mydata_startindex:mydata_startindex+chunksize] = txtdata['STATIONS_ID']
        mydata['MESS_DATUM'][mydata_startindex:mydata_startindex+chunksize] = txtdata['MESS_DATUM']
        # note the leading space in ' QN': the raw header is 'STATIONS_ID;MESS_DATUM; QN;...'
        mydata['QN'][mydata_startindex:mydata_startindex+chunksize] = txtdata[' QN']
        mydata['FF_10'][mydata_startindex:mydata_startindex+chunksize] = txtdata['FF_10']
        mydata['DD_10'][mydata_startindex:mydata_startindex+chunksize] = txtdata['DD_10']
        mydata_startindex += chunksize
    filecp.close()
mydata.close()
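As a quick sanity check (a minimal sketch reusing the names from the script above), you can reopen the finished file and confirm that the unlimited dimensions grew across all input files:
check = netCDF4.Dataset('v5.nc', 'r')
print(check.dimensions['MESS_DATUM'])  # current size should equal the total row count of all txt files
print(check.variables['FF_10'][:5])    # first few wind speed values
check.close()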

Related

Perform code on multiple files 1 by 1 pandas

Hi, I have code I have written to read a .csv file in a folder and add some required columns.
I now want to perform this code on multiple files within the path folder, one by one, and save each as a separate df.
My current code is as follows:
import pandas as pd
import glob
import os
path = r'C:\Users\jake.jennings.BRONCO\Desktop\GPS Reports\Games\Inputs\2022-03-27 Vs Cowboys\Test'  # use your path
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, skiprows=8)
    li.append(frame)
frame = pd.concat(li, axis=0, ignore_index=True)
frame['filename'] = os.path.basename(filename)  # basename must be called; after concat this holds only the last file's name
#Add odometer change and turn all accel values to positive
import numpy as np
frame['OdChange'] = frame['Odometer'].diff()
frame['accelpos'] = frame['Acceleration'].abs()
#Add column with OdChange # >5.5m/s
frame["new1"] = np.where(
    (frame.Velocity >= 5.5),
    frame["OdChange"],
    '0')
#Add column with accels/decels >2.5m.s.s for AccelDec/min
frame["new2"] = np.where(
    (frame.accelpos >= 2.5),
    frame["accelpos"],
    '0')
#Add column with accels/decels >2.5m.s.s for AccelDec/min
frame["new3"] = np.where(
    (frame.Acceleration >= 2.5),
    '1',
    '0')
s = frame['new3'].astype(int)
frame['new4'] = s.diff().fillna(s).eq(1).astype(int)
frame['new4']
#m/min peaks
frame['1minOD'] = frame['OdChange'].rolling(window=600, axis=0).sum()
#HSm/min peaks
frame['1minHS'] = frame['new1'].rolling(window=600, axis=0).sum()
#AccImpulse/min
frame['1minImp'] = frame['accelpos'].rolling(window=600, axis=0).mean() * 60
#AccDec Peak Count
frame['1minAccCount'] = frame['new4'].rolling(window=600, axis=0).sum()
print(frame)
I am not sure if this is even the best way to do what I am trying to do. Any help would be appreciated!
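One way to get a separate DataFrame per file (a sketch, not a tested answer; process_file is a hypothetical helper that should wrap the column-adding logic above) is to move the per-file work into a function and collect the results in a dict keyed by filename:
import glob
import os
import pandas as pd

def process_file(filename):
    frame = pd.read_csv(filename, index_col=None, skiprows=8)
    frame['filename'] = os.path.basename(filename)
    # ... add the derived columns exactly as in the script above ...
    return frame

path = r'C:\Users\jake.jennings.BRONCO\Desktop\GPS Reports\Games\Inputs\2022-03-27 Vs Cowboys\Test'
dfs = {os.path.basename(f): process_file(f) for f in glob.glob(path + "/*.csv")}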

Plotting multiple lines with a Nested Dictionary, and unknown variables to Line Graph

I was able to find somewhat of an answer to my question, but it was not as nested as my dictionary, so I am really unsure how to proceed, as I am still very new to Python. I currently have a nested dictionary like:
{'140.10': {'46': {'1': '-49.50918', '2': '-50.223637', '3': '49.824406'}, '28': {'1': '-49.50918', '2': '-50.223637', '3': '49.824406'}}}
I want to plot it so that '140.10' becomes the title of the graph, '46' and '28' become the individual lines, key '1' (for example) is on the y axis, and the x axis is the final number (in this case '-49.50918'). Essentially a graph like the one I generated with Excel from a csv file that is written in another part of the code (example image omitted).
The problem I am running into is that these keys are autogenerated from a larger csv file in an earlier part of the script, so I will not know their exact values until the code has been run. I will be running it over various files named after the graph, and each file will have different values for:
{key1: {key2_1: {key3_1: value1, key3_2: value2, key3_3: value3}, key2_2: ...}}
I have tried to do something like this:
for filename in os.listdir(Directory):
    if filename.endswith('.csv'):
        q = filename.split('.csv')[0]
        s = q.split('_')[0]
        if s in time_an_dict:
            atom = list(time_an_dict[s])
            ion = time_an_dict[s]
            for f in time_an_dict[s]:
                x_val = []
                y_val = []
                fz = ion[f]
                for i in time_an_dict[s][f]:
                    pos = (fz[i])
                    frame = i
                    y_val.append(frame)
                    x_val.append(pos)
                '''ions = atom
                frame = frames
                position = pos
                plt.plot(frame, position, label = frames)
                plt.xlabel("Frame")
                plt.ylabel("Position")
                plt.show()
                #plt.savefig('{}_Pos.png'.format(s))'''
But it has not run as intended.
I have also tried:
for filename in os.listdir(Directory):
    if filename.endswith('_Atom.csv'):
        q = filename.split('.csv')[0]
        s = q.split('_')[0]
        if s in window_dict:
            name = s + '_Atom.csv'
            time_an_dict[s] = analyze_time(name, window_dict[s])
            new = '{}_A_pos.csv'.format(s)
            ions = list(time_an_dict.values())[0].keys()
            for i in ions:
                x_axis_values = []
                y_axis_values = []
                frame = list(time_an_dict[s][i])
                x_axis_values.append(frame)
                empty = []
                print(x_axis_values)
                for x in frame:
                    values = time_an_dict[s][i][x]
                    empty.append(values)
                y_axis_values.append(empty)
                plt.plot(x_axis_values, y_axis_values, label = x)
                plt.show()
But I keep getting this error:
Traceback (most recent call last):
  File "Atoms_pos.py", line 175, in <module>
    plt.plot(x_axis_values, y_axis_values, label = x )
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py", line 2840, in plot
    return gca().plot(
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py", line 1743, in plot
    lines = [*self._get_lines(*args, data=data, **kwargs)]
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 273, in __call__
    yield from self._plot_args(this, kwargs)
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 394, in _plot_args
    self.axes.xaxis.update_units(x)
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axis.py", line 1466, in update_units
    default = self.converter.default_units(data, self)
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/category.py", line 107, in default_units
    axis.set_units(UnitData(data))
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/category.py", line 176, in __init__
    self.update(data)
  File "/Users/hxb51/opt/anaconda3/lib/python3.8/site-packages/matplotlib/category.py", line 209, in update
    for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'
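(An observation, not part of the original post: x_axis_values.append(frame) puts the whole list in as a single element, so plt.plot receives nested sequences, and matplotlib's category converter then tries to hash each entry and fails on the array-like ones. Flattening the lists would avoid that:
x_axis_values.extend(frame)  # add the items, not the list itself
y_axis_values.extend(empty)
The working answer further down sidesteps the issue entirely by building flat lists of ints and floats.)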
Here is the rest of the code that generates the files and dictionaries I am using; I was told in another question I asked that including it could be helpful.
# importing dependencies
import math
import sys
import pandas as pd
import MDAnalysis as mda
import os
import numpy as np
import csv
import matplotlib.pyplot as plt
################################################################################
Directory = '/Users/hxb51/Desktop/Q_prof/Displacement_Charge/Blah'
os.chdir(Directory)
################################################################################
'''We are only looking at the positions of the CLAs and SODs and not the DRUDE counterparts.
We are assuming the DRUDE are very close and not something we need to be concerned with.'''
def Positions(dcd, topo):
    fields = ['Window', 'ION', 'ResID', 'Location', 'Position', 'Frame', 'Final']
    with open('{}_Atoms.csv'.format(s), 'a') as d:
        writer = csv.writer(d)
        writer.writerow(fields)
    CLAs = u.select_atoms('segid IONS and name CLA')
    SODs = u.select_atoms('segid IONS and name SOD')
    CLA_res = len(CLAs)
    SOD_res = len(SODs)
    frame = 0
    for ts in u.trajectory[-10:]:
        frame += 1
        CLA_pos = CLAs.positions[:, 2]
        SOD_pos = SODs.positions[:, 2]
        for i in range(CLA_res):
            ids = i + 46
            if CLA_pos[i] < 0:
                with open('{}_Atoms.csv'.format(s), 'a') as q:
                    new_line = [s, 'CLA', ids, 'Bottom', CLA_pos[i], frame, 10]
                    writes = csv.writer(q)
                    writes.writerow(new_line)
            else:
                with open('{}_Atoms.csv'.format(s), 'a') as q:
                    new_line = [s, 'CLA', ids, 'Top', CLA_pos[i], frame, 10]
                    writes = csv.writer(q)
                    writes.writerow(new_line)
        for i in range(SOD_res):
            ids = i
            if SOD_pos[i] < 0:
                with open('{}_Atoms.csv'.format(s), 'a') as q:
                    new_line = [s, 'SOD', ids, 'Bottom', SOD_pos[i], frame, 10]
                    writes = csv.writer(q)
                    writes.writerow(new_line)
            else:
                with open('{}_Atoms.csv'.format(s), 'a') as q:
                    new_line = [s, 'SOD', ids, 'Top', SOD_pos[i], frame, 10]
                    writes = csv.writer(q)
                    writes.writerow(new_line)
    csv_Data = pd.read_csv('{}_Atoms.csv'.format(s))
    filename = s + '_Atom.csv'
    sorted_df = csv_Data.sort_values(["ION", "ResID", "Frame"],
                                     ascending=[True, True, True])
    sorted_df.to_csv(filename, index=False)
    os.remove('{}_Atoms.csv'.format(s))

'''This function looks at the ResIDs, compares them to make sure they are the same,
and then counts how many times the ion flip-flops around the boundaries.'''
def turn_dict(f):
    read = open(f)
    reader = csv.reader(read, delimiter=",", quotechar='"')
    my_dict = {}
    new_list = []
    for row in reader:
        new_list.append(row)
    for i in range(len(new_list[:])):
        prev = i - 1
        if new_list[i][2] == new_list[prev][2]:
            if new_list[i][3] != new_list[prev][3]:
                if new_list[i][2] in my_dict:
                    my_dict[new_list[i][2]] += 1
                else:
                    my_dict[new_list[i][2]] = 1
    return my_dict

def plot_flips(f):
    dict = turn_dict(f)
    ions = list(dict.keys())
    occ = list(dict.values())
    plt.bar(range(len(dict)), occ, tick_label=ions)
    plt.title("{}".format(s))
    plt.xlabel("Residue ID")
    plt.ylabel("Boundary Crosses")
    plt.savefig('{}_Flip.png'.format(s))

def analyze_time(f, dicts):
    read = open(f)
    reader = csv.reader(read, delimiter=",", quotechar='"')
    new_list = []
    keys = list(dicts.keys())
    time_dict = {}
    pos_matrix = {}
    for row in reader:
        new_list.append(row)
    fields = ['ResID', 'Position', 'Frame']
    with open('{}_A_pos.csv'.format(s), 'a') as k:
        writer = csv.writer(k)
        writer.writerow(fields)
    for i in range(len(new_list[:])):
        if new_list[i][2] in keys:
            with open('{}_A_pos.csv'.format(s), 'a') as k:
                new_line = [new_list[i][2], new_list[i][4], new_list[i][5]]
                writes = csv.writer(k)
                writes.writerow(new_line)
    read = open('{}_A_pos.csv'.format(s))
    reader = csv.reader(read, delimiter=",", quotechar='"')
    time_list = []
    for row in reader:
        time_list.append(row)
    for j in range(len(keys)):
        for i in range(len(time_list[1:])):
            if time_list[i][0] == keys[j]:
                pos_matrix[time_list[i][2]] = time_list[i][1]
        time_dict[keys[j]] = pos_matrix
    return time_dict

window_dict = {}
for filename in os.listdir(Directory):
    s = filename.split('.dcd')[0]
    fors = s + '.txt'
    topos = '/Users/hxb51/Desktop/Q_prof/Displacement_Charge/topo.psf'
    if filename.endswith('.dcd'):
        print('We are starting with {} \n '.format(s))
        u = mda.Universe(topos, filename)
        Positions(filename, topos)
        name = s + '_Atom.csv'
        plot_flips(name)
        window_dict[s] = turn_dict(name)
        continue

time_an_dict = {}
for filename in os.listdir(Directory):
    if filename.endswith('.csv'):
        q = filename.split('.csv')[0]
        s = q.split('_')[0]
        if s in window_dict:
            name = s + '_Atom.csv'
            time_an_dict[s] = analyze_time(name, window_dict[s])

for filename in os.listdir(Directory):
    if filename.endswith('.csv'):
        q = filename.split('.csv')[0]
        s = q.split('_')[0]
        if s in time_an_dict:
            atom = list(time_an_dict[s])
            ion = time_an_dict[s]
            for f in time_an_dict[s]:
                x_val = []
                y_val = []
                fz = ion[f]
                for i in time_an_dict[s][f]:
                    pos = (fz[i])
                    frame = i
                    y_val.append(frame)
                    x_val.append(pos)
                '''ions = atom
                frame = frames
                position = pos
                plt.plot(frame, position, label = frames)
                plt.xlabel("Frame")
                plt.ylabel("Position")
                plt.show()
                #plt.savefig('{}_Pos.png'.format(s))'''
Everything here runs well except the last block of code, which tries to make a graph from the nested dictionary. Any help would be appreciated!
Thanks!
I figured out the answer:
for filename in os.listdir(Directory):
    if filename.endswith('_Atom.csv'):
        q = filename.split('.csv')[0]
        s = q.split('_')[0]
        if s in window_dict:
            name = s + '_Atom.csv'
            time_an_dict[s] = analyze_time(name, window_dict[s])
            new = '{}_A_pos.csv'.format(s)
            ions = list(time_an_dict[s])
            plt.yticks(np.arange(-50, 50, 5))
            plt.xlabel('Frame')
            plt.ylabel('Z axis position (Ang)')
            plt.title([s])
            for i in ions:
                x_value = []
                y_value = []
                time_frame = len(time_an_dict[s][i]) + 1
                for frame in range(1, time_frame):
                    frame = str(frame)
                    x_value.append(int(frame))
                    y_value.append(float(time_an_dict[s][i][frame]))
                plt.plot(x_value, y_value, label=[i])
            plt.xticks(np.arange(1, 11, 1))
            plt.legend()
            plt.savefig('{}_Positions.png'.format(s))
            plt.clf()
            os.remove("{}_A_pos.csv".format(s))
From there, combined with the other parts of the code, it produces one position graph per input file, for as many '.dcd' files as are present (example images omitted).

Reading in multiple files in Python and saving them one by one in a different directory

import glob
import pandas as pd
import seaborn as sns
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

files = glob.glob("Angular_position_*_*.csv")
output = pd.DataFrame()
for f in files:
    df = pd.read_csv(f)
    time = df.iloc[:, 0]
    time = time.to_numpy()
    ynew = df.iloc[:, 1:]
    ynew = ynew.to_numpy()
    lowPassCutoffFreq = 6.0  # Cut-off frequency
    Sample_freq = 150  # Target sample frequency
    N = 2  # Order of the filter; in this case 2nd order
    Wn = lowPassCutoffFreq / (Sample_freq / 2)  # Normalize frequency
    b, a = signal.butter(N, Wn, btype='low', analog=False, output='ba')
    #scipy.signal.butter(N, Wn, btype='low', analog=False, output='ba', fs=None)
    output = signal.filtfilt(b, a, ynew, axis=0)
    np.savetxt("enter directory path/Filtered_files/Filtered_Angular_position_*_*", output, delimiter=', ', newline="\n")
I am trying to read in all the files in a directory and low-pass filter them. The results are then saved one after the other, each in its own file. Each resulting file has 3 columns, and ideally I would like the columns to be named with headers, e.g. col1, col2, col3.
Without using glob I can filter all my files individually, but I have more than 100 such files.
Any help would be appreciated.
Best wishes,
I have partially solved the issue apart from the header names:
import glob
import pandas as pd
from tnorma import tnorma
import seaborn as sns
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

path = r'location_of_dir'
all_files = glob.glob(path + '/*.csv')
# yn = np.zeros(shape = (101,1))
# tn = np.zeros(shape = (101,1))
#ynew = []
yn = np.zeros(shape=(101, 1))
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    print(filename)
    foo = filename.split("/")[-1]
    #df = pd.read_csv(f)
    time = df.iloc[:, 0]
    time = time.to_numpy()
    ynew = df.iloc[:, 1:]
    ynew = ynew.to_numpy()
    #print(ynew)
    lowPassCutoffFreq = 6.0  # Cut-off frequency
    Sample_freq = 150  # Target sample frequency
    N = 2  # Order of the filter; in this case 2nd order
    Wn = lowPassCutoffFreq / (Sample_freq / 2)  # Normalize frequency
    b, a = signal.butter(N, Wn, btype='low', analog=False, output='ba')
    #scipy.signal.butter(N, Wn, btype='low', analog=False, output='ba', fs=None)
    output = signal.filtfilt(b, a, ynew, axis=0)
    #print (output)
    tn = np.linspace(0, 100, 101)  # new time vector for the new time-normalized data
    yn, tn, indie = tnorma(output, k=3, smooth=1, mask=None, show=False)
    np.savetxt("path_name/foldername/file" + foo, yn, delimiter=', ', newline="\n")
However, I am having difficulty in putting header names on the 3 columns per file.
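One way to add the header names (a sketch, assuming yn always has exactly 3 columns): np.savetxt accepts a header string, and comments='' stops it from prefixing the header line with '#':
np.savetxt("path_name/foldername/file" + foo, yn,
           delimiter=', ', newline="\n",
           header='col1, col2, col3', comments='')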

How to convert two-dimensional data into three-dimensional, with time (a single value) as the third dimension?

I have daily wind data from QuikSCAT: ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/mwf-quikscat/data/daily
The problem is that the zonal and meridional winds are two dimensional, i.e. their dimensions are only (lon, lat), not (time, lon, lat). The file contains all the information about time, both as a variable and as a dimension. I tried to copy all the dimension and variable data from an input file to an output file, but something goes wrong: it copies lat, lon and time successfully, but does not copy the wind values. In the source file the wind is 2-dimensional, but I want the wind in the output file to be 3-dimensional, with time as the third dimension. The time dimension has length 1 anyway.
import netCDF4 as nc
import numpy as np
import os

in_path = '2000'
out_path = '2000_new'
files = os.listdir(in_path)
fd = 0
for names in files:
    # print(names)
    x_file = os.path.join(in_path, names)
    y_file = os.path.join(out_path, names)
    fd += 1
    i_file = nc.Dataset(x_file, 'r')
    z_w = i_file.variables['zonal_wind_speed'][:,:]
    m_w = i_file.variables['meridional_wind_speed'][:,:]
    y = i_file.variables['latitude'][:]
    x = i_file.variables['longitude'][:]
    t = i_file.variables['time'][:]
    os.system('rm ' + y_file)  # remove any previous output file
    o_file = nc.Dataset(y_file, 'w', format='NETCDF4')
    latitude = o_file.createDimension('latitude', y.size)
    longitude = o_file.createDimension('longitude', x.size)
    time = o_file.createDimension('time', None)
    var = o_file.createVariable('latitude', 'f4', ('latitude'), zlib=True)
    o_file.variables['latitude'].units = 'degree_north'
    o_file.variables['latitude'].long_name = 'latitude'
    o_file.variables['latitude'].axis = 'X'
    var = o_file.createVariable('longitude', 'f4', ('longitude'), zlib=True)
    o_file.variables['longitude'].units = 'degree_east'
    o_file.variables['longitude'].long_name = 'longitude'
    o_file.variables['longitude'].axis = 'Y'
    var = o_file.createVariable('time', 'd', ('time'), zlib=True)
    o_file.variables['time'].long_name = 'time'
    o_file.variables['time'].units = "hours since 1900-1-1 0:0:0"
    o_file.variables['time'].calendar = 'standard'
    o_file.variables['time'].axis = 'T'
    var = o_file.createVariable('u', 'f4', ('time','latitude','longitude'), fill_value=-1.e+23, zlib=True)
    o_file.variables['u'].long_name = 'zonal wind speed component'
    o_file.variables['u'].units = 'meter second-1'
    o_file.variables['u'].coordinates = 'longitude latitude'
    o_file.variables['u'].time = 'time'
    var = o_file.createVariable('v', 'f4', ('time','latitude','longitude'), fill_value=-1.e+23, zlib=True)
    o_file.variables['v'].long_name = 'meridional wind speed component'
    o_file.variables['v'].units = 'meter second-1'
    o_file.variables['v'].coordinates = 'longitude latitude'
    o_file.variables['v'].time = 'time'
    o_file.variables['latitude'][:] = y
    o_file.variables['longitude'][:] = x
    o_file.variables['time'][:] = t
    o_file.variables['u'] = z_w
    o_file.variables['v'] = m_w
    i_file.close()
    o_file.close()
Actually, your time dimension does not have length 1; it has unlimited length. If you actually want it to have length 1, you need to use
#time = o_file.createDimension('time',None)
time = o_file.createDimension('time',1)
instead.
Then, to set all data at the first (and only) time index to your values, use
o_file.variables['u'][0] = z_w
o_file.variables['v'][0] = m_w
(Assigning o_file.variables['u'] = z_w, as in your script, rebinds the dictionary entry instead of writing data into the file, which is why the winds were never copied.)
If you do end up saving multiple times in the file, replace the 0 with the appropriate index for the data you are copying in.
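For example (a sketch, assuming you keep the unlimited createDimension('time', None) variant and append each daily file to a single output dataset in chronological order), the next free index along the unlimited axis is just the current dimension length:
t_index = len(o_file.dimensions['time'])  # next free slot along the unlimited time axis
o_file.variables['time'][t_index] = t
o_file.variables['u'][t_index] = z_w
o_file.variables['v'][t_index] = m_w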
Alternatively, because the time dimension now has length 1, you could also copy the data using numpy.expand_dims:
o_file.variables['u'][:] = np.expand_dims(z_w, 0)
o_file.variables['v'][:] = np.expand_dims(m_w, 0)
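To confirm the wind data actually made it into the output (a quick check, assuming the fix above has been applied):
chk = nc.Dataset(y_file, 'r')
print(chk.variables['u'].shape)  # expect (1, n_latitude, n_longitude)
chk.close()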

How to repeat a script on other input data files

I am a newbie in Python. I wrote a code that loads a txt file and writes my result to another txt file, and I want to repeat this for the other txt files I have, which are all in the same folder. I want to load almost 300 txt files and do this, but I don't know how. Thanks.
import numpy as np
from lmfit.models import LorentzianModel, ConstantModel  # imports this snippet relies on

dat = np.loadtxt('test1.txt')
x = dat[:, 0]
y = dat[:, 2]
peak = LorentzianModel()
constant = ConstantModel()
pars = peak.guess(y, x=x)
pars.update(constant.make_params())
pars['c'].set(1.04066)
mod = peak + constant
out = mod.fit(y, pars, x=x)
comps = out.eval_components(x=x)
writer = out.fit_report(min_correl=0.25)
path = '/Users/dellcity/Desktop/'
filename = 'output.txt'
with open(path + filename, 'wt') as f:
    f.write(writer)
You need to define a function that takes the filename as a parameter; then, in the main part of your program, loop over all the files you want to load and call the function on each, e.g.:
import os
import numpy as np
from lmfit.models import LorentzianModel, ConstantModel

def myFunction(filename):
    dat = np.loadtxt(filename)
    x = dat[:, 0]
    y = dat[:, 2]
    peak = LorentzianModel()
    constant = ConstantModel()
    pars = peak.guess(y, x=x)
    pars.update(constant.make_params())
    pars['c'].set(1.04066)
    mod = peak + constant
    out = mod.fit(y, pars, x=x)
    comps = out.eval_components(x=x)
    writer = out.fit_report(min_correl=0.25)
    path = '/Users/dellcity/Desktop/'
    filename = 'output.txt'
    # open in mode a = append
    with open(path + filename, 'at') as f:
        f.write(writer)

# the parameter of os.listdir is the path to your files;
# change it to the path of your data files
for filename in os.listdir('.'):
    if filename.endswith(".txt"):
        myFunction(filename)
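Since every fit report is appended to the same output.txt, it can help to tag each report with the file it came from. A sketch of the change inside myFunction (data_name is a hypothetical variable; it must be saved at the top of the function, because filename is later reassigned to 'output.txt'):
def myFunction(filename):
    data_name = filename  # remember the input file before 'filename' is reused
    # ... fitting code unchanged ...
    with open(path + 'output.txt', 'at') as f:
        f.write('# results for {}\n'.format(data_name))
        f.write(writer)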
