I'm trying to plot a csv file which is like this:
a 531049
b 122198
c 3411487
d 72420
e 1641
f 2181578
. .
. .
. .
These values should be scaled using another csv file in the same format,
i.e. the other file looks like this:
a 45
b 12...
I want to plot 531049/45 and so on. The first column will be the x-axis and the second the y-axis.
How can I do this without merging the 2 files?
Gnuplot's using is meant to read data from a single file/stream so you need to merge the two files somehow. I would use python for this since it is my go-to tool for just about everything. I would write a script which reads from the 2 files and writes the data to standard output. Something like:
#merge.py
import sys
file1, scale_factor_file = sys.argv[1:]
#Read the scale factors into a dictionary
d = {}
with open(scale_factor_file) as sf:
    for line in sf:
        key, scale_factor = line.split()
        d[key] = float(scale_factor)
#Now open the other file, scaling as we go:
with open(file1) as fin:
    for line in fin:
        key, value = line.split()
        #keys without a scale factor are left unscaled
        print(key, float(value) / d.get(key, 1.0))
Now you can use gnuplot's ability to read from pipes to plot your data:
plot '< python merge.py datafile file_with_scale_factors' using 2
I have 2 types of encoded data
ibm037 encoded - a single delimiter variable - value is ###
UTF8 encoded - a pandas dataframe with 100s of columns.
Example dataframe:
Date Time
1 2
My goal is to write this data into a python file. The format should be:
### 1 2
In this way I need to have all the rows of the dataframe in a python file where the 1st character for every line is ###.
I tried to store this character at the first location in the pandas dataframe as a new column and then write to the file, but it throws an error saying that two different encodings can't be written to a file.
Tried another way to write it:
# df_orig_data is the pandas dataframe
# Record_Header is the encoded delimiter
f = open("_All_DelimiterOfRecord.txt", "a")
for row in df_orig_data.itertuples(index=False):
    f.write(Record_Header)
    f.write(str(row))
f.close()
It also doesn't work.
Is this kind of data write even possible? How can I write these 2 encoded data in 1 file?
Edit:
StringData = StringIO(
"""Date,Time
1,2
1,2
"""
)
df_orig_data = pd.read_csv(StringData, sep=",")
Record_Header = "2 "
f = open("_All_DelimiterOfRecord.txt", "a")
for index, row in df_orig_data.iterrows():
    f.write(
        "\t".join(
            [
                str(Record_Header.encode("ibm037")),
                str(row["Date"]),
                str(row["Time"]),
            ]
        )
    )
f.close()
I would suggest doing the encoding yourself and writing a bytes object to the file. This isn't a situation where you can rely on the built-in encoding to do it.
That means that the program opens the file in binary mode (ab), all of the constants are byte-strings, and it works with byte-strings whenever possible.
The question doesn't say, but I assumed you probably wanted a UTF8 newline after each line, rather than an IBM newline.
I also replaced the file handling with a context manager, since that makes it impossible to forget to close a file after you're done.
import io
import pandas as pd
StringData = io.StringIO(
"""Date,Time
1,2
1,2
"""
)
df_orig_data = pd.read_csv(StringData, sep=",")
Record_Header = "2 "
with open("_All_DelimiterOfRecord.txt", "ab") as f:
    for index, row in df_orig_data.iterrows():
        f.write(Record_Header.encode("ibm037"))
        row_bytes = [str(cell).encode('utf8') for cell in row]
        f.write(b'\t'.join(row_bytes))
        # Note: this is a UTF8 newline, not an IBM newline.
        f.write(b'\n')
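To check the byte layout without touching the disk or needing pandas, the same idea can be sketched against an in-memory buffer. The helper name write_records and the sample rows are assumptions made for illustration; "cp037" is the codec that the alias "ibm037" refers to.

```python
import io

def write_records(stream, header, rows):
    """Write each row as: EBCDIC (cp037) header, then tab-separated UTF-8 cells."""
    header_bytes = header.encode("cp037")  # encoded once, reused for every row
    for row in rows:
        stream.write(header_bytes)
        stream.write(b"\t".join(str(cell).encode("utf-8") for cell in row))
        stream.write(b"\n")  # UTF-8 newline, as in the answer above

# Hypothetical sample data standing in for the dataframe rows
buffer = io.BytesIO()
write_records(buffer, "2 ", [(1, 2), (1, 2)])
result = buffer.getvalue()
print(result)
```

Anything that reads the file back has to apply the same split: decode the leading header bytes with cp037 and the rest of the line with UTF-8.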
With this input
x 1
x 2
x 3
y 1
y 2
y 3
I'd like to have this output
x 1;2;3
y 1;2;3
Thank you in advance,
Simone
If by terminal you mean something built in natively, you might be out of luck; however, you could run a Python file from the terminal that does what you want and more. If having a standalone file isn't possible, you can always run Python in REPL mode for purely terminal usage.
If you have Python installed, all you need to do to access the REPL is run "py", and you could set up the processing manually. If you can use a file, something like the script below should take any input text and print the formatted text to the terminal.
same_starts = {}
#parse each line in the file and get the starting and trailing data for grouping
with open("data.txt", "r") as file:
    for line in file:
        #remove trailing/leading whitespace and newlines
        line_norm = line.strip()
        #split the line at the first space
        #lines with formatting errors are skipped
        try:
            start, end = line_norm.split(' ', 1)
        except ValueError:
            continue
        #check if dictionary same_starts already has this start
        if start in same_starts:
            same_starts[start].append(end)
        else:
            #add new list with first element being this ending
            same_starts[start] = [end]
#format the final data into the needed output
final_output = ""
for key in same_starts:
    #join with ';' so that no trailing separator is left at the end of the line
    final_output += key + ' ' + ';'.join(same_starts[key]) + '\n'
print(final_output)
NOTE: final_output is the text in the final formatting
Assuming you have Python installed, this file only needs to be run with the current directory set to the folder where it is stored, alongside a text file called "data.txt" that contains the values you want processed. Then run "py FILE_NAME.ex", replacing FILE_NAME.ex with the exact name of the Python file, extension included.
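A more compact variant of the same grouping, written as a function so it can be tried on a list of lines instead of a file. The name group_lines and the sample list are only illustrative; it relies on dictionaries keeping insertion order (Python 3.7+).

```python
def group_lines(lines, separator=";"):
    """Group the second field of each line under its first field."""
    groups = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        groups.setdefault(key, []).append(value.strip())
    return [f"{key} {separator.join(values)}" for key, values in groups.items()]

sample = ["x 1", "x 2", "x 3", "y 1", "y 2", "y 3"]
grouped = group_lines(sample)
print("\n".join(grouped))
```

To use it on a file, pass the open file object in place of the list, since iterating over a file yields its lines.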
Python 3: how do I import data from one text file into several different arrays? The number of arrays is given by another parameter n, and the shapes of the arrays are different.
You can split the text on commas, like
lines = text_file.read().split(',')
or read it line by line
f = open('file_name.ext', 'r')
x = f.readlines()
See this post for more:
How to read text file into a list or array with Python
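The question does not say how the values should be distributed, so the following is only a sketch under assumptions: the file holds comma-separated numbers, and the length of each array is given in a list whose own length is n. The helper name split_by_sizes and the sample text are hypothetical.

```python
def split_by_sizes(values, sizes):
    """Cut a flat list into consecutive chunks of the requested lengths."""
    if sum(sizes) > len(values):
        raise ValueError("not enough values for the requested sizes")
    arrays, start = [], 0
    for size in sizes:
        arrays.append(values[start:start + size])
        start += size
    return arrays

# Stand-in for text_file.read(); n is len(sizes), here 3
text = "1,2,3,4,5,6"
values = [float(item) for item in text.split(",")]
arrays = split_by_sizes(values, [1, 2, 3])
print(arrays)
```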
I am attempting to iterate through a .csv file and do some calculations based on it. My problem is that the file is 10001 lines long, and when my program executes it only seems to read 5001 of those lines. Am I doing something wrong when reading in my data, or is there a memory limit or some other limitation I am running into? The calculations are fine, but they are off from the expected results in some instances, and thus I am led to believe that the missing half of the data will solve this.
fileName = 'normal.csv' #input("Enter a file name: ").strip()
file = open(fileName, 'r') #open the file for reading
header = file.readline().strip().split(',') #Get the header line
data = [] #Initialise the dataset
for index in range(len(header)):
    data.append([])
for yy in file:
    ln = file.readline().strip().split(',') #Store the line
    for xx in range(len(data)):
        data[xx].append(float(ln[xx]))
And here is some sample output, yet to be completely formatted, but it will be eventually:
"""The file normal.csv contains 3 columns and 5000 records.
Column Heading | Mean | Std. Dev.
--------------------+--------------------+--------------------
Width [mm]|999.9797|2.5273
Height [mm]|499.9662|1.6889
Thickness [mm]|12.0000|0.1869"""
As this is homework I would ask that you attempt to keep responses helpful but not outright the solution, thank you.
That's because you are asking Python to read lines in two different locations:
for yy in file:
and
ln = file.readline().strip().split(',') #Store the line
yy is already a line from the file, but you ignored it; iteration over a file object yields lines from the file. You then read another line using file.readline().
If you use iteration, don't use readline() as well, just use yy:
for yy in file:
    ln = yy.strip().split(',') #Store the line
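The effect is easy to reproduce on a small in-memory file. This is just an illustration with made-up data; read_skipping mimics the loop from the question and read_all uses the loop variable instead.

```python
import io

def read_skipping(handle):
    """Mimic the question: iterate AND call readline(), so every other line is lost."""
    rows = []
    for _ in handle:  # the loop consumes one line and ignores it...
        rows.append(handle.readline().strip())  # ...then readline() takes the next
    return rows

def read_all(handle):
    """Use the loop variable itself, so every line is kept."""
    return [line.strip() for line in handle]

text = "1\n2\n3\n4\n5\n6\n"
skipped = read_skipping(io.StringIO(text))
complete = read_all(io.StringIO(text))
print(skipped, complete)
```

Only half of the lines survive the first version, which matches the 5000 records reported for a file with 10000 data lines.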
You are re-inventing the CSV-reading wheel, however. Just use the csv module instead.
You can read all data in a CSV file into a list per column with some zip() function trickery:
import csv
with open(fileName, 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC) # convert to float
    header = next(reader, None) # read one row, the header, or None
    data = list(zip(*reader)) # transpose rows to columns
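One caveat worth checking against the real file: with csv.QUOTE_NONNUMERIC the reader converts every unquoted field to float, so an unquoted header such as Width [mm] would raise a ValueError. A minimal sketch, assuming the header is not quoted, reads the header with a plain reader first and only then switches to the converting reader. The function name read_columns and the sample data are made up for the example.

```python
import csv
import io

def read_columns(csvfile):
    """Return (header, columns), converting every data field to float."""
    header = next(csv.reader(csvfile), None)  # plain reader: header stays text
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # floats from here on
    columns = [list(column) for column in zip(*reader)]  # transpose rows to columns
    return header, columns

# Hypothetical two-column sample standing in for normal.csv
sample = io.StringIO("Width [mm],Height [mm]\n1000.5,499.0\n999.5,501.0\n", newline="")
header, columns = read_columns(sample)
print(header, columns)
```

Both readers pull lines from the same file object, so the second one simply continues after the header line.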
I have 4 lists called I_list, Itiso, ItHDKR and Itperez, and I would like to write the data of these lists to .txt output files. I am trying to make Python name the .txt output files automatically based on some of my input data, so that the output files always have different names.
Currently I am using the following code:
Horizontal_radiation = []
Isotropic_radiation = []
HDKR_radiation = []
Perez_radiation = []
Horizontal = open("outputHorizontal.txt", 'w')
Isotropic = open("outputIsotropic.txt", 'w')
HDKR = open("outputHDKR.txt", 'w')
Perez = open("outputPerez.txt", 'w')
for i in I_list:
    Horizontal_radiation.append(i)
for x in Itiso:
    Isotropic_radiation.append(x)
for y in ItHDKR:
    HDKR_radiation.append(y)
for z in Itperez:
    Perez_radiation.append(z)
Horizontal.write(str(Horizontal_radiation))
Isotropic.write(str(Isotropic_radiation))
HDKR.write(str(HDKR_radiation))
Perez.write(str(Perez_radiation))
Horizontal.close()
Isotropic.close()
HDKR.close()
Perez.close()
As you can see, the name of the .txt output file is fixed as "outputHorizontal.txt" (the first one). Is there any way to change this name and set it according to an input? For example, one of my inputs is the latitude, as 'lat'. I am trying to express the output file name in terms of 'lat', so that every time I run the program the name is different; at the moment I always get the same name and the file is overwritten.
Thank you very much people, kind regards.
You can pass a string variable as the output file name. For example you could move the file declarations after you add elements to the lists (and before you write them) and use
Horizontal = open(str(Horizontal_radiation[0]), 'w')
Or just add a timestamp to the file name if the goal is only to avoid overwriting files
from datetime import datetime
Horizontal = open("horizontal-{}".format(datetime.today()), 'w')
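Since the question asks for the latitude in the name, both ideas can be combined. This is only a sketch: the helper output_name and the name pattern are assumptions, and strftime is used so the timestamp contains no spaces or colons, which some file systems reject in file names.

```python
from datetime import datetime

def output_name(kind, lat, when=None):
    """Build a file name such as outputHorizontal_lat40.5_20200102-030405.txt."""
    when = when or datetime.now()  # default to the current time
    stamp = when.strftime("%Y%m%d-%H%M%S")  # compact, file-system friendly timestamp
    return f"output{kind}_lat{lat}_{stamp}.txt"

# Fixed date used only so the example is reproducible
example = output_name("Horizontal", 40.5, datetime(2020, 1, 2, 3, 4, 5))
print(example)
```

The result can be passed straight to open, for example open(output_name("Horizontal", lat), 'w'), and the same helper covers the Isotropic, HDKR and Perez files.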