I am working with a large Excel file of stock data, in which the rows for one stock are followed by the rows for the next.
What I need is to put the stock's ticker name next to every cell whose loss is below -10%.
I can try the simple formula =IF(B2<-0.1, "AAL", ""), but this only works until the next stock starts: in the AADI rows it would also print "AAL", and that's the problem. I need to print the right ticker whenever the condition is true. If the row belongs to AAPL, the ticker AAPL should be printed next to the loss cell. How can I do that?
I don't know how to do this with millions of data points. I am looking for a good solution using Python, VB, or Excel formulas.
IIUC, here is a simple approach using openpyxl:
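from openpyxl import load_workbook

wb = load_workbook("file.xlsx")
ws = wb['Sheet1']

ticker_name = None  # no ticker seen yet
for num_row in range(1, ws.max_row+1):
    cellB = ws.cell(row=num_row, column=2)
    if isinstance(cellB.value, str):
        # a text cell in column B starts a new stock block
        ticker_name = cellB.value
    else:
        try:
            cellC = ws.cell(row=num_row, column=3)
            # loss below -10%, assuming it is stored as a fraction
            # as in the question's formula (B2<-0.1)
            if cellC.value < -0.1:
                ws.cell(row=num_row, column=4).value = ticker_name
        except TypeError:
            # empty or non-numeric cell in column C
            pass

wb.save("file.xlsx")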
NB: Always keep a backup copy of your original Excel file before running any Python/openpyxl script on it.
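The carry-forward logic itself does not depend on openpyxl, so it can be tried on plain Python data first. The following is only a minimal sketch: it assumes each row is a pair (column B value, column C value), that a text value in column B starts a new ticker block, and that losses are stored as fractions. The function name label_losses and the sample rows are made up for illustration.

```python
def label_losses(rows, threshold=-0.1):
    """Return one label per row: the current ticker if the loss is below
    the threshold, otherwise an empty string."""
    labels = []
    ticker = None  # ticker of the block we are currently inside
    for col_b, col_c in rows:
        if isinstance(col_b, str):
            # a text cell in column B starts a new ticker block
            ticker = col_b
            labels.append("")
        elif (ticker is not None
              and isinstance(col_c, (int, float))
              and col_c < threshold):
            labels.append(ticker)
        else:
            labels.append("")
    return labels


# hypothetical sample rows: (column B, column C)
rows = [
    ("AAL", None),
    (None, -0.05),
    (None, -0.12),
    ("AADI", None),
    (None, -0.30),
    (None, 0.02),
]
labels = label_losses(rows)
print(labels)
```

Only the rows with -0.12 and -0.30 receive a label, and each gets the ticker of the block it belongs to.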
I have 24 more attempts to submit this task. I have spent hours on it and my brain does not work anymore. I am a beginner with Python; can you please help me figure out what is wrong? I would love to see the correct code if possible.
Here is the task itself and the code I wrote below.
Note that you can have access to all standard modules/packages/libraries of your language. But there is no access to additional libraries (numpy in python, boost in c++, etc).
You are given the content of a CSV file with information about a set of trades. It contains the following columns:
TIME - Timestamp of a trade in format Hour:Minute:Second.Millisecond
PRICE - Price of one share
SIZE - Count of shares executed in this trade
EXCHANGE - The exchange that executed this trade
For each exchange find the one minute-window during which the largest number of trades took place on this exchange.
Note that:
You need to send source code of your program.
You have only 25 attempts to submit a solution for this task.
You have access to all standard modules/packages/libraries of your language. But there is no access to additional libraries (numpy in python, boost in c++, etc).
Input format
Input contains several lines. You can read it from standard input or from the file “trades.csv”.
Each line contains information about one trade: TIME, PRICE, SIZE and EXCHANGE. Fields are separated by commas.
Lines are listed in ascending order of timestamps. Several lines can contain the same timestamp.
Size of input file does not exceed 5 MB.
See the example below to understand the exact input format.
Output format
If input contains information about k exchanges, print k lines to standard output.
Each line should contain a single number: the maximum number of trades during one minute-window.
You should print answers for exchanges in lexicographical order of their names.
Sample
Input:
09:30:01.034,36.99,100,V
09:30:55.000,37.08,205,V
09:30:55.554,36.90,54,V
09:30:55.556,36.91,99,D
09:31:01.033,36.94,100,D
09:31:01.034,36.95,900,V
Output:
2
3
Notes
In the example four trades were executed on exchange “V” and two trades were executed on exchange “D”. Not all of the “V”-trades fit in one minute-window, so the answer for “V” is three.
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    print(count)
First of all, it is not necessary to use the datetime and csv modules for such a simple case (as in Ed-Ward's example).
If we remove the colons and the dot from the time strings, they can be converted with int() directly, which is easier than the float arithmetic in your example.
CSV features such as dialects and special quoting are not used here, so I suggest a simple split(",").
Now about efficiency, which here means time complexity.
The more times you walk through your list of timestamps from beginning to end, the slower the algorithm gets.
So the goal is to minimize the number of passes: ideally a single pass over all rows, avoiding nested loops that rescan the collections.
For such a task a deque is a better fit than a tuple or list, because you can remove the first element with popleft() and append a last element, both in O(1).
Append each timestamp to the end of its exchange's queue. If the new timestamp is a minute or more later than the first element of that queue, remove the first element with popleft(). The queue therefore never shrinks: it grows only when more trades than ever before fit inside one minute, so once the whole file has been processed the length of each queue is the maximum one-minute count.
Example with linear time complexity O(n):
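from collections import deque

ex_list = {}  # exchange name -> deque of timestamps in its window
# "09:30:01.034" becomes the integer 93001034 (HHMMSSmmm); the dot in the
# price is removed too, which is harmless because the price is not used
with open("trades.csv") as f:
    text = f.read().replace(":", "").replace(".", "")
for line in text.splitlines():
    fields = line.split(",")
    curr_tm = int(fields[0])
    curr_ex = fields[3]
    if curr_ex not in ex_list:
        ex_list[curr_ex] = deque()
    ex_list[curr_ex].append(curr_tm)
    # 100000 in HHMMSSmmm digits is one minute; the comparison is only
    # exact while both timestamps fall within the same hour
    if curr_tm >= ex_list[curr_ex][0] + 100000:
        ex_list[curr_ex].popleft()
print("\n".join([str(len(ex_list[k])) for k in sorted(ex_list.keys())]))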
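The digits-only integer has one catch: adding 100000 means "plus one minute" only inside the same hour, since 09:59:30 plus a minute would become 09:60:30 rather than 10:00:30. Below is a minimal variant of the same non-shrinking window that converts each timestamp to milliseconds since midnight instead, checked against the sample from the task. The names to_ms and busiest_minute are made up for this sketch, and the three-digit millisecond field is assumed from the task statement.

```python
from collections import deque


def to_ms(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    secs, millis = seconds.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(secs)) * 1000 + int(millis)


def busiest_minute(lines, window_ms=60_000):
    """Return {exchange: max trades in any one-minute window}, keys sorted."""
    windows = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, _price, _size, exchange = line.split(",")
        now = to_ms(stamp)
        window = windows.setdefault(exchange, deque())
        window.append(now)
        # the oldest trade is a full minute (or more) old: drop it, so the
        # window keeps its previous length instead of growing
        if now >= window[0] + window_ms:
            window.popleft()
    return {name: len(windows[name]) for name in sorted(windows)}


sample = """09:30:01.034,36.99,100,V
09:30:55.000,37.08,205,V
09:30:55.554,36.90,54,V
09:30:55.556,36.91,99,D
09:31:01.033,36.94,100,D
09:31:01.034,36.95,900,V"""

result = busiest_minute(sample.splitlines())
print("\n".join(str(n) for n in result.values()))  # prints 2 then 3
```

For the real task the lines would come from trades.csv or standard input rather than from the sample string.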
This code should work:
import csv
import datetime

diff = datetime.timedelta(minutes=1)

def date_calc(start, dates):
    for i, date in enumerate(dates):
        if date >= start + diff:
            return i
    return i + 1

exchanges = {}
with open("trades.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        this_exchange = row[3]
        if this_exchange not in exchanges:
            exchanges[this_exchange] = []
        time = datetime.datetime.strptime(row[0], "%H:%M:%S.%f")
        exchanges[this_exchange].append(time)

ex_max = {}
for name, dates in exchanges.items():
    ex_max[name] = 0
    for i, d in enumerate(dates):
        x = date_calc(d, dates[i:])
        if x > ex_max[name]:
            ex_max[name] = x

print('\n'.join([str(ex_max[k]) for k in sorted(ex_max.keys())]))
Output:
2
3
(Obviously, please check it for yourself before uploading it :) )
I think the issue with your current code is that you don't print the output in lexicographical order of the exchange names...
If you want to use your current code, then here is a (hopefully) fixed version:
X = []
with open('trades.csv', 'r') as tr:
    for line in tr:
        line = line.strip('\xef\xbb\xbf\r\n ')
        X.append(line.split(','))
dex = {}
counts = []
for item in X:
    dex[item[3]] = []
for item in X:
    dex[item[3]].append(float(item[0][:2])*60.+float(item[0][3:5])+float(item[0][6:8])/60.+float(item[0][9:])/60000.)
for item in dex:
    count = 1
    ccount = 1
    if dex[item][len(dex[item])-1]-dex[item][0] <1:
        count = len(dex[item])
    else:
        for t in range(len(dex[item])-1):
            for tt in range(len(dex[item])-t-1):
                if dex[item][tt+t+1]-dex[item][t] <1:
                    ccount += 1
                else: break
            if ccount>count:
                count=ccount
            ccount=1
    counts.append((item, count))
counts.sort(key=lambda x: x[0])
print('\n'.join([str(x[1]) for x in counts]))
Output:
2
3
I do think you can make your life easier in the future by using Python's standard library, though :)
I've kind of run into a brick wall with one of my latest assignments where I have to calculate the class average from a text file that is created after a certain amount of inputs from a user.
Code:
f=open('class.txt','w')
title=['name','english','math','science']
f.write(str(title)+""+"\n")
name=input("enter student name:")
m=int(input("enter math score:"))
e=int(input("enter english score:"))
s=int(input("enter science score:"))
o=input("do you wish to continue?: y/n:")
f.write(name + " " +str(m)+ " "+str(e)+" "+str(s)+" "+"\n")
name =[]
while o !='n':
    name=input("enter a student name:")
    m=int(input("enter math score:"))
    e=int(input("enter english score:"))
    s=int(input("enter science score:"))
    o=input("do you wish to continue?: y/n:")
    f.write(name + " " +str(m)+ " "+str(e)+" "+str(s)+" "+"\n")
f.close()
Basically, the text file needs a header, hence the line with "title" in it, and after the user hits 'n' the text file gets saved.
Now I'm having trouble figuring out how to write the code that reads the text file, calculates the total score of each, calculates the average score of each student and then prints it all into three columns. If I could get any pointers as to how I should go about doing this it would be much appreciated! Thanks!
(I am not a Python programmer, so the syntax may not be exactly right.)
I am assuming that you are to write the code that produces the text file and calculates the average at the same time. If so, there is no need to write the file and then re-read it; just keep a running total and calculate the average when you're done.
numberOfStudents = 0
totalMathScore = 0
# Only the math score is shown; add matching lines for english / science
# see below about how the loop should be structured
keepGoing = 'y'  # 'continue' is a reserved word in Python
while keepGoing != 'n':
    numberOfStudents += 1
    m=int(input("enter math score:"))
    totalMathScore += m
    keepGoing = input("do you wish to continue?: y/n:")
# Now calculate average math score
averageMathScore = totalMathScore / numberOfStudents
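If the assignment does require reading class.txt back, as the question describes, something along the following lines could work. This is a minimal sketch that assumes the layout written by the question's code: one header line, then a name followed by three integer scores separated by spaces. The function name summarize and the sample lines are hypothetical.

```python
def summarize(lines):
    """Return (name, total, average) for each student line, skipping the header."""
    results = []
    for line in lines[1:]:  # the first line is the header
        parts = line.split()
        if len(parts) < 4:
            continue  # blank or malformed line
        name = " ".join(parts[:-3])  # allows names that contain spaces
        scores = [int(p) for p in parts[-3:]]
        total = sum(scores)
        results.append((name, total, total / len(scores)))
    return results


# hypothetical file content, in the format the question's code writes
sample_lines = [
    "['name', 'english', 'math', 'science']",
    "Ann 90 80 70 ",
    "Bob Lee 60 75 90 ",
]

# three columns: name, total, average
for name, total, average in summarize(sample_lines):
    print(f"{name:<15}{total:>6}{average:>8.2f}")
```

With the real file, the list of lines would come from open('class.txt').readlines() instead of sample_lines.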
Look for bits of repeated code and refactor, e.g. where you're getting the scores both outside and inside the loop. That's poor style; either
a) put that in a function, or
b) more likely for this simple example, change the loop to something like
keepGoing = 'y'  # 'continue' is a reserved word in Python
while keepGoing != 'n':
    name=input("enter a student name:")
    m=int(input("enter math score:"))
    e=int(input("enter english score:"))
    ...
Other bonuses
Use descriptive variable names - e.g. mathScore rather than m
Error handling - what happens if someone types in "BANANA" for a score?
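On the error-handling point, one possible shape is to separate parsing from prompting, so that a reply like "BANANA" leads to another prompt instead of a crash. This is only a sketch: the 0 to 100 range is an assumption (the question does not state valid scores), and parse_score and ask_score are hypothetical names.

```python
def parse_score(text, low=0, high=100):
    """Return text as an int score, or None if it is not a whole number
    between low and high (the 0-100 range is an assumption)."""
    try:
        value = int(text.strip())
    except ValueError:
        return None
    return value if low <= value <= high else None


def ask_score(prompt, read=input):
    """Keep asking until the reply parses as a valid score."""
    while True:
        score = parse_score(read(prompt))
        if score is not None:
            return score
        print("Please enter a whole number between 0 and 100.")
```

In the question's loop, m=int(input("enter math score:")) would then become m=ask_score("enter math score:"). The read parameter exists only so the prompt can be replaced when trying the function out without a keyboard.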
I am stuck trying to export MATLAB uitable data to Excel. I have tried many things without success. After many days, I tried the code below on Windows, where it works perfectly; on a Mac, however, the same code no longer works. The output is as follows:
"Error using dlmwrite (line 118) The input cell array cannot be converted to a matrix"
Searching for more information, I found an answer here (Using "xlswrite" MATLABs for cell arrays containing strings of different size), which doesn't quite work. Finally I found this method, which applies only to MATLAB on Windows (http://www.mathworks.es/matlabcentral/answers/20819-export-uitable-s-data-to-a-spreadsheet-excel).
I hope you can help me with this problem.
Thanks in advance
Hector
function Save_File
    hf = figure;
    hExportButton = uicontrol('Parent',hf,'Units',...
        'normalized','Position',[0 0.81 0.22 0.18],'Style','Pushbutton',...
        'String',' Export Data!','FontSize',20,'Callback',@ExportButton_Callback);
    dat = rand(5,5);
    t=uitable('Data',dat,'ColumnName',{'First','Second','Third','Fourth','Fifth'},...
        'Position',[7 10 500 300]);
    Data=get(t,'Data');
    ColumnName=get(t,'ColumnName');
    set(t,'ColumnWidth',{93.5})

    function ExportButton_Callback(~,~)
        NewData= num2cell(Data,ones(size(Data,1),1),ones(size(Data,2),1));
        CombData=[ColumnName';NewData];
        FileName = uiputfile('*.xls','Save as');
        xlswrite(FileName,CombData);
    end
end
You should be able to convert the cell array into a number array with a cell2mat command and then use csvwrite or dlmwrite.
If the combo of numbers and strings is the issue, as stated in my comment above, you can use some simple looping to do this all for you. I posted some sample code below.
% Creating some temporary data for proof of concept
mat = randi([1,5],10,2);
header = {'Col1','Col2'};
cellVals = [header;num2cell(mat)];
% the real code that does the writing
fh = fopen('temp.csv','w'); % open a file with write privileges, will overwrite old versions
for ii = 1:size(cellVals,1)
    first = 1;
    for jj = 1:size(cellVals,2)
        if first
            fwrite(fh,num2str(cellVals{ii,jj},'%f'));
            first = 0;
        else
            fwrite(fh,[',',num2str(cellVals{ii,jj},'%f')]);
        end
    end
    fwrite(fh,sprintf('\r\n')); % print line break
end
fclose(fh); % close file out when done writing
Can anyone tell me how I can write the output of my Fortran program in CSV format, so that I can open the CSV file in Excel to plot the data?
A slightly simpler version of the write statement could be:
write (1, '(1x, F, 3(",", F))') a(1), a(2), a(3), a(4)
Of course, this only works if your data is numeric or easily repeatable. You can leave the formatting to your spreadsheet program or be more explicit here.
I'd also recommend the csv_file module from FLIBS. Fortran is well equipped to read csv files, but not so much to write them. With the csv_file module, you put
use csv_file
at the beginning of your function/subroutine and then call it with:
call csv_write(unit, value, advance)
where unit = the file unit number, value = the array or scalar value you want to write, and advance = .true. or .false. depending on whether you want to advance to the next line or not.
Sample program:
program write_csv
    use csv_file
    implicit none
    integer :: a(3), b(2)

    open(unit=1,file='test.txt',status='unknown')
    a = (/1,2,3/)
    b = (/4,5/)
    call csv_write(1,a,.true.)
    call csv_write(1,b,.true.)
end program
output:
1,2,3
4,5
If you instead just want to use the write statement, I think you have to do it like this:
write(1,'(I1,A,I1,A,I1)') a(1),',',a(2),',',a(3)
write(1,'(I1,A,I1)') b(1),',',b(2)
which is very convoluted and requires you to know the maximum number of digits your values will have.
I'd strongly suggest using the csv_file module. It's certainly saved me many hours of frustration.
The Intel and gfortran (5.5) compilers recognize:
write(unit,'(*(G0.6,:,","))')array or data structure
which doesn't have excess blanks, and the line can have more than 999 columns.
To remove excess blanks with F95, first write into a character buffer and then use your own CSV_write program to take out the excess blanks, like this:
write(Buf,'(999(G21.6,:,","))')array or data structure
call CSV_write(unit,Buf)
You can also use
write(Buf,*)array or data structure
call CSV_write(unit,Buf)
where your CSV_write program replaces whitespace with "," in Buf. This is problematic in that it doesn't separate character variables unless there are extra blanks (i.e. 'a ','abc ' is OK).
I thought a full simple example without any other library might help. I assume you are working with matrices, since you want to plot from Excel (in any case it should be easy to extend the example).
tl;dr
Print one row at a time in a loop using the format format(1x, *(g0, ", "))
Full story
The purpose of the code below is to write in CSV format (that you can easily import in Excel) a (3x4) matrix.
The important line is the one labeled 101. It sets the format.
program testcsv
    IMPLICIT NONE
    INTEGER :: i, nrow
    REAL, DIMENSION(3,4) :: matrix

    ! Create a sample matrix
    matrix = RESHAPE(source = (/1,2,3,4,5,6,7,8,9,10,11,12/), &
                     shape = (/ 3, 4 /))

    ! Store the number of rows
    nrow = SIZE(matrix, 1)

    ! Formatting for CSV
    101 format(1x, *(g0, ", "))

    ! Open connection (i.e. create file where to write)
    OPEN(unit = 10, access = "sequential", action = "write", &
         status = "replace", file = "data.csv", form = "formatted")

    ! Loop across rows
    do i=1,nrow
        WRITE(10, 101) matrix(i,:)
    end do

    ! Close connection
    CLOSE(10)
end program testcsv
We first create the sample matrix. Then we store the number of rows in the variable nrow (this is useful when you are not sure of the matrix's dimensions beforehand). Skip the format statement for a moment. What we do next is open (create or replace) the CSV file, named data.csv. Then we loop over the rows of the matrix (do statement) to write one row at a time (write statement) to the CSV file; rows are appended one after another.
In more detail, the write statement has the form WRITE(U,FMT) WHAT: we write WHAT (the i-th row of the matrix, matrix(i,:)) to connection U (the one we created with the open statement), formatting it according to FMT.
Note that in the example FMT=101, and 101 is the label of our format statement:
format(1x, *(g0, ", "))
What this does: "1x" inserts a blank at the beginning of the row; the "*" stands for unlimited format repetition, meaning that the format in the following parentheses is repeated for all the data left in the object we are printing (i.e. all elements in the matrix's row). Thus each number in the row is formatted as g0 followed by ", ".
g is a general format descriptor that handles floats as well as characters, logicals and integers; the trailing 0 basically means "use the least amount of space needed to contain the object to be formatted" (it avoids unnecessary spaces). Then, after the formatted number, we require the comma plus a space: ", ". This produces our comma-separated values for a row of the matrix (you can use another separator instead of "," if you need). We repeat for every row and that's it.
(The spaces in the format are not really needed, so one could use format(*(g0,",")) instead.)
Reference: Metcalf, M., Reid, J., & Cohen, M. (2018). Modern Fortran Explained: Incorporating Fortran 2018. Oxford University Press.
Ten seconds' work with a search engine finds me the FLIBS library, which includes a module called csv_file that will write strings, scalars and arrays out in CSV format.