If timestamp includes certain time, then trigger an action - python-3.x

I am trying to trigger an action, based on part of the timestamp.
Code (error is in line 3):
for ticker in tickers:
    for i in range(1, len(ohlc_dict[ticker])):
        if ohlc_dict[ticker]['Date'][i] == "21:50:00":
            ohlc_dict:
Now there are two issues with this:
1: It doesn't recognize 'Date' (KeyError: 'Date')
2: It should get triggered at this time on every single day (if the timestamp includes "21:50:00" THEN ...)

I'd just like to post an answer to my own question. I was over-complicating things by trying to do it all in that one line. Instead, I went back to creating the dataframe and added this line:
ohlc_dict[ticker]['time'] = ohlc_dict[ticker].index.str[10:]
This creates one extra column in my DataFrame containing just the time, solving both issues.
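As a minimal sketch of the same idea (the index values, column names, and prices below are made-up examples, not from the original data):

```python
import pandas as pd

# Hypothetical intraday data indexed by timestamp strings, as in the question.
idx = ['2023-01-02 21:45:00', '2023-01-02 21:50:00', '2023-01-03 21:50:00']
df = pd.DataFrame({'Close': [100.0, 101.0, 102.0]}, index=idx)

# Slice the time portion off the string index into its own column...
df['time'] = df.index.str[10:].str.strip()

# ...then trigger the action whenever the time matches, on any day.
for i in range(len(df)):
    if df['time'].iloc[i] == '21:50:00':
        print('trigger at', df.index[i])
```

Because the comparison is against the time column only, the condition fires once per day at 21:50:00 regardless of the date.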


Python: How to execute a variable in a string in a for loop in a function?

I have an Excel file (the output of a survey) and I am trying to write code to assign marks based on the entries of a survey of different students.
Suppose the data at hand is given as follows:
import warnings, pickle
import numpy as np, pandas as pd
warnings.filterwarnings('ignore')
A='pickle.loads(b\'\\x80\\x03cpandas.core.frame\\nDataFrame\\nq\\x00)\\x81q\\x01}q\\x02(X\\x05\\x00\\x00\\x00_dataq\\x03cpandas.core.internals.managers\\nBlockManager\\nq\\x04)\\x81q\\x05(]q\\x06(cpandas.core.indexes.base\\n_new_Index\\nq\\x07cpandas.core.indexes.base\\nIndex\\nq\\x08}q\\t(X\\x04\\x00\\x00\\x00dataq\\ncnumpy.core.multiarray\\n_reconstruct\\nq\\x0bcnumpy\\nndarray\\nq\\x0cK\\x00\\x85q\\rC\\x01bq\\x0e\\x87q\\x0fRq\\x10(K\\x01K\\x04\\x85q\\x11cnumpy\\ndtype\\nq\\x12X\\x02\\x00\\x00\\x00O8q\\x13K\\x00K\\x01\\x87q\\x14Rq\\x15(K\\x03X\\x01\\x00\\x00\\x00|q\\x16NNNJ\\xff\\xff\\xff\\xffJ\\xff\\xff\\xff\\xffK?tq\\x17b\\x89]q\\x18(X\\x04\\x00\\x00\\x00Nameq\\x19X \\x00\\x00\\x00Which companies have you chosen?q\\x1aX\\n\\x00\\x00\\x00Question 1q\\x1bX\\n\\x00\\x00\\x00Question 2q\\x1cetq\\x1dbX\\x04\\x00\\x00\\x00nameq\\x1eNu\\x86q\\x1fRq h\\x07cpandas.core.indexes.range\\nRangeIndex\\nq!}q"(h\\x1eNX\\x05\\x00\\x00\\x00startq#K\\x00X\\x04\\x00\\x00\\x00stopq$K\\x04X\\x04\\x00\\x00\\x00stepq%K\\x01u\\x86q&Rq\\\'e]q(h\\x0bh\\x0cK\\x00\\x85q)h\\x0e\\x87q*Rq+(K\\x01K\\x04K\\x04\\x86q,h\\x15\\x89]q-(X\\x03\\x00\\x00\\x00Ayaq.X\\x04\\x00\\x00\\x00Ramiq/X\\x07\\x00\\x00\\x00Geniousq0X\\x05\\x00\\x00\\x00Samirq1G\\x7f\\xf8\\x00\\x00\\x00\\x00\\x00\\x00X?\\x00\\x00\\x00 Mobil, ConsolidatedEdisonq2X?\\x00\\x00\\x00 DataGeneral, GeneralPublicUtilitiesq3h3X\\x11\\x00\\x00\\x00Uploaded the dataq4X\\xfe\\x00\\x00\\x00Uploaded the data,Specified the reason behind the selected stocks,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.q5X{\\x01\\x00\\x00Uploaded the data,Specified the reason behind the selected stocks,You selected both stocks from different industries,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. 
This should have been answered in Question 2.,You did extra work for this question and you deserve a round of applause.q6X\\xec\\x00\\x00\\x00Specified the reason behind the selected stocks,You were successful in cleaning your data-sets,You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.q7G\\x7f\\xf8\\x00\\x00\\x00\\x00\\x00\\x00X(\\x01\\x00\\x00You plot a graph over the time axis that demonstrate how your selected returns behaved in exciting periods,You produced a scatter plot that shows that you may change your x-axis from time to market return.,You also were able to compare your selected stock to other stocks within the same industryq8X\\xde\\x01\\x00\\x00You plot a graph over the time axis that demonstrate how your selected returns behaved in exciting periods,You produced a scatter plot that shows that you may change your x-axis from time to market return.,You have shown some references to what has been going in your chosen period and you gave a good justification.,You also were able to compare your selected stock to other stocks within the same industry,You did extra work for this question and you deserve a very high mark.q9h8etq:ba]q;h\\x07h\\x08}q<(h\\nh\\x0bh\\x0cK\\x00\\x85q=h\\x0e\\x87q>Rq?(K\\x01K\\x04\\x85q#h\\x15\\x89]qA(h\\x19h\\x1ah\\x1bh\\x1cetqBbh\\x1eNu\\x86qCRqDa}qEX\\x06\\x00\\x00\\x000.14.1qF}qG(X\\x04\\x00\\x00\\x00axesqHh\\x06X\\x06\\x00\\x00\\x00blocksqI]qJ}qK(X\\x06\\x00\\x00\\x00valuesqLh+X\\x08\\x00\\x00\\x00mgr_locsqMcbuiltins\\nslice\\nqNK\\x00K\\x04K\\x01\\x87qORqPuaustqQbX\\x04\\x00\\x00\\x00_typqRX\\t\\x00\\x00\\x00dataframeqSX\\t\\x00\\x00\\x00_metadataqT]qUub.\')'
df1=eval(A)
I wrote these functions to help me extract and work with the results:
# This function will give you the student's answers {Answer_Student_1_Q_1, Answer_Student_1_Q_2}
def Get_student_answers_per_question(i1, total_number_of_questions):
    g = df1.index[df1['Name'] != 'Genious'][i1]
    Gen1 = df1.ix[g, :]  # note: .ix is deprecated; .loc works the same way here
    print(Gen1)
    for i in range(total_number_of_questions):
        if isinstance(Gen1[i+2], float):
            foo = 'Answer_Student_' + str(i1+1) + '_Q_' + str(i+1) + ' = Gen1[' + str(i+2) + ']'
        else:
            foo = 'Answer_Student_' + str(i1+1) + '_Q_' + str(i+1) + ' = Gen1[' + str(i+2) + '].split(",")'
        exec(foo, locals(), globals())
    pass
# Get student score for each answer in question
def Get_score_per_question(student, question, dictionary_mark, total_number_of_questions):
    # Get_all_answers will get all the answers for student (student)
    Get_all_answers = Get_student_answers_per_question(student-1, total_number_of_questions)
    foo = 'Answer_Student_' + str(student) + '_Q_' + str(question)
    print(foo)
    v = [dictionary_mark.get(i) for i in exec(foo)]
    return v
Now in the last function, Get_score_per_question, I was trying to code
    v = [dictionary_mark.get(i) for i in exec(foo)]
where v is the list of scores, one per item in the answer, looked up in the dictionary.
So depending on the entries in the string variable foo, the result should be a list of numbers of the same length.
The example that I am trying to run is this:
student = 1
question = 1
dictionary_mark = {'Uploaded the data': 1,
                   'Specified the reason behind the selected stocks': 1,
                   'You selected both stocks from different industries': 1,
                   'You were successful in cleaning your data-sets': 2,
                   'You have had a justification why you selected this particular time period of your chosen stock. This should have been answered in Question 2.': 1,
                   'You did extra work for this question and you deserve a round of applause.': 1}
total_number_of_questions = 2
Get_score_per_question(student, question, dictionary_mark, total_number_of_questions)
Where, as you can foresee, I get the following error:
TypeError: 'NoneType' object is not iterable
Can somebody help me in this regard? Is there any tutorial or page someone could refer me to for better coding of such surveys in Python, especially when handling splits and so on?
Docs of the exec() function: https://docs.python.org/3/library/functions.html#exec
Be aware that the return and yield statements may not be used outside of function definitions even within the context of code passed to the exec() function. The return value is None.
And you have used this piece of code: [dictionary_mark.get(i) for i in exec(foo)]
exec() always returns None, and None does not implement __iter__, so you get a TypeError when you try to loop over it.
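The underlying fix is to avoid exec-built variable names entirely and store answers in a dictionary keyed by student and question. A minimal sketch of that pattern (the raw answer string and keys here are illustrative, not from the original spreadsheet):

```python
# Marks per survey item, as in the question.
dictionary_mark = {'Uploaded the data': 1,
                   'Specified the reason behind the selected stocks': 1,
                   'You were successful in cleaning your data-sets': 2}

# Hypothetical raw survey cell: a comma-separated string of checked items.
raw_answer = 'Uploaded the data,Specified the reason behind the selected stocks'

# Keys like (student, question) replace names like Answer_Student_1_Q_1,
# so no exec() is needed and the values are directly iterable.
answers = {}
answers[(1, 1)] = raw_answer.split(',')

# .get returns None for items not in the mark dictionary.
v = [dictionary_mark.get(item) for item in answers[(1, 1)]]
print(v)
```

Because the answers live in a plain dict, looking one up is a normal expression rather than a string to be executed, which sidesteps the NoneType error altogether.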

How to remove special usual characters from a pandas dataframe using Python

I have a file with some crazy stuff in it. It looks like this:
I attempted to get rid of it using this:
df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])
But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>
I got that example from another question, and it seems to be the Python 3 way of doing this, but I'm not sure what I'm doing wrong.
Edit: For some odd reason, someone thinks this has something to do with getting a map to return a list. The central issue is getting rid of non-UTF-8 characters. Whether I'm even doing that correctly has yet to be established.
As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique or is map the correct way and if it is, why am I getting the output I've indicated?
Edit 2: For some reason, my machine wouldn't let me create an example earlier. I can now. This is what I'm dealing with; all those weird characters need to go.
import pandas as pd
data = [['🦎Ale','Αλέξανδρα'],['��Grain','Girl🌾'],['Đỗ Vũ','ên Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])
print(df)
Edit 3: I tried doing this using a regex, and for some reason it still didn't work.
df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')
This regex works FINE in another process, but here it still leaves the ugly characters.
Edit 4: It turns out that it's image data that we're looking at.
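For reference, one common way to strip non-ASCII characters from a string column is a regex replacement on the character range via the .str accessor (a sketch, not from the original thread; the data is adapted from the question's example):

```python
import pandas as pd

data = [['🦎Ale', 'Αλέξανδρα'], ['Grain', 'Girl🌾'], ['Đỗ Vũ', 'ên Anh'], ['Don', 'Johnson']]
df = pd.DataFrame(data, columns=['firstname', 'lastname'])

# .str.replace with regex=True removes every character outside the ASCII range.
# Note: Series.replace (without .str) matches whole cell values by default,
# which is one reason a bare df['firstname'].replace(...) appears to do nothing.
df['firstname'] = df['firstname'].str.replace(r'[^\x00-\x7F]', '', regex=True)
print(df['firstname'].tolist())
```

The original map() attempt failed differently: in Python 3, map returns a lazy iterator, and assigning that iterator into a column stores the object itself, which is why `<map object at 0x...>` showed up in the frame.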

How to use a conditional statement while scraping?

I'm trying to scrape the MTA website and need a little help scraping the "Train Lines Row." (Website for reference: https://advisory.mtanyct.info/EEoutage/EEOutageReport.aspx?StationID=All)
The train line information is stored as image files (1 line subway, A line subway, etc) describing each line that's accessible through a particular station. I've had success scraping info out of rows in which only one train passes through, but I'm having difficulty figuring out how to iterate through the columns which have multiple trains passing through it...using a conditional statement to test for whether it has one line or multiple lines.
tableElements = table.find_elements_by_tag_name('tr')
That's the table I'm iterating through.
tableElements[2].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt')
This successfully gives me the value if only one image exists in the particular column.
tableElements[8].find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')
This successfully gives me a list of values I can iterate through to extract my needed values.
Now I try to combine these lines of code in a for loop to extract all the information without stopping.
for info in tableElements[1:]:
    if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
        for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
            print(images.get_attribute('alt'))
    else:
        print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
I'm getting the error message "list index out of range." I don't know why, as every iteration done in isolation seems to work. My hunch is that I haven't used the boolean operation correctly here. My idea was that if find_elements_by_tag_name had an element at index [1], that would mean there are multiple image alt texts for me to iterate through; hence why I want to use this boolean operation.
Hi All, thanks so much for your help. I've uploaded my full code to Github and attached a link for your reference: https://github.com/tsp2123/MTA-Scraping/blob/master/MTA.ElevatorData.ipynb
The end goal is to put this information into a dataframe, using some formulation like the following with a for loop that extracts the image information I want.
dataframe = []
for elements in tableElements:
    row = {}
    columnName1 = elements.find_element_by_class_name('td')
    ..
Your logic isn't far off here.
"My hunch is I haven't correctly used the boolean operation properly here. My idea was that if find_elements_by_tag_name had an index of [1] that would mean multiple image text for me to iterate through."
The problem is that it can't evaluate the condition when there's nothing at index position [1]. Hence the error at this point:
if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
What you want to do is wrap it in a try/except, so something like:
for info in tableElements[1:]:
    try:
        if info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img')[1] == True:
            for images in info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_elements_by_tag_name('img'):
                print(images.get_attribute('alt'))
        else:
            print(info.find_elements_by_tag_name('td')[1].find_element_by_tag_name('h4').find_element_by_tag_name('img').get_attribute('alt'))
    except IndexError:
        # do something else
        print('Nothing found in index position.')
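An alternative to the try/except: since find_elements_by_tag_name (plural) already returns a list, you can simply iterate over whatever it returns, whether it holds one element or several, with no length check at all. A plain-Python sketch of that pattern, with dicts standing in as hypothetical WebElements (no live Selenium session here):

```python
# find_elements_* always returns a list, so one loop covers both the
# single-image and multi-image rows without any conditional.
def collect_alts(img_elements):
    return [img['alt'] for img in img_elements]

single = [{'alt': '1 line subway'}]
multiple = [{'alt': 'A line subway'}, {'alt': 'C line subway'}]

print(collect_alts(single))
print(collect_alts(multiple))
```

In the real scraper the loop body would call images.get_attribute('alt') instead of the dict lookup; the structure is otherwise the same.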
Is it also possible to go back to your question and provide the full code? When I try this, I'm getting 11 table elements, so I want to test it with the specific table you're trying to scrape.

Pandas Drop and Replace functions won't work within a UDF

I looked around at other questions but couldn't find one that addresses the issue I'm having. I am cleaning a data set in an IPython notebook. When I run the cleaning tasks individually they work as expected, but I am having trouble with the replace() and drop() functions when they are included in a UDF. Specifically, these lines aren't doing anything within the UDF; however, a dataframe is returned that completes the other tasks as expected (i.e. reads in the file, sets the index, and filters select dates out).
Any help is much appreciated!
Note that in this problem the df.drop() and df.replace() commands both work as expected when executed outside of the UDF. The function is below for your reference. The issue is with the last two lines "station.replace()" and "station.drop()".
def read_file(file_path):
    '''Function to read in daily x data'''
    if os.path.exists(os.getcwd()+'/'+file_path) == True:
        station = pd.read_csv(file_path)
    else:
        !unzip alldata.zip
        station = pd.read_csv(file_path)
    station.set_index('date', inplace=True)  # put date in the index
    station = station_data[station_data.index > '1984-09-29']  # removes days where there is no y-data
    station.replace('---', '0', inplace=True)
    station.drop(columns=['Unnamed: 0'], axis=1, inplace=True)  # drop non-station columns
There was a mistake here:
station = station_data[station_data.index > '1984-09-29']
I was using an old table name. I corrected it to:
station = station[station.index > '1984-09-29']
Note, I had to restart the notebook and re-run it from the top for it to work. I believe the issue was a conflict between the table name inside the UDF and what was already stored in the notebook's memory.
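Putting the fix together, a corrected version of the function might look like the sketch below (assumptions: the unzip fallback is dropped for brevity, the column names come from the question, and a return statement is added so the cleaned frame actually leaves the function):

```python
import pandas as pd

def read_file(file_path):
    '''Read daily data, index by date, filter early dates, and clean placeholders.'''
    station = pd.read_csv(file_path)
    station.set_index('date', inplace=True)
    # Filter the *local* variable, not a stale name from the notebook session.
    station = station[station.index > '1984-09-29']
    station.replace('---', '0', inplace=True)
    station.drop(columns=['Unnamed: 0'], inplace=True)
    return station
```

Because every step reassigns or mutates the local `station`, there is no dependence on globals left over in the kernel, so the function behaves the same after a fresh restart.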

Python KafkaConsumer start consuming messages from a timestamp

I'm planning to skip the start of the topic and only read messages from a certain timestamp to the end. Any hints on how to achieve this?
I'm guessing you are using kafka-python (https://github.com/dpkp/kafka-python) as you mentioned "KafkaConsumer".
You can use the offsets_for_times() method to retrieve the offset that matches a timestamp. https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html#kafka.KafkaConsumer.offsets_for_times
Following that just seek to that offset using seek(). https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html#kafka.KafkaConsumer.seek
Hope this helps!
I got it working; however, I'm not sure about the values that I got back from the method.
I have a KafkaConsumer (ck), and I got the partitions for the topic with the assignment() method. With that, I can create a dictionary mapping each partition to the timestamp I'm interested in (in this case 100).
Side question: should I use 0 in order to get all the messages?
I can use that dictionary as the argument to offsets_for_times(). However, the values that I got back are all None:
zz = dict(zip(ck.assignment(), [100]*len(ck.assignment())))
z = ck.offsets_for_times(zz)
z.values()
dict_values([None, None, None])
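One likely factor behind the None values: offsets_for_times() takes timestamps in milliseconds since the Unix epoch, so 100 means 100 ms after 1970-01-01. Per the kafka-python docs, None is returned for a partition when no message with a timestamp at or after the requested one can be found, which can also happen when the broker or message format predates Kafka 0.10 timestamps. A sketch of building a sensible request (the one-hour window is an arbitrary example; the commented consumer calls assume a kafka-python KafkaConsumer named ck and are not executed here):

```python
import time

# offsets_for_times() expects *milliseconds* since the Unix epoch.
one_hour_ago_ms = int((time.time() - 3600) * 1000)

# Hypothetical usage with a kafka-python consumer `ck`:
#   request = {tp: one_hour_ago_ms for tp in ck.assignment()}
#   offsets = ck.offsets_for_times(request)
#   for tp, ot in offsets.items():
#       if ot is not None:          # None: no message at/after that timestamp
#           ck.seek(tp, ot.offset)

print(one_hour_ago_ms)
```

On the side question: a timestamp of 0 would indeed request the earliest available offset on brokers that support timestamp lookups, though beginning_offsets() is the more direct way to read from the start.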
