How to search tweets from one ID to another ID - python-3.x

I'm trying to get tweets using TwitterSearch in Python3.
So basically I want to get all tweets between these two IDs:
748843914254249984 -> 760065085616250880
These two IDs correspond to the range from Fri Jul 01 11:41:16 +0000 2016 to Mon Aug 01 10:50:12 +0000 2016.
So here's the code I made.
crawl.py
#!/usr/bin/python3
# coding: utf-8
from TwitterSearch import *
import datetime

def crawl():
    try:
        tso = TwitterSearchOrder()
        tso.set_keywords(["keyword"])
        tso.set_since_id(748843914254249984)
        tso.set_max_id(760065085616250880)

        ACCESS_TOKEN = "xxx"
        ACCESS_SECRET = "xxx"
        CONSUMER_KEY = "xxx"
        CONSUMER_SECRET = "xxx"

        ts = TwitterSearch(
            consumer_key=CONSUMER_KEY,
            consumer_secret=CONSUMER_SECRET,
            access_token=ACCESS_TOKEN,
            access_token_secret=ACCESS_SECRET
        )

        for tweet in ts.search_tweets_iterable(tso):
            print(tweet['id_str'], '-', tweet['created_at'])
    except TwitterSearchException as e:
        print(e)

if __name__ == '__main__':
    crawl()
I'm not very familiar with the Twitter API or with searching through it, but this code should do the job.
However, it gives:
760058064816988160 - Mon Aug 01 10:22:18 +0000 2016
[...]
760065085616250880 - Mon Aug 01 10:50:12 +0000 2016
Many, many times... I get the same lines over and over again instead of everything between my two IDs.
So I'm not getting any of the July tweets. Any idea why?

TL;DR
Remove the tso.set_max_id(760065085616250880) line.
Explanation (as far as I understand it)
I found your problem in the TwitterSearch docs:
"The only parameter with a default value is count with 100. This is because it is the maximum of tweets returned by this very Twitter API endpoint."
If I check this in your code by creating a search URL, I get:
tso.create_search_url()
#?q=Vuitton&since_id=748843914254249984&count=100&max_id=760065085616250880
which contains count=100, meaning it only fetches the first page of 100 tweets. By contrast, removing set_since_id and set_max_id (which also yields count=100) retrieves many more tweets; with max_id set, the search stops at those 100.
set_since_id without set_max_id works; the other way around doesn't. So removing max_id=760065085616250880 from the search URL gives the results you want.
If anyone can explain why set_max_id does not work here, please edit my answer.
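As a minimal sketch (assuming the same TwitterSearch library, placeholder "xxx" credentials, and a MAX_ID constant invented here for illustration), the order can be built without set_max_id and the upper bound enforced client-side instead:
from TwitterSearch import TwitterSearch, TwitterSearchOrder, TwitterSearchException

MAX_ID = 760065085616250880  # upper bound checked in the loop instead of via set_max_id

try:
    tso = TwitterSearchOrder()
    tso.set_keywords(["keyword"])
    tso.set_since_id(748843914254249984)  # lower bound still set on the order

    ts = TwitterSearch(
        consumer_key="xxx",  # placeholder credentials
        consumer_secret="xxx",
        access_token="xxx",
        access_token_secret="xxx"
    )

    for tweet in ts.search_tweets_iterable(tso):
        if int(tweet['id_str']) <= MAX_ID:  # skip anything newer than the upper bound
            print(tweet['id_str'], '-', tweet['created_at'])
except TwitterSearchException as e:
    print(e)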

Related

how to get only date string from a long string

I know there are lots of Q&As about extracting a datetime from a string, for example using dateutil.parser:
import dateutil.parser as dparser
dparser.parse('something sep 28 2017 something', fuzzy=True).date()
output: datetime.date(2017, 9, 28)
But my question is how to know which part of the string produced this extraction. E.g., I want a function that also returns 'sep 28 2017':
datetime, datetime_str = get_date_str('something sep 28 2017 something')
outputs: datetime.date(2017, 9, 28), 'sep 28 2017'
Any clue or direction that I can search in?
Extending the discussion with #Paul and following the solution from #alecxe, I have put together the following solution, which works on a number of test cases. I've made the problem slightly more challenging:
Step 1: get excluded tokens
import dateutil.parser as dparser
ostr = 'something sep 28 2017 something abcd'
_, excl_str = dparser.parse(ostr, fuzzy_with_tokens=True)
which gives:
excl_str: ('something ', ' ', 'something abcd')
Step 2: rank tokens by length
excl_str = list(excl_str)
excl_str.sort(reverse=True, key=len)
gives a sorted token list:
excl_str: ['something abcd', 'something ', ' ']
Step 3: delete tokens and ignore space element
for i in excl_str:
    if i != ' ':
        ostr = ostr.replace(i, '')
return ostr
gives a final output
ostr: 'sep 28 2017 '
Note: step 2 is required because a shorter token that is a substring of a longer one causes a problem. E.g., in this case, if deletion follows the order ('something ', ' ', 'something abcd'), the replacement removes 'something' from 'something abcd', so 'abcd' never gets deleted and we end up with 'sep 28 2017 abcd'.
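Putting the three steps together, a minimal sketch (the function name get_date_str comes from the question itself, not from any library) might look like this:
import dateutil.parser as dparser

def get_date_str(ostr):
    # Parse fuzzily and keep the skipped (non-date) tokens.
    parsed, excl = dparser.parse(ostr, fuzzy_with_tokens=True)
    # Delete the longest tokens first so that a shorter token which is a
    # substring of a longer one does not break the replacement.
    for token in sorted(excl, key=len, reverse=True):
        if token != ' ':
            ostr = ostr.replace(token, '')
    return parsed.date(), ostr.strip()

print(get_date_str('something sep 28 2017 something abcd'))
# (datetime.date(2017, 9, 28), 'sep 28 2017')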
Interesting problem! There is no direct way to get the parsed-out date string from the bigger string with dateutil. The problem is that the dateutil parser does not even have this string available as an intermediate result, since it builds parts of the future datetime object on the fly, character by character (source).
It does, though, also collect a list of skipped tokens, which is probably your best bet. As this list is ordered, you can loop over the tokens and replace the first occurrence of each token:
from dateutil import parser

s = 'something sep 28 2017 something'
parsed_datetime, tokens = parser.parse(s, fuzzy_with_tokens=True)
for token in tokens:
    s = s.replace(token.lstrip(), "", 1)
print(s)  # prints "sep 28 2017"
I am not 100% sure, though, whether this would work in all possible cases, especially with different whitespace characters (notice how I had to work around things with .lstrip()).

Momentjs get time in current location

I'm trying to generate a momentjs object of a certain timestamp in the current day of a specified location. For example:
const timeNow = moment().tz('Africa/Cairo')
const startTime = moment('10:00 am', 'HH:mm a')
const endTime = moment('2:30 pm', 'HH:mm a')
Printing the above 3 variables outputs this:
Fri, 12:31 am
Thu, 10:00 am
Thu, 02:30 pm
The first result is in fact the current time in Cairo; however, the other two results are the day before. How can I change it so that they return the current day?
You can simply do:
moment.tz('Africa/Cairo') // <= Moment Object
One small note: whenever you see a JavaScript date in a browser, it is displayed in your system's time zone. Since a JavaScript Date is based on UTC, browsers convert it accordingly. Use moment.format() to get string values.

Write date and variable to file

I am trying to write a variable and the date and time on the same line to a file, which will simulate a log file.
Example: July 25 2018 6:00 pm - Variable contents here
So far I am able to write the variable to the file but I am unsure how to use the datetime library or other similar libraries. Some guidance would be appreciated.
Below is the current script.
import subprocess
import datetime

var = "test"
with open('auditlog.txt', 'a') as logfile:
    logfile.write(var + "\n")
The fastest way I found is doing something like this:
import time
var = time.asctime()
print(var)
Result: Thu Jul 26 00:46:04 2018
If you want to change the placement of the year/month/day etc., you can alternatively use this:
import time
var = time.strftime("%B %d %Y %H:%M pm", time.localtime())
print(var)
Result: July 26 2018 00:50 pm
Have a look here.
By the way, is subprocess intended in your code? You don't need it to open or write to files. Also note that because you open the file with a with statement, it is closed automatically when the block ends, so an explicit logfile.close() is not needed there.
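Putting the two pieces together (auditlog.txt and var come from the question; the %I/%p directives are used here to get a real 12-hour clock with am/pm instead of the hard-coded "pm" above), a minimal sketch of the combined log line might be:
import time

var = "test"
# e.g. "July 25 2018 06:00 PM"
timestamp = time.strftime("%B %d %Y %I:%M %p", time.localtime())

with open('auditlog.txt', 'a') as logfile:
    # e.g. "July 25 2018 06:00 PM - test"
    logfile.write(timestamp + " - " + var + "\n")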

why isn't my parsing from JSON working in python

import json

with open('tweets.json') as json_data:
    data = json.load(json_data)
    print(data['text'])
I want to extract specific data/values but I keep getting this error:
print(data['text'])
TypeError: string indices must be integers
I am a beginner with Python and I am trying to learn by using the Twitter API.
This is my JSON:
"{\"created_at\":\"Wed Feb 03 03:02:04 +0000 2016\",\"id\":694717462621884416,\"id_str\":\"694717462621884416\",\"text\":\"Finallyy #taylorcaniff Happy bday bae, I love you soooo much, keep smiling, I'm so proud of everything you've done\\u2661 https:\\/\\/t.co\\/uwjeASxsA3\",\"source\":\"\\u003ca href=\\\"http:\\/\\/twitter.com\\/download\\/android\\\" rel=\\\"nofollow\\\"\\u003eTwitter for Android\\u003c\\/a\\u003e\",\"truncated\":false,\"in_reply_to_status_id\":null,\"in_reply_to_status_id_str\":null,\"in_reply_to_user_id\":null,\"in_reply_to_user_id_str\":null,\"in_reply_to_screen_name\":null,\"user\":{\"id\":1364125758,\"id_str\":\"1364125758\",\"name\":\"C o l l i n e r\",\"screen_name\":\"HoodsPizzaxJCat\",\"location\":\"2\\/5 UJ | The Vamps DM\",\"url\":null,\"description\":\"\\u25a8Issa liked x2 & follow\\u25a8Brent Follow, liked x4 &DM\\u25a8Chris liked x3 and follows\\u25a8Taylor, Kizzy, Jacob, Caspar, King B. & Momma Collins follow\\u25a8Trevor Liked \\u25a8\",\"protected\":false,\"verified\":false,\"followers_count\":12136,\"friends_count\":13282,\"listed_count\":20,\"favourites_count\":29245,\"statuses_count\":46864,\"created_at\":\"Fri Apr 19 10:59:10 +0000 2013\",\"utc_offset\":-10800,\"time_zone\":\"Buenos Aires\",\"geo_enabled\":true,\"lang\":\"es\",\"contributors_enabled\":false,\"is_translator\":false,\"profile_background_color\":\"09ED92\",\"profile_background_image_url\":\"http:\\/\\/pbs.twimg.com\\/profile_background_images\\/506799872326893569\\/vdaHWDTj.jpeg\",\"profile_background_image_url_https\":\"https:\\/\\/pbs.twimg.com\\/profile_background_images\\/506799872326893569\\/vdaHWDTj.jpeg\",\"profile_background_tile\":true,\"profile_link_color\":\"4CC74C\",\"profile_sidebar_border_color\":\"FFFFFF\",\"profile_sidebar_fill_color\":\"DDEEF6\",\"profile_text_color\":\"333333\",\"profile_use_background_image\":true,\"profile_image_url\":\"http:\\/\\/pbs.twimg.com\\/profile_images\\/688994368057921536\\/IKy-2UYn_normal.jpg\",\"profile_image_url_https\":\"https:\\/\\/pbs.twimg.com\\/profile_images\\/688994368057921536\\/IKy-2UYn_normal.jpg\",\"profile_banner_url\":\"https:\\/\\/pbs.twimg.com\\/profile_banners\\/1364125758\\/1450566712\",\"default_profile\":false,\"default_profile_image\":false,\"following\":null,\"follow_request_sent\":null,\"notifications\":null},\"geo\":null,\"coordinates\":null,\"place\":null,\"contributors\":null,\"is_quote_status\":false,\"retweet_count\":0,\"favorite_count\":0,\"entities\":{\"hashtags\":[],\"urls\":[],\"user_mentions\":[{\"screen_name\":\"taylorcaniff\",\"name\":\"Taylor 
Caniff\",\"id\":1396698397,\"id_str\":\"1396698397\",\"indices\":[9,22]}],\"symbols\":[],\"media\":[{\"id\":694717457911693312,\"id_str\":\"694717457911693312\",\"indices\":[116,139],\"media_url\":\"http:\\/\\/pbs.twimg.com\\/media\\/CaQh2OIWwAA6G_C.jpg\",\"media_url_https\":\"https:\\/\\/pbs.twimg.com\\/media\\/CaQh2OIWwAA6G_C.jpg\",\"url\":\"https:\\/\\/t.co\\/uwjeASxsA3\",\"display_url\":\"pic.twitter.com\\/uwjeASxsA3\",\"expanded_url\":\"http:\\/\\/twitter.com\\/HoodsPizzaxJCat\\/status\\/694717462621884416\\/photo\\/1\",\"type\":\"photo\",\"sizes\":{\"large\":{\"w\":480,\"h\":800,\"resize\":\"fit\"},\"thumb\":{\"w\":150,\"h\":150,\"resize\":\"crop\"},\"small\":{\"w\":340,\"h\":566,\"resize\":\"fit\"},\"medium\":{\"w\":480,\"h\":800,\"resize\":\"fit\"}}}]},\"extended_entities\":{\"media\":[{\"id\":694717457911693312,\"id_str\":\"694717457911693312\",\"indices\":[116,139],\"media_url\":\"http:\\/\\/pbs.twimg.com\\/media\\/CaQh2OIWwAA6G_C.jpg\",\"media_url_https\":\"https:\\/\\/pbs.twimg.com\\/media\\/CaQh2OIWwAA6G_C.jpg\",\"url\":\"https:\\/\\/t.co\\/uwjeASxsA3\",\"display_url\":\"pic.twitter.com\\/uwjeASxsA3\",\"expanded_url\":\"http:\\/\\/twitter.com\\/HoodsPizzaxJCat\\/status\\/694717462621884416\\/photo\\/1\",\"type\":\"photo\",\"sizes\":{\"large\":{\"w\":480,\"h\":800,\"resize\":\"fit\"},\"thumb\":{\"w\":150,\"h\":150,\"resize\":\"crop\"},\"small\":{\"w\":340,\"h\":566,\"resize\":\"fit\"},\"medium\":{\"w\":480,\"h\":800,\"resize\":\"fit\"}}}]},\"favorited\":false,\"retweeted\":false,\"possibly_sensitive\":false,\"filter_level\":\"low\",\"lang\":\"en\",\"timestamp_ms\":\"1454468524972\"}\r\n"
The data is in text format (not JSON): the file contains a JSON document wrapped in a string, so json.load returns that string rather than a dict. You have to parse it a second time:
>>> json.loads(data)["text"]
u"Finallyy #taylorcaniff Happy bday bae, I love you soooo much, keep smiling, I'm so proud of everything you've done\u2661 https://t.co/uwjeASxsA3"
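Combined with the original snippet (tweets.json is the filename from the question), a minimal sketch of the fix might be:
import json

with open('tweets.json') as json_data:
    data = json.load(json_data)   # first pass: yields the inner JSON document as a string

tweet = json.loads(data)          # second pass: parses that string into a dict
print(tweet['text'])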

How to get assertion value using groovy script

I have one test step which contains two assertions:
Not SOAP Fault
Contains. The condition is that the response should contain "Message Sent Successfully".
Now I have a Groovy script from which I am executing this test step. Using this Groovy script I need to print the assertion name, value, and status. Below is the code I have written:
testStepSrc = testCase.getTestStepByName(testName)
Assertioncounter = testStepSrc.getAssertionList().size()
for (AssertionCount in 0..Assertioncounter-1)
{
    log.info("Assertion :" + testStepSrc.getAssertionAt(AssertionCount).getName() + " :: " + testStepSrc.getAssertionAt(AssertionCount).getStatus())
    error = testStepSrc.getAssertionAt(AssertionCount).getErrors()
    if (error != null)
    {
        log.error(error[0].getMessage())
    }
}
But the output looks like this:
Wed Sep 04 17:21:11 IST 2013:INFO:Assertion :Not SOAP Fault :: VALID
Wed Sep 04 17:21:11 IST 2013:INFO:Assertion :Contains :: VALID
As you can see, I am able to print the assertion name and status but not the value of the 'Contains' assertion. Please help me figure out how to get the value of a particular assertion.
Thanks in advance.
So here are some things for you to read:
http://www.soapui.org/forum/viewtopic.php?t=359
http://whathaveyoutried.com
And here is what I tried:
def assertionsList = testRunner.getTestCase().getTestStepByName("Test Step Name").getAssertionList()
for (e in assertionsList) {
    log.info e.getToken() // gives the value of the content to search for
    log.info e.DESCRIPTION
    log.info e.ID
    log.info e.LABEL
    log.info e.toString()
}
This gives the following output
Wed Sep 04 15:12:19 ADT 2013:INFO:Abhishek //the contains assertion was checking for the word "Abhishek" in the response of my test step where the assertion was applied.
Wed Sep 04 15:12:19 ADT 2013:INFO:Searches for the existence of a string token in the property value, supports regular expressions. Applicable to any property.
Wed Sep 04 15:12:19 ADT 2013:INFO:Simple Contains
Wed Sep 04 15:12:19 ADT 2013:INFO:Contains
Wed Sep 04 15:12:19 ADT 2013:INFO:com.eviware.soapui.impl.wsdl.teststeps.assertions.basic.SimpleContainsAssertion#c4115f0
Abhishek's response did contain your answer, I believe, just not in the format you were looking for.
I was looking for the same info for custom reporting, and after digging through the SoapUI forums I stumbled upon this.
The piece of code that I believe you are looking for is:
log.info e.getToken()
However, that is an example of how to retrieve it only when an error occurs; you can get it in a valid scenario using something similar to:
def iAssertionName = assertionNameList[j]
def iAssertionStatus = testStep.getAssertionAt(j).getStatus().toString()
def tstep = testStep.getName()
def gStatus = testStep.getAssertionAt(j).status
def expect = testStep.getAssertionAt(j).getToken()
log.info "Expected Content: " + expect
This is a subset of my code but produces the log message:
Fri Sep 20 11:04:09 CDT 2013:INFO:Expected Content: success
My SoapUI script assertion was checking to see if my response contained the string "success".
Thanks Abhishek for your response!
