Trend Micro Deep Security: deleting a computer gives HTTP Status 400 – Bad Request (python-3.x)

I'm trying to delete a computer from TMDS (Trend Micro Deep Security) with the Python script provided below.
The script was copied from TMDS and slightly altered: I added a line so that the file which is opened is also read.
tmds_del_comp_list.txt contains the computer name to delete, e.g. computername.domain.domain.
creds.py contains the api_key.
The IP address and port have been changed for obvious reasons.
To confirm, the deepsecurity module has been installed in the same directory: the directories deepsecurity, deep_security_api.egg-info, build and __pycache__ are present.
# DEL operation, /api/computers/{computerID}
from __future__ import print_function
import sys, warnings
import deepsecurity
from deepsecurity.rest import ApiException
import creds
# Setup
if not sys.warnoptions:
    warnings.simplefilter("ignore")
configuration = deepsecurity.Configuration()
configuration.host = 'https://ipaddress:1234/api/computers/{computerID}'
# Authentication
configuration.api_key['api-secret-key'] = creds.api_key
# Initialization
# Set Any Required Values
# api_version = 'v4'
reed = open('tmds_del_comp_list.txt', mode='r' )
computer_id = reed.read()
api_instance = deepsecurity.ComputersApi(deepsecurity.ApiClient(configuration))
api_version = 'v1'
try:
    api_instance.delete_computer(computer_id, api_version)
except ApiException as e:
    print("An exception occurred when calling ComputersApi.delete_computer: %s\n" % e)
error received:
An exception occurred when calling ComputersApi.delete_computer: (400)
Reason:
HTTP response headers: HTTPHeaderDict({'Content-Type': 'text/html;charset=utf-8', 'Content-Language': 'en', 'Content-Length': '435', 'Date': 'Thu, 06 Oct 2022 09:24:22 GMT', 'Connection': 'close'})
HTTP response body: <!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1></body></html>
Before this error I received different errors, which I corrected. I had a .env file instead of creds.py.
I tested whether my tmds_del_comp_list.txt file was actually being read; it was not, which is why I added the line with the read function (when I did print(reed), nothing came up).
The api_version was wrong: from the documentation I understood that TMDS version 20 corresponds to version v4 of the API. After changing it to v1, that particular error disappeared. When double-checking the version in the browser at https://ipaddress:4119/rest/apiVersion I get 4, which baffles me a bit.
'https://ipaddress:1234/api/computers/{computerID}'
I find the URL weird. The {computerID} part is what I find strange, since it does not correspond to any variable. I do not see how it works together with the rest of the code, unless api_instance.delete_computer substitutes computer_id for {computerID}. There's no indication of whether what I think is correct or not.
api_instance.delete_computer(computer_id, api_version)
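Based on that guess, here is a minimal sketch of how I now suspect it is meant to be set up, on the assumption (which I could not confirm) that the generated client builds the /computers/{computerID} path itself from the computer_id argument, so that configuration.host should only be the base /api URL, and that the value read from the file needs its trailing newline stripped:
# Sketch only: configuration.host as the bare base URL, no path template.
# Assumes delete_computer() appends /computers/{computerID} from its argument.
import deepsecurity
from deepsecurity.rest import ApiException
import creds

configuration = deepsecurity.Configuration()
configuration.host = 'https://ipaddress:1234/api'   # base URL only (assumption)
configuration.api_key['api-secret-key'] = creds.api_key

with open('tmds_del_comp_list.txt') as f:
    computer_id = f.read().strip()                   # drop the trailing newline

api_instance = deepsecurity.ComputersApi(deepsecurity.ApiClient(configuration))
api_version = 'v1'
try:
    api_instance.delete_computer(computer_id, api_version)
except ApiException as e:
    print("An exception occurred when calling ComputersApi.delete_computer: %s\n" % e)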
Googling does not really bring any relevant information up.
I'm a beginner with Python, APIs and Deep Security.
Any leads, pointers to the obvious, and constructive help/comments are very welcome.
Edit1: Looking back at all the docs available, I see that computerID should be an integer, but in our organisation it is not a number or integer; it is a VM name plus domain name.
Maybe there's a number connected to every VM reporting to TMDS, so I tried to GET/list all computers to see what IDs they have.
I could not find an ID that is just a number, so this is probably not the issue. (A sketch of this lookup idea follows the parameter description below.)
Path parameters:
computerID (required): integer <int32> \d+
    The ID number of the computer to delete.
    Example: 1
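A sketch of the lookup idea above: list the computers, match the host name from my file against each entry, then delete by the numeric ID. This assumes the object returned by list_computers has a .computers list whose items expose host_name and id attributes (I have not verified the exact attribute names):
# Sketch: look up the numeric computer ID by host name, then delete by that ID.
# Assumes Computer objects expose .host_name and .id (names not verified).
hostname = computer_id  # the name read from tmds_del_comp_list.txt
try:
    computers = api_instance.list_computers(api_version)
    for comp in computers.computers:
        if comp.host_name == hostname:
            api_instance.delete_computer(comp.id, api_version)
            break
except ApiException as e:
    print("An exception occurred: %s\n" % e)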
Edit2: When clicking on the down arrow next to GET /computers in this link <https://automation.deepsecurity.trendmicro.com/article/12_0/api-reference/tag/Computers#operation/listComputers>
I get to see this link: https://automation.deepsecurity.trendmicro.com/#operation/searchComputers/computers, which I presume points to the correct endpoint and where #operation should be replaced by GET. When doing so I get a 404 response. I also got a 404 when changing the endpoint to /get/computers.
Conclusion: the endpoint is probably correct; when it is not, I do get an error that the URL is wrong.
error:
An exception occurred when calling ComputersApi.list_computers: (404)
Reason:
HTTP response headers: HTTPHeaderDict({'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1;mode=block', 'Set-Cookie': 'JSESSIONID=codewithlettersandnumbers; Path=/; Secure; HttpOnly; SameSite=Lax', 'Content-Type': 'text/html;charset=ISO-8859-1', 'Content-Length': '105', 'Date': 'Thu, 06 Oct 2022 14:56:58 GMT'})
HTTP response body:
<html>
<head>
<meta http-equiv="REFRESH" content="0;url=/SignIn.screen">
</head>
<body>
</body>
</html>
Edit3: Testing a simple GET /computers with Postman gave me the clue that the actual key was wrong. I corrected that and got a 200 response, so the key was indeed wrong. I corrected it in my Python script too, but I still get the same 400 error.

Related

How to fetch only parts of json file in python3 requests module

So, I am writing a program in Python to fetch data from the Google Classroom API using the requests module. I am getting the full JSON response from the classroom as follows:
{'announcements': [{'courseId': '#############', 'id': '###########', 'text': 'This is a test','state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/##########/p/###########', 'creationTime': '2021-04-11T10:25:54.135Z', 'updateTime': '2021-04-11T10:25:53.029Z', 'creatorUserId': '###############'}, {'courseId': '############', 'id': '#############', 'text': 'Hello everyone', 'state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/#############/p/##################', 'creationTime': '2021-04-11T10:24:30.952Z', 'updateTime': '2021-04-11T10:24:48.880Z', 'creatorUserId': '##############'}, {'courseId': '##################', 'id': '############', 'text': 'Hello everyone', 'state': 'PUBLISHED', 'alternateLink': 'https://classroom.google.com/c/##############/p/################', 'creationTime': '2021-04-11T10:23:42.977Z', 'updateTime': '2021-04-11T10:23:42.920Z', 'creatorUserId': '##############'}]}
I was unable to convert this into a pretty format, so I am pasting it just as I got it from the HTTP request. What I actually want is to request only the first few announcements (say 1, 2, 3, whatever, depending on the requirement) from the service, while what I'm getting is all the announcements (as in the sample, 3 announcements) made ever since the classroom was created. I believe that fetching all the announcements might make the program slower, so I would prefer to get only the required ones. Is there any way to do this by passing some arguments or anything? There are a few direct functions provided by Google Classroom, but I came across those a little later and have already written everything using the requests module; switching would require changing a lot of things, which I would like to avoid. However, if it's unavoidable, I would go that route as well.
Answer:
Use the pageSize field to limit the number of responses you want in the announcements.list request, together with an orderBy parameter of updateTime asc.
More Information:
As per the documentation:
orderBy: string
Optional sort ordering for results. A comma-separated list of fields with an optional sort direction keyword. Supported field is updateTime. Supported direction keywords are asc and desc. If not specified, updateTime desc is the default behavior. Examples: updateTime asc, updateTime
and:
pageSize: integer
Maximum number of items to return. Zero or unspecified indicates that the server may assign a maximum.
So, let's say you want the first 3 announcements for a course, you would use a pageSize of 3, and an orderBy of updateTime asc:
# Copyright 2021 Google LLC.
# SPDX-License-Identifier: Apache-2.0
from googleapiclient.discovery import build  # google-api-python-client

# creds: credentials from your OAuth flow; course_id: the course to query
service = build('classroom', 'v1', credentials=creds)
asc = "updateTime asc"
pageSize = 3

# Call the Classroom API, limiting and ordering the announcements returned
results = service.courses().announcements().list(
    courseId=course_id, pageSize=pageSize, orderBy=asc).execute()
or an HTTP request example:
GET https://classroom.googleapis.com/v1/courses/[COURSE_ID]/announcements
?orderBy=updateTime%20asc
&pageSize=2
&key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
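If you want to stay with the requests module (as in the question), the same call can be made directly against that REST endpoint. A minimal sketch, assuming you already have an OAuth access token and the course ID (both placeholders here):
import requests

course_id = "COURSE_ID"        # placeholder: the course whose announcements you want
access_token = "ACCESS_TOKEN"  # placeholder: token from your existing OAuth flow

resp = requests.get(
    "https://classroom.googleapis.com/v1/courses/{}/announcements".format(course_id),
    params={"orderBy": "updateTime asc", "pageSize": 3},
    headers={"Authorization": "Bearer " + access_token,
             "Accept": "application/json"},
)
# Only the first three announcements (oldest first) are returned
announcements = resp.json().get("announcements", [])
for a in announcements:
    print(a["id"], a["text"])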
References:
Method: announcements.list | Classroom API | Google Developers

find_all() in BeautifulSoup returns empty ResultSet

I am trying to scrape data from a website to practice web scraping, but find_all() returns an empty ResultSet. How can I resolve this issue?
#importing required modules
import requests,bs4
#sending request to the server
req = requests.get("https://www.udemy.com/courses/search/?q=python")
# checking the status on the request
print(req.status_code)
req.raise_for_status()
#converting using BeautifulSoup
soup = bs4.BeautifulSoup(req.text,'html.parser')
#Trying to scrape the particular div with the class but returning 0
container = soup.find_all('div',class_='popover--popover--t3rNO popover--popover-hover--14ngr')
#trying to print the number of container returned.
print(len(container))
Output :
200
0
See my comment about the content being entirely JavaScript-driven. Modern websites often use JavaScript to make HTTP requests to the server and grab data on demand. If you disable JavaScript (which you can easily do in Chrome's settings when you inspect the page), you will see that NO text is available on this website, which is probably quite different from IMDb as you pointed out. If you check the BeautifulSoup-parsed HTML, you'll see that none of the page content derived with JavaScript is in the page source.
There are two ways to get data from a JavaScript-rendered website:
Mimic the HTTP request to the server
Use a browser automation package like selenium
The first option is better and more efficient; the second is more brittle and not great for larger data sets.
Fortunately, Udemy gets the data you want from an API endpoint: its JavaScript makes HTTP requests to that endpoint and the response gets fed back to the browser.
Code Example
import requests
cookies = {
'__udmy_2_v57r': '4f711b308da548b49394854a189d3179',
'ud_firstvisit': '2020-05-29T13:48:56.584511+00:00:1jefNY:9F1BJVEUJpv7gmNPgYNini76UaE',
'existing_user': 'true',
'optimizelyEndUserId': 'oeu1590760136407r0.2130390415126655',
'EUCookieMessageShown': 'true',
'_ga': 'GA1.2.1359933509.1590760142',
'_pxvid': '26d89ed1-a1b3-11ea-9179-cb750fa4136b',
'_ym_uid': '1585144165890161851',
'_ym_d': '1590760145',
'__ssid': 'd191bc02a1063fd2c75fbab525ededc',
'stc111655': 'env:1592304425%7C20200717104705%7C20200616111705%7C1%7C1014616:20210616104705|uid:1590760145861.374775813.04725504.111655.1839745362:20210616104705|srchist:1069270%3A1%3A20200629134905%7C1014624%3A1592252104%3A20200716201504%7C1014616%3A1592304425%3A20200717104705:20210616104705|tsa:0:20200616111705',
'ki_t': '1590760146239%3B1592304425954%3B1592304425954%3B3%3B5',
'ki_r': 'aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8%3D',
'IR_PI': '00aea1e6-9da9-11ea-af3a-42010a24660a%7C1592390825988',
'_gac_UA-12366301-1': '1.1592304441.CjwKCAjw26H3BRB2EiwAy32zhfcltNEr_HHFK5JRaJar5qxUn4ifG9FVFctWyTUXigNZvKeOCz7PgxoCAfAQAvD_BwE',
'csrftoken': 'pPOdtdbH0HPaHvDfAZMzEOdvWqKZuQWufu8dUrEeXuy5mOOrnFRbWZ9vq8Dfd2ts',
'__cfruid': 'f1963d736e3891a2e307ebc9f918c89065ffe40f-1596962093',
'__cfduid': 'df4d951c87bc195c73b2f12b5e29568381597085850',
'ud_cache_price_country': 'GB',
'ud_cache_device': 'desktop',
'ud_cache_language': 'en',
'ud_cache_logged_in': '0',
'ud_cache_release': '0804b40d37e001f97dfa',
'ud_cache_modern_browser': '1',
'ud_cache_marketplace_country': 'GB',
'ud_cache_brand': 'GBen_US',
'ud_cache_version': '1',
'ud_cache_user': '',
'seen': '1',
'eventing_session_id': '66otW5O9TQWd5BYq1_etrA-1597087737933',
'ud_cache_campaign_code': '',
'exaff': '%7B%22start_date%22%3A%222020-08-09T08%3A52%3A04.083577Z%22%2C%22code%22%3A%22_7fFXpljNdk-m3_OJPaWBwAQc5gVKutaSg%22%2C%22merchant_id%22%3A39197%2C%22aff_type%22%3A%22LS%22%2C%22aff_id%22%3A60680%7D:1k5D3W:2PemPLTm4xaHixBYRvRyBaAukL4',
'evi': 'SlFfLh4RBzwTSVBjXFdHehNJUGMYQE99HVFdIExYQ3gARVY8QkAWIEEDCXsVQEd0BEsJexVAA24LQgdjGANXdgZBG3ETH1luRBdHKBoHV3ZKURl5XVBXdkpRXWNUU1luRxIJe1lTQXhMDgdjHRAFbgsICXNWVk1uCwgJN0xYRGATBUpjVFVEdAEOB2NcWkR+E0lQYxhAT30dUV0gTFhCfAhDVm1MUEJ0B1EROkwUV3YAXwk3D0BPewFAHzxCQEd0BUcJexVAA24LQgdjGANXdgZCHHETTld+BkUdY1QZVzoTSRptTBQUbgtFEnleHwhgEwBcY1QZV34HShtjVBlXOhNJE21MFBRuC0UceV4fWW4DSxh3TFgObkdREXBCQAMtE0kccFtUCGATQR54VkBPNxMFCXtfTlc6UFERd1tUTTEdURlzX1JXdkpRXWNUU1luRxIJe1tXQnpMXwlzVldDbgsICTdMWEdgEwVKY1RVRHUJDgdjXFdCdBNJUGMYQE99HVFdIExYQ3kCQ1Y8Ew==',
'ud_rule_vars': 'eJyFjkuOwyAQBa9isZ04agyYz1ksIYxxjOIRGmhPFlHuHvKVRrPItvWqus4EXT4EDJP9jSViyobPktKRgZqc4GrkmmmuBHdU6YlRqY1P6RgDMQ05D2SOueCDtZPDMNT7QDrooAXRdrqhzHBlRL8XUjPgXwAGYCC7ulpdRX3acglPA8bvPwbVgm6g4p0Bvqeyhsh_BkybXyxmN8_R21J9vvpcjm5cn7ZDTidc7G2xxnvlm87hZwvlU7wE2VP1en0hlyuoG10j:1k5D3W:nxRv-tyLU7lxhsF2jRYvkJA53uM',
}
headers = {
'authority': 'www.udemy.com',
'x-udemy-cache-release': '0804b40d37e001f97dfa',
'x-udemy-cache-language': 'en',
'x-udemy-cache-user': '',
'x-udemy-cache-modern-browser': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
'accept': 'application/json, text/plain, */*',
'x-udemy-cache-brand': 'GBen_US',
'x-udemy-cache-version': '1',
'x-requested-with': 'XMLHttpRequest',
'x-udemy-cache-logged-in': '0',
'x-udemy-cache-price-country': 'GB',
'x-udemy-cache-device': 'desktop',
'x-udemy-cache-marketplace-country': 'GB',
'x-udemy-cache-campaign-code': '',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.udemy.com/courses/search/?q=python',
'accept-language': 'en-US,en;q=0.9',
}
params = (
('q', 'python'),
('skip_price', 'false'),
)
response = requests.get('https://www.udemy.com/api-2.0/search-courses/', headers=headers, params=params, cookies=cookies)
ids = []
titles = []
durations = []
ratings = []
for a in response.json()['courses']:
    title = a['title']
    duration = int(a['estimated_content_length']) / 60
    rating = a['rating']
    id = str(a['id'])
    titles.append(title)
    ids.append(id)
    durations.append(duration)
    ratings.append(rating)
clean_ids = ','.join(ids)
params2 = (
('course_ids', clean_ids),
('fields/[pricing_result/]', 'price,discount_price,list_price,price_detail,price_serve_tracking_id'),
)
response = requests.get('https://www.udemy.com/api-2.0/pricing/', params=params2)
data = response.json()['courses']
prices = []
for a in ids:
    price = response.json()['courses'][a]['price']['amount']
    prices.append(price)

data = zip(titles, durations, ratings, prices)
for a in data:
    print(a)
Output
('Learn Python Programming Masterclass', 56.53333333333333, 4.54487, 14.99)
('The Python Mega Course: Build 10 Real World Applications', 25.3, 4.51476, 16.99)
('Python for Beginners: Learn Python Programming (Python 3)', 2.8833333333333333, 4.4391, 17.99)
('The Python Bible™ | Everything You Need to Program in Python', 9.15, 4.64238, 17.99)
('Python for Absolute Beginners', 3.066666666666667, 4.42209, 14.99)
('The Modern Python 3 Bootcamp', 30.3, 4.64714, 16.99)
('Python for Finance: Investment Fundamentals & Data Analytics', 8.25, 4.52908, 12.99)
('The Complete Python Course | Learn Python by Doing', 35.31666666666667, 4.58885, 17.99)
('REST APIs with Flask and Python', 17.033333333333335, 4.61233, 12.99)
('Python for Financial Analysis and Algorithmic Trading', 16.916666666666668, 4.53173, 12.99)
('Python for Beginners with Examples', 4.25, 4.27316, 12.99)
('Python OOP : Four Pillars of OOP in Python 3 for Beginners', 2.6166666666666667, 4.46451, 12.99)
('Python Bootcamp 2020 Build 15 working Applications and Games', 32.13333333333333, 4.2519, 14.99)
('The Complete Python Masterclass: Learn Python From Scratch', 32.36666666666667, 4.39151, 16.99)
('Learn Python MADE EASY : A Concise Python Course in Python 3', 2.1166666666666667, 4.76601, 12.99)
('Complete Python Web Course: Build 8 Python Web Apps', 15.65, 4.37577, 13.99)
('Python for Excel: Use xlwings for Data Science and Finance', 16.116666666666667, 4.92293, 12.99)
('Python 3 Network Programming - Build 5 Network Applications', 12.216666666666667, 4.66143, 12.99)
('The Complete Python & PostgreSQL Developer Course', 21.833333333333332, 4.5664, 12.99)
('The Complete Python Programmer Bootcamp 2020', 13.233333333333333, 4.63859, 12.99)
Explanation
There are two ways to do this; shown here is re-engineering the requests, which is the more efficient solution. To get the necessary information, you'll need to inspect the page and look at which HTTP requests give which information. You can do this through the network tools --> XHR when you inspect the page. You can see there are two requests that give you information. My suggestion would be to look at the previews of the responses on the right-hand side when you select a request. The first gives you the title, duration, price and ratings; for the second request you need the IDs of the courses to get their prices.
I usually copy the cURL of the HTTP requests the JavaScript invokes into curl.trillworks.com, which converts the necessary headers, parameters and cookies to Python format.
The first request requires headers, cookies and parameters. The second request only requires the parameters.
The response you get is a JSON object; response.json() converts this into a Python dictionary. You have to do a bit of digging in this dictionary to get what you want, but for each item in response.json()['courses'] all the necessary data for each 'card' on the website is there. So we do a for loop around where the data sits in the dictionary we've created. I would play around with response.json() until you get a feel for what the object gives you, to understand the code.
The duration comes in minutes, so I've done a quick conversion to hours here. Also the IDs need to be strings, because in the second request we use them as parameters to get the necessary prices for the courses. We join the IDs into one string and feed that as a parameter.
The second request then gives us the necessary prices; again you have to go digging in the dictionary object, and I suggest you do this yourself to confirm that the price is nested in there.
We zip up the data to combine all the lists, and then a for loop prints it all. You could feed this into pandas if you wanted, as sketched below.
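A minimal sketch of feeding the collected lists into pandas (assuming pandas is installed; the column names are my own):
import pandas as pd

# Build a DataFrame from the lists collected in the loop above
df = pd.DataFrame({
    'title': titles,
    'duration_hours': durations,
    'rating': ratings,
    'price': prices,
})
print(df.head())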
To get the required data you need to send requests to the appropriate API. For that you need to create a Session:
import requests
s = requests.Session()
cookies = s.get('https://www.udemy.com').cookies
headers={"Referer": "https://www.udemy.com/courses/search/?q=python&skip_price=false"}
for page_counter in range(1, 500):
    data = s.get('https://www.udemy.com/api-2.0/search-courses/?p={}&q=python&skip_price=false'.format(page_counter), cookies=cookies, headers=headers).json()
    for course in data['courses']:
        params = {'course_ids': [str(course['id']), ],
                  'fields/[pricing_result/]': ['price', ]}
        title = course['title']
        price = s.get('https://www.udemy.com/api-2.0/pricing/', params=params, cookies=cookies).json()['courses'][str(course['id'])]['price']['amount']
        print({'title': title, 'price': price})

How do I declare and use a variable in the yaml file that is formatted for pyresttest?

So, a brief description of what I want, what my issue is, and what I have tried.
I want to declare and use a dictionary variable for my tests in pyresttest, specifically for the [url, body] sections, so that I can run my POST tests against a specific endpoint with a preformatted body.
Here is how mytest.yml is structured:
- data:
    - id: 63
    - rate: 25
    # ... a sizable set of fields, for reasons ...
    - type: lab_test_authorization
    - modified_at: ansible_date_time.datetime  # Useful way to generate
- test:
    - url: "some-valid-url/{the_url_question}"  # data['special_key']
    - method: 'POST'
    - headers: {etc..etc}
    - body: '{ "data": ${the_body_question} }'  # data (the content)
Now the problem starts with my lack of understanding of why (if it's true) pyresttest does not support dictionary mappings. I understand YAML supports this feature, but I am not sure whether pyresttest can parse it. Knowing how to reference and use a dictionary variable in my url and body tags would be significantly helpful.
As of right now, if I try to convert my data sequence into a data dictionary, I get an error stating:
yaml.parser.ParserError: while parsing a block mapping
  in "<unicode string>", line 4, column 1:
    data:
    ^
expected <block end>, but found '-'
  in "<unicode string>", line 36, column 1:
    - config:
I'm pretty sure there are gaps in my knowledge regarding how yaml and pyresttest interact with each other, so any insight would be greatly appreciated.

Can't call curl from python3

I am trying to call this curl command from python3. From bash, it works fine:
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1103/PhysRevLett.117.126802
yielding the expected result:
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J. K. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H. W.}, year={2016}, month={Sep}}
In python3, I am doing:
import subprocess
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
try:
    subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
except ExplicitException:
    print("DOI is not available")
    self.Messages.on_warn_clicked("DOI is not given",
                                  "Search google instead")
which gives this error:
<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>
What's going wrong here?
You have 3 problems here:
Don't quote your arguments in subprocess; it already does that for you when necessary, since you pass the arguments and not the unsplit command line (good practice, keep it up, but drop the unnecessary quoting).
Then, subprocess.call does not let you parse/store the output in Python, which is problematic for number 3:
And last: your site randomly answers with rubbish HTML (a Java stacktrace). This explains why you're getting different output in Python, but you can get it in bash as well.
Problem #1
subprocess.call(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi])
should be
subprocess.call(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi])
Else, quotes are applied twice and your Accept: xxx argument has quotes around it, which is unexpected by curl
demo of the non-working quote part:
import subprocess,os
doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
#### this is wrong because of the quoting ####
p = subprocess.Popen(["curl", "-LH", '"Accept: text/bibliography; style=bibtex"', doi],stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
[output,error] = p.communicate()
print(output)
result:
b' some stats then ... <html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n\r\n'
Problems #2 and #3
I have implemented a retry mechanism which parses the output and retries until correct output is found:
import subprocess, os, sys

doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"

while True:
    p = subprocess.Popen(["curl", "-LH", 'Accept: text/bibliography; style=bibtex', doi], stdout=subprocess.PIPE)
    [output, error] = p.communicate()
    output = output.decode("latin-1")
    if "java.util.concurrent.FutureTask.run" in output:
        # site crashed when responding: junk HTML output: retry
        sys.stderr.write("Wrong answer: retrying\n")
    else:
        print(output)
        break
result:
Wrong answer: retrying <==== here the site threw a big HTML exception output
#article{Chang_2016, title={Observation of the Quantum Anomalous Hall Insulator to Anderson Insulator Quantum Phase Transition and its Scaling Behavior}, volume={117}, ISSN={1079-7114}, url={http://dx.doi.org/10.1103/PhysRevLett.117.126802}, DOI={10.1103/physrevlett.117.126802}, number={12}, journal={Physical Review Letters}, publisher={American Physical Society (APS)}, author={Chang, Cui-Zu and Zhao, Weiwei and Li, Jian and Jain, J.âK. and Liu, Chaoxing and Moodera, Jagadeesh S. and Chan, Moses H.âW.}, year={2016}, month={Sep}}
So it works; it's just a site problem, but with my Python wrapper you are able to re-submit the request until it yields the proper answer.
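As an aside (not part of the answer above, which keeps curl), the same request can be made without shelling out to curl at all by using the requests library, which follows redirects by default (the equivalent of -L) and lets you set the Accept header directly:
import requests

doi = "http://dx.doi.org/10.1103/PhysRevLett.117.126802"
headers = {"Accept": "text/bibliography; style=bibtex"}

# requests follows redirects by default, like curl -L
resp = requests.get(doi, headers=headers)
print(resp.text)
You would still want the same retry check, since the site occasionally answers with the junk HTML regardless of the client used.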

How to search tweets from an id to another id

I'm trying to get tweets using TwitterSearch in Python3.
So basically I want to get all tweets between these 2 IDs.
748843914254249984 ->760065085616250880
These 2 IDs are from the
Fri Jul 01 11:41:16 +0000 2016 to Mon Aug 01 10:50:12 +0000 2016
So here's the code I made.
crawl.py
#!/usr/bin/python3
# coding: utf-8
from TwitterSearch import *
import datetime
def crawl():
    try:
        tso = TwitterSearchOrder()
        tso.set_keywords(["keyword"])
        tso.set_since_id(748843914254249984)
        tso.set_max_id(760065085616250880)

        ACCESS_TOKEN = xxx
        ACCESS_SECRET = xxx
        CONSUMER_KEY = xxx
        CONSUMER_SECRET = xxx

        ts = TwitterSearch(
            consumer_key=CONSUMER_KEY,
            consumer_secret=CONSUMER_SECRET,
            access_token=ACCESS_TOKEN,
            access_token_secret=ACCESS_SECRET
        )

        for tweet in ts.search_tweets_iterable(tso):
            print(tweet['id_str'], '-', tweet['created_at'])

    except TwitterSearchException as e:
        print(e)

if __name__ == '__main__':
    crawl()
I'm not very familiar with the Twitter API and searching with it, but this code should do the job.
However, it's giving:
760058064816988160 - Mon Aug 01 10:22:18 +0000 2016
[...]
760065085616250880 - Mon Aug 01 10:50:12 +0000 2016
many, many times... I get the same lines over and over again instead of getting everything between my two IDs.
So I'm not getting any of the July tweets; any idea why?
TL;DR
Remove the tso.set_max_id(760065085616250880) line.
Explanation (as far as I understand it)
I have found your problem in the TwitterSearch Docs:
"The only parameter with a default value is count with 100. This is because it is the maximum of tweets returned by this very Twitter API endpoint."
If I check this in your code by creating a search URL, I get:
tso.create_search_url()
#?q=Vuitton&since_id=748843914254249984&count=100&max_id=760065085616250880
which contains count=100 (meaning it will only get the first page of 100 tweets). In contrast, removing both set_since_id and set_max_id also gives count=100 but retrieves many more tweets; with max_id set, it stops at 100 tweets.
set_since_id without set_max_id works, but the other way around doesn't. So removing max_id=760065085616250880 from the search URL gives the results you want.
If anyone can explain why set_max_id is not working along, please edit my answer.
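If you still need the upper bound, one workaround (a sketch, not something I found in the TwitterSearch docs) is to keep set_since_id, drop set_max_id, and enforce the cut-off client-side inside the loop from the question, reusing the ts object built there:
# Sketch: keep set_since_id, drop set_max_id, filter out too-new tweets ourselves
tso = TwitterSearchOrder()
tso.set_keywords(["keyword"])
tso.set_since_id(748843914254249984)

for tweet in ts.search_tweets_iterable(tso):
    if int(tweet['id_str']) > 760065085616250880:
        continue  # skip tweets newer than the original max_id
    print(tweet['id_str'], '-', tweet['created_at'])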
