WebDriverException: Message: unknown error: bad inspector message error while printing HTML content using ChromeDriver Chrome through Selenium Python - python-3.x

I am scraping some HTML content...
for i, c in enumerate(cards[75:77]):
    print(i)
    a = c.find_element_by_class_name("influencer-stagename")
    print(a.get_attribute('innerHTML'))
Works fine for all records except the 76th one. Output before error...
0
b'<a class="influencer-analytics-link" href="/influencers/sophiewilling"><h5><span>SOPHIE WILLING</span></h5></a>'
1
b'<a class="influencer-analytics-link" href="/influencers/ferntaylorr"><h5><span>Fern Taylor.</span></h5></a>'
2
b'<a class="influencer-analytics-link" href="/influencers/officialshaniceslatter"><h5><span>Shanice Slatter</span></h5></a>'
3
Stacktrace...
---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
<ipython-input-484-0a80d1af1568> in <module>
3 #print(c.find_element_by_class_name("influencer-stagename").text)
4 a = c.find_element_by_class_name("influencer-stagename")
----> 5 print(a.get_attribute('innerHTML').encode('ascii', 'ignore'))
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in get_attribute(self, name)
141 self, name)
142 else:
--> 143 resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
144 attributeValue = resp.get('value')
145 if attributeValue is not None:
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
WebDriverException: Message: unknown error: bad inspector message: {"id":110297,"result":{"result":{"type":"object","value":{"status":0,"value":"<a class=\"influencer-analytics-link\" href=\"/influencers/bookishemily\"><h5><span>Emily | 18 | GB | Student\uD83C...</span></h5></a>"}}}} (Session info: chrome=75.0.3770.100) (Driver info: chromedriver=2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363),platform=Mac OS X 10.14.0 x86_64)
I suspect it is an invalid character in
value":"Emily | 18 | GB | Student\uD83C..."
Specifically I suspect "\uD83C"
Adding
.encode("utf-8") OR .encode('ascii', 'ignore')
to the second print statement changes nothing.
Any thoughts on how to solve this??
UPDATE: The problem is with emoji characters. I have found 3 examples so far, and each has an emoji (pink flower 🌸, Russian flag 🇷🇺 and swirling leaves 🍃). If I edit them out with the Chrome inspector my code runs fine, but this is not a solution that works at scale.

This error message...
WebDriverException: Message: unknown error: bad inspector message: {"id":110297,"result":{"result":{"type":"object","value":{"status":0,"value":"<a class=\"influencer-analytics-link\" href=\"/influencers/bookishemily\"><h5><span>Emily | 18 | GB | Student\uD83C...</span></h5></a>"}}}} (Session info: chrome=75.0.3770.100) (Driver info: chromedriver=2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363),platform=Mac OS X 10.14.0 x86_64)
...implies that ChromeDriver was unable to parse the character data returned by DevTools (here the emoji escape \uD83C...) because of a JSON encoding/decoding issue.
Deep Dive
As per the discussion in Issue 723592: 'Bad inspector message' errors when running URL web-platform-tests via webdriver, John Chen (Owner - WebDriver for Google Chrome) mentioned in his comment:
A JSON encoding/decoding issue caused the "Bad inspector message" error reported at https://travis-ci.org/w3c/web-platform-tests/jobs/232845351. Part of the error message from part 1 contains an invalid Unicode character \uFDD0 (from https://github.com/w3c/web-platform-tests/blob/34435a4/url/urltestdata.json#L3596). The JSON encoder inside Chrome didn't detect such error, and passed it through in the JSON blob sent to ChromeDriver. ChromeDriver uses base/json/json_parser.cc to parse the JSON string. This parser does a more thorough error detection, notices that \uFDD0 is an invalid character, and reports an error. I think our JSON encoder and decoder should have exactly the same amount of error checking. It's problematic that the encoder can create a blob that is rejected by the decoder.
Analysis
John Chen (Owner - WebDriver for Google Chrome) further added:
The JSON encoding happens in protocol layout of DevTools, just before the result is sent back to ChromeDriver. The relevant code is in https://cs.chromium.org/chromium/src/out/Debug/gen/v8/src/inspector/protocol/Protocol.cpp. In particular, escapeStringForJSON function is responsible for encoding strings. It's actually quite conservative. Anything above 126 is encoded in \uXXXX format. (Note that Protocol.cpp is a generated file. The real source is https://cs.chromium.org/chromium/src/v8/third_party/inspector_protocol/lib/Values_cpp.template.)
The error occurs in the JSON parser used by ChromeDriver. The decoding of \uXXXX sequence happens at https://cs.chromium.org/chromium/src/base/json/json_parser.cc?l=564 and https://cs.chromium.org/chromium/src/base/json/json_parser.cc?l=670. After decoding an escape sequence, the decoder rejects anything that's not a valid Unicode character.
I noticed that there was a recent change to prevent a JSON encoder from emitting invalid Unicode code point: https://crrev.com/478900. Unfortunately it's not the JSON encoder used by the code involved in this bug, so it doesn't help us directly, but it's an indication that we're not the only ones affected by this type of issue.
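To see the underlying problem in isolation: \uD83C is only the first half of a UTF-16 surrogate pair (emoji such as 🌸 occupy two \uXXXX escapes), and a lone surrogate is not a valid Unicode character on its own. A minimal Python sketch (not part of the original discussion) illustrating why a strict encoder/decoder rejects it:
import json

# A lone high surrogate is not a valid Unicode scalar value, so a strict
# UTF-8 encoder refuses to serialize it:
try:
    "\ud83c".encode("utf-8")
except UnicodeEncodeError as err:
    print(err)  # "'utf-8' codec can't encode character '\ud83c' ... surrogates not allowed"

# A properly paired surrogate escape sequence decodes to the emoji as expected:
print(json.loads('"\\ud83c\\udf38"'))  # prints the cherry blossom emoji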
Solution
This issue was addressed by replacing invalid UTF-16 escape sequences when decoding invalid UTF strings in ChromeDriver (web-platform-tests may use ECMAScript strings which aren't necessarily valid UTF-16) through this revision / commit.
So a quick solution would be to ensure the following and re-execute your tests:
Selenium is upgraded to the current level, Version 3.141.59.
ChromeDriver is updated to the current ChromeDriver v79.0.3945.36 level.
Chrome is updated to the current Chrome Version 79.0 level (as per the ChromeDriver v79.0 release notes).
Alternative
As an alternative, you can use the GeckoDriver / Firefox combination; you can find a relevant discussion in Chromedriver only supports characters in the BMP error while sending Emoji with ChromeDriver Chrome using Selenium Python to Tkinter's label() textbox.
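If upgrading Chrome / ChromeDriver is not immediately possible, one stop-gap is to have the browser strip surrogate characters (emoji and other non-BMP characters) before the value is serialized back to ChromeDriver. This is only a sketch, assuming driver is the active WebDriver instance and reusing the locator from the question:
# Remove UTF-16 surrogates inside the browser so ChromeDriver's JSON parser
# never sees them; emoji are lost, but the remaining text comes through.
a = c.find_element_by_class_name("influencer-stagename")
html = driver.execute_script(
    "return arguments[0].innerHTML.replace(/[\\uD800-\\uDFFF]/g, '');", a
)
print(html)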

Related

My ElasticsearchDocumentStore() is not initialized

from haystack.document_stores import ElasticsearchDocumentStore
doc_store= ElasticsearchDocumentStore(host='localhost', username='', password='', index='document')
I'm trying to build a Question Answering system and I'm using the Haystack document store for indexing the documents, but I'm not able to initialize the ElasticsearchDocumentStore module from haystack. While trying to create a new index in the Elasticsearch document store, it throws an error saying: "Mapping definition for [embedding] has unsupported parameters"
WARNING - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.008s]
---------------------------------------------------------------------------
RequestError Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\elasticsearch\connection\base.py:315, in Connection._raise_error(self, status_code, raw_data)
312 except (ValueError, TypeError) as err:
313 logger.warning("Undecodable raw error response from server: %s", err)
--> 315 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
316 status_code, error_message, additional_info
317 )
RequestError: RequestError(400, 'mapper_parsing_exception', 'Mapping definition for [embedding] has unsupported parameters: [dims : 768]')
Dense vector support was introduced to Elasticsearch in version 7.0 as a preview and in version 7.3 as a released feature (see https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch). You'd need to upgrade your Elasticsearch instance in order to use it with Haystack if you're on an older version.
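If you are not sure which Elasticsearch version your document store is talking to, a quick check with the elasticsearch Python client (a sketch; the host and port are taken from the question):
from elasticsearch import Elasticsearch

# Haystack's dense_vector mapping needs Elasticsearch >= 7.3 (preview in 7.0).
es = Elasticsearch("http://localhost:9200")
print(es.info()["version"]["number"])  # e.g. '7.9.2' is fine, '6.8.0' is not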

ClientError: An error occurred (InvalidTextEncoding) when calling the SelectObjectContent operation: UTF-8 encoding is required. reading gzip file

I am getting the above error in my code. I believe encoding=latin-1 needs to be included as a parameter somewhere in select_object_content, but since I am new to this, I am not sure where to add it.
Can anyone help me with this?
Code:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name
)
resp = client.select_object_content(
    Bucket='mybucket',
    Key='path_to_file/file_name.gz',
    ExpressionType='SQL',
    Expression=query,
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': compressionType},
    OutputSerialization={'CSV': {}},
)
Traceback:
ClientError Traceback (most recent call last)
C:\path/3649752754.py in <module>
78 Expression=SQL,
79 InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}, 'CompressionType': compression},
---> 80 OutputSerialization = {'CSV': {}},
81 )
82
ClientError: An error occurred (InvalidTextEncoding) when calling the SelectObjectContent operation: UTF-8 encoding is required. The text encoding error was found near byte 90,112.
You need to save your CSV file with UTF-8 encoding, for example with Notepad++, or in Excel via Save As and selecting "CSV UTF-8" from the file-type dropdown.
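If you would rather fix the file programmatically before uploading it to S3, here is a sketch that re-encodes a gzip-compressed Latin-1 CSV as UTF-8 (the file names are placeholders, and the source encoding is assumed to be Latin-1 as suggested in the question):
import gzip

# Re-encode the compressed CSV so S3 Select's UTF-8 requirement is satisfied.
with gzip.open("file_name.gz", "rt", encoding="latin-1") as src, \
        gzip.open("file_name_utf8.gz", "wt", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)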

Using Python-pptx, what conditions could a PowerPoint have that give KeyError?

I have a PowerPoint that I would like to open, amend, and save as a different filename. However, I'm getting a KeyError.
I tried this code with a blank PowerPoint presentation and it works perfectly. However, when I use the code to open an existing PowerPoint presentation and try to run the same code, I get a KeyError.
KeyError: "There is no item named 'ppt/slides/NULL' in the archive"
# Replace source text
import re
from pptx import Presentation

#s = "string. With. Punctuation?"
#s = re.sub(r'[^\w\s]','',s)
search_str = '{{{FILTER}}}'
repl_str = re.sub(r'[^\w\s]', '', str(list(dashboard_filter2.values())))
ppt = Presentation('HispPres1.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            shape.text = shape.text.replace(search_str, repl_str)
ppt.save('HispPresSourceUpdate.pptx')
I expect to have the existing PowerPoint amended by finding all the instances of {{{FILTER}}} and replacing them with the value listed. However, it looks like there's a problem with using my existing PowerPoint presentation; I don't have this issue with a blank presentation.
So, I'm wondering what would cause an existing PowerPoint presentation to raise this error? I plan on making several "templates" to start with and really need to know if there are any hard-and-fast rules to adhere to.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-42-41deffabe2f9> in <module>()
7 search_str = '{{{FILTER}}}'
8 repl_str = re.sub(r'[^\w\s]','',(str(list(dashboard_filter2.values()))))
----> 9 ppt = Presentation('HispPres1.pptx')
10
11 for slide in ppt.slides:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\api.py in Presentation(pptx)
28 pptx = _default_pptx_path()
29
---> 30 presentation_part = Package.open(pptx).main_document_part
31
32 if not _is_pptx_package(presentation_part):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\package.py in open(cls, pkg_file)
120 *pkg_file*.
121 """
--> 122 pkg_reader = PackageReader.from_file(pkg_file)
123 package = cls()
124 Unmarshaller.unmarshal(pkg_reader, package, PartFactory)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in from_file(pkg_file)
34 pkg_srels = PackageReader._srels_for(phys_reader, PACKAGE_URI)
35 sparts = PackageReader._load_serialized_parts(
---> 36 phys_reader, pkg_srels, content_types
37 )
38 phys_reader.close()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _load_serialized_parts(phys_reader, pkg_srels, content_types)
67 sparts = []
68 part_walker = PackageReader._walk_phys_parts(phys_reader, pkg_srels)
---> 69 for partname, blob, srels in part_walker:
70 content_type = content_types[partname]
71 spart = _SerializedPart(partname, content_type, blob, srels)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
--> 104 phys_reader, part_srels, visited_partnames):
105 yield (partname, blob, srels)
106
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
--> 104 phys_reader, part_srels, visited_partnames):
105 yield (partname, blob, srels)
106
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
99 visited_partnames.append(partname)
100 part_srels = PackageReader._srels_for(phys_reader, partname)
--> 101 blob = phys_reader.blob_for(partname)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\phys_pkg.py in blob_for(self, pack_uri)
107 matching member is present in zip archive.
108 """
--> 109 return self._zipf.read(pack_uri.membername)
110
111 def close(self):
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in read(self, name, pwd)
1312 def read(self, name, pwd=None):
1313 """Return file bytes (as a string) for name."""
-> 1314 with self.open(name, "r", pwd) as fp:
1315 return fp.read()
1316
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in open(self, name, mode, pwd, force_zip64)
1350 else:
1351 # Get info object for name
-> 1352 zinfo = self.getinfo(name)
1353
1354 if mode == 'w':
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in getinfo(self, name)
1279 if info is None:
1280 raise KeyError(
-> 1281 'There is no item named %r in the archive' % name)
1282
1283 return info
KeyError: "There is no item named 'ppt/slides/NULL' in the archive"
Yeah, this is a bit of a thorny problem. The spec doesn't provide for a "broken" relationship (one that refers to a package-part that doesn't exist), but at least one library (Java-based if I recall correctly) does not clean up relationships properly in some cases, perhaps a slide delete operation in this case.
The gist of the explanation is this:
A PPTX file is an Open Packaging Convention (OPC) package. DOCX and XLSX files are other examples of OPC packages.
An OPC package is a Zip archive of multiple parts (official term, perhaps package-part more precisely). Each part is essentially a file, so something like slide1.xml, and they are arranged in a "directory structure".
One part can be related to other parts. For example, a presentation part (presentation.xml) is related to each of its slide parts. These relationships are stored in a file like presentation.xml.rels. The relationship is keyed with a string like "rId3" and identifies the related part by its path in the package.
One part refers to another using the key in its XML (e.g. <p:sldId r:id="rId3"/>). The target part is "looked-up" in the .rels file to find its path and get to it that way.
The KeyError you're getting means that the .rels file has a <Relationship> element referring to the part ppt/slides/NULL (instead of something like ppt/slides/slide3.xml). Since there is no such part in the package, the lookup fails.
If you open the "template" file in PowerPoint and save it, I think it will repair itself. You might need to rearrange a slide and move it back to jostle that part of the code.
If that doesn't work, you'll need to patch the package by hand, removing any broken references and relationships. opc-diag can be handy for that.
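If you want to find the broken reference before patching it, remember that a PPTX file is just a Zip archive; a short standard-library sketch (the filename is taken from the question) can list any relationship whose target part does not exist in the package:
import posixpath
import re
import zipfile

# Scan every .rels part for relationship targets that point at a missing part.
with zipfile.ZipFile("HispPres1.pptx") as pkg:
    names = set(pkg.namelist())
    for rels_name in (n for n in names if n.endswith(".rels")):
        # e.g. base is 'ppt' for 'ppt/_rels/presentation.xml.rels'
        base = posixpath.dirname(posixpath.dirname(rels_name))
        xml = pkg.read(rels_name).decode("utf-8")
        for target in re.findall(r'Target="([^"]+)"', xml):
            if target.startswith(("http", "/")):  # skip external/absolute targets
                continue
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved not in names:
                print(f"{rels_name}: broken relationship -> {target}")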
You can clean the dangling relationships out of the PPTX through:
File -> Info -> Check for Issues -> Inspect Document.
Clean up, save, and re-run the Python script.
So, thanks Scanny for the help. You're exactly right. The lookup was looking for ppt/slides/slide#.xml and it wasn't finding a relationship for it. The reason is that the relationships are coded as just slides/slide#.xml (without ppt/). I did get into opc-diag to see what I could do there, but I found an easy fix.
My previous code had a line that said for slide in ppt.slides: and this was the error: KeyError: "There is no item named 'ppt/slides/NULL' in the archive". When I browsed the PresentationML using opc-diag, I found that the relationship was set up like this: <Relationship Id="x" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide1.xml"/>. The relationship does not include ppt.
So, to get rid of that lookup and have it match the way PowerPoint stores the slide relationships, I changed these lines:
ppt = Presentation('HispPres1.pptx')
for slide in ppt.slides:
to this
ppt = Presentation('HispPres1.pptx')
slides = ppt.slides
for slide in slides:

geograpy3 library for extracting the locations in the text, gives UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276

I am trying to extract locations from text using the geograpy3 library in Python.
import geograpy
address = 'Jersey City New Jersey 07306'
places = geograpy.get_place_context(text = address)
To which I get the below UnicodeDecodeError:
~\Anaconda\lib\site-packages\geograpy\places.py in populate_db(self)
28 with open(cur_dir + "/data/GeoLite2-City-Locations.csv") as info:
29 reader = csv.reader(info)
---> 30 for row in reader:
31 print(row)
32 cur.execute("INSERT INTO cities VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", row)
~\Anaconda\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return
codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276: character maps to <undefined>
After some investigation, I tried to modify the places.py file and added encoding="utf-8" to the line marked ---> 30:
with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
But it still gives me the same error.
I also tried to save the GeoLite2-City-Locations.csv on my Desktop and then tried to read it using the same code.
with open("GeoLite2-City-Locations.csv", encoding="utf-8") as info:
reader = csv.reader(info)
for row in reader:
print(row)
which works absolutely fine and prints all the rows of the GeoLite2-City-Locations.csv.
I fail to understand the problem!
As a committer of geograpy3, to reproduce your issue I added a test to the most recent geograpy3 https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py:
with the result:
['Jersey', 'City']
so you might simply switch to the latest version.
def testStackoverflow54077973(self):
    '''
    see https://stackoverflow.com/questions/54077973/geograpy3-library-for-extracting-the-locations-in-the-text-gives-unicodedecodee
    '''
    address = 'Jersey City New Jersey 07306'
    e = Extractor(text=address)
    e.find_entities()
    self.check(e.places, ['Jersey', 'City'])
You should specify encoding='utf-8' like you did, but in the correct_country_mispelling(self, s) method in places.py (line 49).
After some investigation, this is a Windows vs Linux error in some cases. Even using the
with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
I could not resolve the error on my Windows computer. However, the exact same code ran fine on a Linux computer I use as well. I looked at the City-Locations.csv file on Linux, and it appeared LibreOffice automatically encoded and/or resolved all the characters, whereas looking at the same file in Excel, I would still see all the odd characters causing the error. Excel for some reason insists on keeping them.
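If you cannot upgrade or edit the library, a defensive way to read such a file on Windows (a sketch, not geograpy3's own code) is to pass both an explicit encoding and an error policy so a stray byte cannot raise UnicodeDecodeError:
import csv

# errors="replace" substitutes undecodable bytes instead of raising, which
# avoids the cp1252 default-codec crash on Windows.
with open("GeoLite2-City-Locations.csv", encoding="utf-8", errors="replace", newline="") as info:
    for row in csv.reader(info):
        print(row)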

Python 3: Requests response.iter_content: ChunkedEncodingError

I am using a requests stream to perform a 'GET' download of remote, very large CSVs, then chunking the response using response.iter_content(). This has been working for multiple data providers.
However, for one remote data provider, when using response.iter_content(), occasionally I am getting a ChunkedEncodingError, specifically:
ChunkedEncodingError: (
ProtocolError(
'Connection broken: IncompleteRead(921 bytes read, 103 more expected)',
IncompleteRead(921 bytes read, 103 more expected)),
)
Here is the Python 3 code; I would like to know of an alternative for resolving this chunking exception problem:
tmp_csv_chunk_sum = 0
with open(
    file=tmp_csv_file_path,
    mode='wb'
) as csv_file_wb:
    try:
        for chunk in response.iter_content(chunk_size=8192):
            if not chunk:
                break
            tmp_csv_chunk_sum += 8192
            csv_file_wb.write(chunk)
            csv_file_wb.flush()
            os.fsync(csv_file_wb.fileno())
    except Exception as ex:
        self.logger.error(
            "Request CSV Download: Exception",
            extra={
                'error_details': str(ex),
                'chunk_total_sum': tmp_csv_chunk_sum
            }
        )
        raise
I truly appreciate any assistance. Thank you.
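One possible approach (a sketch, not from the original thread): catch the ChunkedEncodingError and resume the download from the last byte already written, assuming the provider's server honours HTTP Range requests:
import os

import requests
from requests.exceptions import ChunkedEncodingError


def download_with_resume(url, path, chunk_size=8192, max_retries=5):
    """Stream url to path, resuming after a broken chunked read."""
    for _ in range(max_retries):
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        headers = {'Range': f'bytes={offset}-'} if offset else {}
        with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            if offset and resp.status_code != 206:
                offset = 0  # server ignored the Range header; start over
            mode = 'ab' if offset else 'wb'
            try:
                with open(path, mode) as fh:
                    for chunk in resp.iter_content(chunk_size=chunk_size):
                        fh.write(chunk)
                return  # completed without a broken read
            except ChunkedEncodingError:
                continue  # retry, resuming from the bytes already on disk
    raise RuntimeError(f"download failed after {max_retries} attempts")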
