Databricks DataLakeFileClient Returns Error - azure

I have a Databricks notebook that runs every 5 minutes. Part of its functionality is to connect to a file in Azure Data Lake Storage Gen2 (ADLS Gen2).
I get the following error, which seems to have "come out of nowhere", as the process was previously working fine. The "file = " line is written by me; all the parameters are as expected, match the correct file names and containers, and exist in the data lake.
---> 92 file = DataLakeFileClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName="+storage_account_name+";AccountKey=" + storage_account_access_key,
93 file_system_name=azure_container, file_path=location_to_write)
94
/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_data_lake_file_client.py in from_connection_string(cls, conn_str, file_system_name, file_path, credential, **kwargs)
116 :rtype ~azure.storage.filedatalake.DataLakeFileClient
117 """
--> 118 account_url, _, credential = parse_connection_str(conn_str, credential, 'dfs')
119 return cls(
120 account_url, file_system_name=file_system_name, file_path=file_path,
/databricks/python/lib/python3.8/site-packages/azure/storage/filedatalake/_shared/base_client.py in parse_connection_str(conn_str, credential, service)
402 if service == "dfs":
403 primary = primary.replace(".blob.", ".dfs.")
--> 404 secondary = secondary.replace(".blob.", ".dfs.")
405 return primary, secondary, credential
Any thoughts/help? The actual error is in the base_client.py code, but I don't even know what "secondary" is supposed to be and why there would be an error there.

After restarting the cluster, something changed and the connection string needed the following "endpoint suffix" for this to keep working. I couldn't find any docs explaining why it worked without it, but until a few days ago it always had:
"DefaultEndpointsProtocol=https;AccountName="+storage_account_name+";AccountKey="+storage_account_access_key+";EndpointSuffix=core.windows.net"

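A minimal sketch of the fix above, assuming the same connection-string layout as in the question. The helper name build_connection_string and the sample account name and key are made up for illustration; only the EndpointSuffix=core.windows.net part comes from the answer.

```python
def build_connection_string(account_name: str, account_key: str,
                            endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble a storage connection string with an explicit EndpointSuffix."""
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": endpoint_suffix,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values, not real credentials.
conn_str = build_connection_string("mystorageaccount", "bXlrZXk=")
print(conn_str)

# With azure-storage-file-datalake installed, the string would be passed on
# exactly as in the question:
# from azure.storage.filedatalake import DataLakeFileClient
# file = DataLakeFileClient.from_connection_string(
#     conn_str, file_system_name=azure_container, file_path=location_to_write)
```

Building the string in one place makes it harder to drop the suffix by accident when the key or account name changes.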

Create an AzureBlobDatastore() with SDK-V2

I am trying to create an AzureBlobDatastore() via the azure-sdk-v2. Previously, I successfully managed to perform the same operation via azure-sdk-v1 (from tutorial link in the next paragraph).
I am following this tutorial : https://learn.microsoft.com/en-us/azure/machine-learning/migrate-to-v2-resource-datastore#create-a-datastore-from-an-azure-blob-container-via-account_key in order to set up/create my AzureBlobDatastore().
This is the code I am using. It matches the tutorial except for the parameters passed to MLClient.from_config() (if I omit the credential parameter, I get an error stating that the parameter is empty):
ml_client = MLClient.from_config(credential=DefaultAzureCredential(),
                                 path="./workspace_config_sdk2.json")
store = AzureBlobDatastore(
    name="azureml_sdk2_blob",
    description="Datastore created with sdkv2",
    account_name=storage_account_name,
    container_name=container_name,
    protocol="wasbs",
    credentials={
        "account_key": "..my_account_key.."
    },
)
ml_client.create_or_update(store)
I get the following error:
AttributeError: 'dict' object has no attribute '_to_datastore_rest_object'
Note that the workspace_config_sdk2.json config has the following scheme:
{
    "subscription_id": "...",
    "resource_group": "...",
    "workspace_name": "..."
}
How can I solve this error?
EDIT: On investigating the issue, it seems that it falls back to some code in "azure\ai\ml\entities\_datastore\azure_storage.py"
175 def _to_rest_object(self) -> DatastoreData:
176 blob_ds = RestAzureBlobDatastore(
177 account_name=self.account_name,
178 container_name=self.container_name,
--> 179 credentials=self.credentials._to_datastore_rest_object(),
180 endpoint=self.endpoint,
181 protocol=self.protocol,
182 tags=self.tags,
183 description=self.description,
184 )
185 return DatastoreData(properties=blob_ds)
AttributeError: 'dict' object has no attribute '_to_datastore_rest_object'
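The traceback shows _to_rest_object calling self.credentials._to_datastore_rest_object(), so credentials has to be an object that provides that method, and a plain dict does not. The sketch below only mimics that mechanism with hypothetical stand-in classes (FakeAccountKeyCredentials and FakeBlobDatastore are not part of any SDK), so it runs without azure-ai-ml installed.

```python
class FakeAccountKeyCredentials:
    """Hypothetical stand-in for an SDK credentials object."""

    def __init__(self, account_key: str):
        self.account_key = account_key

    def _to_datastore_rest_object(self) -> dict:
        # The real SDK builds a REST model here; a dict is enough for the demo.
        return {"credentialsType": "AccountKey",
                "secrets": {"key": self.account_key}}


class FakeBlobDatastore:
    """Hypothetical stand-in that mirrors the failing line in the traceback."""

    def __init__(self, name: str, credentials):
        self.name = name
        self.credentials = credentials

    def _to_rest_object(self) -> dict:
        return {"name": self.name,
                "credentials": self.credentials._to_datastore_rest_object()}


# Passing a dict reproduces the AttributeError from the question.
try:
    FakeBlobDatastore("demo", {"account_key": "..."})._to_rest_object()
    dict_error = None
except AttributeError as exc:
    dict_error = str(exc)
print(dict_error)

# Passing an object that implements the method works.
rest = FakeBlobDatastore("demo", FakeAccountKeyCredentials("..."))._to_rest_object()
print(rest["credentials"]["credentialsType"])
```

In azure-ai-ml itself, the equivalent change would probably be to pass a credentials object such as AccountKeyConfiguration(account_key=...) from azure.ai.ml.entities instead of the dict. Whether that class is available depends on the installed SDK version, so that is an assumption to check against the version in use.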

Using Python-pptx, what conditions could a PowerPoint have that give KeyError?

I have a PowerPoint that I would like to open, amend, and save as a different filename. However, I'm getting a KeyError.
I tried this code with a blank PowerPoint presentation and it works perfectly. However, when I use the same code to open an existing PowerPoint presentation, I get a KeyError.
KeyError: "There is no item named 'ppt/slides/NULL' in the archive"
# Replace Source Text
import re
from pptx import Presentation
# s = "string. With. Punctuation?"
# s = re.sub(r'[^\w\s]','',s)
search_str = '{{{FILTER}}}'
repl_str = re.sub(r'[^\w\s]','',(str(list(dashboard_filter2.values()))))
ppt = Presentation('HispPres1.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            shape.text = shape.text.replace(search_str, repl_str)
ppt.save('HispPresSourceUpdate.pptx')
I expect the existing PowerPoint to be amended by finding every instance of {{{FILTER}}} and replacing it with the listed value. However, there seems to be a problem with my existing PowerPoint presentation; I don't have this issue with a blank one.
So, I'm wondering what would cause an existing PowerPoint presentation to raise this error? I plan on making several "templates" to start with and really need to know if there are any hard-and-fast rules to adhere to.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-42-41deffabe2f9> in <module>()
7 search_str = '{{{FILTER}}}'
8 repl_str = re.sub(r'[^\w\s]','',(str(list(dashboard_filter2.values()))))
----> 9 ppt = Presentation('HispPres1.pptx')
10
11 for slide in ppt.slides:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\api.py in Presentation(pptx)
28 pptx = _default_pptx_path()
29
---> 30 presentation_part = Package.open(pptx).main_document_part
31
32 if not _is_pptx_package(presentation_part):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\package.py in open(cls, pkg_file)
120 *pkg_file*.
121 """
--> 122 pkg_reader = PackageReader.from_file(pkg_file)
123 package = cls()
124 Unmarshaller.unmarshal(pkg_reader, package, PartFactory)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in from_file(pkg_file)
34 pkg_srels = PackageReader._srels_for(phys_reader, PACKAGE_URI)
35 sparts = PackageReader._load_serialized_parts(
---> 36 phys_reader, pkg_srels, content_types
37 )
38 phys_reader.close()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _load_serialized_parts(phys_reader, pkg_srels, content_types)
67 sparts = []
68 part_walker = PackageReader._walk_phys_parts(phys_reader, pkg_srels)
---> 69 for partname, blob, srels in part_walker:
70 content_type = content_types[partname]
71 spart = _SerializedPart(partname, content_type, blob, srels)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
--> 104 phys_reader, part_srels, visited_partnames):
105 yield (partname, blob, srels)
106
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
--> 104 phys_reader, part_srels, visited_partnames):
105 yield (partname, blob, srels)
106
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\pkgreader.py in _walk_phys_parts(phys_reader, srels, visited_partnames)
99 visited_partnames.append(partname)
100 part_srels = PackageReader._srels_for(phys_reader, partname)
--> 101 blob = phys_reader.blob_for(partname)
102 yield (partname, blob, part_srels)
103 for partname, blob, srels in PackageReader._walk_phys_parts(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pptx\opc\phys_pkg.py in blob_for(self, pack_uri)
107 matching member is present in zip archive.
108 """
--> 109 return self._zipf.read(pack_uri.membername)
110
111 def close(self):
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in read(self, name, pwd)
1312 def read(self, name, pwd=None):
1313 """Return file bytes (as a string) for name."""
-> 1314 with self.open(name, "r", pwd) as fp:
1315 return fp.read()
1316
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in open(self, name, mode, pwd, force_zip64)
1350 else:
1351 # Get info object for name
-> 1352 zinfo = self.getinfo(name)
1353
1354 if mode == 'w':
~\AppData\Local\Continuum\anaconda3\lib\zipfile.py in getinfo(self, name)
1279 if info is None:
1280 raise KeyError(
-> 1281 'There is no item named %r in the archive' % name)
1282
1283 return info
KeyError: "There is no item named 'ppt/slides/NULL' in the archive"
Yeah, this is a bit of a thorny problem. The spec doesn't provide for a "broken" relationship (one that refers to a package-part that doesn't exist), but at least one library (Java-based if I recall correctly) does not clean up relationships properly in some cases, perhaps a slide delete operation in this case.
The gist of the explanation is this:
A PPTX file is an Open Packaging Convention (OPC) package. DOCX and XLSX files are other examples of OPC packages.
An OPC package is a Zip archive of multiple parts (official term, perhaps package-part more precisely). Each part is essentially a file, so something like slide1.xml, and they are arranged in a "directory structure".
One part can be related to other parts. For example, a presentation part (presentation.xml) is related to each of its slide parts. These relationships are stored in a file like presentation.xml.rels. The relationship is keyed with a string like "rId3" and identifies the related part by its path in the package.
One part refers to another using the key in its XML (e.g. <p:sldId r:id="rId3"/>). The target part is "looked-up" in the .rels file to find its path and get to it that way.
The KeyError you're getting means that the .rels file has a <Relationship> element referring to the part ppt/slides/NULL (instead of something like ppt/slides/slide3.xml). Since there is no such part in the package, the lookup fails.
If you open the "template" file in PowerPoint and save it, I think it will repair itself. You might need to rearrange a slide and move it back to jostle that part of the code.
If that doesn't work, you'll need to patch the package by hand, removing any broken references and relationships. opc-diag can be handy for that.
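To see which relationships are broken before patching anything, the lookup described above can be reproduced with the standard library alone. This is a rough sketch, not how python-pptx or opc-diag do it: the function name find_broken_relationships and the tiny in-memory demo package are made up for illustration.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_broken_relationships(pkg_file):
    """Return (rels_member, rId, target_member) for targets missing from the zip."""
    broken = []
    with zipfile.ZipFile(pkg_file) as zf:
        members = set(zf.namelist())
        for rels_name in sorted(n for n in members if n.endswith(".rels")):
            # "ppt/_rels/presentation.xml.rels" holds targets relative to "ppt/".
            base_dir = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(zf.read(rels_name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks etc. do not point into the archive
                target = rel.get("Target", "")
                if target.startswith("/"):
                    member = target.lstrip("/")
                else:
                    member = posixpath.normpath(posixpath.join(base_dir, target))
                if member not in members:
                    broken.append((rels_name, rel.get("Id"), member))
    return broken


def _demo_package() -> io.BytesIO:
    """Build a tiny fake package with one good and one dangling relationship."""
    rels = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="slide" Target="slides/slide1.xml"/>'
        '<Relationship Id="rId2" Type="slide" Target="slides/NULL"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ppt/presentation.xml", "<presentation/>")
        zf.writestr("ppt/_rels/presentation.xml.rels", rels)
        zf.writestr("ppt/slides/slide1.xml", "<sld/>")
    buf.seek(0)
    return buf


broken = find_broken_relationships(_demo_package())
print(broken)
```

Calling find_broken_relationships('HispPres1.pptx') on the real file should list the .rels member and rId that point at ppt/slides/NULL, which are the entries to remove by hand.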
You can clean dangling relationships out of the PPTX through:
File -> Info -> Check for Issues -> Inspect Document.
Clean up, save, and rerun the Python script.
So, thanks Scanny for the help. You're exactly right. The lookup was looking for ppt/slides/slide#.xml and wasn't finding a relationship for it, because the relationships are coded as just slides/slide#.xml (without ppt/). I did look into opc-diag to see what I could do there, but I found an easy fix.
My previous code had the line for slide in ppt.slides: and this was the error: KeyError: "There is no item named 'ppt/slides/NULL' in the archive". When I browsed the PresentationML using opc-diag, I found that the relationship was set up like this: <Relationship Id="x" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide1.xml"/>. The relationship target does not include ppt/.
So, to get rid of that lookup and have it match the way PowerPoint stores the slide relationships, I changed these lines:
ppt = Presentation('HispPres1.pptx')
for slide in ppt.slides:
to this
ppt = Presentation('HispPres1.pptx')
slides = ppt.slides
for slide in slides:

Python 3: Requests response.iter_content: ChunkedEncodingError

I am using requests with streaming to perform a 'GET' download of very large remote CSV files, then chunking the response using response.iter_content(). This has been working for multiple data providers.
However, for one remote data provider, when using response.iter_content(), occasionally I am getting a ChunkedEncodingError, specifically:
ChunkedEncodingError: (
ProtocolError(
'Connection broken: IncompleteRead(921 bytes read, 103 more expected)',
IncompleteRead(921 bytes read, 103 more expected)),
)
Here is the Python 3 code. I would like to know of a way to resolve this chunking exception problem:
tmp_csv_chunk_sum = 0
# Binary mode takes no encoding argument; passing one raises ValueError.
with open(file=tmp_csv_file_path, mode='wb') as csv_file_wb:
    try:
        for chunk in response.iter_content(chunk_size=8192):
            if not chunk:
                break
            # Count the bytes actually received; the last chunk may be shorter.
            tmp_csv_chunk_sum += len(chunk)
            csv_file_wb.write(chunk)
            csv_file_wb.flush()
            os.fsync(csv_file_wb.fileno())
    except Exception as ex:
        self.logger.error(
            "Request CSV Download: Exception",
            extra={
                'error_details': str(ex),
                'chunk_total_sum': tmp_csv_chunk_sum
            }
        )
        raise
I truly appreciate assistance, Thank you
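One possible way to cope with a connection that breaks mid-stream is to reopen the request at the byte offset already written and carry on. The sketch below is an assumption-heavy illustration rather than a confirmed fix for this provider: download_with_resume and open_stream are hypothetical names, and it only helps if the server honours HTTP Range requests. The stream is injected as a callable so the retry logic can be exercised without a network.

```python
import os
import tempfile
from typing import Callable, Iterable, Tuple, Type


def download_with_resume(
    open_stream: Callable[[int], Iterable[bytes]],
    dest_path: str,
    max_attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> int:
    """Write chunks to dest_path, reopening the stream at the current offset on failure.

    open_stream(offset) is a hypothetical callable that yields the remote bytes
    starting at `offset`. Returns the total number of bytes written.
    """
    written = 0
    with open(dest_path, "wb") as out:
        for attempt in range(1, max_attempts + 1):
            try:
                for chunk in open_stream(written):
                    if chunk:
                        out.write(chunk)
                        written += len(chunk)
                return written
            except retry_on:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
    return written


# With requests, open_stream might look like this (assumes Range support):
# def open_stream(offset):
#     headers = {"Range": f"bytes={offset}-"} if offset else {}
#     response = requests.get(url, headers=headers, stream=True, timeout=60)
#     response.raise_for_status()
#     return response.iter_content(chunk_size=8192)
# and retry_on=(requests.exceptions.ChunkedEncodingError,)

# Offline demo: a fake stream that breaks once after 2048 bytes.
DATA = b"a,b,c\n" * 1000
calls = []


def flaky_stream(offset):
    calls.append(offset)
    body = DATA[offset:]
    for i in range(0, len(body), 1024):
        if len(calls) == 1 and i >= 2048:
            raise ConnectionError("simulated broken connection")
        yield body[i:i + 1024]


path = os.path.join(tempfile.mkdtemp(), "download.csv")
total = download_with_resume(flaky_stream, path)
print(total, calls)
```

If the server ignores the Range header and sends the whole body again, resuming like this would duplicate data, so checking for a 206 status before appending would be a sensible extra guard.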

Zurb Foundation for Apps - CLI Fails2

At the risk of posting a duplicate: I am new here and don't have enough reputation yet, so I couldn't comment on the only relevant similar question I found:
Zurb Foundation for Apps - CLI Fails
However, I tried the answer there and I still get the same failure.
My message is :
(I don't have a reputation so I can't post images "!##"):
but it is essentially the same as the other post except mine mentions line 118 of foundationCLI.js where theirs notes line 139. Also the answer said to fix line 97 but in mine that code is on line 99.
92 // NEW
93 // Clones the Foundation for Apps template and installs dependencies
94 module.exports.new = function(args, options) {
95   var projectName = args[0];
96   var gitClone = ['git', 'clone', 'https://github.com/zurb/foundation-apps-template.git', args[0]];
97   var npmInstall = [npm, 'install'];
98   var bowerInstall = [bower, 'install'];
99   var bundleInstall = [bundle.bat];
100   if (isRoot()) bowerInstall.push('--allow-root');
101
102   // Show help screen if the user didn't enter a project name
103   if (typeof projectName === 'undefined') {
104     this.help('new');
105     process.exit();
106   }
107
108   yeti([
109     'Thanks for using Foundation for Apps!',
110     '-------------------------------------',
111     'Let\'s set up a new project.',
112     'It shouldn\'t take more than a minute.'
113   ]);
114
115   // Clone the template repo
116   process.stdout.write("\nDownloading the Foundation for Apps template...".cyan);
117   exec(gitClone, function(err, out, code) {
118     if (err instanceof Error) throw err;
119
120     process.stdout.write([
121       "\nDone downloading!".green,
122       "\n\nInstalling dependencies...".cyan,
123       "\n"
124     ].join(''));
I also posted an error log here
https://github.com/npm/npm/issues/7024
yesterday, as directed in the following error message: (which I am unable to post the image of "!##").
But I have yet to receive a response there.
Any idea how I can get past this so I can start an app?
Thanks, A
You may need to also install the git command-line client. On line 117, the foundation-cli.js is trying to run git clone and this is failing.
Could you please run
git --version
and paste the text (not image) of the output you see?
If you have already installed git (e.g., because you have GitHub for Windows, https://windows.github.com/), then you may need to use the Git Shell shortcut or close and re-open your command prompt window in order to use git on the command line.
Once you've installed git and closed/reopened your shell, try the command
foundation-apps new myApp again.

Components.interfaces.nsIProcess2 in Firefox 3.6 -- where did it go?

I am beta testing an application that includes a Firefox extension as one component. It was originally deployed when FF3.5.5 was the latest version, and survived 3.5.6 and 3.5.7. However on FF3.6 I'm getting the following in my error console:
Warning: reference to undefined property Components.interfaces.nsIProcess2
Source file: chrome://overthewall/content/otwhelper.js
Line: 55
Error: Component returned failure code: 0x80570018 (NS_ERROR_XPC_BAD_IID)
[nsIJSCID.createInstance]
Source file: chrome://overthewall/content/otwhelper.js
Line: 55
The function throwing the error is:
48 function otwRunHelper(cmd, aCallback) {
49   var file =
50     Components.classes["@mozilla.org/file/local;1"].
51       createInstance(Components.interfaces.nsILocalFile);
52   file.initWithPath(otwRegInstallDir+'otwhelper.exe');
53
54   otwProcess = Components.classes["@mozilla.org/process/util;1"]
55     .createInstance(Components.interfaces.nsIProcess2);
56
57   otwProcess.init(file);
58   var params = new Array();
59   params = cmd.split(' ');
60
61   otwNextCallback = aCallback;
62   otwObserver = new otwHelperProcess();
63   otwProcess.runAsync(params, params.length, otwObserver, false);
64 }
As you can see, all this function does is run an external EXE helper file (located by a registry key) with some command line parameters and sets up an Observer to asynchronously wait for a response and process the Exit code.
The offending line implies that Components.interfaces.nsIProcess2 is no longer defined in FF3.6. Where did it go? I can't find anything in the Mozilla documentation indicating that it has been changed in the latest release.
The method on nsIProcess2 was moved to nsIProcess. For your code to work in both versions, change this line:
otwProcess = Components.classes["@mozilla.org/process/util;1"]
    .createInstance(Components.interfaces.nsIProcess2);
to this:
otwProcess = Components.classes["@mozilla.org/process/util;1"]
    .createInstance(Components.interfaces.nsIProcess2 || Components.interfaces.nsIProcess);
You will still get the warning, but the error will go away, and your code will work just fine in both versions. You could also store the interface iid in a variable and use the variable:
let iid = ("nsIProcess2" in Components.interfaces) ?
    Components.interfaces.nsIProcess2 :
    Components.interfaces.nsIProcess;
otwProcess = Components.classes["@mozilla.org/process/util;1"]
    .createInstance(iid);
