Get auto-generated OutputFileDatasetConfig destination - azure-machine-learning-service

From the OutputFileDatasetConfig documentation for the destination class member,
If set to None, we will copy the output to the workspaceblobstore datastore, under the path /dataset/{run-id}/{output-name}
Given I have the handle to such OutputFileDatasetConfig with destination set to None, how can I get the generated destination without recomputing the default myself as this can be subject to change.

If you do not want to pass a name and path, then in that scenario the run details should provide the run id and the path can be created using the same. In an ideal scenario you would like to pass these details, if they are not passed the recommended approach is to use them in a intermediate step so the SDK can handle this for you, as seen in this PythonScriptStep()
from azureml.data import OutputFileDatasetConfig
dataprep_output = OutputFileDatasetConfig()
input_dataset = Dataset.get_by_name(workspace, 'raw_data')
dataprep_step = PythonScriptStep(
name="prep_data",
script_name="dataprep.py",
compute_target=cluster,
arguments=[input_dataset.as_named_input('raw_data').as_mount(), dataprep_output]
)

output = OutputFileDatasetConfig()
src = ScriptRunConfig(source_directory=path,
script='script.py',
compute_target=ct,
environment=env,
arguments = ["--output_path", output])
run = exp.submit(src, tags=tags)
###############INSIDE script.py
mount_point = os.path.dirname(args.output_path)
os.makedirs(mount_point, exist_ok=True)
print("mount_point : " + mount_point)

Related

Creating an empty folder on Dropbox with Python. Is there a simpler way?

Here's my sample code which works:
import os, io, dropbox
def createFolder(dropboxBaseFolder, newFolder):
# creating a temp dummy destination file path
dummyFileTo = dropboxBaseFolder + newFolder + '/' + 'temp.bin'
# creating a virtual in-memory binary file
f = io.BytesIO(b"\x00")
# uploading the dummy file in order to cause creation of the containing folder
dbx.files_upload(f.read(), dummyFileTo)
# now that the folder is created, delete the dummy file
dbx.files_delete_v2(dummyFileTo)
accessToken = '....'
dbx = dropbox.Dropbox(accessToken)
dropboxBaseDir = '/test_dropbox'
dropboxNewSubDir = '/new_empty_sub_dir'
createFolder(dropboxBaseDir, dropboxNewSubDir)
But is there a more efficient/simpler way to do the task ?
Yes, as Ronald mentioned in the comments, you can use the files_create_folder_v2 method to create a new folder.
That would look like this, modifying your code:
import dropbox
accessToken = '....'
dbx = dropbox.Dropbox(accessToken)
dropboxBaseDir = '/test_dropbox'
dropboxNewSubDir = '/new_empty_sub_dir'
res = dbx.files_create_folder_v2(dropboxBaseDir + dropboxNewSubDir)
# access the information for the newly created folder in `res`

Getting owner of file from smb share, by using python on linux

I need to find out for a script I'm writing who is the true owner of a file in an smb share (mounted using mount -t cifs of course on my server and using net use through windows machines).
Turns out it is a real challenge finding this information out using python on a linux server.
I tried using many many smb libraries (such as smbprotocol, smbclient and others), nothing worked.
I find few solutions for windows, they all use pywin32 or another windows specific package.
And I also managed to do it from bash using smbcalcs but couldn't do it cleanly but using subprocess.popen('smbcacls')..
Any idea on how to solve it?
This was unbelievably not a trivial task, and unfortunately the answer isn't simple as I hoped it would be..
I'm posting this answer if someone will be stuck with this same problem in the future, but hope maybe someone would post a better solution earlier
In order to find the owner I used this library with its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain=<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that pysmb package will only give you the owner's SID and not his name.
In order to get his name you need to make an ldap query like in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing will be easy, it took me hours to find a way to convert a string SID to binary SID in python, and in the end this solved it:
# posting the needed functions and omitting the class part
def byte(strsid):
'''
Convert a SID into bytes
strdsid - SID to convert into bytes
'''
sid = str.split(strsid, '-')
ret = bytearray()
sid.remove('S')
for i in range(len(sid)):
sid[i] = int(sid[i])
sid.insert(1, len(sid)-2)
ret += longToByte(sid[0], size=1)
ret += longToByte(sid[1], size=1)
ret += longToByte(sid[2], False, 6)
for i in range(3, len(sid)):
ret += cls.longToByte(sid[i])
return ret
def byteToLong(byte, little_endian=True):
'''
Convert bytes into a Python integer
byte - bytes to convert
little_endian - True (default) or False for little or big endian
'''
if len(byte) > 8:
raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
else:
if little_endian:
a = byte.ljust(8, b'\x00')
return struct.unpack('<q', a)[0]
else:
a = byte.rjust(8, b'\x00')
return struct.unpack('>q', a)[0]
... AND finally you have the full solution! enjoy :(
I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the github issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly. An extremely stripped down version
However, this also just retrieves a SID and will need a secondary library to perform a lookup.
Here's a function to get the owner SID (just wrapped what's in the gist in a function. Including here in case the gist is deleted or lost for any reason).
import smbclient
from ldap3 import Server, Connection, ALL,NTLM,SUBTREE
def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
from smbprotocol.file_info import InfoType
from smbprotocol.open import FilePipePrinterAccessMask,SMB2QueryInfoRequest, SMB2QueryInfoResponse
from smbprotocol.security_descriptor import SMB2CreateSDBuffer
class SecurityInfo:
# 100% just pulled from gist example
Owner = 0x00000001
Group = 0x00000002
Dacl = 0x00000004
Sacl = 0x00000008
Label = 0x00000010
Attribute = 0x00000020
Scope = 0x00000040
Backup = 0x00010000
def guid2hex(text_sid):
"""convert the text string SID to a hex encoded string"""
s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
return ''.join(s)
def get_sd(fd, info):
""" Get the Security Descriptor for the opened file. """
query_req = SMB2QueryInfoRequest()
query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
query_req['output_buffer_length'] = 65535
query_req['additional_information'] = info
query_req['file_id'] = fd.file_id
req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
resp = fd.connection.receive(req)
query_resp = SMB2QueryInfoResponse()
query_resp.unpack(resp['data'].get_value())
security_descriptor = SMB2CreateSDBuffer()
security_descriptor.unpack(query_resp['buffer'].get_value())
return security_descriptor
with smbclient.open_file(filePath, mode='rb', buffering=0,
desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
# returns SID
_sid = sd.get_owner()
try:
# Don't forget to convert the SID string-like object to a string
# or you get an error related to "0" not existing
sid = guid2hex(str(_sid))
except:
print(f"Failed to convert SID {_sid} to HEX")
raise
conn.search('DC=dell,DC=com',f"(&(objectSid={sid}))",SUBTREE)
# Will return an empty array if no results are found
return [res['dn'].split(",")[0].replace("CN=","") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user="domain\username", password=password, raise_exceptions=True,authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this one a SINGLE file. So if you wanted to run some kind of ownership scan, then you probably want to just write the program in C++ or some other low-level language instead of trying to use python. But for something quick and dirty this does work. You could also setup a threading pool and run batches. The piece that takes longest is connecting to the file itself, not running the ldap query, so if you can find a more efficient way to do that you'll be golden.
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the results of the above is:
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes which windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information for themselves. For example, if you're looking for Excel this answer provides a method for which to get the creator of any xls or xlsx files (doesn't work for xlsb due to the binary nature of the files). Unfortunately few files store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting type files. However, they don't contain this information (which is good from a privacy perspective).
So in case anyone finds this answer trying to solve the same kind of thing I did - Your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing setup.

how to read different block setting from kinto.ini file

I created a different block in my kinto.ini file and i want to use those setting in my program.
#kinto.ini
[mysetting]
name = json
username = jsonmellow
password = *********
[app:main]
use = egg:kinto
kinto.storage_url = postgre//
if we use 'config.get_setting' function of kinto it gives me the setting of the default block "app:main" only. so how can i get the other setting from "mysetting" block.
you can use prefix for your settings like:
[app:main]
...
mysetting.name = json
mysetting.username = jsonmellow
But if you still need some extra section in ini: How can I access a custom section in a Pyramid .ini file?

Setting the current test insertion within the DUT model

We have evolved our Origen usage such that we have a params file and a flow file for each test module (scan, mbist, etc.). We are now at the point where we need to take into account the test insertion when handling the DUT model and the test flow generation. I can see here that using a job flag is the preferred method for specifying test insertion specifics into the flow file. And this video shows how to specify a test insertion when simulating the test flow. My question is how can a test insertion be specified when not generating a flow, only loading params files into the DUT model? Take this parameter set that defines some test conditions for a scan/ATPG test module.
scan.define_params :test_flows do |p|
p.flows.ws1.chain = [:vmin, :vmax]
p.flows.ft1.chain = [:vmin, :vmax]
p.flows.ws1.logic = [:vmin, :vmax]
p.flows.ft1.logic = [:vmin]
p.flows.ws1.delay = [:pmax]
p.flows.ft1.delay = [:pmin]
end
You can see in the parameter set hierarchy that there are two test insertions defined: 'ws1' and 'ft1'. Am I right to assume that the --job option only sets a flag somewhere when used with the origen testers:run command? Or can this option be applied to origen i, such that just loading some parameter sets will have access to the job selected?
thx
There's no built-in way to do what you want here, but given that you are using parameters in this example the way I would do it would be to align your parameter contexts to the job name:
scan.define_params :ws1 do |p|
p.flows.chain = [:vmin, :vmax]
p.flows.logic = [:vmin, :vmax]
p.flows.delay = [:pmax]
end
scan.define_params :ft1 do |p|
p.flows.chain = [:vmin, :vmax]
p.flows.logic = [:vmin]
p.flows.delay = [:pmin]
end
There are various ways to actually set the current context, one way would be to have a target setup per job:
# target/ws1.rb
MyDUT.new
dut.params = :ws1
# target/ft1.rb
MyDUT.new
dut.params = :ft1
Here it is assuming that the scan object is configured to track the context of the top-level DUT - http://origen-sdk.org/origen//guides/models/parameters/#Tracking_the_Context_of_Another_Object

How to copy files in Groovy

I need to copy a file in Groovy and saw some ways to achieve it on the web:
1
new AntBuilder().copy( file:"$sourceFile.canonicalPath",
tofile:"$destFile.canonicalPath")
2
command = ["sh", "-c", "cp src/*.txt dst/"]
Runtime.getRuntime().exec((String[]) command.toArray())
3
destination.withDataOutputStream { os->
source.withDataInputStream { is->
os << is
}
}
4
import java.nio.file.Files
import java.nio.file.Paths
Files.copy(Paths.get(a), Paths.get(b))
The 4th way seems cleanest to me as I am not sure how good is it to use AntBuilder and how heavy it is, I saw some people reporting issues with Groovy version change.
2nd way is OS dependent, 3rd might not be efficient.
Is there something in Groovy to just copy files like in the 4th statement or should I just use Java for it?
If you have Java 7, I would definitely go with
Path source = ...
Path target = ...
Files.copy(source, target)
With the java.nio.file.Path class, it can work with symbolic and hard links. From java.nio.file.Files:
This class consists exclusively of static methods that operate on
files, directories, or other types of files. In most cases, the
methods defined here will delegate to the associated file system
provider to perform the file operations.
Just as references:
Copy files from one folder to another with Groovy
http://groovyconsole.appspot.com/view.groovy?id=8001
My second option would be the ant task with AntBuilder.
If you are doing this in code, just use something like:
new File('copy.bin').bytes = new File('orig.bin').bytes
If this is for build-related code, this would also work, or use the Ant builder.
Note, if you are sure the files are textual you can use .text rather than .bytes.
If it is a text file, I would go with:
def src = new File('src.txt')
def dst = new File('dst.txt')
dst << src.text
I prefer this way:
def file = new File("old.file")
def newFile = new File("new.file")
Files.copy(file.toPath(), newFile.toPath())
To append to existing file :
def src = new File('src.txt')
def dest = new File('dest.txt')
dest << src.text
To overwrite if file exists :
def src = new File('src.txt')
def dest = new File('dest.txt')
dest.write(src.text)
I'm using AntBuilder for such tasks. It's simple, consistent, 'battle-proven' and fun.
2nd approach is too OS-specific (Linux-only in your case)
3rd it too low-level and it eats up more resources. It's useful if you need to transform the file on the way: change encoding for example
4th looks overcomplicated to me... NIO package is relatively new in JDK.
In the end of the day, I'd go for 1st option. There you can switch from copy to scp task, without re-developing the script almost from scratch
This is the way using platform independent groovy script. If anyone has questions please ask in the comments.
def file = new File("java/jcifs-1.3.18.jar")
this.class.classLoader.rootLoader.addURL(file.toURI().toURL())
def auth_server = Class.forName("jcifs.smb.NtlmPasswordAuthentication").newInstance("domain", "username", "password")
def auth_local = Class.forName("jcifs.smb.NtlmPasswordAuthentication").newInstance(null, "local_username", "local_password")
def source_url = args[0]
def dest_url = args[1]
def auth = auth_server
//prepare source file
if(!source_url.startsWith("\\\\"))
{
source_url = "\\\\localhost\\"+ source_url.substring(0, 1) + "\$" + source_url.substring(1, source_url.length());
auth = auth_local
}
source_url = "smb:"+source_url.replace("\\","/");
println("Copying from Source -> " + source_url);
println("Connecting to Source..");
def source = Class.forName("jcifs.smb.SmbFile").newInstance(source_url,auth)
println(source.canRead());
// Reset the authentication to default
auth = auth_server
//prepare destination file
if(!dest_url.startsWith("\\\\"))
{
dest_url = "\\\\localhost\\"+ dest_url.substring(0, 1) + "\$" +dest_url.substring(2, dest_url.length());
auth = auth_local
}
def dest = null
dest_url = "smb:"+dest_url.replace("\\","/");
println("Copying To Destination-> " + dest_url);
println("Connecting to Destination..");
dest = Class.forName("jcifs.smb.SmbFile").newInstance(dest_url,auth)
println(dest.canWrite());
if (dest.exists()){
println("Destination folder already exists");
}
source.copyTo(dest);
For copying files in Jenkins Groovy
For Linux:
try {
echo 'Copying the files to the required location'
sh '''cd /install/opt/
cp /install/opt/ssl.ks /var/local/system/'''
echo 'File is copied successfully'
}
catch(Exception e) {
error 'Copying file was unsuccessful'
}
**For Windows:**
try {
echo 'Copying the files to the required location'
bat '''#echo off
copy C:\\Program Files\\install\\opt\\ssl.ks C:\\ProgramData\\install\\opt'''
echo 'File is copied successfully'
}
catch(Exception e) {
error 'Copying file was unsuccessful'
}

Resources