MafftCommandline and io.StringIO - io

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
"AGGGC",
"AGGGGGC",
"AGGAGC",
"AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibilty with AlignIO
ioSeq = ''
for items in padded_sequences:
ioSeq += '>unknown\n'
ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this results due to the arrows ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.

I can't get MafftCommandline() to read internal files made by
io.StringIO().
This is not surprising for a couple of reasons:
As you're aware, Biopython doesn't implement Mafft, it simply
provides a convenient interface to setup a call to mafft in
/usr/local/bin. The mafft executable runs as a separate process
that does not have access to your Python program's internal memory,
including your StringIO file.
The mafft program only works with an input file, it doesn't even
allow stdin as a data source. (Though it does allow stdout as a
data sink.) So ultimately, there must be a file in the file system
for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.

Related

Alternative methods for deprecated gym.wrappers.Monitor()

The example is from Chapter 8 of Grokking Deep Reinforcement Learning written by Miguel Morales.
Please click here
The wrappers.Monitor is deprecated after the book is published. The code in question is as below:
env = wrappers.Monitor(
env, mdir, force=True,
mode=monitor_mode,
video_callable=lambda e_idx: record) if monitor_mode else env
I searched the internet and tried 2 methods but failed.
1- gnwrapper.Monitor
I pip install gym-notebook-wrapper and import gnwrapper, then rewrite the code
env = gnwrapper.Monitor(gym.make(env_name),directory="./")
A [WinError 2] The system cannot find the file specified error message is returned.
2- gym.wrappers.RecordVideo
I from gym.wrappers import RecordVideo then rewrite the code
env = RecordVideo(gym.make(env_name), "./")
A AttributeError: 'CartPoleEnv' object has no attribute 'videos' error message is returned.
Is there any way to solve the problem?
After some experiments, the best way to work around the problem is to downgrade the gym version to 0.20.0, in order to preserve the wrappers.Monitor function.
Moreover, the subprocess.Popen and subprocess.check_output does not work as the original author suggested, so I use moviepy (please pip install moviepy if you do not have this libarary and from moviepy.editor import VideoFileClip at the very beginning) to change MP4 to GIF files. The code in get_gif_html is amended as follows.
gif_path = basename + '.gif'
if not os.path.exists(gif_path):
videoClip = VideoFileClip(video_path)
videoClip.write_gif(gif_path, logger=None)
gif = io.open(gif_path, 'r+b').read()
The program now completely works.
Edit (24 Jul 2022):
The following solution is for gym version 0.25.0, as I hope the program works for later versions. Windows 10 and Juptyer Notebook is used for the demonstration.
(1) Maintain the moviepy improvement stated above
(2) import from gym.wrappers.record_video import RecordVideo and substitute
env = wrappers.Monitor(
env, mdir, force=True,
mode=monitor_mode,
video_callable=lambda e_idx: record) if monitor_mode else
with this
if monitor_mode:
env = RecordVideo(env, mdir, episode_trigger=lambda e_idx:record)
env.reset()
env.start_video_recorder()
else:
env = env
RecordVideo function takes (env, video_folder, episode_trigger) arguments, and video_callable and episode_trigger are the same thing.
(3) Loss of env.videos in get_gif_html function in new versions.
In the old setting, after the video is recorded, the MP4 and meta.json files are stored in the following address with a random file name
C:\Users\xxx\AppData\Local\Temp
https://i.stack.imgur.com/Za3Sd.jpg
This env.videos function returns a list of tuples, each tuples consisted of address path of MP4 and meta.json.
I tried to recreate the env.videos by
(a) declare the mdir as global variable
def get_make_env_fn(**kargs):
def make_env_fn(env_name, seed=None, render=None, record=False,
unwrapped=False, monitor_mode=None,
inner_wrappers=None, outer_wrappers=None):
global mdir
mdir = tempfile.mkdtemp()
(b) in the demo_progression and demo_last function, grouping all MP4 and meta_json files, zipping them and use list comprehension to make a new env_videos variable (as opposed to env.videos)
env.close()
mp4_files = ([(mdir + '\\' + file) for file in os.listdir(mdir) if file.endswith('.mp4')])
meta_json_files = ([(mdir + '\\' + file) for file in os.listdir(mdir) if file.endswith('.meta.json')])
env_videos = [(mp4, meta_json) for mp4, meta_json in zip(mp4_files, meta_json_files)]
data = get_gif_html(env_videos=env_videos,
These are all the changes.

Vertex AI scheduled notebooks doesn't recognize existence of folders

I have a managed jupyter notebook in Vertex AI that I want to schedule. The notebook works just fine as long as I start it manually, but as soon as it is scheduled, it fails. There are in fact many things that go wrong when scheduled, some of them are fixable. Before explaining what my trouble is, let me first give some details of the context.
The notebook gathers information from an API for several stores and saves the data in different folders before processing it, saving csv-files to store-specific folders and to bigquery. So, in the location of the notebook, I have:
The notebook
Functions needed for the handling of data (as *.py files)
A series of folders, some of which have subfolders which also have subfolders
When I execute this manually, no problem. Everything works well and all files end up exactly where they should, as well as in different bigQuery tables.
However, when scheduling the execution of the notebook, everything goes wrong. First, the files *.py cannot be read (as import). No problem, I added the functions in the notebook.
Now, the following error is where I am at a loss, because I have no idea why it does work or how to fix it. The code that leads to the error is the following:
internal = "https://api.************************"
df_descriptions = []
storess = internal
response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)
filepath = "stores"
files = os.listdir(filepath)
for file in files:
with open(filepath + "/"+file) as json_string:
jsonstr = json.load(json_string)
information = pd.json_normalize(jsonstr)
df_descriptions.append(information)
StoreINFO = pd.concat(df_descriptions)
StoreINFO = StoreINFO.dropna()
StoreINFO = StoreINFO[StoreINFO['storeIdMappings'].map(lambda d: len(d)) > 0]
cloud_store_ids = list(set(StoreINFO.cloudStoreId))
LastWeek = datetime.date.today()- timedelta(days=2)
LastWeek =np.datetime64(LastWeek)
and the error reported is:
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_165/2970402631.py in <module>
5 storess = internal
6 response_stores = requests.get(storess,auth = HTTPBasicAuth(userInternal, keyInternal))
----> 7 pathlib.Path("stores/request_1.json").write_bytes(response_stores.content)
8
9 filepath = "stores"
/opt/conda/lib/python3.7/pathlib.py in write_bytes(self, data)
1228 # type-check for the buffer interface before truncating the file
1229 view = memoryview(data)
-> 1230 with self.open(mode='wb') as f:
1231 return f.write(view)
1232
/opt/conda/lib/python3.7/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1206 self._raise_closed()
1207 return io.open(self, mode, buffering, encoding, errors, newline,
-> 1208 opener=self._opener)
1209
1210 def read_bytes(self):
/opt/conda/lib/python3.7/pathlib.py in _opener(self, name, flags, mode)
1061 def _opener(self, name, flags, mode=0o666):
1062 # A stub for the opener argument to built-in open()
-> 1063 return self._accessor.open(self, flags, mode)
1064
1065 def _raw_open(self, flags, mode=0o777):
FileNotFoundError: [Errno 2] No such file or directory: 'stores/request_1.json'
I would gladly find another way to do this, for instance by using GCS buckets, but my issue is the existence of sub-folders. There are many stores and I do not wish to do this operation manually because some retailers for which I am doing this have over 1000 stores. My python code generates all these folders and as I understand it, this is not feasible in GCS.
How can I solve this issue?
GCS uses a flat namespace, so folders don't actually exist, but can be simulated as given in this documentation.For your requirement, you can either use absolute path (starting with "/" -- not relative) or create the "stores" directory (with "mkdir"). For more information you can check this blog.

Angr can't solve the googlectf beginner problem

I am a student studying angr, first time.
I'm watching the code in this url.
https://github.com/Dvd848/CTFs/blob/master/2020_GoogleCTF/Beginner.md
import angr
import claripy
FLAG_LEN = 15
STDIN_FD = 0
base_addr = 0x100000 # To match addresses to Ghidra
proj = angr.Project("./a.out", main_opts={'base_addr': base_addr})
flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(FLAG_LEN)]
flag = claripy.Concat( *flag_chars + [claripy.BVV(b'\n')]) # Add \n for scanf() to accept the input
state = proj.factory.full_init_state(
args=['./a.out'],
add_options=angr.options.unicorn,
stdin=flag,
)
# Add constraints that all characters are printable
for k in flag_chars:
state.solver.add(k >= ord('!'))
state.solver.add(k <= ord('~'))
simgr = proj.factory.simulation_manager(state)
find_addr = 0x101124 # SUCCESS
avoid_addr = 0x10110d # FAILURE
simgr.explore(find=find_addr, avoid=avoid_addr)
if (len(simgr.found) > 0):
for found in simgr.found:
print(found.posix.dumps(STDIN_FD))
https://github.com/google/google-ctf/tree/master/2020/quals/reversing-beginner/attachments
Which is the answer of googlectf beginner.
But, the above code does not work. It doesn't give me the answer.
I want to know why the code is not working.
When I execute this code, the output was empty.
I run the code with python3 in Ubuntu 20.04 in wsl2
Thank you.
I believe this script isn't printing anything because angr fails to find a solution and then exits. You can prove this by appending the following to your script:
else:
raise Exception('Could not find the solution')
If the exception raises, a valid solution was not found.
In terms of why it doesn't work, this code looks like copy & paste from a few different sources, and so it's fairly convoluted.
For example, the way the flag symbol is passed to stdin is not ideal. By default, stdin is a SimPackets, so it's best to keep it that way.
The following script solves the challenge, I have commented it to help you understand. You will notice that changing stdin=angr.SimPackets(name='stdin', content=[(flag, 15)]) to stdin=flag will cause the script to fail, due to the reason mentioned above.
import angr
import claripy
base = 0x400000 # Default angr base
project = angr.Project("./a.out")
flag = claripy.BVS("flag", 15 * 8) # length is expected in bits here
initial_state = project.factory.full_init_state(
stdin=angr.SimPackets(name='stdin', content=[(flag, 15)]), # provide symbol and length (in bytes)
add_options ={
angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS
}
)
# constrain flag to common alphanumeric / punctuation characters
[initial_state.solver.add(byte >= 0x20, byte <= 0x7f) for byte in flag.chop(8)]
sim = project.factory.simgr(initial_state)
sim.explore(
find=lambda s: b"SUCCESS" in s.posix.dumps(1), # search for a state with this result
avoid=lambda s: b"FAILURE" in s.posix.dumps(1) # states that meet this constraint will be added to the avoid stash
)
if sim.found:
solution_state = sim.found[0]
print(f"[+] Success! Solution is: {solution_state.posix.dumps(0)}") # dump whatever was sent to stdin to reach this state
else:
raise Exception('Could not find the solution') # Tell us if angr failed to find a solution state
A bit of Trivia - there are actually multiple 'solutions' that the program would accept, I guess the CTF flag server only accepts one though.
❯ echo -ne 'CTF{\x00\xe0MD\x17\xd1\x93\x1b\x00n)' | ./a.out
Flag: SUCCESS

From SSH not decoded from bytes to ASCII?

Good afternoon.
I get the example below from SSH:
b"rxmop:moty=rxotg;\x1b[61C\r\nRADIO X-CEIVER ADMINISTRATION\x1b[50C\r\nMANAGED OBJECT DATA\x1b[60C\r\n\x1b[79C\r\nMO\x1b[9;19HRSITE\x1b[9;55HCOMB FHOP MODEL\x1b[8C\r\nRXOTG-58\x1b[10;19H54045_1800\x1b[10;55HHYB"
I process ssh.recv (99999) .decode ('ASCII')
but some characters are not decoded for example:
\x1b[61C
\x1b[50C
\x1b[9;55H
\x1b[9;19H
The article below explains that these are ANSI escape codes that appear since I use invoke_shell. Previously everything worked until it moved to another server.
Is there a simple way to get rid of junk values that come when you SSH using Python's Paramiko library and fetch output from CLI of a remote machine?
When I write to the file, I also get:
rxmop:moty=rxotg;[61C
RADIO X-CEIVER ADMINISTRATION[50C
MANAGED OBJECT DATA[60C
[79C
MO[9;19HRSITE[9;55HCOMB FHOP MODEL[8C
RXOTG-58[10;19H54045_1800[10;55HHYB
If you use PuTTY everything is clear and beautiful.
I can't get away from invoke_shell because the connection is being thrown from one server to another.
Sample code below:
# coding:ascii
import paramiko
port = 22
data = ""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname=host, username=user, password=secret, port=port, timeout=10)
ssh = client.invoke_shell()
ssh.send("rxmop:moty=rxotg;\n")
while data.find("<") == -1:
time.sleep(0.1)
data += ssh.recv(99999).decode('ascii')
ssh.close()
client.close()
f = open('text.txt', 'w')
f.write(data)
f.close()
The normal output is below:
MO RSITE COMB FHOP MODEL
RXOTG-58 54045_1800 HYB BB G12
SWVERREPL SWVERDLD SWVERACT TMODE
B1314R081D TDM
CONFMD CONFACT TRACO ABISALLOC CLUSTERID SCGR
NODEL 4 POOL FLEXIBLE
DAMRCR CLTGINST CCCHCMD SWVERCHG
NORMAL UNLOCKED
PTA JBSDL PAL JBPTA
TGFID SIGDEL BSSWANTED PACKALG
H'0001-19B3 NORMAL
What can you recommend in order to return normal output, so that all characters are processed?
Regular expressions do not help, since the structure of the record is shifted, then characters from certain positions are selected in the code.
PS try to use ssh.invoke_shell (term='xterm') don't work.
There is an answer here:
How can I remove the ANSI escape sequences from a string in python
There are other ways...
https://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output
Essentially, you are 'screen-scraping' input, and you need to strip the ANSI codes. So, grab the input, and then strip the codes.
import re
... (your ssh connection here)
data = ""
while data.find("<") == -1:
time.sleep(0.1)
chunk = ssh.recv(99999)
data += chunk
... (your ssh connection cleanup here)
ansi_escape = re.compile(r'\x1B(?:[#-Z\\-_]|\[[0-?]*[ -/]*[#-~])')
data = ansi_escape.sub('', data)

Getting owner of file from smb share, by using python on linux

I need to find out for a script I'm writing who is the true owner of a file in an smb share (mounted using mount -t cifs of course on my server and using net use through windows machines).
Turns out it is a real challenge finding this information out using python on a linux server.
I tried using many many smb libraries (such as smbprotocol, smbclient and others), nothing worked.
I find few solutions for windows, they all use pywin32 or another windows specific package.
And I also managed to do it from bash using smbcalcs but couldn't do it cleanly but using subprocess.popen('smbcacls')..
Any idea on how to solve it?
This was unbelievably not a trivial task, and unfortunately the answer isn't simple as I hoped it would be..
I'm posting this answer if someone will be stuck with this same problem in the future, but hope maybe someone would post a better solution earlier
In order to find the owner I used this library with its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain=<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that pysmb package will only give you the owner's SID and not his name.
In order to get his name you need to make an ldap query like in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing will be easy, it took me hours to find a way to convert a string SID to binary SID in python, and in the end this solved it:
# posting the needed functions and omitting the class part
def byte(strsid):
'''
Convert a SID into bytes
strdsid - SID to convert into bytes
'''
sid = str.split(strsid, '-')
ret = bytearray()
sid.remove('S')
for i in range(len(sid)):
sid[i] = int(sid[i])
sid.insert(1, len(sid)-2)
ret += longToByte(sid[0], size=1)
ret += longToByte(sid[1], size=1)
ret += longToByte(sid[2], False, 6)
for i in range(3, len(sid)):
ret += cls.longToByte(sid[i])
return ret
def byteToLong(byte, little_endian=True):
'''
Convert bytes into a Python integer
byte - bytes to convert
little_endian - True (default) or False for little or big endian
'''
if len(byte) > 8:
raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
else:
if little_endian:
a = byte.ljust(8, b'\x00')
return struct.unpack('<q', a)[0]
else:
a = byte.rjust(8, b'\x00')
return struct.unpack('>q', a)[0]
... AND finally you have the full solution! enjoy :(
I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the github issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly. An extremely stripped down version
However, this also just retrieves a SID and will need a secondary library to perform a lookup.
Here's a function to get the owner SID (just wrapped what's in the gist in a function. Including here in case the gist is deleted or lost for any reason).
import smbclient
from ldap3 import Server, Connection, ALL,NTLM,SUBTREE
def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
from smbprotocol.file_info import InfoType
from smbprotocol.open import FilePipePrinterAccessMask,SMB2QueryInfoRequest, SMB2QueryInfoResponse
from smbprotocol.security_descriptor import SMB2CreateSDBuffer
class SecurityInfo:
# 100% just pulled from gist example
Owner = 0x00000001
Group = 0x00000002
Dacl = 0x00000004
Sacl = 0x00000008
Label = 0x00000010
Attribute = 0x00000020
Scope = 0x00000040
Backup = 0x00010000
def guid2hex(text_sid):
"""convert the text string SID to a hex encoded string"""
s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
return ''.join(s)
def get_sd(fd, info):
""" Get the Security Descriptor for the opened file. """
query_req = SMB2QueryInfoRequest()
query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
query_req['output_buffer_length'] = 65535
query_req['additional_information'] = info
query_req['file_id'] = fd.file_id
req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
resp = fd.connection.receive(req)
query_resp = SMB2QueryInfoResponse()
query_resp.unpack(resp['data'].get_value())
security_descriptor = SMB2CreateSDBuffer()
security_descriptor.unpack(query_resp['buffer'].get_value())
return security_descriptor
with smbclient.open_file(filePath, mode='rb', buffering=0,
desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
# returns SID
_sid = sd.get_owner()
try:
# Don't forget to convert the SID string-like object to a string
# or you get an error related to "0" not existing
sid = guid2hex(str(_sid))
except:
print(f"Failed to convert SID {_sid} to HEX")
raise
conn.search('DC=dell,DC=com',f"(&(objectSid={sid}))",SUBTREE)
# Will return an empty array if no results are found
return [res['dn'].split(",")[0].replace("CN=","") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user="domain\username", password=password, raise_exceptions=True,authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this one a SINGLE file. So if you wanted to run some kind of ownership scan, then you probably want to just write the program in C++ or some other low-level language instead of trying to use python. But for something quick and dirty this does work. You could also setup a threading pool and run batches. The piece that takes longest is connecting to the file itself, not running the ldap query, so if you can find a more efficient way to do that you'll be golden.
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the results of the above is:
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes which windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information for themselves. For example, if you're looking for Excel this answer provides a method for which to get the creator of any xls or xlsx files (doesn't work for xlsb due to the binary nature of the files). Unfortunately few files store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting type files. However, they don't contain this information (which is good from a privacy perspective).
So in case anyone finds this answer trying to solve the same kind of thing I did - Your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing setup.

Resources