Angr can't solve the googlectf beginner problem - angr

I am a student studying angr, first time.
I'm watching the code in this url.
https://github.com/Dvd848/CTFs/blob/master/2020_GoogleCTF/Beginner.md
import angr
import claripy
FLAG_LEN = 15
STDIN_FD = 0
base_addr = 0x100000 # To match addresses to Ghidra
proj = angr.Project("./a.out", main_opts={'base_addr': base_addr})
flag_chars = [claripy.BVS('flag_%d' % i, 8) for i in range(FLAG_LEN)]
flag = claripy.Concat( *flag_chars + [claripy.BVV(b'\n')]) # Add \n for scanf() to accept the input
state = proj.factory.full_init_state(
args=['./a.out'],
add_options=angr.options.unicorn,
stdin=flag,
)
# Add constraints that all characters are printable
for k in flag_chars:
state.solver.add(k >= ord('!'))
state.solver.add(k <= ord('~'))
simgr = proj.factory.simulation_manager(state)
find_addr = 0x101124 # SUCCESS
avoid_addr = 0x10110d # FAILURE
simgr.explore(find=find_addr, avoid=avoid_addr)
if (len(simgr.found) > 0):
for found in simgr.found:
print(found.posix.dumps(STDIN_FD))
https://github.com/google/google-ctf/tree/master/2020/quals/reversing-beginner/attachments
Which is the answer of googlectf beginner.
But, the above code does not work. It doesn't give me the answer.
I want to know why the code is not working.
When I execute this code, the output was empty.
I run the code with python3 in Ubuntu 20.04 in wsl2
Thank you.

I believe this script isn't printing anything because angr fails to find a solution and then exits. You can prove this by appending the following to your script:
else:
raise Exception('Could not find the solution')
If the exception raises, a valid solution was not found.
In terms of why it doesn't work, this code looks like copy & paste from a few different sources, and so it's fairly convoluted.
For example, the way the flag symbol is passed to stdin is not ideal. By default, stdin is a SimPackets, so it's best to keep it that way.
The following script solves the challenge, I have commented it to help you understand. You will notice that changing stdin=angr.SimPackets(name='stdin', content=[(flag, 15)]) to stdin=flag will cause the script to fail, due to the reason mentioned above.
import angr
import claripy
base = 0x400000 # Default angr base
project = angr.Project("./a.out")
flag = claripy.BVS("flag", 15 * 8) # length is expected in bits here
initial_state = project.factory.full_init_state(
stdin=angr.SimPackets(name='stdin', content=[(flag, 15)]), # provide symbol and length (in bytes)
add_options ={
angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS
}
)
# constrain flag to common alphanumeric / punctuation characters
[initial_state.solver.add(byte >= 0x20, byte <= 0x7f) for byte in flag.chop(8)]
sim = project.factory.simgr(initial_state)
sim.explore(
find=lambda s: b"SUCCESS" in s.posix.dumps(1), # search for a state with this result
avoid=lambda s: b"FAILURE" in s.posix.dumps(1) # states that meet this constraint will be added to the avoid stash
)
if sim.found:
solution_state = sim.found[0]
print(f"[+] Success! Solution is: {solution_state.posix.dumps(0)}") # dump whatever was sent to stdin to reach this state
else:
raise Exception('Could not find the solution') # Tell us if angr failed to find a solution state
A bit of Trivia - there are actually multiple 'solutions' that the program would accept, I guess the CTF flag server only accepts one though.
❯ echo -ne 'CTF{\x00\xe0MD\x17\xd1\x93\x1b\x00n)' | ./a.out
Flag: SUCCESS

Related

From SSH not decoded from bytes to ASCII?

Good afternoon.
I get the example below from SSH:
b"rxmop:moty=rxotg;\x1b[61C\r\nRADIO X-CEIVER ADMINISTRATION\x1b[50C\r\nMANAGED OBJECT DATA\x1b[60C\r\n\x1b[79C\r\nMO\x1b[9;19HRSITE\x1b[9;55HCOMB FHOP MODEL\x1b[8C\r\nRXOTG-58\x1b[10;19H54045_1800\x1b[10;55HHYB"
I process ssh.recv (99999) .decode ('ASCII')
but some characters are not decoded for example:
\x1b[61C
\x1b[50C
\x1b[9;55H
\x1b[9;19H
The article below explains that these are ANSI escape codes that appear since I use invoke_shell. Previously everything worked until it moved to another server.
Is there a simple way to get rid of junk values that come when you SSH using Python's Paramiko library and fetch output from CLI of a remote machine?
When I write to the file, I also get:
rxmop:moty=rxotg;[61C
RADIO X-CEIVER ADMINISTRATION[50C
MANAGED OBJECT DATA[60C
[79C
MO[9;19HRSITE[9;55HCOMB FHOP MODEL[8C
RXOTG-58[10;19H54045_1800[10;55HHYB
If you use PuTTY everything is clear and beautiful.
I can't get away from invoke_shell because the connection is being thrown from one server to another.
Sample code below:
# coding:ascii
import paramiko
port = 22
data = ""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname=host, username=user, password=secret, port=port, timeout=10)
ssh = client.invoke_shell()
ssh.send("rxmop:moty=rxotg;\n")
while data.find("<") == -1:
time.sleep(0.1)
data += ssh.recv(99999).decode('ascii')
ssh.close()
client.close()
f = open('text.txt', 'w')
f.write(data)
f.close()
The normal output is below:
MO RSITE COMB FHOP MODEL
RXOTG-58 54045_1800 HYB BB G12
SWVERREPL SWVERDLD SWVERACT TMODE
B1314R081D TDM
CONFMD CONFACT TRACO ABISALLOC CLUSTERID SCGR
NODEL 4 POOL FLEXIBLE
DAMRCR CLTGINST CCCHCMD SWVERCHG
NORMAL UNLOCKED
PTA JBSDL PAL JBPTA
TGFID SIGDEL BSSWANTED PACKALG
H'0001-19B3 NORMAL
What can you recommend in order to return normal output, so that all characters are processed?
Regular expressions do not help, since the structure of the record is shifted, then characters from certain positions are selected in the code.
PS try to use ssh.invoke_shell (term='xterm') don't work.
There is an answer here:
How can I remove the ANSI escape sequences from a string in python
There are other ways...
https://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output
Essentially, you are 'screen-scraping' input, and you need to strip the ANSI codes. So, grab the input, and then strip the codes.
import re
... (your ssh connection here)
data = ""
while data.find("<") == -1:
time.sleep(0.1)
chunk = ssh.recv(99999)
data += chunk
... (your ssh connection cleanup here)
ansi_escape = re.compile(r'\x1B(?:[#-Z\\-_]|\[[0-?]*[ -/]*[#-~])')
data = ansi_escape.sub('', data)

IPv4Network in Python - Calculating next minimum subnet of different size?

I'm working with the ipaddress module in Python and trying to figure out a way of calculating the next available subnet (of either the same prefix or a different prefix) that doesn't overlap the existing subnet (the new subnet MUST be greater than the old one).
Lets say I start with network:
from ipaddress import IPv4Network
# From 10.90.1.0 to 10.90.1.31
main_net = IPv4Network("10.90.1.0/27")
I know the next available address is going to be 10.90.1.32, I can even figure this out quite easily by doing:
next_ip = main_net.broadcast_address + 1
# will output 10.90.1.32
print(next_ip)
If I wanted to find the next /27, I just create a new network like so:
# From 10.90.1.32 to 10.90.1.63
new_net = IPv4Network(f"{next_ip}/27")
This is all very straightforward so far, but now what if the next subnet I am looking for is a /26 or a /28 - how can I find the next minimum start IP address for either of these cases in Python?
I have explored using the supernet method, for example I could do something like this:
# Will print 10.90.1.0/26
print(main_net.supernet(new_prefix=27))
The problem with this method is that it will print 10.90.1.0/26 which overlaps the existing 10.90.1.0/27 network, I could make a loop and get it to keep generated the next /26 until they stop overlapping, but it seems inefficient to me. Surely, there is a better way?
Thanks to the help of Ron Maupin's helpful comment leading to a useful guide, I have managed to make a function that does this. I still want to test it a bit more but believe it is correct:
def calculate_next_ip_network(ip_bytes, current_prefix, next_prefix):
next_prefix_mask = (~((1 << (32 - next_prefix)) - 1)) & 0xFFFFFFFF
if next_prefix <= current_prefix:
bit_shift = 32 - next_prefix
else:
bit_shift = 32 - current_prefix
new_ip = (((next_prefix_mask & ip_bytes) >> bit_shift) + 1) << bit_shift
return bytes([new_ip >> i & 0xFF for i in (24, 16, 8, 0)])
Usage:
nt = IPv4Network("10.90.1.56/29")
current_prefix = nt.prefixlen
next_prefix = 25
ip_bytes = int.from_bytes(nt.network_address.packed, byteorder="big")
next_ip = calculate_next_ip_network(ip_bytes, current_prefix, next_prefix)
print(IPv4Address(next_ip))
# Should print "10.90.1.128"

Getting owner of file from smb share, by using python on linux

I need to find out for a script I'm writing who is the true owner of a file in an smb share (mounted using mount -t cifs of course on my server and using net use through windows machines).
Turns out it is a real challenge finding this information out using python on a linux server.
I tried using many many smb libraries (such as smbprotocol, smbclient and others), nothing worked.
I find few solutions for windows, they all use pywin32 or another windows specific package.
And I also managed to do it from bash using smbcalcs but couldn't do it cleanly but using subprocess.popen('smbcacls')..
Any idea on how to solve it?
This was unbelievably not a trivial task, and unfortunately the answer isn't simple as I hoped it would be..
I'm posting this answer if someone will be stuck with this same problem in the future, but hope maybe someone would post a better solution earlier
In order to find the owner I used this library with its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain=<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that pysmb package will only give you the owner's SID and not his name.
In order to get his name you need to make an ldap query like in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing will be easy, it took me hours to find a way to convert a string SID to binary SID in python, and in the end this solved it:
# posting the needed functions and omitting the class part
def byte(strsid):
'''
Convert a SID into bytes
strdsid - SID to convert into bytes
'''
sid = str.split(strsid, '-')
ret = bytearray()
sid.remove('S')
for i in range(len(sid)):
sid[i] = int(sid[i])
sid.insert(1, len(sid)-2)
ret += longToByte(sid[0], size=1)
ret += longToByte(sid[1], size=1)
ret += longToByte(sid[2], False, 6)
for i in range(3, len(sid)):
ret += cls.longToByte(sid[i])
return ret
def byteToLong(byte, little_endian=True):
'''
Convert bytes into a Python integer
byte - bytes to convert
little_endian - True (default) or False for little or big endian
'''
if len(byte) > 8:
raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
else:
if little_endian:
a = byte.ljust(8, b'\x00')
return struct.unpack('<q', a)[0]
else:
a = byte.rjust(8, b'\x00')
return struct.unpack('>q', a)[0]
... AND finally you have the full solution! enjoy :(
I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the github issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly. An extremely stripped down version
However, this also just retrieves a SID and will need a secondary library to perform a lookup.
Here's a function to get the owner SID (just wrapped what's in the gist in a function. Including here in case the gist is deleted or lost for any reason).
import smbclient
from ldap3 import Server, Connection, ALL,NTLM,SUBTREE
def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
from smbprotocol.file_info import InfoType
from smbprotocol.open import FilePipePrinterAccessMask,SMB2QueryInfoRequest, SMB2QueryInfoResponse
from smbprotocol.security_descriptor import SMB2CreateSDBuffer
class SecurityInfo:
# 100% just pulled from gist example
Owner = 0x00000001
Group = 0x00000002
Dacl = 0x00000004
Sacl = 0x00000008
Label = 0x00000010
Attribute = 0x00000020
Scope = 0x00000040
Backup = 0x00010000
def guid2hex(text_sid):
"""convert the text string SID to a hex encoded string"""
s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
return ''.join(s)
def get_sd(fd, info):
""" Get the Security Descriptor for the opened file. """
query_req = SMB2QueryInfoRequest()
query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
query_req['output_buffer_length'] = 65535
query_req['additional_information'] = info
query_req['file_id'] = fd.file_id
req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
resp = fd.connection.receive(req)
query_resp = SMB2QueryInfoResponse()
query_resp.unpack(resp['data'].get_value())
security_descriptor = SMB2CreateSDBuffer()
security_descriptor.unpack(query_resp['buffer'].get_value())
return security_descriptor
with smbclient.open_file(filePath, mode='rb', buffering=0,
desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
# returns SID
_sid = sd.get_owner()
try:
# Don't forget to convert the SID string-like object to a string
# or you get an error related to "0" not existing
sid = guid2hex(str(_sid))
except:
print(f"Failed to convert SID {_sid} to HEX")
raise
conn.search('DC=dell,DC=com',f"(&(objectSid={sid}))",SUBTREE)
# Will return an empty array if no results are found
return [res['dn'].split(",")[0].replace("CN=","") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user="domain\username", password=password, raise_exceptions=True,authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this one a SINGLE file. So if you wanted to run some kind of ownership scan, then you probably want to just write the program in C++ or some other low-level language instead of trying to use python. But for something quick and dirty this does work. You could also setup a threading pool and run batches. The piece that takes longest is connecting to the file itself, not running the ldap query, so if you can find a more efficient way to do that you'll be golden.
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the results of the above is:
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes which windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information for themselves. For example, if you're looking for Excel this answer provides a method for which to get the creator of any xls or xlsx files (doesn't work for xlsb due to the binary nature of the files). Unfortunately few files store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting type files. However, they don't contain this information (which is good from a privacy perspective).
So in case anyone finds this answer trying to solve the same kind of thing I did - Your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing setup.

snakemake - replacing wildcards in input directive by anonymous function

I am writing a snakemake that will run a bioinformatics pipeline for several input samples. These input files (two for each analysis, one with the partial string match R1 and the second with the partial string match R2) start with a pattern and end with the extension .fastq.gz. Eventually I want to perform multiple operations, though, for this example I just want to align the fastq reads against a reference genome using bwa mem. So for this example my input file is NIPT-N2002394-LL_S19_R1_001.fastq.gz and I want to generate NIPT-N2002394-LL.bam (see code below specifying the directories where input and output are).
My config.yaml file looks like so:
# Run_ID
run: "200311_A00154_0454_AHHHKMDRXX"
# Base directory: the analysis directory from which I will fetch the samples
bd: "/nexusb/nipt/"
# Define the prefix
# will be used to subset the folders in bd
prefix: "NIPT"
# Reference:
ref: "/nexus/bhinckel/19/ONT_projects/PGD_breakpoint/ref_hg19_local/hg19_chr1-y.fasta"
And below is my snakefile
import os
import re
#############
# config file
#############
configfile: "config.yaml"
#######################################
# Parsing variables from config.yaml
#######################################
RUN = config['run']
BD = config['bd']
PREFIX = config['prefix']
FQDIR = f'/nexusb/Novaseq/{RUN}/Unaligned/'
BASEDIR = BD + RUN
SAMPLES = [sample for sample in os.listdir(BASEDIR) if sample.startswith(PREFIX)]
# explanation: in BASEDIR I have multiple subdirectories. The names of the subdirectories starting with PREFIX will be the name of the elements I want to have in the list SAMPLES, which eventually shall be my {sample} wildcard
#############
# RULES
#############
rule all:
input:
expand("aligned/{sample}.bam", sample = SAMPLES)
rule bwa_map:
input:
REF = config['ref'],
R1 = FQDIR + "{sample}_S{s}_R1_001.fastq.gz",
R2 = FQDIR + "{sample}_S{s}_R2_001.fastq.gz"
output:
"aligned/{sample}.bam"
shell:
"bwa mem {input.REF} {input.R1} {input.R2}| samtools view -Sb - > {output}"
But I am getting:
Building DAG of jobs...
WildcardError in line 55 of /nexusb/nipt/200311_A00154_0454_AHHHKMDRXX/testMetrics/snakemake/Snakefile:
Wildcards in input files cannot be determined from output files:
's'
When calling snakemake -np
I believe my error lies in the definitions of R1 and R2 in the input directive. I find it puzzling because according to the official documentation snakemake should interpret any wildcard as the regex .+. But it is not doing that for sample NIPT-PearlPPlasma-05-PPx, whose R1 and R2 should be NIPT-PearlPPlasma-05-PPx_S5_R1_001.fastq.gz and NIPT-PearlPPlasma-05-PPx_S5_R2_001.fastq.gz, respectively.
Take a look again at the snakemake tutorial on how input is inferred from output, anyways I think the problem lies in this piece of code:
output:
expand("aligned/{sample}.bam", sample = SAMPLES)
And needs to be changed into
output:
"aligned/{sample}.bam"
What you had didn't work because before expand("aligned/{sample}.bam", sample = SAMPLES) basically becomes a list like this ["aligned/sample0.bam","aligned/sample1.bam"]. When you remove the expand, you only give a "description" of how the output should look like, and thus snakemake can infer the wildcards and input.
edit:
It's difficult to test it since I don't have the actual files, but you should do something like this. Won't work if multiple S-thingies exist.
def get_reads(wildcards):
R1 = FQDIR + f"{wildcards.sample}_S{{s}}_R1_001.fastq.gz"
R2 = FQDIR + f"{wildcards.sample}_S{{s}}_R2_001.fastq.gz"
globbed = glob_wildcards(R1)
R1, R2 = expand([R1, R2], s=globbed.s)
return {"R1": R1, "R2": R2}
rule bwa_map:
input:
unpack(get_reads),
REF = config['ref']
output:
"aligned/{sample}.bam"
shell:
"bwa mem {input.REF} {input.R1} {input.R2}| samtools view -Sb - > {output}"
The problem is here:
rule bwa_map:
input:
REF = config['ref'],
R1 = FQDIR + "{sample}_S{s}_R1_001.fastq.gz",
R2 = FQDIR + "{sample}_S{s}_R2_001.fastq.gz"
output:
"aligned/{sample}.bam"
Your output clearly defines a pattern where {sample} is a wildcard. When Snakemake builds the DAG and finds that any other rule requires a file that matches this pattern, it sets a concrete value to the wildcard.sample. At this moment all the inputs shall be defined, but you are introducing one more level of indirection: the wildcard {s} which is not defined.
The value of {s} shall be clearly inferred from the output. If you can do it in design time, substitute the it with the concrete values, otherwise you may use checkpoint feature of Snakemake.

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
"AGGGC",
"AGGGGGC",
"AGGAGC",
"AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibilty with AlignIO
ioSeq = ''
for items in padded_sequences:
ioSeq += '>unknown\n'
ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this results due to the arrows ('<' and '>') present in the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
I can't get MafftCommandline() to read internal files made by
io.StringIO().
This is not surprising for a couple of reasons:
As you're aware, Biopython doesn't implement Mafft, it simply
provides a convenient interface to setup a call to mafft in
/usr/local/bin. The mafft executable runs as a separate process
that does not have access to your Python program's internal memory,
including your StringIO file.
The mafft program only works with an input file, it doesn't even
allow stdin as a data source. (Though it does allow stdout as a
data sink.) So ultimately, there must be a file in the file system
for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.

Resources