Recursively getting body of email with Pyzmail module - python-3.x

I'm trying to create an app that needs to recursively check an email address for new emails and then do some other stuff; I'm having some problems with getting the body of the emails, though. I'm using the pyzmail module alongside imapclient, and Automate the Boring Stuff for guidance (with Python 3.6). Here's my code:
mail = imapclient.IMAPClient('imap.gmail.com', ssl=True)
mail.login('email', 'password')
mail.select_folder('INBOX', readonly=False)
uid = mail.gmail_search('NC')
for i in uid:
    message = mail.fetch(i, ['BODY[]'], 'FLAGS')
    msg = pyzmail.PyzMessage.factory(message[i][b'BODY[]'])
    msg.html_part.get_payload().decode(msg.text_part.charset)
But it's not working. I've tried different forms of this code, to no avail, and there really aren't many examples that can help me along. I'm a bit of a Python newbie. Can anybody help?
Thanks,
EDIT
I realized where I made a mistake and fixed a bit of the code:
server = imapclient.IMAPClient('imap.gmail.com', ssl=True)
server.login('p.imagery.serv#gmail.com', 'password')
server.select_folder('INBOX', readonly=True)
uids = server.gmail_search('NC')
for i in uids:
    messages = server.fetch(i, ['BODY[]'])
    msg = pyzmail.PyzMessage.factory(messages[b'BODY[]'])
The problem I'm having is with the last line: I don't know how to feed it the variable created by the loop. It throws this error:
ValueError: input must be a string a bytes, a file or a Message

I'm not sure if you still have this problem, but this is for those who might have similar issues in the future.
I noticed a small omission in the last line, which might be the culprit.
msg = pyzmail.PyzMessage.factory(messages[b'BODY[]'])
You omitted the loop variable i. fetch() returns a dict keyed by message UID, so you have to index it with i first:
msg = pyzmail.PyzMessage.factory(messages[i][b'BODY[]'])
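Here is why the i matters. In the IMAPClient versions I'm familiar with, fetch() returns a dict keyed by message UID, and each value is another dict keyed by bytes data-item names such as b'BODY[]'. The outer mapping appears to be a defaultdict, so messages[b'BODY[]'] quietly yields an empty dict instead of raising KeyError, and factory() then rejects that dict with the ValueError above. The sketch below pulls the raw bodies out of a result shaped that way. raw_bodies is a hypothetical helper name, and the fake dict only mimics the shape of a real fetch() response.

```python
from collections import defaultdict


def raw_bodies(fetch_result):
    """Return (uid, raw_message_bytes) pairs from an IMAPClient-style fetch() result.

    Assumes the shape {uid: {b'BODY[]': b'...raw RFC 822 message...'}}.
    """
    pairs = []
    for uid, data in fetch_result.items():
        # index by UID first, then by the bytes data-item name
        pairs.append((uid, data[b'BODY[]']))
    return pairs


# A fake result with the same shape as server.fetch(uids, ['BODY[]'])
fake = defaultdict(dict)
fake[101][b'BODY[]'] = b'Subject: hello\r\n\r\nHi there\r\n'

for uid, raw in raw_bodies(fake):
    print(uid, raw)
```

Each raw bytes value is what you would pass to pyzmail.PyzMessage.factory().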

I'd do the following to get the body text of the searched messages:
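import imapclient
import pyzmail

server = imapclient.IMAPClient('imap.gmail.com', ssl=True)
server.login('p.imagery.serv#gmail.com', 'password')
server.select_folder('INBOX', readonly=True)
uids = server.gmail_search('NC')
rawmessage = server.fetch(uids, ['BODY[]'])
for i in rawmessage:
    msg = pyzmail.PyzMessage.factory(rawmessage[i][b'BODY[]'])
    # decode the HTML part with its own charset (html_part is None for plain-text-only mail;
    # falling back to UTF-8 when no charset is declared is an assumption)
    if msg.html_part is not None:
        print(msg.html_part.get_payload().decode(msg.html_part.charset or 'utf-8'))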
This iterates over the fetched emails and gives you their body text. I tested a similar example, but I used text_part.get_payload() instead of html_part because of how my server's messages are formatted.
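Whether text_part or html_part exists depends on the message, so a small helper can try one and then the other, decoding each part with that part's own charset. This is a sketch: get_body_text is a hypothetical name, and it relies only on the text_part, html_part, get_payload() and charset attributes already used above. The UTF-8 fallback for parts that declare no charset is an assumption.

```python
def get_body_text(msg, prefer="text"):
    """Return the decoded body of a pyzmail PyzMessage, or None if it has no body part.

    Tries text_part first (html_part first when prefer="html") and decodes each
    part with that part's own charset.
    """
    parts = [msg.text_part, msg.html_part]
    if prefer == "html":
        parts.reverse()
    for part in parts:
        if part is not None:
            # Assumption: fall back to UTF-8 when the part declares no charset
            charset = part.charset or "utf-8"
            return part.get_payload().decode(charset, errors="replace")
    return None
```

Inside the loop, after msg = pyzmail.PyzMessage.factory(...), you would call print(get_body_text(msg)). Pass prefer="html" if your messages are HTML-only.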

Related

Getting the owner of a file on an SMB share using Python on Linux

For a script I'm writing, I need to find out who the true owner of a file on an SMB share is (the share is mounted with mount -t cifs on my server, and with net use on Windows machines).
It turns out that finding this out with Python on a Linux server is a real challenge.
I tried many SMB libraries (such as smbprotocol, smbclient and others); none of them worked.
I found a few solutions for Windows, but they all use pywin32 or another Windows-specific package.
I also managed to do it from bash using smbcacls, but from Python I couldn't do it cleanly, only by calling subprocess.Popen('smbcacls').
Any idea on how to solve it?
This was an unbelievably non-trivial task, and unfortunately the answer isn't as simple as I hoped it would be.
I'm posting this answer in case someone gets stuck with the same problem in the future, but I hope someone posts a better solution.
To find the owner, I used the pysmb library and its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain='<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that the pysmb package only gives you the owner's SID, not their name.
To get the name, you need to make an LDAP query, as in this answer (code reposted here):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing is easy: it took me hours to find a way to convert a string SID to a binary SID in Python, and in the end this solved it:
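# posting the needed functions and omitting the class part
import struct

def longToByte(integer, little_endian=True, size=4):
    '''
    Convert a Python integer into bytes
    (longToByte was referenced but not posted; this is a minimal version)
    integer - integer to convert
    little_endian - True (default) or False for little or big endian
    size - number of bytes to produce (default 4)
    '''
    return integer.to_bytes(size, 'little' if little_endian else 'big')

def byte(strsid):
    '''
    Convert a SID into bytes
    strsid - SID to convert into bytes
    '''
    sid = str.split(strsid, '-')
    ret = bytearray()
    sid.remove('S')
    for i in range(len(sid)):
        sid[i] = int(sid[i])
    # insert the sub-authority count right after the revision
    sid.insert(1, len(sid)-2)
    ret += longToByte(sid[0], size=1)    # revision
    ret += longToByte(sid[1], size=1)    # sub-authority count
    ret += longToByte(sid[2], False, 6)  # identifier authority, 6 bytes big endian
    for i in range(3, len(sid)):
        ret += longToByte(sid[i])        # sub-authorities, 4 bytes little endian
    return bytes(ret)

def byteToLong(byte, little_endian=True):
    '''
    Convert bytes into a Python integer
    byte - bytes to convert
    little_endian - True (default) or False for little or big endian
    '''
    if len(byte) > 8:
        raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
    else:
        # unsigned 'Q' so 8-byte values with the top bit set don't come back negative
        if little_endian:
            a = byte.ljust(8, b'\x00')
            return struct.unpack('<Q', a)[0]
        else:
            a = byte.rjust(8, b'\x00')
            return struct.unpack('>Q', a)[0]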
...And finally, you have the full solution! Enjoy :(
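The binary SID layout that byte() builds is: 1 byte of revision, 1 byte of sub-authority count, a 6-byte big-endian identifier authority, then each sub-authority as a 4-byte little-endian integer. Below is a compact, standard-library-only sketch of the same conversion. It also includes the inverse, which is useful when LDAP hands you objectSid bytes, and the backslash-hex escaping used for binary values in LDAP filters. The function names are hypothetical, and the sketch assumes well-formed "S-R-A-…" strings.

```python
import struct


def sid_to_bytes(sid_str):
    """'S-1-5-32-544' -> binary SID (revision, count, authority, sub-authorities)."""
    parts = sid_str.split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a SID string: {sid_str!r}")
    revision, authority = int(parts[1]), int(parts[2])
    subs = [int(p) for p in parts[3:]]
    return (
        struct.pack("<BB", revision, len(subs))
        + authority.to_bytes(6, "big")              # identifier authority is big endian
        + struct.pack(f"<{len(subs)}I", *subs)      # sub-authorities are little endian
    )


def bytes_to_sid(raw):
    """Binary SID -> 'S-1-5-...' string (inverse of sid_to_bytes)."""
    revision, count = raw[0], raw[1]
    authority = int.from_bytes(raw[2:8], "big")
    subs = struct.unpack_from(f"<{count}I", raw, 8)
    return "S-" + "-".join(str(v) for v in (revision, authority, *subs))


def ldap_escape_bytes(raw):
    """Escape every byte as \\xx so binary values can go into an LDAP filter."""
    return "".join(f"\\{b:02x}" for b in raw)


# e.g. a filter for the c.search(...) call above
search_filter = "(objectSid=" + ldap_escape_bytes(sid_to_bytes("S-1-5-32-544")) + ")"
print(search_filter)
```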
I'm adding this answer to point out the option of using smbprotocol, and to expand on some terminology that is easy to misunderstand.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the GitHub issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly; a heavily stripped-down version of it follows.
However, this also just retrieves a SID, and you need a second library to look up the name.
Here's a function that gets the owner SID and looks it up via LDAP. I just wrapped what's in the gist in a function, and I'm including it here in case the gist is deleted or lost for any reason.
import smbclient
from ldap3 import Server, Connection, ALL, NTLM, SUBTREE

def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
    from smbprotocol.file_info import InfoType
    from smbprotocol.open import FilePipePrinterAccessMask, SMB2QueryInfoRequest, SMB2QueryInfoResponse
    from smbprotocol.security_descriptor import SMB2CreateSDBuffer

    class SecurityInfo:
        # 100% just pulled from gist example
        Owner = 0x00000001
        Group = 0x00000002
        Dacl = 0x00000004
        Sacl = 0x00000008
        Label = 0x00000010
        Attribute = 0x00000020
        Scope = 0x00000040
        Backup = 0x00010000

    def guid2hex(text_sid):
        """convert the text string SID to a hex encoded string"""
        s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
        return ''.join(s)

    def get_sd(fd, info):
        """ Get the Security Descriptor for the opened file. """
        query_req = SMB2QueryInfoRequest()
        query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
        query_req['output_buffer_length'] = 65535
        query_req['additional_information'] = info
        query_req['file_id'] = fd.file_id
        req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
        resp = fd.connection.receive(req)
        query_resp = SMB2QueryInfoResponse()
        query_resp.unpack(resp['data'].get_value())
        security_descriptor = SMB2CreateSDBuffer()
        security_descriptor.unpack(query_resp['buffer'].get_value())
        return security_descriptor

    with smbclient.open_file(filePath, mode='rb', buffering=0,
                             desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
        sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
        # returns SID
        _sid = sd.get_owner()
    try:
        # Don't forget to convert the SID string-like object to a string
        # or you get an error related to "0" not existing
        sid = guid2hex(str(_sid))
    except:
        print(f"Failed to convert SID {_sid} to HEX")
        raise
    conn.search('DC=dell,DC=com', f"(&(objectSid={sid}))", SUBTREE)
    # Will return an empty array if no results are found
    return [res['dn'].split(",")[0].replace("CN=", "") for res in conn.response if 'dn' in res]
To use it:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
# raw string: "\u" in a normal string literal is a SyntaxError in Python 3
conn = Connection(server, user=r"domain\username", password=password, raise_exceptions=True, authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this on a SINGLE file. So if you wanted to run some kind of ownership scan, you'd probably want to write the program in C++ or some other low-level language instead of using Python. But for something quick and dirty, this does work. You could also set up a thread pool and run batches. The piece that takes longest is connecting to the file itself, not running the LDAP query, so if you can find a more efficient way to do that, you'll be golden.
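Here is a minimal sketch of the thread-pool idea. It assumes lookup_owner is any function that takes a path and returns its owner, for example a lambda wrapping getFileOwner above. scan_owners is a hypothetical name. Because the slow part is network I/O rather than CPU, threads should overlap the waiting despite the GIL. One caution: I'm not sure a single ldap3 Connection is safe to share between threads with its default strategy. Recent ldap3 versions document a SAFE_SYNC client strategy for that case, or you can give each worker its own connection.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_owners(paths, lookup_owner, max_workers=8):
    """Run lookup_owner(path) for many paths concurrently.

    Returns {path: result}. If a lookup raises, the exception object is stored
    as that path's result so one bad file doesn't stop the whole scan.
    """
    paths = list(paths)

    def safe_lookup(path):
        try:
            return lookup_owner(path)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as paths
        return dict(zip(paths, pool.map(safe_lookup, paths)))
```

Usage would look like owners = scan_owners(file_list, lambda p: getFileOwner(smbclient, conn, p)), where file_list is your own list of UNC paths.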
Terminology Warning, Owner != Creator/Author
Last note on this: Owner != File Author. Many domain environments, and SMB shares in particular, automatically change ownership from the creator to a group. In my case, the result of the above is:
What I was actually looking for was the creator of the file. Windows doesn't track a file's creator or modifier by default. An administrator can enable policies to audit file changes on a share, or auditing can be enabled on a file-by-file basis using the Security -> Advanced -> Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information themselves. For example, if you're looking at Excel files, this answer provides a method to get the creator of any xls or xlsx file (it doesn't work for xlsb due to the binary nature of those files). Unfortunately, few file types store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting-type files; however, they don't contain it (which is good from a privacy perspective).
So, in case anyone finds this answer while trying to solve the same kind of problem I did: your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing set up.

How to check if an email is part of a conversation?

Our company has a Customer Service (CS) process where, after a client reports an error with our software, we get an email about their complaint with a generic subject ("User Submitted Error") and a short description of the error. We then fix the problem and email the client back. An issue may involve one or several emails back and forth between our CS department and the client.
My Python script uses the win32com module to pull emails from Outlook and put them into a DataFrame, with each row as an entry for a unique reported error. After reading (https://learn.microsoft.com/en-us/office/vba/api/outlook.mailitem.conversationid), I decided to go with message.ConversationID. However, because of the generic email subject, Outlook sometimes groups unrelated emails together, which makes ConversationID neither unique nor useful to me.
Can someone give me some guidance on how best to tackle this issue?
import datetime

import pywintypes
import win32com.client

outlook = win32com.client.Dispatch('Outlook.Application').GetNamespace('MAPI')

def message_to_row(message, year, start_month, end_month):  # Process each email into row of information
    message_time = message.ReceivedTime
    winrec_time = message.ReceivedTime
    rec_time = pywintypes.Time(winrec_time)
    rec_year = rec_time.year
    rec_month = rec_time.month
    rec_day = rec_time.day
    rec_time_string = str(rec_time.hour) + ":" + str(rec_time.minute)
    rec_time = format(datetime.datetime.strptime(rec_time_string, "%H:%M"), "%H:%M")
    # returns None when the message falls outside the requested year/month range
    if rec_year >= year:
        if rec_month in range(start_month, end_month):
            convo_id = message.ConversationID
            message_body = message.body.replace("_", " ")
            row = [convo_id, rec_year, rec_month,
                   rec_day, rec_time, message_body]
            return row
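One alternative worth testing, which the question doesn't try, is to ignore the subject entirely and link messages through the standard Message-ID, In-Reply-To and References headers. In Outlook, the raw Internet headers are commonly read with message.PropertyAccessor.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x007D001F") (PR_TRANSPORT_MESSAGE_HEADERS). Treat that as an assumption to verify, since the property can be empty for mail that never passed through SMTP. The sketch below only does the grouping, using a small union-find. group_into_threads is a hypothetical name, and the grouping will only separate tickets if the clients' mail programs actually set these headers when replying.

```python
import re
from email.parser import HeaderParser

# Matches one msg-id such as <abc123@mail.example.com>
_MSG_ID = re.compile(r"<[^<>\s]+>")


def group_into_threads(raw_headers):
    """Assign a thread number to each message based on its reply headers.

    raw_headers: list of raw RFC 5322 header blocks (one string per message).
    Messages are linked when one's In-Reply-To/References names another's
    Message-ID; the Subject is not used at all.
    Returns a list of ints, one per input message, numbered by first appearance.
    """
    parent = {}

    def find(key):
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    own_keys = []
    for i, raw in enumerate(raw_headers):
        headers = HeaderParser().parsestr(raw)
        ids = _MSG_ID.findall(str(headers.get("Message-ID", "")))
        own = ids[0] if ids else f"<no-id-{i}>"  # hypothetical placeholder key
        own_keys.append(own)
        find(own)
        for name in ("In-Reply-To", "References"):
            for ref in _MSG_ID.findall(str(headers.get(name, ""))):
                union(ref, own)

    labels = {}
    return [labels.setdefault(find(key), len(labels)) for key in own_keys]
```

The returned numbers could then be stored as an extra column next to convo_id in each row.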

Sending mail with Python's smtplib returns "501 5.1.3 Invalid address"

Consider this code extract:
import smtplib
from email.message import EmailMessage
body = "some content"
email = EmailMessage()
email.set_content(body, subtype='html')
to = "you#work.com"
email['From'] = "me#work.com"
email['To'] = to
email['Cc'] = ""
email['Bcc'] = ""
email['Subject'] = "Hello"
smtp_connection = smtplib.SMTP("smtp.work.com", 25)
status = smtp_connection.send_message(email)
print(str(status))
print(to)
When I run the code, the mail actually arrives correctly at the destination, but the first print statement outputs this: {'': (501, b'5.1.3 Invalid address')}
I've seen other posts on the internet with a similar error message, where malformed recipient addresses caused the message not to be delivered, but in my case the emails do get delivered correctly. I've also made sure that the email address printed by the last print statement is correct.
Any input on how to debug this further will be appreciated.
I believe I found the answer. It looks like the empty Cc and Bcc values cause the error; when I removed them, the error message went away.
If your Cc and Bcc values are also empty, there is no need to set them on the message (email['Cc'], email['Bcc']) at all.
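That fix can be sketched as a helper that only sets Cc and Bcc when they have a value. My understanding of the cause is hedged. When send_message() is called without explicit recipients, it collects addresses from the To, Cc and Bcc headers, so an empty header can end up as an empty address '' that the server refuses with 501. Refused recipients are reported in the returned dict while the valid ones still get the mail, which would match the {'': (501, ...)} result. build_message is a hypothetical helper name.

```python
from email.message import EmailMessage


def build_message(sender, to, subject, html_body, cc="", bcc=""):
    """Build an EmailMessage, adding Cc/Bcc headers only when they have a value."""
    msg = EmailMessage()
    msg.set_content(html_body, subtype="html")
    msg["From"] = sender
    msg["To"] = to
    if cc:  # an empty Cc header could otherwise become an empty recipient address
        msg["Cc"] = cc
    if bcc:
        msg["Bcc"] = bcc
    msg["Subject"] = subject
    return msg
```

You would then send it with smtp_connection.send_message(build_message("me#work.com", to, "Hello", body)), and the returned dict of refused recipients should be empty when every recipient is accepted.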

Obtain the desired value from the output of a method in Python

I use a method from the Telethon library (Python 3):
"client(GetMessagesRequest(peers,[pinnedMsgId]))"
This returns:
ChannelMessages(pts=41065, count=0, messages=[Message(out=False, mentioned=False,
media_unread=False, silent=False, post=False, id=20465, from_id=111104071,
to_id=PeerChannel(channel_id=1111111111), fwd_from=None, via_bot_id=None,
reply_to_msg_id=None, date=datetime.utcfromtimestamp(1517325331),
message=' test message test', media=None, reply_markup=None,
entities=[], views=None, edit_date=None, post_author=None, grouped_id=None)],
chats=[Channel(creator=..............
I only need the text of the message: test message test
How can I get that alone?
The Telethon team says:
"This is not related to the library. You just need more Python knowledge so ask somewhere else"
Thanks
Assuming you have saved the return value in some variable, say, result = client(...), you can access members of any instance through the dot operator:
result = client(...)
message = result.messages[0]
The [0] is a way to access the first element of a list (see the documentation for __getitem__). Since you want the text...:
text = message.message
Should do the trick.
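You can try the same dot-and-index access without Telegram at all. The classes below are hypothetical stand-ins that only mimic the two attributes used above (messages and message); they are not Telethon's real types. Checking for an empty messages list avoids an IndexError if the pinned message can't be fetched.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for Telethon's objects; only the attributes used here exist
@dataclass
class Message:
    id: int
    message: str


@dataclass
class ChannelMessages:
    messages: List[Message] = field(default_factory=list)


def message_texts(result):
    """Return the text of every message in a ChannelMessages-like result."""
    return [m.message for m in result.messages]


def first_text(result):
    """Return the first message's text, or None when no message came back."""
    return result.messages[0].message if result.messages else None


result = ChannelMessages(messages=[Message(id=20465, message=' test message test')])
print(first_text(result))
```

Add .strip() if you don't want the leading space that appears in the original output.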

How to get all users in an organization on GitHub?

I'm trying to get all users in my organization on GitHub. I can get users, but I have a problem with pagination: I don't know how many pages I have to go through.
curl -i -s -u "user:pass" 'https://api.github.com/orgs/:org/members?page=1&per_page=100'
Of course, I can iterate over all the pages until a request returns no results, but I don't think this is a very good idea.
Does GitHub have a standard method for getting all users in an organization?
According to Traversing with Pagination, there should be a Link response header, such as:
Link: <https://api.github.com/orgs/:org/members?page=2>; rel="next", <https://api.github.com/orgs/:org/members?page=3>; rel="last"
These headers should give you all the information needed so that you can continue getting pages.
For performance reasons, I do not think that any API exists to bypass pagination.
Here is my github-users script for instance:
#!/usr/bin/env ruby
require 'octokit'

Octokit.configure do |c|
  c.login = '....'
  c.password = '...'
end

get = Octokit.org(ARGV.first).rels[:public_members].get
members = get.data
urls = members.map(&:url)
while members.size > 0
  next_url = get.rels[:next]
  break unless next_url  # no "next" link: this was the last page
  get = next_url.get
  members = get.data
  urls.concat(members.map(&:url))
end
puts urls
e.g. github-members stackexchange gives:
https://api.github.com/users/JasonPunyon
https://api.github.com/users/JonHMChan
https://api.github.com/users/NickCraver
https://api.github.com/users/NickLarsen
https://api.github.com/users/PeterGrace
https://api.github.com/users/bretcope
https://api.github.com/users/captncraig
https://api.github.com/users/df07
https://api.github.com/users/dixon
https://api.github.com/users/gdalgas
https://api.github.com/users/haneytron
https://api.github.com/users/jc4p
https://api.github.com/users/kevin-montrose
https://api.github.com/users/kirtithorat
https://api.github.com/users/kylebrandt
https://api.github.com/users/momow
https://api.github.com/users/ocoster-se
https://api.github.com/users/robertaarcoverde
https://api.github.com/users/rossipedia
https://api.github.com/users/shanemadden
https://api.github.com/users/sklivvz
If you are using Python and the requests library, you can read the Link response header to get the last page number (GitHub omits the rel="last" link when there is only one page):
from urllib.parse import parse_qs, urlparse
last_page_num = int(parse_qs(urlparse(r.links["last"]["url"]).query)["page"][0])
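If you'd rather not depend on requests' r.links, you can parse the Link header with the standard library alone. Reading the page number from the URL's query string is sturdier than splitting on '&page=', which breaks when page is the first query parameter (as in the header above) or has more than one digit. This is a sketch that assumes each entry carries a single rel value, as GitHub's headers do; the function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# One entry of a Link header: <url>; rel="name"
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map rel names ("next", "last", ...) to URLs from an HTTP Link header."""
    return {rel: url for url, rel in _LINK_ENTRY.findall(value)}


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or None if there is none."""
    last_url = parse_link_header(link_header).get("last")
    if last_url is None:
        return None  # no "last" link: already on the last (or only) page
    return int(parse_qs(urlparse(last_url).query)["page"][0])


link = ('<https://api.github.com/orgs/:org/members?page=2>; rel="next", '
        '<https://api.github.com/orgs/:org/members?page=3>; rel="last"')
print(parse_link_header(link)["next"])
print(last_page_number(link))
```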
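You can also use PyGithub, a Python library implementing the GitHub API v3 (https://pygithub.readthedocs.io/en/latest/). Its get_members() result is paginated lazily, so iterating over it fetches the following pages for you: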
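from github import Github

g = Github("ghp_your-github-token")
# member.name and member.email may trigger an extra API request per member
for member in g.get_organization("my-org").get_members():
    print(member.login, member.name, member.email)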
