No MESSAGE-ID, and getting imap_tools to work with imap.mail.yahoo.com - python-3.x

The question has two parts: getting the MESSAGE-ID, and using imap_tools. For a handmade email client in Python I need to reduce the amount of data read from the server (at present it takes 2 minutes to read the whole mbox folder of ~170 messages on Yahoo), and I believe having the MESSAGE-ID will help.
imap_tools has an IDLE command, which is essential for keeping the Yahoo server connection alive, and other features that I believe will simplify the code.
To learn about MESSAGE-ID I started with the following code (file fetch_ssl.py):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import imaplib
import email
import os
import ssl
import conf
# Why UID==1 has no MESSAGE-ID ?
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    with imaplib.IMAP4_SSL(host, port,
                           ssl_context=ssl.create_default_context()) as mbox:
        user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
        mbox.login(user, pass_)
        mbox.select()
        typ, data = mbox.search(None, 'ALL')
        for num in data[0].split():
            typ, data = mbox.fetch(num, '(RFC822)')
            msg = email.message_from_bytes(data[0][1])
            print(f'num={int(num)}, MESSAGE-ID={msg["MESSAGE-ID"]}')
            ans = input('Continue[Y/n]? ')
            if ans.upper() in ('', 'Y'):
                continue
            else:
                break
Where conf.py is:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
HOST = 'imap.mail.yahoo.com'
PORT = 993
config = {'gmail': ('imap.gmail.com', PORT, 'GMAIL_APP_PWD'),
          'yahoo': ('imap.mail.yahoo.com', PORT, 'YAHOO_APP_PWD')}
parser = argparse.ArgumentParser(description="""\
Fetch MESSAGE-ID from imap server""")
parser.add_argument('host', choices=config)
parser.add_argument('-verbose', '-v', action='count', default=0)
fetch_ssl.py outputs:
$ python fetch_ssl.py yahoo
num=1, MESSAGE-ID=None
Continue[Y/n]?
num=2, MESSAGE-ID=<83895140.288751#communications.yahoo.com>
Continue[Y/n]? n
I'd like to understand why the message with UID == 1 has no MESSAGE-ID. Does this happen from time to time, i.e. are there messages with no MESSAGE-ID? How should such cases be handled? I haven't found any such cases on Gmail.
Then I attempted to do the same with imap_tools (version 0.56.0), in file fetch_tools.py:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import ssl
from imap_tools import MailBoxTls
import conf
# https://github.com/ikvk/imap_tools/blob/master/examples/tls.py
# suggests
# ctx.load_cert_chain(certfile="./one.crt", keyfile="./one.key")
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.options &= ~ssl.OP_NO_SSLv3
    # imaplib.abort: socket error: EOF
    with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
        mbox.login(user, pass_, 'INBOX')
        for msg in mbox.fetch():
            print(msg.subject, msg.date_str)
Command
$ python fetch_tools.py yahoo
outputs:
Traceback (most recent call last):
File "/home/vlz/Documents/python-scripts/programming_python/Internet/Email/ymail/imap_tools_lab/fetch_tools.py", line 20, in <module>
with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 322, in __init__
super().__init__()
File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 35, in __init__
self.client = self._get_mailbox_client()
File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 328, in _get_mailbox_client
client = imaplib.IMAP4(self._host, self._port, self._timeout) # noqa
File "/usr/lib/python3.9/imaplib.py", line 205, in __init__
self._connect()
File "/usr/lib/python3.9/imaplib.py", line 247, in _connect
self.welcome = self._get_response()
File "/usr/lib/python3.9/imaplib.py", line 1075, in _get_response
resp = self._get_line()
File "/usr/lib/python3.9/imaplib.py", line 1185, in _get_line
raise self.abort('socket error: EOF')
imaplib.abort: socket error: EOF
Command
$ python fetch_tools.py gmail
Produces identical results. What are my mistakes?
Using Python 3.9.2, Debian GNU/Linux 11 (bullseye), imap_tools version 0.56.0.
EDIT
Headers from the message with no MESSAGE-ID
X-Apparently-To: vladimir.zolotykh#yahoo.com; Sun, 25 Oct 2015 20:54:21 +0000
Return-Path: <mail#product.communications.yahoo.com>
Received-SPF: fail (domain of product.communications.yahoo.com does not designate 216.39.62.96 as permitted sender)
...
X-Originating-IP: [216.39.62.96]
Authentication-Results: mta1029.mail.bf1.yahoo.com from=product.communications.yahoo.com; domainkeys=neutral (no sig); from=product.communications.yahoo.com; dkim=pass (ok)
Received: from 127.0.0.1 (EHLO n3-vm4.bullet.mail.gq1.yahoo.com) (216.39.62.96)
by mta1029.mail.bf1.yahoo.com with SMTPS; Sun, 25 Oct 2015 20:54:21 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=product.communications.yahoo.com; s=201402-std-mrk-prd; t=1445806460; bh=5PTgF8Jghm92xeMD5mSHp6A3eRVV70PWo1oQ15K7Tfk=; h=Date:From:Reply-To:To:Subject:From:Subject; b=D7ItgOiuLbiexJGHvORgbpRi22X+sYso6gwZKDXVca79DxMMy2R1dUtZTIg7tcft1lovVJUDw/7fC51orDltRidlfnpayeY8lT+94DRlSBwopuxgOqqR9oTTjTBZ0oEvdxUcXl/q54N2GxuBFvmg8UO0OZoCnFPpUVYo9x4arMjt/0TOW1Q5d/yjdmO7iwiued/rliP/Bsq0TaZYcb0oCAT7Q50tb1fB7wcXLYNSC1OCQ1l1LajbUqmU1LWWNse36mUUTBieO2sZT0ERFrHaCTaTNQSXKQG2AxYF7Dd/8i0Iq3xqdcS0bDpjmWE25uoKvCdtXtUbylsuQSChuLFMTw==
Received: from [216.39.60.185] by n3.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Received: from [98.137.101.84] by t1.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Date: 25 Oct 2015 20:54:20 +0000
Received: from [127.0.0.1] by nu-repl01.direct.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
X-yahoo-newman-expires: 1445810060
From: "Yahoo Mail" <mail#product.communications.yahoo.com>
Reply-To: replies#communications.yahoo.com
To: <ME>#yahoo.com
Subject: Welcome to Yahoo! Vladimir
X-Yahoo-Newman-Property: ydirect
Content-Type: text/html
Content-Length: 25180
I skipped only X-YMailISG.
EDIT II
Of 167 messages, 21 have no MESSAGE-ID header.
fetch_ssl.py takes 4m12.342s and fetch_tools.py takes 3m41.965s.

It looks like your email without a Message-ID simply doesn't have one: the welcome email Yahoo sent you appears to lack the header. Since it's a system-generated message, that's not unexpected; RFC 5322 only says every message SHOULD have a Message-ID field, not that it MUST. You'd just have to skip it, or fall back to some other key for such messages.
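Since the goal is to read less data, one option is to ask the server only for the Message-ID header instead of the full RFC822 message. A minimal imaplib sketch, assuming the server reports the UID before the header literal in each FETCH response (common, but not guaranteed by the protocol). message_key and fetch_message_keys are hypothetical helpers, and falling back to the UID is just one possible way to handle messages without a Message-ID.

```python
import email
import imaplib
import re


def message_key(uid: bytes, header_bytes: bytes) -> str:
    """Return the Message-ID if the header exists, else a UID-based fallback key."""
    msg = email.message_from_bytes(header_bytes)
    message_id = msg['Message-ID']  # header lookup is case-insensitive
    if message_id:
        return message_id.strip()
    # Fallback (an assumption, not a standard): UIDs are unique per folder
    # only while the folder's UIDVALIDITY stays the same.
    return 'uid:' + uid.decode()


def fetch_message_keys(mbox: imaplib.IMAP4):
    """Yield (uid, key) for every message in the selected folder, reading headers only."""
    # One UID FETCH for the whole folder; BODY.PEEK does not set the \Seen flag.
    typ, data = mbox.uid('FETCH', '1:*',
                         '(UID BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])')
    if typ != 'OK':
        raise imaplib.IMAP4.error(f'UID FETCH failed: {data!r}')
    for item in data:
        if not isinstance(item, tuple):
            continue  # b')' separators, or None for an empty folder
        match = re.search(rb'UID (\d+)', item[0])
        if match is None:
            continue  # UID sent after the literal; not handled in this sketch
        yield match.group(1), message_key(match.group(1), item[1])
```

With the connection from fetch_ssl.py, after mbox.login(...) and mbox.select(), you would iterate over fetch_message_keys(mbox) instead of fetching (RFC822) message by message.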
The second problem is that you need to use imap_tools.MailBox rather than MailBoxTls.
Looking at the documentation and the source in the repo, the relevant classes appear to be:
MailBox - for a normal encrypted connection. This is what most email servers use these days, aka IMAPS (imap with SSL/TLS)
MailBoxTls - For a STARTTLS connection: this creates a plaintext connection then upgrades it later by using a STARTTLS command in the protocol. The internet has mostly gone to the "always encrypted" rather than "upgrade" paradigm, so this is not the class to use.
MailBoxUnencrypted - Standard IMAP without SSL/TLS. You should not use this on the public internet.
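The naming is a bit confusing. MailBox corresponds to imaplib.IMAP4_SSL; MailBoxTls corresponds to imaplib.IMAP4 followed by starttls() on the resulting connection; and MailBoxUnencrypted corresponds to imaplib.IMAP4 with no security applied. I imagine it's this way so that the most common one (MailBox) is a safe default. Your traceback shows exactly this: MailBoxTls opens a plain imaplib.IMAP4 connection to port 993, where the server expects a TLS handshake from the first byte, so the server drops the connection and imaplib reports socket error: EOF.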
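A minimal sketch of fetch_tools.py rewritten around MailBox, assuming imap_tools 0.56 accepts ssl_context in the MailBox constructor and headers_only, mark_seen and bulk in fetch() (check the README for your version). The host and the USER_NAME_EMAIL / YAHOO_APP_PWD variable names come from the question; message_id_or_uid and its 'uid:' fallback are hypothetical, and a UID-based key is only stable while the folder's UIDVALIDITY doesn't change.

```python
import os
import ssl

# Values taken from the question's conf.py (Yahoo entry).
HOST = 'imap.mail.yahoo.com'
PORT = 993


def message_id_or_uid(headers, uid):
    """Return the Message-ID from an imap_tools headers dict, or a UID-based fallback.

    imap_tools exposes msg.headers as {lowercase header name: tuple of values}.
    """
    values = headers.get('message-id', ())
    return values[0].strip() if values else f'uid:{uid}'


def main():
    # Imported here so message_id_or_uid() works without imap_tools installed.
    from imap_tools import MailBox

    user = os.getenv('USER_NAME_EMAIL')
    password = os.getenv('YAHOO_APP_PWD')
    # MailBox wraps imaplib.IMAP4_SSL, so TLS starts immediately, as port 993 expects.
    # A default client context (Purpose.SERVER_AUTH) verifies the server certificate,
    # as in fetch_ssl.py; no SSLv3 changes are needed.
    with MailBox(HOST, PORT, ssl_context=ssl.create_default_context()) as mbox:
        mbox.login(user, password, 'INBOX')
        # headers_only: skip bodies; mark_seen=False: leave \Seen alone;
        # bulk=True: one FETCH command for all messages (more memory, fewer round trips).
        for msg in mbox.fetch(headers_only=True, mark_seen=False, bulk=True):
            print(message_id_or_uid(msg.headers, msg.uid), msg.subject, msg.date_str)


if __name__ == '__main__':
    main()
```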

Related

PXSSH connection fails randomly after upgrading to Python 3

I am trying to create an SSH session using pexpect.pxssh as follows:
from pexpect import pxssh
connection = pxssh.pxssh()
connection.login('localhost', username, password, port=port, check_local_ip=False)
"""
Fails with the following error
pexpect.pxssh.ExceptionPxssh: Could not establish connection to host
"""
I also create two sessions one after the other: the first connects without a problem, but the second fails with the same code. Sometimes, though, the code works properly and connects both times. I have also added retries to make sure it's not a one-off event.
Note that this code runs without a problem on Python 2 but fails on Python 3. I couldn't find any difference in the connection mechanism between Python 2 and Python 3. Any help will be appreciated!
EDIT: After adding logging as per comment:
2021-06-25 10:49:37 INFO - Attempting to connect to device on port 10022.
Connecting to USB device...
Jun 25 10:49:37 tcprelay[203] : Created thread to connect [::1]:10022->[::1]:58316<12> to unix:0<15>
user#localhost's password: xxxxx
Jun 25 10:49:37 tcprelay[203] : Exiting thread to connect [::1]:10022->[::1]:58316 to unix:0
Connecting to USB device...
Jun 25 10:49:38 tcprelay[203] : Created thread to connect [::1]:10022->[::1]:58317<12> to unix:0<15>
user#localhost's password: xxxxx
Jun 25 10:49:39 tcprelay[203] : Exiting thread to connect [::1]:10022->[::1]:58316 to unix:0
The code retries 2 times and then fails.
Note: I am adding a port offset of 10000 using tcprelay
EDIT:
Sorry, I was not logging the error properly.
2021-06-25 15:45:34 - ERROR - Failed to connect. Retrying...
2021-06-25 15:45:34 - ERROR - End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x127feb0a0>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-q', '-oNoHostAuthenticationForLocalhost=yes', '-p', 'xxxxx', '-l', 'xxxxx', 'localhost']
buffer (last 100 chars): b''
before (last 100 chars): b' \r\n'
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 30020
child_fd: 26
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <_io.BufferedWriter name='<stdout>'>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile(b'(?i)are you sure you want to continue connecting')
1: re.compile(b'[#$]')
2: re.compile(b'(?i)(?:password:)|(?:passphrase for key)')
3: re.compile(b'(?i)permission denied')
4: re.compile(b'(?i)terminal type')
5: TIMEOUT
Traceback (most recent call last):
File "/src/helpers/utilities.py", line 590, in try_connect_ssh
connection.make_connection(ipaddress=ipaddress, user=user,
File "/src/transport/myssh.py", line 26, in make_connection
self.ssh_process.login(ipaddress, user, password, port=port, sync_multiplier=5, check_local_ip=False)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/pxssh.py", line 418, in login
i = self.expect(session_regex_array)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 343, in expect
return self.expect_list(compiled_pattern_list,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 372, in expect_list
return exp.expect_loop(timeout)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/expect.py", line 179, in expect_loop
return self.eof(e)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/expect.py", line 122, in eof
raise exc
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x127feb0a0>
EDIT:
Using macOS Mojave
pExpect 3.8.0
The device asks for a password, but after the password is sent the connection returns EOF.
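The question mentions retries and, later, that the error wasn't being logged properly. A minimal sketch of a retry wrapper that logs each failed attempt with its full traceback, assuming pexpect 4.x's pxssh API as used in the question; retry and connect are hypothetical helpers. It makes failures easier to diagnose but does not address whatever causes the EOF after the password is sent.

```python
import logging
import time

log = logging.getLogger(__name__)


def retry(action, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call action() until it succeeds; log each failure and re-raise the last one."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            # log.exception records the full traceback, not just str(exc)
            log.exception('Attempt %d/%d failed', attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(delay)


def connect(host, user, password, port):
    """Open a pxssh session with retries (hypothetical helper)."""
    import pexpect  # imported here so retry() itself has no third-party dependency
    from pexpect import pxssh

    def attempt():
        session = pxssh.pxssh()
        try:
            session.login(host, user, password, port=port, check_local_ip=False)
        except Exception:
            session.close()  # release the ssh child before the next attempt
            raise
        return session

    return retry(attempt, exceptions=(pxssh.ExceptionPxssh, pexpect.EOF))
```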

ParallelSSHClient - Python - Handle authentication errors

I have a problem I couldn't find a solution to here.
I am using ParallelSSHClient to connect to multiple servers.
But if there is one server in the list that I cannot access over SSH with my username and password, it throws an exception.
I've tried using try and except, but it didn't work.
Here is my original code:
from pssh.clients import ParallelSSHClient
import configparser
def check_ntpd():
    res = []
    servers = ['test1', 'test2', 'test3', 'test4']
    client = ParallelSSHClient(servers, user='test', password='test')
    output = client.run_command('service ntpd status')
    client.join(output)
    for host_out in output:
        # `if 'running' or 'Running' in line` was always true ('running' is truthy),
        # so test the lines explicitly instead
        if not any('running' in line.lower() for line in host_out.stdout):
            res.append(host_out.host + ' is not running')
    if res:
        return res
    else:
        return "All servers are running"
Server test2 isn't accessible with my username, so the script throws an exception and fails:
pssh.exceptions.AuthenticationError: No authentication methods succeeded
How can I continue the script without server test2 if it isn't accessible?
Try something like this:
from pssh.clients import ParallelSSHClient
from pssh.config import HostConfig
host_config = [
    HostConfig(user='user',
               password='pwd'),
    HostConfig(user='user',
               password='pwd'),
    HostConfig(user='user',
               password='pwd')
]
servers = ['10.97.180.90', '10.97.180.99', "10.97.180.88"]
client = ParallelSSHClient(servers, host_config=host_config, num_retries=1, timeout=3)
output = client.run_command('uname -a', stop_on_errors=False, timeout=3)
for host_output in output:
    try:
        # stdout is None for a host that failed, so iterating it raises TypeError
        for line in host_output.stdout:
            print(line)
        exit_code = host_output.exit_code
    except TypeError:
        print("timeOut...")
And the output looks like this:
Linux hostName-001 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
Linux hostName-003 3.10.0-1160.15.2.el7.x86_64 #1 SMP Thu Jan 21 16:15:07 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
timeOut...
More info can be found on Stack Overflow:
Python parallel-ssh run_command does not timeout when using pssh.clients
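In parallel-ssh, run_command(..., stop_on_errors=False) records a failed host's exception on its HostOutput instead of raising, so the check from the question can report unreachable hosts (such as test2) explicitly rather than relying on the TypeError. A minimal sketch, assuming HostOutput.exception is available in your version; host_problem and check_ntpd are hypothetical helpers, and the 'running' substring test keeps the question's intent.

```python
def host_problem(host, exception, stdout_lines):
    """Return a problem string for one host, or None if ntpd looks healthy."""
    if exception is not None:
        # e.g. AuthenticationError for a host whose credentials are rejected
        return f'{host}: {type(exception).__name__}: {exception}'
    if not any('running' in line.lower() for line in stdout_lines):
        return f'{host} is not running'
    return None


def check_ntpd(servers, user, password):
    # Imported here so host_problem() can be exercised without parallel-ssh installed.
    from pssh.clients import ParallelSSHClient

    client = ParallelSSHClient(servers, user=user, password=password,
                               num_retries=1, timeout=3)
    # stop_on_errors=False: failed hosts get HostOutput.exception instead of raising
    output = client.run_command('service ntpd status', stop_on_errors=False)
    problems = []
    for host_out in output:
        lines = host_out.stdout if host_out.exception is None else ()
        problem = host_problem(host_out.host, host_out.exception, lines)
        if problem:
            problems.append(problem)
    return problems or 'All servers are running'
```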

How do I change the line endings used by PExpect output

The output returned by pexpect.run() includes \r\n at the end of every line. Printing it to the terminal with print(returnVal.decode()) correctly prints one line for each line returned, and when I examine the output I can see that the byte string contains \r\n. When I log it to a file, however, I get double line breaks in the log file. I'm on a Mac using Python 3.7. Is there a way to set the preferred newline when writing the output? I am using Python's logging module and its info() method to write the string. The output looks like this:
total 80

-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py

-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py

-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
When it should look like:
total 80
-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py
-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py
-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
Here is a simplified version of my original Logger class:
import logging
class Logger():
    def __init__(self, path):
        msgFormat = '%(asctime)s.%(msecs)d\t%(message)s'
        dateFormat = '%m/%d/%Y %H:%M:%S'
        logging.basicConfig(format=msgFormat, datefmt=dateFormat, filename=path, level=logging.INFO)
    def Log(self, theStr):
        logging.info(str(theStr))
The string being returned from Pexpect looks something like:
Line1\r\nLine2
Depending on how you log the output, it's advisable to normalize the newlines before sending them to the logger. However, if you must override the newline parameter used by the logging module's FileHandler, you can, as an experiment, monkey patch its _open method, since this isn't configurable by default.
I used the source code for Python 3.8 to get the _open function's definition.
import logging
def custom_open(self):
    """
    Monkey patched _open function of class logging.FileHandler (Python 3.8)
    """
    return open(self.baseFilename, self.mode, encoding=self.encoding, newline='')
logging.FileHandler._open = custom_open
if __name__ == "__main__":
    pexpect_return = "Output\nTest"
    my_log = logging.getLogger("test_logger")
    my_log.setLevel(logging.INFO)
    my_log.addHandler(logging.FileHandler("test.log"))
    my_log.info(pexpect_return)
How it works
Python's logging module has a class FileHandler, which uses a method _open to create a file handler object to write and append to log files on disk. Its default implementation as of version 3.8 does not have the newline parameter so it uses default newlines.
Monkey patching means replacing or updating a method or function of an imported class while the program is running. The line logging.FileHandler._open = custom_open tells Python to replace the _open method of the FileHandler class with my custom_open function. Later, when I call my_log.addHandler(logging.FileHandler("test.log")), the new custom_open method is used to open the file with the newline parameter.
You can further confirm that the new method is used to open the file by adding a suffix to the file name like this:
return open(self.baseFilename+"__Monkey_Patched", self.mode, encoding=self.encoding, newline='')
If you now run the demo code, the log file will be named "test.log__Monkey_Patched".
This code, however, will not replace any newline characters which you pass to the logger as part of the string to log. You need to process that beforehand.
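Following the advice to process the string beforehand, a minimal sketch that converts the CRLF line endings in pexpect's output to plain LF before logging. normalize_newlines is a hypothetical helper, the byte string stands in for the value returned by pexpect.run(), and the format strings are the ones from the question's Logger class.

```python
import logging


def normalize_newlines(text: str) -> str:
    """Turn CRLF line endings, as seen in pexpect output, into LF."""
    return text.replace('\r\n', '\n')


if __name__ == '__main__':
    logging.basicConfig(filename='test.log', level=logging.INFO,
                        format='%(asctime)s.%(msecs)d\t%(message)s',
                        datefmt='%m/%d/%Y %H:%M:%S')
    # Stand-in for pexpect.run(...), which returns bytes with \r\n line endings.
    pexpect_return = b'Line1\r\nLine2'
    logging.info(normalize_newlines(pexpect_return.decode()))
```

This leaves FileHandler untouched, so no monkey patching is needed.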

Pyzmail/IMAPclient: Can't figure out what key to use

I'm following this guide: https://automatetheboringstuff.com/chapter16/#calibre_link-45
to scrape emails, and I'm having trouble with pyzmail.PyzMessage.factory(): I keep getting a KeyError.
I took the advice from here: Python email bot Pyzmail/IMAPclient error
but I still get the same error.
import pprint
import imapclient
import pyzmail
imapObj = imapclient.IMAPClient("imap.gmail.com", ssl=True)
imapObj.login("MY_EMAIL_ADDRESS", "MY_PASSWORD")
imapObj.select_folder("INBOX", readonly=False)
UIDs = imapObj.gmail_search("test1")
print(UIDs)
rawMessages = imapObj.fetch(UIDs, ["BODY[]"])
pprint.pprint(rawMessages)
message = pyzmail.PyzMessage.factory(rawMessages[40041][b'BODY[]'])
I am getting this error:
[7156]
Traceback (most recent call last):
File "C:/Users/Logan/PycharmProjects/email_sending_test/venv/main1.py", line 17, in <module>
message = pyzmail.PyzMessage.factory(rawMessages[0][b'BODY[]'])
KeyError: b'BODY[]'
defaultdict(<class 'dict'>,
{7156: {b'BODY[]': b'MIME-Version: 1.0\r\nDate: Thu, 3 Jan 2019 16:'
b'51:54 -0500\r\nMessage-ID: <CAB4Lt1swQPJvCL3ot'
b'8E7q2Pc9_C26hZxMdUgcZd9LbJUyhZbvw#mail.gmail'
b'.com>\r\nSubject: test1\r\nFrom: Rob Roberts'
b' <swimmingonanarwhal#gmail.com>\r\nTo: Rob Rob'
b'erts <swimmingonanarwhal#gmail.com>\r\nContent'
b'-Type: multipart/alternative; boundary="0000'
b'000000006f5b28057e94c5de"\r\n\r\n--000000000'
b'0006f5b28057e94c5de\r\nContent-Type: text/plai'
b'n; charset="UTF-8"\r\n\r\ntrying this ou'
b't\r\n\r\n--0000000000006f5b28057e94c5de\r\nCon'
b'tent-Type: text/html; charset="UTF-8"\r\n\r'
b'\n<div dir="ltr">trying this out</div>\r\n\r'
b'\n--0000000000006f5b28057e94c5de--',
b'SEQ': 6962}})
Process finished with exit code 1
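The printed result suggests the cause: IMAPClient.fetch() returns a dict keyed by message UID (7156 in the output above), not by list position. Because it is a defaultdict, rawMessages[0] silently creates an empty dict, so the KeyError names b'BODY[]' rather than the missing UID. A minimal sketch using the data shape from the question; body_for_uid is a hypothetical helper.

```python
from collections import defaultdict


def body_for_uid(raw_messages, uid):
    """Return the raw RFC 822 bytes for one UID from an IMAPClient.fetch() result."""
    # `in` does not trigger the defaultdict factory, unlike raw_messages[uid].
    if uid not in raw_messages:
        raise KeyError(f'UID {uid} was not fetched; available: {sorted(raw_messages)}')
    return raw_messages[uid][b'BODY[]']


# Shape of the pprint output from the question (body shortened).
raw = defaultdict(dict, {7156: {b'BODY[]': b'MIME-Version: 1.0\r\nSubject: test1\r\n\r\n...',
                                b'SEQ': 6962}})
uids = [7156]
body = body_for_uid(raw, uids[0])
print(body.split(b'\r\n', 1)[0])  # b'MIME-Version: 1.0'
```

With the real objects this would be message = pyzmail.PyzMessage.factory(body_for_uid(rawMessages, UIDs[0])).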

How to search tweets from an id to another id

I'm trying to get tweets using TwitterSearch in Python3.
So basically I want to get all tweets between these 2 IDs.
748843914254249984 ->760065085616250880
These two IDs span Fri Jul 01 11:41:16 +0000 2016 to Mon Aug 01 10:50:12 +0000 2016.
So here's the code I made.
crawl.py
#!/usr/bin/python3
# coding: utf-8
from TwitterSearch import *
import datetime
def crawl():
    try:
        tso = TwitterSearchOrder()
        tso.set_keywords(["keyword"])
        tso.set_since_id(748843914254249984)
        tso.set_max_id(760065085616250880)
        ACCESS_TOKEN = 'xxx'
        ACCESS_SECRET = 'xxx'
        CONSUMER_KEY = 'xxx'
        CONSUMER_SECRET = 'xxx'
        ts = TwitterSearch(
            consumer_key=CONSUMER_KEY,
            consumer_secret=CONSUMER_SECRET,
            access_token=ACCESS_TOKEN,
            access_token_secret=ACCESS_SECRET
        )
        for tweet in ts.search_tweets_iterable(tso):
            print(tweet['id_str'], '-', tweet['created_at'])
    except TwitterSearchException as e:
        print(e)
if __name__ == '__main__':
    crawl()
I'm not very familiar with the Twitter API and searching with it, but this code should do the job.
Instead, it gives:
760058064816988160 - Mon Aug 01 10:22:18 +0000 2016
[...]
760065085616250880 - Mon Aug 01 10:50:12 +0000 2016
Many, many times: I get the same lines over and over again instead of everything between my two IDs.
So I'm not getting any of the July tweets. Any idea why?
TL;DR
Remove the tso.set_max_id(760065085616250880) line.
Explanation (as far as I understand it)
I have found your problem in the TwitterSearch Docs:
"The only parameter with a default value is count with 100. This is because it is the maximum of tweets returned by this very Twitter API endpoint."
If I check this in your code by creating a search URL, I get:
tso.create_search_url()
#?q=Vuitton&since_id=748843914254249984&count=100&max_id=760065085616250880
which contains count=100, meaning a request fetches one page of up to 100 tweets. Without set_since_id and set_max_id, the URL still contains count=100 but many more tweets are retrieved; with both set, it stops at 100 tweets.
set_since_id without set_max_id works; the other way around doesn't. So removing max_id=760065085616250880 from the search URL gave the results you want.
If anyone can explain why set_max_id doesn't work alongside set_since_id, please edit my answer.
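For context on how repeated pages can arise, here is a sketch of the since_id/max_id paging pattern described in Twitter's "Working with Timelines" guide: each request moves max_id below the oldest tweet already seen, and paging stops once it reaches since_id. fetch_page is a hypothetical stand-in for one search request, and this is not TwitterSearch's actual implementation; but a client that kept re-sending the original max_id unchanged would receive the same page over and over, which matches the symptom in the question.

```python
def paginate(fetch_page, since_id, max_id, count=100):
    """Walk backwards from max_id (inclusive) to since_id (exclusive), page by page.

    fetch_page(since_id, max_id, count) is a hypothetical stand-in for one
    search request; it must return tweets (dicts with an 'id') newest first.
    """
    while True:
        page = fetch_page(since_id=since_id, max_id=max_id, count=count)
        if not page:
            return
        yield from page
        # Move the window below the oldest tweet seen. If max_id were left
        # unchanged, the same page would be requested again and again.
        max_id = min(tweet['id'] for tweet in page) - 1
        if max_id <= since_id:
            return
```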
