split username & password from URL in 3.8+ (splituser is deprecated, no alternative) - python-3.x

Trying to filter out the username/password from a URL.
(I could split it manually at the last '@' sign, but I'd rather use a parser.)
Python gives a deprecation warning, but urlparse() doesn't seem to handle user/password.
Should I just trust the last '@' sign, or is there a new version of splituser?
Python 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0] on linux
>>> url="http://usr:pswd@www.site.com/path&var=val"
>>> import urllib.parse
>>> urllib.parse.splituser(url)
<stdin>:1: DeprecationWarning: urllib.parse.splituser() is deprecated as of 3.8, use urllib.parse.urlparse() instead
('http://usr:pswd', 'www.site.com/path&var=val')
>>> urllib.parse.urlparse(url)
ParseResult(scheme='http', netloc='usr:pswd@www.site.com', path='/path&var=val', params='', query='', fragment='')
# neither with allow_fragments:
>>> urllib.parse.urlparse(url,allow_fragments=True)
ParseResult(scheme='http', netloc='us:passw@ktovet.com', path='/all', params='', query='var=val', fragment='')
(Edit: the repr() output is partial & misleading; see my answer.)

It's all there, clear and accessible.
What went wrong: the repr() here is misleading, showing only a few properties/values (why is another question).
The values are available through explicit attribute access:
>>> url = 'http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu'
>>> p = urllib.parse.urlparse(url)
>>> p.port
8082
>>> p.hostname
'www.sharat.uk'
>>> p.password
'pswd'
>>> p.username
'usr'
>>> p.path
'/nativ/page'
>>> p.query
'vari=valu'
>>> p.scheme
'http'
Or as a one-liner (I just needed the domain):
>>> urllib.parse.urlparse('http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu').hostname
'www.sharat.uk'
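The reason repr() looks partial: ParseResult is a named tuple with six fields (scheme, netloc, path, params, query, fragment), while username, password, hostname and port are properties computed from netloc, so repr() does not list them. If the goal is to filter the credentials out of the URL, as in the question, a minimal sketch that rebuilds the URL without them could look like this (strip_credentials is a hypothetical helper name, not part of urllib):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_credentials(url):
    """Hypothetical helper: return (username, password, url_without_credentials)."""
    parts = urlsplit(url)
    # netloc looks like 'user:pass@host:port'; keep only what follows the last '@'
    host_port = parts.netloc.rpartition('@')[2]
    cleaned = urlunsplit(parts._replace(netloc=host_port))
    return parts.username, parts.password, cleaned

user, pwd, clean = strip_credentials('http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu')
print(user, pwd)  # usr pswd
print(clean)      # http://www.sharat.uk:8082/nativ/page?vari=valu
```

If the URL has no credentials, username and password come back as None and the URL is returned unchanged.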

Looking at the source code for splituser, it looks like they simply use str.rpartition:
import warnings
def splituser(host):
    warnings.warn("urllib.parse.splituser() is deprecated as of 3.8, "
                  "use urllib.parse.urlparse() instead",
                  DeprecationWarning, stacklevel=2)
    return _splituser(host)
def _splituser(host):
    """splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'."""
    user, delim, host = host.rpartition('@')
    return (user if delim else None), host
which, yes, relies on the last occurrence of '@'.
EDIT: urlparse still has all these fields; see Berry's answer.
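Note that _splituser() is meant for the 'user[:passwd]@host[:port]' part only, while the question passes it the whole URL. On a full URL, splitting at the last '@' goes wrong as soon as the path also contains an '@'. A small sketch with a made-up URL contrasting the two approaches:

```python
from urllib.parse import urlsplit

# Made-up URL whose path also contains an '@'
url = 'http://usr:pswd@www.site.com/users/@alice'

# Naive: split the whole URL at its last '@' (what _splituser does to its input)
naive_user, _, naive_host = url.rpartition('@')
print(naive_user)  # http://usr:pswd@www.site.com/users/
print(naive_host)  # alice

# urlsplit first isolates the netloc ('usr:pswd@www.site.com'), then splits it
parts = urlsplit(url)
print(parts.username, parts.password, parts.hostname)  # usr pswd www.site.com
```

So "trust the last '@'" is only safe once the netloc has been separated from the rest of the URL, which urlsplit/urlparse already do.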

Related

Python: Group / rearrange text

I have the following files in a folder:
a235626_1.jpg
a235626_2.jpg
a235626_3.jpg
a235626_4.jpg
a235626_5.jpg
A331744R_1.JPG
A331744R_2.jpg
A331758L_1.JPG
A331758L_2.jpg
A331758R_1.JPG
A331758R_2.jpg
A331789L_1.JPG
A331789L_2.jpg
A331789R_1.JPG
A331789R_2.jpg
A331793L_1.JPG
A331793L_2.jpg
A331826L_1.JPG
A331826L_2.jpg
A331826R_1.JPG
A331826R_2.jpg
A335531L_1.JPG
A335531R_1.JPG
A335531R_2.jpg
How can I group them so that I get:
a235626_1.jpg|a235626_2.jpg|a235626_3.jpg|a235626_4.jpg|a235626_5.jpg
A331744R_1.JPG|A331744R_2.JPG
A331758L_1.JPG|A331758L_2.JPG
... and so on.
Thanks!
Use itertools.groupby
from itertools import groupby
files = ['a235626_1.jpg', 'a235626_2.jpg', 'a235626_3.jpg', 'a235626_4.jpg', 'a235626_5.jpg', 'A331744R_1.JPG',
'A331744R_2.jpg', 'A331758L_1.JPG', 'A331758L_2.jpg', 'A331758R_1.JPG', 'A331758R_2.jpg', 'A331789L_1.JPG',
'A331789L_2.jpg', 'A331789R_1.JPG', 'A331789R_2.jpg', 'A331793L_1.JPG', 'A331793L_2.jpg', 'A331826L_1.JPG',
'A331826L_2.jpg', 'A331826R_1.JPG', 'A331826R_2.jpg', 'A335531L_1.JPG', 'A335531R_1.JPG', 'A335531R_2.jpg']
for key, items in groupby(files, lambda t: t.split('_')[0]):
    print('|'.join(items))
a235626_1.jpg|a235626_2.jpg|a235626_3.jpg|a235626_4.jpg|a235626_5.jpg
A331744R_1.JPG|A331744R_2.jpg
A331758L_1.JPG|A331758L_2.jpg
A331758R_1.JPG|A331758R_2.jpg
A331789L_1.JPG|A331789L_2.jpg
A331789R_1.JPG|A331789R_2.jpg
A331793L_1.JPG|A331793L_2.jpg
A331826L_1.JPG|A331826L_2.jpg
A331826R_1.JPG|A331826R_2.jpg
A335531L_1.JPG
A335531R_1.JPG|A335531R_2.jpg
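itertools.groupby only merges consecutive items with the same key, so the loop above relies on the list already being ordered. Names read from a folder (for example with os.listdir) come back in arbitrary order, so a sketch that sorts first might look like this (group_by_prefix is a hypothetical name, and the key assumes the group is everything before the first '_'):

```python
from itertools import groupby

def group_by_prefix(filenames):
    """Join names that share the part before the first '_' with '|'."""
    def prefix(name):
        return name.split('_')[0]
    # Sort by (prefix, full name) so each group is contiguous and ordered by suffix
    ordered = sorted(filenames, key=lambda name: (prefix(name), name))
    return ['|'.join(group) for _, group in groupby(ordered, key=prefix)]

shuffled = ['A331744R_2.jpg', 'a235626_2.jpg', 'A331744R_1.JPG', 'a235626_1.jpg', 'A335531L_1.JPG']
for line in group_by_prefix(shuffled):
    print(line)
```

With a real folder you would pass os.listdir(path) instead of the hard-coded list. Note that plain string sorting puts upper-case prefixes before lower-case ones.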
I think this question is quite similar to yours, and it has been answered before. Can you take a look at it?
How to sort file names in a particular order using python

Change locale for Google Colab

I want to change the locale setting (to change the date format) in Google Colab.
The following works for me in Jupyter Notebook but not in Google Colab:
locale.setlocale(locale.LC_TIME, 'de_DE.UTF-8')
It always raises the error: unsupported locale setting
I have already looked at many other solutions and tried everything.
One solution I have seen changes only the time zone:
!rm /etc/localtime
!ln -s /usr/share/zoneinfo/Asia/Bangkok /etc/localtime
!date
I figured this one out after a long time:
In Colab, you will have to install the desired locales. You do this with:
!sudo dpkg-reconfigure locales
This will prompt for a numeric input, e.g. 268 and 269 for Hungarian.
So you enter 268 269.
After installation, it will also prompt for the default locale. Here you need to select your desired custom locale. This time it is a numeric selection out of 3-5 options, depending on how many you selected in the previous step. In my case, I selected 3, and the default locale became hu_HU.
You need to restart the Colab runtime: press Ctrl + M, then the period key (.).
You need to activate the locale:
import locale
locale.setlocale(locale.LC_ALL, 'hu_HU')  # make sure you do it for the LC_ALL category
The custom locale is now ready to use with pandas:
pd.to_datetime('2021-01-01').day_name() returns Friday, but
pd.to_datetime('2021-01-01').day_name('hu_HU') returns Péntek
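Because setlocale raises locale.Error ("unsupported locale setting") whenever the requested locale is not installed, it can help to try a few candidate names and restore the previous setting afterwards. A minimal sketch, where temporary_locale is a hypothetical helper and the candidate names are assumptions that depend on what is installed on the machine:

```python
import locale
from contextlib import contextmanager
from datetime import datetime

@contextmanager
def temporary_locale(candidates, category=locale.LC_TIME):
    """Try each candidate locale name in turn and restore the previous setting afterwards.

    Yields the name that was accepted, or None if none of them is installed.
    """
    previous = locale.setlocale(category)  # no second argument: just query the current value
    chosen = None
    for name in candidates:
        try:
            locale.setlocale(category, name)
            chosen = name
            break
        except locale.Error:  # 'unsupported locale setting'
            continue
    try:
        yield chosen
    finally:
        locale.setlocale(category, previous)

with temporary_locale(['de_DE.UTF-8', 'de_DE', 'german']) as name:
    if name is None:
        print('No German locale installed; falling back to the current locale')
    print(datetime(2021, 12, 1).strftime('%A %d %B %Y'))
```

Restoring the previous value keeps the change local, which matters because setlocale affects the whole process.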
I wasn't successful using the German locale on Google Colab, but the desired formatting can be obtained by combining an overridden locale decimal separator with custom date formatting.
German formatting rules can be found here.
For custom string formatting, a nice cheatsheet is here.
from datetime import datetime, timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import locale
german_format_str_full = '%Y-%m-%d, %H.%M Uhr'
german_format_str_date = '%Y-%m-%d'
# generating plot data; xs are dates with a non-obvious step
xs = np.arange(datetime(year=2021, month=11, day=28, hour=23, minute=59, second=59),
datetime(year=2021, month=12, day=6, hour=23, minute=59, second=59),
timedelta(hours=5,minutes=47,seconds=27))
ys = np.sin(np.arange(0,len(xs),1)) # whatever
# use overwritten locale for comma as decimal point -- German formatting
plt.rcParams['axes.formatter.use_locale'] = True
locale._override_localeconv["decimal_point"]= ','
# plot
fig, ax = plt.subplots(figsize=(9,4))
ax.plot(xs,ys, 'o-')
# set formatting string using mdates from matplotlib
ax.xaxis.set_major_formatter(mdates.DateFormatter(german_format_str_date))
# rotate formatted ticks or use autoformat 'fig.autofmt_xdate()'
plt.xticks(rotation=70)
plt.title('Google Colab plot with German locale style')
plt.show()
It gives me this plot:
If you need to check what the formatting settings look like on your machine, you can use locale.nl_langinfo(locale.D_T_FMT). For example:
import locale
from datetime import datetime
now = datetime.now()
# find local date time formatting on Google Colab
local_format_str = locale.nl_langinfo(locale.D_T_FMT)
print('local_format_str on Google Colab: ', local_format_str)
print('now in Google Colab default format:', now.strftime(local_format_str))
german_format_str_full = '%Y-%m-%d, %H.%M Uhr'
german_format_str_date = '%Y-%m-%d'
print('now in German format, full:',now.strftime(german_format_str_full))
print('now in German format, only date:',now.strftime(german_format_str_date))
ridiculous_format = '%Y->%m-->%d'
print('now ridiculous_format:',now.strftime(ridiculous_format))
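The plotting example relies on locale._override_localeconv, a private (underscore-prefixed) attribute that may change between Python versions. If only the decimal comma matters, a locale-independent alternative (a different technique from the one in this answer) is to format the number and swap the separators; german_number is a hypothetical helper:

```python
def german_number(value, decimals=2):
    """Format a number German-style: '.' groups thousands, ',' marks decimals."""
    english = f'{value:,.{decimals}f}'  # e.g. '1,234,567.89'
    # Swap ',' and '.' in a single pass so the two replacements don't interfere
    return english.translate(str.maketrans(',.', '.,'))

print(german_number(1234567.891))  # 1.234.567,89
print(german_number(0.5, 1))       # 0,5
```

For tick labels it could be wrapped in matplotlib.ticker.FuncFormatter, e.g. ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: german_number(x, 1))), instead of setting axes.formatter.use_locale.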
Based on this answer, I was able to load German locales. However, it needs to be done in two steps: install the new German locale, then restart the kernel and load the German locale.
In short:
import os
# Install de_DE
!/usr/share/locales/install-language-pack de_DE
!dpkg-reconfigure locales
# Restart Python process to pick up the new locales
os.kill(os.getpid(), 9)
More detailed version:
It turned out that the list of available locales is pretty short, which can be checked like this:
import locale
from datetime import datetime
now = datetime.now()
# find local date time formatting on Google Colab
local_format_str = locale.nl_langinfo(locale.D_T_FMT)
print('local_format_str on Google Colab: ', local_format_str)
print('now in Google Colab default format:', now.strftime(local_format_str))
print('Loading avaliable locales via real names...')
for real_name in set(locale.locale_alias.values()):
    try:
        locale.setlocale(locale.LC_ALL, real_name)
        print('success: real_name = ', real_name)
    except locale.Error:  # locale not installed
        pass
print('Loading avaliable locales via aliases...')
for alias, real_name in locale.locale_alias.items():
    try:
        locale.setlocale(locale.LC_ALL, alias)
        print('success: alias = ', alias, ' , real_name = ', real_name)
    except locale.Error:  # locale not installed
        pass
With output:
local_format_str on Google Colab: %a %b %e %H:%M:%S %Y
now in Google Colab default format: Wed Dec 1 12:10:52 2021
Loading avaliable locales via real names...
success: real_name = en_US.UTF-8
success: real_name = C
Loading avaliable locales via aliases...
As we can see, there is no German locale, so it needs to be installed with this code:
import os
# Install de_DE
!/usr/share/locales/install-language-pack de_DE
!dpkg-reconfigure locales
# Restart Python process to pick up the new locales
os.kill(os.getpid(), 9)
giving an output:
Generating locales (this might take a while)...
de_DE.ISO-8859-1... done
Generation complete.
dpkg-trigger: error: must be called from a maintainer script (or with a --by-package option)
Type dpkg-trigger --help for help about this utility.
Generating locales (this might take a while)...
de_DE.ISO-8859-1... done
en_US.UTF-8... done
Generation complete.
Then we load the German locale with locale.setlocale(locale.LC_ALL, 'german'), and the same code as at the beginning (remember to import the packages again) gives us:
Loading avaliable locales via real names...
success: real_name = C
success: real_name = en_US.UTF-8
success: real_name = de_DE.ISO8859-1
Loading avaliable locales via aliases...
success: alias = deutsch , real_name = de_DE.ISO8859-1
success: alias = german , real_name = de_DE.ISO8859-1
and the default formatting is more German:
local_format_str on Google Colab: %a %d %b %Y %T %Z
now in Google Colab default format: Mi 01 Dez 2021 12:12:03

A Question with using scapy.sniff for get the 'Ethernet Frame' in pcap files

Aim: Get the arrival time from the pcap files
Language: python3.7
Tools: Scapy.sniff
Above all, I want to get the arrival time from the .pcap. In Wireshark I can see it in the Ethernet frame, but when I use Scapy.sniff(offline='.pcap'), I only get the Ether, TCP, IP and other layers. How can I get that data?
Thanks a lot!
>>> from scapy.all import *
>>> a = sniff(offline='***.pcap')
>>> a[0]
<Ether dst=*:*:*:*:*:* src=*:*:*:*:*:* type=** |<IP version=4 ihl=5 tos=0x20 len=52 id=14144 flags=DF frag=0 ttl=109 proto=tcp chksum=0x5e3b src=*.*.*.* dst=*.*.*.* |<TCP sport=gcsp dport=http seq=1619409885 ack=1905830025 dataofs=8 reserved=0 flags=A window=65535 chksum=0xfdb5 urgptr=0 options=[('NOP', None), ('NOP', None), ('SAck', (1905831477, 1905831485))] |>>>
The packet time from the pcap is available in the time member:
print(a[0].time)
It's kept as a floating-point value (the standard Python "timestamp" format). To get it in a more easily understandable form, you may want to use the datetime module:
>>> from datetime import datetime
>>> dt = datetime.fromtimestamp(a[0].time)
>>> print(dt)
2018-11-12 03:03:00.259780
The scapy documentation isn't great. It can be very instructive to use the interactive help facility. For example, in the interpreter:
$ python
>>> from scapy.all import *
>>> a = sniff(offline='mypcap.pcap')
>>> help(a[0])
This will show you all the methods and attributes of the object represented by a[0]. In your case, that is an instance of class Ether(scapy.packet.Packet).
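To get arrival times for a whole capture rather than a single packet, a small sketch could convert every packet's time attribute and compute the gaps between consecutive packets. It assumes packets is any iterable of objects with a .time attribute in seconds since the epoch, such as the list returned by sniff(offline=...). It converts with float() because, depending on the Scapy version, time may be a float or a Decimal-like number (an assumption; check type(a[0].time) on your version), and it uses UTC rather than local time so the result doesn't depend on the machine's time zone:

```python
from datetime import datetime, timezone
from types import SimpleNamespace

def arrival_times(packets):
    """Return (stamps, gaps): aware UTC datetimes and inter-arrival timedeltas."""
    stamps = [datetime.fromtimestamp(float(p.time), tz=timezone.utc) for p in packets]
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return stamps, gaps

# Made-up stand-in packets; with Scapy you would pass a = sniff(offline='mypcap.pcap')
demo = [SimpleNamespace(time=1541991780.25978), SimpleNamespace(time=1541991781.0)]
stamps, gaps = arrival_times(demo)
for stamp in stamps:
    print(stamp.isoformat())
print([gap.total_seconds() for gap in gaps])
```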

how to get only date string from a long string

I know there are lots of Q&As about extracting a datetime from a string, such as using dateutil.parser:
import dateutil.parser as dparser
dparser.parse('something sep 28 2017 something',fuzzy=True).date()
output: datetime.date(2017, 9, 28)
but my question is how to know which part of the string produced this extraction; e.g. I want a function that also returns 'sep 28 2017':
datetime, datetime_str = get_date_str('something sep 28 2017 something')
outputs: datetime.date(2017, 9, 28), 'sep 28 2017'
Any clue, or any direction I could search in?
Extending the discussion with @Paul and following the solution from @alecxe, I came up with the following solution, which works on a number of test cases. I've also made the problem slightly more challenging:
Step 1: get excluded tokens
import dateutil.parser as dparser
ostr = 'something sep 28 2017 something abcd'
_, excl_str = dparser.parse(ostr,fuzzy_with_tokens=True)
gives outputs of:
excl_str: ('something ', ' ', 'something abcd')
Step 2 : rank tokens by length
excl_str = list(excl_str)
excl_str.sort(reverse=True,key = len)
gives a sorted token list:
excl_str: ['something abcd', 'something ', ' ']
Step 3: delete the tokens, ignoring the pure-space element
for i in excl_str:
    if i != ' ':
        ostr = ostr.replace(i, '')
gives a final output
ostr: 'sep 28 2017 '
Note: step 2 is required because a shorter token that is a substring of a longer one causes problems. E.g., in this case, if deletion followed the order ('something ', ' ', 'something abcd'), the replacement would also remove 'something ' from 'something abcd', and 'abcd' would never get deleted, ending up with 'sep 28 2017 abcd'.
Interesting problem! There is no direct way to get the parsed out date string out of the bigger string with dateutil. The problem is that dateutil parser does not even have this string available as an intermediate result as it really builds parts of the future datetime object on the fly and character by character (source).
It does, though, also collect a list of skipped tokens, which is probably your best bet. As this list is ordered, you can loop over the tokens and replace the first occurrence of each token:
from dateutil import parser
s = 'something sep 28 2017 something'
parsed_datetime, tokens = parser.parse(s, fuzzy_with_tokens=True)
for token in tokens:
    s = s.replace(token.lstrip(), "", 1)
print(s) # prints "sep 28 2017"
I am not 100% sure, though, whether this would work in all possible cases, especially with different whitespace characters (notice how I had to work around things with .lstrip()).
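Combining both answers into the get_date_str() the question asks for, one variation (a sketch, not either author's exact method) is to locate each skipped token left to right instead of deleting it with str.replace, and return the slice between the first and last characters dateutil did not skip. It assumes the skipped tokens appear verbatim in the input; a tab that the lexer reports as ' ' would break that, as @alecxe warns:

```python
from dateutil import parser

def get_date_str(text):
    """Return (date, substring of text that dateutil used for the date)."""
    parsed, skipped = parser.parse(text, fuzzy_with_tokens=True)
    used = [True] * len(text)  # True = character belongs to the date part
    pos = 0
    for token in skipped:  # skipped tokens come back in order of appearance
        start = text.find(token, pos)
        if start == -1:  # token not found verbatim (e.g. normalized whitespace)
            continue
        for i in range(start, start + len(token)):
            used[i] = False
        pos = start + len(token)
    kept = [i for i, flag in enumerate(used) if flag]
    if not kept:
        return parsed.date(), ''
    return parsed.date(), text[kept[0]:kept[-1] + 1].strip()

print(get_date_str('something sep 28 2017 something abcd'))
```

Slicing the original string (rather than deleting pieces from it) means the result is always a contiguous substring of the input, so the order of the tokens no longer matters the way it does in step 2 above.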

Python3.6.6 argparse for negative string values with list of arguments

I'm having issues with string values that start with a minus sign being fed into argparse and resulting in errors. I am wondering if anyone has figured out a way around this. Unfortunately, I need the strings to be prefixed with a minus sign in some cases, so I cannot fix this by removing the negation.
I've taken a look at some other Stack Overflow pages, such as How to parse positional arguments with leading minus sign (negative numbers) using argparse, but still have no solution. I'm sure someone must have a solution for this!
Here is what I have tried and am seeing:
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h@m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(@[h,m,s])?; for example: -1h@m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="@m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(@[h,m,s])?; for example: -1h@m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indices will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime -5h@m"))
usage: pydevconsole.py [-h] [--configfile CONFIGFILE] [--starttime STARTTIME]
[--endtime ENDTIME] [--indexes INDEXES [INDEXES ...]]
[--format FORMAT]
pydevconsole.py: error: argument --starttime/--st: expected one argument
Process finished with exit code 2
As you can see, the above fails, but the following works fine, so I am sure the issue is the leading "-":
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h@m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(@[h,m,s])?; for example: -1h@m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="@m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(@[h,m,s])?; for example: -1h@m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indices will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime 5h@m"))
print(args)
Namespace(configfile='/somepath/sample_config_test04.ini', endtime='now', format='csv', indexes=['index one', 'index2', 'index3'], starttime='5h@m')
You may be wondering why I'm calling the code like this: I need to unit test the argparse calls I am running, so I need to be able to call the parser both from the command line and from my unit-test code.
If I call the same code from the command line without any quotes or \ in front of -5h@m, it seems to work fine, but only from the command line (it gets converted to \-5h@m). I have tried --starttime \"-5h@m\", --starttime \-5h@m, -5h@m, '\-5h@m', etc., but nothing other than command-line input seems to be accepted and correctly parsed by argparse.
The error is typically:
test.py: error: argument --starttime/--st: expected one argument
Any help would be greatly appreciated!
Update: changing the input to the form -configfile=/somepath/sample_config_test04.ini -endtime=now -indexes="index one, index2, index3" -format=csv -starttime=-5h@m seems to work from the command line.
NOTE: I'd like to keep this post mainly because the other suggested answer has a very oddly phrased title that I would not have been able to find with a Google search for what I needed. I also updated the question to reflect that it includes a list of arguments.
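Attaching the value with '=', as in the update, is what avoids the error. In the Python versions discussed here, argparse only accepts a separate argument starting with '-' as an option's value if it looks like a negative number (such as -5 or -1.5); any other token like -10h is treated as an option string, which leaves --starttime with no argument ("expected one argument"). Passing a list to parse_args also removes the shlex round trip in unit tests. A minimal sketch with a cut-down parser (build_parser is a hypothetical helper, and the values are simplified):

```python
import argparse

def build_parser():
    """Hypothetical cut-down version of the parser from the question."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--starttime", "--st", type=str)
    parser.add_argument("--indexes", "--i", type=str, nargs="+")
    return parser

parser = build_parser()
# '--starttime=-10h' hands argparse the option and its value as one token,
# so the leading '-' of the value is never mistaken for an option string.
args = parser.parse_args(["--indexes", "index one", "index2", "--starttime=-10h"])
print(args.starttime)  # -10h
print(args.indexes)    # ['index one', 'index2']
```

The same form works with the alias, e.g. --st=-10h, and with shlex.split on a string if you prefer to keep that style.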
