How to use the variable x in a format(platecode) string

import string
with open('platenon.txt', 'w') as f:
    for platecode in range(1000):
        x = ['A' + upper_char for upper_char in string.ascii_uppercase]
        f.write('KJA{0:03d}'.format(platecode))
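One way to get `x` into the format string is to loop over both the letter pairs and the numbers, and pass each letter pair as an extra argument to `format`. This is only a sketch: it assumes the wanted code is a letter pair followed by a zero-padded three-digit number (for example `AA000`), written one per line, and the helper name `plate_codes` is hypothetical.

```python
import string

def plate_codes(letter_pairs, count=1000):
    """Yield letter pair + zero-padded number, e.g. 'AA000' ... 'AZ999'."""
    for letters in letter_pairs:
        for platecode in range(count):
            # {0} is the letter pair, {1:03d} pads the number to three digits
            yield '{0}{1:03d}'.format(letters, platecode)

x = ['A' + upper_char for upper_char in string.ascii_uppercase]

with open('platenon.txt', 'w') as f:
    for code in plate_codes(x):
        f.write(code + '\n')  # one code per line
```

If a fixed prefix such as the original `KJA` is also needed, it can simply be added to the format string in front of `{0}`.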

To get a list of all combinations of two letters from 'AA' to 'ZZ':
import string
import itertools
list(''.join(pair) for pair in itertools.product(string.ascii_uppercase, repeat=2))

If I understand your question correctly, you want to get a list which contains the strings 'AA' - 'AZ', i.e. ['AA', 'AB', 'AC', ..., 'AZ']?
import string
upper_chars = ['A' + upper_char for upper_char in string.ascii_uppercase]
To get a list with all strings from 'AA' to 'ZZ', you can use this in Python 3:
from string import ascii_uppercase
from itertools import product
[''.join(c) for c in product(ascii_uppercase, ascii_uppercase)]

Related

Python Lists - find uncommon elements

I want to compare two lists in order to create a new list of specific elements found in one list but not in the other. I want the code to return all occurrences of unmatched values.
Input:
list1 = 7,7,8,9
list2 = 8,9
Desired output: 7,7
import numpy as np
list1 = input("Input list1 : ").split(",")
list2 = input("Input list2 : ").split(",")
# setdiff1d returns sorted unique values, so duplicates are lost: this gives only '7', not '7', '7'
main_list = np.setdiff1d(list1,list2)
print(main_list)
You could do:
[i for i in list1 if i in (set(list1) - set(list2))]
Using numpy:
import numpy as np
# np.isin replaces np.in1d, which is deprecated as of NumPy 2.0
np.array(list1)[np.isin(list1, np.setdiff1d(list1, list2))].tolist()
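The set-difference comprehension above rebuilds `set(list1) - set(list2)` for every element. A minimal sketch that builds the lookup set once and still keeps every occurrence; the function name `uncommon` is just for illustration:

```python
def uncommon(list1, list2):
    """Return every item of list1 (duplicates kept) that does not occur in list2."""
    excluded = set(list2)  # built once, so each membership test is O(1) on average
    return [item for item in list1 if item not in excluded]

# input().split(",") yields strings, so the example uses strings as well
print(uncommon(['7', '7', '8', '9'], ['8', '9']))  # ['7', '7']
```

Order and duplicates from list1 are preserved, which is what the desired output 7,7 requires.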

How to create datetime index from string in python?

There are three files with names: file_2018-01-01_01_temp.tif, file_2018-01-01_02_temp.tif and file_2018-01-01_03_temp.tif. I want to list their names as ['2018010101', '2018010102', '2018010103'] in Python.
The code below creates an incorrect list:
import pandas as pd
from glob import glob
from os import path
pattern = '*.tif'
filenames = [path.basename(x) for x in glob(pattern)]
pd.DatetimeIndex([pd.Timestamp(f[5:9]) for f in filenames])
Result:
DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01']
I think the simplest approach is slicing plus replace in a list comprehension:
a = [f[5:18].replace('_','').replace('-','') for f in filenames]
print (a)
['2018010101', '2018010102', '2018010103']
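Similarly, with .str.replace on a pandas Index (regex=True is required in pandas 2.x, where the default is literal matching):
a = pd.Index([f[5:18] for f in filenames]).str.replace(r'[-_]', '', regex=True)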
print (a)
Index(['2018010101', '2018010102', '2018010103'], dtype='object')
Or convert values to DatetimeIndex and then use DatetimeIndex.strftime:
a = pd.to_datetime([f[5:18] for f in filenames], format='%Y-%m-%d_%H').strftime('%Y%m%d%H')
print (a)
Index(['2018010101', '2018010102', '2018010103'], dtype='object')
EDIT:
The dtype is object, but it must be dtype='datetime64[ns]'.
If you need datetimes, the display format has to be the default; it is not possible to change it:
d = pd.to_datetime([f[5:18] for f in filenames], format='%Y-%m-%d_%H')
print (d)
DatetimeIndex(['2018-01-01 01:00:00', '2018-01-01 02:00:00',
'2018-01-01 03:00:00'],
dtype='datetime64[ns]', freq=None)
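The slice f[5:18] assumes the date always starts at character 5. A sketch that finds the stamp with a regular expression and parses it with the standard library's datetime instead, assuming every name contains a YYYY-MM-DD_HH stamp; `stamp_from_name` is a hypothetical helper:

```python
import re
from datetime import datetime

filenames = ['file_2018-01-01_01_temp.tif',
             'file_2018-01-01_02_temp.tif',
             'file_2018-01-01_03_temp.tif']

# Capture "YYYY-MM-DD_HH" wherever it appears in the name (assumed naming scheme)
STAMP = re.compile(r'(\d{4}-\d{2}-\d{2}_\d{2})')

def stamp_from_name(name):
    match = STAMP.search(name)
    if match is None:
        raise ValueError('no timestamp in {!r}'.format(name))
    # Parse into a real datetime first, so impossible dates are rejected
    return datetime.strptime(match.group(1), '%Y-%m-%d_%H')

stamps = [stamp_from_name(f) for f in filenames]
print([s.strftime('%Y%m%d%H') for s in stamps])  # ['2018010101', '2018010102', '2018010103']
```

If a pandas index is needed, the list of datetime objects can be passed to pd.DatetimeIndex(stamps).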

Converting a string into a parameter for a GET request in Python

I am calling a GET API to retrieve some data. For the GET call I need to convert my keyword
keyword = "mahinder singh dhoni"
into
caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
I am new to Python and don't know the Pythonic way. I am doing it like this:
caption_heading = "caption%3A"
caption_tail = "%2Ccaption%3A"
keyword = "mahinder singh dhoni"
x = keyword.split(" ")
new_caption_keyword = []
new_caption_keyword.append(caption_heading)
for data in x:
    new_caption_keyword.append(data)
    new_caption_keyword.append(caption_tail)
search_query = ''.join(new_caption_keyword)
search_query = search_query[:-13]
print("new transformed keyword", search_query)
Is there a better way to do this? This feels like hard coding.
Thanks
It's best to turn your original string into a list first:
>>> keyword = "mahinder singh dhoni"
>>> keyword.split()
['mahinder', 'singh', 'dhoni']
Then your target string looks like caption:...,caption:...,caption:..., which can be built with a join and a format:
>>> # if you're on Python < 3.6, use 'caption:{}'.format(part)
>>> [f'caption:{part}' for part in keyword.split()]
['caption:mahinder', 'caption:singh', 'caption:dhoni']
>>> ','.join([f'caption:{part}' for part in keyword.split()])
'caption:mahinder,caption:singh,caption:dhoni'
Finally, URL-encode the result with urllib.parse.quote:
>>> import urllib.parse
>>> urllib.parse.quote(','.join([f'caption:{part}' for part in keyword.split()]))
'caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni'
Alternatively, instead of split you can replace each space " " with "%2Ccaption%3A" and start your string with "caption%3A".
For Python 2.x:
>>> from urllib import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s'%i for i in keyword.split()]))
For Python 3.x:
>>> from urllib.parse import quote
>>> keyword = "mahinder singh dhoni"
>>> quote(','.join(['caption:%s'%i for i in keyword.split()]))
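The replace approach mentioned above can be sketched as follows. It assumes the words are separated by single spaces and contain no characters that themselves need URL-encoding (quote handles those, a plain replace does not):

```python
from urllib.parse import quote

keyword = "mahinder singh dhoni"

# Replace-based version: start with the encoded "caption:" prefix, then turn
# each space into the encoded separator "%2Ccaption%3A" (i.e. ",caption:")
by_replace = "caption%3A" + keyword.replace(" ", "%2Ccaption%3A")

# Join-and-quote version, for comparison
by_quote = quote(','.join('caption:{}'.format(word) for word in keyword.split()))

print(by_replace)              # caption%3Amahinder%2Ccaption%3Asingh%2Ccaption%3Adhoni
print(by_replace == by_quote)  # True
```

The quote version is the safer default, because a word containing, say, & or # would be encoded correctly there but passed through unchanged by replace.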

count frequent strings + python3.7

I have a question. First, here is the code:
from urllib import request
from collections import Counter
from nltk import word_tokenize
URL = 'https://www.gutenberg.org/files/46/46-0.txt'
RESPONSE = request.urlopen(URL)
RAW = RESPONSE.read().decode('utf8')
print('\n')
type(RAW)
print('\n')
len(RAW)
TOKENS = word_tokenize(RAW)
print(type(TOKENS))
X = print(len(TOKENS))
print(TOKENS[:X])
print('\n')
c = Counter(RAW)
print(c.most_common(30))
Here is the first output I get; I am satisfied with that one:
['\ufeffThe', 'Project', 'Gutenberg', 'EBook', 'of', 'A', 'Christmas', 'Carol', ',', 'by', 'Charles',...]
Here is the second part of the output, which does not satisfy me:
[(' ', 28438), ('e', 16556), ('t', 11960), ('o', 10940), ('a', 10092), ('n', 8868), ('i', 8791),...]
Here is my question: as you can see, I am counting the most frequently occurring strings in a text, but I want to count whole elements of the list of words. The final part of the second output should look something like this:
[('Dickens', 28438), ('Project', 16556), ('Gutenberg', 11960),...]
and not like the second part of the output above. I want to show the 30 most frequently used words in the text, not the single characters that make up the elements of the list.
Do you know how I can solve this problem? Thanks for helping.
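Try changing this line:
c = Counter(TOKENS)
Counter(RAW) iterates over the string one character at a time, so it counts characters; passing the token list counts whole words instead. Here is your full code with the change: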
from urllib import request
from collections import Counter
from nltk import word_tokenize
URL = 'https://www.gutenberg.org/files/46/46-0.txt'
RESPONSE = request.urlopen(URL)
RAW = RESPONSE.read().decode('utf8')
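print('\n')
print(type(RAW))
print('\n')
print(len(RAW))
TOKENS = word_tokenize(RAW)
print(type(TOKENS))
X = len(TOKENS)  # print() returns None, so assign the length first
print(X)
print(TOKENS[:X])
print('\n')
c = Counter(TOKENS)  # count whole tokens, not characters of RAW
print(c.most_common(30))  # the 30 most frequent words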
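To see why Counter(RAW) gave single characters, here is a small self-contained comparison. It uses str.split() as a stand-in for nltk's word_tokenize so it runs without NLTK or network access; the sample sentence is made up for illustration:

```python
from collections import Counter

text = "the cat and the hat and the bat"

# Iterating over a str yields single characters, so this counts letters and spaces
char_counts = Counter(text)
print(char_counts.most_common(2))  # [(' ', 7), ('t', 6)]

# Iterating over a list of words yields whole words, so this counts words
word_counts = Counter(text.split())
print(word_counts.most_common(2))  # [('the', 3), ('and', 2)]
```

Counter counts whatever its argument yields when iterated, so the fix is entirely in what you pass to it.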

Filter file list in Python: lowercase and uppercase extension files

I am filtering my file list using this line:
MyList = filter(lambda x: x.endswith(('.doc','.txt','.dat')), os.listdir(path))
The line above only matches files with lowercase extensions. Is there an elegant way to make it match uppercase extensions as well?
You just need to add .lower() to your lambda function:
MyList = filter(lambda x: x.lower().endswith(('.doc','.txt','.dat')), os.listdir(path))
I'd prefer to use os.path.splitext with a list comprehension:
import os
from os.path import splitext
my_list = [x for x in os.listdir(path) if splitext(x)[1].lower() in {'.doc', '.txt', '.dat'}]
That is still a bit much for a single line, so perhaps:
import os
from os.path import splitext
def valid_extension(x, valid={'.doc', '.txt', '.dat'}):
    return splitext(x)[1].lower() in valid
my_list = [x for x in os.listdir(path) if valid_extension(x)]
import os
import re
pat = re.compile(r'[.](doc|txt|dat)$', re.IGNORECASE)
filenames = [filename for filename in os.listdir(path)
             if pat.search(filename)]
print(filenames)
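The same case-insensitive check can also be written with pathlib, a different standard-library module from the os.path answers above. This is a sketch; the helper name `files_with_extensions` is hypothetical, and unlike os.listdir it skips directories (via is_file) and returns the names sorted:

```python
from pathlib import Path

VALID_EXTENSIONS = {'.doc', '.txt', '.dat'}

def files_with_extensions(directory, extensions=VALID_EXTENSIONS):
    """Return sorted names of files in directory whose extension matches, ignoring case."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        # Path.suffix is the last extension including the dot, e.g. '.TXT'
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Because Path.suffix only looks at the last extension, a name like report.txt.bak is not matched, which is the same behaviour as the splitext version.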
