Why is str.translate() returning an error and how can I fix it? - python-3.x

import os
def rename_files():
    file_list = os.listdir(r"D:\360Downloads\test")
    saved_path = os.getcwd()
    os.chdir(r"D:\360Downloads\test")
    for file_name in file_list:
        os.rename(file_name, file_name.translate(None,"0123456789"))
rename_files()
The error message is TypeError: translate() takes exactly one argument (2 given). How can I call translate() so that it does not raise this error?

Hope this helps!
os.rename(file_name,file_name.translate(str.maketrans('','','0123456789')))
or
os.rename(file_name,file_name.translate({ ord(i) : None for i in '0123456789' }))
Explanation:
It looks like you're running Python 3.x but using the Python 2.x syntax. In Python 3.x the signature of translate() is
str.translate(table)
which takes only one argument, unlike Python 2.x, where the signature is
str.translate(table[, deletechars])
which can take more than one argument.
We can easily build a translation table with the str.maketrans function.
In this case, the first two parameters map nothing to nothing, and the third parameter specifies which characters should be removed.
We can also build the translation table manually as a dictionary in which each key is the Unicode code point (ordinal) of the original character and each value is the code point of its replacement. To remove a character, its value must be None.
For example, to replace 'A' with 'a' and remove '1' from a string, the dictionary looks like this:
{65: 97, 49: None}
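As a minimal sketch of the two approaches described above, the following applies both kinds of table to sample strings. The file name "report2017_v2.txt" and the variable names are made up for illustration; only str.maketrans and str.translate come from the answer.

```python
# Approach 1: build the table with str.maketrans.
# The third argument lists the characters to delete.
digits_table = str.maketrans('', '', '0123456789')
stripped = "report2017_v2.txt".translate(digits_table)  # hypothetical file name
print(stripped)

# Approach 2: build the table by hand as {code point: replacement}.
# 65 is ord('A'), 97 is ord('a'), 49 is ord('1'); None means "delete".
manual_table = {65: 97, 49: None}
converted = "A1B1".translate(manual_table)
print(converted)
```

Both tables are ordinary mappings from code points, so ord() and chr() can be used to inspect or build them.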

Related

Python substitute column text

When I execute the following line of code, I get TypeError: repl must be a string or callable
ps_df['Name'].str.replace(ps_df['Substitute'].str,'\n'+ps_df['Substitute'].str)
When I changed it to the following,
ps_df['Name'].str.replace(ps_df['Substitute'].str,'\n'+ps_df['Substitute'].str)
I get this error, TypeError: can only concatenate str (not "StringMethods") to str
According to pandas documentation for pandas.Series.str.replace, the first argument should be a string or compiled regex.
But you are passing in a series of strings instead. You are also passing the .str accessor itself (a StringMethods object) rather than calling a method on it. Hence the errors.
You should be using apply to replace strings row-wise.
ps_df.apply(lambda x: x.Name.replace(x.Substitute, '\n' + x.Substitute), axis=1)

How to use f'string bytes'string together? [duplicate]

I'm looking for a formatted byte string literal. Specifically, something equivalent to
name = "Hello"
bytes(f"Some format string {name}")
Possibly something like fb"Some format string {name}".
Does such a thing exist?
No. The idea is explicitly dismissed in the PEP:
For the same reason that we don't support bytes.format(), you may
not combine 'f' with 'b' string literals. The primary problem
is that an object's __format__() method may return Unicode data
that is not compatible with a bytes string.
Binary f-strings would first require a solution for
bytes.format(). This idea has been proposed in the past, most
recently in PEP 461. The discussions of such a feature usually
suggest either
adding a method such as __bformat__() so an object can control how it is converted to bytes, or
having bytes.format() not be as general purpose or extensible as str.format().
Both of these remain as options in the future, if such functionality
is desired.
In 3.6+ you can do:
>>> a = 123
>>> f'{a}'.encode()
b'123'
You were actually super close in your suggestion; if you add an encoding kwarg to your bytes() call, then you get the desired behavior:
>>> name = "Hello"
>>> bytes(f"Some format string {name}", encoding="utf-8")
b'Some format string Hello'
Caveat: This works for me in 3.8, but the note at the bottom of the Bytes Objects section in the docs seems to suggest that it should work with any method of string formatting in all of 3.x (using str.format() for versions before 3.6, since f-strings were added in 3.6, but the OP specifically asks about 3.6+).
Percent formatting for bytes (added in Python 3.5 by PEP 461) works for some use cases:
print(b"Some stuff %a. Some other stuff" % my_byte_or_unicode_string)
But as AXO commented:
This is not the same. %a (or %r) will give the representation of the string, not the string itself. For example b'%a' % b'bytes' will give b"b'bytes'", not b'bytes'.
This may or may not matter, depending on whether you only need to display the formatted byte_or_unicode_string in a UI or whether you need to manipulate it further.
As noted here, you can format this way:
>>> name = b"Hello"
>>> b"Some format string %b World" % name
b'Some format string Hello World'
You can see more details in PEP 461
Note that in your example you could simply do something like:
>>> name = b"Hello"
>>> b"Some format string " + name
b'Some format string Hello'
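To compare the options above side by side, here is a small sketch that starts from a str value and produces the same bytes result through an encoded f-string and through %b formatting. The variable names are illustrative, and UTF-8 is an assumed encoding.

```python
name = "Hello"

# Option 1: format as text first, then encode the whole result.
from_fstring = f"Some format string {name}".encode("utf-8")

# Option 2: encode the value first, then interpolate it with %b (PEP 461).
# %b requires a bytes-like object, so a str must be encoded explicitly.
from_percent = b"Some format string %b" % name.encode("utf-8")

print(from_fstring)
print(from_percent)
```

Option 1 is usually simpler when every input is text; option 2 helps when some of the pieces are already bytes.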
This was one of the bigger changes from Python 2 to Python 3: the two versions handle Unicode and strings differently.
This is how you'd convert to bytes.
string = "some string format"
encoded = string.encode()
print(encoded)
This is how you'd decode back to a string.
encoded.decode()
I gained a better appreciation of how the handling of Unicode changed between Python 2 and 3 from this Coursera lecture by Charles Severance. You can watch the entire 17-minute video, or skip to around 10:30 for the differences between Python 2 and 3 in how they handle characters, and Unicode specifically.
I understand your actual question is how you could format a string that has both strings and bytes.
inBytes = b"testing"
inString = 'Hello'
type(inString) #This will yield <class 'str'>
type(inBytes) #this will yield <class 'bytes'>
Here you can see that I have a string variable and a bytes variable.
This is how you would combine bytes and a string into one string.
formattedString = inString + ' ' + inBytes.decode()

NoneType object has no attribute groups? [duplicate]

I have a string which I want to extract a subset of. This is part of a larger Python script.
This is the string:
import re
htmlString = '</dd><dt> Fine, thank you. </dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'
From it I want to pull out "Molt bé, gràcies. mohl behh, GRAH-syuhs". For that I use a regular expression with re.search:
SearchStr = '(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ ([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'
Result = re.search(SearchStr, htmlString)
print Result.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
Since Result.groups() doesn't work, neither do the extractions I want to make (i.e. Result.group(5) and Result.group(7)).
But I don't understand why I get this error. The regular expression works in TextWrangler, so why not in Python? I'm a beginner in Python.
You are getting AttributeError because you're calling groups on None, which has no such method.
re.search returning None means the regex couldn't find anything matching the pattern in the supplied string.
When using regexes, it is good practice to check whether a match was found:
Result = re.search(SearchStr, htmlString)
if Result:
    print Result.groups()
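For readers on Python 3, the same guard can be sketched as below. The pattern here is a deliberately simplified, hypothetical one written for this example (it is not the asker's pattern), and extract_phrase is an illustrative helper name.

```python
import re

html_string = ('</dd><dt> Fine, thank you. </dt><dd> Molt bé, gràcies. '
               '(<i>mohl behh, GRAH-syuhs</i>)')

# Hypothetical simplified pattern: the text after <dd>, then the text in (<i>...</i>).
PATTERN = re.compile(r'<dd>\s*(.+?)\s*\(<i>(.+?)</i>\)')

def extract_phrase(text):
    """Return (phrase, pronunciation), or None when the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        # search() returns None on failure, so never call .groups() unguarded.
        return None
    return match.groups()

found = extract_phrase(html_string)
missing = extract_phrase('no markup here')
print(found)
print(missing)
```

Returning None explicitly lets the caller decide how to handle input that does not match, instead of crashing with AttributeError.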
import re
htmlString = '</dd><dt> Fine, thank you. </dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'
SearchStr = '(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ ([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'
Result = re.search(SearchStr.decode('utf-8'), htmlString.decode('utf-8'), re.I | re.U)
print Result.groups()
It works that way. The string contains non-ASCII characters, so the match usually fails. You've got to decode to Unicode and use the re.U (Unicode) flag.
I'm a beginner too and I faced that issue a couple of times myself.

set function with file- python3

I have a text file with given below content
Credit
Debit
21/12/2017
09:10:00
I wrote Python code to convert the text into a set and discard \n.
with open('text_file_name', 'r') as file1:
    same = set(file1)
print (same)
print (same.discard('\n'))
For the first print statement, print (same), I get the correct result:
{'Credit\n','Debit\n','21/12/2017\n','09:10:00\n'}
But for the second print statement, print (same.discard('\n')), I get the result
None.
Can anybody help me figure out why I am getting None? I am using same.discard('\n') to discard \n in the set.
Note:
I am trying to understand the discard function with respect to set.
The discard method only removes a whole element from the set. Since your set doesn't contain the element '\n' on its own, there is nothing to discard. What you are looking for is a map that strips the \n from each element, like so:
set(map(lambda x: x.rstrip('\n'), same))
which will return {'Credit', 'Debit', '09:10:00', '21/12/2017'} as the set. This sample uses the map builtin, which applies its first argument to each element in the set. The first argument in our map usage is lambda x: x.rstrip('\n'), which simply removes any occurrences of \n on the right-hand side of each string.
discard removes the given element from the set only if it is present in it.
In addition, the method doesn't return a value (so printing its result shows None); it modifies the set it is called on.
with open('text_file_name', 'r') as file1:
    same = set(file1)
print (same)
same = {elem[:len(elem) - 1] for elem in same if elem.endswith('\n')}
print (same)
There are 4 elements in the set, and none of them is a lone newline.
It would be more usual to use a list in this case, since a list preserves order, whereas a set is not guaranteed to preserve order and also discards duplicate lines. Perhaps you have your reasons.
You seem to be looking for rstrip('\n'). Consider processing the file in this way:
s = set()
with open('text_file_name') as file1:
    for line in file1:
        s.add(line.rstrip('\n'))
s.discard('Credit')
print(s) # This displays 3 elements, without trailing newlines.
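The points made in the answers above can be checked with a short sketch. It uses io.StringIO as a stand-in for the text file, which is an assumption made so the example runs without a file on disk, and the variable names are illustrative.

```python
import io

# Stand-in for the text file from the question (assumed contents).
file1 = io.StringIO("Credit\nDebit\n21/12/2017\n09:10:00\n")

# Strip the trailing newline from each line while building the set.
lines = {line.rstrip('\n') for line in file1}

# discard() mutates the set in place and returns None.
result = lines.discard('Credit')
print(result)

# Discarding an element that is not present is silently ignored.
lines.discard('\n')
print(sorted(lines))
```

This is why print(same.discard('\n')) shows None: the return value of discard is printed, not the set.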

Python trouble debugging I/O, how do I get the correct format?

I am attempting to turn a dictionary into a formatted string and then write it to a file, but my formatting seems to be entirely incorrect. I'm not sure how to debug it, since each of my test cases is given a different file. I was able to use Python's interactive mode to find out what my function is actually writing to the file, and it is badly wrong. Can you help me format it correctly?
Given a sorted dictionary, I convert it to a string. I need the function to return it like so:
Dictionary is : {'orange':[1,3],'apple':[2]}
"apple:\t2\norange:\t1,\t3\n"
The format is: every key-value pair of the dictionary should be output as a string that starts with the key, followed by ":", a tab, and then the integers from the value list. Every integer should be followed by a "," and a tab, except for the very last one, which should be followed by a newline.
Here is my function that I thought would work:
def format_item(key,value):
    return key+ ":\t"+",\t".join(str(x) for x in value)
def format_dict(d):
    return sorted(format_item(key,value) for key, value in d.items())
def store(d,filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
        f.close()
    return None
I now have too many tabs on the last line. How do I change only the last line, outside of the for loop?
Example input:
d = {'orange':[1,3],'apple':[2]}
My function gives: ['apple:\t2', 'orange:\t1,\t3']
but it should give: "apple:\t2\norange:\t1,\t3\n"
Adding the newline character to the end of the return statement in format_item seems to yield the correct output.
return key+ ":\t"+",\t".join(str(x) for x in value) + '\n'
In [10]: format_dict(d)
Out[10]: ['apple:\t2\n', 'orange:\t1,\t3\n']
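Note that format_dict still returns a list here, while the expected result is a single string. One possible way to get there, sketched under the assumption that each line already ends with its own newline, is to join the lines with an empty string instead of "\n". The parameter names below are illustrative.

```python
def format_item(key, values):
    # key, a colon and a tab, then the integers separated by ",\t", then a newline.
    return key + ":\t" + ",\t".join(str(x) for x in values) + "\n"

def format_dict(d):
    # Each item already ends in "\n", so join with "" to avoid blank lines.
    return "".join(format_item(key, d[key]) for key in sorted(d))

def store(d, filename):
    # The with block closes the file, so no explicit close() is needed.
    with open(filename, "w") as f:
        f.write(format_dict(d))

formatted = format_dict({'orange': [1, 3], 'apple': [2]})
print(repr(formatted))
```

Sorting the keys rather than the formatted lines keeps the ordering tied to the dictionary keys themselves.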
