detect_langs how to use the output - python-3.x

I'm using langdetect, and detect_langs should return the probability of each candidate language for a string, something like [en:0.9999960343803843] for an English text. I want to read the language and its probability and store them in variables for later use, but I can't do anything with the result except print it. The type of each element seems to be <class 'langdetect.language.Language'>.
from langdetect import detect_langs
line="Otec matka syn."
lan=detect_langs(line)
print(lan)
print(type(lan[0]))
This code outputs:
[pl:0.7142846922445223, fi:0.2857135474194883]
<class 'langdetect.language.Language'>
Note: it's not JSON. I tried json.loads(lan[0]) and got an error saying the argument should be a string, not Language.
Edit: as user696969 answered, the solution was to save them in a dict:
x=detect_langs(line)
lan={}
for lang in x:
    lan.update({lang.lang: lang.prob})

Since the elements are language.Language objects, you can convert each one into a dict through its lang and prob attributes, using the following code:
from langdetect import detect_langs
line="Otec matka syn."
lan=[{lang.lang: lang.prob} for lang in detect_langs(line)]
print(lan)
print(type(lan[0]))
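The expected output for lan would be something like the following (the exact values vary between runs, because langdetect is non-deterministic unless you set DetectorFactory.seed):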
[{'fi': 0.8571392823357673}, {'pl': 0.14285943305652865}]
You can also store all the languages in a single dictionary by replacing
lan=[{lang.lang: lang.prob} for lang in detect_langs(line)]
with
lan={lang.lang: lang.prob for lang in detect_langs(line)}
The expected output would be something like below
{'fi': 0.7142848220971209, 'pl': 0.2857147054811151}
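Once the results are plain values, picking out "the language and the percentage" is ordinary Python. The following is a minimal sketch, not part of the original answer: FakeLanguage is a hypothetical stand-in for langdetect.language.Language so that the snippet runs without langdetect installed, and top_language and as_dict are illustrative helper names. It relies only on the lang and prob attributes used above.

```python
from collections import namedtuple

# Hypothetical stand-in for langdetect.language.Language, so the sketch runs
# without langdetect installed; it mirrors the .lang and .prob attributes.
FakeLanguage = namedtuple("FakeLanguage", ["lang", "prob"])


def top_language(results):
    """Return (code, probability) for the most probable entry."""
    best = max(results, key=lambda r: r.prob)
    return best.lang, best.prob


def as_dict(results):
    """Map each language code to its probability."""
    return {r.lang: r.prob for r in results}


# Values copied from the sample output in the question.
results = [FakeLanguage("pl", 0.7142846922445223),
           FakeLanguage("fi", 0.2857135474194883)]

code, prob = top_language(results)
probs = as_dict(results)
print(code, round(prob * 100, 1))  # language code and its percentage
print(probs.get("fi", 0.0))        # .get avoids a KeyError for absent languages
```

With the real library, the list returned by detect_langs(line) should be usable in place of results.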

Related

Django - select [column] AS

I have the following code:
MyModel.objects.values('foo','flu','fu').filter(part_of=pk)
which gives me the following dictionary:
{'foo': a, 'flu': b, 'fu': c}
This dictionary is serialized into a JSON response like so:
JsonResponse(data, safe=False)
Is there a way I could rename the key of 'flu' into something while preserving the value?
So far I tried:
values[0]['new_flu'] = values[0].pop('flu')
I think this is the usual Python way of renaming a dictionary key; however, it seems to have no effect, and the returned JSON still contains 'flu' instead of 'new_flu'.
I feel like this could be solved simply with an alias, e.g. SELECT foo, flu AS new_flu, fu from ..... What would be the Django equivalent of this command? Or is there some other solution?
One option is to annotate the query with the names you need:
from django.db.models import F
MyModel.objects.filter(part_of=pk).annotate(new_flu=F('flu')).values('foo','new_flu','fu')
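If annotate is not convenient, the key can also be renamed in Python once the query has been evaluated. One likely reason the pop approach in the question had no effect is that indexing an unevaluated QuerySet runs a new query each time, so values[0] hands back a fresh dict on every access. The following is a minimal sketch under that assumption: rows is a stand-in for list(queryset), and rename_key is a hypothetical helper, not a Django API.

```python
def rename_key(rows, old, new):
    """Return new dicts with `old` renamed to `new`; other keys keep their order."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


# Stand-in for list(MyModel.objects.values('foo', 'flu', 'fu').filter(part_of=pk))
rows = [{"foo": 1, "flu": 2, "fu": 3}]

data = rename_key(rows, "flu", "new_flu")
print(data)
```

The resulting list of dicts can then be passed to JsonResponse(data, safe=False) as in the question.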

Python translate a column with multiple languages to english

I have a dataset with several comment columns containing text in multiple languages. I want to translate these columns into English and create new columns holding the English translations.
Accountability_COMMENT is one such column: each row has a comment, and the comments are in different languages. I want to create a new column with all of these comments translated to English.
I have tried the following code:
from googletrans import Translator
from textblob import TextBlob
translator = Translator()
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(
    lambda x: TextBlob(x).translate(to='en'))
The error that I am getting is :
TypeError: The text argument passed to __init__(text) must be a string, not class 'float'
My column has object dtype, which is correct.
You most probably have some comments that consist only of a float (e.g. a decimal number, or a missing value, which pandas stores as the float NaN). Even though the column's dtype is object according to pandas, those values are still floats when they reach TextBlob. This leads to the error:
TypeError: The text argument passed to __init__(text) must be a string, not <class 'float'>
One solution is to make sure that the input x of TextBlob(x) is a string. You could do this by modifying the apply line like this:
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(lambda x: TextBlob(str(x)).translate(to='en'))
Unfortunately, this will probably also raise an error like:
raise NotTranslated('Translation API returned the input string unchanged.')
textblob.exceptions.NotTranslated: Translation API returned the input string unchanged.
This is due to the fact that when translating a number, the translation and the original text will be exactly the same, and apparently TextBlob doesn't like that.
What you can do to avoid this is to catch that exception NotTranslated and just return the untranslated TextBlob, like this:
from textblob import TextBlob
from textblob.exceptions import NotTranslated
def translate_comment(x):
    try:
        # Try to translate the string version of the comment
        return TextBlob(str(x)).translate(to='en')
    except NotTranslated:
        # If the output is the same as the input, just return the TextBlob version of the input
        return TextBlob(str(x))
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(translate_comment)
EDIT:
If you get the HTTP error Too Many Requests it's probably because you are being kicked out by the Google Translate API. Instead of using apply, you can make your translation "extra-slow" by using a for loop with some sleep in-between cycles. In this case you should import another package (time) and substitute the last line:
from time import sleep
from textblob import TextBlob
from textblob.exceptions import NotTranslated
def translate_comment(x):
    try:
        # Try to translate the string version of the comment
        return TextBlob(str(x)).translate(to='en')
    except NotTranslated:
        # If the output is the same as the input, just return the TextBlob version of the input
        return TextBlob(str(x))
for i in range(len(data_merge['Accountability_COMMENT'])):
    # Translate one comment at a time
    data_merge['Accountability_COMMENT'].iloc[i] = translate_comment(data_merge['Accountability_COMMENT'].iloc[i])
    # Sleep for a quarter of a second
    sleep(0.25)
You can then experiment with different values for the sleep function. Of course, the longer the sleep, the slower the translation! N.B. the argument to sleep is in seconds.
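The two ideas above (leave non-text values alone, and pause between requests) can be combined in one small helper. The following is a minimal sketch rather than the answer's own code: translate_all and fake_translate are hypothetical names, and fake_translate merely stands in for a real translation call such as translate_comment. Unlike the str(x) approach, it passes non-string values (for example NaN) through unchanged instead of sending them to the translator.

```python
from time import sleep


def translate_all(comments, translate, delay=0.25):
    """Translate each string in `comments`, pausing `delay` seconds between calls.

    Non-string values (e.g. the float NaN that pandas uses for missing data)
    are passed through unchanged instead of being sent to the translator.
    """
    out = []
    for comment in comments:
        if isinstance(comment, str):
            out.append(translate(comment))
            sleep(delay)  # throttle only when a request was actually made
        else:
            out.append(comment)
    return out


def fake_translate(text):
    # Hypothetical stand-in for a real translation call.
    return text.upper()


comments = ["bonjour", float("nan"), "hola"]
translated = translate_all(comments, fake_translate, delay=0)
print(translated)
```

With a DataFrame, the input could be data_merge['Accountability_COMMENT'].tolist(), and the returned list could be assigned to a new column so that the original comments are kept.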

Basic string formatting with NIM

I am trying to do some very basic string formatting and I got immediately stuck.
What is wrong with this code?
import strutils
import parseopt2
for kind, key, val in getopt():
  echo "$1 $2 $3" % [kind, key, val]
I get Error: type mismatch: got (TaintedString) but expected 'CmdLineKind = enum', but I don't understand how to fix it.
The problem here is that Nim's formatting operator % expects an array whose elements all have the same type. Since the first element of the array here has the CmdLineKind enum type, the compiler expects the rest of the elements to have that type too. What you really want is for all of the elements to have the string type, and you can enforce this by explicitly converting the first parameter to a string (with the $ operator).
import strutils
import parseopt2
for kind, key, val in getopt():
  echo "$1 $2 $3" % [$kind, key, val]
In case you are also wondering what the TaintedString type in the error message is: it is a special type indicating non-validated external input to the program. Since non-validated input data poses a security risk, the language supports a special "taint mode", which helps you keep track of where inputs may need validation. This mode is inspired by a similar set of features available in the Perl programming language:
http://docstore.mik.ua/orelly/linux/cgi/ch08_04.htm
If you use the strformat module from Nim's standard library, the same code snippet can be more concise:
import parseopt # parseopt2 has been deprecated!
import strformat
for kind, key, val in getopt():
  echo fmt"{kind} {key} {val}"
Also note that parseopt replaces the deprecated parseopt2 library, at least as of today on Nim 0.19.2.

what is the workaround for QString.contains() method for pyqt4+python3?

I have been converting Qt/C++ widget code to PyQt4 + Python 3. I have a QFileSystemModel defined, and the items it returns have "data" with the filename as type str. (This is of type QString in Qt/C++ or Python 2.x.)
I have to search for a filter based on QRegExp. In Qt/C++ and Python 2.x this is achieved with QString.contains(QRegExp).
I found that QString is not available in Python 3. Since everything in Python 3 is now of type str, how can I implement the old QString.contains(QRegExp) call?
Thanks,
Kodanda
For string manipulation, Python is generally superior to anything Qt has to offer (particularly when it comes to regular expressions).
But if you must use QRegExp:
# test whether string contains pattern
if QRegExp(pattern).indexIn(string) != -1:
    print('found')
In pure Python:
import re
if re.search(pattern, string):
    print('found')
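For filtering a list of filenames, the re.search check can be wrapped in a small helper. The following is a minimal sketch: contains and its case_sensitive parameter are hypothetical names chosen to resemble the old QString.contains call, and the filenames are made up. Note that Python's re syntax is not identical to QRegExp's, so patterns carried over from the C++ code may need adjusting.

```python
import re


def contains(text, pattern, case_sensitive=True):
    """Return True if `pattern` matches anywhere in `text`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


filenames = ["report_2019.txt", "notes.md", "Report_final.TXT"]

# Keep the names that contain "report" followed later by ".txt", ignoring case.
matches = [name for name in filenames
           if contains(name, r"report.*\.txt", case_sensitive=False)]
print(matches)
```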

Can two strings arguments be passed to a python dict() built in function?

I have a loop I would like to build a dictionary from. The part of the code I'm having trouble with is that both the key and the value are strings. I cannot convert the IP variable from a string back into an int, nor is it a float.
Here is the method from my class I'm attempting to build the dictionary with. There is a loop elsewhere walking the IP range I'm interested in feeding the method parameter 'ip'.
def dictbuild(self,ip):
    s = pxssh.pxssh()
    s.force_password = True
    try:
        s.login(str(ip), 'root', 'password')
        s.sendline('hostname') # run a command
        s.prompt() # match the prompt
        out = s.before # print everything before the prompt, this returns a byte object and could need decode(utf-8)
        out = out.decode('utf-8')
        out = out.split()
        out = out[1]
        print(type(out)) #These type function give us an easy output of data types in the traceback
        print(type(ip))
        ipdict = dict(out=ip) ###Getting stuck here on two string types.
        print(ipdict)
        s.logout()
    except (pxssh.ExceptionPxssh, pexpect.EOF) as e:
        pass
Instead of using the string I want as the key (it would actually be the hostname of the box), we get output like this:
<class 'str'>
<class 'str'>
{'out': '10.20.234.3'}
The Python documentation only gives examples with a string key and an int value. Am I using this wrong?
https://docs.python.org/2/tutorial/datastructures.html
Thanks in advance.
Use a dict literal instead:
ipdict = { out: ip }
Calling dict(out=ip) just passes a keyword argument, so the key is the literal name 'out' rather than the value of the variable out. A dict literal, by contrast, evaluates expressions for both keys and values.
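The difference is easy to see side by side. The following is a minimal sketch with made-up values: the hostnames and the second IP address are hypothetical and stand in for what the pxssh session would return.

```python
out = "web01"        # hypothetical hostname
ip = "10.20.234.3"

by_keyword = dict(out=ip)     # key is the literal string 'out'
by_literal = {out: ip}        # key is the value of the variable out
by_pairs = dict([(out, ip)])  # dict() also accepts (key, value) pairs

# Building one dictionary across a loop, one entry per host
hosts = {}
for hostname, address in [("web01", "10.20.234.3"), ("web02", "10.20.234.4")]:
    hosts[hostname] = address

print(by_keyword, by_literal, by_pairs, hosts)
```

Assigning with hosts[hostname] = address inside the loop accumulates every host in one dictionary, whereas creating a new dict on each call would keep only the latest pair.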
