Edited at 19:19 like I said in a comment
Here is the text file first: https://www.dropbox.com/s/dtxgyk9v25p7swt/mbox-short.txt?dl=0
and my code is:
xfile = open('mbox-short.txt')
for line in xfile:
    # The next line removes superfluous trailing whitespace
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)
The output is this:
From: stephen.marquard#uct.ac.za
From: louis#media.berkeley.edu
From: zqian#umich.edu
From: rjlowe#iupui.edu
From: zqian#umich.edu
and more.
Here is what I am asking. When my program finds what I am looking for, it prints "From ....za". I believe there are more characters following "za" before the line reaches the \n character. How does my program decide to print only "From ....za" and not the other characters?
Thanks in advance for the trouble.
Mario
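A minimal sketch (not part of the original post) of one way to check which characters a line really holds. print(line) writes the entire string, and rstrip() with no arguments removes only trailing whitespace such as spaces, tabs and "\n", so repr() can show whether anything else follows the address. The sample string below is invented for illustration; the real lines come from mbox-short.txt.

```python
# Hypothetical sample line; the real lines come from mbox-short.txt.
raw = "From: someone@example.org  \n"

# rstrip() with no arguments strips trailing whitespace only.
stripped = raw.rstrip()

# repr() makes otherwise invisible characters such as spaces and \n visible.
print(repr(raw))
print(repr(stripped))
```

If repr(line) shows nothing after the address except whitespace, then there were no further characters for print to leave out.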
For this project, I was trying to make a list containing the distances between all the points I have (I have one list of x coordinates and one of y coordinates):
for da in range(len(meta.values())):
    for db in range(len(meta.values())):
        dis.append(math.sqrt((x[db] - x[da])**2 + (y[db] - y[da])**2)
print(dis)
However, this part either gives me an "unexpected EOF while parsing" error or an "invalid syntax" error at the print statement. I can't see the mistake here. Can someone please help me?
You have an unmatched parenthesis in dis.append(...
It should be:
dis.append(math.sqrt((x[db] - x[da])**2 + (y[db] - y[da])**2))
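For context, here is a small runnable sketch of the whole loop with the parenthesis closed. The coordinate lists are made-up sample data, and the loops run over len(x) rather than len(meta.values()), since meta is not shown in the question; both are assumptions.

```python
import math

# Hypothetical sample coordinates; the question's real x and y are not shown.
x = [0.0, 3.0, 0.0]
y = [0.0, 4.0, 1.0]

dis = []
for da in range(len(x)):
    for db in range(len(x)):
        # Every "(" now has a matching ")", so the next line parses.
        dis.append(math.sqrt((x[db] - x[da])**2 + (y[db] - y[da])**2))
print(dis)
```

Python reports the error at the print line because it keeps reading past the unclosed call looking for the missing ")".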
I want to split a string of emoji characters that contains no spaces in Lua 5.1 and add a space between the emoji, but I can't get it right. This is how I tried it, but the result is wrong:
#!/usr/bin/env lua
local text = "👊🏽👊🏽🥊😂🎹😂👩👩👧👦👨👦👦⌚↔"
for emoji in string.gmatch(text, "[%z\1-\127\194-\244][\128-\191]*") do
    io.write(emoji .. " ")
end
See in browser Firefox 65!
MY WRONG RESULT: 👊 🏽 👊 🏽 🥊 😂 🎹 😂 👩 👩 👧 👦 👨 👦 👦 ⌚ ↔
WAITED RESULT: 👊🏽 👊🏽 🥊 😂 🎹 😂 👩👩👧👦 👨👦👦 ⌚ ↔
local text = "👊🏽👊🏽🥊😂🎹😂👩👩👧👦👨👦👦⌚↔"
for emoji in text
    :gsub("(.)([\194-\244])", "%1\0%2")
    :gsub("%z(\240\159\143[\187-\191])", "%1")
    :gsub("%z(\239\184[\128-\143])", "%1")
    :gsub("%z(\226\128\141)%z", "%1")
    :gmatch"%Z+"
do
    print(emoji)
end
end
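The byte patterns in the answer appear to correspond to three kinds of code points that should stay attached to the preceding emoji: skin-tone modifiers (U+1F3FB to U+1F3FF), variation selectors (U+FE00 to U+FE0F) and the zero-width joiner (U+200D). As a rough illustration of that idea only, here is a sketch in Python rather than Lua. It is not a port of the gsub chain, the function name split_emoji is hypothetical, and it does not attempt full Unicode grapheme segmentation.

```python
def split_emoji(text):
    """Group code points into clusters, keeping modifiers and ZWJ sequences together."""
    clusters = []
    join_next = False  # True right after a zero-width joiner
    for ch in text:
        cp = ord(ch)
        is_modifier = 0x1F3FB <= cp <= 0x1F3FF   # skin-tone modifiers
        is_selector = 0xFE00 <= cp <= 0xFE0F     # variation selectors
        is_zwj = cp == 0x200D                    # zero-width joiner
        if clusters and (is_modifier or is_selector or is_zwj or join_next):
            clusters[-1] += ch   # attach to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
        join_next = is_zwj
    return clusters


# Escapes are used so the invisible joiner is explicit in the source.
sample = "\U0001F44A\U0001F3FD\U0001F94A\U0001F469\u200D\U0001F467\u231A\u2194"
print(" ".join(split_emoji(sample)))
```

The sample string is a fist with a skin-tone modifier, a boxing glove, a two-person sequence joined by U+200D, a watch and an arrow, so it should split into five clusters.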
I want to print output like this:
Percentual de sapos: 25.00 %
I wrote the following code:
print("Percentual de sapos: %i.2f %"%Z)
But it didn't work.
How can I print "%" in the output in Python?
print("Percentual de sapos: {:.2f} %".format(Z))
This question is a duplicate of: How can I selectively escape...
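If you would rather keep the %-style formatting from the question, a literal percent sign is written as %% and the float conversion is %.2f. A minimal sketch, assuming Z holds the percentage as a number (the value 25.0 is only an example):

```python
Z = 25.0  # hypothetical value; in the question Z is computed elsewhere

# %-formatting: "%.2f" formats the float, "%%" produces a literal "%".
old_style = "Percentual de sapos: %.2f %%" % Z

# str.format: "%" has no special meaning, so it needs no escaping.
new_style = "Percentual de sapos: {:.2f} %".format(Z)

print(old_style)
print(new_style)
```

Both lines print the same text.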
When I use CRF++ 0.58 to train a named-entity (NE) model, the program reports a problem:
"reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s"
Development environment:
Red Hat Linux 6.5, gcc 5.0, CRF++ 0.58
Feature template file:
template
dataset:
Boson_train.txt
Boson_test.txt
The first column is the word, the second column is the POS tag, and the third column is the NER tag.
The problem:
When I try to train the NER model, I run the command "crf_learn -f 3 -c 4.0 template Boson_train crf_model" and get this message: "reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s". I don't understand C++, so I can't fix the problem.
What I have tried:
1. Changing the encoding of the dataset. I used Notepad++ to change it from "utf-8 with no BOM" to "utf-8". It didn't work.
2. Changing the delimiter from '\t' to ' ' (space). It didn't work.
3. I thought the template might be wrong, so I tested with crf++0.58/example/seg/template. It worked. But that template is simple, so I tried /example/JapaneseNE/template, which is more similar to my feature template. It didn't work. Then I checked the JapaneseNE example itself, and it works well. So I am confused. Can someone help me?
template
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,0]/%x[-1,0]/%x[0,0]
U06:%x[-1,0]/%x[0,0]/%x[1,0]
U07:%x[0,0]/%x[1,0]/%x[2,0]
U08:%x[-1,0]/%x[0,0]
U09:%x[0,0]/%x[1,0]
U10:%x[-2,1]/%x[0,1]
U11:%x[-2,1]/%x[1,1]
U11:%x[-1,1]/%x[0,1]
U12:%x[0,0]/%x[0,1]
U13:%x[0,1]/%x[1,1]
U14:%x[0,1]/%x[2,1]
U15:%x[-1,0]/%x[0,1]
U16:%x[-1,0]/%x[-1,1]
U17:%x[1,0]/%x[1,1]
U18:%x[1,0]/%x[1,1]
U19:%x[2,0]/%x[2,1]
U20:%x[-1,2]
U21:%x[-2,2]
U22:%x[0,1]/%x[-1,2]
U23:%x[0,1]/%x[-2,2]
U24:%x[0,0]/%x[-1,2]
U25:%x[0,0]/%x[-2,2]
U26:%x[-1,2]/%x[-2,2]/%x[0,1]
U27:%x[-2,2]/%x[0,1]/%x[1,1]
U28:%x[-1,1]/%x[-1,2]/%x[0,1]
U29:%x[-1,2]/%x[0,0]/%x[0,1]
Boson_train
浙江 ns B_product_name
在线 b I_product_name
杭州 ns I_product_name
4 m B_time
月 m I_time
25 m I_time
日 m I_time
讯 ng Out
( x Out
记者 n Out
x Out
x B_person_name
施宇翔 nr I_person_name
x Out
通讯员 n B_person_name
x Out
方英 nr B_person_name
) x Out
毒贩 n Out
很 zg Out
“ x Out
时髦 nr Out
” x Out
, x Out
用 p Out
微信 vn B_product_name
交易 n Out
毒品 n Out
。 x Out
没 v Out
料想 v Out
警方 n B_person_name
也 d Out
You were debugging in the right direction. The issue is indeed with your template file.
Your training data has 3 columns (column 0: word, column 1: POS tag, and column 2: tag).
You cannot use the tag as a feature, but your template file refers to it (i.e., column 2) in many feature definitions (see U20 to U29). Your training should work after removing or correcting these.
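To find the offending lines quickly, you could scan the template for %x[row,col] macros whose column index points at the tag column. The following is a rough Python sketch, not part of CRF++; the function names are hypothetical, and it assumes that template comment lines start with "#".

```python
import re


def columns_used(template_lines):
    """Map each feature ID to the set of column indices its %x[row,col] macros use."""
    used = {}
    for line in template_lines:
        line = line.strip()
        # Skip blanks, comments (assumed to start with "#") and lines without an ID.
        if not line or line.startswith("#") or ":" not in line:
            continue
        feature_id, body = line.split(":", 1)
        cols = {int(c) for c in re.findall(r"%x\[-?\d+,(\d+)\]", body)}
        used.setdefault(feature_id, set()).update(cols)
    return used


def features_using_column(template_lines, column):
    """Return the sorted IDs of features that reference the given column."""
    return sorted(fid for fid, cols in columns_used(template_lines).items()
                  if column in cols)


# A few lines copied from the template in the question.
sample = ["U02:%x[0,0]", "U12:%x[0,0]/%x[0,1]", "U20:%x[-1,2]", "U22:%x[0,1]/%x[-1,2]"]
print(features_using_column(sample, 2))
```

Run on the full template with column 2, this should list U20 to U29, the features to remove or rewrite.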
Hope this helps :)
You can also check out these video tutorials for a better understanding of template files and training NER with CRF++:
1) https://youtu.be/GJHeTvDkIaE
2) https://youtu.be/Ur5umC4BwN4