pocketsphinx hotword detection not working - cmusphinx

I'm trying to build a small piece of software that detects a hotword using the CMU Sphinx Speech Recognition Toolkit (pocketsphinx).
I created a corpus file with two words, one of which is HELP, and acquired the language model with the lmtool (http://www.speech.cs.cmu.edu/tools/lmtool-new.html).
I get far too many hotword detections, even when nobody says the hotword.
What am I missing?
Here is the code:
#include "stdafx.h"
#include <windows.h>
#include <sphinxbase/err.h>
#include <sphinxbase/ad.h>
#include <pocketsphinx.h>
using namespace System;
#define HOTWORD_KEY "hotwordsearch"
#define LM_KEY "lmsearch"
static const arg_t args_def[] = {
POCKETSPHINX_OPTIONS,
CMDLN_EMPTY_OPTION
};
const char *keyphrase = NULL;
ad_rec_t* open_recording_device(ps_decoder_t *ps, cmd_ln_t *config)
{
ad_rec_t *ad;
int samprate = (int)cmd_ln_float32_r(ps_get_config(ps), "-samprate");
if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), samprate)) == NULL) {
E_ERROR("Failed to open audio device\n");
return NULL;
}
if (ad_start_rec(ad) < 0) {
E_ERROR("Failed to start recording\n");
return NULL;
}
return ad;
}
char const *acquire_from_mic(ps_decoder_t *ps, ad_rec_t *ad, int need_final)
{
    int16 adbuf[4096];
    uint8 utt_started, in_speech;
    int32 k, score = 0;
    char const *hyp;

    if (ps_start_utt(ps) < 0) {
        E_ERROR("Failed to start utterance\n");
        return NULL;
    }
    utt_started = FALSE;
    E_INFO("Ready....\n");
    for (;;) {
        if ((k = ad_read(ad, adbuf, 4096)) < 0)
            E_FATAL("Failed to read audio\n");
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);
        in_speech = ps_get_in_speech(ps);
        if (in_speech && !utt_started) {
            utt_started = TRUE;
            E_INFO("Listening...\n");
        }
        if (!in_speech && utt_started) {
            /* speech -> silence transition, time to start new utterance */
            ps_end_utt(ps);
            hyp = NULL;
            //hyp = ps_get_hyp_final(ps, &score);
            hyp = ps_get_hyp(ps, &score);
            if ((hyp != NULL)/*&&(score>0)*/) {
                E_INFO("---> score = %d\n", score);
                E_INFO("---> hyp = %s \n", hyp);
                return hyp;
            }
            if (ps_start_utt(ps) < 0) {
                E_ERROR("Failed to start utterance\n");
                return NULL;
            }
            utt_started = FALSE;
            E_INFO("Ready again....\n");
        }
        Sleep(10);
    }
    return NULL;
}
int wait_for_hotword(ps_decoder_t *ps, ad_rec_t *ad)
{
    if (ps_set_search(ps, HOTWORD_KEY) < 0) {
        E_ERROR("Couldn't set hotwordsearch\n");
        return 0;
    }
    if (keyphrase == NULL) {
        keyphrase = ps_get_kws(ps, HOTWORD_KEY);
        E_INFO("keyphrase is: %s \n", keyphrase);
    }
    const char *hyp;
    do {
        hyp = NULL;
        hyp = acquire_from_mic(ps, ad, FALSE);
        if (hyp != NULL) {
            if (strcmp(keyphrase, hyp) == 0) {
                return 1;
            }
        }
    } while (1);
    return 0;
}
int main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;

    config = cmd_ln_parse_file_r(NULL, args_def, "pocketsphinx.conf", 1);
    if (config == NULL) {
        fprintf(stderr, "Failed to create config object, see log for details\n");
        return -1;
    }
    ps = ps_init(config);
    if (ps == NULL) {
        fprintf(stderr, "Failed to create recognizer, see log for details\n");
        return -1;
    }
    ps_set_lm_file(ps, LM_KEY, "0806.lm");
    ps_set_keyphrase(ps, HOTWORD_KEY, "HELP");
    ad_rec_t* ad = open_recording_device(ps, config);
    if (ad == NULL) {
        fprintf(stderr, "Failed to open_recording_device\n");
        return -1;
    }
    while (true) {
        if (wait_for_hotword(ps, ad) == 1)
        {
            fprintf(stderr, "\n\n****************\nGot hotword\n");
        }
    }
    ps_free(ps);
    cmd_ln_free_r(config);
    Console::WriteLine(L"Hello World");
    return 0;
}
The config file:
-dict 0806.dic
-kws_threshold 1e-40
-samprate 16000
-lm 0806.lm
-hmm model/en-us/en-us/
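For context, -kws_threshold controls how readily the keyword spotter fires: values far below 1 (such as the 1e-40 used here) make detection very permissive and prone to false alarms, while values closer to 1 make it stricter. A sketch of a tighter config to experiment with for a short keyphrase like HELP (the 1e-10 value is an assumption to be tuned against recorded audio, not a verified fix):
-dict 0806.dic
-kws_threshold 1e-10
-samprate 16000
-lm 0806.lm
-hmm model/en-us/en-us/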
The output below shows the bad detection: the hotword is detected all the time, even though nobody says the word "HELP". I don't understand why I get ---> hyp = HELP HELP HELP HELP HELP.
Output:
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from model/en-us/en-us//feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-ceplen 13 13
-cmn current current
-cmninit 8.0 40,3,-1
-compallsen no no
-debug 0
-dict 0806.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm model/en-us/en-us/
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-001
-kws_threshold 1 1.000000e-040
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm 0806.lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-10 1.000000e-010
-pl_pip 1.0 1.000000e+000
-pl_weight 3.0 3.000000e+000
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+003
-uw 1.0 1.000000e+000
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 3.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: model/en-us/en-us//mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: model/en-us/en-us//mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: model/en-us/en-us//transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model/en-us/en-us//means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model/en-us/en-us//variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file model/en-us/en-us//sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4104 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: 0806.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 3 words read
INFO: dict.c(358): Reading filler dictionary: model/en-us/en-us//noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(358): Header doesn't match
INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
INFO: ngram_model_trie.c(192): LM of order 3
INFO: ngram_model_trie.c(194): #1-grams: 5
INFO: ngram_model_trie.c(194): #2-grams: 6
INFO: ngram_model_trie.c(194): #3-grams: 3
INFO: lm_trie.c(473): Training quantizer
INFO: lm_trie.c(481): Building LM trie
INFO: ngram_search_fwdtree.c(99): 3 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 6 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 6 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 139
INFO: ngram_search_fwdtree.c(339): after: 3 root, 11 non-root channels, 5 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(358): Header doesn't match
INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
INFO: ngram_model_trie.c(192): LM of order 3
INFO: ngram_model_trie.c(194): #1-grams: 5
INFO: ngram_model_trie.c(194): #2-grams: 6
INFO: ngram_model_trie.c(194): #3-grams: 3
INFO: lm_trie.c(473): Training quantizer
INFO: lm_trie.c(481): Building LM trie
INFO: ngram_search_fwdtree.c(99): 3 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 6 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 6 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 139
INFO: ngram_search_fwdtree.c(339): after: 3 root, 11 non-root channels, 5 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -900, delay 10)
ERROR: "cmd_ln.c", line 938: Unknown argument: -adcdev
Allocating 32 buffers of 2500 samples each
INFO: cppTest.cpp(103): keyphrase is: HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 38.01 18.02 -0.60 -3.94 3.44 2.67 -2.47 -0.60 2.43 -5.32 -2.22 -7.04 2.46 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP HELP HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 38.01 18.02 -0.60 -3.94 3.44 2.67 -2.47 -0.60 2.43 -5.32 -2.22 -7.04 2.46 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 39.34 14.99 -7.16 -2.53 -4.98 -8.98 -3.15 1.60 4.95 -9.76 2.29 -5.59 4.39 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 39.34 14.99 -7.16 -2.53 -4.98 -8.98 -3.15 1.60 4.95 -9.76 2.29 -5.59 4.39 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 39.29 13.66 -5.37 -1.40 -4.89 -9.12 -5.23 0.36 4.09 -10.30 3.82 -4.18 4.00 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP HELP HELP HELP HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(99): cmn_prior_update: from < 39.29 13.66 -5.37 -1.40 -4.89 -9.12 -5.23 0.36 4.09 -10.30 3.82 -4.18 4.00 >
INFO: cmn_prior.c(116): cmn_prior_update: to < 39.24 13.36 -4.74 -0.99 -5.23 -9.40 -5.08 -0.40 3.95 -10.70 3.53 -4.19 3.83 >
INFO: cmn_prior.c(99): cmn_prior_update: from < 39.24 13.36 -4.74 -0.99 -5.23 -9.40 -5.08 -0.40 3.95 -10.70 3.53 -4.19 3.83 >
INFO: cmn_prior.c(116): cmn_prior_update: to < 39.12 11.92 -5.07 -1.94 -6.63 -7.68 -3.63 -2.29 0.49 -11.79 3.25 -3.04 2.44 >
INFO: cmn_prior.c(131): cmn_prior_update: from < 39.12 11.92 -5.07 -1.94 -6.63 -7.68 -3.63 -2.29 0.49 -11.79 3.25 -3.04 2.44 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 38.03 11.47 -5.09 -2.07 -6.22 -6.86 -3.23 -2.30 0.27 -11.47 2.86 -3.17 2.37 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(99): cmn_prior_update: from < 38.03 11.47 -5.09 -2.07 -6.22 -6.86 -3.23 -2.30 0.27 -11.47 2.86 -3.17 2.37 >
INFO: cmn_prior.c(116): cmn_prior_update: to < 37.17 10.53 -3.43 1.18 -7.71 -8.15 -2.56 -5.45 -0.81 -12.44 0.63 -2.53 2.86 >
INFO: cmn_prior.c(131): cmn_prior_update: from < 37.17 10.53 -3.43 1.18 -7.71 -8.15 -2.56 -5.45 -0.81 -12.44 0.63 -2.53 2.86 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 36.30 10.14 -2.66 2.23 -7.40 -7.74 -2.55 -5.77 -0.57 -11.16 0.96 -2.47 2.95 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP HELP HELP HELP HELP HELP HELP HELP HELP
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 36.30 10.14 -2.66 2.23 -7.40 -7.74 -2.55 -5.77 -0.57 -11.16 0.96 -2.47 2.95 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 36.39 10.85 -2.80 2.29 -5.72 -6.97 -2.79 -3.21 -0.31 -10.98 0.31 -3.93 2.68 >
INFO: cppTest.cpp(76): ---> score = 0
INFO: cppTest.cpp(77): ---> hyp = HELP
****************
Got hotword
INFO: cppTest.cpp(53): Ready....
INFO: cppTest.cpp(63): Listening...

Related

Why do I get a view error when enumerating a DataFrame

Why do I get a "view" error:
ndf = pd.DataFrame()
ndf['Signals'] = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
signals_diff = ndf.Signals.diff()
ndf['Revals'] = [101, 102, 105, 104, 105, 106, 107, 108, 109, 109]
ndf['Entry'] = 0
for i, element in enumerate(signals_diff):
    if (i == 0):
        ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
    elif (element == 0):
        ndf.iloc[i]['Entry'] = ndf.iloc[i - 1]['Entry']
    else:
        ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
Instead of iloc, use loc:
ndf = pd.DataFrame()
ndf['Signals'] = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
signals_diff = ndf.Signals.diff()
ndf['Revals'] = [101, 102, 105, 104, 105, 106, 107, 108, 109, 109]
ndf['Entry'] = 0
for i, element in enumerate(signals_diff):
    if (i == 0):
        ndf.loc[i, 'Entry'] = ndf.loc[i, 'Revals']
    elif (element == 0):
        ndf.loc[i, 'Entry'] = ndf.loc[i - 1, 'Entry']
    else:
        ndf.loc[i, 'Entry'] = ndf.loc[i, 'Revals']
This will solve the problem, but note that when assigning this way the indexes must match; because of that index alignment you might not get the expected result.
Do not chain indexers like ndf.iloc[i]['Entry'] when assigning; see the pandas documentation on "returning a view versus a copy" (linked in the warning above) for why that does not work.
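To make the failure mode concrete, here is a minimal sketch (the exact behavior depends on dtypes and pandas version, so treat it as illustrative):
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# df.iloc[0] can return a temporary copy of the row, so this chained
# assignment may write to the copy and be silently lost (pandas emits
# a SettingWithCopyWarning):
df.iloc[0]['A'] = 99
print(df.loc[0, 'A'])  # may still print 1
# A single .loc call addresses the frame directly:
df.loc[0, 'A'] = 99
print(df.loc[0, 'A'])  # prints 99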
That said, your code can be rewritten as:
ndf['Entry'] = ndf['Revals'].where(signals_diff != 0).ffill()
Output:
   Signals  Revals  Entry
0        1     101  101.0
1        1     102  101.0
2        1     105  101.0
3        1     104  101.0
4        1     105  101.0
5        0     106  106.0
6        0     107  106.0
7        0     108  106.0
8        0     109  106.0
9        0     109  106.0
If you want to keep using positional (iloc) slicing, use get_indexer to map the column labels to positions:
for i, element in enumerate(signals_diff):
    if (i == 0):
        ndf.iloc[i, ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i, ndf.columns.get_indexer(['Revals'])]
    elif (element == 0):
        ndf.iloc[i, ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i - 1, ndf.columns.get_indexer(['Entry'])]
    else:
        ndf.iloc[i, ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i, ndf.columns.get_indexer(['Revals'])]
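For clarity, get_indexer simply maps column labels to integer positions, which is what .iloc needs; a tiny standalone illustration:
import pandas as pd

ndf = pd.DataFrame({'Signals': [1, 0], 'Revals': [101, 106], 'Entry': [0, 0]})
# Returns a numpy array of positional indices for the given labels
print(ndf.columns.get_indexer(['Entry']))            # [2]
print(ndf.columns.get_indexer(['Revals', 'Entry']))  # [1 2]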

Can't display image when using OpenCV

I'm trying to display an image on screen using OpenCV, but nothing happens: all I see is a black screen, and no error message is shown.
I'm on a Mac with a secondary screen connected. When I run the code, both screens turn black for 10 seconds, but the image is not displayed.
This is the code:
while True:
    if not os.path.isfile(pic_name):
        print("Wrong path: ", pic_name)
        return
    image = cv2.imread(pic_name, 0)
    if image is not None:
        print(image)
        cv2.imshow('image', image)
        k = cv2.waitKey(1000)
        if k == 27:  # If escape was pressed exit
            cv2.destroyAllWindows()
            break
    break
return pic_time
Also, when printing the image as an ndarray, the values look good:
[[ 0 3 4 ... 239 220 3]
[ 0 2 0 ... 238 219 3]
[ 2 6 0 ... 237 218 2]
...
[ 0 26 127 ... 175 173 2]
[ 0 33 149 ... 169 168 3]
[ 3 22 145 ... 167 163 2]]
Thanks!
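For anyone debugging this, a minimal standalone display test can help isolate whether the problem is the surrounding loop or the window plumbing itself. This is a diagnostic sketch, not a confirmed fix; pic_name here is a placeholder path:
import cv2

pic_name = 'test.png'  # placeholder - substitute a real image path
image = cv2.imread(pic_name, 0)  # 0 loads the image as grayscale
assert image is not None, "imread returned None - check the path"
cv2.namedWindow('image')  # create the window explicitly before showing
cv2.imshow('image', image)
cv2.waitKey(0)  # block until any key is pressed so the window can paint
cv2.destroyAllWindows()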

Scraping issue: BeautifulSoup; IndexError: list index out of range

I'm trying to scrape http://www.wtatennis.com/stats, but I run into an error once the code is complete. I may have been staring at this too long, but I don't see the error and therefore can't resolve it.
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.wtatennis.com/stats")
c = r.content
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("div", {"class": "view-content"})

# find the results, names, scores
for classes in all:
    position = classes.find_all('td', {"class": "views-field views-field-counter views-align-center"})[0].text
    wta_name = classes.find_all('td', {"class": "views-field views-field-field-lastname views-align-left"})[0].text
    current_ranking = classes.find_all('td', {"class": "views-field views-field-field-current-rank views-align-center"})[0].text
    match_count = classes.find_all('td', {"class": "views-field views-field-field-matchcount views-align-center"})[0].text
    aces_count = classes.find_all('td', {"class": "views-field views-field-field-aces active views-align-center"})[0].text
    df_count = classes.find_all('td', {"class": "views-field views-field-field-double-faults views-align-center"})[0].text
    firstserver_perc = classes.find_all('td', {"class": "views-field views-field-field-first-serve-percent views-align-center"})[0].text
    firstservewon_perc = classes.find_all('td', {"class": "views-field views-field-field-first-serve-won-percent views-align-center"})[0].text
    secondservewon_perc = classes.find_all('td', {"class": "views-field views-field-field-second-serve-won-percent views-align-center"})[0].text
    print(position)
    print(wta_name)
    print(current_ranking)
    print(match_count)
    print(aces_count)
    print(df_count)
    print(firstserver_perc)
    print(firstservewon_perc)
    print(secondservewon_perc)
Result
1
Goerges, Julia (GER)
12
7
61
25
59.8 %
76.0 %
52.4 %
IndexError Traceback (most recent call last)
<ipython-input-6-fabdb2904a0b> in <module>()
18 current_ranking = classes.find_all('td',{"class":"views-field views-field-field-current-rank views-align-center"})[0].text
19 match_count = classes.find_all('td',{"class":"views-field views-field-field-matchcount views-align-center"})[0].text
---> 20 aces_count = classes.find_all('td',{"class":"views-field views-field-field-aces active views-align-center"})[0].text
21 df_count = classes.find_all('td',{"class":"views-field views-field-field-double-faults views-align-center"})[0].text
22 firstserver_perc = classes.find_all('td',{"class":"views-field views-field-field-first-serve-percent views-align-center"})[0].text
IndexError: list index out of range
Here are the issues I found with your code:
- The line all = soup.find_all("div", {"class": "view-content"}) uses find_all, which is wrong because there are multiple div tags with the class view-content. I changed this line to use the find() function instead of find_all().
- After fixing that, you will still have an issue at the printing stage: you only get the first record of the table you're trying to parse, not all of the data.
- Note that I also removed the re import, as it wasn't needed.
Here is my attempt at your problem:
import requests
from bs4 import BeautifulSoup

c = requests.get("http://www.wtatennis.com/stats").text
soup = BeautifulSoup(c, "html.parser")
c = soup.find("div", {"class": "view-content"})
position = c.find_all('td', {"class": "views-field views-field-counter views-align-center"})
wta_name = c.find_all('td', {"class": "views-field views-field-field-lastname views-align-left"})
current_ranking = c.find_all('td', {"class": "views-field views-field-field-current-rank views-align-center"})
match_count = c.find_all('td', {"class": "views-field views-field-field-matchcount views-align-center"})
aces_count = c.find_all('td', {"class": "views-field views-field-field-aces active views-align-center"})
df_count = c.find_all('td', {"class": "views-field views-field-field-double-faults views-align-center"})
firstserver_perc = c.find_all('td', {"class": "views-field views-field-field-first-serve-percent views-align-center"})
firstservewon_perc = c.find_all('td', {"class": "views-field views-field-field-first-serve-won-percent views-align-center"})
secondservewon_perc = c.find_all('td', {"class": "views-field views-field-field-second-serve-won-percent views-align-center"})

for i in range(0, len(position)):
    print(position[i].text)
    print(wta_name[i].text)
    print(current_ranking[i].text)
    print(match_count[i].text)
    print(aces_count[i].text)
    print(df_count[i].text)
    print(firstserver_perc[i].text)
    print(firstservewon_perc[i].text)
    print(secondservewon_perc[i].text)
    print("***************")
Output:
1
Goerges, Julia (GER)
12
7
61
25
59.8 %
76.0 %
52.4 %
***************
2
Svitolina, Elina (UKR)
3
10
60
13
60.1 %
72.2 %
47.5 %
***************
3
Wozniacki, Caroline (DEN)
1
12
58
37
64.3 %
71.9 %
50.3 %
***************
4
Pliskova, Karolina (CZE)
5
8
53
19
63.9 %
71.6 %
47.7 %
***************
5
Barty, Ashleigh (AUS)
16
9
50
27
61.0 %
67.7 %
53.6 %
***************
6
Mertens, Elise (BEL)
20
10
43
35
65.8 %
69.1 %
46.9 %
***************
7
Siniakova, Katerina (CZE)
52
8
39
31
61.2 %
65.5 %
46.5 %
***************
8
Osaka, Naomi (JPN)
53
5
38
11
62.5 %
69.4 %
44.8 %
***************
9
Pliskova, Kristyna (CZE)
78
5
38
17
59.3 %
70.4 %
41.3 %
***************
10
Keys, Madison (USA)
14
6
37
17
61.1 %
73.9 %
46.8 %
***************
11
Bertens, Kiki (NED)
28
6
35
26
61.2 %
70.1 %
39.6 %
***************
12
Sevastova, Anastasija (LAT)
15
7
34
11
60.2 %
71.4 %
47.7 %
***************
13
Konta, Johanna (GBR)
11
6
31
22
65.6 %
66.1 %
50.0 %
***************
14
Halep, Simona (ROU)
2
12
30
27
66.1 %
68.2 %
50.3 %
***************
15
Kontaveit, Anett (EST)
27
6
29
32
63.9 %
67.3 %
48.3 %
***************
16
Strycova, Barbora (CZE)
24
10
29
25
65.6 %
64.4 %
46.7 %
***************
17
Giorgi, Camila (ITA)
63
7
26
27
59.3 %
65.8 %
48.2 %
***************
18
Sharapova, Maria (RUS)
41
7
26
36
60.0 %
70.0 %
48.0 %
***************
19
Kanepi, Kaia (EST)
66
6
25
24
56.8 %
64.3 %
50.3 %
***************
20
Watson, Heather (GBR)
75
6
25
17
62.2 %
65.0 %
50.7 %
***************
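As a side note, once the nine lists are collected, iterating them in lockstep with zip is a little cleaner than indexing by position (this assumes the lists are parallel, which the table layout implies):
columns = [position, wta_name, current_ranking, match_count, aces_count,
           df_count, firstserver_perc, firstservewon_perc, secondservewon_perc]
# zip(*columns) yields one tuple of <td> elements per table row
for row in zip(*columns):
    for cell in row:
        print(cell.text)
    print("***************")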

Reset GPU memory using Keras 1.2.2 with MXNet backend

I'm using an AWS p2.8xlarge instance and trying to evaluate my model using k-fold cross-validation.
After the first repetition my GPU memory is full, and when I try to train once again I get a CUDA out-of-memory error.
My question is how to reset the GPU memory within the loop. I used K.clear_session() and also gc.collect(), but neither of them worked.
The error message:
> MXNetError Traceback (most recent call
> last) ~/anaconda3/lib/python3.6/site-packages/mxnet/symbol.py in
> simple_bind(self, ctx, grad_req, type_dict, group2ctx,
> shared_arg_names, shared_exec, shared_buffer, **kwargs) 1472
> shared_exec_handle,
> -> 1473 ctypes.byref(exe_handle))) 1474 except MXNetError as e:
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/base.py in
> check_call(ret)
> 128 if ret != 0:
> --> 129 raise MXNetError(py_str(_LIB.MXGetLastError()))
> 130
>
> MXNetError: [19:24:04] src/storage/./pooled_storage_manager.h:102:
> cudaMalloc failed: out of memory
>
> Stack trace returned 10 entries: [bt] (0)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1d57cc)
> [0x7f55ce9fe7cc] [bt] (1)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1242238)
> [0x7f55cfa6b238] [bt] (2)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1244c0a)
> [0x7f55cfa6dc0a] [bt] (3)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe4d4db)
> [0x7f55cf6764db] [bt] (4)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe549cd)
> [0x7f55cf67d9cd] [bt] (5)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe59f95)
> [0x7f55cf682f95] [bt] (6)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5d6ee)
> [0x7f55cf6866ee] [bt] (7)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5dcd4)
> [0x7f55cf686cd4] [bt] (8)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2261)
> [0x7f55cf605291] [bt] (9)
> /home/ubuntu/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f560d6c4ec0]
>
>
> During handling of the above exception, another exception occurred:
>
> RuntimeError Traceback (most recent call
> last) <ipython-input-4-0720b69f15af> in <module>()
> 33 if val_batches.n>0:
> 34 hist = model.fit_generator(generator=train_gen, samples_per_epoch=batches.n,
> ---> 35 nb_epoch=epochs, verbose=True, validation_data=val_gen, nb_val_samples=val_batches.n,
> callbacks=callbacks)
> 36 else:
> 37 model.fit_generator(generator=train_gen, samples_per_epoch=batches.n,
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in fit_generator(self, generator, samples_per_epoch, nb_epoch,
> verbose, callbacks, validation_data, nb_val_samples, class_weight,
> max_q_size, nb_worker, pickle_safe, initial_epoch) 1557
> outs = self.train_on_batch(x, y, 1558
> sample_weight=sample_weight,
> -> 1559 class_weight=class_weight) 1560 1561 if not
> isinstance(outs, list):
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in train_on_batch(self, x, y, sample_weight, class_weight) 1320
> ins = x + y + sample_weights 1321
> self._make_train_function()
> -> 1322 outputs = self.train_function(ins) 1323 if len(outputs) == 1: 1324 return outputs[0]
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in train_function(inputs) 1952 def
> _make_train_function(self): 1953 def train_function(inputs):
> -> 1954 data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'train') 1955 1956
> batch = K.mx.io.DataBatch(data=data, label=label, bucket_key='train',
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in _adjust_module(self, inputs, phase) 1908 if not
> self._mod.binded: 1909
> self._mod.bind(data_shapes=data_shapes, label_shapes=None,
> -> 1910 for_training=True) 1911 self._set_weights() 1912
> self._mod.init_optimizer(kvstore=self._kvstore,
> optimizer=self.optimizer)
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/bucketing_module.py
> in bind(self, data_shapes, label_shapes, for_training,
> inputs_need_grad, force_rebind, shared_module, grad_req)
> 322 state_names=self._state_names)
> 323 module.bind(data_shapes, label_shapes, for_training, inputs_need_grad,
> --> 324 force_rebind=False, shared_module=None, grad_req=grad_req)
> 325 self._curr_module = module
> 326 self._curr_bucket_key = self._default_bucket_key
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/module.py in
> bind(self, data_shapes, label_shapes, for_training, inputs_need_grad,
> force_rebind, shared_module, grad_req)
> 415 fixed_param_names=self._fixed_param_names,
> 416 grad_req=grad_req,
> --> 417 state_names=self._state_names)
> 418 self._total_exec_bytes = self._exec_group._total_exec_bytes
> 419 if shared_module is not None:
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in __init__(self, symbol, contexts, workload, data_shapes,
> label_shapes, param_names, for_training, inputs_need_grad,
> shared_group, logger, fixed_param_names, grad_req, state_names)
> 229 self.num_outputs = len(self.symbol.list_outputs())
> 230
> --> 231 self.bind_exec(data_shapes, label_shapes, shared_group)
> 232
> 233 def decide_slices(self, data_shapes):
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in bind_exec(self, data_shapes, label_shapes, shared_group, reshape)
> 325 else:
> 326 self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
> --> 327 shared_group))
> 328
> 329 self.data_shapes = data_shapes
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in _bind_ith_exec(self, i, data_shapes, label_shapes, shared_group)
> 601 type_dict=input_types, shared_arg_names=self.param_names,
> 602 shared_exec=shared_exec,
> --> 603 shared_buffer=shared_data_arrays, **input_shapes)
> 604 self._total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
> 605 return executor
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/symbol.py in
> simple_bind(self, ctx, grad_req, type_dict, group2ctx,
> shared_arg_names, shared_exec, shared_buffer, **kwargs) 1477
> error_msg += "%s: %s\n" % (k, v) 1478 error_msg += "%s"
> % e
> -> 1479 raise RuntimeError(error_msg) 1480 1481 # update shared_buffer
>
> RuntimeError: simple_bind error. Arguments: input_1_1: (64, 3, 224,
> 224) [19:24:04] src/storage/./pooled_storage_manager.h:102: cudaMalloc
> failed: out of memory
>
> Stack trace returned 10 entries: [bt] (0)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1d57cc)
> [0x7f55ce9fe7cc] [bt] (1)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1242238)
> [0x7f55cfa6b238] [bt] (2)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1244c0a)
> [0x7f55cfa6dc0a] [bt] (3)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe4d4db)
> [0x7f55cf6764db] [bt] (4)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe549cd)
> [0x7f55cf67d9cd] [bt] (5)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe59f95)
> [0x7f55cf682f95] [bt] (6)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5d6ee)
> [0x7f55cf6866ee] [bt] (7)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5dcd4)
> [0x7f55cf686cd4] [bt] (8)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2261)
> [0x7f55cf605291] [bt] (9)
> /home/ubuntu/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f560d6c4ec0]
Using gc.collect I was able to limit the GPU memory footprint to twice that of a single run. Without it, the memory footprint kept increasing. I created a function for the training and evaluation, which allows the gc to clear up the model after the result is returned.
cv_results = []
for train, test in cv_folds:
    result = train_and_eval(train, test)
    cv_results.append(result)
    gc.collect()
Since the footprint is still double, you should look into reducing the batch size to compensate; you should then be able to fit everything into GPU memory.
It's worth noting that MXNet doesn't actually deallocate GPU memory; it returns it to an internal memory pool for future use. So although GPU memory usage may still look high in nvidia-smi, the memory is still free for MXNet to use. As with most computation steps, the garbage collection of this memory happens asynchronously.
If you can't keep two runs' worth of memory on your GPU, you can always start subprocesses, as mentioned by geoalgo, using something similar to:
from subprocess import Popen, PIPE, STDOUT
import json

def eval_on_fold(indicies):
    indicies_str = json.dumps(indicies)
    # universal_newlines=True lets us pass and receive str instead of bytes
    p = Popen(['python', 'train_and_eval.py', '--cv-indicies'],
              stdout=PIPE, stdin=PIPE, stderr=PIPE, universal_newlines=True)
    eval_metric_str = p.communicate(input=indicies_str)[0]
    eval_metric = float(eval_metric_str)
    return eval_metric

cv_results = []
for train, test in cv_folds:
    indicies = {'train': train, 'test': test}
    eval_metric = eval_on_fold(indicies)
    cv_results.append(eval_metric)
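For completeness, the child script referenced above has to be written to match; train_and_eval.py is hypothetical, and a minimal shape for it could be:
# train_and_eval.py (hypothetical child-side sketch for the parent above)
import sys
import json

def main():
    # The parent pipes the fold indices to us as JSON on stdin
    indicies = json.loads(sys.stdin.read())
    train, test = indicies['train'], indicies['test']
    # Build the model fresh in this process, train on `train`, evaluate
    # on `test`; all GPU memory is released when the process exits.
    eval_metric = 0.0  # placeholder for the real evaluation result
    # Print only the metric so the parent can float() our stdout
    print(eval_metric)

if __name__ == '__main__':
    main()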

uWSGI lazy-apps and ThreadPool

Running uWSGI with
$ cat /etc/uwsgi/uwsgi.cfg
[uwsgi]
callable = app
socket = /var/run/arivale-service/uwsgi.sock
chmod-socket = 666
pidfile = /var/run/arivale-service/uwsgi.pid
master = true
enable-threads = true
single-interpreter = true
thunder-lock
need-app
processes = 4
Without lazy-apps enabled, a request to the following endpoint hangs:
import boto3
# ...irrelevant imports
from multiprocessing.dummy import Pool as ThreadPool

POOL = ThreadPool(6)

# ...irrelevant setup

def get_ecs_task_definitions(service_name):
    ecs_service_name, _ = get_ecs_service_name_and_cluster(service_name)

    def get_task_definition(task_definition_arn):
        formatted_task_definition = {}
        task_definition = ECS.describe_task_definition(taskDefinition=task_definition_arn)['taskDefinition']
        # ...
        # build formatted_task_definition from task_definition
        # ...
        return formatted_task_definition

    task_definition_arns = ECS.list_task_definitions(
        familyPrefix=ecs_service_name, status='ACTIVE')['taskDefinitionArns']
    return POOL.map(get_task_definition, task_definition_arns)

@service.api('/api/services/<service_name>/ecs/task-definitions')
def get_task_definitions(service_name):
    return {'service_name': service_name,
            'task_definitions': get_ecs_task_definitions(service_name)}
NGINX is balancing uWSGI apps, and this is what it says in error.log (NGINX):
Jun 10 03:54:26 93e04e04e2cf nginx_error: 2017/06/10 03:54:26 [error] 49#49: *33 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.16.254.95, server: localhost, request: "GET /api/services/data-analysis-service/ecs/task-definitions HTTP/1.1",upstream: "uwsgi://unix:/var/run/arivale-service/uwsgi.sock", host: "devops-service.arivale.com"
Each request to the endpoint hangs a worker (below is the output of uwsgitop after 2 requests):
uwsgi-2.0.15 - Sat Jun 10 21:26:10 2017 - req: 0 - RPS: 0 - lq: 0 - tx: 0
node: localhost - cwd: /var/www/arivale-service/web - uid: 0 - gid: 0 - masterpid: 45
WID % PID REQ RPS EXC SIG STATUS AVG RSS VSZ TX ReSpwn HC RunT LastSpwn
1 0.0 74 0 0 0 0 busy 0ms 0 0 0 1 0 0 21:23:20
2 0.0 75 0 0 0 0 busy 0ms 0 0 0 1 0 0 21:23:20
3 0.0 76 0 0 0 0 idle 0ms 0 0 0 1 0 0 21:23:20
4 0.0 77 0 0 0 0 idle 0ms 0 0 0 1 0 0 21:23:20
Enabling lazy-apps fixes the issue. Does anyone know with certainty why?
This happened because the uWSGI workers were unable to access the thread pool created in the master process; see https://stackoverflow.com/a/39890941/2419669.
Using @postfork fixes this:
from uwsgidecorators import postfork

THREAD_POOL = None

@postfork
def _make_thread_pool():
    global THREAD_POOL
    THREAD_POOL = ThreadPool(8)
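The key point is that request code must look the pool up at call time, after the post-fork hook has run, rather than binding it at import time. A self-contained sketch of that shape (the post-fork hook is simulated here so the snippet runs outside uWSGI):
from multiprocessing.dummy import Pool as ThreadPool

THREAD_POOL = None

def _make_thread_pool():
    # Under uWSGI this function would carry the @postfork decorator so
    # each worker builds its own pool after fork.
    global THREAD_POOL
    THREAD_POOL = ThreadPool(8)

def handle_request(items):
    # Look the pool up at call time, never at import time.
    return THREAD_POOL.map(lambda x: x * 2, items)

_make_thread_pool()  # simulated post-fork hook for this demo
print(handle_request([1, 2, 3]))  # [2, 4, 6]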
