GitBook: Why do `1.` prefixes appear and how can I delete them?

This is my SUMMARY.md file:
# ОГЛАВЛЕНИЕ
1. [Введение](README.md)
1. [Назначение](chapters/001-introduction/README.md)
1. [Соглашения, принятые в документах](chapters/001-introduction/agreements.md)
1. [Границы проекта](chapters/001-introduction/project-borders.md)
1. [Ссылки](chapters/001-introduction/references.md)
1. [Общее описание](chapters/002-common-description/README.md)
1. [Общий взгляд на продукт](chapters/002-common-description/overview.md)
1. [Классы и характеристики пользователей](chapters/002-common-description/users.md)
1. [Операционная среда](chapters/002-common-description/system-requirements.md)
1. [Ограничения дизайна и реализации](chapters/002-common-description/restrictions.md)
1. [Предположения и зависимости](chapters/002-common-description/assumptions-and-dependencies.md)
This is the PDF publishing result (the screenshot shows every table-of-contents entry rendered with a `1.` prefix):
Why do the `1.` prefixes appear and how can I delete them?
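One thing worth trying (an assumption on my part, since GitBook's own SUMMARY.md examples use unordered lists): replace the ordered-list markers with `*` bullets, so GitBook does not carry the numbering into the rendered output. For instance:
# ОГЛАВЛЕНИЕ
* [Введение](README.md)
* [Назначение](chapters/001-introduction/README.md)
* [Соглашения, принятые в документах](chapters/001-introduction/agreements.md)
and so on for the remaining entries.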

Related

Difference in output of spaCy nlp .vector when applied to a sentence?

I am doing the following:
import spacy
nlp = spacy.load("en")
doc = nlp('Hello Stack Over Flow, my name is Steve')
Inspecting doc.vector gives:
In [1]: doc = nlp('Hello Stack Over Flow, my name is Steve')
In [2]: doc.vector
Out[2]:
array([ 1.67874452e-02,  1.43885329e-01, -1.64147541e-01, -3.52525562e-02,
        1.71078995e-01,  5.81666678e-02,  1.42294103e-02, -1.58536658e-01,
       ...,
       -2.19033333e-03,  7.99388885e-02,  1.08606648e-02, -1.27933361e-02,
       -2.84678000e-03, -2.97433343e-02, -8.61347839e-02,  9.06177703e-03],
      dtype=float32)
But when I run the following, I get:
In [3]: for token in doc: print("{} : {}".format(token, token.vector[:3]))
Hello : [0. 0. 0.]
Stack : [0. 0. 0.]
Over : [0. 0. 0.]
Flow : [0. 0. 0.]
, : [-0.082752 0.67204 -0.14987 ]
my : [ 0.08649 0.14503 -0.4902 ]
name : [ 0.23231 -0.024102 -0.83964 ]
is : [-0.084961 0.502 0.0023823]
Steve : [0. 0. 0.]
Why do I get different representations? Is the first vector a representation of the whole sentence?
The solution is in the documentation for Doc.vector: "A real-valued meaning representation. Defaults to an average of the token vectors." So the first vector is the sentence-level representation, computed by averaging the per-token vectors. The all-zero token vectors appear because those tokens are out of the model's vocabulary and have no word vector of their own.
Source: https://spacy.io/api/doc#vector
Hope it helps others too.
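A quick way to verify the averaging behaviour, assuming a model with word vectors is installed (en_core_web_md is used here only as an example):
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # a model that ships with word vectors
doc = nlp("Hello Stack Over Flow, my name is Steve")

# Doc.vector defaults to the average of the per-token vectors
manual_mean = np.mean([token.vector for token in doc], axis=0)
print(np.allclose(doc.vector, manual_mean))  # expected: True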

How to decode encoded text from a web page

I have an API page on my school's site and I want to parse its data, but the page looks like the following and I don't know how to decode this text using Python.
(The encoded letters are Cyrillic.)
The data from the page (it looks like this even in the browser):
\u0421\u0434\u0430\u0442\u044c \u043f\u043e\u0441\u043b\u0435 \u043a\u0430\u043d\u0438\u043a\u0443\u043b, 15 \u0430\u043f\u0440\u0435\u043b\u044f. <br />\r\n\u0423\u0431\u0435\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u0430\u044f \u043f\u0440\u043e\u0441\u044c\u0431\u0430 \u043e\u0444\u043e\u0440\u043c\u043b\u044f\u0442\u044c \u0440\u0435\u0448\u0435\u043d\u0438\u0435 "\u043a\u0430\u043a \u043f\u043e\u043b\u043e\u0436\u0435\u043d\u043e" \u0432 \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0438\u0438 \u0441 \u0442\u0435\u043c "\u043a\u0430\u043a \u0443\u0447\u0438\u043b\u0438", \u0430 \u043d\u0435 \u0442\u0430\u043a, \u0431\u0443\u0434\u0442\u043e \u0431\u044b \u0432\u044b \u0435\u0433\u043e \u043d\u0430 \u043a\u043e\u043b\u0435\u043d\u043a\u0435 \u0437\u0430 5 \u043c\u0438\u043d\u0443\u0442 \u043f\u0435\u0440\u0435\u0434 \u0441\u0434\u0430\u0447\u0435\u0439 \u0434\u0435\u043b\u0430\u043b\u0438. \u041f\u0438\u0441\u0430\u0442\u044c \u0440\u0430\u0437\u0431\u043e\u0440\u0447\u0438\u0432\u043e \u0438 \u0430\u043a\u043a\u0443\u0440\u0430\u0442\u043d\u043e.
The data that I want to get:
Сдать после каникул, 15 апреля. <br />\r\nУбедительная просьба оформлять решение "как положено" в соответствии с тем "как учили", а не так, будто бы вы его на коленке за 5 минут перед сдачей делали. Писать разборчиво и аккуратно.
Prepared by #APIuz team
# For Python 3
# NB: in a Python source file the \uXXXX escapes below are already decoded by the
# interpreter; json.dumps(..., ensure_ascii=False) then writes the readable characters.
import json
import io
text = '\u0421\u0434\u0430\u0442\u044c \u043f\u043e\u0441\u043b\u0435 \u043a\u0430\u043d\u0438\u043a\u0443\u043b, 15 \u0430\u043f\u0440\u0435\u043b\u044f. <br />\r\n\u0423\u0431\u0435\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u0430\u044f \u043f\u0440\u043e\u0441\u044c\u0431\u0430 \u043e\u0444\u043e\u0440\u043c\u043b\u044f\u0442\u044c \u0440\u0435\u0448\u0435\u043d\u0438\u0435 "\u043a\u0430\u043a \u043f\u043e\u043b\u043e\u0436\u0435\u043d\u043e" \u0432 \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0438\u0438 \u0441 \u0442\u0435\u043c "\u043a\u0430\u043a \u0443\u0447\u0438\u043b\u0438", \u0430 \u043d\u0435 \u0442\u0430\u043a, \u0431\u0443\u0434\u0442\u043e \u0431\u044b \u0432\u044b \u0435\u0433\u043e \u043d\u0430 \u043a\u043e\u043b\u0435\u043d\u043a\u0435 \u0437\u0430 5 \u043c\u0438\u043d\u0443\u0442 \u043f\u0435\u0440\u0435\u0434 \u0441\u0434\u0430\u0447\u0435\u0439 \u0434\u0435\u043b\u0430\u043b\u0438. \u041f\u0438\u0441\u0430\u0442\u044c \u0440\u0430\u0437\u0431\u043e\u0440\u0447\u0438\u0432\u043e \u0438 \u0430\u043a\u043a\u0443\u0440\u0430\u0442\u043d\u043e.'
io.open("APIuz.txt", "w", encoding="utf-8").write(json.dumps(text, ensure_ascii=False))
print(open("APIuz.txt", "r").read())
# For Python 2
print(repr(u'\u0421\u0434\u0430\u0442\u044c \u043f\u043e\u0441\u043b\u0435 \u043a\u0430\u043d\u0438\u043a\u0443\u043b, 15 \u0430\u043f\u0440\u0435\u043b\u044f. <br />\r\n\u0423\u0431\u0435\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u0430\u044f \u043f\u0440\u043e\u0441\u044c\u0431\u0430 \u043e\u0444\u043e\u0440\u043c\u043b\u044f\u0442\u044c \u0440\u0435\u0448\u0435\u043d\u0438\u0435 "\u043a\u0430\u043a \u043f\u043e\u043b\u043e\u0436\u0435\u043d\u043e" \u0432 \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0438\u0438 \u0441 \u0442\u0435\u043c "\u043a\u0430\u043a \u0443\u0447\u0438\u043b\u0438", \u0430 \u043d\u0435 \u0442\u0430\u043a, \u0431\u0443\u0434\u0442\u043e \u0431\u044b \u0432\u044b \u0435\u0433\u043e \u043d\u0430 \u043a\u043e\u043b\u0435\u043d\u043a\u0435 \u0437\u0430 5 \u043c\u0438\u043d\u0443\u0442 \u043f\u0435\u0440\u0435\u0434 \u0441\u0434\u0430\u0447\u0435\u0439 \u0434\u0435\u043b\u0430\u043b\u0438. \u041f\u0438\u0441\u0430\u0442\u044c \u0440\u0430\u0437\u0431\u043e\u0440\u0447\u0438\u0432\u043e \u0438 \u0430\u043a\u043a\u0443\u0440\u0430\u0442\u043d\u043e.').decode('unicode-escape'))
# The output is just what you would expect:
#>>> u'Сдать после каникул, 15 апреля. <br /> Убедительная просьба оформлять решение "как положено" в соответствии с тем "как учили", а не так, будто бы вы его на коленке за 5 минут перед сдачей делали. Писать разборчиво и аккуратно.'
Simplest solution in my case (decoding the literal backslash escapes):
some_string.encode('utf-8').decode('unicode-escape')
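More generally, if the payload comes from a JSON API, letting the json module do the unescaping is the cleanest route. A minimal sketch (the sample string is shortened here; with a real HTTP response you would usually just call json.loads on the raw body):
import json

raw = r'\u0421\u0434\u0430\u0442\u044c \u043f\u043e\u0441\u043b\u0435'  # literal backslash escapes, as received

decoded = json.loads('"%s"' % raw)                        # parse it as a JSON string
decoded2 = raw.encode('ascii').decode('unicode-escape')   # or interpret the escapes directly
print(decoded)                # Сдать после
print(decoded == decoded2)    # True for pure-ASCII input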

Tweepy cursor .pages() with api.search_users returning same page again and again

import time
import tweepy

# consumer_token, consumer_secret, access_token, access_secret are defined elsewhere
auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

user_objs = []
name = "phungsuk wangdu"
id_strs = {}
page_no = 0
try:
    for page in tweepy.Cursor(api.search_users, name).pages(3):
        dup_count = 0
        print("******* Page", str(page_no))
        print("Length of page", len(page))
        user_objs.extend(page)
        for user_obj in page:
            id_str = user_obj._json['id_str']
            if id_str in id_strs:
                # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                dup_count += 1
            else:
                # print(id_str)
                id_strs[id_str] = page_no
        time.sleep(1)
        print("Duplicates in page", str(page_no), str(dup_count))
        page_no += 1
except Exception as ex:
    print(ex)
With the above code, I am trying to get user search results with a tweepy cursor (Python 3.5.2, tweepy 3.5.0). The results are duplicated when the pages parameter is passed. Is this the right way to query search_users with the tweepy cursor? I am getting results with the following pattern:
1. For few search results (name = "phungsuk wangdu"; a manual search on the Twitter website actually returns 9 results):
******* Page 0
Length of page 2
Duplicates in page 0 0
******* Page 1
Length of page 2
Duplicates in page 1 2
******* Page 2
Length of page 2
Duplicates in page 2 2
******* Page 3
Length of page 2
Duplicates in page 3 2
2. For many search results (name = "jon snow"):
******* Page 0
Length of page 20
Duplicates in page 0 0
******* Page 1
Length of page 20
Duplicates in page 1 20
******* Page 2
Length of page 20
Duplicates in page 2 0
******* Page 3
Length of page 20
Duplicates in page 3 0
Try appending this to the query you pass to the Cursor; it should reduce the duplicates:
q = "<your query> -filter:retweets"
There are two issues here.
Tweepy's page iterator starts its page numbering at 0, while the Twitter API's page numbering starts at 1, so the first page is fetched twice.
The Twitter API returns the last available page for page numbers greater than the number of available pages.
I made a pull request to tweepy with both fixes.
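Until a fixed release is available, a client-side workaround is to deduplicate users by id across pages. A sketch reusing the api and name objects from the question:
seen = set()
unique_users = []
for page in tweepy.Cursor(api.search_users, name).pages(3):
    for user in page:
        if user.id_str not in seen:
            seen.add(user.id_str)
            unique_users.append(user)
print(len(unique_users), "unique users found")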

Cyrillic Alphabet to English or Latin

I hope you are all well. I have an Excel sheet that contains Cyrillic text. I would like to convert it to English/Latin. Is there an easy way to do this in Excel? Thank you in advance for the help.
Your screenshot shows no Cyrillic letters; those would look like, for example, the letters of the Ukrainian alphabet: А а Б б В в Г г Ґ ґ Д д Е е Є є Ж ж З з И и І і Ї ї Й й К к Л л М м Н н О о П п Р р С с Т т У у Ф ф Х х Ц ц Ч ч Ш ш Щ щ Ь ь Ю ю Я я
Rather, you are the victim of a flagrant case of mojibake, as the next example shows. The file 39142772.txt contains some accented characters (all Central European Latin); it is based on lines 1, 10, and 23 of your data, retyped as valid Czech and Hungarian names, and saved with UTF-8 encoding:
==> chcp 65001
Active code page: 65001
==> type D:\test\39142772.txt
1 STÁTNÍ ÚSTAV PRO KONTROLU LÉČIV
10 Pikó, Béla
23 Móricz, István
==> chcp 1252
Active code page: 1252
==> type D:\test\39142772.txt
1 STÃTNà ÚSTAV PRO KONTROLU LÉČIV
10 Pikó, Béla
23 Móricz, István
==>
Explanation: the chcp command changes the active console code page;
under chcp 65001 (UTF-8), the file is displayed properly;
under chcp 1252 (Western European Latin), the accented characters in the file are displayed mojibake-transformed, exactly as in your screenshot;
the same mojibake transformation happens if you import a .txt or .csv file into Excel using the wrong encoding.
Solution: import the .txt or .csv file into Excel using the proper encoding. The procedure is described here: Is it possible to force Excel recognize UTF-8 CSV files automatically?.
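If the damage has already landed in the cells, the transformation can often be reversed programmatically. A minimal Python sketch of the round trip (assuming, as above, that UTF-8 bytes were wrongly decoded as cp1252):
broken = 'PikÃ³, BÃ©la'                           # mojibake form
fixed = broken.encode('cp1252').decode('utf-8')   # undo the wrong decoding, redo it right
print(fixed)                                      # Pikó, Béla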

How to remove a line in a document opened in vim if certain word occurs in it?

I have a text document open in Vim, and it looks like this:
...100 .Z.... 0. ....01 .506
...100 ....04 1. ....05 .182
...100 ....55 .312
...101 .Z.... -3280. ....01 .638
...101 ....04 1. ....05 .05
...101 ....55 .312
...102 .Z.... 3310. ....01 -1.
...103 .Z.... -1890. ....05 .92
...103 ....30 1. ....49 -9.5
...103 ....52 -.042 ....53 -.063
...103 ....55 .08
...104 ....34 .825 ....35 .175
...104 ....40 1. ....51 16.
...105 ....35 .175 ....40 1.
...105 ....46 .825 ....51 21.
...106 .Z.... -1890. ....06 1.
...106 ....30 1. ....49 3.6
...106 ....52 -.042 ....53 -.063
...107 .Z.... -903. ....06 1.
...107 ....38 1.
...108 ....06 1. ....50 -.8
...109 .Z.... 432. ....31 -1.23
I want to remove every line that contains .Z.... (dot, Z, four dots). Is there a quick way to do this in Vim?
I should have googled better:
:g/.Z..../d
solves the problem. (Note that the unescaped dots match any character; it works here because only the .Z.... lines contain a Z at all.)
You can also try this in command mode:
:%s/^.*\.Z\.\.\..*$//g
Note: \. escapes the . metacharacter; .* matches any run of characters; ^ and $ anchor the match to the beginning and end of the line. Unlike the :g//d command above, this substitution only empties the matching lines; follow it with :g/^$/d to delete the now-blank lines.
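If you want the :g command to be equally strict about literal dots, escape them there too:
:g/\.Z\.\.\.\./d
This deletes exactly the lines containing the literal .Z.... token.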
