Sphinx search gives no results - search

I have a new index on a new table which is not returning any results. This is very odd, and I've never run into this problem before. Other indices (which are built almost identically) are searchable just fine from the search CLI and my API.
Here's my conf file
source topicalindex
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = password
sql_db = db
sql_port = 5432 # optional, default is 5432 for pgsql
sql_query = SELECT id, topic, reference, start_ref, end_ref, see_also_html FROM my_topicalindex
sql_attr_uint = topic
sql_attr_uint = reference
}
index topicalindex_index
{
source = topicalindex
path = /path/to/data/topical_index
docinfo = extern
charset_type = utf-8
}
indexer
{
mem_limit = 32M
}
searchd
{
listen = 3312
log = /path/to/searchd.log
query_log = /path/to/query.log
read_timeout = 5
max_children = 30
pid_file = /usr/local/var/searchd.pid
max_matches = 30000
seamless_rotate = 1
preopen_indexes = 0
unlink_old = 1
}
Here's an excerpt proving there's content in the DB:
[myself]:myapp(master)$ psql -d mydb -h localhost
psql (9.2.2)
Type "help" for help.
esv=# SELECT * FROM my_topicalindex LIMIT 1;
id | topic | reference | start_ref | end_ref | see_also_html
------+---------+--------------------+-----------+---------+---------------
2810 | Abraham | Genesis chs. 11–25 | 1011001 | 1025034 | blank
(1 row)
Here's the indexing process:
$ indexer --rotate --all --config /path/to/sphinx-topical.conf
Sphinx 0.9.9-rc2 (r1785)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/path/to/sphinx-topical.conf'...
indexing index 'topicalindex_index'...
collected 2809 docs, 0.1 MB
sorted 0.0 Mhits, 100.0% done
total 2809 docs, 75007 bytes
total 0.067 sec, 1117456 bytes/sec, 41848.54 docs/sec
total 3 reads, 0.000 sec, 47.0 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 37.0 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=79833).
And the files showing they have content
[myself]:myapp(master)$ ll /path/to/data/
total 160
drwxr-xr-x 10 myself admin 340 Aug 29 08:56 ./
drwxr-xr-x 3 myself admin 102 Jun 1 2012 ../
-rw-r--r-- 1 myself admin 33708 Aug 29 08:56 topical_index.spa
-rw-r--r-- 1 myself admin 51538 Aug 29 08:56 topical_index.spd
-rw-r--r-- 1 myself admin 326 Aug 29 08:56 topical_index.sph
-rw-r--r-- 1 myself admin 15721 Aug 29 08:56 topical_index.spi
-rw-r--r-- 1 myself admin 0 Aug 29 08:56 topical_index.spk
-rw------- 1 myself admin 0 Aug 29 08:56 topical_index.spl
-rw-r--r-- 1 myself admin 0 Aug 29 08:56 topical_index.spm
-rw-r--r-- 1 myself admin 52490 Aug 29 08:56 topical_index.spp
and then -- my search with 0 results
[myself]:myapp(master)$ search -i topicalindex_index -a "Abraham"
Sphinx 0.9.9-rc2 (r1785)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/etc/sphinx.conf'...
index 'topicalindex_index': query 'Abraham ': returned 0 matches of 0 total in 0.000 sec
words:
1. 'abraham': 0 documents, 0 hits
Why am I getting 0 results when I search for "Abraham"? (Strangely, I do get results when searching for "a".)
Is this a problem with my conf file? Is it something else?
EDIT
I've noticed that when searching "a", it only matches rows that have content in the see_also_html field. The number of matches equals the number of rows that have data in that column.

sql_query = SELECT id, topic, reference, start_ref, end_ref, see_also_html FROM my_topicalindex
sql_attr_uint = topic
sql_attr_uint = reference
id | topic | reference | start_ref | end_ref | see_also_html
------+---------+--------------------+-----------+---------+---------------
2810 | Abraham | Genesis chs. 11–25 | 1011001 | 1025034 | blank
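You've declared topic and reference as integer attributes (sql_attr_uint), so they are not indexed as full-text fields and a search for "Abraham" cannot match them.
Every column that is not declared as an attribute (except the first column, which is the document ID) automatically becomes a full-text field.
So start_ref, end_ref and see_also_html are the full-text fields, and they are the only searchable columns. To search on topic, remove its sql_attr_uint line so that it is indexed as a field.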
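The rule above can be sketched in a few lines of Python. This is only an illustration of the attribute/field split as described in this answer, not Sphinx's actual implementation; the helper name split_columns is hypothetical.

```python
def split_columns(columns, attributes):
    """Split sql_query columns the way the answer describes.

    The first column is the document ID, declared attributes are
    filterable but not searchable, and everything else becomes a
    full-text field.
    """
    doc_id, rest = columns[0], columns[1:]
    attrs = [c for c in rest if c in attributes]
    fields = [c for c in rest if c not in attributes]
    return doc_id, attrs, fields


# Columns and attribute declarations taken from the conf file in the question.
columns = ["id", "topic", "reference", "start_ref", "end_ref", "see_also_html"]
doc_id, attrs, fields = split_columns(columns, {"topic", "reference"})
print(fields)  # topic is not among the searchable fields
```

With the two sql_attr_uint lines removed (an empty attribute set), topic and reference would land in the fields list instead.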

OFF-TOPIC ANSWER
Note: to make your conf file more readable,
you can replace
sql_query = SELECT id, topic, reference, start_ref, end_ref, see_also_html FROM my_topicalindex
by
sql_query = SELECT id, \
topic, \
reference, \
start_ref, \
end_ref, \
see_also_html \
FROM my_topicalindex
Just be careful that \ is the last character on the line (no trailing space).
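A quick way to catch that mistake is to scan the conf text for lines where a backslash is followed by trailing whitespace. The following is a minimal sketch; the function name bad_continuations and the sample text are made up for illustration.

```python
def bad_continuations(text):
    """Return 1-based numbers of lines where '\\' is followed by trailing whitespace."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip()
        # The backslash is the last visible character, but whitespace follows it.
        if stripped.endswith("\\") and stripped != line:
            bad.append(number)
    return bad


sample = "\n".join([
    "sql_query = SELECT id, \\ ",  # trailing space after the backslash
    "topic, \\",                   # correct: backslash is the last character
    "FROM my_topicalindex",
])
print(bad_continuations(sample))
```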
Here is a sample statement where I'm happy to use this notation:
sql_query = \
SELECT /* sphinx index article Video */ \
`article`.id AS id, \
UNIX_TIMESTAMP(`article`.fromDate) AS ressource_date, \
unix_timestamp(now()) AS indexing_date, \
`programs`.id AS emission_id, \
IFnull(cat.parent_id, cat.id) AS genre_id, \
`article`.`title` AS `title_str`, \
`article`.`title` AS `title`, \
`article`.`feed` AS `feed_ordinal`, \
`article`.`feed` AS `feed`, \
`article`.`category` AS `category`, \
`article`.`dossierId` AS `dossierId`, \
`article`.`mainImage` AS `mainImage`, \
group_concat(`keywords`.`translation`) AS `tags`, \
group_concat(o.id) AS `video_id_list`, \
concat(' summary:', \
`article`.summary, \
' synopsis', \
`article`.synopsis, \
' Summary Title :', \
`article`.summaryTitle, \
' Url', \
`article`.title_url, \
' Main', \
`article`.mainParagraph) AS content, \
concat( ' Signature:', \
`article`.signature) AS subcontent, \
UNIX_TIMESTAMP(article.fromDate) AS online_from_date, \
if ( UNIX_TIMESTAMP(article.toDate)<2000000000, \
UNIX_TIMESTAMP(article.toDate), \
2000000000) AS online_to_date \
FROM \
`article`.`article` \
LEFT JOIN \
`media`.`programs` \
ON \
`article`.`category` = `programs`.`articleReferenceCategory` \
LEFT JOIN \
`media`.`category` AS cat \
ON `programs`.mainCategoryId = cat.id \
LEFT JOIN \
article.article_keyword AS ak \
ON \
ak.articleId = article.id \
LEFT JOIN \
article.keyword AS keywords \
ON \
ak.keywordId = keywords.id \
AND `keywords`.`visible` = 1 \
AND `keywords`.`translation` !='' \
LEFT JOIN \
article.embed AS ae \
ON \
ae.articleId = article.id \
INNER JOIN \
media.`objects` AS o \
ON \
o.id = SUBSTRING_INDEX( SUBSTRING(ae.code, 5+LOCATE('rel="',ae.code)), '"', 1) \
AND o.type='video' \
AND o.status='complete' \
AND o.active=1 \
AND ( o.part_type IS NULL OR o.part_type = 'extract') \
AND o.wildcard not like 'www.%.be/presse%' \
\
WHERE \
`article`.`validated` = 1 \
AND `article`.`published` = 1 \
AND `article`.`deleted` = 0 \
AND `article`.`displayDate` < Now() \
AND `article`.`fromDate` < Now() \
AND `article`.`toDate` > Now() \
AND `article`.id >= $start \
AND `article`.id < $end \
GROUP BY `article`.id


How to Identify This Pyspark Coding Issue - No viable alternative at input

I'm trying to run the following code in Azure Synapse PySpark and get a parsing error. It seems that Synapse doesn't accept the double closing brackets. Does anyone know how to fix it?
def curated_report(entity_name):
    sqlstr ="WITH Participant_Name \
    AS (SELECT \
    CASEID, \
    PARTICIPANTID, \
    LASTWRITTEN, \
    PARTICIPANT, \
    FIRSTNAME, \
    MIDDLENAME, \
    LASTNAME \
    FROM (SELECT \
    ab.CASEID, \
    ab.PARTICIPANTID, \
    ab.DYNAMICDATATYPE, \
    ab.DYNAMICEVIDENCEVALUE, \
    ab.LASTWRITTEN \
    FROM a.ev ab \
    INNER JOIN (SELECT \
    PARTICIPANTID, \
    MAX(LASTWRITTEN) AS MAXDATE \
    FROM a.bd \
    where TYPE in ( 'PDC001' ) \
    GROUP BY PARTICIPANTID) cd \
    ON ab.PARTICIPANTID = cd.PARTICIPANTID \
    AND ab.LASTWRITTEN = cd.MAXDATE \
    GROUP BY ab.CASEID, \
    ab.PARTICIPANTID, \
    ab.DYNAMICDATATYPE, \
    ab.DYNAMICEVIDENCEVALUE, \
    ab.LASTWRITTEN) AS SOURCE \
    PIVOT(max(DYNAMICEVIDENCEVALUE) \
    FOR DYNAMICDATATYPE IN (PARTICIPANT, \
    FIRSTNAME, \
    MIDDLENAME,\
    LASTNAME) \
    )AS RESULT) \ <----*this line seems to be causing error*
    SELECT* \
    FROM PARTICIPANT_NAME"
    df = spark.sql(sqlstr)
    return df
*solved.
ParseException:
no viable alternative at input 'WITH Participant_Name AS (SELECT ...
Remove AS RESULT from the query. For a CTE, it is sufficient to name the table in the WITH clause.
I tried to reproduce this with a similar script and got the same error.
To avoid the error, I removed the alias AS RESULT from the query, and it executed successfully.
Corrected Code
sqlstr ="WITH Participant_Name \
AS (SELECT \
CASEID, \
PARTICIPANTID, \
LASTWRITTEN, \
PARTICIPANT, \
FIRSTNAME, \
MIDDLENAME, \
LASTNAME \
FROM (SELECT \
ab.CASEID, \
ab.PARTICIPANTID, \
ab.DYNAMICDATATYPE, \
ab.DYNAMICEVIDENCEVALUE, \
ab.LASTWRITTEN \
FROM enhanced.BDMCASEEVIDENCE ab \
INNER JOIN (SELECT \
PARTICIPANTID, \
MAX(LASTWRITTEN) AS MAXDATE \
FROM enhanced.BDMCASEEVIDENCE \
where EVIDENCETYPE in ( 'PDC0000258' ) \
GROUP BY PARTICIPANTID) cd \
ON ab.PARTICIPANTID = cd.PARTICIPANTID \
AND ab.LASTWRITTEN = cd.MAXDATE \
GROUP BY ab.CASEID, \
ab.PARTICIPANTID, \
ab.DYNAMICDATATYPE, \
ab.DYNAMICEVIDENCEVALUE, \
ab.LASTWRITTEN) AS SOURCE \
PIVOT(max(DYNAMICEVIDENCEVALUE) \
FOR DYNAMICDATATYPE IN ('PARTICIPANT', \
'FIRSTNAME', \
'MIDDLENAME',\
'LASTNAME') \
)) \
SELECT* \
FROM PARTICIPANT_NAME"
df = spark.sql(sqlstr)
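As a side note, the same corrected query can be written as a triple-quoted string, which avoids the trailing backslashes altogether. The sketch below is a restatement of the corrected code above, not a different fix. It assumes the SparkSession is passed in as a parameter (in a Synapse notebook this would be the predefined spark object), and the constant name CURATED_REPORT_SQL is made up for illustration.

```python
# Triple-quoted variant of the corrected query: no line continuations needed.
CURATED_REPORT_SQL = """
WITH Participant_Name AS (
    SELECT CASEID, PARTICIPANTID, LASTWRITTEN,
           PARTICIPANT, FIRSTNAME, MIDDLENAME, LASTNAME
    FROM (
        SELECT ab.CASEID, ab.PARTICIPANTID, ab.DYNAMICDATATYPE,
               ab.DYNAMICEVIDENCEVALUE, ab.LASTWRITTEN
        FROM enhanced.BDMCASEEVIDENCE ab
        INNER JOIN (
            SELECT PARTICIPANTID, MAX(LASTWRITTEN) AS MAXDATE
            FROM enhanced.BDMCASEEVIDENCE
            WHERE EVIDENCETYPE IN ('PDC0000258')
            GROUP BY PARTICIPANTID
        ) cd
          ON ab.PARTICIPANTID = cd.PARTICIPANTID
         AND ab.LASTWRITTEN = cd.MAXDATE
        GROUP BY ab.CASEID, ab.PARTICIPANTID, ab.DYNAMICDATATYPE,
                 ab.DYNAMICEVIDENCEVALUE, ab.LASTWRITTEN
    ) AS SOURCE
    PIVOT (
        MAX(DYNAMICEVIDENCEVALUE)
        FOR DYNAMICDATATYPE IN ('PARTICIPANT', 'FIRSTNAME', 'MIDDLENAME', 'LASTNAME')
    )
)
SELECT *
FROM Participant_Name
"""


def curated_report(spark):
    """Run the query on a SparkSession-like object (anything with a .sql method)."""
    return spark.sql(CURATED_REPORT_SQL)
```

Usage in a notebook would then be df = curated_report(spark).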

How to cross-compile a rust application for ARM which uses Rocket web-server and requires nightly toolchain using Yocto?

I want to compile myRustApp, which uses the Rocket web server, in my meta layer. The issue I'm having is that openembedded does not support the nightly release of Rust, which is required to run the Rocket web server.
The alternative was to use meta-rust, but it got me nowhere because, if I understand correctly, that meta layer only supports native builds.
So I ended up using the meta-rust-bin layer with a pre-built nightly toolchain.
I was able to build nightly by executing ./build-new-version.sh nightly under meta-rust-bin.
After all this, building my recipe with bitbake returns an error:
WARNING: /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross_1.59.0.bb: Exception during build_dependencies for TARGET_LLVM_FEATURES
WARNING: /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross_1.59.0.bb: Error during finalise of /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross_1.59.0.bb
ERROR: ExpansionError during parsing /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross_1.59.0.bb
Traceback (most recent call last):
File "Var <TARGET_LLVM_FEATURES>", line 1, in <module>
File "/home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-common.inc", line 117, in llvm_features(d=<bb.data_smart.DataSmart object at 0x7fe03b341f60>):
return ','.join(llvm_features_from_tune(d) +
> llvm_features_from_cc_arch(d) +
llvm_features_from_target_fpu(d))
File "/home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-common.inc", line 36, in llvm_features_from_tune(d=<bb.data_smart.DataSmart object at 0x7fe03b341f60>):
> if target_is_armv7(d):
f.append('+v7')
bb.data_smart.ExpansionError: Failure expanding variable TARGET_LLVM_FEATURES, expression was ${@llvm_features(d)} which triggered exception NameError: name 'target_is_armv7' is not defined
The variable dependency chain for the failure is: TARGET_LLVM_FEATURES
ERROR: Parsing halted due to errors, see error messages above
WARNING: /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross-canadian_1.59.0.bb: Exception during build_dependencies for TARGET_LLVM_FEATURES
WARNING: /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross-canadian_1.59.0.bb: Error during finalise of /home/jonas/project/sources/openembedded-core/meta/recipes-devtools/rust/rust-cross-canadian_1.59.0.bb
My questions are:
What causes this error? How can I fix it?
Has anyone already tried to cross-compile Rust nightly apps? Are there any examples anywhere?
My Cargo.toml:
[package]
name = "MyRustApp"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rocket = "0.4.10"
rocket_cors = "0.5.1"
serde = "1.0"
serde_json = "1.0"
serde_derive = "1.0"
[dependencies.rocket_contrib]
version = "0.4.10"
default-features = false
features = ["json"]
[profile.release]
strip = true
opt-level = "z"
lto = true
My recipe generated by cargo bitbake:
# Auto-Generated by cargo-bitbake 0.3.16
#
inherit cargo
# If this is git based prefer versioned ones if they exist
# DEFAULT_PREFERENCE = "-1"
# how to get strain-webserver could be as easy as but default to a git checkout:
# SRC_URI += "crate://crates.io/MyRustApp/0.1.0"
SRC_URI += "git://git@bitbucket.org/work/MyRustApp.git;protocol=ssh;nobranch=1;branch=main"
SRCREV = "xxx"
S = "${WORKDIR}/git"
CARGO_SRC_DIR = ""
PV:append = ".AUTOINC+d2562d3c92"
# please note if you have entries that do not begin with crate://
# you must change them to how that package can be fetched
SRC_URI += " \
crate://crates.io/aead/0.3.2 \
crate://crates.io/aes-gcm/0.8.0 \
crate://crates.io/aes-soft/0.6.4 \
crate://crates.io/aes/0.6.0 \
crate://crates.io/aesni/0.10.0 \
crate://crates.io/aho-corasick/0.7.20 \
crate://crates.io/atty/0.2.14 \
crate://crates.io/autocfg/1.1.0 \
crate://crates.io/base64/0.13.1 \
crate://crates.io/base64/0.9.3 \
crate://crates.io/bitflags/1.3.2 \
crate://crates.io/block-buffer/0.9.0 \
crate://crates.io/byteorder/1.4.3 \
crate://crates.io/cfg-if/0.1.10 \
crate://crates.io/cfg-if/1.0.0 \
crate://crates.io/cipher/0.2.5 \
crate://crates.io/cookie/0.11.5 \
crate://crates.io/cpufeatures/0.2.5 \
crate://crates.io/cpuid-bool/0.2.0 \
crate://crates.io/crypto-mac/0.10.1 \
crate://crates.io/ctr/0.6.0 \
crate://crates.io/devise/0.2.1 \
crate://crates.io/devise_codegen/0.2.1 \
crate://crates.io/devise_core/0.2.1 \
crate://crates.io/digest/0.9.0 \
crate://crates.io/filetime/0.2.18 \
crate://crates.io/form_urlencoded/1.1.0 \
crate://crates.io/fsevent-sys/2.0.1 \
crate://crates.io/fsevent/0.4.0 \
crate://crates.io/fuchsia-zircon-sys/0.3.3 \
crate://crates.io/fuchsia-zircon/0.3.3 \
crate://crates.io/generic-array/0.14.6 \
crate://crates.io/getrandom/0.2.8 \
crate://crates.io/ghash/0.3.1 \
crate://crates.io/glob/0.3.0 \
crate://crates.io/hashbrown/0.12.3 \
crate://crates.io/hermit-abi/0.1.19 \
crate://crates.io/hkdf/0.10.0 \
crate://crates.io/hmac/0.10.1 \
crate://crates.io/httparse/1.8.0 \
crate://crates.io/hyper/0.10.16 \
crate://crates.io/idna/0.1.5 \
crate://crates.io/idna/0.3.0 \
crate://crates.io/indexmap/1.9.2 \
crate://crates.io/inotify-sys/0.1.5 \
crate://crates.io/inotify/0.7.1 \
crate://crates.io/iovec/0.1.4 \
crate://crates.io/itoa/1.0.4 \
crate://crates.io/kernel32-sys/0.2.2 \
crate://crates.io/language-tags/0.2.2 \
crate://crates.io/lazycell/1.3.0 \
crate://crates.io/libc/0.2.137 \
crate://crates.io/log/0.3.9 \
crate://crates.io/log/0.4.17 \
crate://crates.io/matches/0.1.9 \
crate://crates.io/memchr/2.5.0 \
crate://crates.io/mime/0.2.6 \
crate://crates.io/mio-extras/2.0.6 \
crate://crates.io/mio/0.6.23 \
crate://crates.io/miow/0.2.2 \
crate://crates.io/net2/0.2.38 \
crate://crates.io/notify/4.0.17 \
crate://crates.io/num_cpus/1.14.0 \
crate://crates.io/opaque-debug/0.3.0 \
crate://crates.io/pear/0.1.5 \
crate://crates.io/pear_codegen/0.1.5 \
crate://crates.io/percent-encoding/1.0.1 \
crate://crates.io/percent-encoding/2.2.0 \
crate://crates.io/polyval/0.4.5 \
crate://crates.io/ppv-lite86/0.2.17 \
crate://crates.io/proc-macro2/0.4.30 \
crate://crates.io/proc-macro2/1.0.47 \
crate://crates.io/quote/0.6.13 \
crate://crates.io/quote/1.0.21 \
crate://crates.io/rand/0.8.5 \
crate://crates.io/rand_chacha/0.3.1 \
crate://crates.io/rand_core/0.6.4 \
crate://crates.io/redox_syscall/0.2.16 \
crate://crates.io/regex-syntax/0.6.28 \
crate://crates.io/regex/1.7.0 \
crate://crates.io/rocket/0.4.11 \
crate://crates.io/rocket_codegen/0.4.11 \
crate://crates.io/rocket_contrib/0.4.11 \
crate://crates.io/rocket_cors/0.5.2 \
crate://crates.io/rocket_http/0.4.11 \
crate://crates.io/ryu/1.0.11 \
crate://crates.io/safemem/0.3.3 \
crate://crates.io/same-file/1.0.6 \
crate://crates.io/serde/1.0.147 \
crate://crates.io/serde_derive/1.0.147 \
crate://crates.io/serde_json/1.0.89 \
crate://crates.io/sha2/0.9.9 \
crate://crates.io/slab/0.4.7 \
crate://crates.io/smallvec/1.10.0 \
crate://crates.io/state/0.4.2 \
crate://crates.io/subtle/2.4.1 \
crate://crates.io/syn/0.15.44 \
crate://crates.io/syn/1.0.103 \
crate://crates.io/time/0.1.44 \
crate://crates.io/tinyvec/1.6.0 \
crate://crates.io/tinyvec_macros/0.1.0 \
crate://crates.io/toml/0.4.10 \
crate://crates.io/traitobject/0.1.0 \
crate://crates.io/typeable/0.1.2 \
crate://crates.io/typenum/1.15.0 \
crate://crates.io/unicase/1.4.2 \
crate://crates.io/unicase/2.6.0 \
crate://crates.io/unicase_serde/0.1.0 \
crate://crates.io/unicode-bidi/0.3.8 \
crate://crates.io/unicode-ident/1.0.5 \
crate://crates.io/unicode-normalization/0.1.22 \
crate://crates.io/unicode-xid/0.1.0 \
crate://crates.io/universal-hash/0.4.1 \
crate://crates.io/url/1.7.2 \
crate://crates.io/url/2.3.1 \
crate://crates.io/version_check/0.1.5 \
crate://crates.io/version_check/0.9.4 \
crate://crates.io/walkdir/2.3.2 \
crate://crates.io/wasi/0.10.0+wasi-snapshot-preview1 \
crate://crates.io/wasi/0.11.0+wasi-snapshot-preview1 \
crate://crates.io/winapi-build/0.1.1 \
crate://crates.io/winapi-i686-pc-windows-gnu/0.4.0 \
crate://crates.io/winapi-util/0.1.5 \
crate://crates.io/winapi-x86_64-pc-windows-gnu/0.4.0 \
crate://crates.io/winapi/0.2.8 \
crate://crates.io/winapi/0.3.9 \
crate://crates.io/windows-sys/0.42.0 \
crate://crates.io/windows_aarch64_gnullvm/0.42.0 \
crate://crates.io/windows_aarch64_msvc/0.42.0 \
crate://crates.io/windows_i686_gnu/0.42.0 \
crate://crates.io/windows_i686_msvc/0.42.0 \
crate://crates.io/windows_x86_64_gnu/0.42.0 \
crate://crates.io/windows_x86_64_gnullvm/0.42.0 \
crate://crates.io/windows_x86_64_msvc/0.42.0 \
crate://crates.io/ws2_32-sys/0.2.1 \
crate://crates.io/yansi/0.5.1 \
"
# FIXME: update generateme with the real MD5 of the license file
LIC_FILES_CHKSUM = " \
"
SUMMARY = "Webserver with Rust"
LICENSE = "CLOSED"
# includes this file if it exists but does not fail
# this is useful for anything you may want to override from
# what cargo-bitbake generates.
include MyRustApp-${PV}.inc
include MyRustApp.inc

How to create a user to connect to the database

ERROR :
[FATAL] [DBT-05509] Failed to connect to the specified database (cdb21).
CAUSE: OS Authentication might be disabled for this database (cdb21).
ACTION: Specify a valid sysdba user name and password to connect to the database.
First step:
./runInstaller -silent -responseFile /scratch/app/user/product/21.0.0/dbhome_1/install/response/db_install.rsp \
oracle.install.option=INSTALL_DB_SWONLY \
UNIX_GROUP_NAME=oinstall \
ORACLE_BASE=/scratch/app/user \
INVENTORY_LOCATION=/scratch/app/oraInventory \
SELECTED_LANGUAGES=en \
oracle.install.db.InstallEdition=EE \
oracle.install.db.isCustomInstall=false \
oracle.install.db.OSDBA_GROUP=oinstall \
oracle.install.db.OSBACKUPDBA_GROUP=oinstall \
oracle.install.db.OSDGDBA_GROUP=oinstall \
oracle.install.db.OSKMDBA_GROUP=oinstall \
oracle.install.db.OSRACDBA_GROUP=oinstall \
SECURITY_UPDATES_VIA_MYORACLESUPPORT=false \
DECLINE_SECURITY_UPDATES=true
Second step:
dbca -silent -createDatabase \
-templateName General_Purpose.dbc \
-gdbname cdb21 \
-sid cdb21 \
-responseFile NO_VALUE \
-characterSet AL32UTF8 \
-sysPassword Welcome1 \
-systemPassword Welcome1 \
-createAsContainerDatabase true \
-numberOfPDBs 1 \
-pdbName pdb21 \
-pdbAdminPassword Welcome1 \
-databaseType MULTIPURPOSE \
-memoryMgmtType auto_sga \
-totalMemory 4096 \
-storageType FS \
-datafileDestination /scratch/oradata/ \
-emConfiguration NONE \
-ignorePreReqs
Start the service using:
lsnrctl start
Then:
startup

why run "python run_squad.py" doesn't work?

I want to fine-tune on SQuAD with the Hugging Face run_squad.py script, but I've run into the following problems:
1. When I use "--do_train" without "True", as in the following command, there are no models in output_dir after 20 minutes of running:
!python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir models/bert/ \
--data_dir data/squad \
--overwrite_output_dir \
--overwrite_cache \
--do_train \
--train_file train-v2.0.json \
--version_2_with_negative \
--do_lower_case \
--do_eval \
--predict_file dev-v2.0.json \
--per_gpu_train_batch_size 2 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--threads 10 \
--save_steps 5000
2. When I use "--do_train=True", as in the following command, the error message is "run_squad.py: error: argument --do_train: ignored explicit argument 'True'":
!python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir models/bert/ \
--data_dir data/squad \
--overwrite_output_dir \
--overwrite_cache \
--do_train=True \
--train_file train-v2.0.json \
--version_2_with_negative \
--do_lower_case \
--do_eval \
--predict_file dev-v2.0.json \
--per_gpu_train_batch_size 2 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--threads 10 \
--save_steps 5000
3. When I use "--do_train True", as in the following command, the error message is "run_squad.py: error: unrecognized arguments: True":
!python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir models/bert/ \
--data_dir data/squad \
--overwrite_output_dir \
--overwrite_cache \
--do_train True \
--train_file train-v2.0.json \
--version_2_with_negative \
--do_lower_case \
--do_eval \
--predict_file dev-v2.0.json \
--per_gpu_train_batch_size 2 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--doc_stride 128 \
--threads 10 \
--save_steps 5000
I'm running the code in Colab with a GPU: Tesla P100-PCIE-16GB
Judging by the running time, I don't think the code went through the training process, but I don't know how to set the parameters so that training runs. What should I do?
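The two error messages in cases 2 and 3 are what argparse reports when a value is given to a boolean switch. The sketch below reproduces them with a stand-alone parser. It assumes that run_squad.py declares --do_train with action="store_true", which the error messages suggest but which is not confirmed here; the helper names build_parser and parse are made up for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run_squad.py")
    # Assumption: the flag is a boolean switch that takes no value.
    parser.add_argument("--do_train", action="store_true")
    return parser


def parse(argv):
    """Return the parsed do_train value, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).do_train
    except SystemExit:
        # argparse prints its error message to stderr and exits with status 2.
        return None


print(parse(["--do_train"]))          # switch present
print(parse([]))                      # switch absent
print(parse(["--do_train=True"]))     # rejected: explicit value on a switch
print(parse(["--do_train", "True"]))  # rejected: stray argument "True"
```

Under that assumption, the command in case 1 is the form that enables training, so the missing models after 20 minutes would have a different cause that this sketch does not address.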

How is the init process started in the Linux kernel?

I am trying to understand the init process in the Linux kernel, which is the first process and is statically initialized with the INIT_TASK macro.
#define INIT_TASK(tsk) \
{ \
.state = 0, \
.stack = &init_thread_info, \
.usage = ATOMIC_INIT(2), \
.flags = PF_KTHREAD, \
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.normal_prio = MAX_PRIO-20, \
.policy = SCHED_NORMAL, \
.cpus_allowed = CPU_MASK_ALL, \
.nr_cpus_allowed= NR_CPUS, \
.mm = NULL, \
.active_mm = &init_mm, \
.se = { \
.group_node = LIST_HEAD_INIT(tsk.se.group_node), \
}, \
.rt = { \
.run_list = LIST_HEAD_INIT(tsk.rt.run_list), \
.time_slice = RR_TIMESLICE, \
}, \
.tasks = LIST_HEAD_INIT(tsk.tasks), \
INIT_PUSHABLE_TASKS(tsk) \
INIT_CGROUP_SCHED(tsk) \
.ptraced = LIST_HEAD_INIT(tsk.ptraced), \
.ptrace_entry = LIST_HEAD_INIT(tsk.ptrace_entry), \
.real_parent = &tsk, \
.parent = &tsk, \
.children = LIST_HEAD_INIT(tsk.children), \
.sibling = LIST_HEAD_INIT(tsk.sibling), \
.group_leader = &tsk, \
RCU_POINTER_INITIALIZER(real_cred, &init_cred), \
RCU_POINTER_INITIALIZER(cred, &init_cred), \
.comm = INIT_TASK_COMM, \
.thread = INIT_THREAD, \
.fs = &init_fs, \
.files = &init_files, \
.signal = &init_signals, \
.sighand = &init_sighand, \
.nsproxy = &init_nsproxy, \
.pending = { \
.list = LIST_HEAD_INIT(tsk.pending.list), \
.signal = {{0}}}, \
.blocked = {{0}}, \
.alloc_lock = __SPIN_LOCK_UNLOCKED(tsk.alloc_lock), \
.journal_info = NULL, \
.cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock), \
.timer_slack_ns = 50000, /* 50 usec default slack */ \
.pids = { \
[PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID), \
[PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID), \
[PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID), \
}, \
.thread_group = LIST_HEAD_INIT(tsk.thread_group), \
INIT_IDS \
INIT_PERF_EVENTS(tsk) \
INIT_TRACE_IRQFLAGS \
INIT_LOCKDEP \
INIT_FTRACE_GRAPH \
INIT_TRACE_RECURSION \
INIT_TASK_RCU_PREEMPT(tsk) \
INIT_CPUSET_SEQ \
INIT_VTIME(tsk) \
}
But I am not able to figure out:
how it will be executed,
where it is scheduled, and
which lines of code in the Linux kernel start executing immediately once this init_task task has been scheduled. Is there any function that it calls?
The kernel calls "init" as one of the very last things it does during kernel initialization. The function kernel_init() in init/main.c has the logic.
You will notice that the kernel tries four different combinations of init, and expects one of them to succeed. You will also notice that you can override what the kernel executes on startup by feeding the kernel command line parameter "init". So, you can say, for example, init=/bin/mystartup on the kernel command line and start your own custom application instead of the default /sbin/init. Notice also that on most modern systems, even embedded systems, /sbin/init is a soft link that points to the real executable.
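The fallback order can be illustrated with a short Python sketch. This is a model of the selection logic only, not kernel code. The four default paths are the ones I recall from kernel_init() in init/main.c, the helper name pick_init is hypothetical, and whether a failed init= override falls back to the defaults depends on the kernel version (this sketch falls back).

```python
# Default candidates, tried in this order when no override succeeds.
DEFAULT_INIT_PATHS = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]


def pick_init(can_execute, init_override=None):
    """Return the first program that can be started, following the fallback order.

    can_execute is a callable standing in for "exec of this path succeeded".
    """
    candidates = ([init_override] if init_override else []) + DEFAULT_INIT_PATHS
    for path in candidates:
        if can_execute(path):
            return path
    # The real kernel panics at this point.
    raise RuntimeError("No working init found")


# Hypothetical root filesystem that only contains a shell and a custom startup program.
available = {"/bin/sh", "/bin/mystartup"}
print(pick_init(available.__contains__))
print(pick_init(available.__contains__, init_override="/bin/mystartup"))
```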
To answer your question more generally, study this source file (main.c): you can see virtually all of the details of Linux kernel initialization after the low-level assembly and platform initialization, which, beyond their educational value, you shouldn't have to touch or care about much.
The main mechanism is to call do_execve() with the fixed arguments argv_init and envp_init. The ELF file is parsed and the initial program counter (PC) is set as specified by the file. All memory management (mm) pages are mapped to the disk's backing store. The code is then set to run. On the initial PC fetch when it is scheduled, a page fault is generated, which reads the first code page into memory. This is the same as any other execve() call.
