Why does PyCharm start a test when I name a function test_ - python-3.x

Why does PyCharm suddenly want to start a test?
My script is named 1_selection_sort.py and I'm trying to call the function test_selection_sort, running it with <current file> (added in 2022.2.2, I assume).
I'm pretty sure this still worked on 24/10/2022 (version 2022.2.2, and maybe 2022.2.3, but in 2022.2.4 it's no longer working).
Could someone please tell me when and why this was changed? Or did I maybe do something wrong during installation?
My file is NOT named according to this naming scheme (https://docs.pytest.org/en/7.1.x/explanation/goodpractices.html#conventions-for-python-test-discovery):
In those directories, search for test_*.py
or *_test.py files, imported by their test
package name.
"""
Schrijf een functie selection_sort dat een lijst in dalende volgorde sorteert m.b.v. selection sort.
"""
def selection_sort(lijst):
for i in range(len(lijst)):
for j, number in enumerate(lijst):
if number < lijst[i]:
lijst[j] = lijst[i]
lijst[i] = number
return lijst
def test_selection_sort(lijst, check):
print(lijst)
result = selection_sort(lijst)
print(result)
print(check)
assert result == check
print("Begin controle selection_sort")
test_selection_sort([1, 3, 45, 32, 65, 34], [65, 45, 34, 32, 3, 1])
test_selection_sort([1], [1])
test_selection_sort([54, 29, 12, 92, 2, 100], [100, 92, 54, 29, 12, 2])
test_selection_sort([], [])
print("Controle selection_sort succesvol")
Output:
"C:\Program Files\Anaconda3\python.exe" "C:/Users/r0944584/AppData/Local/JetBrains/PyCharm Community Edition 2022.2.4/plugins/python-ce/helpers/pycharm/_jb_pytest_runner.py" --path "C:\Users\r0944584\Downloads\skeletons(4)\skeletons\1_selection_sort.py"
Testing started at 14:13 ...
Launching pytest with arguments C:\Users\r0944584\Downloads\skeletons(4)\skeletons\1_selection_sort.py --no-header --no-summary -q in C:\Users\r0944584\Downloads\skeletons(4)\skeletons
============================= test session starts =============================
collecting ... collected 1 item
1_selection_sort.py::test_selection_sort ERROR [100%]
test setup failed
file C:\Users\r0944584\Downloads\skeletons(4)\skeletons\1_selection_sort.py, line 15
def test_selection_sort(lijst, check):
E fixture 'lijst' not found
> available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
C:\Users\r0944584\Downloads\skeletons(4)\skeletons\1_selection_sort.py:15
========================= 1 warning, 1 error in 0.01s =========================
Process finished with exit code 1

The solution I found was to disable Pytest as the default test runner, following this answer: https://stackoverflow.com/a/59203776/13454049
Disable Pytest for your project:
Open the Settings/Preferences | Tools | Python Integrated Tools settings dialog as described in Choosing Your Testing Framework.
In the Default test runner field select Unittests.
Click OK to save the settings.
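For context, pytest collected test_selection_sort because any function whose name starts with test_ in a file handed to pytest is treated as a test, and its parameters are interpreted as fixtures (hence "fixture 'lijst' not found"). If you would rather keep pytest as the runner, below is a minimal sketch (not from the original answer) of the same checks written as a parametrized pytest test; the module name in the import is a placeholder, since a file named 1_selection_sort.py cannot be imported directly.
# Not from the original answer -- a sketch of the same checks as a real pytest
# test, so the parameters are supplied by parametrize instead of being looked
# up as fixtures. "selection_sort_module" is a hypothetical name; the actual
# file (1_selection_sort.py) would need renaming to be importable.
import pytest

from selection_sort_module import selection_sort  # hypothetical module name


@pytest.mark.parametrize("lijst, check", [
    ([1, 3, 45, 32, 65, 34], [65, 45, 34, 32, 3, 1]),
    ([1], [1]),
    ([54, 29, 12, 92, 2, 100], [100, 92, 54, 29, 12, 2]),
    ([], []),
])
def test_selection_sort(lijst, check):
    assert selection_sort(lijst) == check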

Related

Scrapy using files.middleware downloads given file without extension

I want to automate some file exchange. I need to download a .csv file from a website that requires authentication before you can start the download.
First I tried downloading the file with wget, but I did not manage, so I switched to Scrapy. The authentication and the download both work fine, BUT the file comes without an extension.
here is a snippet of my spider:
def after_login(self, response):
    accview = response.xpath('//span[@class="user-actions welcome"]')
    if accview:
        print('Logged in')
        file_url = response.xpath('//article[@class="widget-single widget-shape-widget widget"]/p/a/@href').get()
        file_url = response.urljoin(file_url)
        items = StockfwItem()
        items['file_urls'] = [file_url]
        yield items
my settings.py:
ITEM_PIPELINES = {'scrapy.pipelines.files.FilesPipeline': 1}
items.py:
file_urls = scrapy.Field()
files = scrapy.Field()
The reason I am sure there is a problem with my spider is that if I download the file normally via the browser, it always comes as a regular CSV file.
When I try to open the downloaded file (the filename is hashed with SHA-1), I get the following error message:
File "/usr/lib/python3.6/csv.py", line 111, in __next__
self.fieldnames
File "/usr/lib/python3.6/csv.py", line 98, in fieldnames
self._fieldnames = next(self.reader)
_csv.Error: line contains NULL byte
Also, when I open the downloaded file with Notepad++ and save it with UTF-8 encoding, it works without any problems...
scrapy console output:
{'file_urls': ['https://floraworld.be/Servico.Orchard.FloraWorld/Export/Export'],
 'files': [{'checksum': 'f56c6411803ec45863bc9dbea65edcb9',
            'path': 'full/cc72731cc79929b50c5afb14e0f7e26dae8f069c',
            'status': 'downloaded',
            'url': 'https://floraworld.be/Servico.Orchard.FloraWorld/Export/Export'}]}
2021-08-02 10:00:30 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-02 10:00:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2553,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 2,
'downloader/request_method_count/POST': 2,
'downloader/response_bytes': 76289,
'downloader/response_count': 4,
'downloader/response_status_count/200': 3,
'downloader/response_status_count/302': 1,
'elapsed_time_seconds': 20.892172,
'file_count': 1,
'file_status_count/downloaded': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 8, 2, 8, 0, 30, 704638),
'item_scraped_count': 1,
'log_count/DEBUG': 6,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'memusage/max': 55566336,
'memusage/startup': 55566336,
'request_depth_max': 1,
'response_received_count': 3,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'splash/render.html/request_count': 1,
'splash/render.html/response_count/200': 1,
'start_time': datetime.datetime(2021, 8, 2, 8, 0, 9, 812466)}
2021-08-02 10:00:30 [scrapy.core.engine] INFO: Spider closed (finished)
Here is a snippet of the downloaded file, opened with vim on the Ubuntu server:
"^#A^#r^#t^#i^#c^#l^#e^#C^#o^#d^#e^#"^#;^#"^#D^#e^#s^#c^#r^#i^#p^#t^#i^#o^#n^#"^#;^#"^#B^#B^#"^#;^#"^#K^#T^#"^#;^#"^#S^#i^#z^#e^#"^#;^#"^#P^#r^#i^#c^#e^#"^#;^#"^#S^#t^#o^#c^#k^#"^#;^#"^#D^#e^#l^#i^#v^#e^#r^#y^#D^#a^#t^#e^#"^#^M^#
^#"^#1^#0^#0^#0^#L^#"^#;^#"^#A^#l^#o^#e^# ^#p^#l^#a^#n^#t^# ^#x^# ^#2^#3^# ^#l^#v^#s^#"^#;^#"^#4^#"^#;^#"^#4^#"^#;^#"^#6^#5^#"^#;^#"^#4^#6^#,^#7^#7^#"^#;^#"^#1^#1^#8^#,^#0^#0^#0^#0^#0^#"^#;^#"^#"^#^M^#
^#"^#1^#0^#0^#0^#M^#"^#;^#"^#A^#l^#o^#e^# ^#p^#l^#a^#n^#t^# ^#x^# ^#1^#7^# ^#l^#v^#s^#"^#;^#"^#4^#"^#;^#"^#1^#2^#"^#;^#"^#5^#0^#"^#;^#"^#3^#2^#,^#6^#1^#"^#;^#"^#2^#0^#6^#,^#0^#0^#0^#0^#0^#"^#;^#"^#"^#^M^#
^#"^#1^#0^#0^#0^#S^#"^#;^#"^#A^#l^#o^#e^# ^#p^#l^#a^#n^#t^# ^#x^# ^#1^#6^# ^#l^#v^#s^#"^#;^#"^#4^#"^#;^#"^#2^#4^#"^#;^#"^#4^#0^#"^#;^#"^#2^#2^#,^#3^#2^#"^#;^#"^#-^#6^#,^#0^#0^#0^#0^#0^#"^#;^#"^#2^#3^#/^#0^#8^#/^#2^#0^#2^#1^#"^#^M^#
^#"^#1^#0^#0^#2^#M^#"^#;^#"^#B^#A^#T^#O^#N^# ^#P^#L^#A^#N^#T^# ^#6^#7^# ^#C^#M^# ^#W^#/^#P^#O^#T^#"^#;^#"^#2^#"^#;^#"^#6^#"^#;^#"^#6^#7^#"^#;^#"^#2^#2^#,^#4^#2^#"^#;^#"^#3^#3^#,^#0^#0^#0^#0^#0^#"^#;^#"^#5^#/^#0^#9^#/^#2^#0^#2^#1^#"^#^M^#
^#"^#1^#0^#0^#2^#S^#"^#;^#"^#B^#A^#T^#O^#N^# ^#P^#L^#A^#N^#T^# ^#4^#2^# ^#C^#M^# ^#W^#/^#P^#O^#T^#"^#;^#"^#4^#"^#;^#"^#1^#2^#"^#;^#"^#4^#2^#"^#;^#"^#1^#0^#,^#5^#4^#"^#;^#"^#-^#9^#5^#,^#0^#0^#0^#0^#0^#"^#;^#"^#5^#/^#0^#9^#/^#2^#0^#2^#1^#"^#^M^#
^#"^#1^#0^#0^#4^#N^#"^#;^#"^#B^#a^#t^#o^#n^# ^#P^#l^#a^#n^#t^#"^#;^#"^#2^#"^#;^#"^#2^#"^#;^#"^#9^#9^#"^#;^#"^#1^#2^#0^#,^#9^#5^#"^#;^#"^#5^#3^#,^#0^#0^#0^#0^#0^#"^#;^#"^#3^#0^#/^#0^#9^#/^#2^#0^#2^#1^#"^#^M^#
^#"^#1^#0^#0^#5^#N^#"^#;^#"^#N^#a^#t^#u^#r^#a^#l^# ^#s^#t^#r^#e^#l^#i^#t^#z^#i^#a^# ^#w^#/^#p^#o^#t^#"^#;^#"^#1^#"^#;^#"^#1^#"^#;^#"^#1^#3^#0^#"^#;^#"^#2^#0^#7^#,^#4^#4^#"^#;^#"^#1^#4^#,^#0^#0^#0^#0^#0^#"^#;^#"^#1^#/^#1^#2^#/^#2^#0^#2^#1^#"^#^M^#
what the heck is this??
When I change the filename to file.csv, download the file to my Windows desktop, and open it with Notepad++ again, it looks fine:
"ArticleCode";"Description";"BB";"KT";"Size";"Price";"Stock";"DeliveryDate"
"1000L";"Aloe plant x 23 lvs";"4";"4";"65";"46,77";"118,00000";""
"1000M";"Aloe plant x 17 lvs";"4";"12";"50";"32,61";"206,00000";""
"1000S";"Aloe plant x 16 lvs";"4";"24";"40";"22,32";"-6,00000";"23/08/2021"
"1002M";"BATON PLANT 67 CM W/POT";"2";"6";"67";"22,42";"33,00000";"5/09/2021"
"1002S";"BATON PLANT 42 CM W/POT";"4";"12";"42";"10,54";"-95,00000";"5/09/2021"
For all those who run into the same problem: this is what I ran in my terminal:
cat Inputfile | tr -d '\0' > Outputfile.csv
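For reference, here is a rough Python equivalent of that tr command (not from the original thread; the file names are placeholders). The interleaved null bytes in the dump above suggest the export is actually UTF-16 encoded, so decoding it explicitly is usually cleaner than just stripping the zero bytes.
# Sketch only: decode the export as UTF-16 if possible, otherwise mimic
# tr -d '\0' by dropping the null bytes. "Export.csv" / "Export_utf8.csv"
# are placeholder file names.
with open("Export.csv", "rb") as f:
    raw = f.read()

try:
    text = raw.decode("utf-16")  # handles a BOM if one is present
except UnicodeDecodeError:
    text = raw.replace(b"\x00", b"").decode("utf-8", errors="replace")

with open("Export_utf8.csv", "w", encoding="utf-8", newline="") as f:
    f.write(text)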
First of all, try to change the encoding in vim:
set fileencodings=utf-8
or open it in a different text editor on your Ubuntu machine; maybe it's just a problem with vim.
The second thing to do is to download the file with the correct name:
import os
from urllib.parse import unquote

from itemadapter import ItemAdapter
from scrapy.http import Request
from scrapy.pipelines.files import FilesPipeline


class TempPipeline():
    def process_item(self, item, spider):
        return item


class ProcessPipeline(FilesPipeline):
    # Overridable Interface
    def get_media_requests(self, item, info):
        urls = ItemAdapter(item).get(self.files_urls_field, [])
        return [Request(u) for u in urls]

    def file_path(self, request, response=None, info=None, *, item=None):
        # return 'files/' + os.path.basename(urlparse(request.url).path)  # from the Scrapy documentation
        return os.path.basename(unquote(request.url))  # this is what worked for my project, but maybe you'll want to add ".csv"
You also need to change settings.py:
ITEM_PIPELINES = {
    'myproject.pipelines.ProcessPipeline': 300,
}
FILES_STORE = '/path/to/valid/dir'
Try those two things and if it still doesn't work then update me please.
I think your file contains null bytes.
The issue might be:
Your items.py contains two fields, file_urls and files, but your spider yields only one of them, file_urls. Thus the CSV gets created with two columns (file_urls, files), but the files column does not contain any data (which might be causing this problem). Try commenting out this line and see if it works: #files = scrapy.Field().

Persistent ModuleNotFoundError: No module named after setting path to project on Windows OS

In my attempts to run coverage in the terminal (coverage run mytestsuite.py -v) on this code:
import unittest
import coverage
import mytestsuite as testSuite

if __name__ == '__main__':
    cov = coverage.Coverage()
    cov.start()
    framework = unittest.TestLoader().loadTestsFromModule(testSuite)
    test_result = unittest.TextTestRunner(verbosity=2).run(framework).wasSuccessful()
    cov.stop()
    cov.save()
    print("\nTesting Concluded with result:", test_result)
    cov.html_report()
I am observing the following error:
ModuleNotFoundError: No module named 'mysource.py'
where mysource.py is imported into mytestsuite.py using the following syntax:
import sourcefolder.mysource as src
If I run mytestsuite.py in the IDE, I see the following output:
----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK
Coverage.py warning: No data was collected. (no-data-collected)
Traceback (most recent call last):
  File "C:\Users\Anthem Rukiya\IdeaProjects\DiTTo_YoutubePredictor\test\youtubePredictor_testRunner.py", line 33, in <module>
    cov.html_report()
  File "C:\Users\Anthem Rukiya\AppData\Local\Programs\Python\Python39\lib\site-packages\coverage\control.py", line 972, in html_report
    return reporter.report(morfs)
  File "C:\Users\Anthem Rukiya\AppData\Local\Programs\Python\Python39\lib\site-packages\coverage\html.py", line 241, in report
    for fr, analysis in get_analysis_to_report(self.coverage, morfs):
  File "C:\Users\Anthem Rukiya\AppData\Local\Programs\Python\Python39\lib\site-packages\coverage\report.py", line 66, in get_analysis_to_report
    raise CoverageException("No data to report.")
coverage.misc.CoverageException: No data to report.

Testing Concluded with result: True

Process finished with exit code 1
I am currently running Python 3.9.2 on Windows in IntelliJ and have attempted to put my project folder on the Python path using the following command in the command prompt:
set PYTHONPATH=%PYTHONPATH%;C:\path_to_my_project_folder
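An equivalent, in-script way to put the project root on the import path would be roughly the following (a sketch only, not my actual code; the path literal is a placeholder for the real project root):
# Hypothetical alternative to setting PYTHONPATH: put the project root on
# sys.path before importing the test suite, so that
# "import sourcefolder.mysource" resolves regardless of the working directory.
import sys

PROJECT_ROOT = r"C:\path_to_my_project_folder"  # placeholder path
sys.path.insert(0, PROJECT_ROOT)

import mytestsuite as testSuite  # import after the path setup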
Any suggestions?

Unable to run pathos program from spyder IDE

I have the following simple program:
from pathos.core import connect

tunnel = connect('192.168.1.5', port=50004)
print(tunnel)
print(type(tunnel._lport))
print(tunnel._rport)

def sleepy_squared(x):
    from time import sleep
    sleep(1.0)
    return x**2

from pathos.pp import ParallelPythonPool as Pool

p = Pool(8, servers=('192.168.1.5:6260',))
print(p.servers)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = p.map(sleepy_squared, x)
print(y)
When I try running this program from the Spyder 4 IDE I get the following error:
Tunnel('-q -N -L 4761:192.168.1.5:50004 192.168.1.5')
<class 'int'>
50004
('192.168.1.5:6260',)
Traceback (most recent call last):
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-e89974d31563>", line 20, in <module>
y = p.map(sleepy_squared, x)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pathos/parallel.py", line 234, in map
return list(self.imap(f, *args))
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pathos/parallel.py", line 247, in imap
return (subproc() for subproc in list(builtins.map(submit, *args)))
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pathos/parallel.py", line 243, in submit
return _pool.submit(f, argz, globals=globals())
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pp/_pp.py", line 499, in submit
sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pp/_pp.py", line 683, in __dumpsfunc
sources = [self.__get_source(func) for func in funcs]
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pp/_pp.py", line 683, in <listcomp>
sources = [self.__get_source(func) for func in funcs]
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/pp/_pp.py", line 750, in __get_source
self.__sourcesHM[hashf] = importable(func)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/dill/source.py", line 957, in importable
src = _closuredimport(obj, alias=alias, builtin=builtin)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/dill/source.py", line 876, in _closuredimport
src = getimport(func, alias=alias, builtin=builtin)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/dill/source.py", line 764, in getimport
return _getimport(head, tail, alias, verify, builtin)
File "/home/mahmoud/anaconda3/envs/trade_fxcm/lib/python3.6/site-packages/dill/source.py", line 713, in _getimport
try: exec(_str) #XXX: check if == obj? (name collision)
File "<string>", line 1
from __main__'> import sleepy_squared
^
SyntaxError: EOL while scanning string literal
When I run this program from the terminal with python test_connect.py, it works fine. My question is: why isn't the program running in Spyder IDE 4, and how can I make it run in Spyder IDE 4?
I'm the pathos author. Spyder, Jupyter, and other IDEs add an additional execution layer on top of the interpreter, and in some cases even wrap the execution in a closure to add additional hooks into the rest of the IDE. You are using a ParallelPool, which uses ppft, which uses dill.source to "serialize" by extracting the source code of an object and its dependencies. Since the IDE is adding a closure layer, dill.source has to try to serialize that as well, and it's not successful -- so in short it's a compatibility issue between dill.source and Spyder.
If you pick one of the other pathos pools, it may succeed. The ProcessPool is essentially the same as the ParallelPool, but serializes by object instead of by source code -- it uses multiprocess, which uses dill. Then there's ThreadPool, which is probably the most likely to succeed, unless Spyder also messes with the main thread -- which most IDEs do.
So, what can you do about it? The easy thing is to not run parallel code from the IDE. Essentially, write your code in the IDE, and then swap out the Pool and it should run in parallel. IDEs don't generally play well with parallel computing.
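A minimal sketch of the swap described above (not verbatim from the answer; it assumes pathos is installed and drops the remote-server/tunnel setup from the question, since these pools run locally):
# Sketch of swapping the ParallelPool for a local pathos pool.
from pathos.pools import ThreadPool  # or ProcessPool

def sleepy_squared(x):
    from time import sleep
    sleep(1.0)
    return x ** 2

if __name__ == '__main__':
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    pool = ThreadPool(8)  # ThreadPool is the most IDE-friendly option mentioned above
    try:
        print(pool.map(sleepy_squared, x))
    finally:
        pool.close()
        pool.join()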

how to save scraped data in db?

I'm trying to save scraped data in a DB but got stuck.
First I save the scraped data in a CSV file, then use the glob library to find the newest CSV and upload that CSV's data into the DB.
I'm not sure what I'm doing wrong here; please find the code and error below.
I have created a table yahoo_data in the DB with the same column names as the CSV and my code output.
import scrapy
from scrapy.http import Request
import MySQLdb
import os
import csv
import glob


class YahooScrapperSpider(scrapy.Spider):
    name = 'yahoo_scrapper'
    allowed_domains = ['in.news.yahoo.com']
    start_urls = ['http://in.news.yahoo.com/']

    def parse(self, response):
        news_url = response.xpath('//*[@class="Mb(5px)"]/a/@href').extract()
        for url in news_url:
            absolute_url = response.urljoin(url)
            yield Request(absolute_url, callback=self.parse_text)

    def parse_text(self, response):
        Title = response.xpath('//meta[contains(@name,"twitter:title")]/@content').extract_first()
        # response.xpath('//*[@name="twitter:title"]/@content').extract_first(), this also works
        Article = response.xpath('//*[@class="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm"]/text()').extract()
        yield {'Title': Title,
               'Article': Article}

    def close(self, reason):
        csv_file = max(glob.iglob('*.csv'), key=os.path.getctime)
        mydb = MySQLdb.connect(host='localhost',
                               user='root',
                               passwd='prasun',
                               db='books')
        cursor = mydb.cursor()
        csv_data = csv.reader(csv_file)
        row_count = 0
        for row in csv_data:
            if row_count != 0:
                cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
            row_count += 1
        mydb.commit()
        cursor.close()
Getting this error:
ana. It should be directed not to disrespect the Sikh community and hurt its sentiments by passing such arbitrary and uncalled for orders," said Badal.', 'The SAD president also "brought it to the notice of the Haryana chief minister that Article 25 of the constitution safeguarded the rights of all citizens to profess and practices the tenets of their faith."', '"Keeping these facts in view I request you to direct the Haryana Public Service Commission to rescind its notification and allow Sikhs as well as candidates belonging to other religions to sport symbols of their faith during all examinations," said Badal. (ANI)']}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-01 16:49:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (25 items) in: items.csv
2019-04-01 16:49:41 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method YahooScrapperSpider.close of <YahooScrapperSpider 'yahoo_scrapper' at 0x2c60f07bac8>>
Traceback (most recent call last):
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 201, in execute
query = query % args
TypeError: not enough arguments for format string
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
result = f(*args, **kw)
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "C:\Users\prasun.j\Desktop\scrapping\scrapping\spiders\yahoo_scrapper.py", line 44, in close
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 203, in execute
raise ProgrammingError(str(m))
MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
2019-04-01 16:49:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7985,
'downloader/request_count': 27,
'downloader/request_method_count/GET': 27,
'downloader/response_bytes': 2148049,
'downloader/response_count': 27,
'downloader/response_status_count/200': 26,
'downloader/response_status_count/301': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 4, 1, 11, 19, 41, 350717),
'item_scraped_count': 25,
'log_count/DEBUG': 53,
'log_count/ERROR': 1,
'log_count/INFO': 8,
'request_depth_max': 1,
'response_received_count': 26,
'scheduler/dequeued': 27,
'scheduler/dequeued/memory': 27,
'scheduler/enqueued': 27,
'scheduler/enqueued/memory': 27,
'start_time': datetime.datetime(2019, 4, 1, 11, 19, 36, 743594)}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Spider closed (finished)
This error:
MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
seems to be caused by the row you passed not containing enough arguments.
You can try printing the row to understand what is going wrong.
Anyway, if you want to save scraped data to a DB, I suggest writing a simple item pipeline that exports the data to the DB directly, without passing through a CSV.
For further information about item pipelines, see http://doc.scrapy.org/en/latest/topics/item-pipeline.html#topics-item-pipeline
You can find a useful example at Writing items to a MySQL database in Scrapy.
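Here is a minimal sketch of such a pipeline (not from the linked answer; it reuses the connection settings and the yahoo_data table from the question, and would still need to be registered in ITEM_PIPELINES):
# Sketch of a simple MySQL item pipeline, as suggested above. Adjust the
# credentials, database, and error handling for your project.
import MySQLdb


class MySQLStorePipeline:
    def open_spider(self, spider):
        self.db = MySQLdb.connect(host='localhost', user='root',
                                  passwd='prasun', db='books',
                                  charset='utf8mb4')
        self.cursor = self.db.cursor()

    def process_item(self, item, spider):
        # Article is scraped as a list of text fragments; join it into one string.
        article = ' '.join(item.get('Article') or [])
        self.cursor.execute(
            'INSERT IGNORE INTO yahoo_data (Title, Article) VALUES (%s, %s)',
            (item.get('Title'), article))
        self.db.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.db.close()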
It seems like you are passing a list where comma-separated parameters are expected.
Try adding an asterisk to the 'row' variable, i.e. change:
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
to:
cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', *row)

MPI_Send(100): Invalid rank has value 1 but must be nonnegative and less than 1

I am teaching myself MPI in Python, starting from the basic documentation of mpi4py. I began with this code:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {'a': 7, 'b': 3.14}
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
When I ran this program, I got the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "MPI/Comm.pyx", line 1175, in mpi4py.MPI.Comm.send (src/mpi4py.MPI.c:106424)
File "MPI/msgpickle.pxi", line 211, in mpi4py.MPI.PyMPI_send (src/mpi4py.MPI.c:42120)
mpi4py.MPI.Exception: Invalid rank, error stack:
MPI_Send(174): MPI_Send(buf=0x10e137554, count=25, MPI_BYTE, dest=1, tag=11, MPI_COMM_WORLD) failed
MPI_Send(100): Invalid rank has value 1 but must be nonnegative and less than 1
I didn't find any working solution for this problem. I am using Mac OS X El Capitan.
Thanks in Advance!
The program complains that 1 is not a valid rank for MPI_Send(): it means that your program is running on a single process.
Are you running it with python main.py? Try mpirun -np 2 python main.py, where 2 is the number of processes. The latter is the usual way to run MPI programs.
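As a small optional addition (not part of the original answer), the script can guard against being launched with a single process and fail with a clearer message:
# Optional guard: report a clear error when the script is started without
# mpirun/mpiexec, i.e. with only one process in the communicator.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if size < 2:
    raise SystemExit("Run with at least 2 processes, e.g.: mpirun -np 2 python main.py")

if rank == 0:
    comm.send({'a': 7, 'b': 3.14}, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print(data)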
