Importing XML from a server using cron in cPanel (PrestaShop)

I'm trying to set up a cron task that imports the feed at https://www.vapefully.com/pl/feed-b2b/ to my server.
I used this command in cPanel:
wget "https://www.vapefully.com/pl/feed-b2b/" --output-document=vapefully.xml
But as a result I got this almost empty file:
<?xml version="1.0" encoding="UTF-8"?>
<SHOP> </SHOP>
This is what I get from cron:
--2019-12-21 12:25:02-- https://www.vapefully.com/pl/feed-b2b/
Resolving www.vapefully.com (www.vapefully.com)... 35.242.195.100
Connecting to www.vapefully.com (www.vapefully.com)|35.242.195.100|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://vapefully.com/pl/feed-b2b/ [following]
--2019-12-21 12:25:02-- https://vapefully.com/pl/feed-b2b/
Resolving vapefully.com (vapefully.com)... 35.242.195.100
Connecting to vapefully.com (vapefully.com)|35.242.195.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘vapefully.xml’
0K 6.69M=0s
2019-12-21 12:25:02 (6.69 MB/s) - ‘vapefully.xml’ saved [65]
What am I doing wrong?

Are you sure the XML was populated at the moment your cron job ran?
Open the link in a browser, and if the XML contains the products, run the cron job again and compare the two results.
(I suspect that when the cron job ran, the feed did not contain any products, only "<SHOP> </SHOP>".)
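One way to make that comparison concrete is to count the child elements of the downloaded feed before keeping it. The sketch below is only an illustration in Python, not part of the original cron setup. The function names and the <product> element used in the usage example are assumptions; only the <SHOP> root comes from the output shown in the question.

```python
import os
import xml.etree.ElementTree as ET


def count_feed_items(xml_bytes):
    """Return the number of direct child elements under the feed's root."""
    root = ET.fromstring(xml_bytes)
    return len(root)


def save_if_not_empty(xml_bytes, path):
    """Write the feed to `path` only when it has at least one child element.

    Returns True when the file was written, False when the feed was empty.
    """
    if count_feed_items(xml_bytes) == 0:
        return False
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(xml_bytes)
    os.replace(tmp_path, path)  # swap the new file in as a single step
    return True


# Usage with the empty feed from the question and a made-up non-empty one.
empty_feed = b'<?xml version="1.0" encoding="UTF-8"?>\n<SHOP> </SHOP>'
sample_feed = b'<?xml version="1.0" encoding="UTF-8"?>\n<SHOP><product/></SHOP>'
print(count_feed_items(empty_feed), count_feed_items(sample_feed))
```

With a check like this, an empty response would leave the previously saved vapefully.xml in place instead of overwriting it.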

Related

Download multiple file using wget by looping through a text file of IDs

I am trying to download multiple files using wget. I have a text file containing the IDs of the files I want to download (manifest.tsv, one ID per line).
Currently, I am using the command below:
while read id; do wget https://target-data.nci.nih.gov/Public/AML/miRNA-seq/L3/expression/BCCA/TARGET-FHCRC/$id.txt; done < manifest.tsv
However, I got the following error:
--2022-08-12 23:43:28-- https://target-data.nci.nih.gov/Public/AML/miRNA-seq/L3/expression/BCCA/TARGET-FHCRC/TARGET-00-BM3897-14A-01R.isoform.quantification%0D.txt
Resolving target-data.nci.nih.gov... 129.43.254.217, 2607:f220:41d:21c1::812b:fed9
Connecting to target-data.nci.nih.gov|129.43.254.217|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-08-12 23:43:30 ERROR 404: Not Found.
This is probably because, when I loop through manifest.tsv, the line-ending character is read as part of the ID (note the %0D before .txt in the URL above), so the file ID is no longer correct.
Could someone help me? I would really appreciate it!
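The %0D in the failing URL is a percent-encoded carriage return, which usually means manifest.tsv has Windows-style (CRLF) line endings and every ID carries a trailing "\r". Below is a minimal sketch of stripping it before building each URL, written in Python rather than as the shell loop from the question. build_urls and download_all are hypothetical helper names, and download_all is defined but not called here.

```python
import os
import urllib.request

BASE_URL = (
    "https://target-data.nci.nih.gov/Public/AML/miRNA-seq/L3/"
    "expression/BCCA/TARGET-FHCRC/"
)


def build_urls(lines, base_url=BASE_URL):
    """Turn manifest lines into download URLs, dropping CR/LF and blank lines."""
    urls = []
    for line in lines:
        file_id = line.strip()  # removes "\r", "\n" and surrounding spaces
        if not file_id:
            continue
        urls.append(f"{base_url}{file_id}.txt")
    return urls


def download_all(manifest_path, dest_dir="."):
    """Download every file listed in the manifest into `dest_dir`."""
    with open(manifest_path, encoding="utf-8") as handle:
        urls = build_urls(handle)
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, target)


# The ID from the error message, with the stray carriage return attached.
print(build_urls(["TARGET-00-BM3897-14A-01R.isoform.quantification\r\n"]))
```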

Scrapy how to update image URL if current image url returns 404

I need to change the image link when the current image URL returns a 404.
I have implemented my own pipeline by extending FilesPipeline.
I assumed the media_failed method would be called when a 404 is returned, but it wasn't.
In the item_completed method, I see that the results for the failed URL contain the following:
<class 'tuple'>: (False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)
In this case I have to update the original image link and retry the download.
I see the following in the logs:
[scrapy.pipelines.files] WARNING: File (code: 404): Error downloading file from <GET https://any_dummy_link.jpg> referred in <None>
Request the image URL. If response.status is 404 you can handle it differently.
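As a rough illustration of that idea, the sketch below checks candidate URLs in order and returns the first one that answers with 200. It is deliberately library-independent and uses only Python's urllib, so it is not Scrapy's pipeline API. http_status and first_available are hypothetical names, and the example URLs are placeholders.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a missing image


def first_available(candidate_urls, get_status=http_status):
    """Return the first candidate URL that answers with 200, or None."""
    for url in candidate_urls:
        if get_status(url) == 200:
            return url
    return None


# Usage with a stubbed status lookup, so no network access is needed.
fake_statuses = {
    "https://example.com/original.jpg": 404,
    "https://example.com/fallback.jpg": 200,
}
print(first_available(list(fake_statuses), get_status=fake_statuses.get))
```

Passing the status lookup in as a parameter keeps the fallback logic testable without making real requests.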

LetsEncrypt-ACMESharp http-01 challenge on IIS invalid

On server A (non-IIS) I executed:
Import-Module ACMESharp
Initialize-ACMEVault
New-ACMERegistration -Contacts mailto:somebody@derryloran.com -AcceptTos
New-ACMEIdentifier -Dns www.derryloran.com -Alias dns1
Complete-ACMEChallenge dns1 -ChallengeType http-01 -Handler manual
The response asked for the following:
* Handle Time: [08/05/2017 22:46:27]
* Challenge Token: [BkqO-eYZ5sjgl9Uf3XpM5_s6e5OEgCj9FimuyPACOhI]
To complete this Challenge please create a new file
under the server that is responding to the hostname
and path given with the following characteristics:
* HTTP URL: [http://www.derryloran.com/.well-known/acme-challenge/BkqO-eYZ5sjgl9Uf3XpM5_s6e5OEgCj9FimuyPACOhI]
* File Path: [.well-known/acme-challenge/BkqO-eYZ5sjgl9Uf3XpM5_s6e5OEgCj9FimuyPACOhI]
* File Content: [BkqO-eYZ5sjgl9Uf3XpM5_s6e5OEgCj9FimuyPACOhI.X-01XUeWTE-LgpxWF4D-W_ZvEfu6ue2fAd7DJNhomQM]
* MIME Type: [text/plain]
Server B is serving www.derryloran.com, and I believe the page at http://www.derryloran.com/.well-known/acme-challenge/BkqO-eYZ5sjgl9Uf3XpM5_s6e5OEgCj9FimuyPACOhI is served correctly. But when I then go back to Server A and execute:
Submit-ACMEChallenge dns1 -ChallengeType http-01
(Update-ACMEIdentifier dns1 -ChallengeType http-01).Challenges | Where-Object {$_.Type -eq "http-01"}
...the status goes to invalid after a few seconds. FWIW, I've tried this several times, always with the same result. Why? What am I doing wrong?
I appreciate there's a lot more to do once I've got the certificate, but the site is being served from a Docker container, hence the Server A/B complexity.
Omg, how many times?!? The file had a BOM when created in VS. After recreating it in Notepad++ and saving it as UTF-8 (without BOM), I'm getting a valid response now.
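For anyone who wants to check for this programmatically, here is a small Python sketch that detects and strips a UTF-8 byte order mark and writes a challenge file without one. The helper names are hypothetical, and the file content in the usage example is a placeholder rather than a real token.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def write_challenge_file(path, content):
    """Write `content` as UTF-8 without a BOM.

    The "utf-8" codec never emits a BOM; "utf-8-sig" is the one that does.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)


# A file saved "with BOM" starts with the bytes EF BB BF.
with_bom = codecs.BOM_UTF8 + b"token.thumbprint"
print(has_utf8_bom(with_bom), strip_utf8_bom(with_bom))
```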

GitHub Repository Redirect and trying to find it through the GitHub Search API

I have a link for a GitHub repository and I'm using github3 with Python in order to try and search for it.
Take this link for example:
https://github.com/GabrielGrimberg/OOP-Assignment1-UI
If you go to it, you will see that it redirects to
https://github.com/GabrielGrimberg/RuneScape-UI
And thus, I can't figure out how to construct a search query that will find this specific repo.
I've tried:
GabrielGrimberg/OOP-Assignment1-UI in:url
GabrielGrimberg/OOP-Assignment1-UI
GabrielGrimberg/OOP-Assignment1-UI in:full_name
According to the GitHub blog, when a repo is renamed, the old address is redirected to the new one:
We're happy to announce that starting today, we'll automatically redirect all requests for previous repository locations to their new home in these circumstances. There's nothing special you have to do. Just rename away and we'll take care of the rest.
Moreover, you can check that Gabriel Grimberg does not have any repo named "OOP-Assignment1-UI".
Corrected answer:
We can first check the repo details to confirm whether it exists and, if not, where it has moved.
Check out the following query:
curl -i https://github.com/GabrielGrimberg/OOP-Assignment1-UI
You can get the URL it moved to from the Location header:
HTTP/1.1 301 Moved Permanently
Server: GitHub.com
Date: Sun, 12 Feb 2017 18:19:25 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Status: 301 Moved Permanently
Cache-Control: no-cache
Vary: X-PJAX
Location: https://github.com/GabrielGrimberg/RuneScape-UI
X-UA-Compatible: IE=Edge,chrome=1
If the repo still existed at that address, the response would have contained the page content instead of a redirect header.
For example, try this:
curl -i https://github.com/GabrielGrimberg/RuneScape-UI
Basically, you need to make the request yourself and check for a redirect if the first search returned no results.
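import json
import urllib.error
import urllib.request

def get_redirection(full_name):
    url = 'https://api.github.com/repos/{0}'.format(full_name)
    try:
        # urlopen follows HTTP redirects by default, so a renamed repo resolves to its new location
        json_object = json.loads(urllib.request.urlopen(url).read().decode('utf-8'))
    except urllib.error.HTTPError:
        return None  # e.g. 404 when no such repo exists
    return json_object["full_name"]  # the new full name of the project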

.css, .js and .png files not found when opening Tracking URI in YARN

I can view a list of running jobs on YARN at this URI:
https://server1.company.com:8443/gateway/yarnui/yarn/apps/RUNNING
Furthermore, I can access job-specific information by opening the tracking UI:
https://server1.company.com:8443/gateway/yarnui/yarn/proxy/application_1481927689976_0178
However, when I do this, I only get the HTML document and none of the required .js, .css and .png files:
GET https://server1.company.com:8443/gateway/yarnui/yarn/proxy/application_1481927689976_0178
200 OK (text/html)
GET https://server1.company.com:8443/proxy/application_1481927689976_0178/static/bootstrap.min.css
404 Not Found (text/html)
If I go directly to the server on which the job is running:
http://server2.company.com:8088/proxy/application_1481927689976_0178
everything works fine:
GET http://server2.company.com:8088/proxy/application_1481927689976_0178
200 OK (text/html)
GET http://server2.company.com:8088/proxy/application_1481927689976_0178/static/bootstrap.min.css
200 OK (text/css)
Sounds like a YARN config issue – but I’ve set the yarn.resourcemanager.webapp.address to the correct value:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>server2.company.com:8088</value>
</property>
Any ideas why I can’t access these files?
IBM's support fix addresses this exact issue:
http://www-01.ibm.com/support/docview.wss?uid=swg21980169
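The request log in the question is consistent with the tracking page referencing its static files through root-relative links, which a browser resolves against the host root and which therefore lose the /gateway/yarnui/yarn prefix. This is an inference from the URLs shown, not something the question confirms, and the exact link in the page is an assumption. A small Python illustration using the standard urllib.parse.urljoin:

```python
from urllib.parse import urljoin

# Page URL as served through the gateway (taken from the question).
page_url = (
    "https://server1.company.com:8443/gateway/yarnui/yarn/"
    "proxy/application_1481927689976_0178"
)

# Assumed root-relative link inside the returned HTML.
static_link = "/proxy/application_1481927689976_0178/static/bootstrap.min.css"


def resolve(base, link):
    """Resolve `link` against `base` the way a browser would."""
    return urljoin(base, link)


resolved = resolve(page_url, static_link)
print(resolved)
```

The resolved URL no longer contains the gateway path, which matches the 404 request in the question.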

Resources