I'm currently trying to pass a proxy to a Splash instance running on Docker Desktop, launched from WSL.
I start Tor using sudo service tor start.
To make sure my WSL Tor service is communicating with Windows, I passed it as a proxy to Firefox with the following parameters:
IP: 127.0.0.1
Port: 9050
Proxy Type: SOCKS5
Then I go to https://check.torproject.org/ and tadaa it works.
I run my container using the following command:
sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-browser-caches
The easiest way I found to test it was to go to localhost:8050 and type in the following script:
function main(splash, args)
  splash:on_request(function(request)
    request:set_proxy{
      host = "127.0.0.1",
      port = 9050,
      username = "",
      password = "",
      type = "SOCKS5"
    }
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
I query https://check.torproject.org/, and I get error 99.
Am I missing something important here?
Have you looked at proxy profiles? They look to be the preferred way to attach proxies to a Dockerized Splash container.
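For reference, a proxy profile is just an INI file mounted into the container. A minimal sketch for this Tor setup might look like the following (the [proxy] keys are from the Splash docs; the host.docker.internal value is an assumption that applies to Docker Desktop):

```ini
[proxy]
; required
host=host.docker.internal
port=9050

; optional, default is HTTP
type=SOCKS5
```

You would mount the directory containing this file with -v, start Splash with --proxy-profiles-path, and select the profile by name via the proxy argument of a request.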
Related
I have a Linux (Ubuntu) VPS with a specific IP, and I would like to run my Shiny app on my own domain, http://my_domain.com. Therefore, I have built a Docker container and run my app from the Ubuntu terminal:
sudo docker run --rm -p 4096:4096 my_app
It works well on localhost. Then, I modified the following lines in my Dockerfile:
EXPOSE 80
CMD ["R", "-e", "shiny::runApp('/srv/shiny-server/my_app/app', host = 'server_ip', port = 80)"]
I built and ran my Docker container again and got the following error:
sudo docker run --rm -p 80:80 my_app
Listening on http://server_ip:80
createTcpServer: address not available
Error in initialize(...) : Failed to create server
Maybe I need to configure nginx. I would be grateful if someone could tell me what would be the best solution to run my app on my own domain and how to do it.
The Docker container does not have access to server_ip; try this:
EXPOSE 80
CMD ["R", "-e", "shiny::runApp('/srv/shiny-server/my_app/app', host = '0.0.0.0', port = 80)"]
I can't figure out how to use Puppeteer with browserless and a proxy. I keep getting proxy connection errors.
I run browserless in docker like so:
docker run -p 3000:3000 -e "MAX_CONCURRENT_SESSIONS=5" -e "MAX_QUEUE_LENGTH=0" -e "PREBOOT_CHROME=true" -e "CONNECTION_TIMEOUT=300000" --restart always browserless/chrome
Puppeteer options in config I tried to connect with:
const args = [
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-infobars',
  '--window-position=0,0',
  '--ignore-certificate-errors',
  '--window-size=1400,900',
  '--ignore-certificate-errors-spki-list',
];
const options = {
  args,
  headless: true,
  ignoreHTTPSErrors: true,
  defaultViewport: null,
  browserWSEndpoint: `ws://localhost:3000?--proxy-server=socks5://127.0.0.1:9055`,
};
How I connect:
const browser = await puppeteer.connect(config.options);
const page = await browser.newPage();
await page.goto('http://example.com', { waitUntil: 'networkidle0' });
Error I get:
Error: net::ERR_PROXY_CONNECTION_FAILED at http://example.com
at navigate (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:115:23)
at processTicksAndRejections (internal/process/task_queues.js:94:5)
at async FrameManager.navigateFrame (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
at async Frame.goto (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:417:16)
at async Page.goto (C:\...\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:825:16)
The proxy I'm using in the example above is the Tor browser, which runs in the background. I can connect through it when I'm not using browserless and use the puppeteer.launch() function instead. I put this proxy in args and everything works fine; the requests go through the Tor proxy. I can't figure out why it doesn't work with browserless and WebSockets, though.
Of course I tried different proxies. I created a local proxy in Node, similar to the one in "How to create a simple http proxy in node.js?" (the proxy-server option is then --proxy-server=http://127.0.0.1:3001), but the error is the same, and I can't even see incoming requests in the server's terminal; it looks like they don't even reach the proxy.
I tried public proxies addresses, same error.
Changing the website I'm trying to connect to in the page.goto() function doesn't change anything; I still get the same error.
I'm a beginner at web scraping and have run out of options here. Any idea would be helpful.
In order to fix the issue specifically with Tor, you need to make sure that the torrc file has SocksPort set to 0.0.0.0:9050, so that it listens on every network interface; otherwise it will only work from localhost. Once you set that, you can pass socks5://172.17.0.1:9050 to your browserless Docker container, and it can access the Tor proxy from the host system. Keep in mind that the docker0 IP may be different; run ip addr show docker0 to find the host's address and use the right one when passing it as the proxy.
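The torrc change being described might look like the following sketch (the SocksPolicy lines are an optional extra of my own suggestion, to avoid exposing the open SOCKS port to every network):

```text
SocksPort 0.0.0.0:9050
# Optionally restrict who may use the proxy:
SocksPolicy accept 127.0.0.1/32
SocksPolicy accept 172.17.0.0/16
SocksPolicy reject *
```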
OK, it looks like a Docker issue. Apparently, there are problems when trying to connect from browserless inside the container to Tor, which is on the host. I used host.docker.internal instead of localhost in the connection string, and it worked.
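To illustrate that fix, here is a small sketch that builds the browserless endpoint so the proxy flag points at the host rather than at the container's own loopback (buildBrowserlessEndpoint is a helper name of my own; only the ?--proxy-server= query parameter comes from the question):

```javascript
// Build a browserless WebSocket endpoint whose proxy flag is reachable
// from inside the container. From the container's point of view,
// "localhost" is the container itself, so the proxy host must be
// host.docker.internal (Docker Desktop) or the docker0 bridge IP (Linux).
function buildBrowserlessEndpoint(browserlessHost, proxyUrl) {
  return `ws://${browserlessHost}:3000?--proxy-server=${proxyUrl}`;
}

const browserWSEndpoint = buildBrowserlessEndpoint(
  'localhost',                            // browserless, port-mapped to the host
  'socks5://host.docker.internal:9055'    // Tor as seen from inside the container
);
console.log(browserWSEndpoint);
// Then connect as before:
// const browser = await puppeteer.connect({ browserWSEndpoint });
```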
The Splash browser does not send anything through the HTTP proxy. The pages are fetched even when the proxy is not running.
I am using Scrapy with Splash in Python 3 to fetch pages after authentication for an Angular.js website. The script is able to fetch pages, authenticate, and fetch pages after authentication. However, it does not use the proxy set up at localhost:8090, and Wireshark confirms that traffic coming from port 8050 goes to some port in the 50k range.
The setup is
- splash running locally on a docker image (latest) on port 8050
- python 3 running locally on a mac
- Zap proxy running locally on a mac at port 8090
- Web page accessed through VPN
I have tried to specify the proxy host:port through the server using Chrome with a Lua script. The page is fetched without the proxy.
I have tried to specify the proxy in the Python script, both with Lua and with the API (args={'proxy': 'host:port'}), and the page is fetched without using the proxy.
I have tried using a proxy-profiles file, and I get status 502.
Proxy set through Lua on Chrome (no error, not proxied):
function main(splash, args)
  splash:on_request(function(request)
    request:set_proxy{
      host = "127.0.0.1",
      port = 8090,
      username = "",
      password = "",
      type = "HTTP"
    }
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
req = SplashRequest("http://mysite/home", self.log_in,
endpoint='execute', args={'lua_source': script})
Proxy set through api (status 502):
req = SplashRequest("http://mysite/home",
self.log_in, args={'proxy': 'http://127.0.0.1:8090'})
Proxy set through Lua in Python (no error, not proxied):
def start_requests(self):
    script = """
    function main(splash, args)
      assert(splash:go(args.url))
      assert(splash:wait(0.5))
      splash:on_request(function(request)
        request:set_proxy{
          host = "127.0.0.1",
          port = 8090,
          username = "",
          password = "",
          type = "HTTP"
        }
      end)
      return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
      }
    end
    """
    req = SplashRequest("http://mysite/home", self.log_in,
                        endpoint='execute', args={'lua_source': script})
    # req.meta['proxy'] = 'http://127.0.0.1:8090'
    yield req
Proxy set through proxy file in docker image (status 502):
proxy file:
[proxy]
; required
host=127.0.0.1
port=8090
Shell command:
docker run -it -p 8050:8050 -v ~/Documents/proxy-profile:/etc/splash/proxy-profiles scrapinghub/splash --proxy-profiles-path=/etc/splash/proxy-profiles
All of the above should make the page show up in the ZAP proxy at port 8090.
Some of the above seem to set the proxy, but Splash can't reach the proxy at localhost:8090 (status 502). Some don't work at all (no error, not proxied). I think this may be related to the fact that a Docker image is being used.
I am not looking to use Selenium, because that is what this is replacing.
All methods returning status 502 were actually working correctly. The reason for this issue is that a Docker container cannot access localhost on the host. To resolve it, use http://docker.for.mac.localhost:8090 as the proxy host:port on a Mac host, and use docker run -it --network host scrapinghub/splash on Linux with localhost:port. On Linux, -p becomes unnecessary, since with host networking all services on the container are on localhost.
Method 2 is best for a single proxy without rules. Method 4 is best for multiple proxies with rules.
I did not try other methods to see what they would return with these changes and why.
Alright, I have been struggling with the same problem for a while now, but I found the solution for your first method on GitHub, and it is based on what the Docker docs state:
The host has a changing IP address (or none if you have no network access). From 18.03 onwards our recommendation is to connect to the special DNS name host.docker.internal, which resolves to the internal IP address used by the host.
The gateway is also reachable as gateway.docker.internal.
Meaning that you should/could use "host.docker.internal" as the host for your proxy instead, e.g.:
splash:on_request(function(request)
  request:set_proxy{
    host = "host.docker.internal",
    port = 8090
  }
end)
Here is the link to the explanation: https://github.com/scrapy-plugins/scrapy-splash/issues/99#issuecomment-386158523
My local environment for the API:
node -v: v8.9.4
npm -v: 5.6.0
Package
memcached.js: "memcached": "^2.2.2"
We have a Node API in which we are using the memcached.js package to connect to a Memcached server with the configuration below.
MEMCACHED_CONFIG:
{
MAX_VALUE: 1024,
SERVER: "X.X.X.X",
PORT: 11211,
COMPLETE_PATH: "X.X.X.X:11211",
CACHE_TIMEOUT: 3600,
POOL_SIZE: 50,
maxKeySize: 1024,
timeout: 5000
}
So X.X.X.X is the remote server IP where our Memcached server is running,
and I am able to connect to this X.X.X.X server from my system using a telnet command like c:/> telnet X.X.X.X 11211, and it works.
cacheUtility.js
var MEMCACHED_CONFIG= require('./MEMCACHED_CONFIG');
var Memcached = require('memcached');
Memcached.config.maxValue = MEMCACHED_CONFIG.MAX_VALUE;
Memcached.config.poolSize = MEMCACHED_CONFIG.POOL_SIZE;
Memcached.config.maxKeySize= MEMCACHED_CONFIG.maxKeySize;
Memcached.config.timeout= MEMCACHED_CONFIG.timeout;
var memcached = new Memcached();
memcached.connect(MEMCACHED_CONFIG.COMPLETE_PATH, function(err, conn) {
  if (err) {
    CONFIG.CONSOLE_MESSAGE("Cache Connect Error " + conn.server);
  }
});
We are using the above code to connect to the Memcached server, and as you can see, the remote server IP comes from MEMCACHED_CONFIG.
My issue is that it always tries to connect to 127.0.0.1 instead of the remote Memcached server. So in order to run it, I have to make changes in the memcached.js file of the core package.
C:\BitBucketProjects\Licensor Server\node_modules\memcached\lib\memcached.js
function Client (args, options) {
var servers = []
, weights = {}
, regular = 'localhost:11211'
//, regular = 'X.X.X.X:11211'
, key;
I don't want to make any change in core package.
Why is it not connecting to the given server?
When you have the Memcached server set up on a different machine than the server using it, always mention the server IP and options; otherwise it defaults to localhost. You can see that if you view the "server" property of the client (using Node.js memcached client version 2.2.2):
var Memcached = require('memcached');
var memcached = new Memcached();
console.log(memcached.server);
There seems to be some issue with the "memcached.connect" method, as it does not override the localhost server. To make it work, you have to mention the IP of the Memcached server in the constructor, as mentioned in the documentation:
var Memcached = require('memcached');
var memcached = new Memcached('192.168.10.10:11211');
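For completeness, here is a sketch of how the MEMCACHED_CONFIG shape from the question could be mapped onto the constructor's two arguments (the mapping helper is my own; only new Memcached(location, options) is the package's documented API):

```javascript
// Map the app-level config object onto the (location, options) pair the
// memcached constructor expects, instead of mutating Memcached.config.
function toMemcachedArgs(cfg) {
  return {
    location: cfg.COMPLETE_PATH, // e.g. "192.168.10.10:11211"
    options: {
      maxValue: cfg.MAX_VALUE,
      poolSize: cfg.POOL_SIZE,
      maxKeySize: cfg.maxKeySize,
      timeout: cfg.timeout,
    },
  };
}

const { location, options } = toMemcachedArgs({
  MAX_VALUE: 1024,
  COMPLETE_PATH: '192.168.10.10:11211',
  POOL_SIZE: 50,
  maxKeySize: 1024,
  timeout: 5000,
});
console.log(location, options.poolSize);
// and then: const memcached = new Memcached(location, options);
```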
Now you should be able to connect to the server without an issue, provided port 11211 is open on the host. If it is not allowed, you can execute the following command on the Memcached host to open the port:
$ sudo ufw allow 11211
To ensure you are able to connect to the Memcached server, use the following command:
telnet 192.168.10.10 11211
If even that does not work, your server might have stopped working so you need to start it either as a service or as a process:
Start as a process:
$ memcached -u memcached -d -m 30 -l 192.168.10.10 -p 11211
Start as a service:
$ sudo systemctl start memcached
OR
$ sudo service memcached start
Just for reference, for those who might not know: to expose the Memcached server on the network, you can either specify the IP and port like in the command above or in the Memcached configuration file. To change the default configuration, look for "-l 127.0.0.1" in the following file and replace the loopback address with your host server's network IP:
$ sudo nano /etc/memcached.conf
Of course, the above commands will work only if you have Memcached installed on the server; if it is not installed, run the following to install it first:
$ sudo apt-get install memcached
I hope it helps.
I have found that Google provides some guidelines on how to run Node.js in a custom runtime environment. Everything seems fine, and I am managing to start my Node.js app on the local machine by running gcloud preview app run .
As I can see, it probably creates a Docker container and runs the Node.js program in there. I am saying "probably" because it is my first experience with Docker; however, I am a Node.js developer with 2+ years of experience.
So my question is: how do I debug (with breakpoint stops) my Node.js program when it is running inside a Docker container? Can I use the Chrome Developer Tools, or how can I set up a WebStorm debug configuration to make it stop on breakpoints? Is it possible to configure how Docker starts node, or even to start Docker via gcloud inside WebStorm, to ensure debugging works? Any help or clarifications are appreciated.
Please don't provide answers on how to debug a Node.js app outside of a Docker container; I know how to do that very well.
I'm sorry, but I only know a solution with node-inspector; I hope it can help you:
Install the node-inspector package inside your container: https://github.com/node-inspector/node-inspector
Map port 8080 of your container to your host (run your container with the parameter -p 8080:8080)
Run this inside your container (with docker exec, or docker-enter):
node-debug --web-host 0.0.0.0 yourScript.js
Go to http://localhost:8080/debug?port=5858
There is an easier way, at least since Docker 0.11 or so.
Run Docker with --net="host", on your development machine only. This makes Docker bind to localhost directly instead of creating a bridging network adapter, so the containerized process runs like any other process on your machine and opens the ports it needs on the local interface.
This way, you can connect to your debug port as if Node were not running inside Docker.
More documentation here : https://docs.docker.com/reference/run/
Before Docker 0.11 you had two other ways of debugging, apart from using node-inspector:
Run sshd inside your Docker container and set up an SSH tunnel, as if you were debugging on a remote machine.
"Mess" with iptables to "revert" the Docker mapping of local ports. There is something about it here: Exposing a port on a live Docker container.
By default, the node debugger will listen only for connections for the same host (127.0.0.1). But in Docker, you need to accept connections from any host (0.0.0.0):
# inside Docker
node --inspect=0.0.0.0:9229 myapp.js
Also you have to expose the debug port (9229). Then the application should be automatically detected and listed as a Remote Target in chrome://inspect/#devices in Chrome (tested in Chrome 67).
Example
Here is a minimal example. It runs a simple JavaScript application in Docker and shows how to attach the Chrome debugger to it:
$ cat example.js
setInterval(() => console.log('Hallo world'), 1000);
$ cat Dockerfile
FROM node
COPY example.js /
CMD node --inspect=0.0.0.0:9229 /example.js
Run with:
$ docker build . -t debug-docker && docker run -p 9229:9229 --rm -it debug-docker
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM node
---> aa3e171e4e95
Step 2/3 : COPY example.js /
---> Using cache
---> 3ef6c0311da2
Step 3/3 : CMD node --inspect=0.0.0.0:9229 /example.js
---> Using cache
---> e760739c2802
Successfully built e760739c2802
Successfully tagged debug-docker:latest
Debugger listening on ws://0.0.0.0:9229/4177f6cc-85e4-44c6-9ba3-5d8e28e1b124
For help see https://nodejs.org/en/docs/inspector
Hallo world
Hallo world
Hallo world
...
Open Chrome and go to chrome://inspect/#devices. Soon after the application starts, it should be detected and listed.
Troubleshooting
For debugging Docker network issues, docker inspect is useful:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae83d50e24c8 debug-docker "/bin/sh -c 'node --…" 2 minutes ago Up 2 minutes 0.0.0.0:9229->9229/tcp blissful_sammet
$ docker inspect ae83d50e24c8
...
"NetworkSettings": {
"Bridge": "",
"SandboxID": "682d3ac98b63d4077c5d66a516666b6615327cbea0de8b0a7a2d8caf5995b0ae",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"9229/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "9229"
}
]
},
...
If you want to see the requests sent between Docker and Chrome, ngrep can help:
$ sudo ngrep -d any port 9229
interface: any
filter: (ip or ip6) and ( port 9229 )
############################
T ::1:38366 -> ::1:9229 [AP]
GET /json/version HTTP/1.1..Host: [::1]:9229....
#####
T ::1:38368 -> ::1:9229 [AP]
GET /json HTTP/1.1..Host: [::1]:9229....
##############
T 172.17.0.1:56782 -> 172.17.0.2:9229 [AP]
GET /json HTTP/1.1..Host: [::1]:9229....
#
T 172.17.0.1:56782 -> 172.17.0.2:9229 [AP]
GET /json HTTP/1.1..Host: [::1]:9229....
###
T 172.17.0.1:56784 -> 172.17.0.2:9229 [AP]
GET /json/version HTTP/1.1..Host: [::1]:9229....
#
T 172.17.0.1:56784 -> 172.17.0.2:9229 [AP]
GET /json/version HTTP/1.1..Host: [::1]:9229....
###
T 172.17.0.2:9229 -> 172.17.0.1:56782 [AP]
HTTP/1.0 200 OK..Content-Type: application/json; charset=UTF-8..Cache-Contro
l: no-cache..Content-Length: 465....
#
T 172.17.0.2:9229 -> 172.17.0.1:56782 [AP]
HTTP/1.0 200 OK..Content-Type: application/json; charset=UTF-8..Cache-Contro
l: no-cache..Content-Length: 465....
###
T 172.17.0.2:9229 -> 172.17.0.1:56782 [AP]
[ {. "description": "node.js instance",. "devtoolsFrontendUrl": "chrome-de
vtools://devtools/bundled/inspector.html?experiments=true&v8only=true&ws=[::
1]:9229/f29686f9-e92d-45f4-b7a2-f198ebfc7a8e",. "faviconUrl": "https://node
js.org/static/favicon.ico",. "id": "f29686f9-e92d-45f4-b7a2-f198ebfc7a8e",.
"title": "/example.js",. "type": "node",. "url": "file:///example.js",.
"webSocketDebuggerUrl": "ws://[::1]:9229/f29686f9-e92d-45f4-b7a2-f198ebfc7a
8e".} ]..
#
As far as I can see, you need to provide the parameter --debug-brk to node upon startup (on current Node versions, --inspect-brk); this will enable debugging. After that, access the specified port on your Docker container. You probably have to expose it or tunnel to it (using ssh).
After that, point the WebStorm remote debugger at the specified port, and you should be set.
If you are using bridge networking for your containers, and you don't want to install node-inspector inside the same container as your node process, I've found this to be a convenient solution:
In the main node.js container, map port 5858 to the host
Run the main node process with debug enabled
Use a separate container for running node-inspector
Use host networking for the node-inspector container
This way, the node-inspector container will connect to localhost:5858, which will then be port-mapped through to the main node container.
If you're running this on a public VM, I'd then recommend:
Make sure the debug port (5858) is not exposed publicly (e.g. block it in the firewall)
Make sure the node-inspector port (e.g. 8080) is exposed publicly, so you can connect to it
I wrote a few more details about it here: https://keylocation.sg/our-tech/debugging-nodejs-in-docker-using-node-inspector