I'm currently using ABCpdf to transform HTML into PDF. We're behind a proxy, and since ABCpdf internally calls MSIE under the app pool's user account (NetworkService in my case), the proxy has not been set for that account.
I have tried using Microsoft's BITSAdmin tool to set the proxy for NetworkService (bitsadmin /util /setieproxy NetworkService MANUAL_PROXY PROXY_NAME null), but it still times out.
Anyone have any idea on how to get around this?
With ABCpdf 8+ it's much easier to just change the render engine to Gecko; it then doesn't need MSIE, and the output seems much more accurate.
var theDoc = new Doc();
theDoc.HtmlOptions.Engine = EngineType.Gecko;
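For context, here is a minimal end-to-end sketch of that approach (a sketch only, assuming the ABCpdf 8 WebSupergoo.ABCpdf8 namespace; the URL and output path are placeholders):
using WebSupergoo.ABCpdf8;

var theDoc = new Doc();
theDoc.HtmlOptions.Engine = EngineType.Gecko;  // render with Gecko instead of MSHTML/MSIE
theDoc.AddImageUrl("http://example.com/invoice?id=123");  // placeholder URL to convert
theDoc.Save(@"C:\temp\invoice.pdf");  // placeholder output path
theDoc.Clear();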
I have a simple Intranet Website that is just a few HTML pages with a little JavaScript and CSS.
If Allow Anonymous is ON, everyone can see it. It works.
In IIS, I turn on Basic Authentication and it only partially works as expected.
The company only allows IE and Edge installed on Windows 10 PCs for now.
Specific users have been added to that server running IIS.
In IE when users go to the website now, they are prompted for their username and password. Then the website loads.
However, in Edge, the users are never prompted for their username and password; a 401 error loads instead.
I have already tried putting the username and password in the URL like so: https://username:password@URL but that did not work.
I want the same (or similar) behavior in Edge that I get in IE.
I assume you're using the Chromium-based Edge browser; correct me if I'm wrong. The issue might be related to this policy: AuthSchemes.
You can visit edge://policy in Edge and check whether an AuthSchemes policy is set. That policy can be used to disable Basic Authentication, so if your browser has it set, you need to include the 'basic' value in it.
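If the policy is deployed through the registry rather than Group Policy, it normally lives under HKLM\SOFTWARE\Policies\Microsoft\Edge. A sketch of an entry that keeps Basic enabled might look like this (the exact list of schemes is only an example; adjust it to your environment):
Windows Registry Editor Version 5.00

; Hypothetical example: allow Basic alongside the usual schemes
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge]
"AuthSchemes"="basic,digest,ntlm,negotiate"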
I don't have this policy set, and when I visit the test page https://jigsaw.w3.org/HTTP/Basic/, Basic Authentication works well in Edge.
You can also refer to this thread and this thread which have similar issues.
We're looking to do some scraping on a specific URL that uses Cloudflare. Has anyone experienced issues using Zombie.js/user-agents while trying to crawl Cloudflare-hosted sites?
Would love some help!
I was trying to interface with an API on a client's site and was indeed getting a 403 error. The request didn't even reach my server.
Turning security to "essentially off" did not help. The final solution was to whitelist the developer machine's IP.
The error was triggered on a single URL (a JSON-serving API) by a Java client using standards-compliant libraries.
Solution:
1. Try to set a rule to allow direct access for that URL.
2. Try setting security to weaker and weaker levels ("essentially off").
3. If both fail, try whitelisting.
4. Or set up an alternate, non-Cloudflare URL (direct.domain.com).
These will of course only work if you can negotiate with the site owners.
Backup solution: use an embedded browser that you can "frame" and "remote control", or a testing framework that does the same through a plugin, and extract the content from there (if you can).
Hope this helps.
You're probably triggering one of our security features by trying to scrape a site on us. The only option, really, would be to ask the site owner to whitelist your IP(s) to override the behavior.
I have a simple XPage and I access it through a reverse proxy.
My problem is now to get the correct URL on server side.
context.getUrl().toString()
and
XSPContext xspContext = new ServletXSPContextFactory().getXSPContext(FacesContext.getCurrentInstance());
XSPUrl xspUrl = xspContext.getUrl();
return xspUrl.toString();
do not work correctly.
For example:
The URL in the browser is https://myip/db.nsf
But the SSJS call as well as the Java code returns just http://myip/db.nsf
When I try this without the reverse proxy, everything works fine.
Is there a way to get location.href on the server side?
Unless you want to send out links to other places, you don't need the protocol part: within the same browser, //someserver/somepage will link to a different server using the currently used protocol. Other than that, the proxy probably sets a header you can read.
You can use the following code to create the URL manually:
var req = facesContext.getExternalContext().getRequest();
// assumes the reverse proxy preserves (or rewrites) the Host header
var url = "https://" + req.getHeader("Host") + req.getContextPath();
This will return the full URL to your NSF file with the https prefix.
Hmm... this may be an administrative setting: with an Internet Site document you can additionally create a web site rule (type = substitution) to automatically compute the whole URL from the incoming pattern. Have a look at the IBM Domino administration help on how to set up an Internet Site document as well as a web site rule.
The goal is to get both URLs to have the same scheme so that XSP computation will result in correct values dynamically.
I believe what you want is to set the $WSIS header from the reverse proxy to Domino to True. Much like the other WebSphere connector headers, this should cause Domino to think that the incoming protocol is HTTPS in all situations. Note that this also has the unfortunate side effect of causing Domino to revert to its behavior of only using one Site document per IP; if you've been taking advantage of the reverse proxy to avoid this bug, you will have to find another route, such as looking for an X-SSL header from the proxy.
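If you want to confirm what actually reaches Domino before changing the proxy, a small SSJS sketch like the following (using the standard servlet request API that XPages exposes) will dump the incoming headers to the server console, so you can look for $WSIS, X-SSL, or X-Forwarded-Proto:
var req = facesContext.getExternalContext().getRequest();
var names = req.getHeaderNames();
while (names.hasMoreElements()) {
    var name = names.nextElement();
    print(name + ": " + req.getHeader(name));  // written to the Domino server console
}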
We have been using ABCpdf for years now. In fact we are on version 6.1 still. It has just always worked. But we have recently upgraded to Windows 2008 x64 / IIS 7.5.
Our code that converts HTML pages (invoices) to PDF now does not work. The basics are that there is a QueryString-based URL that renders the invoice in HTML, which allows us to "preview" it; then, to send it to the client, we use ASP.NET to execute the ABCpdf code (calling that same URL from the server to the server). This time the output is PDF, and that's what is attached to an email and sent off to the client.
Pretty simple and straight forward stuff right?
This is what we noticed about ABCpdf:
1) PdfObj.AddImageUrl("http://localhost/..."); // Localhost does not work.
2) PdfObj.AddImageUrl("http://127.0.0.1/..."); // Local IP does not work.
3) PdfObj.AddImageUrl("http://41.XX.XX.XX/..."); // Live IP does not work.
Now this:
4) PdfObj.AddImageUrl("http://www.google.com/"); // Works perfectly!
So we know the code and everything about it technically can and does work.
But it seems that any time the AddImageUrl() function calls a location that points to itself, the page does not render and we get "Unable to render HTML. Page load timed out. Unable to load page."
I know it's not to do with the timeout because if I use Fiddler (on the server) to execute the exact same code, it works perfectly.
I suspect this is to do with permissions... but what permissions? I read this: "... this is because ABCpdf uses the Microsoft MSHTML component", but how do I set the permissions on this component? I have already turned off "IE ESC".
What am I missing?
So it turned out, after fiddling with just about every setting, that it came down to the fact that IIS did not allow URL calls from w3wp.exe back to the same "site" within the same IIS instance.
There is more on that here: http://support.microsoft.com/kb/316451
It wasn't the "MSXML2.ServerXMLHTTP.3.0" requests; those seemed to work, which is why it was so confusing. But ABCpdf obviously does something similar, and so IIS was blocking it; in fact the entire "site" locked up while it was failing.
In the end all it took was making a clone of the main site ("site2") and changing the URL that was passed to ABCpdf to use the clone site.
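In code terms the only change was the host in the URL handed to ABCpdf; roughly like this (the clone-site host name and paths are placeholders):
var theDoc = new Doc();
// Point AddImageUrl at the cloned site ("site2") rather than the site the code runs under,
// so IIS no longer blocks the loopback request.
theDoc.AddImageUrl("http://invoices-clone.example.com/Invoice.aspx?id=12345");
theDoc.Save(Server.MapPath("~/App_Data/Invoice-12345.pdf"));
theDoc.Clear();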
How is it possible to get the browser's external IP with Watir? I'm using a proxy and I want to verify that it's working correctly.
Perhaps there is some other way to check whether the proxy is working?
Here's my current code:
profile = Selenium::WebDriver::Firefox::Profile.new
profile.proxy = Selenium::WebDriver::Proxy.new :http => 'my.proxy.com', :ssl => 'my.proxy.com'
browser = Watir::Browser.new :firefox, :profile => profile
browser.goto 'http://someurl.com'
The browser will open the URL even though the proxy is not working.
Thanks for any help.
This is really not a pretty way of getting around this, but you could use the following to get the IP.
browser.goto("http://www.whatsmyip.org/")
ip = browser.span(:id, "ip").text
As I said, it is really not an ideal way, but I am not sure whether Watir has access to the IP you are on.
Note that if you use the site above, please respect the author's wishes and do not generate a high volume of requests against the site. If you look at the source there, you will find this comment:
Please DO NOT program a bot to use this site to grab your IPs. It
kills my server and thats not nice. Just get some cheap or free web
hosting and make your own IP-only page to power your bot. Then you
won't even have to parse any html, just load the IP directly - better
for everyone!!
As good citizens of the net we need to respect that. I doubt he would be upset by a few hits a day, but if your scripts are doing this a lot, make your own reflector page to report your IP back to you.
You don't need Watir to go through a proxy to get the IP. You can use net/http, which has less overhead and is easier. BTW, I used whatismyip.com here, but I don't believe it's all that reliable. There are others, including http://whatismyipaddress.com, http://show-ip.net, http://ipchicken.com, http://www.ipaddresslocation.org, http://www.myipaddress.com/show-my-ip-address/, http://www.lawrencegoetz.com/programs/ipinfo/, and http://www.find-ip-address.org.
require 'net/http'

# proxyhost, proxyport, proxyuser and proxypassword are placeholders for your proxy settings
uri = URI("http://automation.whatismyip.com/n09230945.asp")
Net::HTTP::Proxy(proxyhost, proxyport, proxyuser, proxypassword).start(uri.host) do |http|
  req = Net::HTTP::Get.new(uri.path == '' ? '/' : uri.path)
  ip = http.request(req).body.scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/)
  puts ip  # prints the external IP as seen by the remote site
end