Can anyone recommend a library that really supports RTL languages such as Hebrew or Arabic, and that also supports links, images, and thumbnails/in-place images, handles large spreadsheets, can set filters and page breaks, and has good performance? These are very important criteria.
I started to use PDFKit, which is really good, but its language support is awful and the data we work with cannot be manipulated; because of that, we need to change libraries. Price is not a problem. Someone recommended Docmosis; does anyone have experience with it?
Thank you very much
Docmosis has customers that have been generating RTL documents successfully for years. Docmosis supports the image handling requirements you mention and large data sets, but it has no direct spreadsheet functionality.
I suggest doing a free trial (please note I work for Docmosis) and asking their support team about specific points of interest. The cloud is the easiest way to test functionality (since you have no install or setup to do), even if you wouldn't choose the cloud for your actual deployment.
I am trying to create shipping labels for a lot of different customers by filling in forms on the UPS website. Is there a programmatic way of doing this?
It is different from the usual auto-filling of a web form, because the name, address, etc. fields aren't filled with "constants": 100 customers need 100 different forms.
Before I dig into python-mechanize or AutoIt's IE.au3, is there an easier way of doing this?
UPDATE 2019-09-09: Generally, I would no longer recommend FF.au3 unless you're very much into AutoIt or have tons of legacy code. I'd rather suggest using Selenium with a "proper" programming language (see the sketch after the setup steps below).
You could check out FF.au3 for AutoIt. Together with Firefox and MozRepl, it allows for web automation, including dynamic websites/forms.
The feature set should be sufficient for your task (e.g. XPath for content extraction and for filling out forms; just have a look at the link and you'll get an idea of what it can do). It's also fairly easy to use.
The downside is that it's not the most performant approach, and I've encountered a bug, but that doesn't say much. Overall it worked well for me for small and medium-ish projects.
Setup:
Install AutoIt: https://www.autoitscript.com/site/autoit-tools/
Get the FF.au3 lib: https://www.thorsten-willert.de/index.php/software/autoit/ff/ff-au3
Get an old Firefox version (<v57 or ESR; see the remarks on the FF.au3 page above)
Install MozRepl: http://legacycollector.org/firefox-addons/264678/index.html
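And here is a minimal sketch of the Selenium route from the update above, in Python. The URL and form-field names are hypothetical placeholders; inspect the real form to find the actual ones:

    # Minimal Selenium sketch (Python). URL and field names are placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    customers = [
        {"name": "Jane Doe", "address": "1 Example St"},
        {"name": "John Roe", "address": "2 Sample Ave"},
    ]

    driver = webdriver.Firefox()  # or webdriver.Chrome()
    for customer in customers:
        # Load the form fresh for each customer and fill in their data.
        driver.get("https://example.com/shipping-form")  # placeholder URL
        driver.find_element(By.NAME, "name").send_keys(customer["name"])
        driver.find_element(By.NAME, "address").send_keys(customer["address"])
        driver.find_element(By.NAME, "submit").click()
    driver.quit()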
I have an email account whose sole purpose is to store interesting and useful links to programming articles, code, and blog posts. It has become a little knowledge base of sorts. I can even do a search on it, which is pretty cool.
However, after using this account for a couple of years, I now have 775 links, and it has become this big blob of amorphous information, most of which I have never looked at again. I take comfort in the fact that, if I really needed to, I could find something in there again, if I even remember putting it in there in the first place. But it has developed a "smell," if you will.
How are you organizing your programming library of cool stuff? Do you have a system or tool, and is it better than the way I'm doing it?
I would use something that is made for storing bookmarks. I use delicious.com for all of my bookmarks. The tagging system works perfectly for technology sites because you can tag each page with a specific language or tech abbreviation. This coupled with the Delicious Bookmarks plugin will make it very easy to tag sites and get back to them.
Use one word or abbreviations for languages: java c# vb.net python
Use acronyms for technologies: wpf wcf
I used to use the standard bookmark system in the browser, but since I bounce around between various machines and browsers throughout the day, I started using bookmark synchronizers: both Foxmarks and the one that Google came out with. I wasn't completely satisfied with either. Plus, Delicious has a great web interface as well as a decent API to extend for your own purposes.
IMHO, using Evernote to store this information is great.
1) you can go back and search through it easily
2) organize by tags and "notebooks collections"
3) available on multiple platforms (even mobile)
4) available as browser plugins (for direct archiving in-browser)
The only drawback is that its copy-paste functionality is a little lacking (it sometimes doesn't import/display the CSS styles correctly).
Otherwise, it's a great alternative to store web "bookmarks" (and also archive the content at the same time).
Hi, I want to create a desktop app (C#, probably) that scrapes or manipulates a form on a third-party web page. Basically I enter my data in the form in the desktop app; it goes away to the third-party website and, using the script or whatever in the background, enters my data there (including my login) and clicks the submit button for me. I just want to avoid loading up the browser!
Not having done much (any!) work in this area, I was wondering whether a scripting language like Perl, Python, Ruby, etc. would allow me to do this. Or should I simply do all the scraping using C# and .NET? Which one is best, in your opinion?
I was thinking of a script because I may need to hook the same script into applications on different platforms (e.g. Symbian mobile, where I wouldn't be able to develop it in C# as I would the desktop version).
It's not a web app, otherwise I may as well use the original site. I realise it all sounds pointless, but the automation for this specific form would be a real time saver for me.
Do not forget to look at BeautifulSoup; it comes highly recommended.
See, for example, options-for-html-scraping.
If you need to select a programming language for this task, I'd say Python.
For a more direct solution to your question, see twill, a simple scripting language for web browsing.
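As a minimal sketch of the BeautifulSoup approach mentioned above (assuming the current bs4 package; the URL is a placeholder):

    # Minimal BeautifulSoup sketch: fetch a page and list its links.
    # Assumes the bs4 package (pip install beautifulsoup4).
    import urllib.request
    from bs4 import BeautifulSoup

    html = urllib.request.urlopen("http://example.com/").read()
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        print(a.get("href"), a.get_text(strip=True))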
I use C# for scraping. See the helpful HtmlAgilityPack package.
For parsing pages, I either use XPath or regular expressions. .NET can also easily handle cookies if you need that.
I've written a small class that wraps all the details of creating a WebRequest, sending it, waiting for a response, saving the cookies, handling network errors and retransmitting, etc. The end result is that for most situations I can just call GetRequest/PostRequest and get an HtmlDocument back.
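The same wrapper idea sketched in Python, for comparison (using the third-party requests library; the class and method names here are made up for illustration, this is not the author's C# code):

    # Rough Python analogue of the wrapper described above, not the
    # author's C# class. requests.Session persists cookies across calls,
    # and HTTPAdapter(max_retries=...) retries on network errors.
    import requests
    from requests.adapters import HTTPAdapter

    class ScrapingClient:
        def __init__(self, retries=3):
            self.session = requests.Session()
            adapter = HTTPAdapter(max_retries=retries)
            self.session.mount("http://", adapter)
            self.session.mount("https://", adapter)

        def get_request(self, url):
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            return response.text  # feed this to your HTML parser

        def post_request(self, url, data):
            response = self.session.post(url, data=data, timeout=30)
            response.raise_for_status()
            return response.text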
You could try using the .NET HTML Agility Pack:
http://www.codeplex.com/htmlagilitypack
"This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams)."
C# is more than suitable for your screen scraping needs. .NET's Regex functionality is really nice. However, with such a simple task, you'll be hard pressed to find a language that doesn't do what you want relatively easily. Considering you're already programming in C#, I'd say stick with that.
The built-in screen scraping functionality is also top notch.
We use Groovy with NekoHTML. (Also note that you can now run Groovy on Google App Engine.)
Here is some example runnable code on the Keplar blog:
Better competitive intelligence through scraping with Groovy
IMO, Perl's built-in regular expression functionality and its ability to manipulate text would make it a pretty good contender for screen scraping.
Ruby is pretty great!
Try its hpricot and mechanize libraries.
Groovy is very good.
Example:
http://froth-and-java.blogspot.com/2007/06/html-screen-scraping-with-groovy.html
Groovy and HtmlUnit is also a very good match:
http://groovy.codehaus.org/Testing+Web+Applications
HtmlUnit will simulate a full browser, with JavaScript support.
PHP is a good contender due to its good Perl-compatible regex support and its cURL library.
HTML Agility Pack (C#)
Pros: simple to use.
Cons: XPath is borked; the way the HTML is cleaned to make it XML-compliant drops tags, and you have to adjust the expression to get it to work.
Mozilla Parser (Java)
Pros: solid XPath support; the best solution for XPath.
Cons: you have to set environment variables before it will work, which is a pain; casting between org.dom4j.Node and org.w3c.dom.Node to get different properties is a real pain; it dies on non-standard HTML (0.3 fixes this); there are problems accessing data on Nodes in a NodeList (use a for (int i = 1; i <= list_size; i++) loop to get around that).
Beautiful Soup (Python)
I don't have much experience with it, but here's what I've found.
Pros: nice interface for navigating HTML.
Cons: no XPath support.
Overall, I prefer Mozilla HTML Parser.
Take a look at HP's Web Language (formerly WEBL).
http://en.wikipedia.org/wiki/Web_Language
Or stick with WebClient in C# and some string manipulation.
I second the recommendation for Python (or Beautiful Soup). I'm currently in the middle of a small screen-scraping project using Python, and Python 3's automatic handling of things like cookie authentication (through CookieJar and urllib) is greatly simplifying things. Python supports all of the more advanced features you might need (like regexes), while also having the benefit of letting you handle projects like this quickly (not too much overhead in dealing with low-level stuff). It's also relatively cross-platform.
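For example, the cookie handling mentioned above boils down to a few lines. A minimal Python 3 sketch (the URLs and form fields are placeholders):

    # Python 3 sketch: a cookie-aware opener via http.cookiejar and urllib.
    import urllib.parse
    import urllib.request
    from http.cookiejar import CookieJar

    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Log in once; the session cookie lands in the jar automatically.
    form = urllib.parse.urlencode({"user": "me", "pass": "secret"}).encode()
    opener.open("http://example.com/login", form)  # placeholder URL/fields

    # Later requests through the same opener reuse the stored cookie.
    page = opener.open("http://example.com/protected").read()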
What are the best open-source image gallery engines, both stand-alone and for existing frameworks such as WordPress or Drupal?
Hopefully we can build a good list here over time.
Gallery is the classic choice. It has skins, security layers, heaps of plugins, etc., but can easily be run with the default settings if you want to. I've used it for years.
Good question! Lots of people ask this in many web forums, so hopefully we will get some good responses and end up with a good list of solutions.
Personally I always used to suggest something like Gallery or some other open-source script, but recently I have found myself more and more using a simple PHP script which just spits out a list of images (maybe 7 per page) and relies on a JavaScript library such as MooTools or Ext to provide all the functionality, particularly for small or individual galleries. I'm particularly loving the noobslide MooTools class at the moment, which has some lovely gallery effects.
Noobslide
I suppose at the end of the day it's all down to what you need. There will be no one answer that fits everyone, but hopefully a number of different solutions that suit different people's needs will show up here.