Server-side Node.js templating and merging for text files

I am trying to see if there is a library that will allow me to do merging of a JSON object with a template (txt file) on the server side. Ideally, I would like it to be able to handle some conditional statements (e.g. if, greater than, equals etc.) and looping (e.g. for).
I know there are binding libraries (e.g. angularjs), and one option might be to hack it to extract the code required to do this. Alternatively, I could create my own solution, but would rather not re-invent the wheel.
I am new to Node.js, so I'm thinking this is a problem that might have been solved already.
Any ideas?

All good.
Ended up going with doT.js. Great library for what I'm doing.
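For anyone curious what such merging involves under the hood: doT.js compiles a template string into a plain JavaScript function once, then calls it with the data object. The following is a hand-rolled miniature in the same spirit, not doT's actual API; the `{{= expr }}` / `{{ code }}` delimiters are invented for this sketch (doT's real syntax differs):

```javascript
// Minimal compile-to-function template merger, in the style doT.js uses.
// {{= expr }} interpolates an expression; {{ code }} embeds raw JavaScript,
// which is how conditionals and loops fall out for free. These delimiters
// are hypothetical, chosen for this sketch only.
function compile(template) {
  let body = "let out = '';\n";
  let last = 0;
  const re = /\{\{(=?)([\s\S]+?)\}\}/g;
  let m;
  while ((m = re.exec(template)) !== null) {
    // Emit the literal text before this tag, safely escaped.
    body += "out += " + JSON.stringify(template.slice(last, m.index)) + ";\n";
    // '=' tags append the expression's value; bare tags run as code.
    body += m[1] === "=" ? "out += (" + m[2] + ");\n" : m[2] + "\n";
    last = re.lastIndex;
  }
  body += "out += " + JSON.stringify(template.slice(last)) + ";\nreturn out;";
  return new Function("it", body); // 'it' is the merged JSON object
}

const tpl = "Hello {{=it.name}}!{{ if (it.items.length > 1) { }} You have {{=it.items.length}} items:{{ } }}{{ for (const i of it.items) { }} {{=i}}{{ } }}";
const render = compile(tpl);
console.log(render({ name: "Ann", items: ["a", "b"] }));
// → Hello Ann! You have 2 items: a b
```

doT.js does the same compile-once trick with more care (caching, encoding, configurable delimiters), which is why it is fast for rendering the same template against many JSON objects.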

Related

SuiteScript - 1 big script file, or multiple smaller files

From a performance/maintenance point of view, is it better to write my custom NetSuite modules as one big JS file, or as multiple segmented script files?
If you compare with a server-side JavaScript platform such as Node.js (the most popular), every module is written in a separate file.
I generally take an object-oriented JavaScript approach and put each class in a separate file, which helps organise the code.
One approach you can take is to keep separate files during development and merge them all with a JS minifier such as the Google Closure Compiler when you deploy to production. That gives you the best of both worlds, if you are really worried about every last fraction of a second of performance.
SuiteScript 2.0's architecture encourages a modular design, which is easier to manage since you load only the modules you need, and one file per module is easier to maintain for future enhancements, bug fixes and code reuse.
Performance can never be judged by the line count of a module. We generally split code into modules for readability and simplicity. It is good practice to put all generic functionality into a utility script and use it as a library across all the modules. Again, it depends on your code logic and programming style, so if you want to split your JS into multiple segments for readability, I don't think it's a bad idea.

Custom log processing/parsing

I have such log format:
[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
GET /foo
{"controller"=>"foo", "action"=>"index"}
[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]
200 OK
The entry is constructed using this pattern:
[request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
payload
During a request there are many points where I log something; the app has fairly complex behaviour, and this helps me a lot when debugging user behaviour.
I would like to parse it into a directory structure like this:
req_id
|
|----[time_from_request_started][process_id][timestamp][tagline]
|
etc
Basically, each directory will be named after a req_id, and the files in it will be named after the rest of the tag line. These files will contain the payload.
There will also be another directory tree, keyed by user id, containing symlinks to the requests made by each user.
First question: is this structure sensible? In my opinion it will make log access easy and fast. The reason I want to use directories and files is that I like the Unix approach and want to try it, to experience its drawbacks and advantages for myself.
Second question: I would have no problem building this in Ruby, but I would like to learn a tool better suited to the job. I am thinking about using plain Unix tools (pipes, awk, etc.), or writing the parser in Go, which I am learning right now (I even have time to implement a simple map-reduce). What tool is best suited for this?
I would not store logs in a directory to see how the users behave.
Depending on what behaviour you want to keep track of you could use different tools. One of these could be mixpanel or keen.io.
Instead of logging what the user did in a log file, you would send an event to either of those (they are pretty similar; pick the one you think has better docs/libraries), then graph those events to better understand the behaviour of your users. I've done this a lot recently; to display the data in a nice way I've used rickshaw.
The key point is that if you go the file route you will still have to find a way to understand your data, and graphs help a lot with that. Also, visualization is something keen.io does by default; you may still want to build your own graphs, but it's a good start.
Hope this helped.
Is this structure correct?
Only you can know that; it depends directly on how the data needs to be accessed and used.
What tool is best suited for this?
You could probably use Unix tools to achieve this, but it may also be a good exercise for your Go skills to write it yourself. It would also be more extensible.
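Whatever tool you settle on, the parsing step itself is small: split out the seven bracketed header fields, then accumulate payload lines until the next header. A sketch of that logic (shown here in JavaScript purely for illustration; it ports directly to Ruby, Go or awk) that builds the proposed req_id layout in memory, leaving the actual mkdir/symlink calls out:

```javascript
// Matches a header line: [req_id][user_id][elapsed][pid][app][timestamp][tagline]
const HEADER = /^\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]$/;

function parseLog(text) {
  const byRequest = {}; // req_id -> { "[elapsed][pid][timestamp][tagline]" -> payload }
  const byUser = {};    // user_id -> [req_id, ...] (would become symlinks on disk)
  let current = null;
  for (const line of text.split("\n")) {
    const m = HEADER.exec(line);
    if (m) {
      const [, reqId, userId, elapsed, pid, , ts, tagline] = m;
      const fileName = `[${elapsed}][${pid}][${ts}][${tagline}]`;
      byRequest[reqId] = byRequest[reqId] || {};
      byRequest[reqId][fileName] = "";
      if (!byUser[userId]) byUser[userId] = [];
      if (!byUser[userId].includes(reqId)) byUser[userId].push(reqId);
      current = { reqId, fileName };
    } else if (current && line.trim() !== "") {
      // Non-header lines are payload for the most recent header.
      byRequest[current.reqId][current.fileName] += line + "\n";
    }
  }
  return { byRequest, byUser };
}

const sample = [
  "[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]",
  "GET /foo",
  '{"controller"=>"foo", "action"=>"index"}',
  "[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]",
  "200 OK",
].join("\n");

console.log(parseLog(sample).byRequest["26830431.7966868"]);
```

Materializing the result is then just one directory per key of `byRequest`, one file per inner key, and one symlink per entry in `byUser`.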

Best method to screen-scrape data off of many different websites

I'm looking to scrape public data off of many different local government websites. This data is not provided in any standard format (XML, RSS, etc.) and must be scraped from the HTML. I need to scrape this data and store it in a database for future reference. Ideally the scraping routine would run on a recurring basis and only store the new records in the database. There should be a way for me to detect the new records from the old easily on each of these websites.
My big question is: What's the best method to accomplish this? I've heard some use YQL. I also know that some programming languages make parsing HTML data easier as well. I'm a developer with knowledge in a few different languages and want to make sure I choose the proper language and method to develop this so it's easy to maintain. As the websites change in the future the scraping routines/code/logic will need to be updated so it's important that this will be fairly easy.
Any suggestions?
I would use Perl with modules WWW::Mechanize (web automation) and HTML::TokeParser (HTML parsing).
Otherwise, I would use Python with the Mechanize module (web automation) and the BeautifulSoup module (HTML parsing).
I agree with David about Perl and Python. Ruby also has Mechanize and is excellent for scraping. The only one I would stay away from is PHP, due to its lack of scraping libraries and clumsy regex functions. As far as YQL goes, it's good for some things, but for scraping it really just adds an extra layer of things that can go wrong (in my opinion).
Well, I would use my own scraping library or the corresponding command line tool.
It can use templates which can scrape most web pages without any actual programming, normalize similar data from different sites to a canonical format and validate that none of the pages has changed its layout...
The command line tool doesn't support databases though; there you would need to program something...
(on the other hand Webharvest says it supports databases, but it has no templates)

Is there a typical config or property file format and library in Haskell?

I need a set of key-value pairs for configuration read in from a file. I tried using show on a Data.Map and it doesn't look at all like what I want. It seems this is something many others might have already done so I'm wondering if there is a standard way to do it and what library to use.
Go to hackage.
Click on "packages"
Search for "config".
Notice ConfigFile(TH), EEConfig, and tconfig.
Read the Haddock documentation
Select a couple and implement your task.
Blog about your findings so the rest of us can learn from your new found expertise (thanks!).
EDIT:
I've recently used configurator - which was easy enough. I suggest you try that one!
(Yes, yes. If I took my own advice I would have made a blog for you all)
The configuration category on Hackage should list all relevant libraries:
http://hackage.haskell.org/packages/#cat:Configuration
I have researched the topic myself now, and my conclusion is:
configurator is very good, but it's currently only for user-edited configurations. The application only reads the configuration and cannot modify it. So it's more for server-side applications.
tconfig has a simple API and looked like what I wanted, maybe a bit raw, until I realized it's unmaintained: some commits that are really important for using the library are applied on GitHub, but the Hackage package was not updated.
Other solutions didn't look like they'd work for me; I didn't like the APIs. But every application (and taste) is different.
I think using JSON, for instance, is not a good solution because, at least with Aeson, when you add new settings in a new release, old JSON files from the previous version, missing the new member, won't load. Also, I find that solution a bit verbose.
The conclusion of my research is that I wrote my own library, app-settings, which aims to be key-value and read-write, with as succinct and type-safe an API as possible. You'll find it too in the Hackage configuration category linked above.
So to summarize, I think configurator is the standard for read-only configurations (and it's very powerful too, you can split the configuration file with imports for instance). For read-write there are many small libraries, some unmaintained, and no real standard I think.
UPDATE 2018: be sure to look at dhall.
I'd also suggest just using Text.JSON or one of the YAML libraries available (I prefer JSON myself, but...).
The configfile package looks like what you want.

Are there any GTD apps that sync with any of the common bug tracking apps?

I'm trying to decide on a GTD app. Does anyone know of one that automatically syncs with Trac or, better yet, FogBugz?
My suspicion is that none does. Which leaves me with writing a script that does it for me.
Things stores its data in XML, but the contents of the tags are all binary, which makes writing a script nigh impossible.
OmniFocus stores its data in XML, and the contents are literal text. Plugin or script is possible.
The Hit List stores its data in a sqlite3 database. Possibly easier than XML, but I'm not sure yet. The downside is that THL doesn't support recurring tasks, which makes it less useful as a GTD app.
Has anyone tried this? Have I missed an obvious app?
ThinkingRock - Java application, XML data format with plain text, supports recurring tasks. No automatic integration built yet that I know of, but another possible option to script for.
Tomboy has some level of Bugzilla integration but nothing complex. Alternatively it would be fairly trivial to sync something plaintext based such as Vimoutliner (IMO: possibly the best GTD application ever) or Taskpaper.
Probably in terms of easiness it would go:
plaintext > XML > Database > Binary format X
You could just use wget and/or a simple Perl script to download the tasks, then run a few regular expressions to get it formatted correctly, e.g.
<li> ... </li> -> [ ] ...
or in code (non-greedy, so multiple items on one line are handled):
s!<li>(.*?)</li>![ ] $1!g
Why not use the task features of the bug tracking systems you're looking at as your GTD tool? Also, have you looked at [Task Coach](http://en.wikipedia.org/wiki/Task_Coach)? It stores all its info in XML.
