Leverage Google's spidering infrastructure to build your own niche index? - search

Let's say I want to build a specialised catalog of information that organisations can provide about themselves. We agree a metadata standard, and they include this information on webites.
Is it possible to use Google's infrastructure somehow to solve the problem of discovering sites with that metadata, and regularly re-spidering to pick up any updates?
The way this kind of problem is often solved seems to involve "registering" the site with the central index, who then build infrastructure to regularly visit each registered site. But I wonder if it can be done smarter - and without the need to formally "register".
For example, presumably you could make part of the metadata standard a unique string, which you could then literally Google search for. Then you'd process the rest of the page. But is there a more streamlined, smarter, more formal way to do this?

Related

What vulnerabilities are there in a custom react-based e-commerce website and why should I just use Shopify?

I have been programming a small e-commerce platform to sell jewelry.
Initially I wanted to make it web3 compatible ( accept meta mask payments ) and given that I work as a dev I wanted to take the DIY approach as opposed to platforms like Shopify specifically.
Now that I’m getting closer to finishing the website , I contemplate to myself - should I just switch to using Shopify instead ? My contemplation stems from unknown vulnerabilities that I am anticipate ..
My site uses Stripesnd paypal for payments. I don’t save any other data besides order info and shipping address .
Is there any underlying vulnerabilities that Shopify takes care of that I’m not thinking of ?
It seems simple enough to take payments on a site but I have a feeling I am not thinking about some major implications of not using a platform like Shopify .
On one hand I’d really like to use my own website given all the time I’ve spent making it ( also like my front end design better than any template I’ve seen ) so this post is for people to give me their perspective on both pros and cons so I can decide whether I just neeed to dump my work and start over with Shopify or continue on the way forward with DIY coming out as hero ;)
Thanks In advance fam
It is perfectly possible to make your own website and make it secure enough, somebody made Shopify too after all. :) It is also easily possible to leave vulnerabilies in your code that then get exploited. The problem is that if you don't have a good grasp of what you should have even looked at, it will be quite challenging to actually get it right.
You should be aware of potential code level vulnerabilities, and use secure coding and architecture principles to structure and code your website. OWASP is a great resource that helps with learning about those. Higher level principles include things like least privilege, segregation of duties, defense in depth, minimizing attack surface, secure defaults, failing securely and so on. Actual code level vulnerabilities include things like SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), tampering with parameters, session management errors, authentication or authorization errors and so on, there is a lot of these. And your 3rd party libraries that you included can (and will) also have some of these, how will you discover that, and will you have the capacity to keep up with latest versions?
When hosting your own service (even in an IaaS cloud environment like AWS EC2), that brings its own challenges in terms of security too - you need to care about the ops side of security as well. Things like would you even notice if there's an attack? Would you know what to do if a customer called with their money spent on things they didn't buy? Would you have forensic evidence to prove if they are actually lying? :)
You can use tools to scan for some of these vulnerabilities, but that will never be comprehensive - actually, nothing will. Automated tools are very useful, but will miss a lot of things. You can also buy penetration testing services, some of those are really good (and some not), they will find vulnerabilities the same way attackers would - but those are quite expensive.
However, having said all this, the most important thing is to keep your defenses proportionate to the risk. This basically means you don't want to spend more on securing your website than the maximum amount you may lose in case of a compromise. Outsourcing payments to Stripe or Paypal is a great start, because if you have the integration correctly set up, you likely already limited the maximum possible loss quite a bit.
So should you code an ecommerce website yourself, and sell stuff? In the light of the above, it's very opinionated, but I think why not, just consider the above, manage your risks, learn about potential vulnerabilities, mitigate them the best you can, and prepare for things to go wrong. In the end, it's probably cheaper to just use a ready-made service, though a lot less fun. :)
I would say, you should... and you can use any SaaS eCommerce platform: Shopify or BigCommerce or Snipkart without giving up your DIY custom features, because those platforms can be used as a headless eCommerce platform.
This way you don't have risks around managing backend and data (platform will do this, and not loosing the custom features and fine-tuned customer experience you implemented yourself)

Should I be using SharePoint sub-sites or metadata?

We're starting to use SharePoint 2013 to manage our department's process documentation and I have some questions about best practices for site structure. I'm a little surprised I can't find the answers via web search, since this seems like a basic question every new SharePoint user must deal with.
Moving from a file share environment, I'm trying hard to get out of that mindset and I understand the many benefits of SharePoint over file shares. I also understand why creating folders in SharePoint forces arbitrary divisions on files whereas one large set of documents with metadata lets you filter and group the files based on different needs.
What's confusing me is that I also read that it's better to have too many sub-sites than not enough. It seems like sub-sites can easily become pseudo-folders and I'm not sure where that line is crossed.
Here's an example.
We have a SharePoint site devoted to our department. We've create a sub-site dedicated to an application we developed to load data into our business systems. It mainly holds technical documentation about the application. This application supports many different data sources, each with its own set of user instructions for loading, its own schedule (calendar), contact lists, supporting files, etc. There's no compelling reason to separate them to restrict access. However, there doesn't seem like a lot of value in having them all in the same sub-site, either, since someone working on a job will only want to see the docs and supporting files for that data source. I just can't foresee someone wanting to view supporting files across all data sources, although, I could see someone wanting to see the schedule for all data sources combined.
My question is, should I create separate sub-sites under the application for each data source or do I just store everything in the application sub-site and use metadata and views to group things by data source? Putting all the items for a specific data source into its own sub-site seems to be much simpler to manage and present than having to specify metadata for every new file and creating a lot of views. However, I can't shake the feeling that I'm still using file share thinking. Or maybe I'm just missing some basic concept of SharePoint.
Any advice or links to good discussions of this topic would be greatly appreciated. Thanks.
I would recommend that you use metadata and views to separate data in one repository/site.
My reasons are as below:
In SharePoint, it is recommended to use metadata than "evil"
folders(or subsites in your case). Keep in mind that maintaining
multiple subsites requires big administrative efforts in long term,
for example, some sites will be inherited while others unique
permission.
As time passes by and people rotate, it becomes vague
that where the data was stored and where the new data should go to,
especially with large volume subsites.
Since confidentiality is not concerned in your case, keep data centralized and open to people working in related field increases sharing and collaborating phenomenon. In contrast, using subsites increases the possibility of data silos.
people are all lazy :). Taken me as example, I dont want to remember all those xyz URLs, I want to go to one place and be able to fetch everything that I need.

Need technology recommendation/suggestion

My company is in need of a task management system to handle scenarios as simple as "Purchase a computer for X" to "Relocate a person to another country". The simple scenarios are a single tasks handled by a single person, whereas bigger tasks can be broken down into multiple sub tasks delegated to multiple people during the workflow. Additionally the clients and vendors need their own views into the process.
We are evaluating different solutions from a custom application built on Workflow Foundation to SharePoint to BPM products like Metastorm and BPM.Net.
Here's my current understanding of these solutions:
Workflow Foundation - Low level workflow designer and/or library with no host environment. It seems we would have to reinvent some wheels if we went this route such as fault tolerance and document management. Some of the answers on stack also cause concerns such as the lack of versioning and a complete overhaul for VS10/.NET 4.0
SharePoint - Built for document management and collaboration but trying to create advanced workflows and tasking on top of that seems like a hack. Plus all workflows have to be tied to either documents or lists. I cant envision how a list (or list of lists) can address this issue.
BPM products - Mature workflow engine at a seemingly high price. BPM.Net is the only solution for which I could find some level of technical detail but im still not sure how different developing against this product would be from developing against Workflow Foundation.
Are there any workflow engines dedicated to solving all the workflow pains that can be easily deployed with their own hosting environment and initiated through a webservice?
Are there any other options I am missing?
Thanks in advance.
****Edit**
To answer the questions below the workflow needs are pretty light. Basic routing of tasks to approvers and subcontractors.
Whats driving us too look deeper than PM software is the nature of the business not the need for advanced workflow. We are basically in the business of procuring goods and services through subcontractors for our clients which can also include full employee relocation. The interface of the package should reflect this by being customer branded as well as intuitive for this line of business.
Basically if im moving my family to the other side of the world Im not sure i'd want to interface with Jira or Sharepoint or any other PM software to facilitate this.
If you are on Microsoft stack I would definitely recommend SharePoint for this scenario. As it seems to be very simple you can go with Windows SharePoint Services edition because it is free and it has everything you need.
You are right when you say that ShartePoint workflow are bit limited. IMHO the best way to overcome that limitation is to purchase Nintex workflow to create your workflows. It is cost effective solution that can help you design workflows you need.
You can find workflow samples inside the product (as workflow templates) and on the web site.
Nothing you mentioned has much to do with workflow. You're just doing project management. If that's the case, a simple bug tracker (like FogBugz! ;) would work - but if you're going to show it externally, it may not be the most professional presentation.
The closest off the shelf solution I can think of would be Project Server - though, depending on the number of projects and project managers, the desktop Project with a sync to a webserver for client views may be enough.
If that's overkill - because your projects don't require a lot of resource scheduling, Gantt charts, or other PM artifacts - you can take something like Trac and replace "bug" with "task". ;) (Seriously though, that'd probably get you 90% of the way there.....)
Have you looked at RT? I believe it can handle all your requirements, including that it's designed to let customers interact with the system by email, rather than having to log into the website. If you've emailed IT support desks then you've probably interacted with it without knowing... You can also completely customise the web interface and allow customer access.
Can't vouch for the quality as I haven't used it, but I did watch an online-demo video of Intalio, which has BPM and workflow capabilities.
We use Basecamp to control this sort of "task management" stuff. I'm not sure if it fits your needs totally, as it's a little light on the document management side, but it has a web service (REST) API, customer / vendor facing components, and basic interaction / chat capabilities.
The best part about it is that the API is simple enough where you can offload a lot of the "management" for it to admin support personnel, like assistants and interns, by providing custom scripts. If you've got people who aren't programmers using it you'll probably have better luck with it than even something like Trac or FogBugz.
I have/am going through a similar process. We wanted a lightweight workflow for internal use by our sales team. Most of the third party apps we looked at ,K2 and Skelta BPM.Net in particular, looked way over the top for what we needed. I'm now 2 months into working with Windows Workflow Foundation 3.0 and I have to say it isn't the most pleasant coding experience I've had.
If your workflows will truely be simple then it is pretty easy to build a workflow and hook it up to some web pages for the UI. But if you need to be able to change it on the fly, or do versioning (ie the user says we want another step added, then its a whole lot of hacking to get it to work - and it only works if you limit your workflow to being really simple), then you are in for a fair bit of work. And forget about it if you use an Oracle database.
The next version of windows workflow will have it's own runtime environment, code name dublin, with will provide a WCF interface into the workflows.
If your timeframe allows you could use that.
For information on Dublin and the next version of WF see:
http://www.microsoft.com/net/dublin.aspx
My vote is for FogBugz. Unless I am missing something in your requirements, why would you want to reinvent the wheel by using a code based workflow solution where you have to code up the flows yourself when you can use a perfectly good project dependency solution like FB or even MS Project Server - which lets you create nice dependencies for resources and people.
Check FileNet
FileNet is expensive but makes a good job with content and process management, but I guess is not what you are looking for.
We use Captaris Workflow, it is pretty good but it may be expensive for your needs.

network drive file sharing

For the better part of 10 years + we have relied on various network mapped drives to allow file sharing. One drive letter for sharing files between teams, a seperate file share for the entire organization, a third for personal use etc. I would like to move away from this and am trying to decide if an ECM/Sharepoint type solution, or home grown app, is worth the cost and the way to go? Or if we should simply remain relying on login scripts/mapped drives for file sharing due to its relative simplicity? Does anyone have any exeperience within their own organization or thoughts on this?
Thanks.
SharePoint is very good at document sharing.
Documents generally follow a process for approval, have permissions, live in clusters... and these things lend themselves well to SharePoints document libraries.
However there are somethings that don't lend themselves well to living inside SharePoint... do you have a virtual hard drive (.vhd) file that you want to share with a workmate? Not such a good idea to try and put a 20GB file into SharePoint.
SharePoint can handle large files, and so can SQL Server behind it... but do you want your SQL Server bandwidth being saturated by such large files? Do you want your backup of SQL Server to hold copies of such large files multiple times?
I believe that there are a few Microsoft partners who offer the ability to disassociate file blobs from the SharePoint database, so that SharePoint can hold the metadata and a file system holds the actual files, and SharePoint simply becomes the gateway to manage access, permissions, and offer a centralised interface to files throughout an organisation. This would offer you the best of both worlds.
Right now though, I consider SharePoint ideal for documents, and I keep large files (that are not document centric) on Windows file shares.
Definetely, use a tool.
The main benefit here is version control. Being able to jump easily to a previous version, diff'ing and seeing who modified what (see most VCS' blame/annotate tool- it prints out a text file showing when/who modified each line in the text file).
Second, you can probably benefit from issue tracking/task tracking.
Other benefits include web access from the internet, having a wiki (which can be great in some situations), etc.
I use Subversion + Redmine at work, and I find it highly useful- test a few solutions and you will surely find out further advantages for you.
One thing that can be overlooked in the change to an document management tool is the planning required around how much is going to be stored and information architecture issues like where different content is going to end up.
SharePoint particularly is easy to setup without a good plan going forward and is particularly vulnerable to difficulties later on when things get to busy.
I would not recommend a home grown app for something like this. The problem has been solved by off the shelf tools and growing one from scratch is going to cost a huge amount and not get you any way near the features for the money.
Did I mention how important planning your security groups and document areas (IA) was?
If you need just document storage then sharepoint can do very well. WSS is ewen free and it provides very good document storage capabilities.
But you have to plan carefully as updating existing applications is painfull. If you decide to go with Sharepoint then I can give you few advices from top of my head
Pay attention to security configuration (user groups, privilegies,..)
Plan your document libraries well as it is not easy to just move documents betveen them
Also consider limiting number of versions that one document can have, because sharepoint stores full backups betveen verions, not just changes
Don't use infopath:) we have very bad experience with it (just don't tell this to managers)
If you don't really need to change graphical look of Sharepoint than don't bother with it as it brings many problems (I'm talking about custom masterpages and custom site templates)
Try to use as much OOB stuff as possible, because developing your own webparts not only cost more, but it can be quite complicated.
Make sure to turn-on search indexing. This is quite tricky, because it is by default turned off and then you will be as surprised that search is not working as I was :)
If you try to just deploy it and load 10.000 documents into it then you will surely have problems with it later. If you give a little thought about structure then you will end up with really good document storage.
Migrating is very probably worth the cost in the long term. You will gain reliability, versioning, traceability, and extensibility.
Be sure to first identify the groups/rights, and to identify which links need to be fixed (maybe you have applications that use links to the shares).
An open source alternative to SharePoint is Alfresco, it is very good for CIFS (Windows shares) too.

WSS/MOSS Development ... Where to draw the line?

Our organization started on the SharePoint path about two years ago. Before that, we (the developers) wrote mostly asp.net front ends for SQL back ends. Now it seems like every time a new project comes up, we are asked to “make” it fit in SharePoint; and we have stuffed some things into SharePoint that probably should have been stand alone applications or web applications due to complexity and interactions with other technologies.
My question is: Where do you draw the line as to developing a project in SharePoint versus Web/Winform application, and how do you convince your manager(s) that SharePoint may not be the best solution for a particular project?
I sort of agree with you that this is sometimes a tough question. In general, though, I agree with the cliche that you just have to think about a sharepoint app a little differently. If your data can be considered as list-based, then SharePoint probably isn't a necessarily bad development framework. It may seem like more work on the surface, but IMO the challenges just move from one place to another. If you use things like custom field templates and web parts, you can relatively naturally handle all sorts of data. And you get the positive aspects of SharePoint for free (an already mature security framework, built-in searching, site and list templates/definitions, personalized page customizations, yada, yada).
I also I don't know what you mean by "complexity and interactions with other technologies" here, so it's hard to imagine what specific issues might be introduced when SharePoint is added to the mix.
If your dev team is relatively inexperienced with SharePoint and you care about quality and deadlines, I can definitely see your point. It's not an easy learning curve, but I think the SharePoint product is more naturally extensible than many people give it credit for.
There is, in some cases, a third option between a SharePoint application and an ASP.NET application. You can build custom site and application pages and deploy them to a SharePoint site. (The book Inside Windows SharePoint Services 3.0 gives a good overview of how to do this.) This will allow you to use ASP.Net and SQL Server within a SharePoint environment (which means you can also take advantage of things like SharePoint security). It's not as easy as developing a plain ASP.Net application, but it's a compromise.
Of course, this is sort of a technicality if they're wanting these new applications to be built on SharePoint technologies (lists, libraries, workflow, etc.), not just to be "inside" SharePoint.
One of the primary reasons why you might put an applicaiton in SP is when you want to take advantage of the building blocks SP gives you:
Security (share security with the site)
Data (store some or all of your data in lists)
Provisioning (if you want you app on multiple sites)
Some basic data UI e.g. Lists give you that and you dont need to build it.
One thing to consider when trying to 'integrate' a new app into the existing pool is whether there is any overlap in data (customers, inventory, etc) that would benefit from the merger.
There is also the benefit of being able to back up multiple applications and all of their respective data in one place.
Why are they asking for it all to go into SharePoint?
In my experience it is because the 'ole SharePoint intranet is being great as a portal to keep everything together and findable under the one information architecture.
Approach the issue from a uses perception of the application space in the organisation.
So long as the application looks and feels just like part of the intranet site and the user does not have to think about how to get to it (and how to get back out), you can pretty much take any architecture decisions necessary to get the best bang for the organisations buck when it comes to implementation and maintainence.
When we started thinking about the site less from SharePoint vs other stuff to the nice woolly concepts of Information Architecture, findability and usability, our decisions not to make it actually inside SharePoint, but still skin it like the Intranet became easier to sell.

Resources