How can I find files that aren't needed on my site so I can delete them?

I'm developing a website, and after testing different ways to do things, I know that I have many files on my site that are not being used, including HTML/PHP files, images, stylesheets, and external scripts. Is there a program I can use to find all of the files I don't need, so I can delete them?
I need to find all files that are safe to delete: files that no longer have anything to do with the site and whose removal won't have any effect on how it works.
I've tried finding orphaned files in Dreamweaver, but it lists a lot of files that I do actually need.

Here's one idea: crawl the site and build a list of every file the crawler can reach; anything on the server that isn't on that list is a candidate for deletion. Wikipedia has a list of crawlers, including some open-source ones.
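A minimal sketch of that approach using wget, assuming the site is reachable at http://example.com and the document root is available locally as ./public_html (both are placeholders for your own setup). wget mirrors everything reachable by following links, and comm then lists files that exist on the server but were never fetched:

    # mirror every page and asset reachable by following links from the home page
    # -r recursive, -np don't ascend to the parent, -nH no host directory, -P target dir
    wget -r -np -nH -P mirror http://example.com/

    # list what the crawler reached and what actually sits in the document root
    ( cd mirror      && find . -type f | sort ) > reachable.txt
    ( cd public_html && find . -type f | sort ) > on-server.txt

    # files on the server that were never reached: candidates for deletion
    comm -13 reachable.txt on-server.txt > candidates.txt

Note that anything referenced only server-side (PHP includes, config files, scripts called by other scripts) will also land in candidates.txt even though it is still needed, so review the list by hand before deleting anything.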

Xenu's Link Sleuth is the easiest way I've found.
http://home.snafu.de/tilman/xenulink.html
After you do the scan, you have the option to enter your FTP details. If you do, it will also generate a list of files that are not accessible (orphans).

How would you define "unnecessary"? That's something you need to be sure of before beginning. I guess one way to garbage-collect your site is to delete files that are not referenced by any other files.

The crawler idea from Brendan, to get a list of all the files that are actually used, is very nice.
Then you can start deleting files from your site, and afterwards use a program such as Xenu or LinkTiger (or whichever one you prefer) to find any broken links.

You can connect with an FTP application and delete files manually. This is the safest way, because scripts and programs don't know what is needed and what isn't...

This did not exist at the time this question was asked, but there is a Python script called weborphans designed for this purpose.
Here's a blog entry by the author with some more info: Finding orphaned files on websites

Related

Tabcompletion and docview while editing rc.lua

I saw that there is a Lua plugin for Eclipse, that there is a documentation page (api_doc) on the awesome main page, and that all the .lua files are in /usr/share/awesome/lib.
So I thought it must be possible to create a library or Execution Environment so that one gets tab completion and doc view.
So I tried making my own Execution Environment:
wrote the standard .rockspec file
downloaded the documentation, made an offline version of it, and put it in the docs/ folder
zipped the files and folders in /usr/share/awesome/lib
zipped it all up
tried it out ... and it failed.
When I try to view the documentation for a .lua file I get "Note: This element has no attached documentation."
Questions: Am I going about this totally the wrong way (I have the feeling I am)? Is there any way to edit rc.lua with tab completion and doc view?
Koneki will probably take a while to set up, but it's definitely worth it. Going for the ".doclua" approach (using version 1.2) would certainly work, but I doubt that using a script to generate the information you need would work out in the long run.
Most likely you'll spend a fair bit of time defining what kind of object you're dealing with every time you come across one. The right thing to do would be to take the time to check whether the object/module/inner type inherits from another object, so you get more completion as you use autocomplete to move from one object to another by pressing "dot" + Ctrl+Space.
In an ideal world, one person could probably get it right and share it with others, so everyone can enjoy a fully featured autocomplete editor.
Found a solution for Eclipse.
First off, the idea of setting up an Execution Environment was the wrong one, and so was the whole business of downloading the documentation.
For more information on that, visit the Eclipse wiki for the Lua Development Tools.
The right thing to do is to add a source folder which contains the /usr/share/awesome/lib directory.
The bad news is that my comment from above was totally right, which means one has to configure each .lua file in /usr/share/awesome/lib to meet the requirements of the documentation language described here.
Then editing rc.lua (which one can add to the project in Eclipse) works with tab completion and doc view.
Since the documentation language used in the lib files is similar to the one used by the Lua Development Tools, not many things have to change. Maybe there are even scripts for that.

Search through scripts of (multiple) cimplicity screens

We are using Cimplicity to operate some installations at our plant. The frontend consists of a lot of .cim files, which are the screens presented to the operator. These files are built with 'cimedit', which is basically a graphical click and drag program with which you can assemble the screens. Each object you drag onto the screen has the option to run a script, which brings me to my problem.
Because each screen contains a lot of small scripts and functions, it is hard to keep track of what does what. For example, I'm trying to figure out where a certain table from my database is being accessed or updated. Since the files all seem to be compressed (or something like that), I can't use a regular "search the contents of this file" search.
Things I've tried so far: searching with Windows, with the file-contents option enabled, and also with the compressed-files option. No success. That makes sense because, as I said, the files seem to be compressed, so the actual script is not stored as plain text.
So, my question in short:
How do I search all the scripts of (preferably multiple) cimplicity screens?
Any tips on how to search compressed files are also very much appreciated.
I stumbled upon another Stack Overflow post while searching for a better Windows search tool and ended up finding this post: https://superuser.com/questions/26593/best-way-to-confidently-search-files-and-contents-in-windows-without-using-an
That post recommends Agent Ransack, and it is indeed possible to search through the .cim files with this tool.
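If you have Cygwin or Git Bash available, here is a rough command-line sketch of the same kind of search. It assumes the script text is embedded in the .cim files as plain ASCII or UTF-16 strings rather than being truly compressed (if Agent Ransack finds matches, this usually will too); the directory and table name are placeholders:

    for f in /path/to/screens/*.cim; do
        # strings -a scans the whole file; the second pass (-e l) catches
        # UTF-16LE text, which Windows programs often use internally
        if strings -a "$f" | grep -qi 'MY_TABLE'; then
            echo "match (ascii):  $f"
        elif strings -a -e l "$f" | grep -qi 'MY_TABLE'; then
            echo "match (utf-16): $f"
        fi
    done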

Script for PDF files check

I want to perform some checks on PDF files.
I wish to check the width of the pages and also figure out whether a file contains double pages.
Is there any framework for that?
Thanks!
Greetings
Magda
Indeed there is. PDF::API2 looks like it might give you what you need.
It's designed for creation and modification of PDF files. If not, search CPAN for other PDF APIs.
I don't fully understand your question, but for a start check out these utility scripts that come with the CAM-PDF distribution. Look at the lower half of this web page, e.g. getpdfpage.pl.
A lot depends on the complexity of your PDFs, though.
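If Perl is not a requirement, a quick sketch of the same checks is possible with pdfinfo from poppler-utils. Here "double page" is assumed to mean a page that is wider than it is tall, so adjust the test to match what your documents actually look like:

    for f in *.pdf; do
        pages=$(pdfinfo "$f" | awk '/^Pages:/ {print $2}')
        # with -f/-l, pdfinfo prints a "Page N size: W x H pts" line per page
        pdfinfo -f 1 -l "$pages" "$f" | awk -v file="$f" '
            /^Page +[0-9]+ +size:/ {
                w = $4 + 0; h = $6 + 0
                printf "%s  page %s: %s x %s pts\n", file, $2, w, h
                if (w > h) print "  -> wider than tall, possibly a double page"
            }'
    done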

Ignore file extensions only on my machine

My team lead just added a lot of binary files that shouldn't be in source control. I have to pick and choose my battles with him, and this isn't one I think is worth bringing up, but I'd like to ignore these files on my machine without affecting everyone else's setup. Is this possible?
We're using TortoiseSvn. I've honestly never used the command line so until I learn how to do that I would prefer a solution using the GUI. Thanks!
If all the files reside in a specific directory, you can simply use "Add to ignore list" from the shell context menu.
From the Settings/General tab you can also add global ignore patterns based on extension.

linux script, standard directory locations

I am trying to write a bash script to do a task. I have done pretty well so far and have it working to an extent, but I want to set it up so it's distributable to other people, and I will be opening it up as open source, so I want to start doing things the "conventional" way. Unfortunately, I'm not all that sure what the conventional way is.
Ideally I want a link to an online resource that discusses this and surrounding topics in depth, but I'm having difficulty finding keywords that will locate it on Google.
At the start of my script I set a bunch of global variables that store the names of the directories it will be accessing. This means I can modify the directories quickly, but that is a programming shortcut, not a user shortcut; I can't tell users that they have to fiddle with this stuff. Also, individual users' settings must not get wiped out on every upgrade.
Questions:
Name of settings folder: ~/.foo/ -- this is well and good, but how do I keep my working copy and my development copy separate? Tweak the reference in the source of the dev version?
If my program needs to maintain and update a library of data (GPS track-log data in this case), where should this directory be? The user will need to access some of this data, but it's mostly for internal use. I personally work in Cygwin and like to keep this data on a separate drive, so my path is weird, and I suspect the same will be true for many users. As a default, however, I'm thinking ~/gpsdata/ -- would this be normal, or should I ask the user at first run where to put it and store the answer in the settings folder? Whatever happens, I'm going to have to store the directory reference in a file in the settings folder.
The program needs a data "inbox": a folder where the user can dump files and then run the script to process them. I was thinking ~/gpsdata/in/ ? Though there will always be an option to add a file or folder on the command line and have it used as well (it processes files from all locations listed, including the "inbox").
Where should the script itself go? It's already smart enough to create all of its ancillary/settings files (once I figure out the "correct" directory) if run with "./foo --setup". I could put it in /usr/bin/, /bin, or ~/.foo/bin (and add that to the path). What's normal?
I need to store login details for a web service that the script connects to (using curl -u, if it matters). I plan on including a setting whereby it asks for a username and password on every execution, but it currently stores them in plain text in a file in ~/.foo/ -- I know, this is not good. The web service (osm.org) does support OAuth, but I have no idea how to get curl to use it -- getting curl to speak to the service in the first place was a hack. Is there a simple way to do really basic encryption on a file like this to deter idiots armed with Notepad?
Sorry for the list of questions; I believe they are closely related enough for a single post. This is all stuff I've been stabbing at, but I would like clarification/confirmation.
Name of settings folder: ~/.foo/ -- this is well and good, but how do I keep my working copy and my development copy separate?
Have a default of ~/.foo, and an option (for example --config-directory) that you can use to override the default while developing.
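A minimal sketch of that pattern in bash (the option name --config-directory is just an example):

    #!/bin/bash
    CONFIG_DIR="$HOME/.foo"          # default for normal use

    while [ $# -gt 0 ]; do
        case "$1" in
            --config-directory) CONFIG_DIR="$2"; shift 2 ;;
            *) break ;;              # leave everything else for normal argument handling
        esac
    done

    mkdir -p "$CONFIG_DIR"
    echo "using settings in $CONFIG_DIR"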
If my program needs to maintain and update library of data (gps tracklog data in this case) where should this directory be?
If your script is running under a normal user account, this will have to be somewhere in the user's home directory; elsewhere, you'll have no write permissions. Perhaps ~/.foo/tracklog or something? Again, add a command line option, and also an option in the configuration file, to override this.
I'm not a fan of your ~/gpsdata default; I don't want my home directory cluttered with all sorts of directories that programs created without my consent. You see this happen on Windows a lot, and it's really annoying. (Saved games in My Documents? Get out of here!)
The program needs a data "inbox" that is a folder that the user can dump files, then run the script to process these files. I was thinking ~/gpsdata/in/ ?
As stated above, I'd prefer ~/.foo/inbox. Also with command-line option and configuration file option to change this.
But do you really need an inbox? If the user needs to run the script manually over some files, it might be better just to accept those file names on the command line. They could just be processed wherever, without having to move them to a "magic" location.
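If you do keep an inbox, a small sketch of combining the two: process file names given on the command line when there are any, and fall back to the inbox (the ~/.foo/inbox suggested above) otherwise:

    INBOX="$HOME/.foo/inbox"

    if [ $# -gt 0 ]; then
        files=( "$@" )               # explicit files from the command line
    else
        files=( "$INBOX"/* )         # otherwise whatever is sitting in the inbox
    fi

    for f in "${files[@]}"; do
        echo "processing $f"         # stand-in for the real processing step
    done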
Where should the script itself go?
This is usually up to the packaging system of the particular OS you're running on. When installing from source, /usr/local/bin is a sensible default that won't interfere with package managers.
Is there a simple way to do a really basic encryption on a file like this to deter idiots armed with notepad?
Yes, there is. But it's better not to, because it creates a false sense of security. Without a master password or something, secure storage is not possible! Pidgin, for example, explicitly stores passwords in plain text, so that users won't make any false assumptions about their passwords being stored "securely". So it's best just to store them in plain text, complain if the file is world-readable, and add a clear note to the manual to warn the user what's going on.
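A small sketch of the "complain if the file is world-readable" part, assuming the credentials live in a hypothetical ~/.foo/credentials file (uses GNU find's -perm /mode syntax):

    CRED_FILE="$HOME/.foo/credentials"

    # warn if any group/other permission bit is set on the file
    if [ -e "$CRED_FILE" ] && [ -n "$(find "$CRED_FILE" -perm /077)" ]; then
        echo "warning: $CRED_FILE is readable by other users; run: chmod 600 $CRED_FILE" >&2
    fi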
Bottom line: don't try to reinvent the wheel. Thousands of scripts and programs have faced the same issues; most of them ended up adopting the same conventions, and for good reasons. Look at what they do, and mimic them.
You can start with the Filesystem Hierarchy Standard. I'm not sure how well followed it is, but it does provide some guidance. In general, I try to use the following:
$HOME/.foo/ is used for user-specific settings - it is hidden
$PREFIX/etc/foo/ is for system-wide configuration
$PREFIX/foo/bin/ is for system-wide binaries
sym-links from $PREFIX/foo/bin are added to $PREFIX/bin/ for ease of use
$PREFIX/foo/var/ is where variable data would live - this is where your input spools and log files would live
$PREFIX should default to /opt/foo even though almost everyone seems to plop stuff in /usr/local by default (thanks GNU!). If someone wants to install the package in their home directory, then substitute $HOME for $PREFIX. At least that is my take on how this should all work.
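A small sketch of how a script might resolve those locations at start-up, following the layout above (the defaults are assumptions to adjust):

    PREFIX="${PREFIX:-/opt/foo}"     # or PREFIX="$HOME" for a per-user install
    CONF_DIR="$PREFIX/etc/foo"       # system-wide configuration
    BIN_DIR="$PREFIX/foo/bin"        # binaries, sym-linked into $PREFIX/bin
    VAR_DIR="$PREFIX/foo/var"        # input spools and log files
    USER_CONF="$HOME/.foo"           # hidden, user-specific settings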
