How do I modify the following squid url_rewriting_program - linux

Please modify the following program so that it replaces all the images (e.g. .jpg or .gif images) inside any page with a picture of your choice. When an HTML web page contains images, the browser will identify those image URLs and send out a URL request for each image. You should be able to see the URLs in your URL rewriting program. You just need to decide whether a URL is trying to fetch an image file of a particular type; if it is, you can replace the URL with another one (of your choice).
use strict;
use warnings;
# Forces a flush after every write or print on the STDOUT
select STDOUT; $| = 1;
# Get the input line by line from the standard input.
# Each line contains an URL and some other information.
while (<>)
{
my @parts = split;
my $url = $parts[0];
# If you copy and paste this code from this PDF file,
# the ˜ (tilde) character may not be copied correctly.
# Remove it, and then type the character manually.
if ($url =~ /www\.cis\.syr\.edu/) {
# URL Rewriting
print "http://www.yahoo.com\n";
}
else {
# No Rewriting.
print "\n";
}
}

Look at the if block. It's matching on the URL and then performing an action. You just need to do something similar.
Have a look at the perlrequick page for a quick overview of how regular expressions work in Perl.
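As a rough illustration of the decision the if block has to make, here is a minimal sketch in Python rather than Perl. The replacement URL and the sample request lines are made up for the example, and it assumes (as in the program above) that the URL is the first whitespace-separated field of each input line.

```python
import re

# Hypothetical replacement image; substitute any URL you like.
REPLACEMENT_URL = "http://example.com/replacement.png"

# A path ending in .jpg, .jpeg or .gif, optionally followed by a query string.
IMAGE_PATTERN = re.compile(r"\.(?:jpe?g|gif)(?:\?.*)?$", re.IGNORECASE)


def rewrite(line: str) -> str:
    """Return the replacement URL for image requests, or "" for no rewriting."""
    parts = line.split()
    if parts and IMAGE_PATTERN.search(parts[0]):
        return REPLACEMENT_URL
    return ""


# Made-up request lines in the "URL followed by other fields" shape.
print(repr(rewrite("http://www.cis.syr.edu/logo.gif 10.0.2.15/- - GET")))
print(repr(rewrite("http://www.cis.syr.edu/index.html 10.0.2.15/- - GET")))
```

The same test translates directly to a Perl match on $url, with the pattern anchored at the end of the string.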

Related

Read the mail attachment from Linux command line

Is it possible to read the emails based on the Subject line and then get the base64 attachment or directly get the attachment ?
Server : Linux System
Your question seems to presuppose that there is a single attachment and that it can be reliably extracted. In the general case, an email message can have a basically unlimited number of attachments, and the encoding could be one of several.
But if we assume that you are dealing with a single sender which consistently uses a static message template where the first base64 attachment is always going to be the one you want, something like
case $(formail -zcxSubject: <"$message") in
  "Hello, here is your report for "*)
    awk 'BEGIN { h=1 }
      h { if ($0 ~ /^$/) h=0 ; next } # skip headers
      /^Content-Disposition: attachment/ { a=1 } # find att
      p && /^$/ { exit } # blank line after the payload: stop
      a && /^$/ { p=1; next } # blank line after the MIME headers: start printing
      p' "$message" |
    base64 -d ;;
esac
This will extract the Subject: header and compare it to a glob pattern. I expect this is what you mean by "based on subject" -- if we find a matching subject header, examine this message, otherwise discard.
The crude Awk script attempts to isolate the base64 data and pass it to base64 -d for extraction. This contains a number of pesky and somewhat crude assumptions about the message format, and probably requires significant additional tweaking. Briefly, we skip the headers, then look for MIME headers identifying an attachment, and print that, skipping everything else in the message. If this header is missing, or identifies the wrong MIME part, you will get no results, or (worse) incorrect results. Also, the /^Content-Disposition:/ regex could theoretically match on a line which is not a MIME header, though this seems highly unlikely (but might actually happen if you are looking e.g. at a bounce message).
A more robust approach would involve a MIME extraction tool or perhaps a custom script to actually parse the MIME structure and extract the part you want. Without details about what exactly you need, I'm not able to provide that. (This would also allow you to use the sender's specified filename; the above script simply prints the decoded payload to standard output.)
Note also that formail has no idea about RFC2047 encoding, so if the subject is not plain ASCII, you have to specify the encoded form in the script.
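For comparison, here is a minimal sketch of the "actually parse the MIME structure" route, using Python's standard email package. The function name first_attachment and the demo message are invented for illustration; the subject prefix is the one from the shell example. With policy.default the Subject header is returned already decoded, so an RFC 2047 encoded subject can be compared as ordinary text.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def first_attachment(raw: bytes, subject_prefix: str):
    """Return (filename, decoded bytes) of the first attachment, or None."""
    msg = message_from_bytes(raw, policy=policy.default)
    subject = str(msg["Subject"] or "")
    if not subject.startswith(subject_prefix):
        return None  # subject does not match: ignore this message
    for part in msg.iter_attachments():
        # get_payload(decode=True) undoes base64 or quoted-printable
        return part.get_filename(), part.get_payload(decode=True)
    return None


# Build a small demo message so the sketch runs without a mailbox.
demo = EmailMessage()
demo["Subject"] = "Hello, here is your report for May"
demo.set_content("See attached.")
demo.add_attachment(b"\x00\x01report", maintype="application",
                    subtype="octet-stream", filename="report.bin")

print(first_attachment(demo.as_bytes(), "Hello, here is your report for "))
```

In real use the raw bytes would come from the message file, for example open(path, "rb").read(), and the returned filename is the one the sender specified.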

Incrementing Through URLs and Downloading

I would just like a simple browser automation that increments one number in a URL and downloads the information from that place. For example, if the address looks like this:
www.test.com/something/part1_0.jpg
How could I increment the '1' and download the file from each successive web page?
Thanks
P.S. I'm using OS X 10.9
Here's a Ruby solution using open-uri:
require 'open-uri'
(1..100).each do |num|
  File.open("part#{num}_0.jpg", 'wb') do |f|
    # URI.open needs a full URL, including the scheme
    f.write URI.open("http://www.test.com/something/part#{num}_0.jpg").read
  end
end
This snippet A) creates a range of numbers; B) iterates over the range of numbers; C) opens an image file in binary mode and interpolates the current number into the file name; and D) reads the image from the URL and writes it.
But the easiest way would probably be to use curl from your command line:
curl -O "http://www.test.com/something/part[1-100]_0.jpg"
Depending on the number of webpages that you need to access, modify the numbers in brackets accordingly.
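The same idea can be sketched in Python with only the standard library. The helper names numbered_urls and download_all are invented for this example, and the range 1 to 100 is the same guess as in the snippets above.

```python
from urllib.request import urlretrieve


def numbered_urls(template: str, start: int, stop: int) -> list:
    """Fill {n} in the template with every number from start to stop inclusive."""
    return [template.format(n=n) for n in range(start, stop + 1)]


def download_all(urls) -> None:
    """Save each URL under the last component of its path."""
    for url in urls:
        filename = url.rsplit("/", 1)[-1]
        urlretrieve(url, filename)


urls = numbered_urls("http://www.test.com/something/part{n}_0.jpg", 1, 100)
print(urls[0])
print(urls[-1])
```

Calling download_all(urls) then fetches the files one by one; it is left out of the sketch so that running it does not hit the network.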

How to get full path from relative path

I'm trying to access a page from another domain. I can get all the other HTML through PHP, but files such as images and audio files have relative paths, so the browser looks for them on the local server even though they are on the other server.
I've allowed cross-domain access through PHP from the other page.
header('Access-Control-Allow-Origin: *');
Then I use AJAX load to load that page's content.
$('#local_div').load('page_to_load_on_side_B #div_on_that_page');
Now, the path looks like this:
../../user/6/535e55ed00978.jpg
But I want it to be a full URL like this:
http://www.siteB.com/user/6/535e55ed00978.jpg
Correction: I have full access to both sites so I need to get the absolute paths from the site where these files are originating.
For this problem I would use one of the following:
Server Side Approach
I would add a parameter to the script on server B, named for example abspath. When this parameter is set to 1, the script starts an output buffer with ob_start(), then before sending the response collects the buffered output with ob_get_clean(), and finally uses a regular expression to replace the relative prefix in all URLs with http://www.siteB.com/. The script on server B would look as follows:
<?php
$abspath=(isset($_REQUEST["abspath"])?$_REQUEST["abspath"]:0);
if($abspath==1) ob_start();
// Do page processing (your actual code here)
if($abspath==1)
{
$html=ob_get_clean();
$html=preg_replace('#\.\./\.\./#', 'http://siteb.com/', $html);
echo $html;
}
?>
So on the client side (site A) your ajax call would be:
$('#local_div').load('page_to_load_on_side_B?abspath=1 #div_on_that_page');
So when the abspath parameter is set to 1, the site B script replaces the relative paths (note that I assumed all paths start with ../..) with absolute ones. This approach can be improved a lot.
Client Side Approach
This replacement would be done locally in JavaScript, which avoids changing the scripts on server B. The replacement itself is the same. If all relative paths start with ../.. the regex is very simple, so on site A replace $('#local_div').load('page_to_load_on_side_B #div_on_that_page'); with the following (note that I assume all relative URLs start with ../..):
$.get('page_to_load_on_side_B', function(data) {
  // $.get has no selector syntax, so rewrite the paths first and then pick out the fragment
  data = data.replace(/\.\.\/\.\.\//g, 'http://siteb.com/');
  $('#local_div').html($('<div>').html(data).find('#div_on_that_page'));
});
That does the replacement before setting the HTML of the DIV, so the images are loaded from the absolute URL.
Ensure full CORS access to site B.
The second approach is cleaner than the first, so I would use JavaScript to do the replacements. Both do the same thing; the only difference is where the replacement is done.
There is a PHP function that can make absolute path from relative one.
realpath()
If you mean a URL path, simply replace all occurrences of "../" and add the domain in front.
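Instead of stripping "../" by hand, a relative URL can be resolved against the URL of the page it came from. The following Python sketch shows the idea with urllib.parse.urljoin. The function name absolutize and the page location /pages/view/page.php are assumptions made up for the example, and the attribute regex is deliberately crude (double-quoted src and href only).

```python
import re
from urllib.parse import urljoin


def absolutize(html: str, base_url: str) -> str:
    """Rewrite double-quoted src/href values so they are absolute URLs."""
    def fix(match):
        attribute, value = match.group(1), match.group(2)
        return f'{attribute}="{urljoin(base_url, value)}"'
    return re.sub(r'\b(src|href)="([^"]*)"', fix, html)


# Hypothetical location of the page on site B; two levels below the root,
# so that ../../ climbs back to the document root.
page_url = "http://www.siteB.com/pages/view/page.php"
fragment = '<img src="../../user/6/535e55ed00978.jpg">'
print(absolutize(fragment, page_url))
```

Because the resolution follows the normal URL rules, this also copes with paths that climb a different number of levels, which a fixed "../../" replacement does not.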
Try this one:
function getRelativePath($from, $to)
{
// some compatibility fixes for Windows paths
$from = is_dir($from) ? rtrim($from, '\/') . '/' : $from;
$to = is_dir($to) ? rtrim($to, '\/') . '/' : $to;
$from = str_replace('\\', '/', $from);
$to = str_replace('\\', '/', $to);
$from = explode('/', $from);
$to = explode('/', $to);
$relPath = $to;
foreach($from as $depth => $dir) {
// find first non-matching dir
if($dir === $to[$depth]) {
// ignore this directory
array_shift($relPath);
} else {
// get number of remaining dirs to $from
$remaining = count($from) - $depth;
if($remaining > 1) {
// add traversals up to first matching dir
$padLength = (count($relPath) + $remaining - 1) * -1;
$relPath = array_pad($relPath, $padLength, '..');
break;
} else {
$relPath[0] = './' . $relPath[0];
}
}
}
return implode('/', $relPath);
}
Also you can find below solution:
In general, there are 2 solutions to this problem:
1) Use $_SERVER["DOCUMENT_ROOT"] – We can use this variable to make all our includes relative to the server root directory, instead of the current working directory(script’s directory). Then we would use something like this for all our includes:
include($_SERVER["DOCUMENT_ROOT"] . "/dir/script_name.php");
2) Use dirname(__FILE__) – The __FILE__ constant contains the full path and filename of the script that it is used in. The function dirname() removes the file name from the path, giving us the absolute path of the directory the file is in regardless of which script included it. Using this gives us the option of using relative paths just as we would with any other language, like C/C++. We would prefix all our relative paths like this:
include(dirname(__FILE__) . "/dir/script_name.php");
You may also use basename() together with dirname() to find the included script's name and not just the name of the currently executing script, like this:
$script_name = basename(__FILE__);
I personally prefer the second method over the first one, as it gives me more freedom and a better way to create a modular web application.
Note: Remember that there is a difference between a backslash “\” and a forward (normal) slash “/” on Unix-based systems. If you test your application on a Windows machine and use them interchangeably, it will work fine, but once you move your script to a Unix server it will cause problems. Backslashes (“\”) are also used in PHP, as in Unix, to indicate that the character that follows is a special character. Therefore, be careful not to use them in your path names.

Issue with filepath name, possible corrupt characters

Perl and html, CGI on Linux.
Issue with file path name, being passed in a form field, to a CGI on server.
The issue is with the Linux file path, not the PC side.
I am using 2 programs,
1) A program written years ago: a Perl program generates dynamic HTML, which is presented to the user as a form. I modified it by inserting the code needed to allow the user to select a file from their PC, to be placed on the Linux machine.
Because this program already knew the filepath, needed on the linux side, I pass this filepath in a hidden form field, to program 2.
2) CGI program on Linux side, to run when form on (1) is posted.
Strange issue.
The filepath that I pass, has a very strange issue.
I can extract it using
my $filepath = $query->param("serverfpath");
The above does populate $filepath with what looks like exactly the correct path.
But it fails, and not in a way that takes me to the file open error block, but such that the call to the CGI script gives an error.
However, if I populate $filepath with EXACTLY the same string, via hard coding it, it works, and my file successfully uploads.
For example:
$fpath1 = $query->param("serverfpath");
$fpath2 = "/opt/webhost/ims/DOCURVC/data";
A comparison of $fpath1 and $fpath2 reveals that they are exactly equal.
A length check of $fpath1 and $fpath2 reveals that they are exactly the same length.
I have tried many methods of cleaning the data in $fpath1.
I chomp it.
I remove any non standard characters.
$fpath1 =~ s/[^A-Za-z0-9\-\.\/]//g;
and this:
my $safe_filepath_characters = "a-zA-Z0-9_.-/";
$fpath1 =~ s/[^$safe_filepath_characters]//g;
But no matter what I do, using $fpath1 causes an error, using $fpath2 works.
What could be wrong with the data in the $fpath1, that would cause it to successfully compare to $fpath2, yet not be equal, visually look exactly equal, show as having the exact same length, but not work the same?
For the below file open block.
$upload_dir = $fpath1
causes complete failure of the CGI to load, as if it cannot find the CGI (which I know is sometimes caused by a syntax error in the CGI script).
$upload_dir = $fpath2
I get a successful file upload.
$upload_dir = ""
The call to the CGI does not fail; it executes the else block of the if below, as expected.
here is the file open block:
if (open ( UPLOADFILE, ">$upload_dir/$filename" ))
{
binmode UPLOADFILE;
while ( <$upload_filehandle> )
{
print UPLOADFILE;
}
close UPLOADFILE;
$msgstr="Done with Upload: upload_dir=$upload_dir filename=$filename";
}
else
{
$msgstr="ERROR opening for upload: upload_dir=$upload_dir filename=$filename";
}
What other tests should I be performing on $fpath1, to find out why it does not work the same as its hard-coded equivalent $fpath2?
I did try character replacement, a single character at a time, from $fpath2 to $fpath1.
Even doing this with a single character, caused $fpath1 to have the same error as $fpath2, although the character looked exactly the same.
Is your CGI perhaps running perl with the -T (taint mode) switch (e.g., #!/usr/bin/perl -T)? If so, any value coming from untrusted sources (such as user input, URIs, and form fields) is not allowed to be used in system operations, such as open, until it has been untainted by using a regex capture. Note that using s/// to modify it in-place will not untaint the value.
if ($fpath1 =~ /^([A-Za-z0-9\-\.\/]*)$/) {
    $fpath1 = $1; # the captured value is untainted
}
else {
    die "Illegal character in fpath1";
}
should work if taint mode is your issue.
Regarding "But it fails, and not in a way that takes me to the file open error block, but such that the call to the CGI script gives an error.":
Premature end of script headers? Try running the CGI from the command line:
perl your_upload_script.cgi serverfpath=/opt/webhost/ims/DOCURVC/data

Regular Expression validating hyperlink for export to Excel

I have a web application that takes input from a user, usually in the form of a filepath, hyperlink, or fileshare, but not always. A user may enter "\my.fileshare.com", "http://www.msdn.com", or "In my file cabinent". These inputs are exported to an Excel file. However, if the input is in the form of "\look on my desk" or "http://here it is" (notice the spaces), then after the file is exported and opened, Excel raises the ever so descriptive error message of, and I quote, "Error".
I'm adding a regular expression validator to the textbox in which the user enters and edits these locations. Because there are a large number of existing entries, the validator needs to be as specific as possible and only reject the inputs that cause the Excel export to break. For example "\Will Work" will work, as will "Will Work", and "\This\will also work". I need a regular expression that checks the following: if the string starts with \, http://, https://, ftp://, or ftps://, the server or fileshare name must not contain a space; if it does not start with one of those, it is fine regardless.
I've been able to write the first part
^(\\)[^ \\]+(\\.*)$|^(((ht|f)tp(s)?)://)[^ /]+(/.*)$
but I can't figure out how to say ignore everything if it does not start with \, http://, https://, ftp://, ftps://.
^(?:(?:\\|(?:ht|f)tps?://)\S+|(?!\\|(?:ht|f)tps?://).*)$
Explained:
^ # start-of string
(?: # begin non-capturing group
(?:\\|(?:ht|f)tps?://)\S+ # "\, http, ftp" followed by non-spaces
| # or
(?!\\|(?:ht|f)tps?://).* # NOT "\, http, ftp" followed by anything
) # end non-capturing group
$ # end-of-string
This is pure, unescaped regex. Add character escaping according to the rules of your environment.
EDIT: Ooops premature.
This expression still doesn't allow "http://www.google.com/hello world" :/
EDIT FOR A THIRD TIME
Here we go!
^(?:(?:\\|(?:ht|f)tps?://)[^ /\\]+([/\\].*)?|(?!\\|(?:ht|f)tps?://).*)$
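A quick way to check a pattern like this is to run it against the inputs from the question. The Python sketch below does that under one assumption: that the final pattern is meant to read as written in the code, with the quantifiers and the backslashes inside the character classes spelled out in full. The function name is_exportable is invented for the example, and only the URL-scheme and plain-text cases are exercised.

```python
import re

# Assumed reading of the final pattern (quantifiers and character-class
# backslashes written out); the optional path group is made non-capturing.
PATTERN = re.compile(
    r"^(?:(?:\\|(?:ht|f)tps?://)[^ /\\]+(?:[/\\].*)?"
    r"|(?!\\|(?:ht|f)tps?://).*)$"
)


def is_exportable(text: str) -> bool:
    """True if the validator would accept the input."""
    return PATTERN.fullmatch(text) is not None


samples = [
    "http://www.msdn.com",
    "http://www.google.com/hello world",
    "In my file cabinent",
    "http://here it is",
]
for sample in samples:
    print(f"{sample!r}: {is_exportable(sample)}")
```

Only the last sample, which has a space inside the server name, should be rejected; a space after the first slash of the path is allowed.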
