How to remove multiple whitespace and newline characters from an HTML entity string

I am trying to implement a crawler using CodeIgniter and Simple HTML DOM.
$page = "URL to be Crawled";
$html = file_get_html($page);
$ad_description = $html->find('#ad_description',-1);
$description = $ad_description->innertext;
$description contains multiple consecutive spaces and newlines, which I need to collapse into single occurrences.
I tried:
str_replace("\n\r",' ',$description),
reduce_multiples($ad_description->innertext,"\r")
preg_replace("/[\r\n]+/", "\n", $description)
ascii_to_entities($description,ENT_HTML5, "ISO-8859-1")
and many other possible options but without success. Any help would be appreciated.

I think preg_replace does work:
$description = "This
is a
test string
";
echo $description = preg_replace('/\s+/', ' ', $description); // This is a test string
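For comparison, here is a minimal sketch of the same technique in Python. The helper name `collapse_whitespace` is made up for this illustration, and the final `strip()` is an addition that the PHP one-liner above does not perform.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, CR, LF) with one space."""
    # \s+ matches one or more whitespace characters, newlines included.
    # strip() additionally drops leading/trailing whitespace (an assumption
    # about what is wanted; remove it to mirror the PHP call exactly).
    return re.sub(r"\s+", " ", text).strip()


description = "This\r\n  is a\n\n test   string\n"
print(collapse_whitespace(description))  # This is a test string
```

Matching `\s+` instead of listing `\r` and `\n` explicitly means the order of carriage returns and line feeds in the scraped markup does not matter.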

Add character between column combine in a CSV

I'm looking for the correct syntax to put double quotes (") around my variables.
I need something like this:
"firstname","lastname","email","",""
Here is the first script I have :
foreach ($line in Get-Content .\extract.csv) {
    $firstname = $line.split(';')[0]
    $lastname = $line.split(';')[1]
    $email = $line.split(';')[2]
    $newLine = "$firstname - $lastname - $email"
    echo $newLine
}
I'm really new to scripting and I'm a bit lost with all these (') and (").
My second question: I need to extract my data starting from the second row and ignore the first one. Can you help me with this too?
Thanks !
Have you tried escaping your " and ' characters?
In PowerShell you can escape a double quote with a backtick ` (AltGr + 7 on some keyboard layouts) or by doubling the character:
Example:
Write-Host(" `" ")
Write-Host(" "" ")
Please add more code if this doesn't solve your issue!
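Both parts of the question (quoting every field and skipping the first row) can also be handled by a CSV library rather than by hand-built strings. The following is a rough sketch in Python, not PowerShell; the function name `requote`, the sample data, and the choice of two trailing empty columns (taken from the desired output line in the question) are assumptions.

```python
import csv
import io


def requote(lines, extra_empty=2):
    """Turn ';'-separated lines into fully quoted ','-separated lines.

    The first line is treated as a header and skipped.
    """
    out = io.StringIO()
    # QUOTE_ALL wraps every field, including empty ones, in double quotes.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # skip the first (header) row
    for row in reader:
        # Keep firstname, lastname, email and pad with empty columns.
        writer.writerow(row[:3] + [""] * extra_empty)
    return out.getvalue()


# Hypothetical sample input standing in for extract.csv
sample = ["firstname;lastname;email", "Ada;Lovelace;ada@example.com"]
print(requote(sample), end="")  # "Ada","Lovelace","ada@example.com","",""
```

Letting the writer do the quoting also takes care of fields that themselves contain a double quote, which manual concatenation would not.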

php strips [[:char_class:]] from the string

When concatenating mysql regex character classes in php they disappear from the resulting string i.e.:
$regexp_arr = array('(word1)', '(word2)');
$value = 'word3';
$regexp_str = implode('[[:space:]]', $regexp_arr);
$v1 = '[[:<:]](' . $value . ')';
echo $regexp_str;
// gives
'(word1)(word2)';
// instead of
'(word1)[[:space:]](word2)'
echo $v1;
// gives
'(word3)'
//instead of
'[[:<:]](word3)'
I've tried double quotation marks (") as well; the result is still the same.
Is there a special way to concatenate this in PHP? Why is '[[:char_class:]]' getting stripped?
The server's PHP version is 5.6.36.
In MODX, [[ and ]] are special characters used to indicate they are tags MODX needs to process. Even when you echo or retrieve it from the database, MODX will process them when rendering.
For debugging, you can follow-up your echo with an exit().
echo $regexp_str;
exit();
That short-circuits MODX and gives you the actual value of the string including the square brackets.
If you want the value to be visible in a MODX-rendered resource or template, then you'll have to replace them with their html entities first:
$regexp_str = str_replace(['[',']'], ['&#91;', '&#93;'], $regexp_str);
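The bracket-to-entity replacement is a plain character mapping. As an illustration only, here is the same substitution sketched in Python; `escape_brackets` and `BRACKET_ENTITIES` are made-up names, and `&#91;` / `&#93;` are the numeric HTML entities for `[` and `]`.

```python
# Translation table mapping each square bracket to its numeric HTML entity.
BRACKET_ENTITIES = str.maketrans({"[": "&#91;", "]": "&#93;"})


def escape_brackets(text: str) -> str:
    """Replace '[' and ']' so a tag parser no longer sees '[[' or ']]'."""
    return text.translate(BRACKET_ENTITIES)


print(escape_brackets("(word1)[[:space:]](word2)"))
# (word1)&#91;&#91;:space:&#93;&#93;(word2)
```

The escaped form is only meant for display; the string passed to the database query should keep its real brackets.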

Getting the characters of a string up to the first "."

I'm attempting to use Perl's gethostbyaddr function. The annoying thing is that it returns the entire domain name in scalar format. I want to parse out only the hostname and discard the rest.
I'm using split to divide the domain name into an array and then taking only the first value but this doesn't seem to work.
#!/usr/bin/perl
use Socket;
my $name;
my $hostname;
my @tmpStr;
$name = gethostbyaddr(inet_aton("192.168.2.3"), AF_INET);
print "$name\n";
@tmpStr = split ".", $name;
$hostname = $tmpStr[0];
print "Host name is $hostname\n";
When the above code is executed, I get the following:
dc1-ent.ent.ped.local
Host name is
According to this website the return value is not a string but is rather a scalar value and so my attempt at splitting it doesn't work.
I can't figure out how to convert it to a string before I can split it or parse out the hostname by itself.
The dot character has special meaning for regular expressions in Perl, and the 1st argument to split is a regular expression. You need to escape the dot:
use warnings;
use strict;
my $name = 'dc1-ent.ent.ped.local';
print "$name\n";
my @tmpStr = split /\./, $name;
my $hostname = $tmpStr[0];
print "Host name is $hostname\n";
This outputs:
dc1-ent.ent.ped.local
Host name is dc1-ent
I would write it like this:
use Socket;
use feature 'say';
my $name = gethostbyaddr(inet_aton('192.168.2.3'), AF_INET);
my ($host) = $name =~ /([^.]+)/;
say $host;
Your problem is not related to gethostbyaddr() but to what follows it.
Proof:
DB<1> $name = 'dc1-ent.ent.ped.local';
DB<2> @tmpStr = split ".", $name;
DB<3> print @tmpStr;
(nothing printed)
Try using split this way instead:
DB<8> $name = 'dc1-ent.ent.ped.local';
DB<9> @tmpStr = split(/\./, $name);
DB<10> print @tmpStr;
dc1-ententpedlocal
DB<11> print join(' ', @tmpStr);
dc1-ent ent ped local
DB<12> x @tmpStr;
0 'dc1-ent'
1 'ent'
2 'ped'
3 'local'
If you absolutely want to pass a string rather than a regex literal, you still have to escape the dot, because the string is parsed as a regular expression anyway. That is the merit of writing / / explicitly: it reminds you that some characters, such as the dot, have a special meaning there.
DB<1> $name = 'dc1-ent.ent.ped.local';
DB<2> @tmpStr = split('.', $name);
DB<3> print @tmpStr;
DB<4> @tmpStr = split('\.', $name);
DB<5> x @tmpStr
0 'dc1-ent'
1 'ent'
2 'ped'
3 'local'
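The difference between an unescaped and an escaped dot is not specific to Perl. Below is a small sketch in Python (the helper `short_hostname` is a made-up name) showing the same effect. In Python the unescaped split returns a list of empty strings; Perl additionally drops trailing empty fields, which is why the original script printed nothing.

```python
import re

name = "dc1-ent.ent.ped.local"

# An unescaped dot matches every character, so every character is treated
# as a separator and only empty strings remain between them.
print(re.split(".", name))

# Escaping the dot splits on literal dots only.
print(re.split(r"\.", name))  # ['dc1-ent', 'ent', 'ped', 'local']


def short_hostname(fqdn: str) -> str:
    """Return everything before the first dot, without using a regex."""
    return fqdn.partition(".")[0]


print(short_hostname(name))  # dc1-ent
```

When the separator is a fixed character, a plain string operation such as `partition` avoids the regex metacharacter problem altogether.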

How to remove the left side of this string?

My string is:
$dst = "Folder_1\SubFolder_2\3\4\5"
My goal is to have:
$dst_OK = "SubFolder_2\3\4\5"
I tried using the Split method like this:
$dst_OK = $dst.split("\")[0]
but the result is Folder_1 only.
You could use the following regex to remove the left side of the string:
$dst_OK = $dst -replace '^.*?\\'
However, since it looks like you are dealing with a path, you may want to consider the built-in methods of the System.IO.Path class.
You can do it with this snippet:
$first, $rest = "Folder_1\SubFolder_2\3\4\5" -split '\\'
$rest = $rest -join '\'
Another solution:
($dst -split "\\", 2)[1]
Solution 2
$dst.Substring($dst.IndexOf('\')+1)
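The answers above either split the string once or treat it as a path. For comparison, here is a sketch of both approaches in Python; `drop_first_component` is a made-up helper name, and `PureWindowsPath` is used so that backslashes are treated as separators on any operating system.

```python
from pathlib import PureWindowsPath


def drop_first_component(path: str) -> str:
    """Remove the leftmost component of a relative Windows-style path."""
    parts = PureWindowsPath(path).parts
    # Rebuild the path from everything after the first component.
    return str(PureWindowsPath(*parts[1:]))


dst = r"Folder_1\SubFolder_2\3\4\5"

# Path-based approach
print(drop_first_component(dst))  # SubFolder_2\3\4\5

# String-based approach: split once on the first backslash, keep the rest
print(dst.split("\\", 1)[1])      # SubFolder_2\3\4\5
```

The string split raises an IndexError when the input contains no backslash, while the path-based helper returns `.` in that case, so which one fits depends on how such inputs should be treated.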

Batch Rename Files - Append Lines 1 & 3

I'm using Windows 7. I have a bunch of text files, each containing one email message. Each starts this way:
FROM: Person
TO: Another Person
DATE: 01-Jan-11 at 18:12:00
SUBJECT: Whatever
I want to rename these files so that their names look like this:
2011-01-01 18.12 Email from Person to Another Person re Whatever.txt
Batch programming is all I know, and I don't know it very well. For purposes of restraining this to a project that I can understand quickly, I think my best solution will be to extract the essential data into a text file that I can then massage into a batch renaming file.
In that case, what I'm looking for is a batch file that will extract the data into single lines in a text file that I can then massage into shape with global edits. In other words, I think I'm looking for text lines in this format:
[current filename] [extracted date and time string] [from] [to] [subject]
Example:
file01.txt 01-Jan-11 at 18:12:00 from Person to Another Person re Whatever
If I've got lines like that, I can parse them into renaming commands pretty quickly in Excel.
Thanks!
Given that you're using Windows 7, I thought I'd suggest an alternative. Windows PowerShell is a very useful command-line tool that can be used for a ton of stuff. I think I solved your complete problem:
$folder = "C:\..."
$regex = "FROM: (.*) TO: (.*) DATE: (.*) at (.*) SUBJECT: (.*)"
$files = Get-ChildItem $folder *.txt
ForEach ($file in $files) {
    # The header spans the first four lines; join them so the single-line regex can match
    $line = (Get-Content $file.FullName -TotalCount 4) -join ' '
    $match = ([regex]$regex).matches($line)[0]
    $date = [DateTime]($match.Groups[3]).Value + [TimeSpan]($match.Groups[4]).Value
    $from = ($match.Groups[1])
    $to = ($match.Groups[2])
    $subject = ($match.Groups[5])
    # You can change the naming format in the brackets below
    Rename-Item $file.FullName -NewName ( $date.ToString("yyyy-MM-dd_HH-mm-ss") + " Email From " + $from + " to " + $to + " RE " + $subject + ".txt")
}
It makes a few assumptions (for example, that a match will always be found). You can easily adjust the naming format and other details. Save this code as a script (.ps1) and run it from the PowerShell prompt (powershell.exe).
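The name-building step can be checked separately from the actual renaming. The sketch below does that in Python rather than PowerShell. It is an illustration under assumptions: the function `new_name` is hypothetical, the four header lines are assumed to appear exactly as in the question, and English month abbreviations are assumed for `%b`.

```python
from datetime import datetime


def new_name(header_lines):
    """Build the target file name from the first four 'KEY: value' lines."""
    fields = {}
    for line in header_lines[:4]:
        # Split on the first colon only, so the time "18:12:00" stays intact.
        key, _, value = line.partition(":")
        fields[key.strip().upper()] = value.strip()
    # "01-Jan-11 at 18:12:00" -> datetime (two-digit year 11 becomes 2011)
    when = datetime.strptime(fields["DATE"], "%d-%b-%y at %H:%M:%S")
    stamp = when.strftime("%Y-%m-%d %H.%M")
    return (f"{stamp} Email from {fields['FROM']} to {fields['TO']} "
            f"re {fields['SUBJECT']}.txt")


sample = [
    "FROM: Person",
    "TO: Another Person",
    "DATE: 01-Jan-11 at 18:12:00",
    "SUBJECT: Whatever",
]
print(new_name(sample))
# 2011-01-01 18.12 Email from Person to Another Person re Whatever.txt
```

This sketch does not remove characters that are invalid in Windows file names (such as `:` or `?` in a subject line); real input would need that step before renaming.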
