match a number (date) in a matrix - Matlab - search

I have an interesting problem: take the last date in a matrix and find the same day one month earlier. E.g., if the date today is Oct-10-2011, search for Sep-10-2011, or, if it is not present, the closest day before Sep-10-2011 in the matrix.
The matrix has multiple IDs, and their last trading dates may not be the same.
A vectorized solution is desired. Thanks!
mat = [
1000 734507 11 ; 1000 734508 12 ; 1000 734509 13 ;
2001 734507 21 ; 2001 734508 22 ; 2001 734513 23 ; 2001 734516 25 ;
1000 734536 14 ; 1000 734537 15 ; 1000 734538 16 ;
2001 734536 26 ; 2001 734537 27 ; 2001 734544 28 ; 2001 734545 29;2001 734546 30
];
% datestr(mat(:,2))
[~,m,~] = unique(mat(:,1), 'rows', 'last') ;
lastDay = mat(m,:) ;
I tried using addtodate to get the date one month earlier here, but it fails when there is more than one row.
Once I have the last date for each ID, I need to get exact_day_lastmonth. After that, I need the data on that day OR on the nearest day before it (it should be < exact_day_lastmonth).
Expected result:
current_lastdays = [1000 734538 16 ; 2001 734546 30] ; % 4-Feb-2011, 12-Feb-2011
matching_lastmon = [1000 734507 11 ; 2001 734513 23] ; % 4-Jan-2011, 10-Jan-2011

Unless you want to risk rather large arrays with complicated indexing, I think a loop is the way to go.
mat = [ 1000 734507 11 ; 1000 734508 12 ; 1000 734509 13 ;
2001 734507 21 ; 2001 734508 22 ; 2001 734513 23 ; 2001 734516 25 ;
1000 734536 14 ; 1000 734537 15 ; 1000 734538 16 ;
2001 734536 26 ; 2001 734537 27 ; 2001 734544 28 ;2001 734545 29;2001 734546 30];
%# datestr(mat(:,2))
[~,m,~] = unique(mat(:,1), 'rows', 'last') ;
lastDay = mat(m,:) ;
matching_lastmon = lastDay; %# initialize matching_lastmon
%# caution: check how your MATLAB version's datenum treats month 0 (last days in January)
oneMonthBefore = datenum(bsxfun(@minus,datevec(lastDay(:,2)),[0,1,0,0,0,0]));
for iDay = 1:size(lastDay,1)
%# the following assumes that the array `mat` is sorted within each ID (or globally sorted by date)
idx = find(mat(:,1)==lastDay(iDay,1) & mat(:,2) <= oneMonthBefore(iDay),1,'last');
if isempty(idx)
matching_lastmon(iDay,2:3) = NaN;
else
matching_lastmon(iDay,:) = mat(idx,:);
end
end
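For comparison outside MATLAB, here is a minimal Python sketch of the same two steps (latest row per ID, then the last row of that ID on or before the date one month earlier). It assumes the (id, datenum, value) layout of mat, converts datenums with the fixed 366-day offset between MATLAB datenums and Python date ordinals, and clamps the day to the end of the shorter month (e.g. 31-Mar becomes 28-Feb), which is an assumption the question does not settle. Like the answer above, it loops rather than vectorizing; the function names are illustrative.

```python
import calendar
from datetime import date

# MATLAB datenum counts days from year 0; Python ordinals start at 0001-01-01.
# For proleptic Gregorian dates the two differ by a constant 366 days.
DATENUM_OFFSET = 366


def one_month_before(d):
    """Same day in the previous month, clamped to that month's last day."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def match_last_month(rows):
    """rows: (id, datenum, value) tuples. Returns {id: matching row or None}."""
    # Latest row per ID (the MATLAB code uses unique(..., 'last') for this).
    latest = {}
    for row in rows:
        if row[0] not in latest or row[1] > latest[row[0]][1]:
            latest[row[0]] = row

    result = {}
    for ident, (_, dn, _) in latest.items():
        target_day = one_month_before(date.fromordinal(dn - DATENUM_OFFSET))
        target = target_day.toordinal() + DATENUM_OFFSET
        # Rows of this ID on or before the target date; keep the latest one.
        candidates = [r for r in rows if r[0] == ident and r[1] <= target]
        result[ident] = max(candidates, key=lambda r: r[1]) if candidates else None
    return result
```

Where the MATLAB loop writes NaN for an ID with no earlier row, this sketch returns None.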

Related

Parsing cal output in POSIX compliant shell script by read command

I am trying to write a POSIX-compliant script that prints every month of year $3 in which day $2 of the month (1, 2, 3, ...) falls on weekday $1 (for example Mo, Tu, ...).
Example:
Input: ./task1.sh Tu 5 2006
Output:
   September 2006
Mo Tu We Th Fr Sa Su
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30
   December 2006
Mo Tu We Th Fr Sa Su
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
I have written this script:
#!/bin/sh
year=$3
dayInMonth=$2
dayInWeek=$1
index=1
while expr $index '!=' 13 >/dev/null; do
cal -m $index $year| tail -n +2| while read Mo Tu We Th Fr Sa Su ; do
eval theDay='$'$dayInWeek
if [ "$theDay" = "$dayInMonth" ]; then
cal -m $index $year;
fi
done
index=$(expr $index + 1)
done
But there is a problem with reading the third line of cal output: in that line the day numbers usually don't start in the Mo column. How can I parse the third line of cal output so that the numbers in $Mo, $Tu, $We, ... are always correct?
Update: You've added the requirement for a POSIX-conforming solution. date -d, as used in my answer, is not POSIX-conforming. I'll keep the answer for those who are using GNU/Linux.
Btw, the following command gives you, in a POSIX-conforming way, the day-of-week column of Jan 5, 2006 (1 = first column of cal's output):
cal 01 2006 | awk -v d=5 'NR>2{off=(NR==3)?7-NF:0; for(i=1;i<=NF;i++){if($i==d){print i+off;exit}}}'
You need to wrap a little shell script around that.
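The core of the fix for the original read loop is that only the first week line of cal output is short on the left, so its day numbers must be shifted right by 7 minus the number of fields; a short last week is missing cells on the right instead. A minimal Python sketch of that padding, assuming the Monday-first layout of cal -m shown in the question (parse_week is an illustrative name):

```python
# Column order of `cal -m` (Monday first).
COLUMNS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def parse_week(line, first_week):
    """Map one week line of `cal -m` output to {column: day string, '' if blank}."""
    days = line.split()
    blanks = [""] * (len(COLUMNS) - len(days))
    # A short first week is missing cells on the left; a short last week
    # is missing cells on the right.
    cells = blanks + days if first_week else days + blanks
    return dict(zip(COLUMNS, cells))


calendar_text = """\
   September 2006
Mo Tu We Th Fr Sa Su
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30"""

weeks = calendar_text.splitlines()[2:]  # skip the title and header lines
for i, week in enumerate(weeks):
    cells = parse_week(week, first_week=(i == 0))
    if cells["Tu"] == "5":
        print("September 2006 has the 5th on a Tuesday")
```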
I would use the date command, like this:
#!/bin/bash
dayofweek="${1}"
day="${2}"
year="${3}"
for m in {01..12} ; do
date=$(LANG=C date -d "${year}-${m}-${day}" +'%a %B')
read d m <<< "${date}"
[ "${d}" = "${dayofweek}" ] && echo "${m}"
done
Results:
$ bash script.sh Thu 05 2006
January
October
It's easier to check dates with the date command.
for month in {1..12}; do
if [[ $(date -d $(printf "%s-%2.2d-%2.2d" "$year" "$month" "$day") "+%a") == "Tue" ]]; then
cal -m $month $year;
fi
done
The script loops over the 12 months and builds a date from the year, month and day. With +%a, the date command outputs the day of the week as a three-letter abbreviation.
If you want the day of the week as a number, use +%u and == 2 in the if statement.

bash script to get the spent time from a file

I have a log file that records when my scripts switch:
Tue Oct 24 11:57:54 IRST 2017 Script switched from abc to XYZ
Tue Oct 24 14:03:41 IRST 2017 Script switched from XYZ to ZEN
Tue Oct 24 15:43:16 IRST 2017 Script switched from ZEN to XYZ
Tue Oct 24 17:07:25 IRST 2017 Script switched from XYZ to ZEN
Tue Oct 24 18:40:48 IRST 2017 Script switched from ZEN to XLS
Tue Oct 24 19:52:26 IRST 2017 Script switched from XLS to XYZ
Tue Oct 24 20:20:30 IRST 2017 Script switched from XYZ to ZEN
Tue Oct 24 20:36:06 IRST 2017 Script switched from ZEN to XLS
Tue Oct 24 21:01:03 IRST 2017 Script switched from XLS to XYZ
Tue Oct 24 21:47:47 IRST 2017 Script switched from XYZ to ZEN
How do I get the total time spent on each script with bash,
so that the output looks like this:
abc 2 hours 30 min 40 sec
XYZ 3 hours 23 min 45 sec
ZEN ...
XLS ...
Assuming you have a log file named test.txt, the following script should work:
#!/bin/bash
dtime=0
sname=""
while read -r line
do
_dtime=$(echo "$line" | awk '{print $1,$2,$3,$4}')
_sname=$(echo "$line" | awk '{print $10}')
_dtimesec=$(date +%s -d "$_dtime")
_timediff=$(( _dtimesec - dtime ))
# the "from" script on this line is the one that ran since the previous switch
[ "x$sname" != "x" ] && printf "%s %d hours %d minutes %d seconds\n" "$_sname" $((_timediff/3600)) $((_timediff%3600/60)) $((_timediff%60))
dtime=$_dtimesec
sname=$_sname
done < test.txt
This produces the following output:
$ ./test
XYZ 2 hours 5 minutes 47 seconds
ZEN 1 hours 39 minutes 35 seconds
XYZ 1 hours 24 minutes 9 seconds
ZEN 1 hours 33 minutes 23 seconds
XLS 1 hours 11 minutes 38 seconds
XYZ 0 hours 28 minutes 4 seconds
ZEN 0 hours 15 minutes 36 seconds
XLS 0 hours 24 minutes 57 seconds
XYZ 0 hours 46 minutes 44 seconds
EDIT
To find the total amount of time spent by each script, this modified script should do the job:
#!/bin/bash
dtime=0
sname=""
namearr=()
timearr=()
while read line
do
_dtime=$(echo "$line" | awk '{print $1,$2,$3,$4}')
_sname=$(echo "$line" | awk '{print $10}')
_dtimesec=$(date +%s -d "$_dtime")
_timediff=$(( _dtimesec - dtime ))
_rc=1
for n in "${!namearr[@]}"
do
if [ "${namearr[$n]}" == "$_sname" ]; then
_rc=0
ind=$n
break
fi
done
if [ $_rc -eq 0 ]; then
timearr[$ind]=$(( ${timearr[$ind]} + _timediff ))
else
if [ $dtime -eq 0 ] && [ "x$sname" == "x" ]; then
:
else
namearr+=($_sname)
timearr+=($_timediff)
fi
fi
dtime=$_dtimesec
sname=$_sname
done < test.txt
echo "Total time spent by each script:"
echo
for i in "${!namearr[@]}"
do
_gtime=${timearr[$i]}
printf "%s %d hours %d minutes %d seconds\n" "${namearr[$i]}" $((_gtime/3600)) $((_gtime%3600/60)) $((_gtime%60))
done
Result:
$ ./test
Total time spent by each script:
XYZ 4 hours 44 minutes 44 seconds
ZEN 3 hours 28 minutes 34 seconds
XLS 1 hours 36 minutes 35 seconds
You can use the following gawk program:
time_spent.awk
BEGIN {
months["Jan"] = "01"
months["Feb"] = "02"
months["Mar"] = "03"
months["Apr"] = "04"
months["May"] = "05"
months["Jun"] = "06"
months["Jul"] = "07"
months["Aug"] = "08"
months["Sep"] = "09"
months["Oct"] = "10"
months["Nov"] = "11"
months["Dec"] = "12"
}
{
split($4, time, ":")
# mktime() manual: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
now = mktime($6" "months[$2]" "$3" "time[1]" "time[2]" "time[3])
prv = $(NF-2)
cur = $(NF)
start[cur] = now
spent[prv]+=start[prv]?now-start[prv]:0
}
END {
for(i in spent) {
printf "%s seconds spent in %s\n", spent[i], i
}
}
Save it as time_spent.awk and run it like this:
gawk -f time_spent.awk input.log
Output for the example above:
5795 seconds spent in XLS
0 seconds spent in abc
17084 seconds spent in XYZ
12514 seconds spent in ZEN
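The gawk program keeps one piece of state per script: when it was last switched to. On every switch it credits the elapsed time to the script being left. A minimal Python sketch of that same bookkeeping, assuming the 12-field line format shown in the question, a single time zone for all lines, and an English/C locale for the month abbreviations (time_spent is a hypothetical name):

```python
from collections import defaultdict
from datetime import datetime


def time_spent(lines):
    """Total seconds per script from 'Script switched from A to B' log lines."""
    started = {}              # script name -> time it was last switched to
    spent = defaultdict(int)  # script name -> accumulated seconds
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        # Fields: weekday month day HH:MM:SS zone year Script switched from A to B
        # The zone (IRST) is ignored, assuming every line uses the same one.
        now = datetime.strptime(
            " ".join([fields[1], fields[2], fields[3], fields[5]]), "%b %d %H:%M:%S %Y"
        )
        prev, cur = fields[-3], fields[-1]
        # Credit the script being left with the time since it was switched to;
        # a script with no recorded start (like abc) gets 0, as in the gawk version.
        spent[prev] += int((now - started[prev]).total_seconds()) if prev in started else 0
        started[cur] = now
    return dict(spent)
```

Pass it any iterable of lines, for example an open file object; the totals are in seconds, so they can be fed to a formatter such as the display_time function in the next answer.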
#!/usr/bin/env python
import sys
from datetime import datetime
intervals = (
    ('weeks', 604800),  # 60 * 60 * 24 * 7
    ('days', 86400),    # 60 * 60 * 24
    ('hours', 3600),    # 60 * 60
    ('minutes', 60),
    ('seconds', 1),
)
def display_time(seconds, granularity=2):
    result = []
    for name, count in intervals:
        value = seconds // count
        if value:
            seconds -= value * count
            if value == 1:
                name = name.rstrip('s')
            result.append("{} {}".format(value, name))
    return ' '.join(result[:granularity])
# open in text mode: splitting bytes with a str separator fails under Python 3
with open(sys.argv[1]) as df:
    lines = df.readlines()
totals = {}
for i in range(len(lines)-1):
    # fields: weekday month day time zone year Script switched from A to B
    (_,t1,t2,t3,_,t4,_,_,_,_,_,scr) = lines[i].split()
    st = datetime.strptime(' '.join([t1,t2,t3,t4]), "%b %d %H:%M:%S %Y")
    (_,t1,t2,t3,_,t4,_,_,_,_,_,_) = lines[i+1].split()
    et = datetime.strptime(' '.join([t1,t2,t3,t4]), "%b %d %H:%M:%S %Y")
    if scr not in totals:
        totals[scr] = 0
    totals[scr] += (et-st).seconds
    print("{} {}".format(scr,display_time((et-st).seconds, 3)))
print("\nTotals:")
for scr in totals:
    print("{} {}".format(scr,display_time(totals[scr], 3)))
Here is the output, assuming your times are in a file named logfile:
$ ./times.py logfile
XYZ 2 hours 5 minutes 47 seconds
ZEN 1 hour 39 minutes 35 seconds
XYZ 1 hour 24 minutes 9 seconds
ZEN 1 hour 33 minutes 23 seconds
XLS 1 hour 11 minutes 38 seconds
XYZ 28 minutes 4 seconds
ZEN 15 minutes 36 seconds
XLS 24 minutes 57 seconds
XYZ 46 minutes 44 seconds
Totals:
XLS 1 hour 36 minutes 35 seconds
XYZ 4 hours 44 minutes 44 seconds
ZEN 3 hours 28 minutes 34 seconds
$
Note: I lifted the handy display_time function from here: Python function to convert seconds into minutes, hours, and days.

Search the front of a string to replace the end of the string Perl

After getting some help, here is what I have come up with (I was hoping to learn by putting multiple scripts together). The script below does the HW and OW replacements, but the code inside the if statement never runs.
#!/usr/bin/perl
use strict;
use warnings 'all';
$^I = '.bak'; # create a backup copy
while (<>) {
s/HW/HT/g; # do the replacement of HW with HT
s/OW/OT/g; # do a second replacement OW with OT
# Hopefully run the if statement
my @parts = /\s*\S+/g;
if ( $parts[1] =~ s/([HO])W/$1T/ ) {
$parts[5] = sprintf '%*d',
length $parts[5],
$parts[1] =~ /HT/ ? 2002 : 2001;
}
print @parts, "\n";
}
I have left the rest of the post below in case people have similar problems.
I would like to use Perl to replace text in a file by searching for specific letters at the beginning of the string. For example here is a section of the file:
6 HT 4.092000 4.750000 -0.502000 0 5 7
7 HT 5.367000 5.548000 -0.325000 0 5 6
8 OT -5.470000 5.461000 1.463000 0 9 10
9 HT -5.167000 4.571000 1.284000 0 8 10
10 HT -4.726000 6.018000 1.235000 0 8 9
11 OT -4.865000 -5.029000 -3.915000 0 12 13
12 HT -4.758000 -4.129000 -3.608000 0 11 13
I would like to use HT as the search term and replace the "0" in the column of zeros with 2002. I know how to replace the entire column of zeros, but I don't know how to make it line-specific. After handling HT, I then need to search for OT and replace the 0 in that column with 2001.
Basically I need to search for a string that identifies the line and replace a specific string on that line, while the text in between varies. The output needs to be written to new_file.xyz. Also, I will be doing this repeatedly on lots of files.
Thanks for your help.
Here is the Python code I was using. I could not figure out how to make "file.txt" a variable that takes the file name typed after the command, so I have to change "file.txt" to the name of the file every time I use it. I also could not get it to print to a new file.
python code:
#!/usr/bin/python
with open('file.txt') as f:
    lines = f.readlines()
new_lines = []
for line in lines:
    if "HT" in line:
        new_line = line.replace(' 0 ', ' 2002 ')
        new_lines.append(new_line)
    else:
        new_lines.append(line)
content = ''.join(new_lines)
print(content)
I have been able to do some of the work in Perl and was hoping to have a single script that carries out all of the replacement steps in order, since all of the HT entries start out as HW and all of the OT entries start out as OW.
Perl script:
#!/usr/bin/perl
use strict;
use warnings;
$^I = '.bak'; # create a backup copy
while (<>) {
s/HW/HT/g; # do the replacement
s/OW/OT/g; # do a second replacement
print; # print to the modified file
}
Oh, and unfortunately I am limited to Python 2.7 (someone suggested code for Python 3). I am purely a user of a university cluster but will ask about upgrading Python.
Update
So what you really want to do is change all HW to HT and OW to OT in the second column, and set column six to 2001 for OW and 2002 for HW?
That looks like this
use strict;
use warnings 'all';
while ( <DATA> ) {
my @parts = /\s*\S+/g;
if ( $parts[1] =~ s/([HO])W/$1T/ ) {
$parts[5] = sprintf '%*d',
length $parts[5],
$1 eq 'H' ? 2002 : 2001;
}
print @parts, "\n";
}
__DATA__
6 HW 4.092000 4.750000 -0.502000 0 5 7
7 HW 5.367000 5.548000 -0.325000 0 5 6
8 OW -5.470000 5.461000 1.463000 0 9 10
9 HW -5.167000 4.571000 1.284000 0 8 10
10 HW -4.726000 6.018000 1.235000 0 8 9
11 OW -4.865000 -5.029000 -3.915000 0 12 13
12 HW -4.758000 -4.129000 -3.608000 0 11 13
output
6 HT 4.092000 4.750000 -0.502000 2002 5 7
7 HT 5.367000 5.548000 -0.325000 2002 5 6
8 OT -5.470000 5.461000 1.463000 2001 9 10
9 HT -5.167000 4.571000 1.284000 2002 8 10
10 HT -4.726000 6.018000 1.235000 2002 8 9
11 OT -4.865000 -5.029000 -3.915000 2001 12 13
12 HT -4.758000 -4.129000 -3.608000 2002 11 13
In case it is important, this solution takes care to keep the positions of all the values constant within each line.
The lines to be modified are selected by checking whether the second field contains the string HT or OT. I don't know whether that is adequate, given the small data sample you offer.
This is for demonstration purposes. I trust you are able to modify the code to open an external file if necessary and read the data from that file handle instead of DATA.
use strict;
use warnings 'all';
while ( <DATA> ) {
my @parts = /\s*\S+/g;
if ( $parts[1] =~ /[HO]T/ ) {
$parts[5] = sprintf '%*d',
length $parts[5],
$parts[1] =~ /HT/ ? 2002 : 2001;
}
print @parts, "\n";
}
__DATA__
6 HT 4.092000 4.750000 -0.502000 0 5 7
7 HT 5.367000 5.548000 -0.325000 0 5 6
8 OT -5.470000 5.461000 1.463000 0 9 10
9 HT -5.167000 4.571000 1.284000 0 8 10
10 HT -4.726000 6.018000 1.235000 0 8 9
11 OT -4.865000 -5.029000 -3.915000 0 12 13
12 HT -4.758000 -4.129000 -3.608000 0 11 13
output
6 HT 4.092000 4.750000 -0.502000 2002 5 7
7 HT 5.367000 5.548000 -0.325000 2002 5 6
8 OT -5.470000 5.461000 1.463000 2001 9 10
9 HT -5.167000 4.571000 1.284000 2002 8 10
10 HT -4.726000 6.018000 1.235000 2002 8 9
11 OT -4.865000 -5.029000 -3.915000 2001 12 13
12 HT -4.758000 -4.129000 -3.608000 2002 11 13
It looks like the file uses fixed-width fields, so:
sub trim { $_[0] =~ s/^\s+//r =~ s/\s+\z//r }
while (<>) {
my $code = trim(substr($_, 2, 4));
if ($code eq "HW") {
substr($_, 2, 4, sprintf('%4s', 'HT'));
substr($_, 43, 6, sprintf('%6s', 2002));
}
elsif ($code eq "OW") {
substr($_, 2, 4, sprintf('%4s', 'OT'));
substr($_, 43, 6, sprintf('%6s', 2001));
}
print;
}
Cleaner:
sub parse {
my ( @format, @row );
while ($_[0] =~ /\G\s*(\S+)/g) {
push @row, $1;
push @format, '%'.( $+[0] - $-[0] ).'s';
}
return ( join('', @format)."\n", @row );
}
while (<>) {
my ($format, @row) = parse($_);
if ($row[1] eq "HW") { $row[1] = "HT"; $row[5] = 2002; }
elsif ($row[1] eq "OW") { $row[1] = "OT"; $row[5] = 2001; }
printf($format, @row);
}
It seems you want to use a regular expression to perform the string substitution. IMO, you should do all your operations in a single substitution: it is no more complicated, it is probably faster, and it is less error-prone (because it is shorter).
Here is how I have understood your requirement:
In your lines, you have an H or an O followed by a T or a W that you want to force to T, then three fields you want to copy, then a fourth field. If the fourth field is 0, you want to replace it with 2002 or 2001 depending on whether the letter is H or O.
This gives:
while (my $line = <>) {
$line =~ s/(\s*)([HO])(T|W)(\s+\S+\s+\S+\s+\S+)(\s+\d+)/$1.$2.'T'.$4.($5 == 0 ? ($2 eq 'H' ? ' 2002' : ' 2001') : $5)/eg;
print $line;
}
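The question also asked for a Python version that takes the file name on the command line and writes a new file. Here is a minimal sketch written to run under both Python 2.7 and Python 3 (the asker mentioned being limited to 2.7). It mirrors the first Perl answer's /\s*\S+/g split, treats column 2 as the atom code and column 6 as the zero column, and keeps at least one space before the new value; the script name convert.py and the function names are hypothetical.

```python
import re
import sys

# Column-2 code -> (new code, value for column 6). HT/OT lines are included so
# files that were already partly converted are handled too.
REPLACEMENTS = {
    "HW": ("HT", "2002"),
    "HT": ("HT", "2002"),
    "OW": ("OT", "2001"),
    "OT": ("OT", "2001"),
}
FIELD = re.compile(r"\s*\S+")  # a field plus the whitespace before it


def convert_line(line):
    """Rewrite one line, keeping the other fields' spacing unchanged."""
    parts = FIELD.findall(line)
    if len(parts) < 6 or parts[1].strip() not in REPLACEMENTS:
        return line
    code = parts[1].strip()
    new_code, value = REPLACEMENTS[code]
    parts[1] = parts[1].replace(code, new_code)
    # Right-align in the old width, but always keep one separating space.
    parts[5] = (" " + value).rjust(len(parts[5]))
    return "".join(parts) + "\n"


def convert_file(src, dst):
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(convert_line(line))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.stderr.write("usage: %s INPUT OUTPUT\n" % sys.argv[0])
    else:
        convert_file(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of `python convert.py file.xyz new_file.xyz`, which also makes it easy to run over many files from a shell loop.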

Matlab - selecting Only Numbers from a sentence with Text & Numbers

I have a very large text file like this:
[1] score in three tests in math :stud1 = 28 26 23
[2] score in three tests in science :stud1 = 23 28 30
[3] score in three tests in english :stud1 = 25 23 27
[4] score in three tests in history :stud1 = 27 24 21
and so on.
I want to collect all the numbers in the text file and arrange them in a table like this:
stud1
28 26 23
23 28 30
25 23 27
27 24 21
Any help will be very useful.
There are lots of ways to go about this. Here is one way to do it using the str2num function:
table = [];
test = '[1] score in three tests in math :stud1 = 28 26 23';
pos = strfind(test,'='); %find the '=' token
if(pos > 0)
subStr = test(pos+1:end); %get the sub string
row = str2num(subStr); %parse the row
table(end+1,:) = row; %append the row to your table
end
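The answer above parses a single line; the whole file needs the same step applied to every line, collecting rows per student. A minimal Python sketch of that loop, assuming every relevant line ends in the ":name = numbers" shape shown in the question (lines without it are skipped, and collect_scores is a hypothetical name):

```python
import re

# ":<student> = <numbers>" at the end of each line, e.g. ":stud1 = 28 26 23".
SCORE_LINE = re.compile(r":\s*(\w+)\s*=\s*([\d\s]+)$")


def collect_scores(lines):
    """Return {student: [[scores of line 1], [scores of line 2], ...]}."""
    table = {}
    for line in lines:
        match = SCORE_LINE.search(line.strip())
        if match:
            student, numbers = match.group(1), match.group(2)
            table.setdefault(student, []).append([int(n) for n in numbers.split()])
    return table


def print_table(table):
    for student, rows in table.items():
        print(student)
        for row in rows:
            print(" ".join(str(n) for n in row))
```

Grouping by the name after the colon keeps rows for different students apart, in case the file holds more than stud1.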

A Scaling Paginator

<< 1 2 3 4 ... 15 16 17 ... 47 48 49 50 >>
<< 1 2 3 4 5 6 7 ... 47 48 49 50 >>
<< 1 2 3 4 ... 44 45 46 47 48 49 50 >>
(the bold is the selected page)
Is there any clever logic out there that creates scaling pagination like this? I have created one of these before, but it ended up as a mess of logic statements.
The language I am doing this in now is PHP, but examples or tips in any language would be appreciated.
By scaling I mean that when there are only a few pages, the pagination displays this:
<< 1 2 3 4 5 6 7 >>
As the number of pages grow to a certain point, the pagination stops showing all numbers and starts splitting them up.
<< 1 2 3 4 ... 47 48 49 50 >>
<< 1 2 3 4 5 6 ... 47 48 49 50 >>
<< 1 2 3 4 5 6 7 8 ... 47 48 49 50 >>
<< 1 2 3 4 ... 7 8 9 ... 47 48 49 50 >>
<< 1 2 3 4 ... 15 16 17 ... 47 48 49 50 >>
<< 1 2 3 4 ... 44 45 46 47 48 49 50 >>
<< 1 2 3 4 ... 47 48 49 50 >>
(Note: the actual numbers, and how many are shown before and after, are not relevant.)
Sorry for the blob of code, but here goes. Hopefully the comments are enough to tell you how it works; if not, leave a comment and I might add some more.
/**
* Get a spread of pages, for when there are too many to list in a single <select>
* Adapted from phpMyAdmin common.lib.php PMA_pageselector function
*
 * @param integer total number of items
 * @param integer the current page
 * @param integer the total number of pages
 * @param integer the number of pages below which all pages should be listed
 * @param integer the number of pages to show at the start
 * @param integer the number of pages to show at the end
 * @param integer how often to show pages, as a percentage
 * @param integer the number to show around the current page
*/
protected function pages($rows, $pageNow = 1, $nbTotalPage = 1, $showAll = 200, $sliceStart = 5, $sliceEnd = 5, $percent = 20, $range = 10)
{
if ($nbTotalPage < $showAll)
return range(1, $nbTotalPage);
// Always show the first $sliceStart pages
$pages = range(1, $sliceStart);
// Always show the last $sliceEnd pages
for ($i = $nbTotalPage - $sliceEnd; $i <= $nbTotalPage; $i++)
$pages[] = $i;
$i = $sliceStart;
$x = $nbTotalPage - $sliceEnd;
$met_boundary = false;
while ($i <= $x)
{
if ($i >= ($pageNow - $range) && $i <= ($pageNow + $range))
{
// If our pageselector comes near the current page, we use 1
// counter increments
$i++;
$met_boundary = true;
}
else
{
// We add the percentage increment to our current page to
// hop to the next one in range (max(1, ...) avoids a zero step,
// which would loop forever when $nbTotalPage < $percent)
$i = $i + max(1, floor($nbTotalPage / $percent));
// Make sure that we do not cross our boundaries.
if ($i > ($pageNow - $range) && !$met_boundary)
$i = $pageNow - $range;
}
if ($i > 0 && $i <= $x)
$pages[] = $i;
}
// Because of the ellipsing around the current page, some numbers may appear twice,
// so we make the array unique:
sort($pages);
return array_unique($pages);
}
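If the percentage-based hops of the phpMyAdmin code are not needed, a simpler window-based scheme reproduces the examples in the question: always show the first and last few pages plus a small window around the current page, and put "..." wherever numbers are skipped. This is a different, swapped-in approach rather than the algorithm above, sketched in Python; edge and around are hypothetical parameter names, and bolding the current page is left to the caller.

```python
def page_list(current, total, edge=4, around=1):
    """Pages to show; None marks a "..." gap.

    Always shows the first and last `edge` pages plus `around` pages on each
    side of `current`. A gap hiding exactly one page shows that page instead.
    """
    pages = set(range(1, min(edge, total) + 1))
    pages.update(range(max(1, total - edge + 1), total + 1))
    pages.update(range(max(1, current - around), min(total, current + around) + 1))

    result, prev = [], 0
    for page in sorted(pages):
        if page - prev == 2:
            result.append(prev + 1)  # fill a one-page hole rather than print "..."
        elif page - prev > 2:
            result.append(None)  # larger hole: ellipsis
        result.append(page)
        prev = page
    return result


def render(current, total, **options):
    items = page_list(current, total, **options)
    return "<< " + " ".join("..." if p is None else str(p) for p in items) + " >>"
```

For example, render(16, 50) gives the first line of the question's examples, and with only 7 pages every page is listed. Filling one-page holes avoids an ellipsis that hides a single number.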
