Create new column based upon existing column in NuShell - nushell

I am new to NuShell and would like to create a quick way to locate my files by their "tags", which are in the filename. This is the method used by tagspaces. For instance, using the ls command my directory might look like this:
# | name                                                  | type | size  | modified
0 | Documents                                             | dir  | 134 b | a week ago
1 | Fake doc1 [Fake clj client project1].txt              | file | 15 B  | 2 weeks ago
2 | Downloaded_doc.pdf                                    | file | 150 B | 4 weeks ago
3 | Fake doc2 [Important :12/31/2022 client project1].txt | file | 365 B | 1 week ago
However, I would like to create a new column labeled "tags" that contains only the tags (the terms inside the brackets) from the name column. I think this regex will match the bracketed information (including the brackets at this point, which I don't want): \[.+?\]
I would like the end result to look like this:
# | name                                                  | tags                                  | type | size  | modified
0 | Documents                                             |                                       | dir  | 134 b | a week ago
1 | Fake doc1 [Fake clj client project1].txt              | Fake clj client project1              | file | 15 B  | 2 weeks ago
2 | Downloaded_doc.pdf                                    |                                       | file | 150 B | 4 weeks ago
3 | Fake doc2 [Important :12/31/2022 client project1].txt | Important :12/31/2022 client project1 | file | 365 B | 1 week ago
What would be the best way of doing this? I have read the docs but need to see more real life code before I really "get" this shell.
Thank you!

$ ls | insert tags {|item|
    if $item.name =~ '\[.*\]' {
        $item.name | str replace '.*\[(.*)\].*' '$1'
    }
} | move tags --after name
╭───┬───────────────────────────────────────────────────────┬───────────────────────────────────────┬──────┬─────────┬───────────────╮
│ # │ name                                                  │ tags                                  │ type │ size    │ modified      │
├───┼───────────────────────────────────────────────────────┼───────────────────────────────────────┼──────┼─────────┼───────────────┤
│ 0 │ Documents                                             │                                       │ dir  │ 4.0 KiB │ 9 minutes ago │
│ 1 │ Downloaded_doc.pdf                                    │                                       │ file │ 0 B     │ 9 minutes ago │
│ 2 │ Fake doc1 [Fake clj client project1].txt              │ Fake clj client project1              │ file │ 0 B     │ 9 minutes ago │
│ 3 │ Fake doc2 [Important :12-31-2022 client project1].txt │ Important :12-31-2022 client project1 │ file │ 0 B     │ 8 minutes ago │
╰───┴───────────────────────────────────────────────────────┴───────────────────────────────────────┴──────┴─────────┴───────────────╯
The insert built-in creates a new column.
It takes a closure, which runs on each item in the table. The value of $item becomes each directory entry.
All we really care about here is the $item.name. First we test it to see if it has tags in the [] format.
If it does, we set the tags column value to the text matching inside the []. If it doesn't, the if produces no value, so the tags cell stays empty for that entry.
This might be a bit tricky to parse at first, but keep in mind that the last value (in this case the only value) in the closure or block is essentially its "return value". There's no need to echo or return it, since Nushell has implicit output.
For instance, if that line was simply the string "Has tags", then the tags column would include that string (rather than the actual matched tags) for each directory entry that matched the if statement.
Finally, after the tags column has been inserted, we move it directly after the name column.
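For readers who want to try the regex outside of Nushell, here is a minimal Python sketch of the same capture-group idea. It reuses the pattern from the str replace call above; the function name extract_tags and the sample list are made up for illustration, and this is not how Nushell implements it internally.

```python
import re

# Same pattern as in the `str replace` call: everything before the brackets,
# a captured group for the text inside them, and everything after.
TAG_PATTERN = re.compile(r".*\[(.*)\].*")


def extract_tags(name):
    """Return the text inside the brackets, or None when there are no brackets."""
    match = TAG_PATTERN.fullmatch(name)
    return match.group(1) if match else None


# Hypothetical file names, mirroring the directory listing above.
names = [
    "Documents",
    "Downloaded_doc.pdf",
    "Fake doc1 [Fake clj client project1].txt",
    "Fake doc2 [Important :12-31-2022 client project1].txt",
]

for name in names:
    # Entries without brackets print None, like the empty cells in the table.
    print(f"{name!r} -> {extract_tags(name)!r}")
```

The capture group is what drops the brackets: the pattern still matches them, but only the group is kept, which is the same role '$1' plays in the Nushell version.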

Related

Filter by time range in polars rust

I have some time series data, and for each day I would like to keep only a certain time range. In my case I want everything between 09:00 and 16:00 (i.e. all values from 09:00:00 to 16:00:00 inclusive).
I've tried and read as much documentation as I can, but since most of the Polars documentation is written for Python, I am having quite a hard time finding a solution.
And the documentation for polars-rust is still missing a lot of examples and explanatory text.
This is the only question I can find on the subject, but its answer does not work for me:
Filter by hour of datetime in rust polars
It runs but no filtering is performed or returned.
I've tried:
let df = LazyCsvReader::new(path)
    .with_parse_dates(true)
    .has_header(true)
    .finish()?
    .collect()?;
let a = df
    .lazy()
    .filter(
        col("time")
            .dt()
            .hour()
            .gt_eq(9)
            .and(col("time").dt().hour().lt_eq(16)),
    )
    .collect();
and it almost works. I get everything from 9:00-16:55.
┌─────────────────────┐
│ time │
│ --- │
│ datetime[μs] │
╞═════════════════════╡
│ 2019-09-03 09:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:35:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:40:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:25:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:45:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:55:00 │
└─────────────────────┘
I've tried to add .and(col("time").dt().minute().eq(0)) but that will affect every hour, not just the "16-hour". I could use hour().lt_eq(15) as a quick fix but I will still miss the last 5 min (15:55).
I can't seem to find any way to use an if/else statement in these expressions.
Could anyone point me in the right direction?
One way to solve this is to get everything before 16:00 OR at exactly 16:00:
let a = df
    .lazy()
    .filter(
        col("time")
            .dt()
            .hour()
            .gt_eq(9)
            .and(
                // less than 16 handles everything up to 15:59
                col("time")
                    .dt()
                    .hour()
                    .lt(16)
                    .or(
                        // include also 16:00
                        col("time")
                            .dt()
                            .hour()
                            .eq(16)
                            .and(
                                col("time").dt().minute().eq(0)
                            )
                    )
            ),
    )
    .collect();
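To sanity-check the boolean logic without a DataFrame, the same condition can be written as a plain predicate. This is only a sketch in Python, not Polars code; the function names and the sample timestamps are assumptions for illustration. It also shows an alternative that compares the time of day directly, which gives a different result only for timestamps with non-zero seconds, such as 16:00:30.

```python
from datetime import datetime, time


def in_window_by_parts(ts):
    """Mirror of the hour/minute expression above."""
    return ts.hour >= 9 and (ts.hour < 16 or (ts.hour == 16 and ts.minute == 0))


def in_window_by_time(ts, start=time(9, 0), end=time(16, 0)):
    """Alternative: compare the time of day directly, inclusive on both ends."""
    return start <= ts.time() <= end


# Hypothetical 5-minute samples around both edges of the window.
samples = [
    datetime(2019, 9, 3, 8, 55),
    datetime(2019, 9, 3, 9, 0),
    datetime(2019, 9, 3, 15, 55),
    datetime(2019, 9, 3, 16, 0),
    datetime(2019, 9, 3, 16, 5),
]

for ts in samples:
    print(ts.time(), in_window_by_parts(ts), in_window_by_time(ts))
```

For data sampled on whole minutes the two predicates agree, so the choice is mostly about which one is easier to express in the query engine at hand.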

cronExpression 0 * * * *?

Please tell me the meaning of "0 * * * * ?" in cronExpression.
<bean id="batchJobTrigger" class="org.springframework.scheduling.quartz.CronTriggerFactoryBean">
    <property name="jobDetail" ref="batchJobDetail"/>
    <property name="cronExpression">
        <value>0 * * * * ?</value>
    </property>
</bean>
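Read as a standard five-field cron expression, "0 * * * *" means "do this job at the beginning of every hour." Quartz, which this Spring bean uses, puts a seconds field first, so the six-field expression in the question is read differently there.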
From Wikipedia:
# ┌───────────── min (0 - 59)
# │ ┌────────────── hour (0 - 23)
# │ │ ┌─────────────── day of month (1 - 31)
# │ │ │ ┌──────────────── month (1 - 12)
# │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to
# │ │ │ │ │ Saturday, or use names; 7 is also Sunday)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * command to execute
The question mark is non-standard, and I don't think it really applies in this case. From this StackOverflow answer's reference to this webpage, we find:
? ("no specific value") - useful when you need to specify something in one of the two fields in which the character is allowed, but not the other. For example, if I want my trigger to fire on a particular day of the month (say, the 10th), but don't care what day of the week that happens to be, I would put "10" in the day-of-month field, and "?" in the day-of-week field.
Quartz cron fields, in order: Second Minute Hour Day-of-month Month Day-of-week Year (optional)
0 * * * *
The first second of every minute of every hour of every day of every month.
?
Specifies no particular value. This is useful when you need to specify a value for one of the two fields Day-of-month or Day-of-week, but not the other.
0 * * * * ?
therefore means "every minute" (at second 0).
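A quick way to keep the field order straight is to label each part of the expression. The following is a small Python sketch, not part of Quartz or Spring; the names QUARTZ_FIELDS and label_quartz_fields are invented for this example, and it only splits on whitespace without validating the values.

```python
# Field order used by Quartz cron expressions (the year is optional).
QUARTZ_FIELDS = [
    "second", "minute", "hour", "day-of-month", "month", "day-of-week", "year",
]


def label_quartz_fields(expression):
    """Pair each whitespace-separated part of the expression with its field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


labelled = label_quartz_fields("0 * * * * ?")
for field, value in labelled.items():
    print(f"{field:>12}: {value}")
```

Seen this way, the leading 0 lands in the second field and the first * in the minute field, which is why the trigger fires once per minute rather than once per hour.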

Cron expression that spans across next day

My job requirement is:
1. Every 15 minutes
2. Every day from 8:00 am to 03:00 am the next day
So the job keeps running every 15 minutes from 08:00 am to 03:00 am the next day.
Can this be achieved using a cron expression?
I tried this, but it does not seem to help:
0 0/15 8-3 * * ?
Thanks,
Wajid
*/15 0-2,8-23 * * * test.sh
─┬── ───┬──── ┬ ┬ ┬
 │      │     │ │ │
 │      │     │ │ │
 │      │     │ │ └───── day of week (all)
 │      │     │ └─────── month (all)
 │      │     └───────── day of month (all)
 │      └─────────────── hour (between 0-2 and between 8-23)
 └────────────────────── min (every 15 minutes)
Run every 15 minutes, from 12:00am to 02:45am and from 08:00am to 23:45 of every day.
0-2,8-23 is equivalent to 0,1,2,8,9,10,...,23 while */15 is equivalent to 0,15,30,45.
The above will not include 03:00, because the last execution would be at 02:45; if we used 0-3 instead of 0-2, it would also execute at 03:15, 03:30 and 03:45.
To also cover 03:00 (02:59, actually), we need to be a bit more verbose:
14,29,44,59 0-2,8-23 * * * test.sh
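To double-check which times a schedule like this produces, the minute and hour fields can be expanded by hand. Below is a simplified Python sketch, not a full cron parser: expand_field is a hypothetical helper that only understands "*", single numbers, ranges, comma lists, and a "/step" on "*" or a range.

```python
def expand_field(spec, low, high):
    """Expand a cron field such as "*/15" or "0-2,8-23" into a sorted list of ints."""
    values = set()
    for part in spec.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


minutes = expand_field("*/15", 0, 59)    # [0, 15, 30, 45]
hours = expand_field("0-2,8-23", 0, 23)  # 0, 1, 2 and 8 through 23

# Every hour/minute combination the first schedule fires at during one day.
runs = [f"{h:02d}:{m:02d}" for h in hours for m in minutes]

print(runs[0], runs[11], runs[12], runs[-1])  # 00:00 02:45 08:00 23:45
print("03:00" in runs)                        # False
```

Swapping the minute field for "14,29,44,59" in the call to expand_field shows the second schedule the same way, with the last run of the night at 02:59.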

Is it possible to set cron for 10:01?

Right now I have 0 22 * * *, but my script checks to see if it's between 6 pm and 10 pm, and will not run at exactly 10 pm, can I set cron to run at 10:01?
Indeed you can:
1 22 * * * command to execute
will do what you need
How to set up a cronjob in general:
# * * * * * command to execute
# │ │ │ │ │
# │ │ │ │ │
# │ │ │ │ └───── day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
# │ │ │ └────────── month (1 - 12)
# │ │ └─────────────── day of month (1 - 31)
# │ └──────────────────── hour (0 - 23)
# └───────────────────────── min (0 - 59)
Sidenote
Special characters in cronjobs
Asterisk (*)
The asterisk indicates that the cron expression matches for all values of the field. E.g., using an asterisk in the 4th field (month) indicates every month.
Slash ( / )
Slashes describe increments of ranges. For example 3-59/15 in the 1st field (minutes) indicate the third minute of the hour and every 15 minutes thereafter. The form "*/..." is equivalent to the form "first-last/...", that is, an increment over the largest possible range of the field.
Comma ( , )
Commas are used to separate items of a list. For example, using "MON,WED,FRI" in the 5th field (day of week) means Mondays, Wednesdays and Fridays.
Hyphen ( - )
Hyphens define ranges. For example, 2000-2010 indicates every year between 2000 and 2010 AD, inclusive.
Percent ( % )
Percent-signs (%) in the command, unless escaped with backslash (\), are changed into newline characters, and all data after the first % are sent to the command as standard input.
Source of explanation: http://en.wikipedia.org/wiki/Cron

Cron every day at 6 pm

I'm trying to figure out how to set cron to run every day at 6 p.m. Is this correct?
The reason I'm asking is this is for a production server, so I need to be sure.
* 18 * * *
0 18 * * * command to be executed
^ you need to set the minute, too. Otherwise it would run every minute during the 18th hour (18:00 through 18:59).
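The difference between the two lines is easy to see by counting how often each one would fire in a day. This is a deliberately tiny Python sketch; field_matches and runs_per_day are made-up helpers that only understand "*" and a single number, and only the minute and hour fields are modelled.

```python
def field_matches(spec, value):
    """Minimal matcher: supports only "*" and a single number."""
    return spec == "*" or int(spec) == value


def runs_per_day(minute_spec, hour_spec):
    """List every HH:MM in one day at which the minute and hour fields both match."""
    return [
        f"{h:02d}:{m:02d}"
        for h in range(24)
        for m in range(60)
        if field_matches(hour_spec, h) and field_matches(minute_spec, m)
    ]


every_minute = runs_per_day("*", "18")  # "* 18 * * *"
once = runs_per_day("0", "18")          # "0 18 * * *"

print(len(every_minute), every_minute[0], every_minute[-1])  # 60 18:00 18:59
print(once)                                                  # ['18:00']
```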
How to set up a cronjob in general:
# * * * * * command to execute
# │ │ │ │ │
# │ │ │ │ │
# │ │ │ │ └───── day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
# │ │ │ └────────── month (1 - 12)
# │ │ └─────────────── day of month (1 - 31)
# │ └──────────────────── hour (0 - 23)
# └───────────────────────── min (0 - 59)
What does Asterisk (*) mean
The asterisk indicates that the cron expression matches for all values of the field. E.g., using an asterisk in the 4th field (month) indicates every month.
Sidenote
Other special characters in cronjobs
Slash ( / )
Slashes describe increments of ranges. For example 3-59/15 in the 1st field (minutes) indicate the third minute of the hour and every 15 minutes thereafter. The form "*/..." is equivalent to the form "first-last/...", that is, an increment over the largest possible range of the field.
Comma ( , )
Commas are used to separate items of a list. For example, using "MON,WED,FRI" in the 5th field (day of week) means Mondays, Wednesdays and Fridays.
Hyphen ( - )
Hyphens define ranges. For example, 2000-2010 indicates every year between 2000 and 2010 AD, inclusive.
Percent ( % )
Percent-signs (%) in the command, unless escaped with backslash (\), are changed into newline characters, and all data after the first % are sent to the command as standard input.
(source: https://en.wikipedia.org/wiki/Cron)
You should use:
0 18 * * *
This would execute the cron at the 0th minute at 6 PM. You can use a tool like this one in the future.
