Splunk: Extracting values for table - search

I have events in my logs that look like
{
linesPerSec: 1694.67
message: Status:
rowCount: 35600000
severity: info
}
When I make a search like:
index="apps" app="my-api" message="*Status:*" | table _time, linesPerSec, rowCount
This is what my table ends up looking like
How do I get the number value separated from the key for both linesPerSec and rowCount? I want to see all instances. I tried using values(linesPerSec), but that seemed to aggregate to unique values only.
Thanks,
Nate

The answer with an explanation can be found here: https://answers.splunk.com/answers/756524/extracting-values-for-table.html

Is that your complete query? You mention using values(), but there's no stats command in your search. BTW, values() displays unique values; use list() to see all of them.
You may be able to use extract to get the numbers, but I think rex will work better. Try this search:
index="apps" app="my-api" message="*Status:*"
| rex "linesPerSec:\s+(?<linesPerSec>\d+\.\d+)"
| rex "rowCount:\s+(?<rowCount>\d+)"
| table _time, linesPerSec, rowCount
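As an aside, if you do want an aggregated view, here is a minimal sketch of the list() alternative mentioned above (unlike values(), it keeps duplicates; it reuses the rex extractions from the search above):
index="apps" app="my-api" message="*Status:*"
| rex "linesPerSec:\s+(?<linesPerSec>\d+\.\d+)"
| rex "rowCount:\s+(?<rowCount>\d+)"
| stats list(linesPerSec) AS linesPerSec, list(rowCount) AS rowCount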

Related

Kusto query language split # character and take last item

If I have a string for example:
"this.is.a.string.and.I.need.the.last.part"
I am trying to get the last part of the string after the last ".", which in this case is "part"
How do I achieve this?
One way I tried was to split the string on "."; I get an array back, but then I don't know how to retrieve the last item in the array.
| extend ToSplitstring = split("this.is.a.string.and.I.need.the.last.part", ".")
gives me:
["this", "is","a","string","and","I","need","the","last", "part"]
and a second try I have tried this:
| extend ToSubstring = substring(myString, lastindexof(myString, ".")+1)
but Kusto does not have a lastindexof function.
Anyone with tips?
You can access the last member of the array using the negative index -1.
e.g. this:
print split("this.is.a.string.and.I.need.the.last.part", ".")[-1]
returns a single table, with a single column and a single record, with the value part
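The same negative index works on a table column too; a minimal sketch (the datatable and column name here are made up for illustration):
datatable(myString:string)
[
    "this.is.a.string.and.I.need.the.last.part"
]
| extend LastPart = tostring(split(myString, ".")[-1])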
You can try the code below, and feel free to change it to meet your needs:
let lastIndexof = (input:string, lookup: string) {
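// indexof() takes (source, lookup, start_index, length, occurrence);
// length -1 means search to the end of the string, and countof()
// counts how many times lookup occurs, so this finds the last occurrence.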
indexof(input, lookup, 0, -1, countof(input,lookup))
};
your_table_name
| extend ToSubstring = substring("this.is.a.string.and.I.need.the.last.part", lastIndexof("this.is.a.string.and.I.need.the.last.part", ".")+1)

PowerShell on CSV file - looking for string depending on string

I need your help regarding PowerShell programming on a CSV file.
I've done some searching but cannot find what I'm looking for (or perhaps I don't know the technical terms). Basically, I have an Excel workbook with a large amount of data (more or less 38 columns x 350,000 rows), and there are a couple of formulas that take hours to calculate.
I was first wondering if PowerShell could speed up the calculation a bit compared to Excel. The calculations taking most of my time are in fact not that complex (at least at first glance). My data is more or less constructed like this:
Ref Title
----- --------------------------
A/001 "free_text"
A/002 "free_text A/001 free_text"
... ...
A/005 "free_text A/004 free_text"
A/006 "free_text"
B/001 "free_text"
B/002 "free_text"
C/001 "free_text"
C/002 "free_text"
...
C/050 "free_text C/047 free_text"
... ...
C/103 "free_text"
D/001 "free_text"
D/002 "free_text D/001 free_text"
... ....
Basically the data is as follows:
the Ref field contains unique values, in {letter}/{incremental value} format.
In some rows, the Title field may reference one of the Ref values. For example, in line 2, the Title calls for the A/001 Ref. In the last row, the Title calls for the D/001 Ref, etc.
There is no logic pattern defining when this ref could be called up in a title. This is random.
However, what I'm 100% sure of is the following:
The Ref called in the Title always belongs to the same {letter} block. For example: the string 'C/047' in the Title field can only be found in the block where the Ref {letter} is C.
The row whose Title calls a Ref is always located 'after' (in a lower row than) the Ref it refers to. In other words, I cannot have a line with the following pattern:
Ref Title
------------ -----------------------------------------
{letter}/i {free_text {letter}/j free_text} with j > i
→ This is not possible.
→ j is always < i
I've used these characteristics in Excel to minimize my lookup arrays, but it still takes an hour to calculate everything.
I've therefore looked into PowerShell and started to 'play' a bit with the CSV, looping with ForEach-Object and hoping I would get quicker results. Up to now I've basically ended up looping twice over my CSV file.
$CSV1 = Import-Csv myfile.csv
$CSV2 = Import-Csv myfile.csv
$CSV1 | ForEach-Object {
    # find Title
    $TitSearch = $_.$Ref
    $CSV2 | ForEach-Object {
        if ($_.$Title -eq $TitSearch) {
            myinstructions
        }
    }
}
It works, but it's really, really long. So I then tried the following instead of the inner $CSV2 | ForEach-Object:
$CSV | where {$_.$Title -eq $TitleSearch} | % $Ref
In either case it's too long and not efficient at all. Additionally, with these 2 solutions I'm not using the above characteristics, which could reduce the lookup array, and as already stated, I end up looping twice over the CSV file from beginning to end.
Questions:
Is there a leaner way to do this?
Am I wasting my time with PowerShell?
I thought about creating 1 file per Ref {letter} block (1 file for block A, 1 for B, etc.). However, I have about 50,000 blocks to create. Or create them one by one, carry out the analysis, put the results in a new file, and delete them. Would that be quicker?
Note: this is for work, to be used by other colleagues, and Excel and PowerShell are really the only software we may use. I know VBA, but OK... In the end I'm curious about how, and if, this can be solved in a simple manner using PowerShell.
As far as I can see, your base algorithm does N^2 iterations (~120 billion). There is a standard way to make it efficient - you need to build a hashtable first. A hashtable is a key/value store where lookup is pretty much instantaneous, so the algorithm's time complexity becomes ~N.
PowerShell has a built-in data type for that. In your case the key would be the ref, and the value an array of cell data (assuming your table is something like: ref, title, col1, ..., colN):
$hash = @{}
foreach ($row in $table) { $hash.Add($row.ref, @($row.title, $row.col1, ...)) }
# it will take 350K steps to generate it
# then you can iterate over it again
foreach ($key in $hash.Keys) {
    $key                              # access the current ref
    $rowData = $hash.$key             # access the current row's elements (by index)
    $refRowData = $hash[$rowData[$j]] # look up other rows, assuming the lookup reference is in some column
}
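Putting it together, a minimal end-to-end sketch (assumptions: the file is myfile.csv with Ref and Title columns as in the question, and an embedded reference matches the {letter}/{digits} pattern, e.g. A/001):
$rows = Import-Csv myfile.csv
$hash = @{}
foreach ($row in $rows) { $hash[$row.Ref] = $row }    # one pass, ~350K inserts

foreach ($row in $rows) {
    # look for an embedded reference such as A/001 inside the Title
    if ($row.Title -match '[A-Z]/\d{3}') {
        $cited = $hash[$Matches[0]]   # O(1) lookup instead of a second full scan
        if ($cited) {
            # myinstructions: $row is the citing row, $cited the row it refers to
        }
    }
}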
So that's the general idea of how to solve the time issue. To be honest, I don't believe you need to reinvent the wheel and code it yourself. What you need is a relational database. Since you have Excel, you should have MS Access too. Just import your data in there, make ref and title an index, and then all you need to do is a self join. MS Access sucks, but I'm sure it will handle 350K rows just fine.
Ideally you'd want to get a database on some corporate MSSQL server (open a ticket, talk to your manager, etc.). It will calculate all that in seconds, and then you can link the output to a spreadsheet as well.
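For illustration, the self join could look something like this (a hedged sketch in T-SQL; the table and column names are assumed, and Access would use & instead of + for concatenation):
SELECT a.Ref, a.Title, b.Ref AS CitedRef
FROM data AS a
JOIN data AS b
    ON a.Title LIKE '%' + b.Ref + '%';   -- "a's Title mentions b's Ref"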

Sorting based on a certain value in a string

I have a file with contents like this :
666500872101_002.log
738500861101_003.log
738500861101_002.log
666500872101_001.log
741500881101_001.log
738500861101_001.log
741500881101_002.log
666500872101_003.log
741500881101_003.log
666500872101_004.log
I need to sort the rows first on the values in character positions 5 to 8 (e.g. the 0088 in 741500881101_003.log) and then on the part number of the log (the 003 in 741500881101_003.log), to get something like this:
738500861101_001.log
738500861101_002.log
738500861101_003.log
666500872101_001.log
666500872101_002.log
666500872101_003.log
666500872101_004.log
741500881101_001.log
741500881101_002.log
741500881101_003.log
I can't get any good results using sort; please help.
You can use the sort command with the following options:
sort -n -k1.5,1.8 -n -k1.14,1.16 fileToSort.log
Options:
-n for numerical sorting
-k1.5,1.8 and -k1.14,1.16 to define the sort keys: characters 5 to 8 of the first field (the 0086/0087/0088 block) and characters 14 to 16 (the part number)
Example:
$ sort -n -k1.5,1.8 -n -k1.14,1.16 fileToSort
738500861101_001.log
738500861101_002.log
738500861101_003.log
666500872101_001.log
666500872101_002.log
666500872101_003.log
666500872101_004.log
741500881101_001.log
741500881101_002.log
741500881101_003.log
I solved this problem as part of learning Spark. I am not a UNIX shell programmer, hence the thought of solving the problem with Spark.
val logList = Array("666500872101_002.log","738500861101_003.log","738500861101_002.log","666500872101_001.log","741500881101_001.log","738500861101_001.log","741500881101_002.log","666500872101_003.log","741500881101_003.log","666500872101_004.log")
val logListRDD = sc.parallelize(logList)
logListRDD.map(x=>((x.substring(4,8), x.slice(x.indexOfSlice("_") +1, x.indexOfSlice("."))),x)).sortByKey().values.collect.take(20)
Output:
Array[String] = Array(738500861101_001.log, 738500861101_002.log, 738500861101_003.log, 666500872101_001.log, 666500872101_002.log, 666500872101_003.log, 666500872101_004.log, 741500881101_001.log, 741500881101_002.log, 741500881101_003.log)
Explaining what I did
sc.parallelize(logList) - this step creates an RDD, the core data structure of Spark.
map(x=>((x.substring(4,8), x.slice(x.indexOfSlice("_") +1, x.indexOfSlice("."))),x)) - this takes each element of the array and generates a key/value pair. In our case the value is the ***.log entry and the key is a tuple of the substrings we want to sort on, e.g. (0086, 001). A key/value pair will look like ((0086, 001), 738500861101_001.log).
sortByKey() - sorts the data based on the key generated above
values - drops the keys and keeps just the values
collect.take(20) - collects the results and displays the first 20 on screen

knex.raw with existing columns array using .first statement

Here is existing code:
knex("products")
.first("id", "name", "ingredients")
...
So currently it just uses an array of column names.
Now I want to add a calculated column here. It would consist of a constant + product.id.
For the product with id 1 it would be "api/v1/img/1".
For the product with id 222 it would be "api/v1/img/222".
Its alias should be "image".
I have to use knex.raw somehow, but I don't understand the correct syntax for using it with .first().
I'm sorry, I'm unable to understand the question fully. What kind of result are you trying to achieve? Maybe something like this?
knex("products")
.select('*', knex.raw(`'api/v1/img' || ?? as computed`, ['products.id']))
.first()
Like this: https://runkit.com/embed/9okme0czge8z
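To get the alias and path exactly as described in the question, the raw expression can presumably be passed straight to .first() alongside the existing column names (a sketch assuming a database where || performs string concatenation, e.g. PostgreSQL):
knex("products")
  .first("id", "name", "ingredients",
    knex.raw("'api/v1/img/' || ?? as image", ["products.id"]))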

Pycassa: how to query parts of a Composite Type

Basically I'm asking the same thing as in this question but for the Python Cassandra library, PyCassa.
Let's say you have a composite type storing data like this:
[20120228:finalscore] = '31-17'
[20120228:halftimescore]= '17-17'
[20120221:finalscore] = '3-14'
[20120221:halftimescore]= '3-0'
[20120216:finalscore] = '54-0'
[20120216:halftimescore]= '42-0'
So, I know I can easily slice based off of the first part of the composite type by doing:
>>> cf.get('1234', column_start=('20120216',), column_finish=('20120221',))
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
But if I only want the finalscore, I would assume I could do:
>>> cf.get('1234', column_start=('20120216', 'finalscore'),
column_finish=('20120221', 'finalscore'))
To get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0')])
But instead, I get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
Same as the 1st call.
Am I doing something wrong? Should this work? Or is there some syntax using the cf.get(... columns=[('20120216', 'finalscore')]) ? I tried that too and got an exception.
According to http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, I should be able to do something like this...
Thanks
If you know all the components of the composite column, then you should use the 'columns' option:
cf.get('1234', columns=[('20120216', 'finalscore')])
You said you got an error trying to do this, but I would suggest trying again. It works fine for me.
When you are slicing composite columns, you need to think about how they are sorted. Composite columns sort first by the left-most component, and then by each component toward the right. So in your example the columns would look like this:
+------------+---------------+------------+---------------+------------+----------------+
| 20120216 | 20120216 | 20120221 | 20120221 | 20120228 | 20120228 |
| finalscore | halftimescore | finalscore | halftimescore | finalscore | halftimescore |
+------------+---------------+------------+---------------+------------+----------------+
Thus when you slice from ('20120216', 'finalscore') to ('20120221', 'finalscore') you get both values for '20120216'. To make your query work as you want it to you could change the column_finish to ('20120216', 'halftimescore').
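In pycassa's keyword syntax, the two suggestions above would look something like this (a sketch; '1234' is the row key from the question):
# exact column, when all components are known
cf.get('1234', columns=[('20120216', 'finalscore')])
# slice with the adjusted column_finish described above
cf.get('1234', column_start=('20120216', 'finalscore'),
       column_finish=('20120216', 'halftimescore'))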
