How to visualize a count of all values in an array field in Kibana - logstash

I am having trouble creating a particular type of visualization in Kibana. My events in Kibana are statistics on communications between two ip address. Two of the fields are lists of ports used by the particular ip address. An example of the fields would be:
ip1 = 192.168.101.2
ip2 = 192.168.101.3
ip2Ports = 80,443
ip1Ports = 80,57000,0
I would like to have a top count of all the values such as
port count
80 2
57000 1
443 1
I have been able to parse ip2Ports to be ip2Ports_List.column1, ip2Ports_List.column2, ect, but I can only choose one term with term aggregation in the visualization. I can split the chart, but that leads to separate counts for each field. If I go by the original ip2Ports field, it is just aggregated as the string such as, "80,443".
Is it even possible to create a top count visualization of fields with multiple values? If so, how would I do so. If not, is there a way to restructure my data so I can do it? Thank you!

My issue stemmed from the format of the values being sent in by Logstash. I had thought that the 'ip2Ports_List.column1' format, which was a result from using the csv filter, was part of an array. It wasn't. After analyzing it, 'ip2Ports_List.column1' didn't seem to be much different from a new field.
Elastic needed an array to give me the visualization I wanted. I wasn't sure what the best way to produce it was, so I just ended up using the ruby filter. This is what the code ended up looking like:
ruby {
code => "fields = event.get('portsIp').split(',')
event.set('portsIpArray',fields)"
}
Where 'portsIp' looked something like "80,443". Splitting it turned 'portsIp' into a Ruby array. I just set that array as the value for a new event field, 'portsIpArray'.
From there when I tried visualize the 'portsIpArray' field, it looked exactly how I wanted it to, treating each port as separate value, and still associating each port with the same event/field.
Extra:
Also something I discovered is if you're writing your code like I was, directly in the Logstash conf file, Logstash doesn't like it if you use double quotes within the double quoted code. In hindsight it makes sense, but it doesn't give a clear error so it's difficult to figure out.

Related

Excel formula that deciphers text and outputs properly

Is there a way to have excel read text and decipher whether it does or doesn’t have certain character/letters?
Here is my example sheet
I am looking for something that deciphers using
these guidelines. 1. If entry has a / then output
URL. 2. If entry is not a URL and has only numbers
and special characters then output IP. 3. If entry is
not a URL or IP and has more than 1
dots/periods/decimals then output HOST. If entry
is not a URL, IP, or HOST (or only has 1
dot/period/decimal) then output FQDN.
Here is an example of what I'm looking for
I have tried using these below:
=IF(LEN(A1)-LEN(SUBSTITUTE(A1,"/“,””))=1,"URL",IF(LEN(A1)-LEN(SUBSTITUTE(A1,”.”,""))=1,"FQDN"‚IF(LEN(A1)-LEN(SUBSTITUTE(A1,".",”"))>1,"HOST")))
That works for reading URL, HOST, and FQDN;
however, it reads IP's as HOST's.
I have also used
=IF(OR(ISNUMBER(SEARCH({"A","B","C",”D","E","F”,"G","H","I","J","K",”L”,"M",”N","O","P","Q”,"R","S","T","U","V","W",”X","Y","Z"},A1))),””,"IP")
That works for reading if an entry contains letters and if not it outputs IP.
Is there a way to combine these or simplify what I am trying to do?
Thanks!
This produces the desired output for your sample (at least)
=IF(COUNTIF(A1,"*/*"),"URL",IF(ISNUMBER(VALUE(SUBSTITUTE(A1,".",""))),"IP",IF(LEN(A1)-LEN(SUBSTITUTE(A1,".",""))>1,"HOST","FQDN")))
A possible solution (tested with O365) :
=IFS(ISNUMBER(VALUE(LEFT(A1:A5)))=TRUE,"IP",LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5,".",""))>1,"HOST",LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5,"/",""))=1,"URL",LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5,".",""))=1,"FQDN")
Classical way (in B1) :
=IF(ISERROR(SEARCH("/",A1))=FALSE,"URL",IF(ISERROR(VALUE(LEFT(A1)))=FALSE,"IP",IF(LEN(A1)-LEN(SUBSTITUTE(A1,".",""))>1,"HOST",IF(ISBLANK(A1)=TRUE,"","FQDN"))))
Output :

Create automated report from web data

I have a set of multiple API's I need to source data from and need four different data categories. This data is then used for reporting purposes in Excel.
I initially created web queries in Excel, but my Laptop just crashes because there is too many querie which have to be updated. Do you guys know a smart workaround?
This is an example of the API I will source data from (40 different ones in total)
https://api.similarweb.com/SimilarWebAddon/id.priceprice.com/all
The data points I need are:
EstimatedMonthlyVisits, TopOrganicKeywords, OrganicSearchShare, TrafficSources
Any ideas how I can create an automated report which queries the above data on request?
Thanks so much.
If Excel is crashing due to the demand, and that doesn't surprise me, you should consider using Python or R for this task.
install.packages("XML")
install.packages("plyr")
install.packages("ggplot2")
install.packages("gridExtra")
require("XML")
require("plyr")
require("ggplot2")
require("gridExtra")
Next we need to set our working directory and parse the XML file as a matter of practice, so we're sure that R can access the data within the file. This is basically reading the file into R. Then, just to confirm that R knows our file is in XML, we check the class. Indeed, R is aware that it's XML.
setwd("C:/Users/Tobi/Documents/R/InformIT") #you will need to change the filepath on your machine
xmlfile=xmlParse("pubmed_sample.xml")
class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
Now we can begin to explore our XML. Perhaps we want to confirm that our HTTP query on Entrez pulled the correct results, just as when we query PubMed's website. We start by looking at the contents of the first node or root, PubmedArticleSet. We can also find out how many child nodes the root has and their names. This process corresponds to checking how many entries are in the XML file. The root's child nodes are all named PubmedArticle.
xmltop = xmlRoot(xmlfile) #gives content of root
class(xmltop)#"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
xmlName(xmltop) #give name of node, PubmedArticleSet
xmlSize(xmltop) #how many children in node, 19
xmlName(xmltop[[1]]) #name of root's children
To see the first two entries, we can do the following.
# have a look at the content of the first child entry
xmltop[[1]]
# have a look at the content of the 2nd child entry
xmltop[[2]]
Our exploration continues by looking at subnodes of the root. As with the root node, we can list the name and size of the subnodes as well as their attributes. In this case, the subnodes are MedlineCitation and PubmedData.
#Root Node's children
xmlSize(xmltop[[1]]) #number of nodes in each child
xmlSApply(xmltop[[1]], xmlName) #name(s)
xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
xmlSApply(xmltop[[1]], xmlSize) #size
We can also separate each of the 19 entries by these subnodes. Here we do so for the first and second entries:
#take a look at the MedlineCitation subnode of 1st child
xmltop[[1]][[1]]
#take a look at the PubmedData subnode of 1st child
xmltop[[1]][[2]]
#subnodes of 2nd child
xmltop[[2]][[1]]
xmltop[[2]][[2]]
The separation of entries is really just us, indexing into the tree structure of the XML. We can continue to do this until we exhaust a path—or, in XML terminology, reach the end of the branch. We can do this via the numbers of the child nodes or their actual names:
#we can keep going till we reach the end of a branch
xmltop[[1]][[1]][[5]][[2]] #title of first article
xmltop[['PubmedArticle']][['MedlineCitation']][['Article']][['ArticleTitle']] #same command, but more readable
Finally, we can transform the XML into a more familiar structure—a dataframe. Our command completes with errors due to non-uniform formatting of data and nodes. So we must check that all the data from the XML is properly inputted into our dataframe. Indeed, there are duplicate rows, due to the creation of separate rows for tag attributes. For instance, the ELocationID node has two attributes, ValidYN and EIDType. Take the time to note how the duplicates arise from this separation.
#Turning XML into a dataframe
Madhu2012=ldply(xmlToList("pubmed_sample.xml"), data.frame) #completes with errors: "row names were found from a short variable and have been discarded"
View(Madhu2012) #for easy checking that the data is properly formatted
Madhu2012.Clean=Madhu2012[Madhu2012[25]=='Y',] #gets rid of duplicated rows
Here is a link that should help you get started.
http://www.informit.com/articles/article.aspx?p=2215520
If you have never used R before, it will take a little getting used to, but it's worth it. I've been using it for a few years now and when compared to Excel, I have seen R perform anywhere from a couple hundred percent faster to many thousands of percent faster than Excel. Good luck.

Combining phrases from list of words Python3

doing my best to grab information out of a lot of pdf files. Have them in a dictionary format where the key is a given date and the values are a list of occupations.
looks like this when proper:
'12/29/2014': [['COUNSELING',
'NURSING',
'NURSING',
'NURSING',
'NURSING',
'NURSING']]
However, occasionally there are occupations with several words which cannot be reliably understood in single word-form, such as this:
'11/03/2014': [['DENTISTRY',
'OSTEOPATHIC',
'MEDICINE',
'SURGERY',
'SOCIAL',
'SPEECH-LANGUAGE',
'PATHOLOGY']]
Notice that "osteopathic medicine & surgery" and "speech-language pathology" are the full text for two of these entries. This gets hairier when we also have examples of just "osteopathic medicine" or even "medicine."
So my question is this - How should I go about testing combinations of these words to see if they match more complex occupational titles? I can use the same order of the words, as I have maintained that from the source.
Thanks!

Using indexed types for ElasticSearch in Titan

I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:
After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:
config.setProperty("storage.backend","cassandra")
config.setProperty("storage.hostname","127.0.0.1")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","db/es")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
The second part of the script sets up the indexed types:
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();
The third part loads in the data from the CSV files, this has been tested and works fine.
My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:
g.E.has("property",CONTAINS,"test")
returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch I get a "no such property" error. I can also perform exact string matches and any numerical comparisons including greater or less than, however I expect the default indexing method is being used over ElasticSearch in these instances.
Due to the lack of errors when I try to run a more advanced ES query, I am at a loss on what is causing the problem here. Is there anything I may have missed?
Thanks,
Adam
I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the follwing script (just paste it into your Gremlin REPL):
config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()
alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])
g.commit()
// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()
The last 2 lines should return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong in your code, it would be good if you can share your full code (e.g. in Github or in a Gist).
Cheers,
Daniel

Can I import SAP tables that were exported by SE16?

I have exported the contents of a table with transaction SE16, by selecting all the entries and going selecting Download, unconverted.
I'd like to import these entries into another system (where the same table exists and is active).
Furthermore, when I import, there's a possibility that the specific key already exists for a number of entries (old entries).
Other entries won't have a field with the same key present in the table where they're to be imported (new entries).
Is there a way to easily update my table in the second system with the file provided from the first system? If needed, I can export the data in the 3 other format types (Spreadsheet, Rich text format and HTML format). It seems to me though like the spreadsheet and rich text formats sometimes corrupt the data, and the html is far too verbose.
[EDIT]
As per popular demand, the table i'm trying to export / import is a Z table whose fields are all numeric, character, date or time fields (flat data types).
I'm trying to do it like this because the clients don't have any basis resource to help them transport, and would like to "kinna" automate the process of updating one of the tables in one system.
At the moment it's a business request to do it like this, but I'm open to suggestions (and the clients are open too)
Edit
Ok I doubt that what you describe in your comment exists out of the box, but you can easily write something like that:
Create a method (or function module if that floats your boat) that accepts the following:
iv_table name TYPE string and
iv_filename TYPE string
This would be the method:
method upload_table.
data: lt_table type ref to data,
lx_root type ref to cx_root.
field-symbols: <table> type standard table.
try.
create data lt_table type table of (iv_table_name).
assign lt_table->* to <table>.
call method cl_gui_frontend_services=>gui_upload
exporting
filename = iv_filename
has_field_separator = abap_true
changing
data_tab = <table>
exceptions
others = 4.
if sy-subrc <> 0.
"Some appropriate error handling
"message id sy-msgid type 'I'
" number sy-msgno
" with sy-msgv1 sy-msgv2
" sy-msgv3 sy-msgv4.
return.
endif.
modify (p_name) from table <table>.
"write: / sy-tabix, ' entries updated'.
catch cx_root into lx_root.
"lv_text = lx_root->get_text( ).
"some appropriate error handling
return.
endtry.
endmethod.
This would still require that you make sure that the exported file matches the table that you want to import. However cl_gui_frontend_services=>gui_upload should return sy-subrc > 0 in that case, so you can bail out before you corrupt any data.
Original Answer:
I'll assume that you want to update a z-table and not a SAP standard table.
You will probably have to format your datafile a little bit to make it tab or comma delimited.
You can then upload the data file using cl_gui_frontend_services=>gui_upload
Then if you want to overwrite the existing data in the table you can use
modify zmydbtab from table it_importeddata.
If you do not want to overwrite existing entries you can use.
insert zmydbtab from table it_importeddata.
You will get a return code of sy-subrc = 4 if any of the keys already exists, but any new entries will be inserted.
Note
There are many reasons why you would NOT do this for a SAP-standard table. Most prominent is that there is almost always more to the data-model than what we are aware of. Also when creating transactional data, there are often follow-on events or workflow that kicks off, that will not be the case if you're updating the database directly. As a rule of thumb, it is usually a bad idea to update SAP standard tables directly.
In that case try to find a BADI, or if that's not available, record a BDC and do the updates that way.
If the system landscape was setup correctly, your client would not need any kind of basis operations support whatsoever to perform the transports. So instead of re-inventing the wheel, I'd strongly suggest to catch up on what the CTS and TMS can do once they're setup with sensible settings.

Resources