What are sub-elements and sub-fields in iso8538 message? - iso

I have some basic knowledge in ISO 8583 message. I have seen data-elements and parsing them. Now I came across a message like DE48 SE62 SF4. I'm curious to know about the spceification of SE and SF. I tried to find what SE and SF are. But I could only end up finding their abbreviation and could not get much information about them. Could anyone please elaborate on this? It will be of great help to me.
Thanks in advance!

Data Element (DE) 48 used for Private Additional data in all original specifications from 1987, 1993 and 2003 years. The definition of Sub-Elements (SE) vary from one customized specification to another as well the format and structure of data inside this data element.
Without having full message dump it is not possible to detect the specification you need to follow.
Feel free to post your test ISO 8583 message here and get additional explanation or use our online service to browse message data definitions, parse the financial messages to sub-elements and sub-fields for known messages.

Related

Problem understanding dataset to start analytics

I have a data set (I have provided a link to a photo of the first 10 lines of the data). For my project I need to filter the “MS” column for “ES” (Spain) and I need to analyze the data and get some conclusions about the data and graphs, Visualizations ect.
But my problem is by looking at the data I just don’t know where to start because it’s just seems like there is nothing to do with the data that is there.
So I have come to all of you to see if you can give em some suggestions and to see if anyone can see any sort of starting point.
These are the details of my project if it helps
Objectives: Represent the protected spaces of Spain and explain the data
Evaluation criteria: Representation is comprehensive, “nice looking” (Visualizations), use of analytics (sql, python, excel ect). Adding to this a report with resources and the conclusion of the análisis with adición representation.
Notes: it’s importe that the cliente “ teachers” (also people from company’s in Spain) can understand the analysis with the report and Visualizations.
Thanks! I have 1863 lines of data (already filtered to ES (SPAIN))

Best practices for creating a customized report based on user form input?

My Question
What are the best practices for creating a customized report based on a user form input? Specifically, how do I create an easy to maintain system which takes user input which is collected in a form and generate multiple paragraphs that explains the results of analysis.
Background
I am working on a very large multiyear project with a startup (who is my client). My job is to program analysis and generate reports to users. The pipeline for data looks like this:
Users enter information into a form -> results are calculated based on user input -> reports are displayed to users that share analysis.
It is really important to my client that some of the analysis results are displayed in paragraphs in a non-formal user friendly tone. The challenge is that the form and analysis are quite complex and will only get more complex over time. An example of the type of template for the paragraphs looks something like this:
resultsParagraphText=`Hi ${userName}. We found that the best ice cream flavour for you is ${bestIceCreamFlavor}. These other flavors ${otherFlavors} might be good for you. Here are the reasons why you might enjoy these flavors: ${reasonsWhyGoodFlavors}.
However we would not recommend these other flavors ${badFlavors}. Here are the reasons you should avoid this bad flavors: ${reasonsWhyBadFlavors}.`
These results paragraphs, of which there of many, have several minor problems which combined are significant:
If there is a bug in the code, minor visual errors would be visible to end users (capitalization errors, missing/extra commas, and so on).
A lot of string comparisons (e.g. if answers.previousFlavors.includes("Vanilla")) are required to generate the results paragraphs. Minor errors in the forms (e.g. vanilla in the form is not capitalized so answers.previousFlavors.includes("Vanilla") returns false even when user enters vanilla.) can cause errors in the results paragraph.
Changes in different parts of the project (form, analysis) directly effect how the results paragraph is made. Bad types, differences in string values, null or undefined values not being caught directly have an impact on how the results paragraph is made.
There are many edge cases (e.g. What if the user has no other suitable good flavors for them? The the sentence These other flavors ${otherFlavors} might be good for you. needs to be excluded).
It is hard to write paragraphs that use templates and have a non-formal tone.
and so on.
I have charts and other types of ways to display results and have explained to the client the challenges of sharing the information in paragraph form.
What I am looking for
I need examples, how tos, best practices on how to build a maintainable system for generating customized paragraphs based on user input. I know how to solve each of the individual issues (as they are fairly simple) but in a large project this will become very hard to maintain.
Notes
I have no clue what tags to use for the post. Feel free to edit/add tags if you know more appropriate ones.
The project is planning to use machine learning in the future other parts of the project. If there is a ML/AI solution that is useful please tell me.
I am working primarily in JavaScript, Python, C, and R, but if there is a library or tool in any other language please tell me. Finding a solution is very important to me and I would be willing to learn a lot find a best solution.
To avoid this question being removed because I have rephrased it to avoid asking for personal opinion, instead asking for existing examples or how tos. I can also imagine that others might find a solution fairly useful. If you can edit it to make the question less subjective please do so.
If you have any questions or need clarification feel free to ask. Any help is appreciated.

What area of machine learning should I look into to automatically extract certain info from messages

I have an app that extracts information from incoming messages. The messages all contain the same information, but they have different forms depending on the source that sent them.
Example:
Message from source A :
A: You spent $50.00 at Macy's on 2/20/12
Message from source B :
Purchase, $50.00, Macy's, 2Feb2012, Balance $5000.00
Every message from a single source has the same form though. So at the moment, I'm doing it by writing a set of regular expressions to first identify which message I'm trying to decode (i.e. what source it came from so I know what the form of the message is), and then extracting the necessary information from the message (in the above example, I want to know the transaction amount, the store where the transaction happened, and the date). If I discover a new source for a message, or a source changes the format of their message (doesn't happen very often, but could happen), I need to manually write the regular expressions for that message. I'm sure however that I could automate this using some kind of machine learning technique. I just don't know much about machine learning, and I don't know where to even start looking for a technique that would apply to my problem. I would like someone to just point me in the right direction on where to start reading.
In order to detect and label amounts, dates, person names and similar information you can use a technique called Named Entity Recognition. The Stanford Named Entity Recognizer comes with pretrained, ready to use models.
You also use whatever labeled data you have generated so far to learn a custom model for your application. The standard techniques used for this purpose are Conditional Random Fields or Sequence Perceptron. There are many toolkits implementing these models, including:
Wapiti - A simple and fast discriminative sequence labelling toolkit.
Sequor - sequence labeler based on Collins's (2002) perceptron.

Open source projects for email scrubbing generating structured data from unstructured source?

Don't know where to start on this one so hopefully you guys can clear up my question. I have project where email will be searched for specific words/patterns and stored in a structured manner. Something that is done with Trip it.
The article states that they developed a DataMapper
The DataMapper is responsible for taking inbound email messages
addressed to plans [at] tripit.com and transforming them from the
semi-structured format you see in your mail reader into a highly
structured XML document.
There is a comment that also states
If you're looking to build this yourself, reading a little bit about
Wrappers and Wrapper Induction might be helpful
I Googled and read about wrapper induction but it was just too broad of a definition and didn't help me understand how one would go about solving such problem.
Is there some open source project out there that does similar things?
There are a couple of different ways and things you can do to accomplish this.
The first part, which involves getting access to the email content I'll not answer here. Basically, I'll assume that you have access to the text of emails, and if you don't there are some libraries that allow you to connect java to an email box like camel (http://camel.apache.org/mail.html).
So now you've got the email so then what?
A handy thing that could help is that lingpipe (http://alias-i.com/lingpipe/) has an entity recognizer that you can populate with your own terms. Specifically, look at some of their extraction tutorials and their dictionary extractor (http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html) So inside of the lingpipe dictionary extractor (http://alias-i.com/lingpipe/docs/api/com/aliasi/dict/ExactDictionaryChunker.html) you'd simply import the terms you're interested in and use that to associate labels with an email.
You might also find the following question helpful: Dictionary-Based Named Entity Recognition with zero edit distance: LingPipe, Lucene or what?
Really a very broad question, but I can try to give you some general ideas, which might be enough to get started. Basically, it sounds like you're talking about an elaborate parsing problem - scanning through the text and looking to apply meaning to specific chunks. Depending on what exactly you're looking for, you might get some good mileage out of a few regular expressions to start - things like phone numbers, email addresses, and dates have fairly standard structures that should be matchable. Other data points might benefit from some indicator words - the phrase "departing from" might indicate that what follows is an address. The natural language processing community also has a large tool set available for text processing - check out things like parts of speech taggers and semantic analyzers if they're appropriate to what you're trying to do.
Armed with those techniques, you can follow a basic iterative development process: For each data point in your expected output structure, define some simple rules for how to capture it. Then, run the application over a batch of test data and see which samples didn't capture that datum. Look at the samples and revise your rules to catch those samples. Repeat until the extractor reaches an acceptable level of accuracy.
Depending on the specifics of your problem, there may be machine learning techniques that can automate much of that process for you.

Integrating with 500+ applications

Our customers use 500+ applications and we would like to integrate these applications with our. What is the best way to do that? These applications are time registration applications and common for most of them is that they can export to csv or similar, some of them are actually home-brewed excel sheets where time is registered.
The best idea so far is to create our own excel sheet, which can be used to integrate with all these applications. The integrations could be in the form of cells containing something like ='[c:\export.csv]rawdata'!$A$3 Where export.csv is the csv file exported from the time registration applications. Can you see a better way to integrate against all these applications? It should be mentioned that almost all our customers have Microsoft Office.
Edit: Answers to the excellent questions from Pontus Gagge:
How similar are the data in the different applications?
I assume that since they time registration applications, they will have some similarities, but I assume that some will register the how long time one has worked in total for a whole month, while others will spesify for each day. If Excel is chosen, I believe that many of the differences could be ironed out using basic formulas.
What quality is the data?
The quality of the data can vary so basic validation must be undertaken, a good way is also to make it transparent for the customers, how our application understands their input, so they are responsible.
How large amounts of data are you talking about?
There will be information about the time worked for up to 50 employees.
Is the integration one-way only?
Yes
With what frequency should information be transferred?
Once per month (when they need to pay salaries).
How often do the applications themselves change, and how often does your product change?
If their application is a home-brewed Excel sheet, then I assume it will change once a year (due for example a mistake someone). If it is a standard proper time registration application, then I do not believe they are updated more often than every fifth year or so, as it is a very stabile concept.
Should the integration be fully automatic or can your end users trigger a data transfer?
They can surely trigger data transfer. The users are often dedicated to the process so they can be trained at doing it, which means that they could make up to, say 30, mouse clicks in order to integrate each month.
Will the customers have somebody to monitor the integrations?
As we have many customers, many of them should be able to undertake the integration themselves. We will though be able to assist them over the telephone. We cannot, though undertake the integration ourselves because we would then be responsible for any errors due to user mistakes, etc.
Does the phrase 'integration spaghetti' mean anything to you...?
I am looking for ideas from the best chefs to cook a nice large portion of that.
You need to come up with a common data format, and a way to translate the individual data formats to the common format. There's really no way around this - any solution you come up with will have to do this in one way or the other. It's the essential complexity of what you're doing.
The bigger issue is actually variances within the source data, in terms of how things like dates are stored, missing columns, etc. Doing a generic conversion for CSV to move columns around is comparatively easy.
I would also look at CSV and then use an OLEDB connection against the CSV file for importing.
If you try to make something that can interface to any data structure in the universe (and 500 is plenty close enough), it is guaranteed to be a maintenance nightmare. Instead I would approach this from multiple angles:
Devise an interface into which a human can enter this data already in the proper format. With 500+ clients, I'd make this a small, raw but functional browser based site that users can use to enter this information manally. This is the fall-back. At the end of the day, a human can re-key the information into the site and solve the import issue. Ideally, everyone would use this instead of their own format. Data entry people are cheap.
Similar to above, but expanded, I would develop a standard application or standardize on an off-the-shelf application that can be used to replace their existing format. This might take more time than #1. The goal would be to only do one-time imports of these varying data schemas into the application and be done with them for good.
The nice thing about spreadsheets is that you can do anything anywhere. The bad thing about spreadsheets is that you can do anything anywhere. With CSV or a spreadsheet there is simply no way to enforce data integrity and thus consistency (which is the primary goal) on the data. If the source data is already in a database, then that is obviously simpler.
I would be inclined to use database format into which each of these files need to be converted rather than a spreadsheet (e.g. use something like Jet (MDB)). If you have non-Windows users then that will make it harder and you might have to use a spreadsheet. The problem is that it is too easy for the user to change their source structure, break their upload and come crying to you. If a given end user has a resident expert, they can find a way of importing the data into that database format . If you are that expert, then I would on a case-by-case basis, write something that would import into that database format. XML would be the other choice, but that will likely take more coding than an import/export into a database format.
Standardization of the apps (even having all the sources in a database format instead of a spreadsheet would help) and control over the data schema is the ultimate goal rather than permitting a gazillion formats. There really is no nice answer other than standardization. Otherwise, you are having to write a converter for every Tom-Dick-and-Harry format and again when someone changes the source format.
With a multitude of data sources mapping each one correctly to an intermediate format is not trivial. Regular expressions are good with a finite set of known data formats. Multipass can help when data is ambiguous without context (month,day fields and have several days of data), and also help defeat data entry errors. But it seems as this data is connected to salaries there needs a good reliable transfer.
An import configuring trick
Get the customer to make a set of training data in the application. It should have a "predefined unique date" and each subsequent data field have a number corresponding to the target data field in your application. On importing your application needs to recognise the predefined date, determine the unique translation required and effect the displaying/saving of this "mapping key", and stop the import. eg If you expect "Duration hours" in field two then get the user to enter 2 in the relevant field which might be "Attendance hours".
On subsequent runs, and with the mapping definition key, import becomes a fairly easy process of translation.
Note on terms
"predefined date" - must be historical, say founding date of your company?, might need to be in PC clock settable range.
"mapping key" - could be string of hex digits and nybble based so tractable to workout
The entered code can be extended to signify required conversions ie customer's application has durations in days and your application expects it in hours.
Interfacing with windows programs (in order if increasing fragility)
Ye Olde saving as CSV file
Print to operating system printer that is setup as a text file/pdf, then scavenge the data out of that
Extract data via the application interface control, typically ActiveX for several windows programs ie like Matlab's Spreadsheet Link
Read native file format xls format ie like Matlab's xlsread
Add an additional intermediate spreadsheet sheet that has extended cell references ie ='[filename]rawdata'!$A$3
Have a look at Teiid by JBoss: http://jboss.org/teiid
Also consider using SOA - e.g., if you're on Java, try JBoss SOA platform: http://www.jboss.com/resources/soa/?intcmp=1004
Use a simple XML format. A non-technical person can easily understand a simple XML format (and could even identify basic problems with XML documents that are not well-formed).
Maybe use a DTD (or even better an XML schema) to do very basic validation, and then supplement this with an XSL stylesheet to do more validation with better error reporting. (An XSL stylesheet simply converts from XML to something else and so can be generate readable error messages.)
The advantage of this approach is that web browsers such as Internet Explorer can apply the XSL stylesheets. A customer need only spend at most a day enhancing their applications or writing excel macros to generate the XML data in the format that you specify.
Recent versions of Excel have support for converting spreadsheet data to XML, and can even validate against schemas.
Once the data passes the XSL validation checks, you have validated XML data.
If you have heaps of data and heaps of money, you could look at existing data management and cleansing tools:
http://www-01.ibm.com/software/data/infosphere/datastage
http://www-01.ibm.com/software/data/infosphere/qualitystage
But even then, you'll likely need to follow kyoryu's suggestion assuming you have 500+ data formats. The problem isn't your side. You need them to standardize their output formats if you have no control over their apps. CSV is likely the easiest. You could even send them a excel template to help them along.

Resources