Split multi-line text in columns [closed] - excel

Given:
Confessionalized Optics: The Society of Jesus and Early Modern Optics
Author: Purkaple, Brent
University: University of Oklahoma
Year Published: 2022
Abstract:
This dissertation explores the investigation and explanation of optics
among prominent members of the Society of Jesus during the early
modern period. In doing so it aims to explain why it was that optics
became one of the more important scientific subjects among the members
of the Order. In addition to this it aims to explain how it was that
their identity as members of the Order shaped their explanation of
optics at a time when there was no agreed upon meaning of optics. As
argued, this interaction between Jesuit identity and optical theory
may best be understood as an act of confessionalization. The benefit
of this categorization is that it allows for a complex analysis of
optics among the Society of Jesus which avoids any essential
identification of the relationship between science and religion. This
dissertation, then, not only addresses why optics among the Jesuits
should be understood as confessionalized, but also how the category of
confessionalization may provide a path through the complex dynamics of
early modern science and religion.
I would like to convert this (i.e. hundreds of records in this format) into a table with the columns Number, Title, Author, University, Year Published, and Abstract (which spans multiple lines!).
I'm not tied to a specific tool, but I have failed to do it in Excel. I think I will need a regular expression.

It can be done in Power Query. Some abstracts were missing; for those I manually inserted an extra row in Excel.
Here you can watch how to do it in Power Query:
https://www.youtube.com/watch?v=0W_0tvPIOng
I added the new rows with the help of a formula. One or two manual changes were also needed.
Here is a copy of the final file:
https://1drv.ms/x/s!AncAhUkdErOkguU9IADpGxeeEOspPQ?e=KyF0oq
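Since the question is not tied to Excel, the same conversion can also be sketched in a few lines of Python. This is only a rough sketch under assumptions that are not stated in the question: every record starts with a title line that is immediately followed by an "Author:" line, the field labels are spelled exactly as in the example, and no abstract line is itself directly followed by an "Author:" line. The function names `parse_records` and `to_csv` are made up for illustration. Records without an abstract simply get an empty Abstract cell.

```python
import csv
import io

COLUMNS = ["Number", "Title", "Author", "University", "Year Published", "Abstract"]
FIELD_LABELS = ("Author", "University", "Year Published")


def parse_records(text):
    """Turn Title/Author/University/Year Published/Abstract blocks into dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    current = None
    in_abstract = False
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if next_line.startswith("Author:"):
            # Assumption: a line directly followed by "Author:" is a new title.
            current = {"Number": len(records) + 1, "Title": line, "Author": "",
                       "University": "", "Year Published": "", "Abstract": ""}
            records.append(current)
            in_abstract = False
        elif current is None:
            continue  # stray text before the first record
        elif line.startswith("Abstract:"):
            in_abstract = True
            current["Abstract"] = line[len("Abstract:"):].strip()
        elif in_abstract:
            # Join the wrapped abstract lines into a single cell.
            current["Abstract"] = (current["Abstract"] + " " + line).strip()
        else:
            for label in FIELD_LABELS:
                if line.startswith(label + ":"):
                    current[label] = line[len(label) + 1:].strip()
                    break
    return records


def to_csv(records):
    """Serialize the parsed records as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `to_csv(parse_records(raw_text))` on the pasted text gives one row per record; the result can be written to a .csv file and opened in Excel.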


Looking for ICD10 API [closed]

Does anybody know of a good ICD-10 API for diagnostic code lookups that they can recommend? I am currently building a simple app to tag patients with medical conditions, and the idea is to have a lookup API where one can type "asthma", for example, and get back all the different ICD-10 codes for asthma.
My R package, icd, converts ICD-9 and ICD-10 codes to descriptions, in addition to its main function of finding comorbidities. Documentation is at https://jackwasey.github.io/icd/ , and the code is at https://github.com/jackwasey/icd . It does this using the function explain_code. It currently uses ICD-10-CM, i.e. the US billing adaptation of the ICD-10 code set, which in general is more specific than the canonical WHO version, but does have some areas of less detail.
E.g. WHO ICD-10 has "HIV disease resulting in Pneumocystis jirovecii pneumonia" as a subdivision of HIV infection, whereas ICD-10-CM just has HIV. On the other hand, ICD-10-CM has "Sucked into jet engine, subsequent encounter", whereas the WHO is happy with the terribly vague "Person on ground injured in air transport accident".
The volume of data for all the descriptions is not very high, just a handful of megabytes, so although an API may seem convenient, you might consider just keeping all the data locally and not having to ping some random server.
I'm going to assume you're ignoring all of the usual stuff around variations of spelling of medical terms, proper terms vs. colloquialisms, labels vs. descriptions, etc. that get to be a pain with term / code finders.
If you want to use a hosted option and are OK with the terms of use, you could use UMLS (https://uts.nlm.nih.gov/home.html#apidocumentation). It's a great resource, but the use case you're describing isn't necessarily what it's intended to address.
Personally - and I usually don't like to roll my own stuff - I'd consider doing your own thing. You could build something focused on your needs and tailor it to any specific behaviors you might want (like preferring specific codes based on an organization - EX: billing preference). You could also probably make it far, far more ... perky ... and handle short forms of terms (EX: abbreviations like "DVT") or misspellings ("asthma" vs. "athsma"). If you go that route, I'd suggest getting your hands on the ICD-10 code info and then loading it into Elasticsearch. You could extend the data by mixing it with other info and really make it hum. And Elasticsearch is wicked fast.
That's just my $0.02, though.
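To make the "roll your own" idea a bit more concrete, here is a minimal in-memory sketch using only Python's standard library; it is not Elasticsearch and not any real API. The four entries are a small hand-picked sample of ICD-10-CM codes (check the wording against the official release before relying on it), and the `SYNONYMS` table and the function names are assumptions made up for illustration. Misspellings are corrected word by word with `difflib.get_close_matches`.

```python
import difflib

# Tiny hand-picked sample; a real tool would load the full ICD-10-CM table.
CODES = {
    "J45.20": "Mild intermittent asthma, uncomplicated",
    "J45.909": "Unspecified asthma, uncomplicated",
    "I82.409": "Acute embolism and thrombosis of unspecified deep veins "
               "of unspecified lower extremity",
    "E11.9": "Type 2 diabetes mellitus without complications",
}

# Hypothetical short-form table: abbreviation -> phrase used in descriptions.
SYNONYMS = {"dvt": "deep veins"}


def build_vocabulary(codes):
    """Collect every lower-cased word that occurs in the descriptions."""
    vocabulary = set()
    for description in codes.values():
        vocabulary.update(description.lower().replace(",", " ").split())
    return sorted(vocabulary)


def lookup(query, codes=CODES, synonyms=SYNONYMS):
    """Return (code, description) pairs whose description contains the query."""
    normalized = query.strip().lower()
    normalized = synonyms.get(normalized, normalized)
    vocabulary = build_vocabulary(codes)
    corrected = []
    for word in normalized.split():
        # Replace a misspelled word with the closest known word, if any.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        corrected.append(close[0] if close else word)
    phrase = " ".join(corrected)
    return sorted((code, text) for code, text in codes.items()
                  if phrase in text.lower())
```

With this sample, `lookup("athsma")` finds the two asthma codes and `lookup("DVT")` finds the deep vein thrombosis code. A real implementation would need ranking, stemming, and a much larger synonym list.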
There is a project called the Unified Medical Language System (UMLS), funded by the NIH, and apparently they are working on a RESTful web API for medical terms:
https://documentation.uts.nlm.nih.gov/rest/home.html
I haven't worked with their API yet, and the samples on their website look more SNOMED CT oriented.
The option I would go for is to get the whole ICD-10-CM code set from CMS and build my own web API:
https://www.cms.gov/Medicare/Coding/ICD10/2016-ICD-10-CM-and-GEMs.html
You can check the full documentation from the WHO at https://icd.who.int/icdapi

Why can't we use the arithmetic mean in planning poker? [closed]

Why is that? I really can't understand it. Why can we only select from the numbers proposed by the players?
The numbers used are spaced far apart on purpose (typically from the Fibonacci sequence). If you get numbers from all across the board from 1 to 21, you're supposed to ask why the person who voted 1 gave it such a low score ("Did you think about testing and deployment? What about these other acceptance criteria?") and why the person who voted 21 gave it such a high score ("Are you aware we already have an existing code base? Did you know that Karen knows a lot about this and can pair up with you?") and then re-vote. If you're really stuck because half the team says 8 and the other half says 13, you can take the 13 and move on with your lives.
The extra precision isn't necessary when your accuracy is not great. My team goes for even less precision and buckets stories into "small" (one person can do a bunch in an iteration), "medium" (one person can handle a few of these), "large" (one person a week or more), and "extra large" (too big and needs to be split).
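The voting procedure described in this answer can be sketched as a small decision function. This is only one reading of the answer, under assumptions: the deck is a Fibonacci-style list ending at 21, a split between two neighbouring cards is settled by taking the higher one (the answer suggests this only when the team is really stuck), and anything wider sends the lowest and highest voters back to explain themselves. The function name and the voter names are hypothetical. Note that no mean is ever computed.

```python
DECK = [1, 2, 3, 5, 8, 13, 21]  # assumed Fibonacci-style deck


def resolve_round(votes, deck=DECK):
    """Decide what happens after one round of votes ({voter: card})."""
    for name, value in votes.items():
        if value not in deck:
            raise ValueError(f"{name} voted {value}, which is not on the deck")
    low, high = min(votes.values()), max(votes.values())
    if low == high:
        return ("agreed", high)
    if deck.index(high) - deck.index(low) == 1:
        # Neighbouring cards: take the higher one and move on.
        return ("take_higher", high)
    # Wide spread: the extremes explain their reasoning, then everyone re-votes.
    low_voters = sorted(n for n, v in votes.items() if v == low)
    high_voters = sorted(n for n, v in votes.items() if v == high)
    return ("discuss", low_voters + high_voters)
```

For example, `resolve_round({"Ann": 8, "Bob": 13, "Cy": 8})` returns `("take_higher", 13)`, while votes of 1, 5 and 21 return `"discuss"` together with the names of the lowest and highest voters.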
You can do what you want to do. However, the idea behind choosing only from the proposed numbers is that the larger the estimate, the less reliably you can estimate small differences. That's why the gaps between the numbers grow as the numbers grow.
Once you start giving detailed numbers (like one person estimating 8 and the next 13, then choosing 11 as a mean), people assume this actually is a detailed estimate. It's not. The method is meant to be a rough guess.
The idea behind having people agree on one number is that everybody should have the same understanding of the story.
If people pick very different numbers, they have different understandings of how much work is needed to complete the story or how difficult it will be. The different numbers should then start a discussion and finally lead to a shared view of the story.
You should think of the numbers as symbols with no arithmetic meaning, except for a (partial) order relation, because they are estimates (of the effort needed to complete a user story).
If you use math to model an estimate, you should provide a way:
to represent certainty
to represent uncertainty
to operate on those representations
to define an average as a function of certainty and uncertainty
If you use some kind of average that operates on estimates modeled as single numbers, you are assuming that certainty and uncertainty can be handled in the same way, and I think that's a bad assumption.
I think the spirit of a planning poker session is to reach a team-shared estimate through discussion among human beings, not by applying arithmetic to their estimates.

About subject, predicate and object in RDF [closed]

This is slightly off-topic, but please answer this question.
I have studied lots of articles and materials on the net about RDF, but one thing I can't understand is how a natural English sentence is programmatically divided into subject, predicate and object.
Ex: Scott Directed Runner.
If I give the line above, how is it divided into subject, predicate and object programmatically? Please answer.
Thanks.
Subject, predicate, and object are used in NLP to describe parts of sentences in some languages, as you mentioned. Do not conflate that with their usage in this context. In RDF, they are the names of the three components of a triple/statement.
Read RDF 1.1 Concepts and Abstract Syntax and note that one major takeaway is that a statement is formally defined as a 3-tuple (triple) consisting of:
subject := the node the statement/edge starts at
predicate := a semantically important label for the statement/edge
object := the node that the statement/edge terminates at
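To make the 3-tuple definition above concrete, here is a small Python sketch that stores a statement as a named tuple and prints it in N-Triples form. The `http://example.org/` namespace and the function names are hypothetical. The `naive_triple` helper only handles the asker's exact three-word pattern by splitting on whitespace; it is a toy and not a solution to the NLP problem.

```python
from typing import NamedTuple

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


class Triple(NamedTuple):
    subject: str    # node the edge starts at
    predicate: str  # label of the edge
    object: str     # node the edge ends at


def naive_triple(sentence):
    """Toy mapping for 'Subject Verb Object' sentences of exactly three words."""
    words = sentence.rstrip(".").split()
    if len(words) != 3:
        raise ValueError("this toy only handles three-word sentences")
    subject, verb, obj = words
    # Lower-case the first letter of the verb, a common style for property names.
    predicate = verb[0].lower() + verb[1:]
    return Triple(BASE + subject, BASE + predicate, BASE + obj)


def to_ntriples(triple):
    """Serialize one triple as an N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."
```

For the sentence in the question, `to_ntriples(naive_triple("Scott Directed Runner."))` yields one line with three IRIs. Anything longer than three words needs real parsing.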
As you learn more about RDF, you'll learn that you have two major problems:
The pure NLP problem that you asked about: "How does one map a sentence in a natural language to a statement in RDF?" This is not a trivial task, and solving it requires studying a great deal of NLP.
The RDF problem, which will be: "What should I define as my representation for this content once I know what I am extracting?" This will include direct mapping of language expressions ("bob is a cat" -> :bob rdf:type :Cat) and mapping of more arbitrary concepts.
An example of mapping a more arbitrary concept: "All cats have at least one owner" ->
:Cat rdfs:subClassOf _:x .
_:x rdf:type owl:Restriction .
_:x owl:onProperty :hasOwner .
_:x owl:minCardinality "1"^^xsd:nonNegativeInteger .
To risk understating the point, the general problem that you have formulated is an extraordinarily large task that may not be well suited to Stack Overflow. You will need to break this task up into many much smaller issues while you develop an understanding of the domain, and then ask specific technical questions as you work on it.

Text mining - extract name of band from unstructured text [closed]

I'm aware that this is kind of a general, open-ended question. I'm essentially looking for help in deciding a way forward, and perhaps for some reading material.
I'm working on an algorithm that does unstructured text mining, and I'm trying to extract something specific from the text: the names of musical acts (single artists, bands, etc.). The text itself has no predictable structure, but it is relatively short (one or two lines).
Some examples may be (not real events):
Concert Green Day At Wembley Stadium
Extraordinary representation - Norah Jones in Poland - at the Polish Opera
Now, I'm thinking of trying out a classifier, but the text seems too small to provide any real training information for it.
There probably are several other text mining techniques, heuristics or algorithms that may yield good results for this kind of problem (or perhaps no algorithm will).
Because of the structure of your data a pre-trained model will probably perform poorly. Besides, the general organization, location, and person categories will probably not be useful for you.
I don't think the texts themselves are too small; most NER systems work on one sentence at a time. So providing your own training set to an NER library, such as http://nlp.stanford.edu/ner/index.shtml , will probably work well.
If you don't want to create a training set you will need a dictionary with all the bands/artists. Then you obviously can't find unknown bands/artists.
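The dictionary approach mentioned above can be sketched as a case-insensitive lookup of known names in each text. This is a minimal sketch: the two names come from the examples in the question, the function name is made up, and a real list would be loaded from some artist database. As the answer says, a name that is not in the list will never be found.

```python
import re

# Sample dictionary taken from the examples in the question.
KNOWN_ARTISTS = ["Green Day", "Norah Jones"]


def find_artists(text, artists=KNOWN_ARTISTS):
    """Return the known artist names that occur in text, in order of appearance."""
    found = []
    for name in artists:
        # Whole-word, case-insensitive match of the full name.
        pattern = r"\b" + re.escape(name) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]
```

For instance, `find_artists("Concert Green Day At Wembley Stadium")` returns `["Green Day"]`, and a text with no known name returns an empty list.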
There is a simple NER algorithm that could simplify the task a bit:
Take the words that may or may not be a named entity and search for them in Google or Yahoo (via an API) twice: as separate words and as an exact phrase (i.e. with quotation marks). Divide the number of results for the separate words by the number of results for the exact phrase. If the ratio is below a threshold (<30), the words form a named entity.
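The ratio test can be written down independently of any search engine. The sketch below assumes the ratio is the separate-words result count divided by the exact-phrase count, which is one reading of the answer; the function name is made up, and the counts must come from whatever search API you have access to (no real API is called here).

```python
def is_named_entity(separate_hits, phrase_hits, threshold=30.0):
    """Apply the hit-count ratio heuristic to two result counts.

    separate_hits: results when the words are searched separately.
    phrase_hits:   results when the words are searched as an exact phrase.
    """
    if phrase_hits <= 0:
        return False  # the exact phrase never occurs, so it is not an entity
    return separate_hits / phrase_hits < threshold
```

The threshold of 30 is the figure given in the answer; it would need tuning against real data.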

How to document software algorithms? [closed]

I'm working on a large project for a university assignment. We're developing an application that a business uses to compile quotes for its various services.
I need to document the algorithms in a way that the client can sign off on, to make sure the way we calculate the prices is correct.
So far I've tried using a large flow chart with decision diamonds, as in information systems modelling, but it's proving to be overkill even for simple algorithms.
Can anybody please suggest some ways to do this? It needs to look as little like software code as possible, and be enough for the client to see how we decide what prices are quoted.
Maybe you should then use pseudocode.
Create two documents.
First: The business process model (BPM) that shows the sequence of steps required to be done. This should be annotated with the details for each step.
Second: Create a spreadsheet with each input data item defined, so that the business can see that you understand the field type and the rules for each data point. If the calculation uses a lookup table for a step, then that is where you define the input lookup value from the table. That way, for each step you know where the data is coming from and where it is going. Your spreadsheet can include a link to the BPM so they can walk through each data point in the BPM and see where it is coming from and going to.
You can prepare screen designs to show the users what your system actually does.
Well, the usual way to document algorithms is writing papers.
If your clients have studied business, I'm sure they are familiar with reading formulas.
Would a data flow diagram help? Put pseudocode or math in the bubbles. I've had some success combining data flow models and entity relationship diagrams, but it's non-standard.
What about a Nassi-Shneiderman diagram? It's a diagram from structured programming. I think it's good for showing decision flows.
http://en.wikipedia.org/wiki/Nassi%E2%80%93Shneiderman_diagram
You could create an algorithm test screen to display and comment on the various steps through the calculations.
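One way to feed such a test screen is to have the calculation return a plain-language trace next to the result, so the client reads sentences instead of code. The sketch below is purely illustrative: the pricing rule (hours times rate, minus a percentage discount), the function name, and the parameter names are all hypothetical and not taken from the question.

```python
def quote_with_trace(hours, hourly_rate, discount_percent=0.0):
    """Compute a quote and record every step as a readable sentence."""
    steps = []

    base = hours * hourly_rate
    steps.append(
        f"Base price: {hours} hours x {hourly_rate:.2f} per hour = {base:.2f}"
    )

    discount = base * discount_percent / 100
    steps.append(
        f"Discount: {discount_percent:g}% of {base:.2f} = {discount:.2f}"
    )

    total = base - discount
    steps.append(f"Quoted price: {base:.2f} - {discount:.2f} = {total:.2f}")

    return total, steps
```

Printing the returned steps for a few agreed sample inputs gives the client a worked example to check and sign off on.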
