Excel VBA formula: extract file extension from filepath? - excel

Given an Excel column containing filepaths, what Excel formula returns only the file extension?
src\main\java\com\something\proj\UI.java --> java
src\main\java\com\something\proj\Server.scala --> scala
src\main\java\com\something\proj\include.h\someinclude.hpp --> hpp
Note 1: This formula works for filepaths with only a single period, but not for case 3: =IF(A1="","",RIGHT(A1,LEN(A1)-FIND(".",A1)))
Note 2: I understand that these filepaths are Windows-specific; I don't need a cross-platform solution.
Related: trimming extension from filename in Excel and How to extract file name from path?

With data in A1, use:
=SUBSTITUTE(RIGHT(SUBSTITUTE(A1,".",REPT(".",999)),999),".","")
From:
Jim Cone's old post
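To see why the padding trick works, here is a minimal Python sketch of the same steps. It is not part of the original post; the function name extension_by_padding and the width parameter are made up for illustration. Each period is expanded to 999 periods, so the last 999 characters can only hold the final segment plus padding, and the padding is then stripped.

```python
def extension_by_padding(path: str, width: int = 999) -> str:
    """Mirror SUBSTITUTE(RIGHT(SUBSTITUTE(A1,".",REPT(".",999)),999),".","")."""
    # SUBSTITUTE(A1, ".", REPT(".", 999)): every period becomes `width` periods
    padded = path.replace(".", "." * width)
    # RIGHT(..., 999): keep the last `width` characters
    tail = padded[-width:]
    # Outer SUBSTITUTE(..., ".", ""): strip the padding periods
    return tail.replace(".", "")


print(extension_by_padding(r"src\main\java\com\something\proj\include.h\someinclude.hpp"))
```

As with the worksheet formula, this assumes the extension is shorter than 999 characters, and a path without any period comes back unchanged.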

This returns everything after the last period:
=MID(A1,FIND("{{{",SUBSTITUTE(A1,".","{{{",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))+1,LEN(A1))
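The formula above packs three steps into one line: count the periods, replace only the last one with a marker, then take everything after the marker. A rough Python sketch of those steps follows; the function name is hypothetical, and the empty-string result for a path without periods is a choice made here, since the worksheet formula would produce an error in that case.

```python
def extension_after_last_dot(path: str) -> str:
    # LEN(A1)-LEN(SUBSTITUTE(A1,".","")): how many periods the path contains
    dot_count = len(path) - len(path.replace(".", ""))
    if dot_count == 0:
        return ""  # assumption: return nothing instead of raising an error
    # SUBSTITUTE(A1,".","{{{",dot_count): replace only the last period with a marker
    head, _, tail = path.rpartition(".")
    marked = head + "{{{" + tail
    # FIND("{{{", ...): position of the marker, which is where the last period was
    pos = marked.find("{{{")
    # MID(A1, pos+1, LEN(A1)): everything after that position in the original text
    return path[pos + 1:]


print(extension_after_last_dot(r"src\main\java\com\something\proj\Server.scala"))
```

Like the formula, this relies on the marker text "{{{" not occurring earlier in the path.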

Here's a nice long answer. :-)
=SUBSTITUTE(A1,LEFT(A1,FIND(CHAR(1),SUBSTITUTE(A1,".",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))),"")

A neat trick I sometimes use for string parsing in general is to leverage the FILTERXML() function (Excel 2013 and later). The basic strategy is to use SUBSTITUTE() to turn your string into XML so that its pieces are split across elements, and then use XPath syntax to navigate the parsed elements. Using this strategy, getting an extension looks like this:
=FILTERXML("<A><p>" & SUBSTITUTE(A1,".","</p><p>.")&"</p></A>","//p[last()]")
If you're not familiar with XML, this can seem intimidating, but once you grasp what's going on, I find it cleaner, more flexible, and easier to remember than the alternative approaches using LEN(), SUBSTITUTE(), etc. One reason it's nicer is that there's only one cell reference. Note that the result keeps the leading period (for example, .hpp).
Illegal Characters
There are two characters that are allowed in paths but not in XML: & and '
The formula above works if these characters are not present. Otherwise, they need to be handled with something like this:
=FILTERXML("<A><p>" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"'",""),"&",""),".","</p><p>.")&"</p></A>","//p[last()]")
Example
Suppose we have a nasty file path like this:
C:\Folder1\Folder2\(ugly characters !@#$%^()_+={};;,`)\two.dots.LongExt
1.) The SUBSTITUTE() portion will convert it to an XML string like this...
<A>
  <p>
    C:\Folder1\Folder2\(ugly characters !@#$%^()_+={};;,`)\two
  </p>
  <p>
    .dots
  </p>
  <p>
    .LongExt
  </p>
</A>
2.) Once formatted like this, it's trivial to pick out the last p element using the xpath syntax //p[last()].
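The same split-into-elements idea can be tried outside Excel. The sketch below uses Python's standard library and is only an illustration of the technique, not what FILTERXML does internally; the function name is made up. One difference is that it escapes & instead of deleting it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def extension_via_xml(path: str) -> str:
    # Escape &, < and > so the path is safe to embed as XML text
    safe = escape(path)
    # Same substitution as the formula: every period starts a new <p> element
    doc = "<A><p>" + safe.replace(".", "</p><p>.") + "</p></A>"
    root = ET.fromstring(doc)
    # p[last()] selects the final <p> child, mirroring //p[last()]
    return root.find("p[last()]").text


print(extension_via_xml(r"C:\Folder1\Folder2\(ugly characters !@#$%^()_+={};;,`)\two.dots.LongExt"))
```

As with the formula, the returned value includes the leading period.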

Related

Remove part of a string in each row of a large column of data in KNIME

I am stumped.
I have a column with a few thousand rows of unique addresses of universities, pharma companies, etc. in a KNIME workflow.
Example:
55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states
What I need is to clean the data so that each row looks nice and computable, like this:
55 Shattuck Street Boston Massachusetts 02115 US.
My problem is that I can't seem to get the system to remove everything after US. Does anyone know a suitable approach in KNIME?
You should be able to use either String Replacer or String Manipulation for this. The first one lets you use either a simple wildcard or a full regular expression pattern while the second one uses a Java-like syntax - the choice comes down to how many different variations on the input data you need to handle and which syntax you prefer.
If you just need to remove any text between square brackets including the space before the open bracket then you can use String Replacer configured like this:
Besides the nodes already mentioned by nekomatic, which will work perfectly for the given scenario, there's also a user-friendly regular expression tool in the Palladian nodes extension called Regex Extractor. It lets you build your regexes with a live preview, as you might know from popular online regex testers.
For your scenario, you could e.g. set up a regex like this:
^(?<address>.*)(?:\s\[.*)
In prose, this means: Capture all characters until a space + square opening bracket and output into a column named address.
The Palladian extension is available here as a free plugin for KNIME Desktop and provides a variety of different tools for web, text, and geo data mining and classification.
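To check the pattern against the sample row before wiring it into a node, here is a small Python sketch. It is an illustration only: Python spells the named group (?P<address>...) rather than the Java-style (?<address>...) shown above, and the helper name clean_address is made up.

```python
import re

# Same pattern as above, with Python's named-group syntax
ADDRESS_PATTERN = re.compile(r"^(?P<address>.*)(?:\s\[.*)")


def clean_address(raw: str) -> str:
    match = ADDRESS_PATTERN.match(raw)
    # Rows without a " [" part are returned unchanged
    return match.group("address") if match else raw


row = "55 Shattuck Street Boston Massachusetts 02115 US [NAT: US RES: US] for all designated states"
print(clean_address(row))
```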

Can't seem to find an XPath with a value that contains double quotes

I've read through the forums, and apparently almost nobody has trouble finding XPaths for values that contain double quotes. Most posts I found were about elements whose values contain both single and double quotes, so I decided to ask this question. I apologize if this is already answered elsewhere.
Anyway, the element I wanted to find goes more or less like this:
<a class="product" title="REALIZE "WHY NO.T"" width="454" height="423" alt="" id="">
</a>
I tried changing the XPath several times without success, using Selenium WebDriver:
'//a[@title="REALIZE "WHY NO.T""]'
'//a[@title="REALIZE \"WHY NO.T\""]'
"//a[@title="REALIZE \"WHY NO.T\""]"
These are a few of the ones I tried; there were more, but I didn't save them all.
I feel like I might be missing something terribly basic, but I've been looking for the answer for hours without success.
//*[@title='REALIZE "WHY NO.T"']
You have to wrap the attribute value in single quotes. The Python code would be (escaping the single quotes):
driver.find_element_by_xpath('//*[@title=\'REALIZE "WHY NO.T"\']')
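If the value might contain either kind of quote, one common approach is to build the XPath string literal programmatically. The helper below is a hypothetical sketch, not part of Selenium; it picks whichever quote character the value does not contain and falls back to the standard XPath concat() function when both appear.

```python
def xpath_literal(value: str) -> str:
    """Return `value` as an XPath 1.0 string literal."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # Both quote types present: split on double quotes and glue the pieces
    # back together with concat(), inserting each double quote as '"'
    parts = value.split('"')
    return "concat(" + ", '\"', ".join(f'"{part}"' for part in parts) + ")"


title = 'REALIZE "WHY NO.T"'
xpath = f"//a[@title={xpath_literal(title)}]"
print(xpath)
```

The resulting string can be passed to whatever element lookup call your Selenium version provides.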
You say " the element I wanted to find goes more or less like this:"
"More or less"? What on earth does that mean? Is it more like this, or less like this? How can we help you if we don't know exactly what it's like?
And then you say:
<a class="product" title="REALIZE "WHY NO.T"" width="454" height="423" alt="" id="">
</a>
But that's not well-formed XML. How is the parser supposed to work out where the title attribute ends? The XML parser should throw it out at that point.
OK, you're probably using an HTML parser rather than an XML parser, and HTML parsers try to make sense of any old garbage you throw at them. But I've no idea what an HTML parser will do with this input. HTML parsers are smart, but they're not smart enough to work out which of these quotes are part of the attribute value and which of them mark its beginning and end. It's probably turned it into something quite different from what you were expecting, and that's why your XPath expression doesn't work.
I would recommend
right-clicking the intended element > click inspect element > look over element in console > copy xpath > paste it and analyze how it outputs.
From there I would then compare it to your current solution and maybe tweak a thing or two.

NetSuite Advanced PDF - How to set <#ftl output_format = "HTML" />

I've built many many many Advanced PDFs in the past couple of years. There is one thing that always sticks...
This applies mainly to SuiteScript rendered PDF templates.
The PDFs error if the user fields include & or -- or any other unescaped string literal. The default output_format is undefined.
I'm looking at the FTL documentation and can set <#ftl output_format = "HTML" />, but no matter where I put this in the PDF template, it fails.
Is there a particular place I need to declare this in the template?
It's not feasible to globally replace "&" with "&amp;" everywhere, etc.
Not sure that this answers the exact question you're asking, but I don't think it's the output format that's your problem here. My understanding is that the output format refers to what's generated by the template - ie: the final render. The output format, in any case, should be XML, as that's what's consumed by the BFO tag library when you're creating PDFs.
I think the issue is that your XML itself is not valid when string literals contain XML control characters of "&", "<" or ">". To avoid this, when building your templates and adding strings with SuiteScript, you can use the N/xml module's xml.escape() method to wrap anything that could contain one of those characters.
Sorry if I'm off base with this, but hope it helps.
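To make the escaping point concrete, here is a small Python illustration of what escaping does to a field value before it is placed into XML. It uses Python's standard library as a stand-in and is not the NetSuite N/xml API; the function name and sample value are made up.

```python
from xml.sax.saxutils import escape


def to_pdf_field(value: str) -> str:
    # escape() converts &, < and > to entities so the surrounding XML stays valid
    return f"<p>{escape(value)}</p>"


print(to_pdf_field("Smith & Sons -- <Ltd>"))
```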

NodeJS Jade (Pug) inline link in dynamic text

I have this NodeJS application, that uses Jade as template language. On one particular page, one text block is retrieved from the server, which reads the text from database.
The problem is, the returned text might contain line-breaks and links, and an operator might change this text at any time. How do I make these elements display correctly?
Most answers suggest using a new line:
p
  | this is the start of the para
  a(href='http://example.com') a link
  | and this is the rest of the paragraph
But I cannot do this, since I cannot know when the a element appears. I've solved how to get newline correct, by this trick:
p
  each l in line.description.split(/\n/)
    = l
    br
But I cannot seem to solve how to get links to render correctly. Does anyone know?
Edit:
I am open to any kind of format for links in the database, whatever would solve the issue. For example, say database contains the following text:
Hello!
We would like you to visit [a("http://www.google.com")Google]
Then we would like that to output text that looks like this:
Hello!
We would like you to visit Google
It looks like what you need is unescaped string interpolation. The link does not work in the output because Pug automatically escapes it. Wrap the content you want to insert with !{} and it should stop breaking links. (Disclaimer: Make sure you don't leave user input unescaped. This is only a viable option if you know for sure that the content of your DB does not contain unwanted HTML/JS code.)
See this CodePen for illustration.
With this approach, you would need to use standard HTML tags (<a>) in your DB text. If you don't want that, you could have a look at Pug filters such as markdown-it (you will still need to un-escape the compilation output of that filter).
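If you would rather keep the bracket format from the question's edit than store raw HTML, one option is to convert the text to HTML on the server and insert only that result unescaped. The sketch below shows the idea in Python; the function name, the regular expression, and the restriction to http/https links are all assumptions made for illustration, and the same logic would need to be ported to your Node.js code.

```python
import html
import re

# Matches the question's format: [a("http://...")Label]
LINK = re.compile(r'\[a\("(https?://[^"]+)"\)([^\]]*)\]')


def render_description(text: str) -> str:
    pieces = []
    pos = 0
    for match in LINK.finditer(text):
        # Escape the plain text between links so stray HTML cannot get through
        pieces.append(html.escape(text[pos:match.start()]))
        url, label = match.group(1), match.group(2)
        pieces.append(f'<a href="{html.escape(url)}">{html.escape(label)}</a>')
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    # Turn line breaks into <br> after everything else has been escaped
    return "".join(pieces).replace("\n", "<br>")


print(render_description('Hello!\nWe would like you to visit [a("http://www.google.com")Google]'))
```

Because everything except the generated a and br tags is escaped first, the output is a safer candidate for !{} than raw database text.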

Calculate length of string using yahoo pipes

I am using Yahoo Pipes to fetch articles from various sources, including Google. However, articles from Google include the title and the source in the description. Is there a way in Yahoo Pipes to remove the title and source and leave the rest of the article intact? I tried to use the sub-string module, but it requires the length of the string, which varies for each article. I guess that if there is a way to calculate the length of the title and source and pass it to the sub-string module, this might work.
Any help would be great.
Regards
Take a look at http://pipes.yahoo.com/pipes/pipe.info?_id=8KZMRx473hGtVMYsP27D0g, which can be used as a subpipe (i.e. within a loop) to calculate the length of a string. It should be relatively straightforward to add a second text input module and modify the Pipe to cater for your second text string.
