Is there a way to pass a column family containing colons to the shell's scan command - accumulo

I have a table which has a URL as the column family. When I attempt to scan this table from the shell using the -c argument to limit the column families returned, I get no results. I suspect that the : in the URL is being interpreted as the separator between the column family and column qualifier. My question is: is there a way to escape or quote the colon so that it will be interpreted as part of the string for the column family?

Not at this time, no. Looking at the code, each column is split by a colon, with the first part being the column family and the second part the column qualifier. Because the number of parts is limited to 2, this does mean that you can scan for entries with colons in the column qualifier, but that doesn't really help you here.
As an alternative, your best bet is to use the Java API, using fetchColumnFamily on a Scanner or BatchScanner.
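To see why a URL column family comes back empty, here is a minimal Python sketch of the splitting behaviour described above. It is only an illustration, not the Accumulo shell's actual code; the function name and the example URL are made up.

```python
def split_column_spec(spec):
    """Split a column spec on the first colon only, giving at most two parts.

    Illustration of the behaviour described in the answer, not Accumulo code.
    """
    parts = spec.split(":", 1)  # maxsplit=1 limits the result to 2 parts
    family = parts[0]
    qualifier = parts[1] if len(parts) > 1 else None
    return family, qualifier


# A URL used as the column family is cut at its first colon,
# so the scan looks for family "http" instead of the full URL.
print(split_column_spec("http://example.com/page"))

# Colons after the first one survive, which is why they are usable in the qualifier.
print(split_column_spec("family:qual:with:colons"))
```

With this kind of split, no value containing a colon can ever end up in the family part, which matches the empty scan results.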

If you really want to use the command line, you can also try using grep or egrep. They'll both have the same problem of the : separator, but you can tell it to look for the part after the :. This is obviously a hack, but given MikeD's answer, it might be your only solution. Depending on the problem, this might be good enough.

Related

Excel formula that produces one of two options

This is my first StackOverflow question, so apologies if I am unclear.
Currently, my work uses an Excel tracking doc to log project info. The column info is like so:
CELL B1 (Project Number) =IF(B2=""," ",MID(B2,FIND("P2",B2),9))
CELL B2 (Project Name) Client / P2XXXXXXX / Name
Thus, the P2XXXXXXX gets pulled out of B2 and populated into B1.
However, management has recently switched systems, so now, some project numbers have the P2XXXXXXX format and others have a PRJ-XXXXX format.
So we need a formula that produces nothing if the cell is blank and EITHER the P2XXXXXXX number or PRJ-XXXXX number if the cell is not blank.
Is it possible? If any further details are needed, let me know. Thanks in advance!
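Well, if the / is always there then this can work:
=IF(B2="","",MID(B2,FIND("/",B2,1)+2,9))
assuming the project number is always 9 characters long (both P2XXXXXXX and PRJ-XXXXX are) and the first slash is always followed by a single space.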
String Between Two Same Characters
Maybe next month your company will start using a different first letter, or could add more digits, e.g. SPRXXXXXXXXXX. You can guard against this by extracting whatever is between the two slashes.
=IF(B2="","",TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)))
Find the first slash with =FIND("/",B2); the text we want starts one position after it:
=FIND("/",B2)+1
Find the second slash by searching from the position after the first one:
=FIND("/",B2,FIND("/",B2)+1)
Now get the string between them:
=MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)
(note that the length is the position of the second slash minus the start position, i.e. second - (first + 1) = second - first - 1, which is why the +1 turns into a -1).
Remove the leading and trailing spaces:
=TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1))
Add the condition when the cell is blank:
=IF(B2="","",TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)))
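For readers who want to check the steps outside Excel, the same logic can be sketched in Python. This is only an illustration: the function name and sample values are made up, and returning an empty string when fewer than two slashes are present is my own choice (the Excel formula would produce an error instead).

```python
def between_first_two_slashes(cell):
    """Return the trimmed text between the first two slashes of `cell`."""
    if cell == "":
        return ""  # mirrors the IF(B2="","",...) guard
    first = cell.find("/")               # like FIND("/",B2), but 0-based
    if first == -1:
        return ""                        # assumption: no slash -> empty result
    second = cell.find("/", first + 1)   # like FIND("/",B2,FIND("/",B2)+1)
    if second == -1:
        return ""                        # assumption: only one slash -> empty result
    # The slice plays the role of MID; strip() trims the outer spaces like TRIM.
    return cell[first + 1:second].strip()


print(between_first_two_slashes("Client / P21234567 / Name"))
print(between_first_two_slashes("Client / PRJ-12345 / Name"))
```

One difference to keep in mind: Excel's TRIM also collapses repeated spaces inside the text, while strip() only removes leading and trailing whitespace.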
Here's another way using LEFT and RIGHT:
=IF(B2="","",TRIM(LEFT(RIGHT(B2,LEN(B2)-FIND("/",B2)),FIND("/",RIGHT(B2,LEN(B2)-FIND("/",B2)))-1)))
Although you can solve this problem with a combination of slicing, trimming, and complex conditionals, the most expressive and easy to maintain solution is to use regular expressions. Regular expressions have a bit of a learning curve, but there's a great playground website where you can experiment with them, and this page has a pretty good writeup on how regular expressions work in Excel.
Specifically, this regular expression addresses the two naming conventions you've highlighted, but it can be updated to support more naming conventions as your company inevitably adds more:
P(RJ-)?((\d){9}|(\d){5})
To break that down from left to right:
P: both patterns start with a "P"
(RJ-)? One pattern follows with "RJ-", but the other doesn't. This is a grouped part of the pattern, and the question mark means that this part of the pattern is optional.
((\d){9}|(\d){5}): by far the nastiest part, but this basically means that there is going to be a sequence of digits (\d), and there will either be nine of them or five of them. By wrapping the whole thing in parentheses, they are always the second captured group, no matter the length of the sequence of digits. This means that you can always extract the project id by looking at the value of the second capture group.
You can also make the expression more generalized by replacing ((\d){9}|(\d){5}) with simply (\d+). That just means "one or more digits." That gives you a much more simplified overall expression of this:
P(RJ-)?(\d+)
Depending on whether or not you care about validating strictly that project ids are 5 OR 9 digits long, that pattern above might be suitable, and it has the benefit of being more flexible. Still, the project ID is in the second captured group.
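As a quick way to try the generalized pattern outside Excel, here is a small Python sketch. The sample cell values and the function name are hypothetical; the pattern itself is the P(RJ-)?(\d+) expression from above.

```python
import re

# Generalized pattern from the answer: "P", an optional "RJ-", then one or more digits.
PROJECT_ID = re.compile(r"P(RJ-)?(\d+)")


def project_number(cell):
    """Return the full project number found in `cell`, or "" if there is none."""
    if cell == "":
        return ""
    match = PROJECT_ID.search(cell)
    # group(0) is the whole match including the prefix;
    # group(2) would be just the digits (the second capture group).
    return match.group(0) if match else ""


print(project_number("Client / P21234567 / Name"))
print(project_number("Client / PRJ-12345 / Name"))
```

Because the pattern is not anchored, it picks up the first place in the cell where a "P" is followed by digits (or by "RJ-" and digits), so a client name of that shape would be matched first.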

Excel conditional formating based on the multiple cells and values

I am trying to implement various conditional formatting to a specific database. I looked for an answer around here but cannot find anything similar. It might not be possible, but it is worth a try.
I am performing various data cleansing and validation.
Here is the case: (small sample, working with 100k data entries in this particular file)
Ultimately what I want is the formula that will compare the low-level Description characters after the last "UNDERSCORE" to the characters after last "UNDERSCORE" of the higher level(highlighted). If it does not match then highlight the cell?
Asking for too much, yes, no, maybe? I am open to any other suggestions on how can I perform various data cleaning and validation!
Thank you!
If you must use the last "UNDERSCORE" character, and can't depend on the suffixes being four characters, the formula becomes quite complex. For simplicity's sake, I assumed the higher level is always missing the last five characters of the lower level. If you must go by the last "DASH" character instead, the formula will be a lot longer.
Use this formula to highlight the cells, defining the two names LEVELS and DESCRS to be the two columns:
=IFNA(MID(B2,FIND("[]",SUBSTITUTE(B2,"_","[]",LEN(B2)-LEN(SUBSTITUTE(B2,"_",""))))+1,999)<>MID(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),FIND("[]",SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_","[]",LEN(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1))-LEN(SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_",""))))+1,999),FALSE)
This uses a very nice trick with SUBSTITUTE to find the last occurrence of a character.
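The trick is easier to follow when unrolled. The Python sketch below mimics it step by step: count the underscores, replace only the last one with a marker that does not occur in the text, then take everything after the marker's position. The helper names and the sample description are made up, and returning None when there is no underscore is my own choice.

```python
def substitute_nth(text, old, new, n):
    """Replace only the n-th occurrence (1-based) of `old`, like SUBSTITUTE's 4th argument."""
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: leave the text unchanged
    return text[:pos] + new + text[pos + len(old):]


def suffix_after_last(text, sep="_", marker="[]"):
    """Return the part of `text` after the last `sep` (a single character)."""
    # LEN(B2)-LEN(SUBSTITUTE(B2,"_","")) counts the separators.
    count = len(text) - len(text.replace(sep, ""))
    if count == 0:
        return None  # assumption: no separator means no suffix
    # Mark the last separator; the marker must not already occur in the text.
    marked = substitute_nth(text, sep, marker, count)
    # The marker starts where the last separator was, so the suffix begins right after it.
    return text[marked.find(marker) + 1:]


print(suffix_after_last("PUMP_MAIN_0001"))
```

In Python itself, "PUMP_MAIN_0001".rpartition("_")[2] gives the same result directly; the longer version is only there to mirror what the formula does.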
BTW, I would probably write a Perl program to parse the data and find errors.

Vlookup Not working on text between two tables

This is not your average vlookup error.
I have two Power Query tables that I've setup. One is coming from a CSV file with a list of names. The other is from a website pulling a list of names.
i.e.
=John Smith = John Smith would not be true for some reason.
The vlookup should be able to find the name easily. I've tried proper, upper, clean, trimming, text to columns and everything else that I could think of. I've changed data types to no avail.
I know that one query is causing the issue. I can type the name exactly and do a vlookup from one, and it works. The second query that I do this to doesn't return anything on the typed text.
Anyone encounter this issue while using Power Query?
EDIT: See Jeeped's Answer - When I replace the space from the web query with a normal space it works.
@Jeeped's comment has a good answer:
Assuming you have already trimmed off leading and trailing spaces, one of the John Smith entries (likely the one from the web) uses a non-breaking space (i.e. CHAR(160), character code 0xA0) instead of a regular space (i.e. CHAR(32), ASCII 0x20). Use
=CODE(MID(A$1, ROW(1:1), 1))
on both, fill down to get a character code for each letter and compare the numbers.
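The same check can be sketched in Python to show what the comparison reveals. The two sample strings are made up to reproduce the situation; they look identical when printed but differ in one character code.

```python
def char_codes(text):
    """Return the code point of every character, like CODE(MID(...)) filled down."""
    return [ord(ch) for ch in text]


csv_name = "John Smith"        # regular space, code 32
web_name = "John\u00a0Smith"   # non-breaking space, code 160

print(csv_name == web_name)    # the lookup fails because the strings differ
print(char_codes(csv_name))
print(char_codes(web_name))

# Replacing the non-breaking space with a regular one makes the names match.
cleaned = web_name.replace("\u00a0", " ")
print(cleaned == csv_name)
```

This is also why trimming does not help here: the non-breaking space sits between the two names, and Excel's TRIM only removes the regular space character anyway.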

Sorting on a numerical value using csvfix for linux - turns numbers to strings

I'm using csvfix to sort a CSV file based on an integer (counter) value in the second column. However it seems that csvfix puts double quotes around all fields in the file, turning them to strings, before it performs the sort. The result is that the rows are sorted by the string value, such that "1000" comes before "2".
There is a command-line option -smq that is supposed to apply "smart quoting" but that's not helping me. If I use the command csvfix echo -smq file.csv, the output has no quotes around numerical fields, but when I pipe that into csvfix sort -f 2 file.csv, the file is written without quotes but still sorted in "string order". It makes no difference whether I include the -smq flag in the sort command or not.
Additionally I would like csvfix to ignore the first row of string headers. Csvfix issue tracking claims this is already implemented but I can only find the -ifn flag that seems to cut the header row out entirely.
These seem pretty basic pieces of functionality for this tool, so I'm probably missing something very simple. Hoping someone on here has used csvfix and figured out.
According to the online documentation for csvfix, sort has an N option for numeric sorts:
csvfix sort -f 2:N file.csv
Having said this, CSV isn't a particularly good format for text manipulation. If possible, you're much better off choosing DSV (delimiter separated values) such as Tab or Pipe separated, so that you can simply pipe the output to sort, which has ample capability to sort by field, using whatever collation method you need.
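If csvfix turns out not to fit, the two requirements from the question (numeric ordering and keeping the header row in place) can also be sketched in a few lines of Python with the standard csv module. This is an alternative approach, not csvfix behaviour; the sample data and function name are made up.

```python
import csv
import io


def sort_csv_numeric(text, column=1):
    """Sort CSV rows by an integer column, leaving the header row on top."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # int() makes "2" sort before "1000"; a plain string sort would not.
    body.sort(key=lambda row: int(row[column]))
    return [header] + body


SAMPLE = "name,count\na,1000\nb,2\nc,30\n"

for row in sort_csv_numeric(SAMPLE):
    print(row)
```

The sketch assumes every data row has a valid integer in the chosen column; a blank or non-numeric value would raise a ValueError.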

Sharepoint: Calculated Column replace all spaces

Seems like it would be a simple thing really (and it may be), but I'm trying to take the string data of a column and then through a calculated column, replace all the spaces with %20's so that the HTML link in the workflow produced email will actually not break off at the first space.
For example, we have this in our source column:
file:///Z:/data/This is our report.rpt
And would like to end up with this in the calculated column:
file:///Z:/data/This%20is%20our%20report.rpt
Already used the REPLACE, and made up a ghastly super nested REPLACE/SEARCH version, but the problem there is that you have to nest for EACH potential space, and if you don't know how many up front, it doesn't work, or will miss some.
Have any of you come across this scenario and how did you handle it?
Thanks in advance!
As far as I know there is no generic solution using the calculated-column syntax. The standard solution for this situation is using an ItemAdded (/ItemUpdated) event and initializing the field value from code.
I was able to solve this issue for my circumstances by using a series of calculated columns.
In the first calculated column (C1) I entered a formula to replace the first space, something like this:
=IF(ISNUMBER(FIND(" ",[Title])),REPLACE([Title],FIND(" ",[Title]),1,"%20"),[Title])
In the second Calculated column (C2) I used:
=IF(ISNUMBER(FIND(" ",[C1])),REPLACE([C1],FIND(" ",[C1]),1,"%20"),[C1])
In my case, I wanted to encode up to four spaces, so I used 3 calculated columns (C1, C2, C3) in the same fashion and got the desired result.
This is not as efficient as using a single calculated column, but if SUBSTITUTE will not work in your SharePoint environment, and you cannot use an event handler or workflow, it may offer a workable alternative.
I actually used a slightly different formula, but it was on a work machine to which I don't have access at the moment, so I just grabbed this formula from a similar S.O. question. Any formula that will replace the first occurrence of a space with "%20" will work; the trick is to a) make sure the formula returns the original string unchanged if it does not have more spaces in it, and b) test, test, test. Create a view of your list that has the field you are trying to encode, plus the calculated fields, and see if you are getting the results you want.
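To make the chaining idea concrete, here is a small Python sketch of it. Each call to replace_first_space stands in for one calculated column, so each step in the chain handles one more space. The function names are made up; the sample path is the one from the question.

```python
def replace_first_space(value):
    """One calculated column: encode the first space, or return the value unchanged."""
    pos = value.find(" ")
    if pos == -1:
        return value  # no space left: pass the string through untouched
    return value[:pos] + "%20" + value[pos + 1:]


def encode_with_chain(value, columns):
    """Feed the value through `columns` chained steps, like C1 -> C2 -> C3."""
    for _ in range(columns):
        value = replace_first_space(value)
    return value


path = "file:///Z:/data/This is our report.rpt"
print(encode_with_chain(path, 3))
print(encode_with_chain(path, 2))  # one space is left over with a shorter chain
```

The second call shows the limitation of the approach: a value with more spaces than there are columns in the chain is only partly encoded.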
"so that the HTML link in the workflow produced email will actually not break off at the first space."
The browser only does this if you have not enclosed your link in quotes.
If you wrap the link in quotes, it does not cut off at the first space.
In a SharePoint formula it would be:
="""file:///Z:/data/This is our report.rpt"""
because two quotes are the SharePoint escape notation to output a quote.
You can use this formula (Start trim for 1, in my case was 4):
=IF(ISBLANK([EUR Amount]),"",(TRIM(MID([EUR Amount],4,2))&TRIM(MID([EUR Amount],6,2))&TRIM(MID([EUR Amount],8,2))&TRIM(MID([EUR Amount],10,2))&TRIM(MID([EUR Amount],12,2))&TRIM(MID([EUR Amount],14,2)))*1)

Resources