Delete multiple tables in Accumulo?

My development instance of Accumulo became quite messy with a lot of tables created for testing.
I would like to bulk delete a large number of tables.
Is there a way to do it other than deleting the entire instance?
BTW - If it's of any relevance, this instance is just a single machine "cluster".

In the Accumulo shell, you can specify a regular expression for table names to delete by using the -p option of the deletetable command.

I would have commented on original answer, but I lack the reputation (first contribution right here).
It would have been helpful to provide a legal regex example.
The Accumulo shell can only escape certain characters; in particular, it will not escape square brackets []. If you want to remove every table whose name starts with the string "mytable", the otherwise legal regex patterns below produce the following warning and error:
user#instance> deletetable -p mytable[.]*
2016-02-18 10:21:04,704 [shell.Shell] WARN : No tables found that match your criteria
user#instance> deletetable -p mytable[\w]*
2016-02-18 10:21:49,041 [shell.Shell] ERROR: org.apache.accumulo.core.util.BadArgumentException: can only escape single quotes, double quotes, the space character, the backslash, and hex input near index 19
deletetable -p mytable[\w]*
A working shell command would be:
user#instance> deletetable -p mytable.*

There is not currently (as of version 1.7.0) a way to bulk delete many tables in a single call.
Table deletion is actually done in an asynchronous way. The client submits a request to delete the table, and that table will be deleted at some point in the near future. The problem is that after the call to delete the table is performed, the client then waits until the table is deleted. This blocking is entirely artificial and unnecessary, but unfortunately that's how it currently works.
Because each individual table deletion appears to block, a simple loop over the table names to delete them serially is not going to finish quickly. Instead, you should use a thread pool, and issue delete table requests in parallel.
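For illustration, here is a minimal sketch of that approach which drives the shell's -e option from a Python thread pool rather than the Java client API; the user name, password, and the "mytable" prefix are placeholders:
import subprocess
from concurrent.futures import ThreadPoolExecutor

USER, PASSWORD = "user", "pass"   # placeholder credentials

def shell(command):
    # Run a single Accumulo shell command non-interactively via -e.
    result = subprocess.run(
        ["accumulo", "shell", "-u", USER, "-p", PASSWORD, "-e", command],
        capture_output=True, text=True, check=True)
    return result.stdout

# List all tables and keep only the ones to drop.
doomed = [t.strip() for t in shell("tables").splitlines()
          if t.strip().startswith("mytable")]

# Issue the (individually blocking) delete requests in parallel; -f skips the prompt.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(lambda t: shell("deletetable -f " + t), doomed))
The same idea applies if you are using the Java client directly: submit the individual TableOperations.delete() calls to an executor instead of looping over them serially.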
A bulk delete table command would be very useful, though. Accumulo is an open source project, so a feature request on its issue tracker would be most welcome, and a contribution implementing it even more so.

Related

Adding multiple keywords with Exiftool, but only if they're not already present

I'm running the following command to add multiple keywords to an image:
exiftool -keywords+="Flowering" -keywords+="In Flower" -keywords+="Primula vulgaris" -overwrite_original "/pictures/Some Folder/P4130073.JPG"
However, I've noticed that if I do this for an image which already contains a particular keyword, then it'll get added a second time.
How can I ensure that keywords are added only if they're not already present, and that if they already exist, the command is a no-op (ideally leaving the file untouched)? I've read a few questions on the forum and the docs, but the NoDups documentation isn't clear to me (I'm an exiftool n00b) and all the answers I've found only process a single keyword addition.
For an added bonus, if the 'exists' check could be case-insensitive, so much the better (e.g., so that if I'm doing keywords+="Flowering" and the image already has the keyword "flowering", nothing will be done).
I also need this to work on Linux, MacOS and Windows (I know the quotes can complicate things!).
See Exiftool FAQ #17
To prevent duplication when adding new items, specific items can be deleted then added back again in the same command. For example, the following command adds the keywords "one" and "two", ensuring that they are not duplicated if they already existed in the keywords of an image:
exiftool -keywords-=one -keywords+=one -keywords-=two -keywords+=two DIR
The NoDups helper function is used to remove duplicates when they already exist in the file. It isn't used to prevent duplicates from being added in the first place.
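To apply the FAQ #17 trick to several keywords at once without worrying about shell quoting on Linux, macOS, or Windows, the argument pairs can be built in a small script. A minimal sketch, where add_keywords_once and the paths are illustrative names rather than anything provided by exiftool:
import subprocess

def add_keywords_once(path, keywords):
    args = ["exiftool", "-overwrite_original"]
    for kw in keywords:
        # Delete the keyword if present, then add it back, so it is never duplicated.
        args += ["-keywords-=" + kw, "-keywords+=" + kw]
    args.append(path)
    subprocess.run(args, check=True)

add_keywords_once("/pictures/Some Folder/P4130073.JPG",
                  ["Flowering", "In Flower", "Primula vulgaris"])
Because the arguments are passed as a list rather than through a shell, keywords containing spaces need no extra quoting on any platform.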

Run IBM DataStage job with different file in same job

I created a job to load Excel data into a database. I need the job to be reusable for different versions of the Excel file: the columns will be the same and only the values change, so it's like inserting the newest version of the Excel values into the database.
For example, the files sales_report_january.xlsx and sales_report_february.xlsx both have the same columns and only the row values differ. I need the job to be able to process both files without changing anything except the file path, because recreating a separate job that is identical except for the file path seems inefficient.
Is it possible to do this in IBM DataStage, or do I need to remap everything even though nothing needs to change? I already tried changing the file path manually, but it raised an error.
In a word: Parameter
Construct your job using a job parameter for the pathname of the Excel workbook.
Whichever stage you are using to read the worksheet will have the workbook name set up as reference(s) to that parameter.
Tip: Use two parameters; one for the dirname part of the pathname and one for the actual name of the workbook. This is a more flexible design in the long run.
I can think of at least four ways to do this. Usually, if the files are all in the same directory, we use looping in the sequence job to process a list of the file names obtained through an appropriate command (such as ls -m pattern for UNIX/Linux). Capture the output, convert the newlines to a delimiter such as comma if necessary, and use that list in the StartLoop activity.
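Outside of DataStage, the list-building and looping steps of that last approach amount to something like the sketch below; the directory, the file pattern, and the pFilePath parameter name are placeholders for whatever the sequence job actually uses:
import glob

# Build a comma-delimited list of matching workbooks (the equivalent of `ls -m`).
files = sorted(glob.glob("/data/in/sales_report_*.xlsx"))
file_list = ",".join(files)

# Iterate over the list (the equivalent of the StartLoop/EndLoop activities),
# running the parameterized job once per workbook. filter(None, ...) skips the
# empty string that split() yields when no files matched.
for path in filter(None, file_list.split(",")):
    print("run job with pFilePath =", path)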

Regular Expressions and SQL Server Error Logs - All false results

Ok, I have done my searching and I have tried many things. I think it is time to put my question here:
I have been working on taking in other users' SQL Server error logs, parsing the rows into columns, then bulk inserting the data 1000 rows at a time. I troubleshoot SQL Server for other people, so sp_readerrorlog will only show me my local instance. Finding root cause involves four sets of logs (SQL Server, Application Event, System Event, and get-clusterlog output) and matching up timestamps. A fast load into SQL Server, along with the ability to pull the exact timeframe needed, will shorten my time spent staring at log files.
I am currently bottlenecked in testing the rows with a regular expression, which does work if I feed it data myself:
import re

def sqlrowmatch(row):
    pattern = re.compile(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d.\d\d')
    if pattern.search(row):
        return True
    else:
        return False
Any string that matches the pattern above (e.g., 1111-11-11 11:11:11.11) returns True. The idea is that if a line in a SQL Server error log matches, it marks the start of a separate entry; this allows memory graphs, deadlock graphs, and dumps to be grouped into one entry instead of being split over several lines.
However, if I point it at one of the SQL Server error logs, there seem to be extra characters. This is giving re.match and re.search a hard time finding a match. If I pass any line to this function, sqlrowmatch(), it reports back False for all rows.
ÿþ <-- these appear to be the first two characters of the first line. re.search just doesn't find a match anywhere in the different elements.
False is what is returned if I call the function inside the with open ... as statement:
with open(file, 'r') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))
The first line should always return True if sqlrowmatch() is used:
2018-10-13 22:40:09.41 Server Microsoft SQL Server 2016 (SP2-CU2-GDR) (KB4458621) - 13.0.5201.2 (X64)
So I am lost and my current project is at a halt. Perhaps some seasoned insight from this group can get me going again.
TIA
Interestingly enough, I found my answer here: Opening huge text file, unicode issue
The open call should be done with encoding='utf-16'.
It now matches appropriately
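Putting it together, a corrected version of the snippet above; the log path is a placeholder, the file is opened as UTF-16, the pattern's dot is escaped so it matches a literal '.', and the function simply returns whether the pattern was found:
import re

# SQL Server writes its error log as UTF-16; the "ÿþ" seen above is the
# byte-order mark showing up because the file was read with the wrong encoding.
pattern = re.compile(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d\.\d\d')

def sqlrowmatch(row):
    return bool(pattern.search(row))

logfile = 'ERRORLOG'   # placeholder path to the collected SQL Server error log
with open(logfile, 'r', encoding='utf-16') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))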

File transfer: Attachmate Extra appends username to host file name

Hi, when I try to download a file from the mainframe using Attachmate Extra, it appends the username to the file name, and I don't know where to turn that off.
For example, if the file name is yyyy.file.name, when I try to transfer the file it transfers username.yyyy.file.name.
In 3.4 the option to append the user name is turned off, yet it still happens.
Enclose the entire dataset name (including the high-level qualifier) in single quotes, e.g. 'YYYY.FILE.NAME' rather than YYYY.FILE.NAME. This is a TSO (not JCL) convention: if you refer to a dataset without single quotes, TSO prepends your user ID as the high-level qualifier; if you place single quotes around the dataset name, it is taken as-is (well, it will be uppercased, since all z/OS dataset names are uppercase, but otherwise as-is).

How can you hide passwords in command line arguments for a process in linux

There is quite a common issue in the Unix world: when you start a process with parameters, one of them being sensitive, other users can read it just by executing ps -ef (for example, mysql -u root -p secret_pw).
The most frequent recommendation I found was simply not to do that: never run processes with sensitive parameters, and instead pass that information some other way.
However, I found that some processes are able to change their parameter line after they have processed the parameters, looking, for example, like this in the process list:
xfreerdp -decorations /w:1903 /h:1119 /kbd:0x00000409 /d:HCG /u:petr.bena /parent-window:54526138 /bpp:24 /audio-mode: /drive:media /media /network:lan /rfx /cert-ignore /clipboard /port:3389 /v:cz-bw47.hcg.homecredit.net /p:********
Note the /p:******** parameter, where the password was masked somehow.
How can I do that? Is it possible for a process in Linux to alter the argument list it received? I assume that simply overwriting the char **args I get in the main() function wouldn't do the trick. I suppose that maybe changing some files in the /proc pseudo-filesystem might work?
"hiding" like this does not work. At the end of the day there is a time window where your password is perfectly visible so this is a total non-starter, even if it is not completely useless.
The way to go is to pass the password in an environment variable.
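A minimal sketch of the environment-variable approach; MYAPP_PASSWORD and connect() are assumed names, not a real API. The reason it helps is that /proc/<pid>/cmdline is world-readable (which is what ps -ef shows), while /proc/<pid>/environ is only readable by the process owner and root:
import os

def connect(user, password):
    # Stand-in for whatever client call actually needs the secret.
    print("connecting as", user)

# Read the secret from the environment instead of argv, so it never appears in ps -ef.
password = os.environ.get("MYAPP_PASSWORD")
if not password:
    raise SystemExit("MYAPP_PASSWORD is not set")

connect(user="root", password=password)
Start it with the variable already set in the environment (for example, exported from a file with restricted permissions) and nothing sensitive shows up in the argument list.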
