Delete all the cells of the Databricks Notebook - databricks

I am working in a Databricks notebook for some Spark work. I use the notebook as a proof of concept initially and then organize the code so that I can build a JAR out of it. While doing the POC I add a lot of cells to experiment with different approaches. Over time the notebook accumulates a large number of cells, and most of them are no longer required once I have organized the code and moved it to a finalized notebook/JAR. I am deleting the cells one by one, but that is time consuming. Is there a way to delete all the cells from the notebook at once?
There is an option at the top of the notebook that says "delete cells", but when I click it, it deletes only a single cell and not all the cells in the notebook.
A snapshot of the top UI where I see the delete cells option is below:

When you work with Databricks notebooks, you can work in two modes, similar to the vi editor:
Edit mode - you edit the code of an individual cell
Command mode - you work not with the code but with the cells, so you can select several cells at once, cut/copy/paste/delete them, etc. (see the documentation or click on Shortcuts in the ? item of the UI).
From the documentation you will see that in command mode (press ESC to enter it) you can select all cells, or use Shift + cursor keys to select several cells, and then press d twice to delete the selected cells.

After going through @AlexOtt's answer, I tried this using the UI and it worked.
If you want to delete all the cells in a notebook, first click the select all cells option in the context menu. This selects all the cells in the notebook, and if you then choose delete cells, it deletes all of them.
This was confusing because the option's text says delete cells while it deleted only the cell that had the cursor. Now I know how to do it; the UI options make sense, but they could have been more user friendly.

Related

Is there any way to run/execute cells after a certain cell in a Databricks notebook?

I'm wondering if it is possible to run only the cells after a certain cell in a Databricks notebook, instead of using the Run All option at the top of the notebook.
If you click on the keyboard symbol in the menu, it will show you the available shortcuts. What you need is <Shift>+<Option>+<Down>: Run all below commands (inclusive) (on Mac; on Windows the combination could be slightly different). Or, if you click the dropdown next to the > character, you will see "Run all below" (see the docs).

How can I use CTRL-C to copy the content of a single cell in DBISQL?

I use DBISQL on Win32 on 12.0.1.3769.
Usually, I am interested in copying the content of a cell from the Result pane of DBISQL, not the contents of the complete row or the column headers.
I can do this by using the context menu and choosing "copy data / cell" (I use a localised version, so the English wording may be different). But I would certainly like to use CTRL-C to copy only the contents of the selected cells. As stated, however, CTRL-C copies the full row(s) including the column headers. And all too often I try to paste a single number into some window and unintentionally paste an entire row instead...
Question: Can I use another shortcut, or is there an option in the result pane to change the meaning of CTRL-C?
It looks like Ctrl-C effectively maps to Copy Data - Rows, while you (and I!!!) would like to map it to Copy Data - Columns, as defined here: copying columns, rows and cells in an Interactive SQL result collection.
The complete list of Interactive SQL Keyboard Shortcuts looks like it dates back to 1985... However, even back then, the cool kids' software let you change the key assignments (hint, hint :)
Thank you, Volker, for noting how Really Annoying Ctrl-C is ... [ sarcasm alert ] Until now it had been only one of life's little irritants, such as an alarm that goes off before you have finished sleeping. [ end sarcasm alert ]

Replace value with the average of its column - many columns

I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advance for any help.
I can give you a three-sheet solution, Sam:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the whole of Sheet 3 (i.e., use the top-left corner symbol that selects the entire worksheet). It's going to take a while to update, but it will deliver what you want!
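For comparison, the same three-sheet logic can be sketched outside Excel. This is only a minimal Python sketch using the standard library, not part of the Excel solution: the function name `fill_with_column_means` and the sample data are hypothetical, and it assumes the sheet has already been read into a list of rows in which missing values are the string "*".

```python
from statistics import mean

MISSING = "*"  # marker used for missing values in the sheet


def fill_with_column_means(rows):
    """Return a copy of rows where each "*" is replaced by its column's mean.

    The mean is computed over the non-missing values only, which mirrors
    how Excel's AVERAGE skips text cells.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column
    col_means = []
    for col in columns:
        present = [float(v) for v in col if v != MISSING]
        # A column with no numeric values has no mean; keep the marker.
        col_means.append(mean(present) if present else MISSING)
    return [
        [col_means[j] if v == MISSING else float(v) for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical 3x3 sheet with one missing value per column.
sample = [
    [1, "*", 10],
    [3, 4, "*"],
    ["*", 8, 30],
]
filled = fill_with_column_means(sample)
print(filled)
```

The per-column means play the role of Sheet 2, and the final list comprehension plays the role of the IF formula in Sheet 3.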
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values, and they are extremely quick. AML has a Clean Missing Data module which exposes replacement methods such as Multivariate Imputation using Chained Equations (MICE), Mean, Median and several others. The great thing here is that you can visualize the dataset columns by right-clicking on the dataset and see which columns are skewed. You can then select, on a column-by-column basis, which replacement method to use. If you have heavily skewed columns you might use the median instead, for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
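To illustrate why the median may be the better replacement value for a skewed column, here is a small Python sketch. The column values are made up for illustration and have nothing to do with AML itself.

```python
from statistics import mean, median

# Hypothetical column with one extreme value (right-skewed).
skewed = [2, 3, 3, 4, 5, 250]

col_mean = mean(skewed)      # pulled upward by the single outlier
col_median = median(skewed)  # stays near the bulk of the data

print(col_mean, col_median)
```

Filling gaps with the mean here would insert a value far from most of the observations, whereas the median stays close to them.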
I don't know if there is a method for directly treating "*" as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
Export the sheet as a CSV and save it.
Use Ctrl+H to bring up the Find and Replace dialog, enter "~*" for Find (the tilde makes Excel treat the asterisk literally instead of as a wildcard) and leave Replace blank
Then log in to AML and click + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE and select your file
When selecting the type, make sure to select CSV with no header if your data has no header row, or CSV with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (this is where you choose which method to apply for missing values, e.g. replace missing with the mean, or perhaps the median if your column data is skewed).
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download. Done.
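The pre-load step above (blanking out "*" in the exported CSV) could also be scripted instead of using find and replace. The following is a minimal sketch in Python using only the standard library. The helper name `blank_out_missing` is hypothetical, and the demo uses in-memory text; with real files you would pass file objects opened with `newline=""`.

```python
import csv
import io


def blank_out_missing(src, dst, marker="*"):
    """Copy CSV data from src to dst, emptying every field equal to marker."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Only fields that consist solely of the marker are blanked.
        writer.writerow(["" if cell.strip() == marker else cell for cell in row])


# In-memory demo with made-up data.
src = io.StringIO("a,b,c\n1,*,3\n*,5,6\n")
dst = io.StringIO()
blank_out_missing(src, dst)
cleaned = dst.getvalue()
print(cleaned)
```

Unlike a plain find and replace, this touches only fields that are exactly the marker, so an asterisk embedded in other text is left alone.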

Short-cut for selecting excel ready-made cell formatting. NOT formatting to table.

I used to know an Excel shortcut that popped up a window where one could select a layout for a range of cells in the worksheet. There were many choices and several of them were quite beautiful. The layout would change the background color of the heading (the first row selected) and format the first column and the cells in the body respectively.
I'm not talking about making tables or the table formatter, also it was only accessible through the shortcut command as far as I know (which I've now forgotten). Does anyone recall what I mean and could that person please share? I've been trying to remember it for some time now.
I believe what you first need to do is enable Excel to recognise your table. Take a look at my screenshot above!
Pressing Alt, then O, then A brings up the AutoFormat window.

Create Excel VBA to delete specific text from cell in one column

I am trying to create an Excel VBA macro that deletes only a specific part of each cell in a single column.
In column A, I have directory values:
For example:
Directoryof K:\data\Admin\
What I would like to do is remove the "Directoryof" from all the cells in column A and leave only the remaining text that follows it.
To create a macro that performs the above, follow these steps:
Click the "Developer" tab on the top menu.
You will find an option "Record Macro".
Click Record Macro ->
a. A dialog box appears; give your macro a name.
b. Shortcut key (optional): assign one by pressing Shift and any key, such as a letter.
c. Store macro in: This Workbook (this allows your macro to run in this workbook).
Click on "Use Relative References".
Once that is done, perform the delete operation (remove the portion you do not want) in the column so that the macro records the process you are performing.
When finished, you will find the Stop Recording option (a small blue square) in the lowest pane at the bottom. Click it to stop recording the macro.
You now have a macro that replicates the operation without you performing it by hand.
Go to any other column where you want to perform the operation, click the "Macros" option on the Developer tab, then click the macro you created, and you will see the magic happen.
You could probably use regex to accomplish what you are going for. Regular expressions are often used for finding patterns. If all of your data follows the same format, you could break your strings apart into two capture groups with something like:
(.+)([A-Z]:\\.+)
https://regex101.com/r/uD4uJ0/2 <-- this will show you your capture groups
Edit: I updated this link, sorry, originally had the wrong one.
The question "How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops" will show you how to split up capture groups if you are interested.
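To show what the two capture groups contain, here is a small sketch in Python rather than VBA, purely as an illustration of the pattern and not as the Excel implementation. The helper name `strip_prefix` is hypothetical; in Excel you would apply the same pattern through a regex object as described in the linked question.

```python
import re

# Pattern from the answer above: group 1 is the leading text,
# group 2 starts at the drive letter (e.g. "K:\").
PATTERN = re.compile(r"(.+)([A-Z]:\\.+)")


def strip_prefix(value):
    """Return the path part (group 2), or the value unchanged if no match."""
    match = PATTERN.fullmatch(value)
    return match.group(2) if match else value


# Sample value from the question; backslashes are escaped in the literal.
cell = "Directoryof K:\\data\\Admin\\"
print(strip_prefix(cell))
```

Values that do not contain a drive letter followed by a backslash are returned untouched, so the helper is safe to run over a whole column.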
You could use something like Text to Columns with fixed width, split the column after "Directoryof", and then copy/paste the values back into column A.
I'm not sure there is a way to do this without VBA that does not need a helper column. If you can afford to use a second column, you can also use =RIGHT(Cell, LEN(Cell) - # of characters), assuming that the part you want to strip off is always "Directoryof", and then copy/paste the values back into column A.

Resources