How to iterate over rows of a dataset

I want to iterate over the rows of a Spark Dataset, row by row, and read the value of a certain column. How can I achieve this?
I tried:
oldDF.foreach((ForeachFunction<Row>) row -> System.out.println(row));
Is this the right way? If not, how should I do it, and how do I access the value of a column in a row?
Thanks.

If you just want to output the values of a specific column, you can select that column and show the results on the console like this:
oldDF.select("_2").show() // show the values of column "_2" (the 2nd column) of oldDF
If you want the elements of this column as an array of Row objects, swap show() for collect(), or use collectAsList() to get a list instead:
oldDF.select("_2").collect() // return the values of column "_2" (the 2nd column) of oldDF as Row objects
Keep in mind that collect() brings every row to the driver, so it is only suitable when the result fits in memory.

Related

How to list the column index of a table in excel that contains a given value

I need to list all the column indexes of a table in Excel that contain a given value.
I use the MATCH function, but it only returns the first column it finds.
Some people have told me to use OFFSET inside the MATCH function, but I don't know how to do that.
I want to write a formula that produces a result like the MARK column in my example picture.

If I understand you correctly, based on your example, this should work:
=TEXTJOIN(",",1,IF(B2:F2="x",$B$1:$F$1,""))
B2:F2 is the range in the current row that may contain "x".
$B$1:$F$1 is the header row.
The IF replaces every header whose cell does not contain "x" with an empty string, and TEXTJOIN (with its second argument set to 1, meaning "ignore empty values") joins the remaining headers with commas.
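For readers who find the array logic easier to follow in code, here is a minimal Python sketch of the same two steps: blank out the headers that do not match, then join what is left. The header names and row values are hypothetical sample data, and matching_headers is an illustrative helper, not an Excel function.

```python
# Hypothetical stand-ins for the header row ($B$1:$F$1) and one data row (B2:F2).
headers = ["Col1", "Col2", "Col3", "Col4", "Col5"]
row = ["x", "", "x", "", "x"]


def matching_headers(headers, row, target="x"):
    """Return the headers whose cell equals `target`, joined with commas."""
    # Step 1 (the IF): keep the header where the cell matches, else an empty string.
    kept = [h if v == target else "" for h, v in zip(headers, row)]
    # Step 2 (TEXTJOIN with ignore_empty): join only the non-empty entries.
    return ",".join(h for h in kept if h != "")


print(matching_headers(headers, row))
```

With this sample row the helper keeps the first, third and fifth headers.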

Power Query null issues

I am trying to create a custom column in Power Query that counts null values from column 7 to column 41. The function I have used is List.NonNullCount(List.Range(Record.FieldValues(),7,41)). There are many rows of data, so I would like it to calculate the count for each row within the custom column.
For some reason, even when every column from 7 to 41 is null across the row, the output is 2. I would expect the output to be 0. Furthermore, I get an output of 35 when every column from 7 to 41 has a non-null value across the row. I would expect 34 in this case. I did replace all blank values with null using the transform function in Power Query.
I'm not sure whether my function List.NonNullCount(List.Range(Record.FieldValues(),7,41)) is correct or whether there is a better way to do it. Any help is appreciated!

In my version, Record.FieldValues requires an argument. Record.FieldValues() as you wrote it would produce an error.
Also, the third argument of List.Range is the number of items to return; it is not the number of the last column.
And the first column is at position 0, not position 1.
So something like:
List.NonNullCount(List.Range(Record.FieldValues(_),6,35))
should return the non-null count for columns 7-41 (35 columns, starting at zero-based position 6).
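The offset-and-count arithmetic can be checked outside Power Query. The Python sketch below only mimics the semantics described in this answer (a zero-based offset followed by a count). The helpers list_range and non_null_count are hypothetical stand-ins rather than Power Query functions, and the 45-column row is made-up sample data.

```python
def list_range(values, offset, count):
    """Mimic the described List.Range behaviour: skip `offset` items, then take up to `count`."""
    return values[offset:offset + count]


def non_null_count(values):
    """Mimic List.NonNullCount: count the items that are not None."""
    return sum(v is not None for v in values)


# Hypothetical row with 45 columns; columns 7-41 (1-based) are all null.
row = [f"c{i}" for i in range(1, 46)]
for col in range(7, 42):
    row[col - 1] = None

intended = list_range(row, 6, 35)  # columns 7-41: offset 6, 35 items
original = list_range(row, 7, 41)  # starts at column 8 and runs past column 41

print(len(intended), non_null_count(intended))
print(len(original), non_null_count(original))
```

In this sample the corrected arguments see only the null columns, while the original arguments start one column late and also pick up the non-null columns after column 41, which is the kind of off-by-a-few count the question describes.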

Sqlite - how to iterate through Select Rows, then use in Insert

I'm trying to add a column that holds a running count. The count is at the User > Timestamp level: if you start with an ordered table and go from top to bottom, the count increments by 1 on each row and resets to 0 every time you reach a new user.
My plan is to use a SELECT statement to generate the ordered results, then iterate through the rows and insert each one into a new table, working out the count along the way.
In this case, how do I use the row object returned by
for row in c.execute(sqlStr):
to easily re-insert all the columns of the row into a new table while also adding a new column?
I'm trying to avoid going through all the column names returned by row and constructing a messy SQL string (i.e. I only want to specify the new column, not type out all the existing column names).

Actually, I think I figured it out, though I'm not sure it's the best way of doing it.
I've gone with something like:
for row in c.execute(sqlStr):
    # drop the closing ")" from the row's text form, then append the new value
    a = str(row)[:-1] + ', ' + newValue
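An alternative that avoids building SQL text from str(row) is to let sqlite3 bind the values through "?" placeholders. The following is a minimal sketch under stated assumptions: the table names (events, events_counted), the column names (user_id, ts, action, cnt), and the sample rows are all hypothetical, and the user column is assumed to come first in each row.

```python
import sqlite3


def copy_with_count(conn, src="events", dst="events_counted"):
    """Copy src into dst, appending a per-user running count as a new last column."""
    cur = conn.cursor()
    # Create dst with the same columns as src but no rows, then add the new column.
    cur.execute(f"CREATE TABLE {dst} AS SELECT * FROM {src} WHERE 0")
    cur.execute(f"ALTER TABLE {dst} ADD COLUMN cnt INTEGER")

    rows = cur.execute(f"SELECT * FROM {src} ORDER BY user_id, ts").fetchall()
    prev_user, count = None, 0
    for row in rows:
        user = row[0]  # assumption: the user column is the first column
        count = count + 1 if user == prev_user else 0  # reset to 0 on a new user
        prev_user = user
        # One "?" per existing column plus one for the new value; no column names needed.
        placeholders = ", ".join("?" * (len(row) + 1))
        cur.execute(f"INSERT INTO {dst} VALUES ({placeholders})", row + (count,))
    conn.commit()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("b", 1, "z"), ("a", 2, "y"), ("a", 1, "x")],
)
copy_with_count(conn)
result = conn.execute("SELECT * FROM events_counted ORDER BY user_id, ts").fetchall()
print(result)
```

Because the values are bound rather than pasted into the statement, text containing quotes is handled safely. If your SQLite version supports window functions (3.25 or later), a ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) query may do the same job without a Python loop.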

Excel - if fields match output ID

I'm hoping someone can help me write a formula. I have table one, where a certain title/string is listed multiple times. I also have table two, which has two columns: one with IDs and the other with the same titles. I need to assign the IDs from table two to the rows of table one wherever the titles match.

Just use the MATCH function. This returns the position of the matching cell. To get a True/False value, wrap it in the ISERROR function, like so:
=IF(ISERROR(MATCH(A1,B:B, 0)), "False", "True")
Where:
- B is the column you want to match against
- A1 is the cell you want to check
To return the ID itself rather than True/False, the position that MATCH returns can be passed to INDEX over the ID column.

Index match match - correct approach?

I have a data source in the format shown below. In reality, it contains a few thousand rows.
I need to use something like INDEX-MATCH-MATCH to get the "Status" of each "Content" item for each UserID.
The final result should look like this. The first two columns are not dynamic.
The INDEX formula goes in columns C and D.
I am building the formula in the following steps, but I can't work out where the problem is.
=INDEX(Sheet1!A:K, [Vertical Position], [Horizontal Position])
Look up the user with ID xxx:
=INDEX(Sheet1!A:K, MATCH(A2, Sheet1!A:K,0), [Horizontal Position])
Look up the status for eLearn1:
=INDEX(Sheet1!A:K, MATCH(A2, Sheet1!A:K,0), MATCH("Status", Sheet1!A:K,0))
What am I doing wrong?

The question is not entirely clear, but I think you are trying to do a lookup based on the values of two columns. So for a particular value of Column A (UserID) and Column B (Content), you need to return Column H (Status).
This can be done with an array formula that returns the row number of the matching line, which can then be fed into INDEX. Note that this only works as long as Columns A and B contain unique pairings.
I have set up some sample data:
Columns A-C are my source data. Cells G2:H4 are the lookup.
The formula is:
=INDEX($C$1:$C$7, SUM(($A$1:$A$7=$F2)* ($B$1:$B$7=G$1)*ROW($C$1:$C$7)))
This needs to be entered as an array formula by pressing CTRL+SHIFT+ENTER.
The formula compares the value you are searching for against each of the two columns and multiplies the results. This gives an array of zeros with a single 1 marking the row where both comparisons are true. That array is then multiplied by the row numbers and summed, which yields the row number of the match for INDEX. Because the ranges start at row 1, the row number is also the position within the range.
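To see why the product of the two comparisons picks out a single row, the same arithmetic can be traced in Python. The sample data below is hypothetical and only stands in for columns A-C (rows 1-7, with headers in row 1); lookup_status is an illustrative helper that sketches the logic, not a replacement for the Excel formula.

```python
# Hypothetical stand-ins for columns A (UserID), B (Content) and C (Status), rows 1-7.
user_ids = ["UserID", "u1", "u1", "u2", "u2", "u3", "u3"]
content = ["Content", "eLearn1", "eLearn2", "eLearn1", "eLearn2", "eLearn1", "eLearn2"]
status = ["Status", "Done", "Open", "Open", "Done", "Done", "Done"]


def lookup_status(user_id, item):
    """Mirror INDEX(C, SUM((A=user)*(B=item)*ROW(C))) for unique (user, item) pairs."""
    # Each comparison is True/False; their product is 1 only where both match,
    # so the sum collapses to the row number of the single matching row.
    row_number = sum(
        (u == user_id) * (c == item) * row
        for row, (u, c) in enumerate(zip(user_ids, content), start=1)
    )
    if row_number == 0:
        raise KeyError((user_id, item))  # no row matched both values
    return status[row_number - 1]  # row numbers are 1-based, Python lists are 0-based


print(lookup_status("u2", "eLearn2"))
```

If a (user, item) pair appeared twice, the sum would add two row numbers together and point at the wrong row, which is why the pairings must be unique.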
