Combining only specified data into a single csv file - linux

I am familiar with combining csv files using cat. I am also familiar with doing so while specifying rows.
What I need to know, though, is how to combine only specified columns, starting at a specified row, from the csv files. The csv files I am using are kinda wild, but they all have the same format. I have no control over their output and need to figure out how to combine a couple hundred files (hopefully not manually).
Example of the data:
| Column1 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 | Column10 | Column11 |
|--------------|---------|--------------|---------|--------------|---------|---------|---------|--------------|----------|
| garbage data | | garbage data | garbage | | | | garbage | | |
| garbage data | | garbage data | | | | | | | |
| garbage data | | garbage data | | | | | | | |
| garbage data | | garbage data | | | | | | | |
| garbage data | | garbage data | | garbage | garbage | | | | |
| garbage data | | garbage data | | good data 1 | | | | good data 1 | garbage |
| garbage data | | garbage data | | good data 2 | | | | good data 2 | garbage |
| garbage data | | garbage data | | good data 3 | | | | good data 3 | garbage |
| garbage data | | garbage data | | good data 4 | | | | good data 4 | garbage |
| garbage data | | garbage data | | good data 5 | | | | good data 5 | garbage |
| garbage data | | garbage data | | good data 6 | | | | good data 6 | garbage |
| garbage data | | garbage data | | good data 7 | | | | good data 7 | garbage |
| garbage data | | garbage data | | good data 8 | | | | good data 8 | garbage |
| garbage data | | garbage data | | good data 9 | | | | good data 9 | garbage |
| garbage data | | garbage data | | good data 10 | | | | good data 10 | garbage |
EDIT: The desired output would be Columns 6 and 10, from row 6 (where "good data" begins) down to the end of each file (files are 1000 to 2000 rows each).
EDIT 2: Desired Output
| Column10 | Column6 |
|--------------|--------------|
| good data 1 | good data 1 |
| good data 2 | good data 2 |
| good data 3 | good data 3 |
| good data 4 | good data 4 |
| good data 5 | good data 5 |
| good data 6 | good data 6 |
| good data 7 | good data 7 |
| good data 8 | good data 8 |
| good data 9 | good data 9 |
| good data 10 | good data 10 |
All feedback is most welcome.

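If they really are CSV files, skip the first 5 lines of every file (FNR is the per-file line number) and set the output field separator so the result stays comma-separated (by default awk joins the printed fields with a space):
awk -F, -v OFS=, 'FNR>5 {print $6,$10}' *.csv > BigBoy.csv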

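Use sed and cut:
sed '1,6d' file | cut -d, -f6,10
sed '1,6d' removes the first six lines
cut -d, -f6,10 extracts the needed columns (cut uses tab as the delimiter by default, so -d, is needed for comma-separated files)
To process all csv files in one go with GNU sed, add -s so that line numbers restart for each file; without it, sed treats all inputs as one stream and only drops the first six lines of the first file:
sed -s '1,6d' *.csv | cut -d, -f6,10 > output.csv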
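For files with quoted fields that contain commas, a small script may be more robust than awk or cut. The following is a minimal sketch, not a definitive implementation: the function name `combine_columns` and its parameters are made up for illustration, and the default of 6 skipped records is an assumption that should be adjusted to where the good data really begins.

```python
import csv


def combine_columns(paths, out_path, skip_rows=6, columns=(6, 10)):
    """Write the chosen 1-based columns of every input file to out_path,
    skipping the first skip_rows records of each file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as src:
                for record_no, row in enumerate(csv.reader(src), start=1):
                    if record_no <= skip_rows:
                        continue  # leading junk rows, counted per file
                    # Pad with "" so a short row cannot raise IndexError
                    writer.writerow(
                        [row[c - 1] if c <= len(row) else "" for c in columns]
                    )
```

It could be called as `combine_columns(sorted(glob.glob("data/*.csv")), "combined.csv")` after `import glob`; the `data/` directory is hypothetical, and writing the result outside the input directory keeps a rerun from reading its own output.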

Related

Find cell address of value found in range

tl;dr In Google Sheets/Excel, how do I find the address of a cell with a specified value within a specified range where value may be in any row or column?
My best guess is
=CELL("address",LOOKUP("My search value", $search:$range))
but it doesn't work. When it finds a value at all, it returns the rightmost column every time, rather than the column of the cell it found.
I have a sheet of pretty, formatted tables that represent various concepts. Each table consists of
| Title |
+------+------+-------+------+------+-------+------+------+-------+
| Sub | Prop | Name | Sub | Prop | Name | Sub | Prop | Name |
+------+------+-------+------+------+-------+------+------+-------+
| Sub prop | value | Sub prop | value | Sub prop | value |
+------+------+-------+------+------+-------+------+------+-------+
| data | data | data | data | data | data | data | data | data |
| data | data | data | data | data | data | data | data | data |
⋮
I have 8 such tables of variable height arranged in a grid within the sheet 3 tables wide and 3 tables tall except the last column which has only 2 tables--see image. These fill the range C2:AI78.
Now I have a table off to the right consisting in AK2:AO11 of
| Table title | Table title address | ... |
+---------------+-----------------------+-----+
| Table 1 Title | | ... |
| Table 2 Title | | ... |
⋮
| Table 8 Title | | ... |
I want to fill out the Table title address column. (Would it be easier to do this manually for all of 8 values? Absolutely. Did I need to in order to write this question? Yes. But using static values is not the StackOverflow way, now, is it?)
Based on very limited Excel/Google Sheets experience, I believe I need to use CELL() and LOOKUP() for this.
=CELL("address",LOOKUP($AK4, $C$2:$AI$78))
This retrieves the wrong value. For AL4 (looking for value Death Wave), LOOKUP($AK4, $C$2:$AI$78) should retrieve cell C2 but it finds AI2 instead.
| Max Levels |
+------------------+---------------+----+--+----+
| UW | Table Address | | | |
+------------------+---------------+----+--+----+
| Death Wave | $AI$3 | 3 | | 15 |
| Poison Swamp | $AI$30 | | | |
| Smart Missiles | $AI$56 | | | |
| Black Hole | #N/A | 1 | | |
| Inner Land Mines | $AI$3 | | | |
| Chain Lightning | #N/A | | | |
| Golden Tower | $AI$3 | | | |
| Chrono Field | #N/A | 25 | | |
The error message for the #N/A cells is
Did not find value '<Table Title>' in LOOKUP evaluation.
My expected table is
| Max Levels |
+------------------+---------------+----+--+----+
| UW | Table Address | | | |
+------------------+---------------+----+--+----+
| Death Wave | $C$2 | 3 | | 15 |
| Poison Swamp | $C$28 | | | |
| Smart Missiles | $C$54 | | | |
| Black Hole | $O$2 | 1 | | |
| Inner Land Mines | $O$28 | | | |
| Chain Lightning | $O$54 | | | |
| Golden Tower | $AA$2 | | | |
| Chrono Field | $AA$39 | 25 | | |

try:
=INDEX(ADDRESS(
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"♦"&ROW(D2:F4)), "♦"), 2, ),
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"♦"&COLUMN(D2:F4)), "♦"), 2, ), 4))
or if you want to create jump links:
=INDEX(LAMBDA(x, HYPERLINK("#gid=1273961649&range="&x, x))(ADDRESS(
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"♦"&ROW(D2:F4)), "♦"), 2, ),
VLOOKUP(A2:A3, SPLIT(FLATTEN(D2:F4&"♦"&COLUMN(D2:F4)), "♦"), 2, ), 4)))

Try this:
=QUERY(
  FLATTEN(
    ARRAYFORMULA(
      IF(
        C:AI=$AK4,
        ADDRESS(ROW(C:AI), COLUMN(C:AI)),
        ""
      )
    )
  ), "
  SELECT
    Col1
  WHERE
    Col1<>''
  "
, 0)
Basically, convert every cell in the search range to its address if it equals the search term (and to an empty string otherwise). Then flatten that 2D range and filter out the empty strings.
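The same idea (map each matching cell to its address, flatten, drop the non-matches) can be sketched outside the spreadsheet. This is only an illustration of the logic, assuming the range has been loaded as a list of rows; `column_letters` and `find_addresses` are hypothetical helper names.

```python
def column_letters(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_addresses(grid, target, first_row=1, first_col=1):
    """Return absolute A1-style addresses of every cell in grid equal to target.

    grid is a list of rows; first_row and first_col give the sheet position
    of grid[0][0] (for example 2 and 3 for a range that starts at C2).
    """
    return [
        f"${column_letters(first_col + c)}${first_row + r}"
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == target
    ]
```

Unlike LOOKUP, which expects sorted data and searches along one dimension, this checks every cell, so the position of the title within the range does not matter.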

Pivot, dynamic data source

I have a pivot chart which currently has the data source referring to my table.
Every week I run my code and a new row of data is appended to the bottom of my table.
The pivot does pick up this new data every week because it refers to the table. However, I want to drop the oldest week each time a new one is added, so that the pivot always covers only the most recent year's worth of rows.
Is there any way to adjust my table to include only the most recent year's worth of rows?
Here is my sample data:
+----------+------------------+-----------------+
| week | stack | overflow |
+----------+------------------+-----------------+
| 12/20/17 | -142,335,432.00 | -41,641,109.88 |
| 12/27/17 | -105,428,220.20 | -47,448,990.63 |
| 1/3/18 | -88,520,154.56 | -24,858,774.97 |
| 1/10/18 | -42,033,431.10 | 14,573,779.35 |
| 1/17/18 | -66,101,748.16 | -8,670,735.22 |
| 1/24/18 | -75,871,649.12 | -18,000,154.21 |
| 1/31/18 | -77,027,686.63 | -11,784,198.64 |
| 2/7/18 | -96,720,126.71 | -52,219,288.98 |
| 2/14/18 | -119,118,554.60 | -34,743,350.28 |
| 2/21/18 | -116,529,554.70 | -20,774,072.93 |
| 2/28/18 | -86,871,998.53 | -25,993,521.20 |
| 3/7/18 | -90,351,387.27 | -21,259,727.05 |
| 3/14/18 | -77,968,076.28 | -51,609,924.29 |
| 3/21/18 | -120,805,352.60 | -40,338,490.97 |
| 3/28/18 | -92,247,583.62 | -14,525,648.04 |
| 4/4/18 | -70,821,451.36 | -35,866,864.46 |
| 4/11/18 | -82,694,486.66 | -59,009,729.82 |
| 4/18/18 | -79,034,094.39 | -64,231,312.42 |
| 4/25/18 | -63,415,815.16 | -28,612,265.37 |
| 5/2/18 | -80,372,191.96 | -53,375,611.61 |
| 5/9/18 | -72,619,415.73 | -50,642,469.19 |
| 5/16/18 | -109,654,240.70 | -45,762,784.43 |
| 5/23/18 | -100,407,366.50 | -39,577,966.11 |
| 5/30/18 | -105,794,095.80 | -65,071,199.59 |
| 6/6/18 | -83,630,201.98 | -60,981,969.88 |
| 6/13/18 | -104,644,821.50 | -63,754,760.71 |
| 6/20/18 | -75,229,424.33 | -55,803,681.24 |
| 6/27/18 | -65,237,135.62 | -54,693,832.65 |
| 7/4/18 | -60,025,672.33 | -44,367,918.60 |
| 7/11/18 | -30,172,175.09 | -28,392,163.28 |
| 7/18/18 | -20,687,864.39 | 24,300,285.63 |
| 7/25/18 | -40,476,447.03 | 4,850,881.09 |
| 8/1/18 | -31,211,625.05 | -67,887,918.30 |
| 8/8/18 | -29,736,938.87 | -32,905,703.80 |
| 8/15/18 | -74,934,647.91 | -65,611,884.73 |
| 8/22/18 | -25,220,747.20 | -7,019,746.86 |
| 8/29/18 | -24,608,552.13 | -8,065,633.97 |
| 9/5/18 | -30,119,599.95 | -26,225,633.08 |
| 9/12/18 | -29,836,379.12 | -10,045,560.95 |
| 9/19/18 | -61,281,567.61 | -58,427,878.27 |
| 9/26/18 | -47,418,209.59 | -33,451,409.22 |
| 10/3/18 | -41,321,336.46 | -25,112,764.44 |
| 10/10/18 | -1,241,932.51 | 21,814,274.35 |
| 10/17/18 | -19,791,273.66 | -12,199,449.75 |
| 10/24/18 | -20,501,406.84 | 1,225,387.11 |
| 10/31/18 | -64,116,464.30 | -5,308,628.21 |
| 11/7/18 | -83,657,672.02 | -19,922,992.91 |
| 11/14/18 | -112,704,007.53 | -32,939,535.69 |
| 11/21/18 | -71,969,954.54 | -51,335,709.79 |
| 11/28/18 | -79,668,484.56 | -67,887,918.30 |
| 12/5/18 | -44,134,343.99 | -32,905,703.80 |
| 12/12/18 | -71,700,079.84 | -65,611,884.73 |
| 12/19/18 | -82,238,011.30 | -74,725,620.20 |
| 12/26/18 | -59,385,932.41 | -54,947,256.94 |
| 1/2/19 | -42,717,830.26 | -31,110,199.14 |
| 1/9/19 | -11,029,444.63 | 7,309,440.90 |
+----------+------------------+-----------------+

Changing the source range for the pivot will be tricky, as Excel does not allow non-contiguous cells to be used in pivot tables. Instead, you can create the pivot by selecting entire columns, which accounts for all future entries.
The pivot can then be filtered to show a moving range, as shown in the code below.
Hope that works for you.
EDIT
The code below has been updated to include the last 50 entries from the bottom (change NumWeeks to adjust this).
Sub MovingPivot()
    Dim ws As Worksheet
    Dim dtTop As Date
    Dim i As Integer, n As Long
    Const NumWeeks = 50 'Change this to set weeks range
    Set ws = ActiveSheet 'Set reference to your worksheet here
    'reset the pivot filters
    ws.PivotTables("Table1").PivotFields("Date").ClearAllFilters
    'remove blank values
    ws.PivotTables("Table1").PivotFields("Date").PivotItems("(blank)").Visible = False
    'find the date entry in 50 places from bottom.
    i = 0
    For n = ws.PivotTables("Table1").RowRange.Count To 1 Step -1
        If i = NumWeeks Then
            dtTop = ws.PivotTables("Table1").RowRange.Cells(n).Value
            Exit For
        End If
        i = i + 1
    Next n
    ws.PivotTables("Table1").PivotFields("Date").PivotFilters.Add2 Type:=xlAfterOrEqualTo, Value1:=Format(dtTop, "dd-mmm-yyyy")
End Sub

If you want to continue using a Pivot Chart, you can use the Timeline slicer to include/exclude data. You'll need to adjust the timeline or filter manually after the data has refreshed, or write VBA to set the filters.
A non-VBA version that does not require slicers can also be achieved with a standard chart (not a pivot chart). Create named ranges with OFFSET functions that grab just the rows of data you are interested in, then plug these range names into the standard chart. When new data is added to the table, the named ranges that feed the standard chart will update as well.
If you need a step by step, take a look at https://peltiertech.com/Excel/Charts/DynamicLast12.html
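The moving-window idea behind both approaches (keep only the rows from the most recent year, relative to the latest date in the data) can be sketched as follows. This is an illustration of the logic only, not an Excel feature; the function name `last_year_rows` and the 52-week default are assumptions.

```python
from datetime import date, timedelta


def last_year_rows(rows, weeks=52):
    """Keep the rows whose date (first element) falls within the last
    `weeks` weeks, measured back from the most recent date in the data."""
    latest = max(row[0] for row in rows)
    cutoff = latest - timedelta(weeks=weeks)
    return [row for row in rows if row[0] > cutoff]
```

Because the cutoff is derived from the newest row rather than from today's date, the window moves forward by one week each time a new row is appended.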

How to transpose all subfields in front of the parent field of a pivot table in Excel

I have data on automotive spare parts and their multiple storage locations in a warehouse.
All I want to do is list the locations next to each part number, so that it is easy to see all the locations of a specific part number.
The current pivot data looks like this
I've manually transposed a few rows in the image below, but the data contains around 70K rows, hence I'm looking for a better solution.
Kindly refer to the table below
+--------------+-----+-------+-------------+
| Item name | Qty | UoM | Stock |
+--------------+-----+-------+-------------+
| '0450000115 | 324 | piece | G12B04 |
| '0450000A61 | 312 | piece | G12B05 |
| '0450000115 | 336 | piece | G12B06 |
| '0450000A61 | 228 | piece | G12B07 |
| '0450000115 | 336 | piece | G12B08 |
| '0450000115 | 192 | piece | G12B09 |
| '087902E200A | 470 | piece | G12B10 |
| '087902E200A | 760 | piece | G12B13 |
| '087902E200A | 759 | piece | G12B14 |
| '0450000115 | 336 | piece | G12B15 |
| '087902E200A | 400 | piece | G12B16 |
| '087902E200A | 10 | piece | G3B32 |
| '084B410426 | 100 | piece | G3B32 |
| '087902E200A | 300 | piece | G4B08 |
| '0450000A61 | 2 | piece | GDB01 |
| '084B410426 | 60 | piece | GR.04.C.04. |
| '087902E200A | 327 | piece | HD.03.K.05. |
+--------------+-----+-------+-------------+

You need to create a measure using the CONCATENATEX function. For this, you need to add your data to the Data Model. You can do this by checking the box "Add this data to the Data Model" at the bottom of the Create PivotTable dialog box.
Right-click the table in the PivotTable Fields pane and select Add Measure. Then create the following measure, named StockText: = CONCATENATEX('table','table'[Stock],", ")
Now put [Item name] on Rows and the measure [StockText] on Values. The result should be one row per item name, with all of its stock locations joined into a single comma-separated cell.
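What the measure computes, grouping by item and joining that item's locations into one string, can be sketched like this. It assumes rows shaped like the sample table (item name, quantity, unit, stock location); `locations_by_item` is a hypothetical name.

```python
def locations_by_item(rows, separator=", "):
    """Map each item name to its stock locations joined into one string,
    keeping locations in the order they appear in the data."""
    grouped = {}
    for item, _qty, _uom, stock in rows:
        grouped.setdefault(item, []).append(stock)
    return {item: separator.join(stocks) for item, stocks in grouped.items()}
```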

Excel Formula to count all items in a group to see if the status is an Open status

I am working in Excel 2016. I am trying to figure out how many projects I have in which no part has been started. For instance, if my project ID is 203784 and it has 3 parts, of which 2 are Complete and 1 is Not Started, I would not want to count it. If the project had 3 parts, of which 2 were Not Started and 1 was assigned, I would want to count that as 1. Thank you in advance for your assistance.
+----+------------+------------------+-------------+
| | A | B | C |
+----+------------+------------------+-------------+
| 1 | Project ID | Position | Status |
| 2 | 203784 | Staff | Complete |
| 3 | 203784 | Staff | Complete |
| 4 | 203784 | Staff | Not Started |
| 5 | 203785 | Maintenance | Complete |
| 6 | 203785 | Maintenance | In Progress |
| 7 | 203786 | Grounds | Complete |
| 8 | 203787 | Nurse | Complete |
| 9 | 203788 | Teacher | Complete |
| 10 | 203788 | Teacher | Complete |
| 11 | 203788 | Teacher | Complete |
| 12 | 203789 | Transportation | Complete |
| 13 | 203789 | Transportation | Complete |
| 14 | 203789 | Transportation | Complete |
| 15 | 203790 | Evacuation | Complete |
| 16 | 203790 | Evacuation | Complete |
| 17 | 203791 | Implementation | Complete |
| 18 | 203792 | Knowledge Base | Not Started |
| 19 | 203792 | Knowledge Base | Not Started |
| 20 | 203793 | Janitor | Not Started |
| 21 | 203794 | Public Relations | In Progress |
| 22 | 203795 | HR | Complete |
| 23 | 203796 | Admin | Complete |
+----+------------+------------------+-------------+
In this example, I would only want the count to show a total of 2, for project numbers 203792 and 203793.

One way would be to add a column (say Count) populated as:
=COUNTIFS(A:A,A2,C:C,"Complete")+COUNTIFS(A:A,A2,C:C,"In Progress")
and then create a PivotTable with Count as Filters and Project ID for Rows. Select 0 for the filter.
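The helper column is 0 exactly when a project has no Complete and no In Progress rows. A minimal sketch of the same rule, assuming the data is available as (project ID, status) pairs and using the hypothetical name `not_started_projects`:

```python
def not_started_projects(rows, started=("Complete", "In Progress")):
    """Return the project IDs for which no row has a 'started' status."""
    has_started = {}
    for project_id, status in rows:
        already = has_started.get(project_id, False)
        has_started[project_id] = already or status in started
    return [pid for pid, flag in has_started.items() if not flag]
```

Any status outside `started` (Not Started, assigned, and so on) is treated as not started, which mirrors the two COUNTIFS terms in the formula.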

Excel VBA extrapolate values

I have a file that has data stored in it the following way (weekly data example)
+----------+----------+----------+----------+----------+----------+
| | WK1 | WK2 | WK3 | WK4 | WK5 |
+----------+----------+----------+----------+----------+----------+
| DT Begin | 29.12.14 | 05.01.15 | 12.01.15 | 19.01.15 | 26.01.15 |
| DT End | 04.01.15 | 11.01.15 | 18.01.15 | 25.01.15 | 01.02.15 |
| XData | 50 | 10 | 10 | 10 | 50 |
+----------+----------+----------+----------+----------+----------+
My problem is to aggregate the XData on a monthly basis. For that, I want to break the data down into days and then calculate the average.
Edit: I changed the table as it was not clear what I meant. This averages to ((50*4)+(10*21)+(50*6))/31 = 22.90
+------------+-------+
| Date | Value |
+------------+-------+
| 01.01.2015 | 50 |
| 02.01.2015 | 50 |
| 03.01.2015 | 50 |
| 04.01.2015 | 50 |
| 05.01.2015 | 10 |
| 06.01.2015 | 10 |
| 07.01.2015 | 10 |
| 08.01.2015 | 10 |
| 09.01.2015 | 10 |
| 10.01.2015 | 10 |
| 11.01.2015 | 10 |
| 12.01.2015 | 10 |
| 13.01.2015 | 10 |
| 14.01.2015 | 10 |
| 15.01.2015 | 10 |
| 16.01.2015 | 10 |
| 17.01.2015 | 10 |
| 18.01.2015 | 10 |
| 19.01.2015 | 10 |
| 20.01.2015 | 10 |
| 21.01.2015 | 10 |
| 22.01.2015 | 10 |
| 23.01.2015 | 10 |
| 24.01.2015 | 10 |
| 25.01.2015 | 10 |
| 26.01.2015 | 50 |
| 27.01.2015 | 50 |
| 28.01.2015 | 50 |
| 29.01.2015 | 50 |
| 30.01.2015 | 50 |
| 31.01.2015 | 50 |
+------------+-------+
| Average | 22.90 |
+------------+-------+
After having done this calculation I want to summarize the data as follows for the entire year:
+-------+-------+-------+------+------+
| | Jan | Feb | Mar | ... |
+-------+-------+-------+------+------+
| XData | 22.90 | 22.00 | 23.1 | ... |
+-------+-------+-------+------+------+
Being a newbie in Excel VBA, I have extreme trouble doing this.
I know how to get the value of a cell (Range.Value) but not how to find the data for a particular week (as WK1 is there for 2014 as well). Range.Find with a date other than the one in the cell itself does not seem to work.
What I am asking for is a way to approach this problem. My particular difficulties are to:
Find the data in the worksheet
Split the week values into day values (see table above)
Copy the data or hold it in some sort of data structure
Calculate the average (this should be easy then)
Fill in the data on a monthly basis
As you can see, I have trouble even getting started, so any hints would be greatly appreciated. Maybe I'm making this far too complicated? Thank you!
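The steps listed above (expand each week into days, bucket the days by month, average each bucket) can be sketched as follows. This shows the approach in Python rather than VBA and assumes the weeks have already been read from the sheet as (begin date, end date, value) tuples; `monthly_averages` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_averages(weeks):
    """weeks: iterable of (begin_date, end_date, value), end date inclusive.

    Returns {(year, month): mean of the daily values that fall in that month}.
    """
    daily = defaultdict(list)
    for begin, end, value in weeks:
        day = begin
        while day <= end:
            # Every day of the week carries the week's value
            daily[(day.year, day.month)].append(value)
            day += timedelta(days=1)
    return {month: sum(values) / len(values) for month, values in daily.items()}
```

Keying on (year, month) keeps WK1 of 2014 separate from WK1 of 2015. Months that are only partly covered by the data are averaged over the covered days only.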
