How to resolve Excel not removing duplicates? - excel

Probably a dumb question, but I'm having an issue where Microsoft Excel is successfully finding duplicates and SAYING it's deleting them in my CSV file, but not actually deleting them or otherwise making any change to the file.
I usually have no problem at all, but I suspect the issue may have to do with the unusually large file size I'm working with today.
To walk through my steps for better troubleshooting: I have columns A-AC, and I am selecting column E, which is a list of usernames. I go to the top bar and select Data > Data Tools > Remove Duplicates. A popup menu asks whether I want to use just that column; I select "continue with current selection", then "remove duplicates" and "OK". It says it has removed the duplicates, displaying the expected/correct number (see screenshot), but then no changes are made. If I follow the same steps again, I get the exact same results.
As an experiment, I tested it in a separate CSV file with only the top 20 rows, and it worked perfectly fine. But with the other file that has 15k rows, it doesn't work.
My goal is to remove any rows that contain a duplicate username in column E.
Any idea what's going haywire?
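As a cross-check outside Excel, the same operation can be scripted and written to a fresh file, which also sidesteps any save/overwrite weirdness. A minimal sketch with Python's csv module, assuming the usernames are in column E (index 4) and the file has a header row:

```python
import csv

def dedupe_by_column(src_path, dst_path, col_index=4):
    """Keep only the first row for each value in the given column (E = index 4)."""
    seen = set()
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))   # copy the header row unchanged
        for row in reader:
            key = row[col_index]
            if key not in seen:         # first occurrence wins, later duplicates dropped
                seen.add(key)
                writer.writerow(row)
```

Writing to a separate output file makes it easy to compare row counts before and after and confirm the duplicates really are gone.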

Related

Table doesn't expand when adding new data (from .csv files in my case)

Let's say I have some external .csv files that get updated, and I just need to hit the refresh button in Power Query to make some magic happen - that works fine. BUT, there are some columns containing information about parts, and I need to look up values for them in another .csv file. What I did here is: I didn't convert all 4 columns into one table, but separated them, so each column has its own table name, because I had some issues with refreshing from Power Query, and it seemed easier to do the calculations first and then convert to a table... maybe that was not smart, though?
My question, and the actual issue: I am not getting new rows with new data beneath my "tables" - I must drag the formulas down to populate them. Why does that happen?
These are the functions I used, starting from the first column:
=INDEX(Matrix[[#All];[_]];ROW())
The others are just lookups, depending on which info I am looking for:
=INDEX(variantendb[Vartext];MATCH(C2;variantendb[Variante];0))
And the last column/calculation concatenates the info name and code together:
='0528 - info'!$D2 & " "& "("&'0528 - info'!$C2&")"
And I made all of them into 5 tables SEPARATELY, not as one table. Maybe I should do it with one table, then do the calculations, and then it will update dynamically?
It updates automatically only when I add new data somewhere in the middle of the .csv, but not when it is in the last row - then the table does not expand!
Well, I solved it. How? Using Power Query at its best. I played around and it actually gave me a completely different approach to my problem, using the Merge function and a bit of formatting. It works flawlessly, with minimal functions afterwards. What's important is that it refreshes in a millisecond - PROPERLY!!!
I am amazed by PQ and its functionality.
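The INDEX/MATCH lookups above (and the Power Query Merge that replaced them) are essentially a keyed join between two tables: build an index on the lookup key, then enrich each part row with the matched value. A minimal sketch of that idea in plain Python, reusing the `Variante`/`Vartext` column names from the formulas above purely for illustration:

```python
def merge_lookup(parts, lookup, key_field, value_field):
    """Enrich each part row with value_field from the lookup table, matched on
    key_field - like INDEX(variantendb[Vartext]; MATCH(key; variantendb[Variante]; 0))."""
    index = {row[key_field]: row[value_field] for row in lookup}
    merged = []
    for row in parts:
        enriched = dict(row)
        # "" stands in for Excel's #N/A when there is no match
        enriched[value_field] = index.get(row[key_field], "")
        merged.append(enriched)
    return merged
```

Because the join runs over whatever rows are present at refresh time, new rows at the bottom are picked up automatically - which is exactly why the Merge approach removed the need to drag formulas down.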

Why is an empty Excel file 5.5 MB large?

I have an empty Excel file, which is 5.5 MB large.
If I open it, the process is very laggy, even on a fast PC (Intel i7 processor).
It takes ~30 seconds to open.
When it opens, it shows that the document has 1048576 rows.
I tried to delete them - but unsuccessfully.
If I remove column G, the file size is cut roughly in half (2.5 MB).
If I remove the entire sheet and add a new empty one, the file size drops to 8 KB.
The question is not about how to solve the problem, but about what causes it, why this is happening, and how I remove the unused rows. I tried to delete them in different ways, saved the document and reopened it - no success.
Here is the document, if you need: https://files.fm/u/erfr4weq
Save the Excel file in the Open XML format, unzip it (an .xlsx is just a ZIP archive), and open the contents in an editor to see what is going on inside.
Please note that this approach is only valid for xlsx files (Office 2007 and onwards).
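To see which internal part is bloated without manually unzipping, you can list the package members and their sizes; an oversized sheet XML (e.g. `xl/worksheets/sheet1.xml`) usually points at formatting spread over the whole used range. A small sketch, with the filename as a placeholder:

```python
import zipfile

def part_sizes(xlsx_path):
    """Return the internal parts of an .xlsx package, largest first.
    Works on any ZIP-based Office file (xlsx/xlsm/docx/...)."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return sorted(
            ((info.filename, info.file_size) for info in zf.infolist()),
            key=lambda item: item[1],   # file_size is the uncompressed size
            reverse=True,
        )
```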
It's most likely not actually empty; there will be some hidden data somewhere. Try Excel's Clear All function instead of just the Delete key.
I had the same thing happen.
I had 1200 data records that took up 5MB, which is odd.
I looked at the right scroll bar and saw that it was small: it turned out Excel had added about 10,000 extra rows.
I got rid of the extra rows, and the file is now 152 KB.
Check whether the number of rows on your scroll bar matches the number of rows you're actually using.
Then get rid of the excess.
Even though it looks like you're not deleting anything, you are.

Access VBA "Record Too Large"

I have not found an answer to this issue on the net: maybe an Access bug?
I have Windows 10, and Access 2016. I have 102 fields and 40 records. There are 14 long text fields for each record. The fields in question are set to "Long Text" (which used to be "Memo" in earlier Access versions).
The software that I wrote and have used for 5 years with Access 2010, imports an Excel Workbook. Now I use that same software with Access 2016 and have started getting the error described here. This is the 4th db I have setup using Access 2016 and this is the first time I have seen this problem.
When I tried to type entries into one or two cells in a "long text" field in a record, an error was generated saying the "Record is too large". The same field on other records work as expected. Only the cell on that given record is generating an error. Like I said, I have never seen this error in other versions of Access.
I have performed 1) a "compact and repair", 2) exported the table to a new table, and 3) exported the table to Excel and, into a new Access record, cut and pasted all 102 fields by hand. Item 3) works most of the time; efforts 1) and 2) have never fixed the problem.
The incident leading me to seek help is that this time, performing step 3) above with a new record, I have one cell that generates the "record too large" error again. I noticed the entry cell in Excel that I was copying from had several semicolons; I removed them and tried to cut and paste into the Access cell, with no success. I then tried typing the entry into the cell instead of pasting it and still got the error.
I really am at a loss to what the issue is with this problem and I need some help. Has anyone ever experienced this issue?
I have 102 fields and 40 records. There are 14 long text fields for each record. None of the long text entries contain more than 200 characters.
I'd refactor the schema, yesterday. This is exactly what one-to-one relationships are for. Move a subset of columns into another table, relate PK to PK. 102 columns is too many concerns stuffed into one single table. Break it down - regardless of the "record too large" error.
That said if none of the long text entries contain more than 200 characters, then why are they long text in the first place? I'd make them variable-length character columns (that would be nvarchar on SQL Server, not sure about Access), with perhaps 255 characters capacity.
Thank you for all of the input. I cannot post the code. I was, however, able to work out a solution for the issue, described below in case others get into this same predicament.
1. Export the offending table to Excel, preserving formatting.
2. Export the offending table to Access (same db), preserving the definition only; do not export data.
3. Change the several (offending) fields in the blank (new) table from "Short Text" to "Long Text" (Memo).
4. Append the Excel sheet exported in step 1 into the blank table created in step 2.
These steps resolved my issue and got me out of a painful jam. Thank you all for the help and ideas.
v/r,
Johnny
There is a chance you have found an Excel-import-related bug in Microsoft Access.
Re-create the table from scratch to work around the defect and get its internal data right.
The problem occurs in larger tables auto-created on import from Excel. Even if the length or count of their fields does not exceed any limits, you can still start getting the "Record too Large" error. Running Compact and Repair does not remedy this issue.
So if you are sure that your data structure does not exceed Access limitations, re-create the table with the same fields and lengths, and the error is gone.
As proof: I currently have two tables in my database, both with identical internal structure. The one created by import reports "Record too Large", while the one I created manually (by copy-pasting fields in design view) is fine.
So we can say that there is a specific Access bug on import from Excel which has not been corrected as of today (2018-10-17) in Access 2016. Work around it in the way described above.
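A plausible explanation for why step 3 of the workaround (converting fields to Long Text) helps: Access documents a per-record limit of roughly 4,000 characters, and Long Text/Memo data is stored outside the record, while Short Text data counts against it. Back-of-the-envelope arithmetic for the 102-field table above, under that assumption:

```python
# Access's documented per-record limit is roughly 4,000 characters,
# excluding Long Text (Memo) and OLE Object data, which live outside the record.
RECORD_LIMIT = 4000

short_text_fields = 102 - 14          # fields whose data stays inline in the record
worst_case = short_text_fields * 255  # if each Short Text field were filled to 255 chars

# The inline data alone could exceed the record limit many times over,
# so moving the wide fields out to Long Text relieves the pressure.
assert worst_case > RECORD_LIMIT
```

This is only a rough plausibility check, not a confirmed mechanism for the import bug; the refactoring advice above (splitting the 102 columns across related tables) attacks the same root cause.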

SSIS: Reading an Excel file returns too many lines

I am working on an SSIS project (Visual Studio 2012).
A browser sends me an Excel file with articles.
The information begins in A2 and extends to column G.
Then I use
SELECT * FROM [ProduitsFamilles$A2:G]
My problem: I saw that the last line of data in this file is 16143 (below it, empty cells).
And when I run it, it returns 16382 lines... then the ~200 empty lines crash the import into the database, because the primary key can't be empty.
I think it's because, before sending this file, the browser deleted old unused rows.
Using a "Conditional Split" gives a good result, but I want to know if I can exclude the empty rows directly, like using a WHERE clause...
SSIS will handle Excel files and stop at the empty row automatically if you choose a table instead of a SQL statement. I believe you could also select specific columns in the SELECT clause of a SQL statement instead of defining your range in the FROM clause. SSIS generally assumes there is a header row, though you can also specify this.
There is another possible issue, which is that cells can be active and empty in Excel instead of inactive. To test this, press Ctrl and the down arrow at the top of a full column in your sheet. It should stop at the last cell with data in it (row 16143 in your case); if it instead goes down to row 16382, you'll know you have a bunch of empty but active rows that need to be taken care of before importing.
In general, it's a lot easier to use .csv files with SSIS than Excel files, which tend to have these types of formatting issues.
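If a WHERE clause against the Excel source proves unreliable, the filter the Conditional Split performs can be expressed in one pass over the rows: drop any row whose key cell is empty before it reaches the database. A minimal sketch in Python, assuming (hypothetically) that the first column holds the primary key:

```python
def drop_empty_rows(rows, key_index=0):
    """Skip rows whose key cell is missing or blank - the empty-but-active
    rows that would otherwise violate the primary-key constraint on import."""
    kept = []
    for row in rows:
        cell = row[key_index] if key_index < len(row) else None
        if cell is not None and str(cell).strip() != "":
            kept.append(row)
    return kept
```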

Getting one Extra row when exporting SSRS report to Excel

I have a simple Reporting Services report, a simple table, created with BIDS 2005, with the report wizard.
I run the report on a RS2008 R2 server as is and it renders perfectly.
When I export to Excel, an extra row is appended just below the table. The row is hidden and has a height of 409.5.
Where does that row come from?
How do I get rid of it?
*nb - no extra row if run on a RS2005 server
The only way I found to eliminate the hidden row is to change the layout of the report. I increased the height of all rows from 0,53333cm to 0,538cm.
Anything less than 0,538cm doesn’t solve the issue.
According to Microsoft, the goal when exporting to Excel is to match the visual appearance of the report as closely as possible. The Excel output may contain unexpected things like extra rows, extra columns, or merged cells as part of the process of matching the layout.
Changing the tablix location to 0cm, 0cm will fix the problem.
I was running into this issue and tried all the posted solutions I could find, but none worked for me. To be more specific, after exporting the SSRS report to excel there was an extra row that contained duplicated data from the first row of the group. This extra row was contained in a group that could be toggled and when that group was collapsed that extra row was still showing instead of nothing.
This is what the report layout looked like before I made the change.
What I had to do was add an extra row above and outside the nested grouping by right clicking the group box and selecting "Add row" -> "Outside Group - Above"
Here is the report after.
After adding the rows outside the group there was no duplicated data in an extra row.
Try changing the size of the report (not the table) to 0.0pt, 0.0pt. It will automatically be set to the minimum required.
