Remove lines that don't contain a string with Sublime Text

I have 6 huge text files, and I need to filter them by deleting all the lines that don't contain the string 53=S.
For 5 of them, I managed to filter the files with Notepad++ as follows:
Find --> Mark --> Bookmark Lines --> Mark All --> Search --> Bookmarks --> Remove Unbookmarked Lines
However, the application crashed on one specific file every time I tried it. I tried it on two PCs with the same result.
Does anyone know how I can remove the irrelevant lines with Sublime Text or any other tool?

You could try a regular expression replace in Notepad++.
In Notepad++, press Ctrl+H or open the Search > Replace window.
In the 'Find what' text box, enter ^(?m)^(?:(?!53=S).)*$
and leave the 'Replace with' text box empty.
Make sure the search mode is set to 'Regular expression', then hit 'Replace All'.
This should remove any line that doesn't contain the string 53=S.

There is a Notepad++ plugin called LineFilter (not LineFilter2), which provides a menu with entries like
delete all lines containing the selection
delete all lines not containing the selection
It opens a new tab with the result. That worked on large files. I liked it a lot.
The plugin is available from Notepad++ Plugin Central.
If you have grep available, then grep should do the trick, too, for example: grep "53=S" input.txt > output.txt (the file names are placeholders).
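If no editor copes with the file size, another option is a short script that streams the file one line at a time, so the whole file is never held in memory. The following is a minimal Python sketch rather than anything suggested in the answers above; the function name keep_lines_containing and the file names are hypothetical.

```python
def keep_lines_containing(src_path, dst_path, needle=b"53=S"):
    """Copy the lines of src_path that contain `needle` to dst_path.

    Returns the number of lines kept.
    """
    kept = 0
    # Binary mode: no need to guess the encoding, and line endings are preserved.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # the file object yields one line at a time
            if needle in line:
                dst.write(line)
                kept += 1
    return kept


# Example call (hypothetical file names):
# keep_lines_containing("huge_input.txt", "filtered_output.txt")
```

Writing to a separate output file leaves the original untouched, which makes it easy to check the result before replacing anything.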

Related

How to match lines with only one tab in vim?

I have a CSV file. Some lines have 7 tab-delimited columns, and others have only one tab. I want to find all lines with only one tab and remove them.
What's the command to do this in Vim? I tried this, but it doesn't work:
[^\t]+\t[^\t]+
Assuming every line has at least one tab, this deletes all lines that don't contain two or more tabs:
:v/\t.*\t/d
If there are lines with no tabs that you want to keep, this will not work, because it deletes them too.
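If lines without any tab must survive, the condition has to be "exactly one tab" rather than "fewer than two tabs". In Vim, a pattern along the lines of :g/^[^\t]*\t[^\t]*$/d should express that. Outside the editor, counting tabs per line does the same job; the sketch below is an illustration in Python, and the function name drop_single_tab_lines is made up for this example.

```python
def drop_single_tab_lines(lines):
    """Return the lines that do not contain exactly one tab character."""
    return [line for line in lines if line.count("\t") != 1]


rows = [
    "a\tb\tc\td\te\tf\tg",  # 7 columns, 6 tabs: kept
    "x\ty",                 # exactly one tab: dropped
    "no tabs at all",       # zero tabs: kept
]
filtered = drop_single_tab_lines(rows)
```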

Prevent automatic tab insertion or conversion of spaces to tabs

Google Docs has a "feature" that sometimes converts four spaces to one tab.
Copying and pasting text does not solve this problem, because the spaces in that text are converted to tabs automatically.
Is there a way to turn this off?
No way to turn it off that I know of. So annoying.
You can work around it with a normal copy-paste followed by a search-and-replace:
Copy and paste your content into the Google Doc
In a text-editor, enter a tab character then cut it to your clipboard
Back in Google Docs, highlight the content you wish to fix
Hit Ctrl + H to open Find and replace dialogue
Paste the tab character into the Find field
Insert 4 space characters into the Replace with field
Click Replace all
The approach that caused me the least headaches was to replace all spaces with another character (say, an underscore) in the original text, copy/paste it, then replace the underscores using find+replace. This was in Google Slides.
I use Cmd+Shift+V (Edit -> Paste without formatting) to paste.
The spaces are not converted to tabs.
I did find one solution: there is a Chrome plugin called "Drive Notepad" which edits Google Drive files and has an option "Tabs: hard".

How to remove invisible line break character

I have a lot of data in Excel, and some cells contain HTML code. These cells have line breaks in them. I tried to replace the line breaks (Alt+010, \n), but Excel said there is no such character.
When I copied a cell to Notepad, there was no line break.
When I copied from Notepad to the phpMyAdmin SQL area or TextPad, I saw the line breaks again.
There are Notepad, TextPad and phpMyAdmin SQL area screenshots below. How can I remove these invisible line breaks?
This could be a problem with Carriage Return + Line Feed. When you press Alt+Enter in Excel, it only inserts a Line Feed. But if you somehow get both Carriage Return + Line Feed in a cell, that could lead to additional problems. See this page for solutions:
https://www.ablebits.com/office-addins-blog/2013/12/03/remove-carriage-returns-excel/
Did you try to remove any unnecessary tabs within the code? Also check some trivial things, e.g. the maximum string length in your MySQL database or your editor's miscellaneous settings.
EDIT: Oh, I forgot. It may also be caused by your language settings; check the database's default regional encoding preset and whether Turkish is currently supported.
Line breaks: do you mean the line breaks you can insert in Excel with Alt+Enter?
Then you can use Excel's Search / Replace option without copying your content to another tool:
Open it and, in the Search field, press Ctrl+J (a dot appears in the search field).
In the Replace field, enter whatever you want (nothing, a space, a semicolon, ...).
Select Replace all.
EDIT:
I've tested it by copying HTML from TextPad into one cell via the clipboard. In that case, the method I described does not work.
But there is another solution: open the Replace dialog; for the search string, hold down the Alt key, type the three digits 0 1 0 on the numeric keypad (on the right side of a "standard" keyboard), and then release the Alt key (a dot appears in the search field). Choose the replacement string you want and choose Replace all.
The =CLEAN() function helped me. Find/replace with ALT+J worked for replacing, but did not fully delete all the invisible characters in the string, so the cell was still misbehaving with Text to Columns. The =CLEAN() function finally removed all the invisible characters that were left.
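If the cell contents end up in a script anyway (for example, on the way into the database), the same cleanup can be done there. The sketch below approximates what CLEAN is documented to do, namely delete the control characters with codes 0 to 31; it is an illustration in Python, not Excel's actual implementation, and the function name clean is chosen only to mirror the worksheet function.

```python
# Translation table that maps every control character (code points 0-31) to None,
# which makes str.translate delete it. This includes \r, \n and also tab.
_CONTROL_CHARS = dict.fromkeys(range(32))


def clean(text):
    """Return `text` with all ASCII control characters (0-31) removed."""
    return text.translate(_CONTROL_CHARS)


cell = "<p>first</p>\r\n<p>second</p>"
cleaned = clean(cell)
```

Because tabs are control characters as well, they are removed too; mapping them to a space instead would need a separate entry in the table.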

Remove non utf8 lines in text file

How do I remove only the non-UTF-8 keywords/lines in a text file?
e.g.
你好
相手123abc
this is only abc
I only want to remove lines that consist entirely of English words, not the lines with UTF-8 (non-English) words. So in this case only 'this is only abc' will be removed. Is it possible to do this in Notepad++, or do I need to write a script for it?
This is possible using the following steps:
Open Notepad++, open the Find dialog and select the last tab, 'Mark'. Enter the regex ^(([a-zA-Z])+\s?)+, select 'Bookmark line', and click the 'Mark All' button.
From the menu, select Search --> Bookmark --> Remove Bookmarked Lines
I would also recommend making sure Notepad++ is up to date. I tested this with version 6.3. Marking lines is something added quite recently.
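For the scripted route the question asks about, note that plain English text is itself valid UTF-8, so the practical test is whether a line consists only of ASCII characters. A minimal Python sketch under that assumption follows; it relies on str.isascii (Python 3.7 or later), and the function name drop_ascii_only_lines is hypothetical.

```python
def drop_ascii_only_lines(lines):
    """Keep only the lines that contain at least one non-ASCII character.

    Note: an empty string counts as ASCII, so blank lines are dropped as well.
    """
    return [line for line in lines if not line.isascii()]


sample = ["你好", "相手123abc", "this is only abc"]
kept = drop_ascii_only_lines(sample)
```

Unlike the letters-only regex, this also drops lines made up of digits or punctuation, which may or may not be what is wanted.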

Find and replace help, to remove certain things from text

I have a file containing 18k lines of text, consisting of links and random ID codes, which looks like this:
"
http://arduino.cc/en/Main/ArduinoBoardNano
SC09661
http://arduino.cc/en/Main/ArduinoBoardUno
http://www.farnell.com/datasheets/1639172.pdf
SC09670
http://arduino.cc/en/Main/ArduinoBoardUno
SC09665
http://arduino.cc/en/Main/ArduinoEthernetShield
SC09662
http://arduino.cc/en/Main/ArduinoXbeeShield
CS23020
http://bcove.me/zypzpy2q
SC09147
http://cache.national.com/ds/LM/LM134.pdf
SC08546
http://cache.national.com/ds/LM/LM2574.pdf
SC08540
http://cache.national.com/ds/LM/LM2576.pdf
"
I need to remove all those ID codes (SC08540, SC09662, ...) and the links that don't end with .pdf from this text. I know it's possible with Notepad++ and other programs using the Replace function, but I don't know exactly how to do it. Could I get some help with this?
I have not found a way to do this in one go with Notepad++, but this should work:
Open the replace box (Search --> Replace...) and select Regular expression
Search for ^(?!.*\.pdf$).*$
Make sure Replace with is empty
Replace All
Now you have a file with empty lines and the links you want. There are at least two ways to get rid of the empty lines:
Method 1: TextFX plugin
Select all text
TextFX --> TextFX Edit --> Delete blank lines
Method 2: Replace
Make sure the cursor is at the beginning of the document
Open the replace box (Search --> Replace...) and select Extended
Search for \n\r
Make sure Replace with is empty
Replace All
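Both steps (dropping everything that is not a .pdf link and removing the resulting empty lines) can also be done in a single pass with a script. The following Python sketch is one way to do it, assuming one entry per line and a case-sensitive .pdf suffix; the function name keep_pdf_links is made up for this example.

```python
def keep_pdf_links(lines):
    """Return the stripped lines that end with '.pdf', dropping everything else."""
    stripped = (line.strip() for line in lines)  # removes \r\n and stray spaces
    return [s for s in stripped if s.endswith(".pdf")]


sample = [
    "http://arduino.cc/en/Main/ArduinoBoardNano\r\n",
    "SC09661\r\n",
    "http://www.farnell.com/datasheets/1639172.pdf\r\n",
    "SC08546\r\n",
    "http://cache.national.com/ds/LM/LM2574.pdf\r\n",
]
pdf_links = keep_pdf_links(sample)
```

Since only matching lines are kept, no blank lines are produced and the second cleanup step is not needed.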

Resources