Differences between new String and Bytes.toString


I am using the HBase utility class org.apache.hadoop.hbase.util.Bytes.
I generated an array of bytes from a string (the example is in Scala):
val bytes = Bytes.toBytes("test")
and I want to convert it back to a String.
What is the difference between new String(bytes, "UTF-8") and Bytes.toString(bytes)?
They both work.

Assuming you are talking about https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html, basically nothing: Bytes.toString calls new String, except when the array is null (it returns null) or the length is zero (it returns an empty string). You can see the method it calls here:
public static String toString(final byte [] b, int off, int len) {
  if (b == null) {
    return null;
  }
  if (len == 0) {
    return "";
  }
  return new String(b, off, len, UTF8_CHARSET);
}
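To see where the two calls agree and where they differ, here is a minimal sketch in plain Java. It does not depend on HBase: toStringLikeBytes is a hypothetical stand-in that copies the logic of the method quoted above, and StandardCharsets.UTF_8 is assumed to play the role of UTF8_CHARSET.

```java
import java.nio.charset.StandardCharsets;

final class BytesToStringSketch {

    // Hypothetical stand-in mirroring the Bytes.toString logic quoted above.
    static String toStringLikeBytes(byte[] b) {
        if (b == null) {
            return null;                 // null in, null out
        }
        if (b.length == 0) {
            return "";                   // empty array short-circuits
        }
        return new String(b, 0, b.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "test".getBytes(StandardCharsets.UTF_8);

        // For ordinary input both conversions give the same string.
        String viaHelper = toStringLikeBytes(bytes);
        String viaConstructor = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(viaHelper.equals(viaConstructor));

        // The visible difference is null handling.
        System.out.println(toStringLikeBytes(null));
        try {
            new String((byte[]) null, StandardCharsets.UTF_8);
        } catch (NullPointerException e) {
            System.out.println("the String constructor rejects a null array");
        }
    }
}
```

One practical difference on the caller's side: new String(bytes, "UTF-8") takes the charset by name and therefore declares the checked UnsupportedEncodingException, which the Charset overload used here avoids.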
For the future, please mention any libraries you are using in the question (and the question is completely unrelated to Scala).

Related

j2me - How to store custom objects using RMS

In the applications I'm developing I need to store data for customers, products and their prices.
In order to persist that data I use RMS. RMS doesn't support object serialization directly, and since the data I read already comes in JSON format, I store every JSONObject as its string version, like this:
rs = RecordStore.openRecordStore(mRecordStoreName, true);
JSONArray jsArray = new JSONArray(data);
for (int i = 0; i < jsArray.length(); i++) {
JSONObject jsObj = jsArray.getJSONObject(i);
stringJSON = jsObj.toString();
addRecord(stringJSON, rs);
}
The addRecord Method
public int addRecord(String stringJSON, RecordStore rs) throws JSONException,RecordStoreException {
int id = -1;
byte[] raw = stringJSON.getBytes();
id= rs.addRecord(raw, 0, raw.length);
return id;
}
So I have three RecordStores (customers, products and their prices), and for each of them I save the corresponding data as shown above.
I know this is a possible solution, but I'm sure there's got to be a better implementation, even more so considering that I'm going to perform searching, sorting, etc. over those three "tables".
In those cases, having to deserialize before proceeding to search or sort doesn't seem like a very good idea.
That's why I want to ask you: in your experience, how do you store custom objects in RMS in a way that makes them easy to work with later?
I really appreciate all your comments and suggestions.
EDIT
It seems that it's easier to work with records when you define a fixed max length for each field. So here's what I tried:
1) First of all, this is the class I use to hold the values retrieved from the record store:
public class Customer {
public int idCust;
public String name;
public String IDNumber;
public String address;
}
2) This is the code I use to save every jsonObject to the record store:
RecordStore rs = null;
try {
rs = RecordStore.openRecordStore(mRecordStoreName, true);
JSONArray js = new JSONArray(data);
for (int i = 0; i < js.length(); i++) {
JSONObject jsObj = js.getJSONObject(i);
byte[] record = packRecord(jsObj);
rs.addRecord(record, 0, record.length);
}
} finally {
if (rs != null) {
rs.closeRecordStore();
}
}
The packRecord method :
private byte[] packRecord(JSONObject jsonObj) throws IOException, JSONException {
ByteArrayOutputStream raw = new ByteArrayOutputStream();
DataOutputStream out = new DataOutputStream(raw);
out.writeInt(jsonObj.getInt("idCust"));
out.writeUTF(jsonObj.getString("name"));
out.writeUTF(jsonObj.getString("IDNumber"));
out.writeUTF(jsonObj.getString("address"));
return raw.toByteArray();
}
3) This is how I pull all the records from the record store :
RecordStore rs = null;
RecordEnumeration re = null;
try {
rs = RecordStore.openRecordStore(mRecordStoreName, true);
re = rs.enumerateRecords(null, null, false);
while (re.hasNextElement()) {
Customer c;
int idRecord = re.nextRecordId();
byte[] record = rs.getRecord(idRecord);
c = parseRecord(record);
//Do something with the parsed object (Customer)
}
} finally {
if (re != null) {
re.destroy();
}
if (rs != null) {
rs.closeRecordStore();
}
}
The parseRecord Method :
private Customer parseRecord(byte[] record) throws IOException {
Customer cust = new Customer();
ByteArrayInputStream raw = new ByteArrayInputStream(record);
DataInputStream in = new DataInputStream(raw);
cust.idCust = in.readInt();
cust.name = in.readUTF();
cust.IDNumber = in.readUTF();
cust.address = in.readUTF();
return cust;
}
This is how I implemented what Mister Smith suggested (I hope it's what he had in mind). However, I'm still not sure how to implement the searches.
I almost forgot to mention that before I made these changes to my code, the size of my RecordStore was 229048 bytes; now it is only 158872 bytes :)

RMS is nothing like a database. Think of it as a record set in which each record is a byte array.
Because of this, it is easier to work with when you define a fixed maximum length for each field in the record. For instance, a record could hold some info about a player in a game (max level reached, score, player name, etc.). You could define the level field as 4 bytes long (an int), then a score field of 8 bytes (a long), then the name as a 100-byte field (a string). This is tricky because strings are usually of variable length, but you would probably like a fixed maximum length for this field, and if a string is shorter than that, you'd use a string terminator character to delimit it. (This example is actually a poor one because the string is the last field, so it would have been easier to keep it variable length. Just imagine you have several consecutive string fields.)
To help you with serialization/deserialization, you can use DataOutputStream and DataInputStream. With these classes you can read and write strings with writeUTF/readUTF, which prefix each string with its encoded length so you don't have to delimit it yourself. But this means that when you need a field, you don't know exactly where it is located, so you have to read the array up to that position first.
The advantage of fixed lengths is that you can later use a RecordFilter: if you want to retrieve the records of players whose score is greater than 10000, you can look at the score field at exactly the same position in every record (an offset of 4 bytes from the start of the byte array).
So it's a tradeoff. Fixed lengths means faster access to fields (faster searches), but potential waste of space. Variable lengths means minimum storage space but slower searches. What is best for your case will depend on the number of records and the kind of searches you need.
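As a rough illustration of the fixed-length layout described above, here is a sketch written for plain Java SE so it runs without the MIDP classes. The class name, the 100-byte name width and the helper names are assumptions made for the example; matches only imitates the shape of RecordFilter.matches(byte[]) and does not implement the real interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class FixedLayoutRecord {
    static final int NAME_BYTES = 100;   // assumed fixed width reserved for the name
    static final int SCORE_OFFSET = 4;   // score starts right after the 4-byte level

    // Packs level (4 bytes), score (8 bytes) and a zero-padded name (100 bytes).
    static byte[] pack(int level, long score, String name) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(level);
        out.writeLong(score);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] padded = new byte[NAME_BYTES];   // unused tail stays 0 and acts as terminator
        // Longer names are cut at the byte limit, which may split a multi-byte character.
        System.arraycopy(nameBytes, 0, padded, 0, Math.min(nameBytes.length, NAME_BYTES));
        out.write(padded);
        out.flush();
        return raw.toByteArray();
    }

    // Reads the big-endian long at SCORE_OFFSET without decoding any other field.
    static long readScore(byte[] record) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (record[SCORE_OFFSET + i] & 0xFF);
        }
        return value;
    }

    // Same shape as RecordFilter.matches(byte[]): keep records scoring above 10000.
    static boolean matches(byte[] candidate) {
        return readScore(candidate) > 10000;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(matches(pack(3, 12500L, "alice")));
        System.out.println(matches(pack(7, 900L, "bob")));
    }
}
```

Every record produced this way is 112 bytes long regardless of the name, which is the space cost the tradeoff refers to; in exchange the filter never has to parse the name to reach the score.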
There is a good collection of tutorials on the net. Just to name a few:
http://developer.samsung.com/java/technical-docs/Java-ME-Record-Management-System
http://developer.nokia.com/community/wiki/Persistent_Data_in_Java_ME

How do I reverse a String in Dart?

I have a String, and I would like to reverse it. For example, I am writing an AngularDart filter that reverses a string. It's just for demonstration purposes, but it made me wonder how I would reverse a string.
Example:
Hello, world
should turn into:
dlrow ,olleH
I should also consider strings with Unicode characters. For example: 'Ame\u{301}lie'
What's an easy way to reverse a string, even if it has Unicode characters?

The question is not well defined. Reversing arbitrary strings does not make sense and will lead to broken output. The first (surmountable) obstacle is UTF-16. Dart strings are encoded as UTF-16, and reversing just the code units leads to invalid strings:
var input = "Music \u{1d11e} for the win"; // Music 𝄞 for the win
print(input.split('').reversed.join()); // niw eht rof
The split function explicitly warns against this problem (with an example):
Splitting with an empty string pattern ('') splits at UTF-16 code unit boundaries and not at rune boundaries[.]
There is an easy fix for this: instead of reversing the individual code-units one can reverse the runes:
var input = "Music \u{1d11e} for the win"; // Music 𝄞 for the win
print(new String.fromCharCodes(input.runes.toList().reversed)); // niw eht rof 𝄞 cisuM
But that's not all. Runes, too, can have a specific order. This second obstacle is much harder to solve. A simple example:
var input = 'Ame\u{301}lie'; // Amélie
print(new String.fromCharCodes(input.runes.toList().reversed)); // eiĺemA
Note that the accent is on the wrong character.
There are probably other languages that are even more sensitive to the order of individual runes.
If the input has severe restrictions (for example being ASCII or ISO Latin 1), then reversing strings is technically possible. However, I haven't yet seen a single use case where this operation made sense.
Using this question as example for showing that strings have List-like operations is not a good idea, either. Except for few use-cases, strings have to be treated with respect to a specific language, and with highly complex methods that have language-specific knowledge.
In particular native English speakers have to pay attention: strings can rarely be handled as if they were lists of single characters. In almost every other language this will lead to buggy programs. (And don't get me started on toLowerCase and toUpperCase ...).
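The same two obstacles show up in any language whose strings are UTF-16. As a comparison, here is a small Java sketch (Java is used only for illustration; the class and method names are made up for the example). StringBuilder.reverse keeps surrogate pairs together, which matches the rune-based fix above, while java.text.BreakIterator is one way to keep a base letter and its combining accent together. How completely BreakIterator follows the Unicode grapheme rules depends on the JDK version, so treat the second method as an approximation.

```java
import java.text.BreakIterator;
import java.util.ArrayDeque;
import java.util.Deque;

final class ReverseSketch {

    // Reverses by code point: surrogate pairs survive, combining marks get detached.
    static String reverseCodePoints(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Reverses by the character boundaries BreakIterator reports,
    // so a base letter and its combining accent move as one unit.
    static String reverseGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        Deque<String> clusters = new ArrayDeque<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            clusters.push(s.substring(start, end));   // pushing to the front reverses the order
        }
        return String.join("", clusters);
    }

    public static void main(String[] args) {
        String input = "Ame\u0301lie";
        System.out.println(reverseCodePoints(input));  // accent ends up on the 'l'
        System.out.println(reverseGraphemes(input));   // accent stays on the 'e'
    }
}
```

Even the second version only addresses the mechanical problem; it does not make reversal meaningful for scripts where the order of characters carries language-specific meaning.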

Here's one way to reverse an ASCII String in Dart:
input.split('').reversed.join('');
split the string at every UTF-16 code unit, creating a List
get an iterable that walks the list in reverse
join the elements (creating a new string)
Note: this is not necessarily the fastest way to reverse a string. See other answers for alternatives.
Note: this does not properly handle all unicode strings.

I've made a small benchmark for a few different alternatives:
String reverse0(String s) {
return s.split('').reversed.join('');
}
String reverse1(String s) {
var sb = new StringBuffer();
for(var i = s.length - 1; i >= 0; --i) {
sb.write(s[i]);
}
return sb.toString();
}
String reverse2(String s) {
return new String.fromCharCodes(s.codeUnits.reversed);
}
String reverse3(String s) {
var sb = new StringBuffer();
for(var i = s.length - 1; i >= 0; --i) {
sb.writeCharCode(s.codeUnitAt(i));
}
return sb.toString();
}
String reverse4(String s) {
var sb = new StringBuffer();
var i = s.length - 1;
while (i >= 3) {
sb.writeCharCode(s.codeUnitAt(i-0));
sb.writeCharCode(s.codeUnitAt(i-1));
sb.writeCharCode(s.codeUnitAt(i-2));
sb.writeCharCode(s.codeUnitAt(i-3));
i -= 4;
}
while (i >= 0) {
sb.writeCharCode(s.codeUnitAt(i));
i -= 1;
}
return sb.toString();
}
String reverse5(String s) {
var length = s.length;
var charCodes = new List(length);
for(var index = 0; index < length; index++) {
charCodes[index] = s.codeUnitAt(length - index - 1);
}
return new String.fromCharCodes(charCodes);
}
main() {
var s = "Lorem Ipsum is simply dummy text of the printing and typesetting industry.";
time('reverse0', () => reverse0(s));
time('reverse1', () => reverse1(s));
time('reverse2', () => reverse2(s));
time('reverse3', () => reverse3(s));
time('reverse4', () => reverse4(s));
time('reverse5', () => reverse5(s));
}
Here is the result:
reverse0: => 331,394 ops/sec (3 us) stdev(0.01363)
reverse1: => 346,822 ops/sec (3 us) stdev(0.00885)
reverse2: => 490,821 ops/sec (2 us) stdev(0.0338)
reverse3: => 873,636 ops/sec (1 us) stdev(0.03972)
reverse4: => 893,953 ops/sec (1 us) stdev(0.04089)
reverse5: => 2,624,282 ops/sec (0 us) stdev(0.11828)

Try this function
String reverse(String s) {
  // split('') and join() replace the removed splitChars() and Strings.concatAll()
  var chars = s.split('');
  var len = s.length - 1;
  var i = 0;
  while (i < len) {
    var tmp = chars[i];
    chars[i] = chars[len];
    chars[len] = tmp;
    i++;
    len--;
  }
  return chars.join();
}
void main() {
  var s = "Hello , world";
  print(s);
  print(reverse(s));
}
(or)
String reverse(String s) {
  StringBuffer sb = new StringBuffer();
  for (int i = s.length - 1; i >= 0; i--) {
    sb.write(s[i]); // write() replaces the removed add()
  }
  return sb.toString();
}
main() {
  print(reverse('Hello , world'));
}

The library More Dart contains a light-weight wrapper around strings that makes them behave like an immutable list of characters:
import 'package:more/iterable.dart';
void main() {
print(string('Hello World').reversed.join());
}

There is a utils package that covers this function. It has some more nice methods for operation on strings.
Install it with :
dependencies:
basic_utils: ^1.2.0
Usage :
String reversed = StringUtils.reverse("helloworld");
Github:
https://github.com/Ephenodrom/Dart-Basic-Utils

Here is a function you can use to reverse strings. It takes a string as input and uses the Dart package characters to split it into user-perceived characters (grapheme clusters). We can then reverse them and join them again to make the reversed string.
import 'package:characters/characters.dart';

String reverse(String string) {
  if (string.length < 2) {
    return string;
  }
  final characters = Characters(string);
  return characters.toList().reversed.join();
}

Create this extension:
extension Ex on String {
String get reverse => split('').reversed.join();
}
Usage:
void main() {
String string = 'Hello World';
print(string.reverse); // dlroW olleH
}


ConcurrentModificationException in HashSet

My code is below, and I'm getting a ConcurrentModificationException, specifically on the line for (String file : files).
I don't change anything in files during the iteration, so why is the exception thrown, and how can I avoid it? Thanks for any suggestions!
long getTotalLength(final HashSet<String> files) {
  long total = 0;
  long len;
  for (String file : files) {
    len = getLength(file);
    if (len != Long.MIN_VALUE) {
      total += len;
    }
  }
  return total;
}
long getLength(String file) {
  // File.length() returns a long, so the sentinel and the result are long as well
  long len = Long.MIN_VALUE;
  if (file == null) {
    return len;
  }
  File f = new File(file);
  if (f.exists() && f.isFile()) {
    len = f.length();
  }
  return len;
}

Referring to your comment: declaring final HashSet<String> files makes the variable files final, which means you cannot assign another object to this variable inside its scope. The HashSet itself is a mutable object and can still be modified; that has nothing to do with the final modifier (the reference to the set object stays the same).
If you want to work concurrently on the same object (the same HashSet), use synchronized blocks or methods.
Generally speaking, you cannot modify a collection (in the same or another thread) while it is being iterated with a for-each loop.
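A minimal sketch of the failure and of two common ways around it follows. The class and method names are invented for the example, and the synchronized block only helps under the stated assumption that every writer locks on the same set object.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class CmeSketch {

    // Adds to the set while a for-each loop is walking it. With at least two
    // entries the iterator notices the change on its next step and throws.
    static boolean modifyDuringIteration(Set<String> files) {
        try {
            for (String file : files) {
                files.add(file + ".bak");   // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe removal: go through the iterator that is doing the walking.
    static void removeEmpty(Set<String> files) {
        for (Iterator<String> it = files.iterator(); it.hasNext(); ) {
            if (it.next().isEmpty()) {
                it.remove();
            }
        }
    }

    // Safe read when other code may change the set: iterate over a private copy.
    static int totalLength(Set<String> files) {
        List<String> snapshot;
        synchronized (files) {              // assumes writers lock on the same set
            snapshot = new ArrayList<>(files);
        }
        int total = 0;
        for (String file : snapshot) {
            total += file.length();
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(List.of("a", "bb", ""));
        System.out.println(modifyDuringIteration(new HashSet<>(files)));
        removeEmpty(files);
        System.out.println(totalLength(files));
    }
}
```

In the question's code nothing inside the loop touches the set, so the modification is most likely coming from another thread that holds a reference to the same HashSet; the snapshot approach, or a concurrent set such as ConcurrentHashMap.newKeySet(), targets that situation.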

Removing part of a String^ in MFC C++

So far I've only written console applications. My first application using MFC (in Visual Studio 2010) is basically a form with two multiline boxes (using String[] arrays noted with String^) and a button to activate text processing. It should search the String^ for a [, look for the ] behind it and delete all characters between them (including the []). With 'normal' C++ strings, this isn't difficult. String^ however is more like an object and MSDN tells me to make use of the Remove method. So, I tried to implement it.
public ref class Form1 : public System::Windows::Forms::Form
{
public:
Form1(void)
{
InitializeComponent();
//
//TODO: Add the constructor code here
//
}
String^ DestroyCoords(String^ phrase)
{
int CoordsStart = 0;
int CoordsEnd = 0;
int CharCount = 0;
for each (Char ch in phrase)
{
if (ch == '[')
CoordsStart = CharCount;
if (ch == ']')
{
CoordsEnd = CharCount;
//CoordsEnd = phrase->IndexOf(ch);
phrase->Remove( CoordsStart , CoordsEnd-CoordsStart );
}
CharCount++;
}
return phrase;
}
The button using the method:
private: System::Void button1_Click(System::Object^ sender, System::EventArgs^ e) {
TempString = String::Copy(BoxInput->Text);
DestroyCoords(TempString);
BoxOutput->Text = TempString;
The function seems to hit the correct places at the correct time, but the phrase->Remove() method is doing absolutely nothing..
I'm no OO hero (as said, I normally only build console applications), so it's probably a rookie mistake. What am I doing wrong?

In C++/CLI, System::String is immutable, so Remove creates a new String^. This means you'll need to assign the results:
phrase = phrase->Remove( CoordsStart , CoordsEnd-CoordsStart );
The same is true in your usage:
TempString = DestroyCoords(TempString);
BoxOutput->Text = TempString;
Note that this will still not work, as you'd need to iterate through your string in reverse (as the index will be wrong after the first removal).

No MFC here, that's the C++/CLI that Microsoft uses for writing .NET programs in C++.
The .NET System::String class is immutable, so any operations you expect to modify the string actually return a new string with the adjustment made.
A further problem is that you're trying to modify a container (the string) while iterating through it. Instead of using Remove, have a StringBuilder variable and copy across the parts of the string you want to keep. This means only a single copy and will be far faster than repeated calls to Remove each of which makes a copy. And it won't interfere with iteration.
Here's the right approach:
int BracketDepth = 0;
StringBuilder sb(phrase->Length); // using stack semantics
// preallocated to size of input string
for each (Char ch in phrase)
{
if (ch == '[') { // now we're handling nested brackets
++BracketDepth;
}
else if (ch == ']') { // and complaining if there are too many closing brackets
if (!BracketDepth--) throw gcnew Exception();
}
else if (!BracketDepth) { // keep what's not brackets or inside brackets
sb.Append(ch);
}
}
if (BracketDepth) throw gcnew Exception(); // not enough closing brackets
return sb.ToString();

Check a string for containing a list of substrings

How can I check a specific string to see if it contains a series of substrings?
Specifically something like this:
public GetByValue(string testString) {
// if testString contains these substrings I want to throw back that string was invalid
// string cannot contain "the " or any part of the words "College" or "University"
...
}

If performance is a concern, you may want to consider using the RegexStringValidator class.

You can use the string.Contains() method:
http://msdn.microsoft.com/en-us/library/dy85x1sa.aspx
// This example demonstrates the String.Contains() method
using System;
class Sample
{
public static void Main()
{
string s1 = "The quick brown fox jumps over the lazy dog";
string s2 = "fox";
bool b;
b = s1.Contains(s2);
Console.WriteLine("Is the string, s2, in the string, s1?: {0}", b);
}
}
/*
This example produces the following results:
Is the string, s2, in the string, s1?: True
*/

This is an interesting question. As @Jon mentioned, a regular expression might be a good start because it (potentially) allows you to evaluate multiple negative matches at once. A naive loop may be less efficient in comparison.
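To make the two options concrete, here is a sketch of both, written in Java rather than C# purely for illustration. The forbidden list comes from the question; matching case-insensitively and matching only the whole listed terms (not arbitrary fragments of "College" or "University") are assumptions of this example. Which variant is faster depends on the input and the number of terms, so it is worth measuring before choosing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ForbiddenSubstrings {
    // Terms taken from the question.
    static final List<String> FORBIDDEN = Arrays.asList("the ", "College", "University");

    // One alternation built from the quoted terms, compiled once.
    static final Pattern ANY_FORBIDDEN = Pattern.compile(
            FORBIDDEN.stream().map(Pattern::quote).collect(Collectors.joining("|")),
            Pattern.CASE_INSENSITIVE);

    // Plain loop: one contains() call per forbidden term.
    static boolean containsAnyLoop(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        for (String term : FORBIDDEN) {
            if (lower.contains(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // Regex: a single scan that stops at the first forbidden term.
    static boolean containsAnyRegex(String s) {
        return ANY_FORBIDDEN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAnyLoop("Boston College"));
        System.out.println(containsAnyRegex("Boston College"));
        System.out.println(containsAnyRegex("MIT"));
    }
}
```

Pattern.quote keeps the terms literal, so a term containing regex metacharacters would not change the meaning of the pattern.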

You can check it as follows:
class Program
{
public static bool checkstr(string str1,string str2)
{
bool c1=str1.Contains(str2);
return c1;
}
public static void Main()
{
string st = "I am a boy";
string st1 = "boy";
bool c1=checkstr(st,st1);
//if st1 is in st then it print true otherwise false
System.Console.WriteLine(c1);
}
}

I decided against checking for substrings to limit the data returned. Instead, I limit the result with .Take(15), and if the result count is more than 65,536, I just return null.
