What are the limits for SharePoint field values?

Are there any limits to the amount of data that can be stored in an individual SharePoint field? If there are, what are they?
Is there a limit in terms of the number of bytes or string length, say, that can be stored as a value of an individual field?

SharePoint stores the list items in a SQL Server table called AllUserData, so the maximum values are determined by the data types of the columns.
You can find the complete structure here. However, I cannot find any resource discussing the mapping between SharePoint field types and SQL Server column types, probably because accessing the SharePoint tables directly is strongly discouraged. That's not a big problem though: query the table, look at the results, and you will be able to match the fields to the columns (e.g. nvarchar1 corresponds to the first 'Single line of text' field).


Is there a way to exclude NULL values from Azure Cognitive Search Indexes

For example, we have fields 1 through 10. I want to index all of the fields in Azure Search so that you can search and filter on them.
My question is: is there a way to exclude the fields that are NULL for a specific ID, so that they are not stored in Azure Search? See the example below.
The data itself is initially stored in Azure Cosmos DB.
In Azure Cosmos DB it would look like this:
Id 1
field 1: a
field 2: b
field 5: c
field 6: d
field 8: e
Id 2
field 3: a
field 2: b
field 5: c
field 9: d
field 10: e
However in Azure Search Index, it looks like this:
Id 1
field 1:a
field 2:b
field 3:NULL
field 4:NULL
field 5:c
field 6:d
field 7:NULL
field 8:e
field 9:NULL
field 10:NULL
Id 2
field 1:NULL
field 2:b
field 3:a
field 4:NULL
field 5:c
field 6:NULL
field 7:NULL
field 8:NULL
field 9:d
field 10:e
The shortest answer to your question is "no", but it's a little deeper than that.
When you add documents to an Azure Cognitive Search index, the values of each field are stored in a data structure called an inverted index. This stores a dictionary of terms found in the field, and each entry contains a list of document IDs containing that term. It is somewhat similar to a column-oriented database in that regard. The null value that you see in document JSON is never actually stored in the inverted index. This can make it expensive to test whether a field is null, since the query needs to look for all document IDs not contained in the inverted index, but it is perfectly efficient in terms of storage (because it doesn't consume any).
This article has a few simplified examples of how inverted indexes work, although it's about a different topic than your question.
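To make the storage point concrete, here is a toy inverted index in Python. It is only a sketch of the general idea, not how the service is actually implemented; the function names and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs, field):
    """Map each term found in `field` to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is None:
            continue  # a null value adds no entry, so it consumes no index storage
        for term in str(value).lower().split():
            index[term].add(doc_id)
    return dict(index)


def ids_where_null(docs, index):
    """Testing for null means finding every ID absent from all posting lists."""
    seen = set().union(*index.values())
    return set(docs) - seen


# Hypothetical documents shaped like the ones in the question.
docs = {
    1: {"field1": "a", "field3": None},
    2: {"field1": None, "field3": "a"},
}
field1_index = build_inverted_index(docs, "field1")
field3_index = build_inverted_index(docs, "field3")
```

In this sketch the index for field1 holds a single entry for document 1, and nothing at all is recorded for document 2. Answering "which documents have a null field1" requires comparing against the full set of document IDs, which is why that kind of query is the expensive part.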
Your broader concern about having many fields defined in your index is a valid one. There is a tradeoff between schema flexibility and resource utilization as you increase the number of fields in your index. However, this is due to the bookkeeping overhead required for each field, not the "number of nulls in the field" (which doesn't really mean anything since nulls aren't stored).
From your question, it sounds like you're trying to model different "entity types" in the same index, resulting in a sparse index where some subset of the documents have one subset of fields defined, while another subset of documents have different fields defined. This is a scenario that we want to better support in the service. One promising future direction could be supporting multi-index query, so each subset of your schema could have its own index with its own distinct (but perhaps overlapping) set of fields. This is not on our immediate roadmap, but it's something we want to investigate further. Please vote on this User Voice item to help us prioritize.
As far as not saving the null values, AFAIK it is not possible. An index in Cognitive Search has a pre-defined schema (much like a relational database table) and based on an attribute's data type an attribute's value will be initialized with a default value (null for most of the data types).
If your concern is storage, it's not a problem since it's an inverted index.
If you have an issue with the complexity of the JSON data returned, you could implement your own intermediate service that hides all NULL values from the JSON. Your application queries your own query service, which in turn queries the actual Azure service, passing along all parameters as-is. The only difference is that your service removes both the key and the value of every NULL entry from the JSON to make the responses easier to manage.
The response from search would then appear to be identical to your Cosmos record.
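The null-stripping step of such an intermediate service could look roughly like the following Python sketch. The helper name and the sample response are hypothetical, and the sketch only covers the JSON clean-up, not the HTTP call to the search service.

```python
def strip_nulls(value):
    """Recursively drop dictionary keys whose value is None.

    None entries inside lists are kept, so list positions stay stable.
    """
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


# Hypothetical search response, shaped like the example in the question.
response = {
    "value": [
        {"id": "1", "field1": "a", "field2": "b", "field3": None, "field4": None},
        {"id": "2", "field1": None, "field2": "b", "field3": "a", "field4": None},
    ]
}
cleaned = strip_nulls(response)
```

Because the function recurses into nested objects, it would also handle complex fields, assuming the response is ordinary parsed JSON.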

Changing your 20 indexed columns

I have a large SharePoint O365 list of over 15,000 items. I have already used all 20 indexed columns. I now need to filter by a different column. Is it safe for me to remove an indexed column and change it to a different field? Do I have to reindex the list if I do that?
I'm afraid you'll find that creating or removing column indexes are among the operations that are restricted upon surpassing SharePoint's list view threshold, as documented here.
In an on-premises SharePoint farm (or an otherwise traditional SharePoint farm using cloud-hosted infrastructure), you'd have access to Central Administration, where you could temporarily increase the threshold, set a time window during which the threshold won't apply, or even use PowerShell to temporarily set the EnableThrottling property of the list to false, allowing you to make your indexed column changes. But with Office 365 you won't have any of those options.
Depending on the circumstances, you can still work around the list view threshold when filtering: first filter the list by one or more of your indexed columns such that fewer than 5,000 items are returned; you should then be able to filter that subset of results using your unindexed column.
Another alternative would be to use SharePoint's search services to access results in your list that match the given metadata. Since the search crawl index is generated ahead of time (rather than a live query), it is not beholden to the list view threshold. Only problem there is that the results might be stale depending on the frequency of search crawls.
Since you already have 20 indexed columns, you may be able to query the list using an already-indexed column so that the response stays within the list view threshold (a 'Date Created' range or 'Created By' might be useful columns).
Once you return your initial response, you can then filter on the unindexed column of interest.
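The two-stage approach can be illustrated with the Python sketch below. It uses plain in-memory dictionaries as a stand-in for list items and does not call any SharePoint API; the column names (Created, Status) and the helper are invented for the example.

```python
from datetime import date

LIST_VIEW_THRESHOLD = 5000  # the limit discussed above


def two_stage_filter(items, indexed_predicate, unindexed_predicate):
    """Narrow by an indexed column first, then filter the subset by any column."""
    narrowed = [item for item in items if indexed_predicate(item)]
    if len(narrowed) >= LIST_VIEW_THRESHOLD:
        # In SharePoint the query would be throttled at this point.
        raise ValueError("first filter must return fewer than 5000 items")
    return [item for item in narrowed if unindexed_predicate(item)]


# Hypothetical list of 15,000 items spread over 12 months.
items = [
    {"ID": n, "Created": date(2023, n % 12 + 1, 1), "Status": "Open" if n % 2 else "Closed"}
    for n in range(15000)
]

march_open = two_stage_filter(
    items,
    indexed_predicate=lambda i: date(2023, 3, 1) <= i["Created"] < date(2023, 4, 1),
    unindexed_predicate=lambda i: i["Status"] == "Open",
)
```

The design point is simply the ordering: the indexed filter has to be selective enough on its own, otherwise the second filter never gets a chance to run.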

Union of two SharePoint lists in SSRS

I have a requirement to union two SharePoint lists on the basis of a common ID field in order to show a report. Please suggest whether a union operation is possible.
Not sure if you mean UNION in the SQL sense (appending together) or you mean you want to join the two lists on a common value.
There's no way to append datasets together inside a report, but there is some limited functionality to look up the values from a dataset based on a value in another dataset.
Take a look at these report expression functions: Lookup, LookupSet, and Multilookup.

Cassandra sets or composite columns

I am storing account information in Cassandra. Each account has lists of data associated with it. For example, an account may have a list of friends and a list of liked books. Queries on accounts will always want all friends or all liked books or all of both. No filtering or searching is needed on either. The list of friends and books can grow and shrink.
Is it better to use a set column type or composite columns for this scenario?
I would suggest not using sets if:
You are concerned about disk space (each value is allocated a cell on disk, plus space for the metadata of each cell, which is 15 bytes if I am not wrong; that consumes a lot if your data keeps growing).
Not going to grow a lot of data in that particular row, as each time the cells have to be fetched from different SSTables.
In these kinds of cases, the preferred option would be a JSON array. You can store it as text and read the data back from that.
Sets (and the other collection types) were introduced for a completely different use case. If you need to look up a particular value inside the list, or a value has to be updated frequently inside the same collection, you should make use of collections.
My take on your query is this:
Store all account-specific info in a JSON object of friends that has a list of books as its value.
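As a rough illustration of the JSON-as-text idea, the Python sketch below serializes a list into a string that could be written to a text column and parses it back. It does not use a Cassandra driver; the helper names are hypothetical, and the read-modify-write step is the trade-off to keep in mind.

```python
import json


def to_text_column(values):
    """Serialize a collection as a JSON array string (deduplicated, sorted)."""
    return json.dumps(sorted(set(values)))


def from_text_column(text):
    """Parse the JSON array string back into a Python list."""
    return json.loads(text)


def add_value(text, new_value):
    """Adding one element means rewriting the whole column value."""
    return to_text_column(from_text_column(text) + [new_value])


friends_text = to_text_column(["carol", "alice", "bob", "alice"])
friends_text = add_value(friends_text, "dave")
friends = from_text_column(friends_text)
```

Since the query pattern in the question always reads the whole list, fetching and parsing a single text value fits it well; frequent single-element updates are where this approach would cost more than a collection.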
Sets are good for smaller collections of data. If you expect your friends / liked books lists to grow constantly and get large (there isn't a golden number here), it would be better to go with composite columns, as that model scales out better than collections and allows for straightforward querying, whereas collections require secondary indexes.

How can I delete records from a table that have certain criteria

Rookie question I know.
I have a table with about 10 fields, one of the fields is a category field. I need this field to exist because of the multiple types of categories. However, one category in this field is wrong and is duplicating results.
So can I delete all records in the table that have "Type320" in the CatDescription field, and how? I want to keep everything else as it is in this table; I just need to get rid of the records that have that value in that one field.
Thanks very much!
EDIT: Thanks for the answer, I did not know how to do this so this is very helpful
However, this is more complicated than I thought. The raw data that I am supplied carries these duplicate records (only duplicate in certain circumstances but they are easy to isolate). This raw data is given to me on a monthly basis in several spreadsheet forms.
It all relates to these ID numbers, and has about 10 fields (xls columns). As I said before, one of these is the Category Description field (sorry, this is not a lookup). In certain places a record automatically duplicates itself on output because, in the database this comes from, it has to have this subcategory for one particular "type".
So, every time there is a duplication, every single bit of information in all fields is exactly the same, with the exception of this CatDescription (one is "Type320", and the duplicated record is "Type321"). However, there are some instances where Type321 is valid on its own (in which case there is no matching data row with a Type320 CatDescription). By matching I mean all data in all fields of a particular record.
The clear rule is this: if all fields of a record with a Type320 CatDescription match all fields of a record with a Type321 CatDescription, then I can delete the record containing the Type321 CatDescription. This is true because this is the only situation where this duplication occurs; normally not all of this should match.
This allows all unique records with Type320 and Type321 data (that do not match exactly) to stay, just as they should. This makes sense to me (and hopefully to you too :/), but can it be done, and how?
Thanks, because this is way over my head. I would rather know how to do it in Access, but an xls solution is equally appreciated. Heck, I would do it in ppt if it would get the job done! :)
I would try one of these two queries:
DELETE FROM table WHERE CatDescription LIKE '%Type320%';
DELETE FROM table WHERE CatDescription LIKE '*Type320*';
That is because the Access database engine could be using * (ANSI-89 Query Mode, e.g. DAO) instead of % (ANSI-92 Query Mode, e.g. OLE DB/ADO) for the wildcards.
Alternatively, this works regardless of ANSI Query Mode:
DELETE FROM table WHERE CatDescription ALIKE '%Type320%';
Note the Access database engine's ALIKE keyword is not officially supported.
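For the follow-up case in the edit (delete a Type321 record only when an otherwise identical Type320 record exists), one possible shape is a DELETE with a correlated EXISTS. The sketch below runs it against an in-memory SQLite database from Python purely to show the logic; it is not Access SQL, and the table name `records` and the columns `ID` and `Amount` are placeholders for the real table and its roughly ten fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ID INTEGER, Amount REAL, CatDescription TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, 10.0, "Type320"),
        (1, 10.0, "Type321"),  # duplicate of the Type320 row above
        (2, 5.0, "Type321"),   # valid on its own: no matching Type320 row
        (3, 7.0, "Type100"),   # unrelated category, must stay
    ],
)

# Delete a Type321 row only if a Type320 row matches it on every other field.
# Every non-category column needs its own comparison line; "=" does not match
# NULLs, so fields that can be empty would need extra handling.
conn.execute(
    """
    DELETE FROM records
    WHERE CatDescription = 'Type321'
      AND EXISTS (
          SELECT 1 FROM records AS other
          WHERE other.CatDescription = 'Type320'
            AND other.ID = records.ID
            AND other.Amount = records.Amount
      )
    """
)
remaining = conn.execute(
    "SELECT ID, CatDescription FROM records ORDER BY ID, CatDescription"
).fetchall()
```

Running a SELECT with the same WHERE clause first, on a copy of the data, is a sensible way to check which rows would be removed before deleting anything.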
Does the CatDescription field look up to another table? Is it a query of those tables that creates what you call duplicate results?
If so, be careful about blaming the table that has CatDescription. Check the look-up table to see if Type320 is found there in duplicate.
If you don't have the problem isolated correctly, then you're likely to delete good records while not fixing the problem.
