Could someone explain what the difference between Object Storage and File Storage is, please?
I read about Object Storage on the wiki, and I also read http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf, Amazon's docs (S3), OpenStack Swift, etc. But could someone give me an example to understand it better?
Is the only difference that for 'object storage' we add more metadata to the objects?
For example, how would I store an image as an object using some programming language (for example, Python)?
Thanks.
IMO, Object storage has nothing to do with scale because someone could build a FS which is capable of storing a huge number of files, even in a single directory.
It is also not about the access methods. HTTP access to data in filesystems has been available in many well known NAS systems.
Storage/Access by OID is a way to handle data without bothering about naming it. It could be done on files too. I believe there is an NFS protocol extension that allows this.
I would put it this way: object storage is a (new/different) 'object centric' way of thinking about data, its access, and its management.
Think about these points:
What are snapshots today? They are point-in-time copies of a volume. When a snapshot is taken, all files in the volume are snapped too, whether all of them like it or not, whether all of them need it or not. A lot of space can get used (wasted?) for a complete volume snapshot when only a few files needed to be snapped.
In an object storage system, you will rarely see snapshots of volumes; objects will be snapshotted, perhaps automatically. This is object versioning. Not all objects need to be versioned; each individual object can tell whether it is versioned.
How are files/volumes protected from a disaster? Typically, in a Disaster Recovery (DR) setup, entire volumes/volume-sets are set up for replication to a DR site. Again, this does not care whether individual files want to be replicated or not. The unit of disaster protection is the volume. Files are small fry.
In an object storage system, DR is not volume centric. Object metadata can decide how many copies should exist and where (geo locations/fault domains).
Similarly for other features:
Tiering - Objects are placed in storage tiers/classes based on their metadata, independent of other unrelated objects.
Lifecycle - Objects move between tiers, change their number of copies, etc., individually, instead of as a group.
Authentication - Individual objects can get authenticated from different authentication domains if required.
As you can see, the change in thinking is that in an object store, everything is about an object.
Contrast this with the traditional way of thinking, in which management and access revolve around larger containers like volumes (containing files); that is not object storage.
The features above and their object-centric-ness fit well with the requirements of unstructured data, and hence the interest.
If a storage system is object (or file) centric instead of volume centric in its thinking (irrespective of the access protocol or the scale), it is an object storage system.
Disclosure - I work for a vendor (NetApp) that develops and sells both large filesystem and object storage platforms. I'll try to keep this as implementation neutral as I can, but my cognitive biases may unconsciously influence my answer.
There are many differences from access, programmability, and implementation points of view; however, given that this is likely to be read primarily by programmers rather than infrastructure or storage people, I'll focus on that aspect here.
The main difference from an external / programming point of view is that an object in an object store is created, deleted, or updated as a complete unit: you can't append data to an object and you can't update a portion of an object "in place"; you can, however, replace it while still keeping the same object ID. Creating, reading, updating, and deleting objects is typically done via relatively straightforward APIs, which are almost always REST-ful or REST based, and this encourages a mindset that the store is a programmable resource or perhaps a multi-tenant remote service. While most of the object stores I'm aware of support byte-range reads within an object, in general object stores were initially designed to work with whole objects. Good examples of object storage APIs are those used by Amazon S3 (the default standard for object storage access), OpenStack Swift, and the Azure Blob Service REST API. Describing the back-end implementations behind these APIs would be a book all by itself.
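To make this concrete, here is a minimal sketch of whole-object access using boto3 (the AWS SDK for Python); the bucket and key names are invented for illustration, and credentials are assumed to be configured in the environment.

```python
import boto3

s3 = boto3.client("s3")

# Objects are written as complete units: this PUT replaces any previous
# object stored under the same key, but the key (object ID) stays the same.
with open("photo.jpg", "rb") as f:
    s3.put_object(Bucket="my-example-bucket", Key="images/photo.jpg", Body=f)

# Reads return the whole object. Byte-range reads are possible via the
# Range parameter, but there is no "update in place".
response = s3.get_object(Bucket="my-example-bucket", Key="images/photo.jpg")
data = response["Body"].read()
```

This also happens to answer the "how would I store an image using Python" part of the original question.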
On the other hand, files in a filesystem have a broader set of functions that can be applied to them, including appending data and updating data in place. The programming model is more complex than that of an object store; it is now almost always accessed programmatically via a "POSIX" style of interface, generally tries to make the most efficient use of CPU and memory, and encourages a mindset that the filesystem is a private local resource. NFS and SMB do allow a filesystem to be made available as a multi-tenanted resource; however, these are often treated with suspicion by programmers, as they sometimes have subtle differences in how they react compared to "local" filesystems despite their full support for POSIX semantics. To update files in a local filesystem, you will probably use APIs such as https://www.classes.cs.uchicago.edu/archive/2017/winter/51081-1/LabFAQ/lab2/fileio.html or https://msdn.microsoft.com/en-us/library/mt794711(v=vs.85).aspx. Talking about the relative merits of filesystem implementations, e.g. NTFS vs BTRFS vs XFS vs WAFL vs ZFS, has a tendency to result in a religious war that is rarely worth anyone's time, though if you buy me a beer I'll happily share my opinions with you.
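By contrast, here is a small sketch of what the filesystem programming model allows and an object store does not: appending and updating bytes in place. The file name is illustrative.

```python
# Append mode: add bytes to the end of an existing file.
with open("data.bin", "ab") as f:
    f.write(b"appended record")

# Read/write mode: patch bytes in the middle without rewriting the file.
with open("data.bin", "r+b") as f:
    f.seek(16)              # jump to byte offset 16
    f.write(b"patched")     # overwrite 7 bytes in place
```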
From a use-case point of view, if you want to keep a large number of photos, or videos, or binary build artefacts, then an object store is often a good choice. If, on the other hand, you want to persistently store data in a binary tree and update that data in place on the storage media, then an object store simply wouldn't work, and you'd be much better off with a filesystem (you could also use raw block devices for that, but I haven't seen anybody do that since the early 90s).
The other big differences are that filesystems are designed to be strongly consistent and are usually accessed over low to moderate latency (50 microseconds - 50 milliseconds) networks, whereas object stores are often eventually consistent and distributed over a shared-nothing infrastructure connected together over low bandwidth, high latency wide area networks, and their time to first byte can sometimes be measured in multiples of whole seconds. Performing lots of small (4K - 16K) random reads from an object store is likely to cause frustration and performance problems.
The other main benefit of an object store vs a filesystem is that you can be reasonably sure that anything you put in an object store will remain there until you ask for it again, and that it will never run out of space so long as you keep paying the monthly charges. These resources are generally run at large scale with built-in replication, version control, automated recovery, etc., and nothing short of a Hurricane Harvey style disaster will make the data disappear (even then, you have easy options to make another copy in another location). With a filesystem, especially one that you are expecting you or your local operations people to manage, you have to hope that everything is getting backed up and that it doesn't fill up accidentally and cause everything to melt down when you can't update your data anymore.
I've tried to be concise, but to add to the confusion, the words "filesystem" and "object store" get applied to things which are nothing like the descriptions I've used above. E.g. NFS, the Network File System, isn't actually a filesystem; it's a way of implementing the POSIX storage APIs via remote procedure calls. And VMware's VSAN stores its data in something they refer to as an "object store" which allows high-speed in-place updates of the virtual machine images.
There are some very fundamental differences between File Storage and Object Storage.
File storage presents itself as a file system hierarchy with directories, sub-directories, and files. It is great and works beautifully when the number of files is not very large. It also works well when you know exactly where your files are stored.
Object storage, on the other hand, typically presents itself via a RESTful API. There is no concept of a file system. Instead, an application saves an object (file + additional metadata) to the object store via the PUT API, and the object storage saves the object somewhere in the system. The object storage platform gives the application a unique key (analogous to a valet ticket) for that object, which the application stores in its application database. If an application wants to fetch that object, all it needs to do is supply the key as part of the GET API, and the object is fetched by the object storage.
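A toy sketch of that valet-ticket flow, with a dict standing in for the object store and UUIDs standing in for the platform-issued keys (all names here are invented for illustration):

```python
import uuid

object_store = {}   # stands in for the object storage platform
app_database = {}   # stands in for the application's own database

def put_object(data: bytes) -> str:
    key = str(uuid.uuid4())      # the store hands back a unique key
    object_store[key] = data
    return key

def get_object(key: str) -> bytes:
    return object_store[key]

# The application saves the object, then records only the key.
ticket = put_object(b"...image bytes...")
app_database["user-42-avatar"] = ticket

# Later: fetch the object back using nothing but the key.
image = get_object(app_database["user-42-avatar"])
```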
Hope this is now clear.
The simple answer is that object-accessed storage systems or services use APIs and other object access methods for storing, retrieving, and looking up data, as opposed to traditional file or NAS access. For example, with file or NAS, you access storage using NFS (Network File System) or CIFS (e.g. a Windows file share), aka SMB, aka Samba, where the file has a name/handle with associated metadata determined by the file system.
The metadata includes info about creation, access, modification, and other dates, permissions, security, application or file type, and other attributes. Files are limited by the file system in terms of their size, as well as the number of files per file system. Likewise, file systems are limited by their total or aggregate size in terms of space capacity and the number of files in the filesystem.
Object access is different in that, while file or NAS front-ends or gateways or plugins are available for many solutions or services, primary access is via an API where an object can be of arbitrary size (up to the maximum of the object system) along with variable-sized metadata (depending on the object system/service implementation). With most object storage systems/services you can specify anywhere from a few KBytes to GBytes of user-defined metadata. What would you use GBytes of metadata for? How about, in addition to the normal info, adding more data for policies, management, where other copies are located, thumbnails or small previews of videos, audio, etc.
Some examples of object access APIs or interfaces include Amazon Web Services (AWS) Simple Storage Service (S3), other HTTP- and REST-based ones, and SNIA CDMI. Different solutions will also support iOS (e.g. iPhone/iPad) access, SOAP, Torrent, WebDAV, JSON, XAM, among others, plus NFS/CIFS. In addition, many of the object storage systems or services support programmatic bindings for Python among other languages. The APIs allow you to essentially open a stream and then get or put, list, and use other functions supported by the API/system to determine how you will use it.
For example, I use both Rackspace Cloud files and Amazon S3 (in addition to EBS and Glacier) for backing up, storing, and archiving data. I can access the objects stored via a web browser or tools including Jungle disk (JD) which is what I backup and synchronize files with. JD handles the object management and moves data to both Rackspace as well as Amazon for me. If I were inclined, I could also do some programming using the APIs and then directly access either of those sites supplying my security credentials to do things with my stored objects.
Here is a link to object and cloud storage primer from a session I did in Holland last year that has some simple examples of objects and access.
http://storageio.com/DownloadItems/Nijkerk_Nov2012/SIO_IndustryTrends_CloudObjectStorage.pdf
Using the programmatic binding, you would define your data structures or objects in your program and then use the APIs or calls for storing, retrieving, listing of data, meta data access etc. If there is a particular object storage system, software or service that you are looking to work with or need to know how to program to, go to their site and you should find their SDK or API info with examples. With objects, once you create your initial bucket or container on a service or with a product/system, you then simply create and store additional objects as you go.
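As a hedged example of those bindings, here is what storing an object with user-defined metadata looks like with boto3 against S3; the bucket, key, and metadata values are invented for the example.

```python
import boto3

s3 = boto3.client("s3")

with open("talk.mp4", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="videos/talk.mp4",
        Body=f,
        Metadata={              # arbitrary user-defined key/value pairs
            "speaker": "jdoe",
            "event": "nijkerk-2012",
        },
    )

# The metadata comes back on a HEAD request, without downloading the body.
head = s3.head_object(Bucket="my-example-bucket", Key="videos/talk.mp4")
print(head["Metadata"])         # {'speaker': 'jdoe', 'event': 'nijkerk-2012'}
```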
Here is a link as an example to AWS S3 API/programming:
http://docs.aws.amazon.com/AmazonS3/latest/API/IntroductionAPI.html
In theory, object storage systems are talked about as having unlimited numbers of objects, or unlimited object size; in reality, most systems, solutions, software, or services are limited by what they have either tested or currently support, which can be billions of objects with object sizes of 5 GBytes or larger. Pay attention to the limits on specific services or products as to what is actually tested and supported vs. what is architecturally possible or what is implemented in a webex or PowerPoint deck.
Again, it is very service- and product/software-dependent as to the number of objects, the size of the objects, the size of the metadata, and the amount of data that can be moved in/out via their APIs. However, it is generally safe to assume that object storage can be much more scalable (depending on implementation) than file systems (without using global namespaces, federation, file virtualization, or other techniques).
Also in my book Cloud and Virtual Data Storage Networking (CRC Press) that is Intel Recommended Reading, you will find more information about cloud and object storage.
I will be adding more related material to www.objectstorage.us soon.
Cheers gs
Object Storage = Block Storage
+ Rich Metadata
- File hierarchy
Block Storage uses a filesystem to point to where content is stored.
Object Storage uses an identifier to point to content and its context.
This is my understanding from reading Content-addressed vs. location-addressed.
Block Storage needs a filesystem and structuring, so with bigger file systems comes more overhead.
Object storage keeps a lot of context about the file and doesn't need the file hierarchy.
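A minimal illustration of the content-addressed idea, with a dict standing in for the store: the identifier is derived from the content itself (here a SHA-256 digest) rather than from a location in a hierarchy.

```python
import hashlib

store = {}

def put(content: bytes) -> str:
    oid = hashlib.sha256(content).hexdigest()  # identifier = hash of the content
    store[oid] = content
    return oid

# Location-addressed (filesystem): "where is it?" -> /photos/2023/cat.jpg
# Content-addressed (object store): "what is it?" -> its digest
oid = put(b"hello world")
assert store[oid] == b"hello world"
```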
The explanation on page 7 of the Dell paper shows this clearly. What troubled me, too, was that it isn't explained at the scale of the hard disk itself.
I found that a hard disk itself always uses a block storage mechanism (though that seems to be changing too).
Some other insights can be found here.
This answer doesn't even explain anything about the differences.
There are some very fundamental differences between File Storage and Object Storage.
File storage presents itself as a file system hierarchy with directories, sub-directories, and files. It is great and works beautifully when the number of files is not very large. It also works well when you know exactly where your files are stored.
Object storage, on the other hand, typically presents itself via a RESTful API. There is no concept of a file system. Instead, an application saves an object (file + additional metadata) to the object store via the PUT API, and the object storage saves the object somewhere in the system. The object storage platform gives the application a unique key (analogous to a valet ticket) for that object, which the application stores in its application database. If an application wants to fetch that object, all it needs to do is supply the key as part of the GET API, and the object is fetched by the object storage.
This explains a large portion of it, but you asked about the metadata.
Object storage has no sense of folders, or any kind of organizational structure that makes it easy for a human to organize things. File storage, of course, does have all those folders that make it so easy for a human to organize and shuffle through... In a server environment with a number of files on an astronomical scale, folders are just a waste of space and time.
Databases, you say? Well, they're not talking about the object storage itself; they are saying your HTTP service (PHP, webmail, etc.) has the unique ID in its database to reference a file that may have a human-recognizable name.
Metadata: well, where is this file stored, you ask? That's what the metadata is for. Your single file is split up into a bunch of small pieces and spread out over geographic locations, servers, and hard drives. These small pieces also contain more data: parity information for the other pieces of data, or maybe even outright duplication.
The metadata is used to locate every piece of data for that file over different geographic locations, data centres, servers and hard drives as well as being used to restore any destroyed pieces from hardware failure. It does this automatically. It will even fluidly move these pieces around to have a better spread. It will even recreate a piece that is gone and store it on a new good hard drive.
This may be a simple explanation, but I think it might help you understand better. I believe file storage can do the same thing with the metadata; but file storage is storage that you can organize as a human (folders, hierarchy, and such), whereas object storage has no hierarchy, no folders, just a flat storage container.
Actually, you can mount a bucket/container and access the objects or subfolders (and their objects) from Linux. For example, I have s3fs installed on Ubuntu with a mount point set up to one of my S3 buckets, and I am able to do regular cp, ls, and other functions just as though it were another filesystem. The key is getting one of the software tools, of which there are plenty, that allows you to map a bucket/container and present it as a mount point. There are also software tools that allow you to access S3 and other buckets/containers via iSCSI in addition to as NAS.
Most companies with object based solutions have a mix of block/file/object storage chosen based on performance/cost reqs.
From a use case perspective:
Ultimately object storage was created to address unstructured data which is growing explosively, far quicker than structured data.
For example, if a database is structured data, unstructured would be a word doc or PDF.
How do you search 1 billion PDFs in a file system? (if it could even store that many in the first place).
How quickly could you search just the metadata of 1 billion files?
Object storage is currently used more for long term or archival, cheap and deep storage, that keeps track of more detail of what that data is. This metadata becomes very powerful when searching or mining very large data sets. Sometimes you can get what you need from the metadata without even accessing the data itself. Object storage solutions can typically replicate automatically with geographic failover built-in.
The problem is that applications would have to be rewritten to use object access methods rather than a file hierarchy (which is simpler from an app-dev perspective). It's really a change in the philosophy of data storage, storing more actionable information about that data from a management standpoint as well as a usage one.
A quick example might be an MRI scan image. On a filesystem you have the owner/creation date, but not much else. If it were an object, all of the information surrounding the MRI could be stored along with it in metadata, like the patient name, the MRI center location, the requesting Dr., the insurance carrier, etc.
Block/file are better suited for local access or OLTP, where performance is more important than retention and cost.
For example, you would not want to wait minutes for a Word doc to open, but you could wait a few minutes for a data mining/business intelligence process to complete.
Another example would be a legal search where you have to search everything from 5 years ago to present. With retention policies in place to decrease the active data set and cost, how would you even do that without restoring from tape?
Object storage is a great solution for replacing long term archival methods like tape.
Setting up replication and failover for block and file can get very expensive in the enterprise and usually requires very expensive software and services.
Note: At the lower level, object storage access happens via a RESTful API, which is more like a web request than accessing a file at the end of a path.
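To make that note concrete, here is a sketch using the Python requests library against a hypothetical, unauthenticated endpoint; real services (S3, Swift, etc.) additionally require signed authentication headers.

```python
import requests

base = "https://objects.example.com/my-container"  # hypothetical endpoint

# PUT an object: just a web request, no path to traverse.
with open("report.pdf", "rb") as f:
    requests.put(f"{base}/report.pdf", data=f)

# GET it back by naming the object in the URL.
resp = requests.get(f"{base}/report.pdf")
pdf_bytes = resp.content
```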
Here is a good article worth reading:
https://cloudian.com/blog/object-storage-vs-file-storage/
Cited from the article:
To start, object storage overcomes many of the limitations that file storage faces. Think of file storage as a warehouse. When you first put a box of files in there, it seems like you have plenty of space. But as your data needs grow, you’ll fill up the warehouse to capacity before you know it. Object storage, on the other hand, is like the warehouse, except with no roof. You can keep adding data infinitely – the sky’s the limit.
If you’re primarily retrieving smaller or individual files, then file storage shines with performance, especially with relatively low amounts of data. Once you start scaling, though, you may start wondering, “How am I going to find the file I need?”
In this case, you can think of object storage as valet parking while file storage is more like self-parking (yes, another analogy, but bear with me!). When you pull your car into a small lot, you know exactly where your car is. However, imagine that lot was a thousand times larger – it’d be harder to find your car, right?
Because object storage has customizable metadata and all the objects live on a flat address space, it’s similar to handing your keys over to a valet. Your car will be stored somewhere, and when you need it, the valet will get the car for you. It might take a little longer to retrieve your car, but you don’t have to worry about wandering around looking for it.
I think the white paper explains the idea of object storage quite well. I am not aware of any standard way to use object storage devices (in the sense of a SCSI OSD) from a user application.
Object storage is in use in some large-scale storage products, like the storage appliances of Panasas. However, these appliances then export a file system to the end user. It is IMHO fair to say that the T10 OSD idea never really gained momentum.
Related ideas to the OSD standard can be found in cloud storage systems like S3 and RADOS.
Related
I'm trying to create a distributed system that contains a mobile app, a web user panel, and an API that communicates with the DB. I want the user to be able to upload a profile image both from the mobile app and the web user panel, but what is the best and "right" way to store images across a distributed system? I can't really find anything describing best practices on this topic.
I know that the file path should be in the database, and the image in a file system. But should that file system be on the API server, or where?
Here is a diagram of what I think the distributed system should look like.
The "right" way to do something complex like image hosting depends on factors like expected traffic and performance expectations. Designing large systems involves a lot of tradeoffs, so it's best to nail down what requirements are for your system are in order to make decisions that serve those requirements.
As for your question, this diagram is roughly correct - you want to store the location of the uploaded image separate from the image itself. If you wanted your solution to be more scalable, an approach would be turning your file system into its own service with its own API. You would store a hash of the file in your database to reference it rather than its path, then request that image (or a URL to that image) from the new storage service by asking the storage service's API for the file that has the stored hash.
The reason this is more scalable is that the storage service is free to become its own distributed system when we don't require that every file has an associated file system path within a single namespace. A hash is a good candidate for a replacement of the filesystem path, but you could come up with your own storage ID scheme depending on your needs.
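A short sketch of that hash-based scheme, with in-memory stand-ins for the storage service and the database (all names here are invented for illustration):

```python
import hashlib

_storage = {}  # stands in for the separate storage service

def upload_to_storage_service(key: str, content: bytes) -> None:
    _storage[key] = content

def download_from_storage_service(key: str) -> bytes:
    return _storage[key]

def store_image(path: str, db: dict) -> str:
    with open(path, "rb") as f:
        content = f.read()
    digest = hashlib.sha256(content).hexdigest()  # hash replaces the file path
    upload_to_storage_service(digest, content)
    db["profile_image"] = digest                  # the DB row stores the hash
    return digest

def fetch_image(db: dict) -> bytes:
    return download_from_storage_service(db["profile_image"])
```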
However, this may be wildly out of scope for what you are trying to design. If you only expect to have a few thousand users, storing your images and database on your API server in the file system isn't necessarily wrong, but you might experience growing pains if the requirements of your system grow.
Google's site reliability engineer classroom has a lesson on building a distributed image server, which is an adjacent problem to what you're looking to do: https://sre.google/classroom/imageserver/
I am currently considering whether I should store media in an Apache Cassandra database. The use case is that the site will take uploads from users for insurance claims and will need to store the files so that they cannot be accessed outside the correct permissions, while at the same time being streamable. If I store them on a file system, I have to deal with redundancy, backups, and so on using file-system-based old tech. I am not really interested in dealing with a CDN, because many of them are expensive, but also because the permissions on whether you can view the content depend on information in the app, such as which adjuster is assigned to the case and so on. In addition, I want to stream the files rather than require download-and-view, which would be the default mode with requests against a CDN. If I put them in Cassandra, it will handle the replication and storage, and I can stream the binary data out of the database to the user with integrated permissions. What I am concerned about is whether I will run into problems with Cassandra rows holding huge HD video files that are sometimes 1 to 2 hours long (testimony).
I am interested in the recommendations of Cassandra users concerning this issue. How would you solve the problem? Are there any lessons you have learned that I can benefit from? Would you suggest anything specific about the video tables if I go with Cassandra storage? Is there any CDN that will stream, not require download, allow me to plug in permissions, and at the same time be open source?
Thanks a bunch.
Cassandra is definitely not designed for, and should not be used as, an object store. I've worked on plenty of use cases where Cassandra was used as the metadata store alongside the object store/CDN, and it can complement them quite nicely.
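A sketch of that metadata-alongside-object-store pattern, using the real cassandra-driver and boto3 libraries but an invented keyspace/table/bucket layout:

```python
import uuid
import boto3
from cassandra.cluster import Cluster

s3 = boto3.client("s3")
session = Cluster(["127.0.0.1"]).connect("media")  # assumes a 'media' keyspace exists

def store_video(path: str, case_id: str) -> uuid.UUID:
    video_id = uuid.uuid4()
    # The large binary payload goes to the object store...
    with open(path, "rb") as f:
        s3.upload_fileobj(f, "claim-videos", str(video_id))
    # ...and only a small metadata row goes into Cassandra.
    session.execute(
        "INSERT INTO videos (video_id, case_id, s3_key) VALUES (%s, %s, %s)",
        (video_id, case_id, str(video_id)),
    )
    return video_id
```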
Check out KillrVideo for inspiration: https://killrvideo.github.io/
This seems like a good key-value use case for Streaming LOB support in Oracle NoSQL Database. You might want to look at this - http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/lobapi.html
RavenDB (a .NET JSON document db with querying) provides aggressive caching / memory management under its own control (via its own storage engine, Munin), with config parameters to tweak various cache sizes, etc. Google Groups threads suggest occasional out-of-memory exceptions as a result of untuned parameters (with a sufficiently large db/index), though that may no longer be the case with the latest releases.
CouchDB seems to take a different approach and leaves the caching to the operating system. Meaning that when I GET /db1/doc-id-1, it is essentially, in programming terms, a file read op against the filesystem, which the OS can optimize away thanks to its own caches. I believe the same holds for views and reduce results (multiple parts of the B-tree need to be loaded/computed from disk depending on the range).
The latter seems superior to me: the OS has benefited from years of evolution in caching/paging, etc., and pressure from other services can balance memory use.
Firstly, am I correct in my understanding?
Is CouchDB's approach unique to Unix-based OSes (although I see they have a Windows port)?
Is there a reason a .NET DB can't rely on the OS to optimize away file reads, etc.?
What are the disadvantages and advantages of each approach that would influence the choice when building a data store?
Side note: I believe Redis is the same, just keeping the index in memory; each GET KEY is a disk read (which either hits the disk heads or not, depending on the OS file caching).
Jia93,
One of the reasons that we work the way we do is that we have stronger separation between the layers. CouchDB has much the same optimizations as we do (keeping things in memory), but it does that on top of the B-tree structure that is directly exposed to the application.
Another reason for caching the results is to avoid the cost of parsing the JSON on every single request.
I am designing a system that's going to have about 10 million+ users, each with a photo of about 1-2 MB.
We are going to deploy both the database and the web app using Microsoft Azure.
I am wondering how I should store the photos. There are currently two options:
1. Store all photos using SQL Server FILESTREAM
2. Use a file server
I haven't worked with such large-scale BLOB data using FILESTREAM.
Can anybody give me any suggestions? The cons and pros?
And input from anyone with Microsoft Azure experience concerning large photo storage is really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for any large-scale binary storage need. Databases are best for relational-only systems; instead, provide references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage, and has small storage limitations (50GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper for serving up binary objects (though has a bit more complicated pricing system, still vastly cheaper per GB).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available to SQL Azure (though a complex, custom wrapper could be created), but can easily be setup within minutes to piggy-back off your Windows Azure Blob storage to provide limitless bandwidth to your end-users, so you never have to worry about your binary objects being a bottleneck in your system. CDN costs are similar to that of Blob storage, but you can find all that stuff here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, no reason not to go with Blob storage. It is simple to use, cost effective, and will scale to any needs.
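A minimal sketch of an upload with the current azure-storage-blob SDK for Python (the original answer predates this SDK); the connection string, container, and blob names are placeholders.

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(container="images", blob="photo.jpg")

with open("photo.jpg", "rb") as f:
    blob.upload_blob(f, overwrite=True)

# If the container allows public read access, the image is now servable
# directly from https://<account>.blob.core.windows.net/images/photo.jpg
```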
I can't speak on anything Azure related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up inside the normal SQL Server backup process. The size of the data that you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on an SCM system with an RDBMS back end, and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS, we had to cook up a generic non-FILESTREAM way of doing it, but the ability to do a single-shot backup sold us.
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for our images on our website, mainly the user-generated images and any CMS-related stuff that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is related to serving the images up. You had better have a CDN sitting in front; if not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service going down, meaning your system will get overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, your tradeoff is that your db is MUCH, MUCH larger and, therefore, the backup takes longer. Potentially much longer, and the system runs slower during the backup. Not to mention that moving backups around takes longer (i.e., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files that you store on their system (i.e., AWS's S3 and Azure's blob storage). If you're on premise, just make sure you use a shared location for the images and make sure that location is backed up. I think the best option is to set it up so each image (and other UGC file types too) has an entry in your db with a path to that file. Going one step further, separate the root path into a config setting and only store the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry. Then your entry might have "/files/images/image.jpg". This way, if you move your file store, you can just update the root config. I would also suggest creating a FileStoreProvider interface (singleton) that can be used for managing (saving, deleting, updating) these files; a sketch follows below. This way, if you switch between AWS, Azure, or on premise, you can just create a new provider.
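A sketch of that FileStoreProvider idea in Python; the class and method names are my own invention for illustration.

```python
import os
from abc import ABC, abstractmethod

class FileStoreProvider(ABC):
    """Abstraction over wherever the UGC files actually live."""

    @abstractmethod
    def save(self, relative_path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, relative_path: str) -> bytes: ...

    @abstractmethod
    def delete(self, relative_path: str) -> None: ...

class LocalFileStoreProvider(FileStoreProvider):
    def __init__(self, root: str):
        self.root = root  # the configurable root path from the example

    def save(self, relative_path: str, data: bytes) -> None:
        full = os.path.join(self.root, relative_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, relative_path: str) -> bytes:
        with open(os.path.join(self.root, relative_path), "rb") as f:
            return f.read()

    def delete(self, relative_path: str) -> None:
        os.remove(os.path.join(self.root, relative_path))

# Swapping to S3 or Azure later means writing another provider; the rest of
# the app keeps calling save/read/delete with entries like
# "files/images/image.jpg".
store = LocalFileStoreProvider("/var/files")
store.save("files/images/image.jpg", b"...")
```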
I have a client-server DB; I manage many files (doc, txt, pdf, ...) and all of them go in a FILESTREAM BLOB. Customers have 50+ MB dbs. If you can do the same in Azure, go for it. Having everything in the db is a wonderful thing. It is considered good policy for Postgres and MySQL too.
Is every app that allows users to input data built with core data?
I've built a "grocery list" type of table view app where you name the list and then in a detail view add items to the list. Simple.
What I don't get is this: based on an iPhone development book, the example saves the data to a plist using dictionaries.
I've learned that it works on the simulator but not the device, because the data is saved to the application bundle, not the Documents directory (which was new to me!).
On the device, the app works great, except it won't HOLD the data.
Is Core Data or SQLite the only solution?
Is every app that allows users to input data built with core data?
Note that your question as posed is incorrect, as it assumes that CoreData is tied to SQLite and is an alternative to plists.
CoreData is a framework for object lifecycle and graph management. It provides implementations of common tasks like change tracking and propagation, consistency enforcement, data validation, and so on.
The CoreData framework is separate from the object persistence layer and can use different serialization implementations, including SQLite and XML (plists).
For more details, read Core Data Programming - Persistent Store Features.
The decision whether you should use CoreData should be based on whether you need any of the features it provides. If you need to serialize simple object graphs, without consistency requirements, you can use standard NSDictionary to serialize your data in a simple plist file in any of the application-writable folders. Otherwise, use CoreData, and choose the proper persistent store based on the type of data you will be storing.
From what I've seen around the internet, you can use Core Data (which gives you the options of SQLite, atomic, and XML stores), you can use NSKeyedArchiver and NSKeyedUnarchiver (http://www.vimeo.com/1454094), or you can store the data inside the local application folder (possibly using a serialization method). It looks like Core Data is the best solution, but also a more complex one to implement. For a simple app, as yours is, I think serializing data and storing it in the local app directory would be perfect.
I am surprised that your book is showing an example where user data is written to the app bundle. Actually, I'm a little surprised that that is even possible.
You should be able to write your data to an NSDictionary (or NSMutableDictionary) and then write that to your app's Documents directory, using -writeToFile:atomically:
Reading data back in should also be straightforward, using -initWithContentsOfFile:.
For someone just getting started, I would recommend keeping it simple. Working with NSDictionary is very simple, though you have to manage things like the list of lists, how to name lists that are stored in the Documents directory, etc.
Ultimately, using Core Data would probably be a better approach. It offers more flexibility and more power - but, as ever, those advantages come at a cost.
Your question is very important to the community in the respect that you are asking a strategic question: which technology do I use, and when?
Core Data is best for the day-to-day work of a list-based app. Core Data is built to mirror the storage of data, similar to how databases work. Relational structures, sorting, key indexing, and other row-based attributes are best supported by Core Data.
Property lists (*.plist) are best suited to one-time updates of critical environmental settings. The user, for example, can optionally set .plist attributes through the iOS Settings app. So passwords, account settings, email addresses, and configuration options can be set here nicely. This kind of data is very different from frequently updated, transactional data.
XML persistence is closely related to .plist, in that a property list (or .plist) is an XML file in itself. Hence, you could download a stream of XML data, then use it in your app using the same programming rubric as you would use to adjust a property list. Hence, receiving XML data from the web, or uploading such a list, maps nicely to XML persistence.
AWS also proposed the AWS-Persistence library, to support synchronizing your Core Data collections with their online databases. This could prove helpful by having a user populate data locally via Core Data, then lazily/opportunistically uploading the list. For your purposes (a grocery shopping list), this could provide immediacy to the user, while giving your server an interesting big-data opportunity (analyze user transactions, provide recommendations, sell ads, etc.).
Hope this helps future visitors tap into the wealth of what iOS provides. Peace!