How to Store a List in a Column of a Database Table


No, there is no "better" way to store a sequence of items in a single column. Relational databases are designed specifically to store one value per row/column combination. In order to store more than one value, you must serialize your list into a single value for storage, then deserialize it upon retrieval. There is no other way to do what you're talking about (because what you're talking about is a bad idea that should, in general, never be done).
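To make the serialize/deserialize step concrete, here is a minimal Java sketch. It assumes a comma as the delimiter, and the class and method names are hypothetical. Notice how much the application now has to guarantee by itself: no element may contain the delimiter, and the database can no longer see, index, or validate the individual values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that packs a list into one column value and unpacks it again.
// Assumption: elements never contain a comma. Nothing in the database enforces that.
class ListColumnCodec {
    static final String DELIMITER = ",";

    // Serialize before INSERT/UPDATE: ["a", "b"] -> "a,b"
    static String encode(List<String> items) {
        for (String item : items) {
            if (item.contains(DELIMITER)) {
                throw new IllegalArgumentException("element contains the delimiter: " + item);
            }
        }
        return String.join(DELIMITER, items);
    }

    // Deserialize after SELECT: "a,b" -> ["a", "b"]
    static List<String> decode(String column) {
        if (column == null || column.isEmpty()) {
            return new ArrayList<>();
        }
        // A limit of -1 keeps trailing empty strings instead of silently dropping them.
        return new ArrayList<>(Arrays.asList(column.split(DELIMITER, -1)));
    }
}
```

Even this small codec is ambiguous: an empty list and a list holding one empty string both encode to the same empty value, so they cannot be told apart after a round trip.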

I understand that you think it's silly to create another table to store that list, but this is exactly what relational databases do. You're fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason. Since you state that you're just learning SQL, I would strongly advise you to avoid this idea and stick with the practices recommended to you by more seasoned SQL developers.

The principle you're violating is called first normal form, which is the first step in database normalization.

At the risk of oversimplifying things, database normalization is the process of defining your database based upon what the data is, so that you can write sensible, consistent queries against it and be able to maintain it easily. Normalization is designed to limit logical inconsistencies and corruption in your data, and there are a lot of levels to it. The Wikipedia article on database normalization is actually pretty good.

Basically, the first rule (or form) of normalization states that your table must represent a relation. This means that:

  • You must be able to differentiate one row from any other row (in other words, your table must have something that can serve as a primary key). This also means that no row should be duplicated.
  • Any ordering of the data must be defined by the data, not by the physical ordering of the rows (SQL is based upon the idea of a set, meaning that the only ordering you should rely on is that which you explicitly define in your query)
  • Every row/column intersection must contain one and only one value
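The second point can be illustrated with a small Java sketch. Each row carries an explicit position value, so the order comes from the data rather than from the order in which rows happen to be stored. The row shape and the list_rows table mentioned in the comment are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class OrderedListRows {
    // Hypothetical row shape: one value per row/column intersection, plus an
    // explicit position column so the ordering is part of the data itself.
    record ListRow(int listId, int position, String value) {}

    // Same idea as: SELECT value FROM list_rows WHERE list_id = ? ORDER BY position
    // The incoming order of rows is ignored, just as a table's physical order should be.
    static List<String> valuesInOrder(List<ListRow> rows, int listId) {
        return rows.stream()
                .filter(r -> r.listId() == listId)
                .sorted(Comparator.comparingInt(ListRow::position))
                .map(ListRow::value)
                .collect(Collectors.toList());
    }
}
```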

The last point is obviously the salient point here. SQL is designed to store your sets for you, not to provide you with a "bucket" for you to store a set yourself. Yes, it's possible to do. No, the world won't end. You have, however, already crippled yourself in understanding SQL and the best practices that go along with it by immediately jumping into using an ORM. LINQ to SQL is fantastic, just like graphing calculators are. In the same vein, however, they should not be used as a substitute for knowing how the processes they employ actually work.

Your list may be entirely "atomic" now, and that may not change for this project. You will, however, get into the habit of doing similar things in other projects, and you'll eventually (likely quickly) run into a scenario in which your quick-n-easy list-in-a-column approach is wholly inappropriate. There is not much additional work in creating the correct table for what you're trying to store, and you won't be derided by other SQL developers when they see your database design. Besides, LINQ to SQL is going to see your relation and give you the proper object-oriented interface to your list automatically. Why would you give up the convenience offered to you by the ORM so that you can perform nonstandard and ill-advised database hackery?

How to store a list in a db column

In a normalized relational database, such a situation is unacceptable. You should have a junction table that stores one row for each pair of a FOO object's ID and a Fruit's ID. The existence of such a row means the fruit is in that FOO's list.

CREATE TABLE FOO (
    id int primary key not null,
    int1 int,
    int2 int,
    int3 int
)

CREATE TABLE Fruits (
    id int primary key not null,
    name varchar(30)
)

CREATE TABLE FOOFruits (
    FruitID int references Fruits (id),
    FooID int references FOO (id),
    constraint pk_FooFruits primary key (FruitID, FooID)
)

To add Apple fruit to the list of a specific FOO object with ID=5, you would:

INSERT FOOFruits(FooID, FruitID)
SELECT 5, ID FROM Fruits WHERE name = 'Apple'
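The junction-table behaviour can be modelled in memory to show why the composite primary key matters: each (FooID, FruitID) pair can exist at most once, and the pair's existence is itself the membership fact. This is a hypothetical Java illustration, not LINQ to SQL or any ORM. A Map stands in for the Fruits table and assumes fruit names are unique, which the schema above does not enforce.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FooFruitsModel {
    // One element per FOOFruits row; like the composite primary key, a pair can exist only once.
    record Row(int fooId, int fruitId) {}

    // Hypothetical stand-in for the Fruits table, assuming unique names (name -> id).
    private final Map<String, Integer> fruitIdsByName = new HashMap<>();
    private final Set<Row> fooFruits = new HashSet<>();

    void addFruit(int id, String name) {
        fruitIdsByName.put(name, id);
    }

    // Mirrors: INSERT FOOFruits(FooID, FruitID) SELECT 5, ID FROM Fruits WHERE name = 'Apple'
    // Returns false when no fruit has that name (the SELECT finds nothing) or when the pair
    // already exists (where the database would instead raise a primary-key violation).
    boolean addToList(int fooId, String fruitName) {
        Integer fruitId = fruitIdsByName.get(fruitName);
        if (fruitId == null) {
            return false;
        }
        return fooFruits.add(new Row(fooId, fruitId));
    }

    // DELETE FROM FOOFruits WHERE FooID = ? AND FruitID = ?
    boolean removeFromList(int fooId, int fruitId) {
        return fooFruits.remove(new Row(fooId, fruitId));
    }

    // Membership is simply "does this row exist?"
    boolean isInList(int fooId, int fruitId) {
        return fooFruits.contains(new Row(fooId, fruitId));
    }
}
```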

Should a long ordered list of ids be stored in a column of a database table?

It's usually not a good design for a relational database.

Storing a comma-separated list of values is one type of denormalization. Good relational database design encourages normalization.

All types of optimizations improve one type of query, at the expense of other queries. In your case, if you only store or retrieve the whole list of ids, then it could be a good optimization. But if you ever want to add an id to the list, search for a specific id, be assured the ids are sorted correctly, or perform many other types of operations, then those tasks are not optimized.
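A small Java sketch shows the cost of "search for a specific id" against a comma-separated column. The names are hypothetical. A plain substring test, which is roughly what a LIKE '%12%' pattern does, produces false positives. A correct test has to split and compare the whole value on every check, whereas a normalized table can answer the same question with an indexed equality lookup.

```java
import java.util.Arrays;

class DelimitedIdSearch {
    // Naive: substring match on the whole column value,
    // similar in effect to WHERE ids LIKE '%12%'.
    static boolean containsNaive(String csv, int id) {
        return csv.contains(String.valueOf(id));
    }

    // Correct but costly: split and compare every element on each search.
    static boolean containsParsed(String csv, int id) {
        if (csv.isEmpty()) {
            return false;
        }
        String wanted = String.valueOf(id);
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .anyMatch(wanted::equals);
    }
}
```

For example, with the value "3,112,45", the naive check reports that 12 is present because it matches inside 112. The parsed check correctly says it is not.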

There are actually many disadvantages to using comma-separated lists, not only the one about foreign keys you mention. I wrote an old answer about this here: Is storing a delimited list in a database column really that bad?

Using normalized design makes a database more flexible. That is, you can run many types of queries against the data, and none are especially disadvantaged.

So optimizations like denormalization require you to be sure that you know up front which queries are important for your project, and that you know you won't need any of the types of queries that are made more costly by the denormalized design. Or if you occasionally do need those queries, you don't need them to be efficient.

You expressed concern about making many rows if you store this in a normalized fashion, but most RDBMS products can handle billions of rows.

Searches should not scan a lot of rows if you create the right indexes. Which indexes are the right ones depends on which queries you need to optimize.
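As a rough in-memory analogy (not how an RDBMS actually builds its B-tree indexes), an index is a structure keyed by the searched column that leads straight to the matching rows, so a lookup no longer visits every row. The row shape and the list_items table named in the comment are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListItemIndex {
    // Hypothetical normalized row: one (listId, itemId) pair per row.
    record Row(int listId, int itemId) {}

    // Without an index: every row is examined (a full scan).
    static List<Integer> listsContainingScan(List<Row> rows, int itemId) {
        List<Integer> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.itemId() == itemId) {
                result.add(r.listId());
            }
        }
        return result;
    }

    // Built once, loosely like: CREATE INDEX idx_item ON list_items (item_id)
    static Map<Integer, List<Integer>> buildIndex(List<Row> rows) {
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (Row r : rows) {
            index.computeIfAbsent(r.itemId(), k -> new ArrayList<>()).add(r.listId());
        }
        return index;
    }

    // With the index: jump directly to the matching entries.
    static List<Integer> listsContainingIndexed(Map<Integer, List<Integer>> index, int itemId) {
        return index.getOrDefault(itemId, List.of());
    }
}
```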



What if the ordered lists could be even longer, e.g. millions of elements each, so the lists would be more like a blob of data and an equivalent table could contain trillions of entries?

With respect, if you had to solve data management at that scale, then you wouldn't be asking how to solve it on Stack Overflow. You'd employ some senior software architecture experts to solve it.

They'd tell you basically the same thing: you have to be very specific about what types of queries you need to do against this data before they can choose an optimal architecture to support those specific queries. Because at that scale, you can't afford to do anything but an optimal approach.

If you don't need to solve the problem at the "trillions of elements" scale, then using the relational solution is adequate and offers flexibility, as I described above.

I see plenty of SO questions asking how Facebook manages data at their scale. The answer is almost always: "it doesn't matter what they do, because you will never have to do what they do at their scale."

MySQL - Storing a long list of items

You said items have a property source. But then you want to store the data as if the list of items is a property of each source?

This really sounds like you have a many-to-many relationship between items and sources. This is the typical solution to do that in a relational database:

CREATE TABLE items ( item_id INT PRIMARY KEY );

CREATE TABLE sources ( source_id INT PRIMARY KEY );

CREATE TABLE item_sources (
item_id INT NOT NULL,
source_id INT NOT NULL,
PRIMARY KEY (item_id, source_id),
FOREIGN KEY (item_id) REFERENCES items (item_id),
FOREIGN KEY (source_id) REFERENCES sources (source_id)
);
  • Store one row in item_sources each time you assign an item to a source.
  • You can have as many as you want.
  • It's easy to add another, just by inserting a row.
  • It's easy to remove one, by deleting a row.
  • It's easy to fetch the items for a given source.
  • It's easy to fetch the sources for a given item.
  • It's easy to count them or sort them, or figure out which items are most popular, or do anything else you want.
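The operations in this list can be sketched in memory as a set of (item_id, source_id) pairs, which is all the item_sources table holds. This is a hypothetical Java illustration of the access patterns, with the roughly equivalent SQL noted in comments. It is not MySQL-specific code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ItemSources {
    // One element per item_sources row; the set enforces PRIMARY KEY (item_id, source_id).
    record Link(int itemId, int sourceId) {}

    private final Set<Link> links = new HashSet<>();

    // INSERT INTO item_sources (item_id, source_id) VALUES (?, ?)
    boolean assign(int itemId, int sourceId) {
        return links.add(new Link(itemId, sourceId));
    }

    // DELETE FROM item_sources WHERE item_id = ? AND source_id = ?
    boolean unassign(int itemId, int sourceId) {
        return links.remove(new Link(itemId, sourceId));
    }

    // SELECT item_id FROM item_sources WHERE source_id = ?
    Set<Integer> itemsFor(int sourceId) {
        Set<Integer> result = new TreeSet<>();
        for (Link l : links) {
            if (l.sourceId() == sourceId) result.add(l.itemId());
        }
        return result;
    }

    // SELECT source_id FROM item_sources WHERE item_id = ?
    Set<Integer> sourcesFor(int itemId) {
        Set<Integer> result = new TreeSet<>();
        for (Link l : links) {
            if (l.itemId() == itemId) result.add(l.sourceId());
        }
        return result;
    }

    // SELECT item_id, COUNT(*) FROM item_sources GROUP BY item_id
    Map<Integer, Integer> sourceCountPerItem() {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Link l : links) {
            counts.merge(l.itemId(), 1, Integer::sum);
        }
        return counts;
    }
}
```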

You only asked how to store them; you didn't identify any queries you need to make. Lacking any information about specific queries, you should default to using a normalized table structure like the one above.

If you want to optimize for some specific query, you may choose to denormalize, but to do that you must have the specific query in mind. You can't choose which way to denormalize until you know the query you want to optimize.

Keep in mind that if you optimize for one type of query, this will come at the expense of other types of queries.

Normalization is the best way to remain flexible with respect to the types of queries you can run with reasonable efficiency.

Storing one column of a database table into an ArrayList

In an "ideal universe", you would probably want to use an ORM (like Hibernate), and a framework (like Spring Boot).

Either or both might be "overkill" for your application.

If your goal is to get a "list of tags", then the code you've got looks OK:

  • Connect to the database.
  • Make a query.
  • Copy the resultset into a list of Java objects, one at a time.
  • Close the DB connection when you're done.
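Those four steps can be sketched with plain JDBC and try-with-resources, which closes the result set, statement, and connection even if an exception is thrown. This is a hedged sketch, not the asker's code. The Wallets table and Tag column come from the question, but the jdbcUrl parameter and the method names are assumptions. The small in-memory ResultSet stand-in is hypothetical and exists only so the copying loop can be exercised without a database driver.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class TagLoader {
    // Connect, query, copy, close. try-with-resources performs the closing step.
    // jdbcUrl is an assumption, e.g. "jdbc:sqlite:wallets.db" with a SQLite driver on the classpath.
    static List<String> loadTags(String jdbcUrl) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT Tag FROM Wallets")) {
            return copyColumn(rs, "Tag");
        }
    }

    // The copying step on its own: one column of every row into a Java list.
    static List<String> copyColumn(ResultSet rs, String column) throws SQLException {
        List<String> values = new ArrayList<>();
        while (rs.next()) {
            values.add(rs.getString(column));
        }
        return values;
    }

    // Hypothetical test-only stand-in: a ResultSet over an in-memory list that supports
    // only next(), getString(String) and close(); any other call throws.
    static ResultSet fakeResultSet(String column, List<String> values) {
        int[] cursor = {-1};
        return (ResultSet) Proxy.newProxyInstance(
                TagLoader.class.getClassLoader(),
                new Class<?>[] {ResultSet.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "next":
                            cursor[0]++;
                            return cursor[0] < values.size();
                        case "getString":
                            if (args[0].equals(column)) return values.get(cursor[0]);
                            throw new SQLException("unknown column " + args[0]);
                        case "close":
                            return null;
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }
}
```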

If your goal is to "optimally" find a specific tag (without making another DB query), then perhaps you should use a Java Map or Set instead of an ArrayList.
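Here is a minimal sketch of that difference; the tag and wallet values are hypothetical. ArrayList.contains walks the list element by element. HashSet.contains is a hash lookup and also drops duplicate tags. A Map fits when each tag must lead to something else, such as the wallets that carry it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TagLookup {
    // Tags as they might come back from "SELECT Tag FROM Wallets" (duplicates possible).
    static Set<String> distinctTags(List<String> tagsFromQuery) {
        return new HashSet<>(tagsFromQuery); // hash-based contains(), duplicates removed
    }

    // If each tag must lead to wallets, group wallet names by tag.
    // walletNames and tags are parallel lists here purely for illustration.
    static Map<String, List<String>> walletsByTag(List<String> walletNames, List<String> tags) {
        Map<String, List<String>> byTag = new HashMap<>();
        for (int i = 0; i < walletNames.size(); i++) {
            byTag.computeIfAbsent(tags.get(i), k -> new ArrayList<>()).add(walletNames.get(i));
        }
        return byTag;
    }
}
```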

Should you wish to consider Spring Boot and SQLite, here are a couple of tutorials:

  • https://www.baeldung.com/spring-boot-sqlite
  • http://code-flow-hjbello.blogspot.com/2017/07/using-sqlite-with-jdbc.html

This part of your code seems good:

public ArrayList<Wallet> getAllWallets() throws SQLException {
    resultset = statement.executeQuery(query);
    while (resultset.next()) {
        Wallet w = new Wallet(query, 0, query);
        w.setName(resultset.getString("Name"));
        w.setLocation(resultset.getInt("Location"));
        w.setTag(resultset.getString("Tag"));
        list.add(w);
    }
    ...

So you have one of two choices:

  1. Either forget about getWalletTag(), and just use your wallets to identify tags, or

  2. Use the same query, just save the "tag" column into your array list (instead of anything else).

Option 2:

public List<String> getWalletTag() throws SQLException {
    String query = "SELECT Tag FROM Wallets;";
    ...
    resultset = statement.executeQuery(query);
    // Keep only the "Tag" column of each row
    List<String> tags = new ArrayList<>();
    while (resultset.next()) {
        tags.add(resultset.getString("Tag"));
    ...

... or ...

    ...
    // A Set drops duplicate tags and gives fast contains() lookups
    Set<String> tags = new HashSet<>();
    while (resultset.next()) {
        tags.add(resultset.getString("Tag"));

How to store a list of items on a single column in SQL Server 2008

Generally, storing multiple values in a column is an indication of poor database design. It makes it very difficult to efficiently select rows based on criteria within that single column. Having said that, if you really only ever need to select those values on a per-row basis, then consider using the xml data type, since SQL Server supports XML data natively.

Store List in table columns

Since the relationship you want to achieve is (according to my understanding) many-to-many, you do indeed need a third table mapping hours to performances. Here is a very good example. You will need to set up your third table with two foreign keys to the two tables you want to connect.

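import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.*; // jakarta.persistence.* on Jakarta EE 9+ / Spring Boot 3
import lombok.Data;

@Entity
@Table(name = "performance")
@Data
public class Performance {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(nullable = false, unique = true)
    private Integer performanceId;

    // Owning side: each row of hours_performance links one performance to one hour.
    // Caution: Lombok's @Data generates toString/equals/hashCode over this list,
    // which can recurse between the two sides of the association.
    @ManyToMany
    @JoinTable(
        name = "hours_performance",
        joinColumns = { @JoinColumn(name = "performance_id") },
        inverseJoinColumns = { @JoinColumn(name = "hour_id") }
    )
    private List<Hours> performanceHours = new ArrayList<>();
}

@Entity
@Table(name = "hours")
@Data
public class Hours {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(nullable = false, unique = true)
    private Integer hourId;

    private ZonedDateTime time;

    // Inverse side: mappedBy names the field on Performance that owns the join table.
    @ManyToMany(mappedBy = "performanceHours")
    private List<Performance> performances;
}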

