Memory Effective Way to Read Blob Data in C#/SQL 2005

Memory effective way to read BLOB data in C#/SQL 2005

See this excellent article here or this blog post for a detailed explanation of how to do it.

Basically, you need to use a SqlDataReader and pass CommandBehavior.SequentialAccess when you execute the reader - then you can read (or write) the BLOB from the database in chunks of whatever size works best for you.

Basically something like:

// getEmp is the SqlCommand selecting the BLOB (assumed to be at column ordinal 1);
// outStream is any writable Stream, e.g. a FileStream - both are placeholders here.
int bufferSize = 8192;                     // read the BLOB in 8 KB chunks
byte[] outbyte = new byte[bufferSize];
long retval;

SqlDataReader myReader = getEmp.ExecuteReader(CommandBehavior.SequentialAccess);

while (myReader.Read())
{
    long startIndex = 0;

    // Read the bytes into outbyte[] and retain the number of bytes returned.
    retval = myReader.GetBytes(1, startIndex, outbyte, 0, bufferSize);

    // Continue reading and writing while there are bytes beyond the size of the buffer.
    while (retval == bufferSize)
    {
        // write the buffer to the output, e.g. a file
        outStream.Write(outbyte, 0, (int)retval);

        // Reposition the start index to the end of the last buffer and fill the buffer.
        startIndex += bufferSize;
        retval = myReader.GetBytes(1, startIndex, outbyte, 0, bufferSize);
    }

    // write the last (partial) buffer to the output, e.g. a file
    outStream.Write(outbyte, 0, (int)retval);
}

// Close the reader and the connection.
myReader.Close();

Marc

Most memory efficient way of retrieving blob data from SQL Server

You have a choice when retrieving binary data from SQL Server. Assuming you're using varbinary(max) (the image type is deprecated) as your data type, you can either return all the data at once or return just part of it using a simple SUBSTRING call. If the binary is huge (say 1 GB), returning all of the data will be very memory intensive.

If that's the case, you have the option of taking a more iterative approach. Say it's a 1 GB binary: you can have the program cycle through the data in smaller chunks (the example below uses roughly 10 MB), writing each chunk to disk and discarding the buffer before coming back for the next chunk.

To get the first chunk you'd use:

Declare @ChunkCounter as integer = 0
Declare @Data as varbinary(max)
Declare @ChunkSize as integer = 10000000   /* roughly 10 MB per chunk */
Declare @Bytes as bigint
Select @Bytes = datalength(YourField) from YourTable where ID = YourID
If @Bytes > @ChunkSize
Begin
    /* use SUBSTRING to get the first chunk (SUBSTRING is 1-based, so start at 1) */
    Select @Data = substring(YourField, 1, @ChunkSize), @ChunkCounter = @ChunkCounter + 1
    FROM YourTable
    where ID = YourID
End
Else
Begin ....
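
On the client side, the same idea can be driven from C#: ask the server for one SUBSTRING chunk at a time and append each piece to a file. Here is a minimal sketch of that loop; the connection string, the YourTable/YourField/ID names and the ~10 MB chunk size are placeholders carried over from the T-SQL above, not a fixed API.

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class ChunkedBlobReader
{
    const int ChunkSize = 10000000;   // ~10 MB per round trip, matching @ChunkSize above

    public static void DownloadBlob(string connectionString, int id, string outputPath)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var output = File.Create(outputPath))
        {
            conn.Open();

            // SUBSTRING is 1-based, so the first chunk starts at offset 1.
            var cmd = new SqlCommand(
                @"SELECT SUBSTRING(YourField, @offset, @chunkSize)
                  FROM YourTable WHERE ID = @id", conn);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@chunkSize", ChunkSize);
            var offsetParam = cmd.Parameters.Add("@offset", SqlDbType.BigInt);

            long offset = 1;
            while (true)
            {
                offsetParam.Value = offset;
                object result = cmd.ExecuteScalar();
                if (result == null || result == DBNull.Value)
                    break;                                   // row missing or NULL value

                var chunk = (byte[])result;
                if (chunk.Length == 0)
                    break;                                   // read past the end of the value

                output.Write(chunk, 0, chunk.Length);
                if (chunk.Length < ChunkSize)
                    break;                                   // last, partial chunk

                offset += chunk.Length;
            }
        }
    }
}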

How to read a BLOB from SQL Server using C#?

Read the blob from the database into a byte[] and write this buffer to a file.
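
For a blob that comfortably fits in memory, that can be as simple as the sketch below; the query, table/column names and connection string are placeholders. For very large blobs, prefer the SequentialAccess approach shown above.

using System.Data.SqlClient;
using System.IO;

class SimpleBlobRead
{
    public static void SaveBlobToFile(string connectionString, int id, string path)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            var cmd = new SqlCommand(
                "SELECT YourBlobColumn FROM YourTable WHERE ID = @id", conn);
            cmd.Parameters.AddWithValue("@id", id);

            // The whole blob ends up in one byte[] buffer...
            byte[] data = (byte[])cmd.ExecuteScalar();

            // ...which is then written out to disk in one call.
            File.WriteAllBytes(path, data);
        }
    }
}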

Streaming a large file into a database BLOB field

Using some hints I got from these two pages, I have an answer that works:

http://www.syntaxwarriors.com/2013/stream-varbinary-data-to-and-from-mssql-using-csharp/ (this looks a lot like the serialize SO answer, but there's more here...not sure who copied who!).

How do I copy the contents of one stream to another?

Basically, it uses the same methodology as the answer about serializing Blobs, but instead of using BinaryFormatter (a class I'm not fond of anyhow), it creates a FileStream that takes the path to the file and uses an extension method to copy that stream into the target stream (the BlobStream, as the example names it).

Here's the extension:

using System;
using System.IO;

public static class StreamEx
{
    // Note: .NET 4.0 and later already ship Stream.CopyTo; this extension
    // is only needed on older framework versions.
    public static void CopyTo(this Stream Input, Stream Output)
    {
        var buffer = new Byte[32768];
        Int32 bytesRead;
        while ((bytesRead = Input.Read(buffer, 0, buffer.Length)) > 0)
            Output.Write(buffer, 0, bytesRead);
    }
}

So the trick was to link two streams, copying the data from one to another in chunked fashion, as noted in the comments.
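
Put together, usage looks roughly like the sketch below, assuming the BlobStream class from the next answer, an already-inserted row identified by uploadId, and placeholder file/table/column names and connection string:

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var trn = conn.BeginTransaction())
    {
        // Source: a file on disk. Target: a BlobStream over the varbinary(max) column,
        // wrapped in a BufferedStream so writes go out in 8040-byte chunks.
        using (var source = new FileStream(@"C:\temp\bigfile.bin", FileMode.Open, FileAccess.Read))
        using (var target = new BufferedStream(
                   new BlobStream(conn, trn, "dbo", "Uploads", "FileData", "Id", uploadId), 8040))
        {
            source.CopyTo(target);   // chunked copy via the extension above (32 KB reads)
            target.Flush();          // push the last partial buffer before committing
        }
        trn.Commit();
    }
}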

How do I serialize a large graph of .NET objects into a SQL Server BLOB without creating a large buffer?

There is no built-in ADO.NET functionality to handle this really gracefully for large data. The problem is twofold:

  • there is no API to 'write' into a SQL command or parameter as into a stream. The parameter types that accept a stream (like FileStream) accept the stream so they can READ from it, which does not agree with the serialization semantics of writing into a stream. No matter which way you turn this, you end up with an in-memory copy of the entire serialized object, which is bad.
  • even if the point above were solved (and it cannot be), the TDS protocol and the way SQL Server accepts parameters do not work well with large parameters, as the entire request has to be received in full before it is launched into execution, and this would create additional copies of the object inside SQL Server.

So you really have to approach this from a different angle. Fortunately, there is a fairly easy solution. The trick is to use the highly efficient UPDATE ... .WRITE syntax and pass in the chunks of data one by one, in a series of T-SQL statements. This is the MSDN-recommended way; see Modifying Large-Value (max) Data in ADO.NET. It looks complicated, but is actually trivial to implement and plug into a Stream class.


The BlobStream class

This is the bread and butter of the solution: a Stream-derived class that implements the Write method as a call to the T-SQL BLOB .WRITE syntax. It is straightforward; the only interesting part is that it has to keep track of the first update, because the UPDATE ... SET blob.WRITE(...) syntax would fail on a NULL field:

class BlobStream : Stream
{
    private SqlCommand cmdAppendChunk;
    private SqlCommand cmdFirstChunk;
    private SqlConnection connection;
    private SqlTransaction transaction;

    private SqlParameter paramChunk;
    private SqlParameter paramLength;   // (unused in this excerpt)

    private long offset;

    public BlobStream(
        SqlConnection connection,
        SqlTransaction transaction,
        string schemaName,
        string tableName,
        string blobColumn,
        string keyColumn,
        object keyValue)
    {
        this.transaction = transaction;
        this.connection = connection;

        // First chunk: a plain SET, because .WRITE(...) fails on a NULL field.
        cmdFirstChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}] = @firstChunk
    WHERE [{3}] = @key"
            , schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdFirstChunk.Parameters.AddWithValue("@key", keyValue);

        // Subsequent chunks: efficient append via UPDATE ... SET col.WRITE(@chunk, NULL, NULL).
        cmdAppendChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}].WRITE(@chunk, NULL, NULL)
    WHERE [{3}] = @key"
            , schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdAppendChunk.Parameters.AddWithValue("@key", keyValue);
        paramChunk = new SqlParameter("@chunk", SqlDbType.VarBinary, -1);
        cmdAppendChunk.Parameters.Add(paramChunk);
    }

    public override void Write(byte[] buffer, int index, int count)
    {
        byte[] bytesToWrite = buffer;
        if (index != 0 || count != buffer.Length)
        {
            // Only copy when the caller hands us a partial buffer.
            bytesToWrite = new MemoryStream(buffer, index, count).ToArray();
        }
        if (offset == 0)
        {
            cmdFirstChunk.Parameters.AddWithValue("@firstChunk", bytesToWrite);
            cmdFirstChunk.ExecuteNonQuery();
            offset = count;
        }
        else
        {
            paramChunk.Value = bytesToWrite;
            cmdAppendChunk.ExecuteNonQuery();
            offset += count;
        }
    }

    // Rest of the abstract Stream implementation
}

Using the BlobStream

To use this newly created blob stream class, you plug it into a BufferedStream. The class has a deliberately minimal design that handles only writing a stream into a column of a table. I'll reuse a table from another example:

CREATE TABLE [dbo].[Uploads](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [FileName] [varchar](256) NULL,
    [ContentType] [varchar](256) NULL,
    [FileData] [varbinary](max) NULL)

I'll add a dummy object to be serialized:

[Serializable]
class HugeSerialized
{
    public byte[] theBigArray { get; set; }
}

Finally, the actual serialization. We'll first insert a new record into the Uploads table, then create a BlobStream on the newly inserted Id and call the serialization straight into this stream:

using (SqlConnection conn = new SqlConnection(Settings.Default.connString))
{
    conn.Open();
    using (SqlTransaction trn = conn.BeginTransaction())
    {
        SqlCommand cmdInsert = new SqlCommand(
            @"INSERT INTO dbo.Uploads (FileName, ContentType)
              VALUES (@fileName, @contentType);
              SET @id = SCOPE_IDENTITY();", conn, trn);
        cmdInsert.Parameters.AddWithValue("@fileName", "Demo");
        cmdInsert.Parameters.AddWithValue("@contentType", "application/octet-stream");
        SqlParameter paramId = new SqlParameter("@id", SqlDbType.Int);
        paramId.Direction = ParameterDirection.Output;
        cmdInsert.Parameters.Add(paramId);
        cmdInsert.ExecuteNonQuery();

        BlobStream blob = new BlobStream(
            conn, trn, "dbo", "Uploads", "FileData", "Id", paramId.Value);
        BufferedStream bufferedBlob = new BufferedStream(blob, 8040);

        HugeSerialized big = new HugeSerialized { theBigArray = new byte[1024 * 1024] };
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(bufferedBlob, big);

        // Flush the BufferedStream so the last partial chunk reaches the database
        // before the transaction commits.
        bufferedBlob.Flush();

        trn.Commit();
    }
}

If you monitor the execution of this simple sample you'll see that a large serialization buffer is never created. The sample allocates a 1 MB array, but only so there is something to serialize for demo purposes. The code serializes in a buffered manner, chunk by chunk, using the SQL Server recommended BLOB update size of 8040 bytes at a time.

How can I stream a BLOB to VARBINARY(MAX) on INSERT

You should be using the RBS (Remote Blob Store) interface of SQL Server for working with blobs.

Read an Azure Blob via C# and insert into on-premise SQL Server

You could skip the

JSON file --> List of class objects --> Entity Framework

steps and instead parse the JSON in a stored procedure, then insert it into a table.

CREATE TABLE [dbo].[JsonTest] (
    [Id]          INT           NOT NULL,
    [firstName]   NVARCHAR (50) NULL,
    [lastName]    NVARCHAR (50) NULL,
    [age]         INT           NULL,
    [dateOfBirth] DATETIME2 (7) NULL
);

DECLARE @json NVARCHAR(MAX);
SET @json = N'[
    {"id": 2, "info": {"name": "John", "surname": "Smith"}, "age": 25},
    {"id": 5, "info": {"name": "Jane", "surname": "Smith"}, "dob": "2005-11-04T12:00:00"}
]';

INSERT INTO JsonTest
SELECT *
FROM OPENJSON(@json)
WITH (
    id INT 'strict $.id',
    firstName NVARCHAR(50) '$.info.name',
    lastName NVARCHAR(50) '$.info.surname',
    age INT,
    dateOfBirth DATETIME2 '$.dob'
);

SELECT * FROM JsonTest

https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15#convert-json-collections-to-a-rowset

For something like this I would also suggest using Dapper.
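
On the C# side this just means handing the JSON text to SQL Server as a single NVARCHAR(MAX) parameter. A rough sketch follows; the dbo.ImportJson stored procedure is hypothetical and would simply wrap an OPENJSON insert like the one above.

using System.Data;
using System.Data.SqlClient;

class JsonImporter
{
    public static void Import(string connectionString, string json)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // dbo.ImportJson is a hypothetical procedure taking @json NVARCHAR(MAX)
            // and running an OPENJSON insert like the snippet above.
            var cmd = new SqlCommand("dbo.ImportJson", conn)
            {
                CommandType = CommandType.StoredProcedure
            };
            cmd.Parameters.Add("@json", SqlDbType.NVarChar, -1).Value = json;
            cmd.ExecuteNonQuery();
        }
    }
}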

Out of Memory when reading a string from SqlDataReader

You should try to read the data sequentially by specifying the command behavior when you execute the reader. Per the documentation: "Use SequentialAccess to retrieve large values and binary data. Otherwise, an OutOfMemoryException might occur and the connection will be closed."

While sequential access is typically used on large binary data, based on the MSDN documentation you can use it to read large amounts of character data as well.

When accessing the data in the BLOB field, use the GetBytes or GetChars typed accessors of the DataReader, which fill an array with data. You can also use GetString for character data; however, to conserve system resources you might not want to load an entire BLOB value into a single string variable. You can instead specify a specific buffer size of data to be returned, and a starting location for the first byte or character to be read from the returned data. GetBytes and GetChars will return a long value, which represents the number of bytes or characters returned. If you pass a null array to GetBytes or GetChars, the long value returned will be the total number of bytes or characters in the BLOB. You can optionally specify an index in the array as a starting position for the data being read.

This MSDN example shows how to perform sequential access. I believe you can use the GetChars method to read the textual data.
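
A minimal sketch of that approach is below; the query, the column ordinal (0), the 4 KB buffer size and the output path are placeholder assumptions.

using System.Data;
using System.Data.SqlClient;
using System.IO;

class LargeTextReader
{
    public static void DumpLargeText(string connectionString, string outputPath)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            var cmd = new SqlCommand("SELECT BigTextColumn FROM YourTable WHERE ID = 1", conn);

            using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            using (var writer = new StreamWriter(outputPath))
            {
                var buffer = new char[4096];
                while (reader.Read())
                {
                    long dataIndex = 0;
                    long charsRead;
                    // Keep pulling 4 KB of characters until the column is exhausted,
                    // so the full string never has to fit in memory at once.
                    while ((charsRead = reader.GetChars(0, dataIndex, buffer, 0, buffer.Length)) > 0)
                    {
                        writer.Write(buffer, 0, (int)charsRead);
                        dataIndex += charsRead;
                    }
                }
            }
        }
    }
}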


