Django: Using Custom Raw SQL Inserts with Executemany and MySQL

Here's a solution that actually uses executemany()!

Basically, the idea in the example here will work.

But note that in Django, you need to use the %s placeholder rather than the question mark.

Also, you will want to manage your transactions. I won't get into that in depth here, as there is plenty of documentation available, but a brief sketch follows the example below.

from django.db import connection, transaction

cursor = connection.cursor()

query = '''INSERT INTO table_name
           (var1, var2, var3)
           VALUES (%s, %s, %s)'''

# build_query_list() represents some function that populates the list
# with multiple records as tuples in the format (value1, value2, value3)
query_list = build_query_list()

cursor.executemany(query, query_list)

# commit explicitly (pre-1.6 transaction API)
transaction.commit()
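On Django 1.6 and later the explicit commit() call above was replaced by transaction.atomic(). A minimal sketch of the same insert under the newer API, reusing the table_name and query_list placeholders from above:

from django.db import connection, transaction

# transaction.atomic() (Django 1.6+) commits on success and rolls back on error
with transaction.atomic():
    cursor = connection.cursor()
    cursor.executemany(
        "INSERT INTO table_name (var1, var2, var3) VALUES (%s, %s, %s)",
        query_list,
    )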

Django: Bulk operations

Use cursor.executemany(query, param_list) if you need raw SQL.

param_list = [("something_1", 1), ("something_2", 2), ...]
# or tuples of numbers, like [(some_number_1, 1), ...]. For string parameters,
# the quotes around the substituted "%s" and all the necessary escaping are
# added automatically.

cursor.executemany(
    """UPDATE database.table
       SET field_to_modify = %s
       WHERE primary_key_field = %s""",
    param_list,
)

It has many advantages:

  • The query string is much shorter than one big combined query, so it can be analyzed and optimized quickly without the database planner consuming unnecessary resources. (If you interpolate the parameters into the SQL manually, you get many different SQL commands that must each be analyzed individually.)
  • Interpolating values into SQL strings manually is bad practice because it can be a security issue (SQL injection) if you don't correctly escape unexpected apostrophes and backslashes in the user input.

It is an undocumented method, though both execute(self, sql, params=()) and executemany(self, sql, param_list) have been supported on cursor objects by all native db backends (mysql, postgresql_psycopg2, sqlite3, oracle) for a long time, from Django 0.96 through the current 1.5 beta. A useful similar answer is https://stackoverflow.com/a/6101536/448474 .

The executemany method has had two issues related to exception handling fixed in previous years. So, verify for your Django version that you get helpful error messages if you intentionally cause a database exception (too many %s placeholders, too few, etc.); a sketch of such a check follows. A few minutes of initial tinkering/testing are still faster than many hours of waiting on slow methods.
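A minimal sketch of such a check, deliberately passing too few parameters per row so the backend has to raise (the table and column names here are illustrative):

from django.db import connection

cursor = connection.cursor()
try:
    # Three %s placeholders, but only two values per tuple: this should
    # fail with a clear parameter-count error, not silently.
    cursor.executemany(
        "INSERT INTO table_name (var1, var2, var3) VALUES (%s, %s, %s)",
        [("a", "b"), ("c", "d")],
    )
except Exception as exc:
    print("executemany raised: %r" % (exc,))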

Accelerate bulk insert using Django's ORM?

Django 1.4 provides a bulk_create() method on the QuerySet object; see the links below and the usage sketch after them:

  • https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create
  • https://docs.djangoproject.com/en/dev/releases/1.4/
  • https://code.djangoproject.com/ticket/7596
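
A minimal usage sketch, assuming a hypothetical Entry model with a headline field. Note that bulk_create() does not call the model's save() method and does not send the pre_save/post_save signals:

from myapp.models import Entry  # hypothetical model with a `headline` CharField

# One multi-row INSERT instead of one query per object.
Entry.objects.bulk_create([
    Entry(headline='First'),
    Entry(headline='Second'),
    Entry(headline='Third'),
])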

MySQL INSERT ... ON DUPLICATE KEY UPDATE with django 1.4 for bulk insert

So I created a custom manager. Here is the manager:

from django.db import models


class BulkInsertManager(models.Manager):
    def _bulk_insert_or_update(self, create_fields, update_fields, values):
        from django.db import connection, transaction

        cursor = connection.cursor()
        db_table = self.model._meta.db_table

        values_sql = []
        values_data = []

        # Build one "(%s,%s,...)" placeholder group per row and flatten
        # the parameters into a single list.
        for value_lists in values:
            values_sql.append("(%s)" % ",".join(["%s"] * len(value_lists)))
            values_data.extend(value_lists)

        base_sql = "INSERT INTO %s (%s) VALUES" % (db_table, ",".join(create_fields))

        on_duplicates = []
        for field in update_fields:
            on_duplicates.append(field + "=VALUES(" + field + ")")

        sql = "%s %s ON DUPLICATE KEY UPDATE %s" % (
            base_sql, ", ".join(values_sql), ",".join(on_duplicates))

        # The statement already contains every row, so one execute() call
        # is enough; executemany(sql, [values_data]) would behave identically.
        cursor.execute(sql, values_data)
        transaction.commit_unless_managed()

And a sample model:

class User_Friend(models.Model):
    objects = BulkInsertManager()  # assign a custom manager to handle bulk insert

    # 'id' can only shadow the automatic primary key if primary_key=True is set
    id = models.CharField(max_length=255, primary_key=True)
    user = models.ForeignKey(User, null=False, blank=False)
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    city = models.CharField(max_length=50, null=True, blank=True)
    province = models.CharField(max_length=50, null=True, blank=True)
    country = models.CharField(max_length=30, null=True, blank=True)

And sample implementation:

def save_user_friends(user, friends):
    create_fields = ['id', 'user_id', 'first_name', 'last_name', 'city', 'province', 'country']
    update_fields = ['first_name', 'last_name', 'city', 'province', 'country']

    user_friends = []
    for friend in friends:
        user_friends.append([
            str(friend['id']),  # the 'id' column holds the friend's id
            str(user.id),       # the 'user_id' column holds the FK to User
            friend['first_name'],
            friend['last_name'],
            friend['city'],
            friend['province'],
            friend['country'],
        ])

    User_Friend.objects._bulk_insert_or_update(create_fields, update_fields, user_friends)

Here is the gist.
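For reference, a sketch of what the manager generates for two friends. The table name myapp_user_friend is hypothetical; Django derives it from the app label by default:

# With two rows, _bulk_insert_or_update() builds roughly this statement
# (shown with the placeholders still in place):
#
#   INSERT INTO myapp_user_friend
#       (id,user_id,first_name,last_name,city,province,country)
#   VALUES (%s,%s,%s,%s,%s,%s,%s), (%s,%s,%s,%s,%s,%s,%s)
#   ON DUPLICATE KEY UPDATE
#       first_name=VALUES(first_name), last_name=VALUES(last_name),
#       city=VALUES(city), province=VALUES(province), country=VALUES(country)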

Django cursor.executemany: what's the preferred batch size for each executemany?

The MySQL documentation helps elaborate on that:

In most cases, the executemany() method iterates through the sequence of parameters, each time passing the current parameters to the execute() method.

An optimization is applied for inserts: The data values given by the parameter sequences are batched using multiple-row syntax.
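In other words, for INSERT statements the driver rewrites the whole batch into one multi-row statement. A sketch of that transformation (table t and the parameter list are illustrative):

cursor.executemany(
    "INSERT INTO t (a, b) VALUES (%s, %s)",
    [(1, 'x'), (2, 'y'), (3, 'z')],
)
# is sent to the server roughly as the single statement:
#   INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z')
# while non-INSERT statements fall back to one execute() per parameter tuple.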

Why is executemany slow in Python MySQLdb?

Try lowercasing the word 'values' in your query; this appears to be a bug/regression in MySQL-python 1.2.3.

MySQL-python's implementation of executemany() matches the VALUES clause with a regular expression and then just clones the list of values for each row of data, so you end up executing exactly the same query as with your first approach.

Unfortunately the regular expression lost its case-insensitive flag in that release (subsequently fixed in trunk r622 but never backported to the 1.2 branch) so it degrades to iterating over the data and firing off a query per row.
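So on the affected 1.2.3 release, the workaround is simply this (table t and the rows list are illustrative):

# Affected: uppercase VALUES misses the now case-sensitive regex in 1.2.3,
# so each row is sent as a separate query.
cursor.executemany("INSERT INTO t (a, b) VALUES (%s, %s)", rows)

# Workaround: lowercase 'values' re-enables the multi-row rewrite.
cursor.executemany("INSERT INTO t (a, b) values (%s, %s)", rows)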


