How to avoid inserting duplicate entries when adding values via a sqlalchemy relationship?
The SQLAlchemy wiki has a collection of examples, one of which is how you might check uniqueness of instances.
The examples are a bit convoluted, though. Basically, create a classmethod get_unique as an alternate constructor, which first checks a session cache, then queries for an existing instance, and finally creates a new instance if neither is found. Then call Language.get_unique(id, name) instead of Language(id, name).
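A minimal sketch of that idea follows, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. It is not the wiki recipe itself: the Language columns, the session argument to get_unique, and the "unique_cache" key stored in session.info are all assumptions made for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Language(Base):
    __tablename__ = "language"

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)

    @classmethod
    def get_unique(cls, session, name):
        # 1. Check a per-session cache (kept in session.info; the key name
        #    "unique_cache" is hypothetical).
        cache = session.info.setdefault("unique_cache", {})
        key = (cls, name)
        if key in cache:
            return cache[key]
        # 2. Query for an existing row without flushing pending objects.
        with session.no_autoflush:
            obj = session.query(cls).filter_by(name=name).first()
        # 3. Nothing found, so create a new instance and add it.
        if obj is None:
            obj = cls(name=name)
            session.add(obj)
        cache[key] = obj
        return obj


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a = Language.get_unique(session, "Python")
    b = Language.get_unique(session, "Python")
    session.commit()
    count = session.query(Language).count()
```

Both calls return the same object, so only one row is inserted. Note that this simple cache is not cleared on rollback, so a real implementation would need to handle that case.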
I've written a more detailed answer in response to OP's bounty on another question.
Preventing duplicate entries with SQLAlchemy in a preexisting SQLite table
The code snippet below works for me with Python 2.7, SQLAlchemy 1.0.9, and SQLite 3.15.2.
from sqlalchemy import create_engine, MetaData, Column, Integer, Table, Text
from sqlalchemy.exc import IntegrityError


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        # The unique constraint on "case" is what rejects duplicates.
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),)
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item):
        try:
            ins_query = self.stack_items.insert().values(case=item['case'])
            self.connection.execute(ins_query)
        except IntegrityError:
            # Raised when a row with the same "case" value already exists.
            print('THIS IS A DUP')
        return item


if __name__ == '__main__':
    d = DynamicSQLlitePipeline("pipeline")
    item = {
        'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'
    }
    print(d.process_item(item))
And the output for the second run would be:
THIS IS A DUP
{'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'}
I don't see much difference from your code's logic. The only difference might be the library versions.
SQLAlchemy: avoid duplicates in session() before committing
It'd seem that you might be better off "deduplicating" in your application:
seen = set()

# Reversed so that the last row wins.
for row in reversed(database):
    c_hash = row['c_hash']
    if c_hash not in seen:
        session.merge(Mytable(hash=c_hash,
                              date=row['date'],
                              text=row['text']))
        seen.add(c_hash)
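To see the "last row wins" behaviour in isolation, here is a small sketch of the same deduplication step without a database. The helper name dedupe_last_wins and the sample rows are made up for illustration; only the reversed-iteration-plus-set idea comes from the snippet above.

```python
def dedupe_last_wins(rows, key="c_hash"):
    """Return one row per key value, keeping the last occurrence of each."""
    seen = set()
    kept = []
    # Walk backwards so the first row we meet for a key is the latest one.
    for row in reversed(rows):
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    # Restore the original relative order of the surviving rows.
    kept.reverse()
    return kept


# Hypothetical input: two rows share the hash "a1".
database = [
    {"c_hash": "a1", "date": "2020-01-01", "text": "first version"},
    {"c_hash": "b2", "date": "2020-01-02", "text": "unrelated"},
    {"c_hash": "a1", "date": "2020-01-03", "text": "second version"},
]
unique_rows = dedupe_last_wins(database)
```

Here unique_rows holds the "unrelated" row and the later "second version" row; the earlier "a1" row is dropped before anything reaches the session.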
In theory you could let SQLAlchemy handle the deduplication as well:
for row in database:
    session.merge(Mytable(hash=row['c_hash'],
                          date=row['date'],
                          text=row['text']))
    session.flush()
The trick is to flush in between, so that later merges will consult the DB and find the existing row. This performs more queries than the other solution, though.
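A runnable sketch of the merge-and-flush variant, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and that hash is the primary key of Mytable (merge matches rows by primary key). The column types and sample rows are assumptions.

```python
from sqlalchemy import Column, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Mytable(Base):
    __tablename__ = "mytable"

    # Assumed to be the primary key, since merge() looks rows up by it.
    hash = Column(String, primary_key=True)
    date = Column(String)
    text = Column(Text)


# Hypothetical input: two rows share the hash "a1".
database = [
    {"c_hash": "a1", "date": "2020-01-01", "text": "first version"},
    {"c_hash": "b2", "date": "2020-01-02", "text": "unrelated"},
    {"c_hash": "a1", "date": "2020-01-03", "text": "second version"},
]

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    for row in database:
        session.merge(Mytable(hash=row["c_hash"],
                              date=row["date"],
                              text=row["text"]))
        # Flush so the next merge sees this row instead of adding a duplicate.
        session.flush()
    session.commit()
    stored = {r.hash: r.text for r in session.query(Mytable)}
```

After the loop, stored contains one entry per hash, and the entry for "a1" carries the text of the later row.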
Stop Inserting into Table when Duplicate Value Detected Flask SQLAlchemy
Your model should have a unique index on whatever criterion identifies a duplicate. Columns are not unique by default, which you seem to assume (unique=False in a column and the comments). Either replace the auto-incrementing surrogate key with a "natural" key, such as the id provided by Twitter, or make the text column tweet unique.
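As a rough illustration of the uniqueness requirement, the sketch below uses plain SQLAlchemy (1.4 or later) with an in-memory SQLite database rather than Flask-SQLAlchemy. The tweet_id column, the column sizes, and the store_tweet helper are assumptions, not the asker's actual model.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TrainingTweets(Base):
    __tablename__ = "training_tweets"

    # Surrogate key kept for convenience; it does not prevent duplicates.
    id = Column(Integer, primary_key=True)
    # Natural key: the id string supplied by Twitter (hypothetical column).
    tweet_id = Column(String(32), unique=True, nullable=False)
    tweet_username = Column(String(64), nullable=False)
    tweet = Column(Text, nullable=False)


def store_tweet(session, tweet_id, username, text):
    """Return True if stored, False if the unique constraint rejected it."""
    session.add(TrainingTweets(tweet_id=tweet_id,
                               tweet_username=username,
                               tweet=text))
    try:
        session.commit()
        return True
    except IntegrityError:
        # Discard the failed insert so the session stays usable.
        session.rollback()
        return False


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    first = store_tweet(session, "1001", "alice", "hello")
    second = store_tweet(session, "1001", "alice", "hello")
    stored = session.query(TrainingTweets).count()
```

The second call hits the unique constraint on tweet_id, so it returns False and only one row ends up in the table.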
Once you've fixed the uniqueness requirements, if you wish to ignore IntegrityErrors and keep going, wrap your inserts in transactions (or use the implicit behaviour) and commit or roll back accordingly:
import json

from sqlalchemy.exc import IntegrityError

# StreamListener, db and TrainingTweets come from the question's own code.


class listener(StreamListener):

    def on_data(self, data):
        all_data = json.loads(data)
        tweet_id = all_data["id_str"]
        tweet_text = all_data["text"]
        tweet_username = all_data["user"]["screen_name"]
        label = 1
        ttweets = TrainingTweets(label_id=label,
                                 tweet_username=tweet_username,
                                 tweet=tweet_text)

        try:
            db.session.add(ttweets)
            db.session.commit()
            print((tweet_username, tweet_text))
            # Increment the counter here, as we've truly successfully
            # stored a tweet.
            self.n += 1

        except IntegrityError:
            db.session.rollback()
            # Don't stop the stream, just ignore the duplicate.
            print("Duplicate entry detected!")

        if self.n >= self.m:
            print("Successfully stored", self.m, "tweets into database")
            # Cross the... stop the stream.
            return False

        else:
            # Keep the stream going.
            return True