python sqlalchemy distinct column values

I have 6 tables in my SQLite database, each table with 6 columns( Date, user, NormalA, specialA, contact, remarks ) and 1000+ rows.

How can I use sqlalchemy to sort through the Date column to look for duplicate dates, and delete that row?


Assuming this is your model:

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)
    # do not really care of columns other than `id` and `date`
    # important here is the fact that `id` is a PK

following are two ways to delete you data:

  • Find duplicates, mark them for deletion and commit the transaction
  • Create a single SQL query which will perform deletion on the database directly.
  • For both of them a helper sub-query will be used:

    # helper subquery: find first row (by primary key) for each unique date
    subq = (
        session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
        .group_by(MyTable.date)
    ) .subquery('date_min_id')
    

    Option-1: Find duplicates, mark them for deletion and commit the transaction

    # query to find all duplicates
    q_duplicates = (
        session
        .query(MyTable)
        .join(subq, and_(
            MyTable.date == subq.c.date,
            MyTable.id != subq.c.min_id)
        )
    )
    
    for x in q_duplicates:
        print("Will delete %s" % x)
        session.delete(x)
    session.commit()
    

    Option-2: Create a single SQL query which will perform deletion on the database directly

    sq = (
        session
        .query(MyTable.id)
        .join(subq, and_(
            MyTable.date == subq.c.date,
            MyTable.id != subq.c.min_id)
        )
    ).subquery("subq")
    
    dq = (
        session
        .query(MyTable)
        .filter(MyTable.id.in_(sq))
    ).delete(synchronize_session=False)
    

    Inspired by the Find duplicate values in SQL table this might help you to select duplicate dates:

    query = session.query(
        MyTable
    ).
        having(func.count(MyTable.date) > 1).
        group_by(MyTable.date).all()
    

    If you only want to show unique dates; distinct on is what you might need


    While I like the whole object oriented approache with SQLAlchemy, sometimes I find it easier to directly use some SQL. And since the records don't have a key, we need the row number ( _ROWID_ ) to delete the targeted records and I don't think the API provides it.

    So first we connect to the database:

    from sqlalchemy import create_engine
    db = create_engine(r'sqlite:///C:tempexample.db')
    eng = db.engine
    

    Then to list all the records:

    for row in eng.execute("SELECT * FROM TableA;") :
      print row
    

    And to display all the duplicated records where the dates are identical:

    for row in eng.execute("""
      SELECT * FROM {table}
      WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
      ORDER BY {field};
      """.format(table="TableA", field="Date")) :
      print row
    

    Now that we identified all the duplicates, they probably need to be fixed if the other fields are different:

    eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12' ;");
    eng.execute("UPDATE TableA SET NormalA=4,  specialA=8  WHERE Date = '2015-18-12' ;");
    

    And finnally to keep the first inserted record and delete the most recent duplicated records :

    print eng.execute("""
      DELETE FROM {table} 
      WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
      """.format(table="TableA", field="Date")).rowcount
    

    Or to keep the last inserted record and delete the other duplicated records :

    print eng.execute("""
      DELETE FROM {table} 
      WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
      """.format(table="TableA", field="Date")).rowcount
    
    链接地址: http://www.djcxy.com/p/90794.html

    上一篇: Python中具有正系数的线性回归

    下一篇: python sqlalchemy不同的列值