When updating a large number of records in an OLTP database, such as MySQL, you have to be mindful about locking the records. If those records are locked, they will not be editable (update or delete) by other transactions on your database. One common approach used to update a large number of records is to run multiple smaller updates in batches. This way, only the records that are being updated at any point are locked.

In this post we look at two questions: how does the update command lock records, and how do we update millions of records without significantly impacting user experience?

There are other approaches, such as swapping tables or running a standard update, depending on your transaction isolation level, etc. The usage of these approaches depends on your use case. For our use case, let's assume we are updating a user table which, if locked for a significant amount of time (say > 10s), can significantly impact our user experience, which is not ideal.

We are going to use Docker to run a MySQL container and the Python faker library to generate fake data. Let's set up a project folder and generate some fake user data. The gen_fake.py script generates fake data of the format id,name,is_active flag,state,country per row. Its key pieces are a gen_user_data function and an argument parser for the number of records and the random seed:

from argparse import ArgumentParser

def gen_user_data(file_name: str, num_records: int, seed: int = 1) -> None:
    ...  # generate num_records fake rows and write them, newline-separated, to file_name

parser = ArgumentParser(description="Generate some fake data")
parser.add_argument("-num-records", type=int, default=100, help="Num of records to generate")
parser.add_argument("-seed", type=int, default=0, help="seed")

The parsed arguments (number of records, seed and the output file name) are passed to gen_user_data. Grant permissions to the fake data generator and generate 10 million rows.
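Before loading the generated file, we need a user table to load it into. The post does not show the table definition, so the following is a minimal sketch; the column names and types are assumptions based on the id,name,is_active flag,state,country format described above (the state column is referred to as st later in the post):

-- Hypothetical table definition; columns and types are assumptions
-- derived from the CSV format id,name,is_active,state,country.
CREATE TABLE user (
    id        INT PRIMARY KEY,
    name      VARCHAR(100),
    is_active TINYINT(1),
    st        VARCHAR(2),
    country   VARCHAR(3)
);

Note that LOAD DATA LOCAL INFILE (used below) requires local_infile to be enabled on the server and local data loading to be enabled on the client session.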
Load the generated CSV into the user table:

LOAD DATA LOCAL INFILE '/var/lib/data/user_data_fin.csv' INTO TABLE user FIELDS TERMINATED BY ',';

Let's say we work for an e-commerce website. A bug made it to production and now we have the st (state) field set incorrectly for users with ids between 3 million (3000000) and 8 million (8000000). We have to update 5 million records out of a total of 10 million records to have the st value set to NY. A constraint we are working with is to keep downtime for users as small as possible.

If you are a user and your record is locked, you will not be able to modify your data. Let's see how user experience may be affected by a large update holding a lock on their records. In your SQL terminal, run this update command:

UPDATE user SET st = 'NY' WHERE id BETWEEN 3000000 AND 8000000;

While this statement runs, any other transaction that tries to modify one of the locked rows has to wait, and it errors out if it waits longer than the lock wait timeout (50 seconds by default), which you can check with:

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

What if instead of updating 5 million records in one update statement, we only updated a few user records at a time? This way, we can keep the number of records being locked at any given time small and reduce the wait time for other updating transactions. Let's run updates in batches of 50,000 records and see how we can update our user table in chunks. In our user table, we have a primary column which is a monotonically increasing id. We can use this to chunk our updates into batches of 50,000 user records. Let's assume we want to set the st value to NJ this time.

In order to run updates in a while loop, we need to create a stored procedure to encapsulate this logic. We create a sliding window defined by a starting id of batch_start_id and an ending id of batch_end_id, which gets moved up by batch_size for each run:

DELIMITER $$
CREATE PROCEDURE BatchUpdate(IN start_id INT, IN end_id INT, IN batch_size INT)
BEGIN
    DECLARE batch_start_id INT DEFAULT start_id;
    DECLARE batch_end_id INT DEFAULT start_id + batch_size;
    WHILE batch_end_id <= end_id DO
        SELECT CONCAT('UPDATING FROM ', batch_start_id, ' TO: ', batch_end_id) AS log;
        UPDATE user SET st = 'NJ' WHERE id >= batch_start_id AND id < batch_end_id;
        -- slide the window up by batch_size
        SET batch_start_id = batch_end_id;
        SET batch_end_id = batch_end_id + batch_size;
    END WHILE;
END$$
DELIMITER ;

Note that we use DELIMITER $$ to set the statement delimiter to $$. This allows us to use the default MySQL delimiter (;) within the stored procedure definition. You can see that after the END$$ command we set the delimiter back to ;.
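With the procedure in place, the batched update for our scenario can be kicked off with a single call. This is a usage sketch that assumes the (start_id, end_id, batch_size) parameter order from the definition above:

CALL BatchUpdate(3000000, 8000000, 50000);

With autocommit enabled, each UPDATE inside the loop commits on its own, so row locks are only held for the duration of a single 50,000-record batch rather than for the entire 5 million record update.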