Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upGitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
Simplify the chunk range condition generate logic, reduce the risk of using wrong execute plan(enhancement) #855
Comments
|
We want make a pr to gh-ost |
|
Be aware that the operators like '>=' also support tuples, I generally prefer to use ranges with a closed start and an open end, so I (virtually always) pair '>=' with '<'. |
@druud unfortunately I believe this still applies: http://code.openark.org/blog/mysql/mysql-not-being-able-to-utilize-a-compound-index, and this is the reason for exploding the range comparison query. |
Right, so instead of just expanding the syntactic sugar and then let the optimizer work that out as usual, there has been an optimization effort that derailed. Can't say I haven't seen that happen before, and I might have once or twice caused a few cases myself too. Now I want more than ever a user-configurable very complete server-side SQL-rewrite layer, that can also use table structures and column data types etc. even more than ever before. (I recently hit a case where a JOIN unnecessarily derailed because the data types of the ON-columns weren't exactly the same.) |
Problem Description
When we do a ddl on a table which has a composed primary key, active session arise quickly and the database hang。 But when we do the same ddl operation using pt-online-schema-change, it worked。 It is very confusing。 So we want to do a deep analysis to get the reason。
The reason We got
when we compare the chunk sql of pt-osc and gh-ost . The Range condition has some different like below:
The table's primary key compose with three columns : shard_key, bt_id, object
for the same chunk
start : (973,50,.trash/9d16-4b95-bb50-xxxxxxxxxx155 )
end : (973,50,.trash/9d16-4b95-bb50-xxxxxxxxxx2ss )
the range condition from gh-ost
The range condition from pt-online-schema-change
We Can see that the condition of gh-ost is more complex than pt-osc. when we check the sql execution plan, they are all using primary key .
pt-osc'sql scan only 499 rows , but gh-ost scan 20 millions rows and it take 30 seconds to accomplish .
We think the query condition is more complex, the database optimizer has more risk to using a wrong plan .
Optimization
We change the chunk query condition generate logic to make gh-ost generate a chunk sql query like pt-osc does . Making the condition more simple and readable , We also do more test about data consistency, It worked well now . So we want give this pr to gh-ost .