Siwawes's Blog: A laid-back MySQL query performance experiment (with actual substance)

The backstory: I have to build an app, and the app has to read and write data in a database. The write side is no big deal, but the read side is another matter. The goal is to make the query as efficient as possible (in terms of time taken) when reading n rows under a condition. I didn't know the best way to do that, so I ran an experiment and wrote it up in this post, in case the data is useful to someone.

The steps are:
Create a test table t with 2 fields, a and b, both integers for now.
Then fill it with data by inserting random integers from 0 to 9, about 100,000,000 records (one hundred million).
Then query the data back with two simple queries:

#A2

select a,b,a*b as c
from t
where a between 1 and 3 and b between 1 and 3
order by c

#B2

select a,b,a*b as c
from
(select * from t where a between 1 and 3 and b between 1 and 3) as t
order by c

In principle these 2 queries are no different, because the database finds the matching records first and only then evaluates the select. But that is the theory. In practice we are running MySQL, and something in its internals might make the performance differ significantly.
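One way to check whether MySQL really treats the two forms the same is to ask for the execution plan with EXPLAIN. This is a minimal sketch using the #A2 and #B2 queries above; no output is shown because it depends on the server. As far as I know, MySQL 5.5 materializes a sub-query in the FROM clause into a temporary table, and may do so even while running EXPLAIN, so the second statement can be slow on a table this size.

```sql
-- Plan for the plain form (#A2).
EXPLAIN
select a,b,a*b as c
from t
where a between 1 and 3 and b between 1 and 3
order by c;

-- Plan for the sub-query form (#B2); a derived table typically shows up
-- as an extra row in the plan.
EXPLAIN
select a,b,a*b as c
from
(select * from t where a between 1 and 3 and b between 1 and 3) as t
order by c;
```

If the second plan lists an additional derived-table step, that is a hint that the two forms are not executed identically.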

Before looking at the results, I should describe the environment. I prepared 2 systems:

MacBook Pro (Late 2013)
CPU 2.4 GHz Intel Core i5
RAM 8 GB 1600 MHz DDR3
MySQL 5.5.42

The cheapest DigitalOcean server
CPU 1 Core
RAM 512 MB
MySQL 5.5.46

On top of that, I added 6 more queries to compare how performance changes as the number of result records grows, by changing the range from 1-3 to 1-2, 1-6, and 1-8. The full set of SQL is as follows:

#A1

select a,b,a*b as c
from t
where a between 1 and 2 and b between 1 and 2
order by c

#B1

select a,b,a*b as c
from
(select * from t where a between 1 and 2 and b between 1 and 2) as t
order by c

#A2

select a,b,a*b as c
from t
where a between 1 and 3 and b between 1 and 3
order by c

#B2

select a,b,a*b as c
from
(select * from t where a between 1 and 3 and b between 1 and 3) as t
order by c

#A3

select a,b,a*b as c
from t
where a between 1 and 6 and b between 1 and 6
order by c

#B3

select a,b,a*b as c
from
(select * from t where a between 1 and 6 and b between 1 and 6) as t
order by c

#A4

select a,b,a*b as c
from t
where a between 1 and 8 and b between 1 and 8
order by c

#B4

select a,b,a*b as c
from
(select * from t where a between 1 and 8 and b between 1 and 8) as t
order by c

I am only posting the results from the MBP, because the exact specs of the DigitalOcean machine are unknown (its results pointed in the same direction, but the DigitalOcean numbers were better, probably because of a faster CPU).

Query	Record(s)	#A	#B
1 (1-2)	3,997,334	34.711s	31.767s
2 (1-3)	8,999,411	37.820s	36.361s
3 (1-6)	36,020,162	63.060s	63.941s
4 (1-8)	63,021,051	81.483s	96.416s

#A: plain query
#B: sub-query
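As a sanity check on the record counts: if a and b are independent and uniform over 0-9, a range 1..k keeps k out of 10 values per column, so roughly (k/10)^2 of all rows should match. The sketch below computes that for a nominal 100,000,000 rows (the total is an assumption based on the "about one hundred million" figure, not a measured count). It works out to 4, 9, 36 and 64 million. The first three measured counts are close to that, while the 1-8 count in the table is about a million lower than this estimate.

```sql
-- Expected matching rows for each range 1..k, assuming 100,000,000 rows
-- (nominal figure) with a and b independent and uniform over 0-9.
SELECT k,
       ROUND(100000000 * POW(k / 10, 2)) AS expected_rows
FROM (SELECT 2 AS k UNION ALL SELECT 3 UNION ALL SELECT 6 UNION ALL SELECT 8) AS ranges;
```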

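From these results, the gap between the two forms at the same result size widens as the result set grows: #B is about 3 seconds faster at roughly 4 million records, but about 15 seconds slower at roughly 63 million.
Still, this is not enough to conclude whether the plain query or the sub-query is better. More data is probably needed.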
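To collect more data points, each query could be repeated several times and the per-statement timings compared. Below is a rough sketch using MySQL's session profiling, which is available in 5.5. SQL_NO_CACHE is added on the assumption that the query cache might be enabled and would otherwise distort repeated runs.

```sql
-- Record timings for statements run in this session.
SET profiling = 1;

-- Run each variant several times; SQL_NO_CACHE keeps the query cache out of the picture.
select SQL_NO_CACHE a,b,a*b as c
from t
where a between 1 and 3 and b between 1 and 3
order by c;

select SQL_NO_CACHE a,b,a*b as c
from
(select * from t where a between 1 and 3 and b between 1 and 3) as t
order by c;

-- List the recorded statements with their durations in seconds.
SHOW PROFILES;
```

Averaging several runs per query would make it easier to tell a real difference from run-to-run noise.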

Addendum
The SQL used to generate the data

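DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `mockdata`()
BEGIN
  -- Build one multi-row INSERT: 1 initial tuple + 10,001 appended = 10,002 rows per statement.
  set @insert_query = 'INSERT INTO t(a,b) VALUES (floor(rand() * 10),floor(rand() * 10))';
  set @ia = 0;
  while @ia <= 10000 do
    set @insert_query = CONCAT(@insert_query, ',(floor(rand() * 10),floor(rand() * 10))');
    set @ia = @ia + 1;
  end while;
  prepare stmt FROM @insert_query;
  -- Run the prepared INSERT 10,000 times (about 100 million rows in total);
  -- rand() is re-evaluated for every tuple on each execution.
  set @ib = 0;
  while @ib < 10000 do
    execute stmt;
    set @ib = @ib + 1;
  end while;
  deallocate prepare stmt;
END$$
DELIMITER ;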
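
The table definition itself is not shown in this post. Here is a minimal sketch of what it might look like, assuming two plain INT columns with no index and the server's default storage engine (these are assumptions, not taken from the original setup), followed by a call to the procedure and a row count check:

```sql
-- Hypothetical definition of the test table: two integer columns, no index.
CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL
);

-- Fill the table (this runs for a long time), then confirm how many rows were inserted.
CALL mockdata();
SELECT COUNT(*) FROM t;
```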

Friday, 30 October 2015