Today's topic: AWS - Storage, storage types, S3 types, IOPS
EBS, S3, Lifecycle
Program ----> Checks DB (NoSQL) ---> Storage
For better performance, all of these components need to be fast.
For faster data access:
- RAM (fast)
- CPU (fast)
- Storage (slow - the bottleneck) - impacts overall performance
How can you make storage perform well, or increase the speed of the storage?
Let's say you have a storage device, say S1.
S1 -> I/O
How are you managing it?
- How are you designing the storage?
How you arrange/organize/manage the data is important.
Let's say you have a fileA, 1 GB in size.
Do you read the entire 1 GB at a time, or just read part of the content?
The complete data is also called a dataset, and a program comes and reads all or part of it for analysis.
Or it's a huge file, say a terabyte. How fast you can retrieve it from storage does matter.
Think of the older type of hard disk - the magnetic one - which has platters divided into sectors, and each sector holds 512 bytes.
This is a physical sector.
But nowadays, OS programs such as LVM read/write 8 sectors at a time (8 x 512 bytes = 4096 bytes = 4 KB = 1 block).
If the file is written in contiguous blocks, reading is faster, since the head on the hard disk can access them directly. If the blocks are spread across the platter, fast access is hard.
But it's not possible to get contiguous block allocation if the file size is GBs or TBs big.
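A quick way to see this on a Linux box is filefrag (from e2fsprogs); fileA here is just the example file from above:
# filefrag -v fileA    # 1 extent = fully contiguous; many extents = blocks spread across the disk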
What can you do in this kind of situation?
We talk about throughput: the speed of read/write is called throughput.
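A rough way to measure write throughput on Linux is dd (a sketch; the file name and sizes are just examples, and oflag=direct bypasses the page cache so you measure the disk, not RAM):
# dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct
The last line of the output reports the throughput, e.g. "1073741824 bytes ... copied, 10.5 s, 102 MB/s".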
In case of AWS,
AWS -> Storage -> Block device -> EBS
google for aws optimized volume
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html
review the document.
Let's go to the AWS console
- create a Linux system and log in
# fdisk -l
review the output for the first disk.
Let's create a folder
# mkdir mydir; cd mydir
# ls -l # review the total
# create a file
# vi myfile.txt
welcome to the class
=> Now, you see the number 4 (run ls -l again and look at the "total").
This 4 means one 4 KB block of data has been allocated for the file; ls -l reports the total in 1 KB units, so a single 4096-byte block shows up as 4.
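You can confirm the allocation with stat (standard coreutils; the numbers below are what I'd expect on a 4 KB-block filesystem, not captured output):
# stat myfile.txt
"IO Block: 4096" is the filesystem block size; "Blocks: 8" counts 512-byte units, i.e. 8 x 512 = 4096 bytes = one 4 KB block.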
# getconf -a | less
# blockdev --help
--getbsz - gets the block size the OS uses for the device
--getsize - gets the device size in 512-byte sectors
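For example (the device name /dev/xvda is typical on Xen-based EC2 instances; on Nitro instances it would be /dev/nvme0n1):
# blockdev --getbsz /dev/xvda    # block size, typically 4096
# blockdev --getsize /dev/xvda   # device size in 512-byte sectors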
Go to EBS, create a volume
change volume Type and review the IOPS
Review the volume types st1 (Throughput Optimized HDD) and sc1 (Cold HDD).
Let's pick:
type: sc1
size -> review the range
throughput (MB/s) ->
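The same console steps can be done from the AWS CLI (a sketch; the size and AZ are placeholder values, and sc1 has a minimum size, 125 GB in the current docs):
# aws ec2 create-volume --volume-type sc1 --size 500 --availability-zone us-east-1a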
A magnetic, platter-based hard disk has a head that has to travel to each block for a read or write.
So we need a disk where the head can read randomly from multiple locations without that penalty.
We want an optimized disk, or an optimized type of disk; these are now available as SSDs.
How do we calculate the speed of these types of disks?
IOPS -> Input/Output operations per second.
For bigger files, we need a different type of hard disk:
continuous (sequential) reads.
Let's say, for example:
1 GB -> 3 IOPS
The higher the storage size, the higher the speed.
GP - General Purpose (gp2)
3 IOPS per GB,
so 3000 IOPS for 1000 GB.
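As a worked sketch of that gp2 rule (3 IOPS per GB, with a floor of 100 IOPS; the cap was 3,000 in the older gp2 docs, 16,000 today):
# SIZE_GB=1000
# IOPS=$(( SIZE_GB * 3 ))
# [ "$IOPS" -lt 100 ] && IOPS=100
# [ "$IOPS" -gt 16000 ] && IOPS=16000
# echo $IOPS    # -> 3000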
Volume Type: Provisioned IOPS SSD
Look at the price - how much you will be paying for the hard disk you select.
Review all volume types:
gp2 is general purpose and not costly; that's why the default boot disk is gp2.
100 IOPS baseline minimum, burstable up to 3000 IOPS.
Burstable -> (picture a graph of IOPS over time)
baseline - we get 300 IOPS (e.g., a 100 GB gp2 volume: 100 x 3), and the volume can burst above that baseline for short periods.
Go to Volumes, click on any volume, and go to Monitoring; you can see how much burst balance is available.
- you can also see this in CloudWatch.
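From the CLI, the same data is exposed as the EBS BurstBalance metric (a sketch; the volume ID and times are placeholders):
# aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name BurstBalance --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 --statistics Average --period 300 --start-time 2020-04-03T00:00:00Z --end-time 2020-04-04T00:00:00Z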
Now, there is something similar: S3.
- permanent storage, which we use to store objects.
- every object is independent.
Let's talk about storage as a whole.
Let's say you are a bank and want to store data or information about customers.
- Say the whole day's activities are copied to one file called apr3.csv and saved in an S3 bucket at the bank.
- The next day, apr4.csv is also saved to S3.
- We keep storing data this way every day.
- Every week there is an audit/analysis,
so we go to the bucket and use a tool to read the data and generate a report.
- The major concern is how fast you can retrieve, access, and process the data and generate the report.
There are different storage types,
- and you have to select which type of storage you want.
We have storage types:
1. Standard (hot storage) - super fast - expensive, say the price is $100
2. Standard-IA -> Infrequent Access, say the price is $70
What is the difference?
It's the access pattern and cost: Standard is built for frequently accessed data; Standard-IA stores the same data more cheaply but charges a retrieval fee, so frequent access on Standard-IA gets slow and expensive.
Go to S3 and create a bucket
bucket name: mybucket
Upload a file:
click on Upload -> go down the page, and you see Storage class.
- Depending on the type of data access, you can select different options:
1. Standard (fastest - frequently accessed data) - higher in price
2. Standard-IA - Infrequent Access (cold storage)
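The same choice is available from the CLI via --storage-class (a sketch; the bucket and file names follow the bank example above):
# aws s3 cp apr3.csv s3://mybucket/ --storage-class STANDARD_IA
Omit --storage-class to get STANDARD, the default.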
In S3, we always select the region. This is where your file is going to be stored.
But by default it will be stored across multiple AZs, and if some disaster happens, the data will be available from a different AZ.
For Standard and Standard-IA, data is available across multiple AZs in that region.
3. One Zone-IA - available in only one AZ.
Let's say we have a database, and we keep one copy of it in Standard-IA,
and another copy in One Zone-IA.
So in case the single AZ behind One Zone-IA fails, we will still have the data available from the other copy.
Copying a backup to One Zone-IA lets you keep data around to access at a later date,
and it will save a lot of money.
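As a sketch of that backup pattern (the dump directory and bucket path are placeholder names):
# aws s3 sync ./db-dumps/ s3://mybucket/backups/ --storage-class ONEZONE_IA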
google: price for s3
In the case of banking data, there might be a requirement to retain data for 7 years and be able to retrieve it.
That data is rarely retrieved; it might be accessed once in 7 years, or not at all.
So we can use the slowest (and cheapest) storage.
Note: in S3, the filename is the object key.
Glacier: one type of S3 service
- cheapest storage
- retrieve data in 1 minute to 12 hours
Glacier Deep Archive
- takes up to 12 hours to retrieve the data.
Intelligent-Tiering
S3 Glacier:
-> create a vault
Name:
The vault is created.
There is no option to upload a file here;
you have to use a different tool.
It has since been merged into the bucket: Glacier is now a storage class you pick in S3.
Go to the bucket and upload a file.
- Here (in Standard) you can open it fast.
In Glacier, there is no direct option to download. For an object in Glacier that you want to download, you first have to send an initiation request and restore it.
You see three types of retrieval tier:
- Bulk retrieval (5-12 hrs)
- Standard retrieval (3-5 hrs)
- Expedited retrieval (1-5 min)
and there is a retrieval fee.
To use it, select Expedited retrieval and restore -> you have to specify how many days you want the restored copy to stay available: specify 1 day.
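The same restore can be initiated from the CLI (a sketch; the bucket and key are the example names from above):
# aws s3api restore-object --bucket mybucket --key apr3.csv --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Expedited"}}'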
google for storage classes:
it gives you a very interesting table with the differences.
Check availability; durability is the same across classes, but check the retrieval fee.
You can move data from one S3 storage class to another (this is what lifecycle rules do).
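A minimal lifecycle rule sketch via the CLI (the bucket name and day counts are assumptions; this rule moves objects to Standard-IA after 30 days and to Glacier after 90):
# aws s3api put-bucket-lifecycle-configuration --bucket mybucket --lifecycle-configuration '{"Rules":[{"ID":"archive-old-data","Status":"Enabled","Filter":{"Prefix":""},"Transitions":[{"Days":30,"StorageClass":"STANDARD_IA"},{"Days":90,"StorageClass":"GLACIER"}]}]}'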
S3 is used for big data.