Big data in cloud opportunities
ABSTRACT
Welcome to the new era of big data.
Academic essay (ITI0103) 2019 spring
Introduction
“Data is Everything and Everyone is Data.” [1]
The ability to collect, organize, structure and analyse data on a large scale is probably the
most significant trait that sets us, humans, apart from our primate friends. [1]
To comprehend the opportunities and threats of big data located within the cloud, one must
first understand what big data and the cloud are. Big data is not only what its name says; it
is also much more.
What is Big data?
Big data is a term used to describe a broad spectrum of concepts: from the
technological ability to collect, aggregate, and process data, to the cultural shift that is
pervasively invading industry and society, both drowning in information overload. [2]
Big data can be described by the following properties:
- Volume. Organizations collect data from a variety of sources, including business
transactions, social media, and information from sensor or machine-to-machine data.
In the past, storing it would’ve been a problem – but new technologies (such as
Hadoop) have eased the burden. [3]
- Velocity. Data streams in at an unprecedented speed and must be dealt with in a
timely manner. RFID tags, sensors, and smart metering are driving the need to deal
with torrents of data in near-real time. [3]
- Variety. Data comes in all types of formats – from structured, numeric data in
traditional databases to unstructured text documents, email, video, audio, stock ticker
data, and financial transactions. [3]
Because of its extensive volume, big data needs to be stored somewhere it can be easily
accessed and then processed. For now, the best storage location we have is the cloud, most of
which is not publicly indexed and in that sense belongs to the deep web.
A good analogy is an iceberg: the part above the water level is the part that the user can see
and interact with, and the part below comprises the databases and algorithms designed to
process the data and send it to the appropriate user.
Now that these premises are clear, we can understand and appreciate the opportunities of big
data and the feasibility of storing it within the cloud.
Opportunities
The recent data revolution has produced new data formats and databases of previously
unimaginable scale, and artificial intelligence, including machine learning, benefits greatly
from it. Artificial intelligence requires tremendous quantities of data to be accurate, and big
data provides exactly that. Some examples from games are:
- AlphaGo, software developed to play the Chinese board game Go [4].
- Stockfish, an open source chess engine [5].
- Deep Blue, an older chess-playing computer which beat Kasparov [6].
- OpenAI, an artificial intelligence research organization that aims to promote and
develop friendly AI in such a way as to benefit humanity as a whole [7].
An example that mimics cognitive thinking:
- CALO - Cognitive Assistant that Learns and Organizes [8].
Other examples which show reasoning capabilities:
- Microsoft Cortana, an intelligent personal assistant with a voice interface in
Microsoft's various Windows 10 editions [9].
- Wolfram Alpha, an online service that answers queries by computing the answer from
structured data [10].
The projects mentioned above all follow the same four primary stages [11][12]:
- Data is collected, examined, cleaned, prepared and sorted by its origin and by where it
should be placed. It can come in many forms, ranging from social media posts to website
cookies. In a game, this involves setting up the viable moves and making the data easily
accessible for future use. In voice recognition applications, separate databases are
required for different languages.
- A set of instructions on what to do with the data is defined by selecting algorithms and
setting a goal. In a game, this can be achieved by granting the learner points for every
right choice it makes and penalizing it for bad ones, according to the algorithm used.
- The models at hand are evaluated by comparing the success of the algorithms and then
picking the best ones. A good example is a demonstration of DeepMind playing StarCraft II,
where the AI, named AlphaStar, had many different winning team configurations [13][14].
This whole step is repeated until the best configuration or configurations are found. [11]
- The final models are sent live to test their capabilities by monitoring them and
applying fresh data. An example is AlphaStar competing successfully against real
players [14].
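The four stages above can be illustrated with a deliberately tiny sketch. It is not how AlphaStar or any of the cited projects works: the toy game, the move names, the win probabilities in WIN_PROB and the choice of an epsilon-greedy learner are all assumptions made for illustration. The sketch only shows the shape of the cycle: prepare the data, learn from points and penalties, compare several configurations, then monitor the chosen model on fresh games.

```python
import random

# Hypothetical win probabilities for three moves in a toy game (assumed values).
WIN_PROB = {"attack": 0.8, "defend": 0.5, "retreat": 0.2}


def prepare_moves(raw_moves):
    """Stage 1: clean the raw data (trim, lower-case, drop blanks and duplicates)."""
    cleaned = []
    for move in raw_moves:
        move = move.strip().lower()
        if move and move not in cleaned:
            cleaned.append(move)
    return cleaned


def play(move, rng):
    """Reward signal: +1 point for a winning choice, -1 as a penalty for a losing one."""
    return 1 if rng.random() < WIN_PROB[move] else -1


def train(moves, episodes, epsilon, rng):
    """Stage 2: estimate the value of each move from rewards (epsilon-greedy)."""
    value = {move: 0.0 for move in moves}
    count = {move: 0 for move in moves}
    for _ in range(episodes):
        if rng.random() < epsilon:
            move = rng.choice(moves)          # explore a random move
        else:
            move = max(moves, key=value.get)  # exploit the best estimate so far
        reward = play(move, rng)
        count[move] += 1
        value[move] += (reward - value[move]) / count[move]  # running average
    return value


def evaluate(value, trials, rng):
    """Stage 3: score a trained model by the average reward of its preferred move."""
    best = max(value, key=value.get)
    return sum(play(best, rng) for _ in range(trials)) / trials


def select_best(moves, epsilons, rng):
    """Stage 3, repeated: train one model per configuration and keep the best one."""
    models = [train(moves, 5000, eps, rng) for eps in epsilons]
    return max(models, key=lambda model: evaluate(model, 1000, rng))


rng = random.Random(0)
moves = prepare_moves([" Attack", "defend", "", "attack", "Retreat "])
model = select_best(moves, [0.1, 0.2, 0.3], rng)
# Stage 4: "go live" by monitoring the chosen model on fresh games.
live_score = evaluate(model, 1000, rng)
print(max(model, key=model.get), live_score)
```

Real systems differ in every detail, but the loop of training several candidates, comparing them and only then exposing the winner to fresh data is the part that carries over.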
Rule-based AI systems have been around for decades, but recent advances in big data,
computational power, and improved algorithms have led to significant improvements in AI
capabilities. As a result, more advanced AI systems are moving out of the lab and into the
real world. [12]
Self-driving drones and cars are becoming more relevant every day. [15]
Amazon is taking the first step towards a fully autonomous air delivery system, in which a
control system steers the drone to the desired state. The main obstacle right now is legal:
there is no real protection against hackers, who could cause a malfunction that makes a drone
fall and potentially harm an innocent bystander. [16]
Furthermore, Tesla has recently released a new car with an autopilot feature [17]. The company
has also announced that a car with full self-driving capabilities is plausible in the not so
distant future. All of this is possible because big data is easily accessible over an internet
connection, which allows updates to be carried out.
Big data software packages provide a rich set of tools and options with which an individual
can map the entire data landscape across the company and thus analyze the threats faced
internally [18]. This makes big data somewhat secure as a whole and also allows different
algorithms to function.
In fact, nowadays every larger company that deals with massive numbers of users has its own
big data database or at least access to one. Some user-based examples are YouTube,
Facebook, Instagram, Amazon, Google and Netflix.
The ones mentioned above all share common features, such as:
- Having enormous databases.
- Collecting user data and classifying users into groups that share similar traits, such as
age, what they like and how they see the world.
- Selling and exchanging user information for their own benefit.
Profit is made by classifying people into groups by their similarities and then displaying
relevant ads to them. Having a Google account has become the norm today, and being logged in
with one is not a coincidence. When the user is surfing through social media or any other web
page, everything the user does goes to a big data database. Google, the owner of the website,
or both will know exactly what the user is currently doing.
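The grouping step described above can be sketched in a few lines. The age bands, the user fields (name, age, interest) and the ad catalogue are hypothetical and chosen only for illustration; real advertising platforms use far richer profiles and models.

```python
from collections import defaultdict


def age_band(age):
    """Map an age to a coarse band (the band boundaries are an assumption)."""
    if age < 18:
        return "under-18"
    if age < 35:
        return "18-34"
    if age < 55:
        return "35-54"
    return "55+"


def segment_users(users):
    """Group user names by the traits they share: (age band, interest)."""
    segments = defaultdict(list)
    for user in users:
        key = (age_band(user["age"]), user["interest"])
        segments[key].append(user["name"])
    return dict(segments)


def pick_ad(segment_key, ad_catalogue, default="generic ad"):
    """Choose the ad that matches the segment's interest, else a fallback."""
    _, interest = segment_key
    return ad_catalogue.get(interest, default)


# Hypothetical users and ads.
users = [
    {"name": "ann", "age": 23, "interest": "games"},
    {"name": "ben", "age": 31, "interest": "games"},
    {"name": "cat", "age": 47, "interest": "travel"},
]
ads = {"games": "new console ad", "travel": "flight deal ad"}

segments = segment_users(users)
for key, names in segments.items():
    print(key, names, pick_ad(key, ads))
```

Every member of a segment is shown the same ad, which is why the amount and precision of the collected traits translates directly into revenue.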
“If you are not paying for it, you're not the customer; you're the product being sold.” [19]
This leads us to some obstacles concerning big data.
Threats
When big data collects your information, who does it belong to: the corporation or the person
concerned? What happens when information that you considered private was collected from
you? Misuse of that information can cause big problems. [20]
Enterprises worldwide make use of sensitive data, personal customer information and
strategic documents. When there’s so much confidential data lying around, the last thing you
want is a data breach at your enterprise. [21]
Many businesses use big data for marketing and research but may not have the fundamental
assets, particularly from a security perspective. If a security breach occurs to big data, it
would result in even more serious legal repercussions and reputational damage than at
present. [18]
There is no way around increasing security measures, such as putting extra layers around the
valuable data and encrypting it at its core, in addition to logging and honeypot detection.
This can become quite a difficult task considering the ever-growing amounts of data, as we
are already talking about petabytes.
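The logging and honeypot ideas mentioned above can be sketched as a thin wrapper around a data store. This is a minimal sketch under stated assumptions: the class MonitoredStore, the record ids and the decoy record are hypothetical, and encryption is left out because it needs a vetted cryptography library rather than a few lines of illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data-access")


class MonitoredStore:
    """Wraps a record store, logs every read and flags reads of decoy records."""

    def __init__(self, records, decoy_ids):
        self._records = dict(records)
        self._decoy_ids = set(decoy_ids)
        self.alerts = []  # (user, record_id) pairs that touched a decoy

    def read(self, user, record_id):
        log.info("read by %s of %s", user, record_id)
        if record_id in self._decoy_ids:
            # No legitimate workflow needs a decoy, so any access is suspicious.
            self.alerts.append((user, record_id))
            log.warning("honeypot record %s accessed by %s", record_id, user)
        return self._records.get(record_id)


# Hypothetical data: two real customers and one decoy record.
store = MonitoredStore(
    records={"cust-001": "Alice", "cust-002": "Bob", "cust-999": "decoy"},
    decoy_ids={"cust-999"},
)
store.read("analyst", "cust-001")   # normal read, logged only
store.read("intruder", "cust-999")  # decoy read, logged and flagged
print(store.alerts)
```

The decoy still returns plausible data so that the reader is not tipped off, while the alert list gives the defenders something to act on.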
Data storage and retention is the most obvious risk associated with big data. When data gets
accumulated at such a rapid pace and in such huge volumes, the first concern is its storage.
Traditional data storage methods and technology are just not enough to store big data and
retain it well. Enterprises today need a shift to cloud-based data storage solutions to store,
archive and access big data effectively. [21]
Storing large quantities of data doesn't come cheap. Small and medium-sized businesses
struggle to afford the initial setup and migration, but once the overhaul cost is taken care
of, big data acts as an incredible revenue generator for digital enterprises [21].
Once big data is successfully set up, maintaining it becomes a big issue, as big data is
highly varied. Data must be organized by its origin and structure, as it can come from an
offline or online source, and it can be either structured or unstructured. [21]
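Organizing data by origin and structure can be sketched as follows. The rule used here, treating a payload as structured only when it parses as a JSON object, is a simplifying assumption, and the sample payloads are made up.

```python
import json


def classify(payload):
    """Label a payload as structured (a JSON object) or unstructured (anything else)."""
    try:
        parsed = json.loads(payload)
    except (TypeError, ValueError):
        return "unstructured"
    return "structured" if isinstance(parsed, dict) else "unstructured"


def organize(items):
    """Group payloads by (origin, structure) so each group can be stored appropriately."""
    groups = {}
    for origin, payload in items:
        groups.setdefault((origin, classify(payload)), []).append(payload)
    return groups


# Hypothetical incoming data, tagged with where it came from.
items = [
    ("online", '{"user": 1, "clicked": "ad"}'),
    ("online", "free-text review of a product"),
    ("offline", '{"invoice": 17}'),
]
groups = organize(items)
for key in groups:
    print(key, len(groups[key]))
```

In practice the structured groups would go to a database with a schema, while the unstructured ones would need text, audio or video processing before they become useful.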
This leads us to the next problem: we lack skilled professionals and technology, as big data
is a reasonably new topic. When a company can't make sense of its data, the data can be
considered worthless or, worse yet, it exposes the enterprise to the risk of misinterpretation
and wrong decision making. Hiring the right talent and applying the right tools is crucial to
making relevant decisions from a big data project. [21]
Some of the aforementioned products also bring new, never-before-seen problems with them.
As cybersecurity has never been more important than it is now, making sure that your product
isn't misused and can't be leaked is a must.
There are multiple ways hackers can attack databases and take advantage of their
vulnerabilities.
A few examples of insider attacks are:
- Abusing excessive privileges. Granting a user privileges that the user doesn't require can
have unexpected results: a user who only needs read-only privileges but is granted full
administrative powers can edit data that he was never supposed to. [22]
- Abusing existing privileges. The same goes for workers who legitimately hold the rights
they use. An example is when such a worker decides to leak some data to the public or sell
it for personal gain. [22]
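The excessive-privilege case above is usually countered by granting each role only what it needs and denying everything else. A minimal sketch, assuming three hypothetical roles and a simple set of actions:

```python
# Hypothetical roles and the actions each one genuinely needs.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(role, action):
    """Allow an action only if the role explicitly includes it (deny by default)."""
    return action in ROLE_PERMISSIONS.get(role, set())


def excessive_privileges(granted, required):
    """Return the privileges a user holds but does not need."""
    return set(granted) - set(required)


# A user who only needs to read but was given an administrator's rights.
surplus = excessive_privileges(ROLE_PERMISSIONS["admin"], ROLE_PERMISSIONS["analyst"])
print(sorted(surplus))
print(is_allowed("analyst", "write"))
```

A check like this helps against excessive privileges only. Abuse of legitimately held privileges has to be caught by other means, such as audit logs.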
An attack can also happen externally:
- Stealing the disk image, injecting SQL, bypassing access control, taking memory and disk
snapshots to analyze later and extract the data, or staying in the system long enough to
corrupt all defenses and seize the data. [23]
- Hackers can launch DDoS attacks by infiltrating and leveraging thousands or
millions of unsecured devices. They can cripple infrastructure, down networks, and as
IoT advances into our everyday lives, those attacks may very well put real human
lives in jeopardy. And even if hackers don’t outright threaten lives, they can
compromise gateways and deeper levels of IoT networks in order to reveal and exploit
sensitive personal and corporate information. [24]
For example, when self-driving cars become a reality, attackers could launch a large-scale
attack on the driving algorithm by flooding it with false information or altering it in
another way, causing crashes and possibly killing innocent users. A group of researchers
demonstrated that they could use off-the-shelf radio-, sound- and light-emitting tools to
deceive Tesla’s autopilot sensors, in some cases causing the car’s computers to perceive an
object where none existed, and in others to miss a real object in the Tesla’s path [25].
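Of the external attacks listed above, SQL injection is the easiest to show in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, the user names and the payload are made up for illustration. It contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3

# In-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_unsafe(conn, name):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    """Safer: the driver passes the input as data, never as SQL."""
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
print(len(find_unsafe(conn, payload)))  # the injected condition matches every row
print(len(find_safe(conn, payload)))    # the payload is treated as a literal name
```

Parameterized queries close this particular hole but do nothing against the other attacks in the list, such as stolen disk images or bypassed access control.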
My opinion
I think big data is great since it gives an enormous boost to the gaming industry by allowing
artificial intelligence to grow at such a rapid pace.
Today, AI is dominating most games, from board games to interactive fiction games. I
believe that as AI advances in the gaming industry, it can later be used in more real-life
situations. And I'm not the only one who thinks so. Elon Musk [26], who has invested in
multiple AI companies, has stated that he firmly believes in the numerous possibilities that
AI can bring.
For example, Tesla has already started producing cars with autopilot, and self-driving is not
so far-fetched. On a similar note, Starship [27] already has a fleet of delivery robots
distributing products to your doorstep on a daily basis.
Knowing how humanity functions, the next probable step as the technology progresses would
be to exchange on-ground delivery bots for drones, as drones are highly impactful already.
In fact, drones have become so effective that they have already taken over jobs. For example,
geo-mapping is done entirely with the help of drones, as sensors have become so precise.
Previously, people needed to measure everything by hand and analysts then needed to make
the necessary calculations, but now it is all done by drones that fly over the area to be
measured, after which algorithms draw a 3D image with all the necessary data. [28]
But everything positive also has a negative side. Who is responsible when an autonomous car
causes a crash? Let's presume that a self-driving car causes an accident:
- if the accident happened because the driving algorithm malfunctioned or was hacked, then
the car producer should be prosecuted.
- if the crash was caused by a mechanical failure (for instance tires that were not
changed, or another instance caused by a lazy user), then the user should be fully
responsible. But at some point, it should be mandatory for the machine to alert the user
if a part needs to be exchanged.
In addition, the non-physical side doesn't come without issues.
Cybersecurity is currently lacking, as leaks are happening left, right, and center. The
encryption methods big corporations use are not watertight either, as some lucky hackers have
gained access to their databases as well. In my opinion, big corporations should really put
more time and money into securing sensitive user data.
Lack of privacy has become a big problem as well. Some people don't want to be observed, but
at the same time, it is the cost of the service you are using. I think it is acceptable for
big corporations to monitor your every move, since nothing comes for free and privacy is the
cost I am willing to pay.
References
[1] https://medium.com/scidex/data-is-everything-and-everyone-is-data-1886cfce2d92
[Internet Source]. [Used 29. March 2019].
[2] AIP Conference Proceedings 1644, 97 (2015); https://doi.org/10.1063/1.4907823 [Internet
Source]. [Used 26. March 2019].
[3] https://www.sas.com/en_us/insights/big-data/what-is-big-data.html [Internet Source].
[Used 29. March 2019].
[4] https://en.wikipedia.org/wiki/AlphaGo [Internet Source]. [Used 26. March 2019].
[5] https://en.wikipedia.org/wiki/Stockfish_(chess) [Internet Source]. [Used 26. March 2019].
[6] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer) [Internet Source]. [Used 26.
March 2019].
[7] https://en.wikipedia.org/wiki/OpenAI [Internet Source]. [Used 26. March 2019].
[8] https://en.wikipedia.org/wiki/CALO [Internet Source]. [Used 26. March 2019].
[9] https://en.wikipedia.org/wiki/Cortana [Internet Source]. [Used 26. March 2019].
[10] https://en.wikipedia.org/wiki/Wolfram_Alpha [Internet Source]. [Used 26. March 2019].
[11] J. M. Font Fernandez and T. Mahlmann, "The Dota 2 Bot Competition," in IEEE
Transactions on Games. [Internet Source]. [Used 26. March 2019].
[12] https://www.linkedin.com/pulse/4-stages-machine-learning-ml-modeling-cycle-maurice-chang
[Internet Source]. [Used 26. March 2019].
[13] https://en.wikipedia.org/wiki/DeepMind [Internet Source]. [Used 26. March 2019].
[14] https://www.youtube.com/watch?v=cUTMhmVh1qs [Internet Source]. [Used 26. March
2019].
[15] Annals of Tourism Research Volume 74, January 2019, Pages 33-42
https://doi.org/10.1016/j.annals.2018.10.009 [Internet Source]. [Used 29. March 2019].
[16] https://www.alphr.com/the-future/1004520/droning-on-the-challenges-facing-drone-delivery/
[Internet Source]. [Used 29. March 2019].
[17] https://www.tesla.com/autopilot?redirect=no [Internet Source]. [Used 26. March 2019].
[18] International Journal of Network Security & Its Applications (IJNSA), Vol.6, No.3, May
2014 [Internet Source]. [Used 26. March 2019].
[19] https://www.metafilter.com/user.mefi/15556 [Internet Source]. [Used 26. March 2019].
[20] https://wiki.itcollege.ee/index.php/Big_Data_ohud_ja_v%C3%B5imalused [Internet
Source]. [Used 26. March 2019].
[21] https://www.estuate.com/company/blog/content/are-you-fighting-5-biggest-risks-big-data
[Internet Source]. [Used 26. March 2019].
[22] https://www.bcs.org/content/ConWebDoc/8852 [Internet Source]. [Used 27. March
2019].
[23] https://medium.com/@cossacklabs/database-leaks-2017-5852ec3db50a [Internet
Source]. [Used 27. March 2019].
[24] https://www.iotforall.com/5-worst-iot-hacking-vulnerabilities/ [Internet Source]. [Used
27. March 2019].
[25] https://www.wired.com/2016/08/hackers-fool-tesla-ss-autopilot-hide-spoof-obstacles/
[Internet Source]. [Used 27. March 2019].
[26] https://et.wikipedia.org/wiki/Elon_Musk [Internet Source]. [Used 29. March 2019].
[27] https://www.starship.xyz/ [Internet Source]. [Used 29. March 2019].
[28] https://www.dronedeploy.com/ [Internet Source]. [Used 29. March 2019].
Author’s notes:
Excluding references and author’s notes.
Total amount of words – 2,400
Total amount of characters with spaces – 14,500
Total amount of characters excluding spaces – 12,000
Introduction excluding spaces – 1,700
Opportunities excluding spaces – 4,100
Threats excluding spaces – 3,900
My opinion excluding spaces – 2,300
Total amount of words in Italic – 540
Total amount of characters with spaces in Italic – 3,400
Total amount of characters excluding spaces in Italic – 2,800
Percentage of total text – 23.5%
Total amount of references used – 27
Total amount of references used which can be found from scholar.google.com – 4