Open Source and Intellectual Property in the Digital Society
Technische Universität Berlin
Syeda Mehak Zahra
Meryem Naseer

Challenge Description
Prepare a case study on how a specific FLOSS solution became a de-facto standard: There are many examples of FLOSS solutions that attracted dominant industry support and became de-facto standards – such as the Linux kernel, OpenStack, or the Apache web server. How does a FLOSS solution reach such a status? Are there attempts to also create formal standards out of them? What are de-facto standards, how are they developed in the FLOSS community, and what is their importance?

1. Introduction
This term paper discusses how a particular open source software solution, once adopted by the community, becomes a standard over time. Two types of standards are discussed in the paper: the de-facto standard, i.e. the market-driven standard that results when a critical mass of the community begins to use a solution, and the de-jure standard, which results when a solution is approved by a formal standards organization. There are many FLOSS examples that have gained such status. In this paper, a case study on Apache Flink is discussed in detail, showing how it was developed and adopted by the community, and how the collaborations around it made this particular solution a standard. Apache Flink's users include renowned companies, educational institutes, and software houses, and their number keeps increasing because of the system's utility. Each aspect is considered in detail in the later sections of this term paper.

2. FLOSS – Free/Libre Open Source Software
Introduction
With open source software, everyone is freely licensed to use, copy, and redistribute copies of the code, because the source code is freely available to its users.
This makes sharing it much easier. Open source software is copyrighted software that is distributed as source code under a license agreement which grants special rights to users of the software, rights that are normally reserved for the author. Such a license allows all users to make and distribute copies of the software binaries and source code without special permission from the author. Furthermore, it allows users to modify the source code and distribute modified copies.

What really matters is that open source software is community-owned: it is maintained by the community of people (or companies) that use it. It is freely available on the Internet, and anyone may use it. More importantly, users are encouraged to improve upon it. By sharing improvements and ideas, pooling resources with thousands, even millions, of others around the world via the Internet, the open source community is able to create powerful, stable, reliable software at very little cost.

But the open source community is much larger than just the people who write the software. Everyone who uses the software participates in a real community and has a voice in its direction. You do not have to be a programmer: by merely reporting a bug to a program's author, or writing a simple "how-to" article, you contribute to the community and help make the software better. Open source software is written, documented, distributed, and supported by the people who use it.
That means that it is sensitive to your needs, not the needs of a corporation trying to sell it to you. Thus, open source software is freely licensed to:
- Use
- Copy
- Change the software in any way
and its source code is openly shared.

Importance:
Standards implemented in open source software can:
- Reduce the risk of lock-in
- Improve interoperability
- Promote competition on the market

Lock-in is the close customer loyalty to the products or services of a provider which makes it difficult for the customer to change the product or provider, because of switching costs and other barriers. Interoperability is the ability of different systems, technologies, or organizations to work together.

Examples:
The following are a few FLOSS solutions that have acquired prominent industry support:
- Linux kernel: open source operating system kernel
- OpenStack: free and open source software platform for cloud computing
- Apache web server: free and open source cross-platform web server software

3. Case Study – Apache Flink
Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. Flink is an open-source framework for distributed stream processing that:
- Provides results that are accurate, even in the case of out-of-order or late-arriving data
- Is stateful and fault-tolerant and can seamlessly recover from failures while maintaining exactly-once application state
- Performs at large scale, running on thousands of nodes with very good throughput and latency characteristics

Flink aligns the type of dataset (bounded vs. unbounded) with the type of execution model (batch vs. streaming).
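The claim that Flink computes accurate results even for out-of-order or late-arriving data can be sketched in plain Python. This is only an illustration of the event-time tumbling-window concept, not the Flink API; the function and parameter names are invented for the example:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size_ms, allowed_lateness_ms=0):
    """Group (event_time_ms, key) pairs into tumbling event-time windows.

    Events are assigned to windows by their event time, not their arrival
    order, so counts are correct even for out-of-order input. Events that
    arrive later than the allowed lateness (relative to the largest
    timestamp seen so far, a stand-in for Flink's watermark) are dropped.
    """
    windows = defaultdict(int)  # (window_start, key) -> count
    max_timestamp = 0
    for event_time, key in events:
        max_timestamp = max(max_timestamp, event_time)
        # Drop events that are too late, as a watermark would.
        if event_time < max_timestamp - allowed_lateness_ms:
            continue
        window_start = (event_time // window_size_ms) * window_size_ms
        windows[(window_start, key)] += 1
    return dict(windows)

# Out-of-order input: the 1500 ms event arrives after the 2500 ms one,
# but still lands in the correct [1000, 2000) window.
events = [(100, "a"), (900, "a"), (2500, "b"), (1500, "a")]
print(tumbling_window_counts(events, window_size_ms=1000,
                             allowed_lateness_ms=1000))
```

In real Flink, watermarks and window triggers decide when a window's result may be emitted; this sketch only shows why assigning events by event time, rather than arrival time, yields correct per-window results.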
Many of Flink's features – state management, handling of out-of-order data, flexible windowing – are essential for computing accurate results on unbounded datasets and are enabled by Flink's streaming execution model. The official Apache Flink project [R1] describes Flink as follows: "Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams." The platform offers software developers various application programming interfaces (APIs) to create new applications that are executed on the Flink engine.

3.1 History
In June 2008, Prof. Volker Markl laid the foundations of a new group at Technische Universität Berlin, named the Database Systems and Information Management (DIMA) group. This is also when the idea behind Apache Flink was initiated. Prof. Markl had the vision to develop a system that could process massive amounts of data in a parallel and distributed fashion. Around the same time, a later rival of Apache Flink, Apache Spark, was released in 2009 in the USA. The idea of Apache Flink was to offer a solution similar to Apache Spark, but with distinct features that could outperform it: a system with unified APIs to process large amounts of data using a relational query language with user-defined functions, together with a much better stream processing framework. Prof. Markl's PhD students Stephan Ewen and Fabian Hüske built the very first prototype and shortly thereafter teamed up with Daniel Warneke, a PhD student in Prof. Odej Kao's Complex and Distributed IT Systems (CIT) group at TU Berlin. Soon after, Prof. Markl and Prof. Kao sought to collaborate with additional systems researchers in the greater Berlin area, in order to extend, harden, and validate their initial prototype. In 2009, Prof. Markl and Prof. Kao, jointly with researchers from Humboldt University (HU) of Berlin and the Hasso Plattner Institute (HPI) in Potsdam, co-wrote a DFG (German Research Foundation) research unit proposal entitled "Stratosphere – Information Management on the Cloud" [R4], which was funded in 2010. This initial DFG grant (spanning 2010–2012) extended the original vision to develop a novel, database-inspired approach to analyze, aggregate, and query very large collections of either textual or (semi-)structured data on a virtualized, massively parallel cluster architecture. The follow-on DFG proposal, entitled "Stratosphere II: Advanced Analytics for Big Data," was also jointly co-written by researchers at TU Berlin, HU Berlin, and HPI, and was funded in 2012. This second DFG grant (spanning 2012–2015) shifted the focus towards the processing of complex data analysis programs with low latency. These early initiatives, coupled with grants from the EU FP7 and Horizon 2020 Programmes, EIT Digital, German federal ministries (BMBF and BMWi), and industrial grants from IBM, HP, and Deutsche Telekom, among others, provided the financial resources necessary to lay the initial foundation. Certainly, funding plays a critical role; however, success could only be achieved with the support of numerous collaborators, including members of DFKI (the German Research Centre for Artificial Intelligence), SICS (the Swedish Institute of Computer Science), and SZTAKI (the Hungarian Academy of Sciences), among many others who believed in the vision, contributed, and provided support over the years. In addition, the contributions of numerous PhD and Master's students, and of postdoctoral researcher Dr. Kostas Tzoumas, paved the way for what is today Apache Flink. The project was accepted as an Apache Incubator project on April 16, 2014 and went on to become an Apache Top-Level Project in December 2014.
Kostas Tzoumas and Stephan Ewen established a startup company and gathered many of the creators of Apache Flink. They named the company data Artisans. The idea was to build a company devoted to making Apache Flink the data-driven, next-generation open source project for programming data-intensive applications.

The core feature of Apache Flink is true streaming, i.e. streaming that is not based on mini-batches that merely simulate streaming. Apache Flink is based on streams and operators. It has attractive features such as abstraction, a high-level set of APIs, the ability to combine static and streaming data, support for running SQL queries on the data, graph processing, machine learning, and real-time stream processing. It came later than Apache Spark and outperformed many systems, including Apache Spark, in stream processing. Apache Flink gained popularity due to its continuous checkpointing and better fault tolerance in streaming applications. We can measure the popularity of Apache Flink by Stack Overflow tags, the Flink mailing list, contributors, funding, and customers. data Artisans started with a seed financing round of 1 million euros from b-to-v Partners in summer 2014, and raised a Series A round of 5.5 million euros led by Intel Capital, with participation from b-to-v Partners and Tengelmann Ventures, in April 2016. data Artisans has received funding from different investors, and its total funding is now about $6 million. Since the company was founded, many team members from data Artisans have been active contributors to Apache Flink. These people succeeded in their efforts to make Apache Flink popular; nowadays, Apache Flink is a standard in stream processing and leads the industry in its domain. Universities and companies alike use Apache Flink for data processing.
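The continuous checkpointing mentioned above can be illustrated with a minimal plain-Python sketch. This is not the Flink API (Flink checkpoints distributed operator state asynchronously via barriers); the class and method names are invented for the example. The idea is that state is snapshotted at regular intervals, and after a failure the operator resumes from the last snapshot, so each event contributes to the state exactly once:

```python
import copy

class CheckpointedCounter:
    """Toy illustration of checkpoint-based fault tolerance: the operator
    state is snapshotted every `checkpoint_interval` events, and a restart
    resumes from the last snapshot instead of from the beginning."""

    def __init__(self, checkpoint_interval=3):
        self.interval = checkpoint_interval
        # Snapshot = (state, input offset); initially empty state at offset 0.
        self.snapshot = ({}, 0)
        self.state = {}

    def run(self, stream, crash_at=None):
        # Recover: restore state and position from the last checkpoint.
        self.state = copy.deepcopy(self.snapshot[0])
        i = self.snapshot[1]
        while i < len(stream):
            if crash_at is not None and i == crash_at:
                raise RuntimeError("simulated failure")
            key = stream[i]
            self.state[key] = self.state.get(key, 0) + 1
            i += 1
            if i % self.interval == 0:
                # Periodic checkpoint of state plus stream offset.
                self.snapshot = (copy.deepcopy(self.state), i)
        return self.state

counter = CheckpointedCounter()
stream = ["a", "b", "a", "a", "b", "a"]
try:
    counter.run(stream, crash_at=4)   # fail mid-stream, after a checkpoint
except RuntimeError:
    pass
print(counter.run(stream))            # recovery replays only from offset 3
```

Because the checkpoint records both the state and the position in the input, the events processed between the last checkpoint and the crash are replayed without being double-counted, which is the essence of exactly-once state semantics.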
Apache Flink is today one of the most active open source projects in the Apache Software Foundation, with users in academia and industry, as well as contributors and communities all around the world.

3.2 What does Flink provide?
Flink is best for:

Data variety: Today, data is fuel. Everything – mobiles, laptops, smart watches, social media, and IoT sensors – generates huge amounts of data in different formats. Apache Flink supports different types of data sources to process a variety of data formats. It can process data from files, relational databases, and message queues such as Apache Kafka.

Applications with state: For complex applications that do not just filter and transform data, managing state within the application (e.g., counters, windows of past data, state machines, embedded databases) becomes hard. Flink provides tools so that state is efficient, fault-tolerant, and manageable from the outside, so you do not have to build these capabilities yourself.

Data in transit: These days, data is generated by real-time devices such as smart watches, mobiles, and health monitoring devices. Apache Flink is an excellent framework for developing applications that process data in real time: it is truly streaming and processes data without delays, and it offers many stream processing operators, such as tumbling and sliding windows, to enhance its functionality.

Data in large volumes: Data is being generated in huge amounts these days, so a large number of nodes must be connected into a cluster to process it. Apache Flink is a platform for processing huge amounts of data in a parallel and distributed fashion; it can be run on a cluster to process big data.

4. Apache Flink Success Stories
We will discuss seven use cases of Apache Flink deployed in Fortune 500 companies in this section.
Apache Flink, also known as the "4G of Big Data", is deployed in production at leading organizations such as Alibaba, Bouygues, and Zalando. To understand its real-life applications, we will look at these game-changing use cases of Apache Flink.

Bouygues Telecom – third largest mobile provider in France
The Bouygues Group ranks in Fortune's "Global 500". Bouygues uses Flink for real-time event processing and analytics of billions of messages per day in a system that runs 24/7. Bouygues adopted Apache Flink because it supports true streaming at both the API and the runtime level, with low latency. Flink also decreased system startup time, which helped in extending the business logic of the system. Bouygues wanted to get real-time insights about customer experience, what is happening globally on the network, and what is happening in terms of network evolutions and operations. To fulfil this, the team built a system that analyzes network equipment logs to identify indicators of the quality of the user experience. The system handles 2 billion events per day (500,000 events per second) with a required end-to-end latency of under 200 milliseconds (including message publication by the transport layer and data processing in Flink). This was achieved on a small cluster, reported to be only 10 nodes with 1 gigabyte of memory each. The plan was to use Flink's stream processing to transform and enrich data and to push the derived stream data back to the message transport system for analysis by multiple consumers. This approach was chosen deliberately: Flink's stream processing capability allowed the Bouygues team to complete the data processing and movement pipeline while meeting the latency requirement with high reliability, high availability, and ease of use. The Flink framework is, for instance, convenient for debugging, since it can be switched to local execution. Flink also supports program visualization to help understand how programs are running. Furthermore, the Flink APIs are attractive to both developers and data scientists.

King – the creator of Candy Crush Saga
King, the leading online entertainment company, has developed more than 200 games, played in more than 200 countries and regions. Any stream analytics use case becomes a real technical challenge when more than 300 million monthly users generate more than 30 billion events every day across the different games and systems. Handling these massive data streams with data analytics while keeping maximal flexibility was a great challenge that was overcome with Apache Flink. Flink allows data scientists at King to access these massive data streams in real time. Even with such a complex game application, Flink is able to provide an out-of-the-box solution.

Zalando – leading e-commerce company in Europe
Zalando has more than 16 million customers worldwide and uses Apache Flink for real-time process monitoring. A stream-based architecture nicely supports the microservices approach used by Zalando, and Flink provides stream processing for business process monitoring and continuous Extract, Transform, and Load (ETL).

Otto Group – world's second largest online retailer
The Otto Group BI department was planning to develop its own streaming engine for processing its huge data volumes, as none of the open source options fit its requirements. After testing Flink, the department found it fit for crowdsourced user-agent identification and for identifying search sessions via stream processing.

ResearchGate – largest academic social network
ResearchGate has been using Flink since 2014 as one of the primary tools in its data infrastructure, for both batch and stream processing.
It uses Flink for network analysis and near-duplicate detection, to provide a flawless experience to its members.

Alibaba Group – world's largest retailer
Alibaba works with buyers and suppliers through its web portal. A variation of Flink (called Blink) is used by the company for online recommendations. Apache Flink gives it the ability to take into account the purchases being made during the day when recommending products to users. This plays a key role on special days (holidays) when activity is unusually high, and it is an example where efficient stream processing wins over batch processing.

Capital One – Fortune 500 financial services company
As a leading consumer and commercial banking institution, the company faced the challenge of monitoring customer activity data in real time. It wanted this in order to detect and resolve customer issues immediately and to enable a flawless digital enterprise experience. Its legacy systems were quite expensive and offered limited capabilities for this. Apache Flink provided a real-time event processing system that was cost-effective and future-proof enough to handle growing customer activity data.

5. De Jure Versus De Facto
De jure standards, or standards according to law, are endorsed by a formal standards organization. The organization ratifies each standard through its official procedures and gives the standard its stamp of approval. De facto standards, or standards in actuality, are adopted widely by an industry and its customers. They are also known as market-driven standards. These standards arise when a critical mass simply likes them well enough to collectively use them. Market-driven standards can become de jure standards if they are approved through a formal standards organization.

Formal standards organizations that create de jure standards have well-documented processes that must be followed. The processes can seem complex or even rigid, but they are necessary to ensure things like repeatability, quality, and safety. The standards organizations themselves may undergo periodic audits. Organizations that develop de jure standards are open for all interested parties to participate; anyone with a material interest can become a member of a standards committee within these organizations. Consensus is a necessary ingredient. Different organizations have different membership rules and definitions of consensus. For example, most organizations charge membership fees (standards development is not free), which vary quite a bit. And some organizations consider consensus to be a simple majority, while others require 75% approval for a measure to pass.

Because of the processes involved, de jure standards can be slow to produce. Development and approval cycles take time as each documented step is followed. Achieving consensus, while important and good, can be a lengthy activity. This is especially apparent when not all members of the committee want the standard to succeed; for various reasons – often competitive business – some participants in a committee are there to stall or halt the standard. However, once a de jure standard completes the entire process, the implementers and consumers of the standard gain a high level of confidence that it will serve their needs well.

De facto standards are brought about in a variety of ways. They can be closed or open, controlled or uncontrolled, owned by a few or by many, available to everyone or only to approved users. De facto standards can include proprietary and open standards alike.

5.1 Apache Flink as a de-facto standard
Apache Flink is widely used nowadays due to its tremendous new features. It has been adopted by many well-known organizations.
Apache Flink now has a user community of 180 contributors worldwide, more than 10,000 attendees at regular meetups in many cities in Europe, the USA, South America, and Asia, at least 13 companies using it in production, many more research projects and academic institutions, as well as a startup that attracted VC funding of more than 6 million euros. A few statistics about Apache Flink that depict it as a de-facto standard are as follows:

The Apache Flink community
As of May 31, 2016, there are 186 contributors (as reflected on GitHub), 33 meetups worldwide, over 6,300 Apache Flink meetup members, and almost 4,800 meetup members in big-data-related groups where Apache Flink is also a topic of interest.

Community growth rate
The following statistics show the community growth rate of Apache Flink over recent years, a high-rate success story. Here is a summary of community statistics from GitHub, at the time of writing:
- Contributors have increased from 258 in December 2016 to 352 in December 2017 (up 36%)
- Stars have increased from 1,830 in December 2016 to 3,036 in December 2017 (up 65%)
- Forks have increased from 1,255 in December 2016 to 2,070 in December 2017 (up 65%)

Worldwide use of Apache Flink
Statistics on Apache Flink usage worldwide show the USA as the biggest user of Apache Flink, with Germany in second position. (Figures showing meetup locations, worldwide usage, and usage in different organizations are omitted.)

Conclusion
"We are afraid of ideas, of experimenting, of change. We shrink from thinking a problem through to a logical conclusion." – Anne Sullivan

Having analyzed the results obtained, we can conclude that the open source community is growing day by day, and from the discussion above we can see how a particular FLOSS solution (Apache Flink) became a de-facto standard (a custom or convention that has achieved a dominant position by public acceptance or market forces) due to its high acceptance rate and increasing usage worldwide, which in turn drives attempts to create formal acceptance and standards. From the discussion above we learned that de-facto (market-driven) standards are standards that are widely accepted by an industry and its customers, and that these standards come into existence when a critical mass simply considers them good enough to use collectively. From the information and research above we also observed that:
- Market-driven (de-facto) standards can become de jure standards if they are approved through a formal standards organization.
- A standard being a de-facto standard does not mean it is the best. De-facto standards often achieve their status because they were the first to arrive on the market, or because a dominating organization imposes the standard on others, forcing its usage. Often, inferior de-facto standards remain because of the costs involved in attempting to switch to another standard.
So both kinds of standards – de facto as well as de jure – play an important role. We conclude from this report that Apache Flink has reached a level of acceptance high enough to make it a de-facto standard, and that it has also gained formal acceptance from some standards institutes. This particular case study thus shows how Apache Flink reached its status.

References
- Juan Soto, "A Historical Account of Apache Flink™: Its Origins, Growing Community, and Global Impact," Technische Universität Berlin, May 31, 2016.
- "Apache Flink in 2017: Year in Review," Apache Flink project blog.