Smart Way to Aggregate and Navigate Heterogeneous Data in Handheld Devices

Abstract—Smart Content is similar to Semantic Web service composition, which can be defined as the process of generating an aggregated service by integrating independently available component services to satisfy a client request that no single available service can satisfy. Service composition is therefore needed to aggregate service components according to the requested task. This work presents a framework for tag-based Smart Content service composition, together with its various models. We address the search of heterogeneous content stored on a mobile phone's SD card, which is hard to discover because of the variety of content and file types. We develop a tag-based repository that can be searched for a desired keyword and returns the file(s) containing that keyword. This paper presents an implementation of a Smart Content Tag Repository, focusing mainly on Android mobile phones. We also present a comparative study of smart content aggregation and navigation technologies on various platforms.

Keywords—smart content, tagging, heterogeneous data discovery, content management system, CMS

I. INTRODUCTION

The growing amount of information dispersed across different sources is an increasing problem for state-of-the-art information management. Solving it effectively demands new approaches and tools that focus strongly on content semantics and are supported by automation and intelligence.

Smart Content may be considered the bridge between businesses that make significant use of data analytics and businesses that deliver actionable expert opinion (1). This bridge is realized by applying analytics and semantics to:
a) generate new products and services with an up-to-date user experience;
b) drive better usability of existing products, increase discoverability, and provide unified access, interoperability, and integration with search results;
c) realize new dynamic monetization models.

II. SCAN (SMART CONTENT AGGREGATION AND NAVIGATION)

SCAN aims to solve the major problems of content organization and findability in the era of information overload. SCAN aggregates content from different sources into a single document collection. This repository keeps a record of each document in its original format and at its original location. Every document record contains metadata (data about data) with editable properties such as title, description, author, and creation date. Document content is indexed for search and text analysis. Documents can be searched either with simple text queries or with special forms that build complex queries over document text and properties. Queries can be saved for repeated use.

Figure 1.1 Tag Cluster (courtesy of SourceForge)

SCAN's text analysis mechanism simplifies tagging. It analyzes document content and suggests the most relevant words as candidate tags, so manual tagging becomes as simple as selecting tags from the proposed candidates. It can also take over the whole tagging process, either by automatically assigning tags to documents or by finding the documents relevant to a specific tag. Another text analysis application is searching for documents similar to a given one (search by pattern).

How Smart Content Aids Distributed Collaboration

Many organizations outsource content creation and review because they see documentation as the bottleneck of the process. One challenge of distributed collaboration is that occasional contributors are often overlooked, so structured editing tools remain unfamiliar to them. It therefore makes sense to simplify the editing process and tools for infrequent users, who need structured editing tools designed explicitly for them.
These collaboration tools need to be intuitive and easy to learn, accessible from almost anywhere, and affordable, so that a larger number of users can take part in content management. The editor can be delivered in one of two ways: as a plug-in to a word processor such as MS Word, or through a thin-client browser. Some environments support both approaches for traditional structured editing tools. The modularity and enrichment of smart content allow flexibility in editing tools and process design, which in turn supports collaboration among the various user groups within an enterprise.

Smart Content and the Pull of Search Engine Optimization

Despite limited resources, a documentation group began to add search metadata to its product manuals. With DITA (8), a predefined structure for topics already existed and was used to define sections, chapters, and manuals. Authors and editors could simply include additional tagged metadata that identified and classified the content, thereby exposing the information to Google and other web-wide search engines.

What does this mean for developing smart content and leveraging the benefits of XML tagging? The more precise the content enrichment, the more findable the information will be. When considering the business benefits of search engine optimization, the quality of the tagging can always be improved over time; as a simple value proposition, getting started is the critical first step.

Using Cloud for Online Engagement

One paper (9) provides a guide for digital marketers, experienced IT staff, and business analysts who have decided to deploy these technologies outside the server room. It sets out to answer the following questions:
1. What do we mean by the cloud? There is a great deal of hype, sales, and marketing messaging around “the cloud”; the paper explores what it really is and the opportunities it represents for digital marketers.
2. What are the development options when working with a cloud platform partner? Deciding whether to deploy to the cloud amounts to choosing between hosting inside or outside the server room.
3. What do organizations need to look for in a Web Engagement Management (WEM) solution in the cloud? If deploying into the cloud is attractive to an organization, cloud deployment should become a key attribute of its selection criteria when choosing a solution.

Software Architecture Proposal

We propose a possible software architecture for the whole workflow, from content authoring through the application-specific interpretation of a portable content bundle. A CMS seems to be the right tool for this job, so we propose a back-end Content Management System with a restricted set of content types. Authors can use such a CMS to create descriptive content, organize the flow, upload media, and organize the navigation. The CMS can also handle resizing and cropping of images, transcode video and audio to a standard set of formats, and present structured input fields for creating repeatable content. In addition, it can provide a simple web-based rendering of the content and navigation, allowing the content author to “try out” and tweak the tour before bundling it into a platform-independent content package (11).

Algorithm:

An effective search algorithm is used to tag files based on the occurrences of words in each file. For context, the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006 are C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART (18). The derived algorithm is as follows:
1. Put the words to be searched for in a hashtable, with the words as keys and the values initialized to 0.
2. Iterate over the words of the text and, for each word:
a. if the word is already a key in the hashtable, increment its value by 1;
b. if it is not a key, skip it, since only the search words entered in step 1 are tracked.
3. Iterate over the hashtable and select the entries with non-zero values: their keys are the matched words and their values are the counts.
4. List the keys with the top 10 values.
5. Store these keys in the project repository as tags.
6. Save the file location corresponding to each key.

Working Model

Figure 2.1 Tag Search Process
Figure 2.2 Auto-tag Process

Key features of “MOBISCAN”:
Indexing:
• Analyzing content according to language-specific rules for stem extraction and stop-word filtering
• Caching parsed documents
• Guessing basic metadata properties (title, description) from document content
Search:
• Basic full-text quick search
• Advanced search on both document text and properties
Tagging and text analysis:
• Manual tag assignment and editing
• Text analysis functions for extracting and suggesting relevant tags
• Automated document tagging based on the text analysis mechanism
• Optional transparent auto-tagging of new or modified documents (at the indexing stage)
• Automated assignment of a tag to the relevant documents (“tag auto-population”)
• Navigating the document collection with a “tag cloud”
• Finding groups of related tags and highlighting them dynamically

CONCLUSION AND FUTURE SCOPE:

From the literature survey and case studies, it can be concluded that this work can be integrated with the semantic web, ontologies, the cloud, and other business structures to manage and fetch documents on the go, reducing the need for documentation at the end. Some directions for future work are discussed below.

The SCAN platform can easily be extended with plug-ins for new document formats, document locations (RSS feeds, websites, e-mail, etc.), and language analyzers. Whole new areas of functionality can be added with user interface extensions.
An example of such an extension is a plug-in that browses the repository with a calendar (grouping documents by their creation dates). Future work may include:
1. Multilingual technologies and applications, XML, standards, integration, content migration, mobile, search, open source, SaaS, semantic technologies, social software, SharePoint, XBRL, and relevant consumer technologies.
2. Business-oriented applications of SCAN technology, and the design and development of a server version for small and medium enterprise networks.
3. Integration with cloud computing, by implementing and integrating these ideas with content technology.

At the front end of the content lifecycle, the ability to create digital documents with software hosted in the cloud has existed for several years (e.g., Google Docs and Zoho Docs). More recently, technologies have been introduced that allow multiple authors to collaborate simultaneously on the same document (e.g., Google Wave). In the future, Cloud Content Management platforms could provide services that let users scan documents, shoot photos, and record audio or video, then tag the resulting digital files and send them to a managed, active repository hosted in the cloud. In fact, these capabilities already exist as mobile applications on smartphones; the next logical step would be to provide them on other computing devices (10).

There is, of course, potential for Cloud Content Management solutions to provide even more process-based control of active content. Common content collaboration patterns that are now treated as ad hoc workflows could be codified into standardized, yet flexible, rule-based processes. For example, if an individual could subscribe to a specific content tag, he or she could be automatically notified and sent a link whenever a new, publicly accessible file bearing that tag appeared in the Cloud Content Management system.
In effect, this would automate the sharing of that file, replacing the all-too-common practice of manually emailing it to a poorly constructed and poorly maintained distribution list. It is important, however, that future work on automating content collaboration patterns not come at the expense of simplicity and ease of use.
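The tag-subscription idea from the future-work discussion can be sketched as a small publish/subscribe service. This is a minimal, in-memory illustration under stated assumptions, not part of the described system: the class name TagSubscriptionService, the notify callback (standing in for e-mail or push delivery), and the public flag (modelling "publicly accessible") are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, Set

# Hypothetical notification hook: (subscriber, tag, link) -> None.
Notifier = Callable[[str, str, str], None]


class TagSubscriptionService:
    """Sketch of tag subscriptions for a hypothetical Cloud Content Management system."""

    def __init__(self, notify: Notifier) -> None:
        # Tag (lower-cased) -> set of subscribed users.
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        self._notify = notify

    def subscribe(self, user: str, tag: str) -> None:
        self._subscribers[tag.lower()].add(user)

    def unsubscribe(self, user: str, tag: str) -> None:
        self._subscribers[tag.lower()].discard(user)

    def publish(self, link: str, tags: Iterable[str], public: bool = True) -> int:
        """Announce a new file; return how many subscribers were notified."""
        if not public:
            # Only publicly accessible files trigger notifications.
            return 0
        notified: Set[str] = set()
        for tag in tags:
            # .get avoids creating empty entries for tags nobody follows.
            for user in sorted(self._subscribers.get(tag.lower(), set())):
                if user not in notified:
                    # Each subscriber receives one link per file, even if several tags match.
                    self._notify(user, tag, link)
                    notified.add(user)
        return len(notified)
```

Deduplicating per file reflects the goal stated above of replacing manual distribution lists: a subscriber who follows two of a file's tags still receives a single link.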
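The tag-extraction algorithm given in the Algorithm section (steps 1 to 6) can be sketched in Python as follows. This is an illustrative sketch, not the authors' Android implementation. The names extract_tags and TagRepository, the regular-expression tokenizer, the alphabetical tie-breaking rule, and the in-memory dictionary standing in for the project repository are all assumptions. Search words are assumed to be single tokens.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumed tokenizer: lower-case runs of letters, digits and apostrophes.
_WORD_RE = re.compile(r"[a-z0-9']+")


def extract_tags(text: str, search_words: Iterable[str], top_n: int = 10) -> List[str]:
    """Return up to top_n search words that occur in text, most frequent first."""
    # Step 1: hashtable with the search words as keys, counts initialized to 0.
    counts: Dict[str, int] = {word.lower(): 0 for word in search_words}
    # Step 2: walk the words of the text; increment only words that are keys.
    for word in _WORD_RE.findall(text.lower()):
        if word in counts:
            counts[word] += 1
    # Step 3: keep the matched words, i.e. entries with a non-zero count.
    matched = [(word, count) for word, count in counts.items() if count > 0]
    # Step 4: rank by count (ties broken alphabetically) and keep the top_n keys.
    matched.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matched[:top_n]]


class TagRepository:
    """Hypothetical in-memory tag repository mapping each tag to file locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, Set[str]] = {}

    def add_file(self, location: str, text: str, search_words: Iterable[str], top_n: int = 10) -> List[str]:
        tags = extract_tags(text, search_words, top_n)
        # Steps 5 and 6: store each tag together with the file's location.
        for tag in tags:
            self._locations.setdefault(tag, set()).add(location)
        return tags

    def search(self, keyword: str) -> List[str]:
        """Return the locations of files tagged with keyword (case-insensitive)."""
        return sorted(self._locations.get(keyword.lower(), set()))
```

For example, adding a file whose text is "Android phone photos; phone battery and photos backup" with the search words android, phone, photos, battery and cloud yields the tags phone, photos, android, battery; cloud is dropped because its count stays at 0. A later search for "photos" then returns that file's location.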