Data & Information Design

I’m not sure why it’s taken me so long to find Giorgia Lupi. Fortunately, serendipity came to my aid and I stumbled across her work almost by accident. And what a find! Anyone who does anything with data and information should read her postings, starting with this one:…

I’ve picked out a few nuggets:

Embrace complexity. What made cheap marketing infographics so popular is probably their biggest contradiction: the false claim that a couple of pictograms and a few big numbers have the innate power to “simplify complexity.”

One size does not fit all. Business intelligence tools and dataviz tools for marketers have led many to believe that the ideal way to make sense of information is to load data into a tool, pick from among a list of suggested out-of-the-box charts, and get the job done in a couple of clicks. This common approach is actually nothing more than blindly throwing technology at the problem, sometimes without spending enough time framing the question that triggered the exploration in the first place. This often leads to results that are not only practically useless, but also deeply wrong, because prepackaged solutions are rarely able to frame problems that are difficult to define, let alone solve.

Sketching with data. In a way, removing technology from the equation before bringing it back to finalize the design with digital tools introduces novel ways of thinking, and leads to designs that are uniquely customized for the specific type of data problems we are working with.

What a refreshing perspective on data and information design. It’s a fairly long article – about a 10-minute read – but well worth it; in fact, worth reading at least twice because there are so many insightful ideas here. If there’s an underlying message, it’s that we should devote time to enhancing our human knowledge and skills for understanding complexity, rather than relying on technology to do it all for us.


Big Data, Data Analytics and AI

Big Data

I was recently asked by the Managing Partners Forum (MPF) to give a brief overview of the current status and industry trends in Big Data and Data Analytics, topics I’ve been keeping an eye on for several years. The slides are available on Slideshare. The following is a shortened abstract from the presentation.

One of the issues I have with Big Data is just that – the term “Big Data”. It’s fairly abstract and defies precise definition. I’m guessing the name began as a marketing invention, and we’ve been stuck with it ever since. I’m a registered user of IBM’s Watson Analytical Engine, and its free plan has a dataset limit of 500 MB. So is that ‘Big Data’? In reality it’s all relative. To a small accountancy firm of 20 staff, their payroll spreadsheet is probably big data, whereas the CERN research laboratory in Switzerland probably works in units of terabytes.

Eric Schmidt (Google) was famously quoted in 2010 as saying “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created in 2 days”. We probably don’t need to understand exactly what an ‘exabyte’ is, but we can get a sense that it’s very big. What’s more, we begin to get a sense of the velocity of information: according to Schmidt, another 5 exabytes arrive every 2 days – and probably in even less time now, since 6 years have passed since his original statement.
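As a back-of-the-envelope check (my own arithmetic, not from the presentation), Schmidt’s figure implies a remarkable sustained data rate:

```python
# Schmidt's claim: 5 exabytes of information created every 2 days.
EXABYTE = 10 ** 18                    # bytes, decimal definition
seconds_in_two_days = 2 * 24 * 60 * 60

rate_tb_per_second = 5 * EXABYTE / seconds_in_two_days / 10 ** 12
print(round(rate_tb_per_second, 1))   # roughly 28.9 terabytes per second
```

Around 29 terabytes of new information every second, if the quote is taken at face value.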

It probably won’t come as a surprise to anyone that most organisations still don’t know what data they actually have, or what they’re creating and storing on a daily basis. Some are beginning to realise that these massive archives of data might hold some useful information that could potentially deliver some business value. But it takes time to access, analyse, interpret and act on the results of that analysis, and in the meantime the world has moved on.

According to the “Global Databerg Report” by Veritas Technologies, 55% of all information is considered to be ‘dark’ – in other words, of unknown value. The report goes on to say that where information has been analysed, 33% is considered to be “ROT” – redundant, obsolete or trivial. Hence the ‘credibility’ gap between the rate at which information is being created and our ability to process and extract value from it before it becomes “ROT”.

But the good news is that more organisations are recognising that there is some potential value in the data and information that they create and store, with growing investment in people and systems that can make use of this information.

The PwC Global Data & Analytics Survey 2016 emphasises the need for companies to establish a data-driven innovation culture – but there is still some way to go. Most of those using data and analytics are focused on the past, looking back with descriptive (27%) or diagnostic (28%) methods. The more sophisticated organisations (a minority at present) use a forward-looking predictive and prescriptive approach to data.

What is becoming increasingly apparent is that C-suite executives who have traditionally relied on instinct and experience to make decisions now have the opportunity to use decision support systems driven by massive amounts of data. Sophisticated machine learning can complement experience and intuition. Today’s business environment is not just about automating business processes – it’s about automating thought processes. Decisions need to be made faster in order to keep pace with a rapidly changing business environment. So decision making based on a mix of mind and machine is now coming into play.

One of the most interesting by-products of this Big Data era is ‘Machine Learning‘ – mentioned above. Machine learning’s ability to scale across the broad spectrum of contract management, customer service, finance, legal, sales, pricing and production is attributable to its ability to continually learn and improve. Machine learning algorithms are iterative in nature, constantly learning and seeking to optimise outcomes. Every time a miscalculation is made, machine learning algorithms correct the error and begin another iteration of the data analysis. These calculations happen in milliseconds, which makes machine learning exceptionally efficient at optimising decisions and predicting outcomes.
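That iterate-measure-correct loop can be sketched in a few lines. This is a generic illustration – a one-parameter least-squares fit by gradient descent, with made-up numbers – not any particular product’s algorithm:

```python
# Data for a one-parameter model y ≈ w * x (illustrative numbers).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w = 0.0    # model parameter, initially wrong
lr = 0.01  # learning rate

for _ in range(1000):
    # Measure the miscalculation: gradient of mean squared error w.r.t. w...
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # ...then correct it and begin the next iteration.
    w -= lr * grad

print(round(w, 2))  # converges to 1.99, close to the true slope of ~2
```

Each pass through the loop measures the current error and nudges the model to reduce it – exactly the “correct and iterate” behaviour described above, just at toy scale.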

So, where is all of this headed over the next few years? I can’t recall the provenance of the quote “never make predictions, especially about the future”, so treat these predictions with caution:

  1. Power to business users: Driven by a shortage of big data talent and the ongoing gap between needing business information and unlocking it from the analysts and data scientists, there will be more tools and features that expose information directly to the people who use it. (Source: Information Week 2016)
  2. Machine generated content: Content that is based on data and analytical information will be turned into natural language writing by technologies that can proactively assemble and deliver information through automated composition engines. Content currently written by people, such as shareholder reports, legal documents, market reports, press releases and white papers are prime candidates for these tools. (Source: Gartner 2016)
  3. Embedding intelligence: On a mass scale, Gartner identifies “autonomous agents and things” as one of the up-and-coming trends, which is already marking the arrival of robots, autonomous vehicles, virtual personal assistants, and smart advisers. (Source: Gartner 2016)
  4. Shortage of talent: Business consultancy A.T. Kearney found that 72% of market-leading global companies had a hard time hiring data science talent. (Source: A.T. Kearney 2016)
  5. Machine learning: Gartner said that an advanced form of machine learning called deep neural nets will create systems that can autonomously learn to perceive the world on their own. (Source: Ovum 2016)
  6. Data as a service: IBM’s acquisition of the Weather Company — with all its data, data streams, and predictive analytics — highlighted something that’s coming. (Source: Forrester 2016)
  7. Real-time insights: The window for turning data into action is narrowing. The next 12 months will be about distributed, open source streaming alternatives built on open source projects like Kafka and Spark. (Source: Forrester 2016)
  8. Roboboss: Some performance measurements can be consumed more swiftly by smart machine managers aka “robo-bosses,” who will perform supervisory duties and make decisions about staffing or management incentives. (Source: Gartner 2016)
  9. Algorithm markets: Firms will recognize that many algorithms can be acquired rather than developed – “just add data”. Examples of services available today include Algorithmia, Data Xu, and Kaggle. (Source: Forrester 2016)

The one thing I have taken away from the various reports, papers and blogs I’ve read as part of this research is that you can’t think about Big Data in isolation. It has to be coupled with cognitive technologies – AI, machine learning or whatever label you want to give it. Information is being created at an ever-increasing velocity. The window is getting ever narrower for decision making. These demands can only be met by coupling Big Data and Data Analytics with AI.


Watson Analytics

I recently had an introductory presentation to IBM’s Watson Analytical Engine and was mightily impressed by what I saw.

IBM Watson is a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data. Unstructured data could typically include  news articles, research reports, social media posts and enterprise system data.

You can set up a freemium account on Watson and get immediate access to the full range of features. As with most freemium services, there are some limits; these come in the form of file size and data storage restrictions. You can only upload flat files of no more than 100,000 rows and 50 columns, and the data storage limit is 500 MB. If you want more than this you have to consider the Personal or Professional editions.
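If you’re scripting around these limits, a quick pre-flight check is easy to write. This is my own sketch using the limits quoted above – the function name and thresholds are illustrative, not an IBM API:

```python
import csv
import os

# Free-plan limits as quoted above (check IBM's current terms).
MAX_ROWS, MAX_COLS, MAX_BYTES = 100_000, 50, 500 * 1024 * 1024

def fits_free_tier(path):
    """Return True if a flat CSV file stays within the free-plan limits."""
    if os.path.getsize(path) > MAX_BYTES:
        return False
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        if len(header) > MAX_COLS:
            return False
        # Count data rows without loading the whole file into memory.
        data_rows = sum(1 for _ in reader)
    return data_rows <= MAX_ROWS
```

Running this before an upload saves the round-trip of having Watson reject an oversized file.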

To get started you will need to set up an IBM id (e.g. your email address) and agree to the Ts & Cs. Nothing ominous here, and you can opt out of any IBM emails.


Once your account has been validated, sign-in and you’ll see the main Watson interface:




To get started I recommend watching the video.

There is a temptation to dive straight in and work your way through the various tools and features. However, not everything is intuitive, and it’s well worth spending some time looking at the various tutorials and help files.  I recommend:

I had a few problems when uploading some of my own “test” datasets, which as I mentioned earlier are limited to 100,000 rows, 50 columns and 500 MB for the free account. If you just want to have a play with the various features, it’s probably better to use one of the tried and tested datasets available from the Watson Analytic Community.

A word of warning – you can get totally immersed in the Watson environment, and I’ve probably lost a day or two somewhere in trying out the technology. However, if your job involves data and decision making, I recommend giving it a go.

Remember too, this is a decision support tool and not a decision-making tool. You still have to engage your brain when looking at the visualisations, and you do need some understanding of your data. And don’t go away thinking that the “Predictions” facility is going to give you the winning numbers for this week’s lottery – but by all means try!


Murdermap Mashup

Murdermap Mashup

Spotted originally by my colleague Conrad Taylor: a geospatial application that plots more than 400 homicide cases drawn from court reports and the Old Bailey’s archives. Something for the ‘gruesome violence’ mashup category, maybe. You can even run deep-dive queries by the type of murder weapon used, e.g. ligature, knife, gun, etc.

According to the website, the ‘murdermap’ project is dedicated to covering every single case of murder and manslaughter in London from crime to conviction. It aims to create the first ever comprehensive picture of homicide in the modern city by building a database stretching from the era of Jack the Ripper in the late 19th Century to the present day and beyond.

Information is obtained from the police, media coverage, court records and original reporting – and by making the map freely available the site’s owners hope to reveal the stories behind the crime figures.

I’m not quite sure of the utility of this data, other than to criminology researchers, though I guess it might be useful for the housing market, e.g. “am I moving to/living in an area where I’m more likely to be shot or stabbed?” Come to think of it, I’ll check that out!

“Maybe it shows there is a fate worse than death – being mashed up afterwards.” CT


Semantic, linked and smart data – predictions for 2014

See the full article on Scoop.it: Data & Informatics

Quite a lot to digest here, though the overall sentiment is positive for development and innovation around open and linked data. Actual products as opposed to proofs, pilots and concepts.

There is also renewed optimism that the Semantic Web can deliver on its original vision: Semantic Web 2.0 (my term), utilising ‘cognition-as-a-service’ (CaaS) and building bridges between ‘Big Data’ and the Semantic Web in order to turn unstructured chaos into higher-level insights.

The following abstract caught my eye:

One less obvious problem is one of information retrieval. Keyword search is now fundamentally broken. The more information is out there, the worse keyword search performs. Advanced query systems like Facebook’s Graph Search or Wolfram Alpha are only marginally better than keyword search. Even conversation engines like Siri have a fundamental problem. No one knows what questions to ask. We need a web in which information (both questions and answers) finds you based on how your attention, emotions and thinking interconnects with the rest of the world.

Sounds good if a little utopian.

Overall, some useful insights in this piece.

Original source:


Introduction to Linked Data (presentation)


I was pleased to attend a presentation on linked data at the BCS Data Management Specialist Group on Tuesday (26th July), given by Dave Reynolds, co-founder of Epimorphics Ltd and one of the data experts I have frequently turned to for advice when scoping the requirements for the Knowledge Hub project. (Dave is a member of the Data & Apps Advisory Group for the Knowledge Hub.)

The presentation included metadata management, e-Commerce uses, inference and information extraction, text mining, syntax (various flavours – RDF/XML, Turtle, RDfa), and knowledge representation through Ontologies (e.g. Web Ontology Language, OWL).

Dave explained a fairly complex topic (well, complex for those not yet fully immersed in modelling information solutions using linked data) in a simple but engaging style, using his slides to show examples of linked data constructs. Well worth a look for anyone who wants to get a deeper understanding of the topic (if nothing else, check out the strengths/weaknesses towards the end of the presentation).
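At its core, linked data reduces everything to subject–predicate–object triples, and querying is pattern matching over those triples. Here is a minimal hand-rolled sketch of that idea (plain Python tuples with invented example URIs; real systems would use an RDF library and SPARQL):

```python
# A tiny graph as subject–predicate–object triples (URI prefixes abbreviated).
triples = [
    ("ex:dave", "rdf:type", "foaf:Person"),
    ("ex:dave", "foaf:name", "Dave Reynolds"),
    ("ex:dave", "ex:coFounded", "ex:Epimorphics"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is ex:dave's name?" -- the essence of a SPARQL triple pattern.
print(match(triples, s="ex:dave", p="foaf:name"))
```

The various serialisation syntaxes mentioned above (RDF/XML, Turtle, RDFa) are just different ways of writing down the same triple structure.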

The slides are available from SlideShare: Introduction to linked data, and a copy is embedded below.


Knowledge Hub Data and Apps Workshop

This blog post is to thank all of the participants (presenters and delegates) at the Knowledge Hub Data & Apps workshop held in London yesterday (27 April 2011). The workshop was used to establish the foundations for the “KHub Data and Apps Advisory Group”, which we hope will help us shape the forthcoming data/apps developments for the Knowledge Hub.

As readers of my previous posts about the Knowledge Hub may be aware, the first (Beta) release will go live next month (May – exact date TBD). This represents the completion of Sprint 9 of 22, which delivers the collaboration tools and facilities (blogs, wikis, library, events, people-finder, web conferencing, activity streams, etc.). [NB. Sprints are the functional elements delivered as part of an agile development process.]

The remainder of the Sprints will be delivering key data intelligence/data management features, including:

1. Semantic Matching Engine

  • Will match aggregated conversations, communities and topics to people;
  • Will suggest connections between people
  • Will recommend content according to explicit and implicit profile data

2. Data library/catalogue

  • Can upload data/datasets in semi-structured and machine readable formats (e.g. Excel, CSV,  XML)
  • Can identify and catalogue external (e.g. open and/or linked) datasets
  • Ability to create/edit metadata for each dataset (e.g. for provenance, licensing etc.)
  • Datasets can be permissioned.
  • Datasets will be indexed by the KHub search engine

3. Mashup Engine

  • Allows users to combine or compare data (meaningful comparisons will require a common schema)
  • Data can be ‘mashed’ using KHub-sourced data and external data sources.
  • Support for data visualisations
  • Features similar to
  • Will use open source mapping services
  • Potential to provide index of SPARQL end-points

4. App Store

  • Supports any app compliant with the OpenSocial standard
  • Mashups developed on KHub can be simply added to the App Store
  • Will include reviews and star ratings
  • Support for free and commercial (licensed) apps
  • Apps will be able to use data from KHub (via an API) and/or external sources

Data Repository

  • Requirements to be refined, but the intention is to support triple stores (RDF/SPARQL) and XML (XQuery)
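To illustrate the “common schema” caveat in the Mashup Engine list above: combining datasets only yields meaningful comparisons when they share a key. A minimal sketch, with two invented datasets joined on a shared area code (the field names and figures are purely illustrative):

```python
import csv
import io

# Two hypothetical datasets sharing a common key ("area_code").
population_csv = "area_code,population\nE01,12000\nE02,8500\n"
spend_csv = "area_code,spend_gbp\nE01,340000\nE02,150000\n"

def rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

# Mash up: join on the shared key, then derive spend per head.
pop = {r["area_code"]: int(r["population"]) for r in rows(population_csv)}
mashup = [
    {"area_code": r["area_code"],
     "spend_per_head": int(r["spend_gbp"]) / pop[r["area_code"]]}
    for r in rows(spend_csv)
]

for row in mashup:
    print(row["area_code"], round(row["spend_per_head"], 2))
```

Without the shared `area_code` column there would be nothing to join on – which is why the mashup engine can only produce meaningful comparisons over a common schema.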

All of the above is scheduled to be developed and released between June and October this year. The Data & Apps Advisory Group will be instrumental in shaping these features and capabilities, as well as providing advice on the underlying support and operational procedures, and skills/training needs.

Initial outputs from the workshop are available on the Knowledge Hub Community of Practice (Data and Apps Advisory Group Theme).

The Terms of Reference for the Data & Apps Advisory Group are in the attached PDF. If anyone with the appropriate skills and knowledge wishes to be involved in this group, please let me know (add your expression of interest in the comments section of this blog).

I will post an update to this blog once the full report from the workshop is available.

Data & Apps Advisory Group ToR

Ordnance Survey datasets and products available for free use and re-use from 1 April 2010

On 23 December 2009, the Government published a consultation paper on policy options for geographic information from Ordnance Survey. The purpose of this consultation was to seek views about how to best implement proposals made by the Prime Minister on 17 November 2009, to make certain Ordnance Survey datasets available for free with no restrictions on re-use. This was part of the PM’s vision for the role of public data and information in the delivery of Smarter Government that would empower citizens with better public services and a thriving private sector market based on the data that government produces.

A response to the Government consultation on making Ordnance Survey (mapping) datasets available for use and re-use is available on the CLG website.

Key points from the consultation are:

A package of datasets will be made freely available to the public and will be released under the product name OS OpenData™.

The datasets that are released as part of OS OpenData will continue to be maintained by Ordnance Survey to a high and consistent standard. To ensure the product set remains relevant and continues to fulfil its objectives, it is envisaged that this product set will be reviewed periodically by an expert panel appointed by government and reporting to CLG Ministers.

OS OpenData will include:

• OS Street View®
• 1:50 000 Gazetteer
• 1:250 000 Scale Colour Raster
• OS Locator™
• Boundary-Line™
• Code-Point® Open
• Meridian™ 2
• Strategi®
• MiniScale®
• OS VectorMap™ District (available 1 May 2010)
• Land-Form PANORAMA®

The OS VectorMap™ District dataset is a new product, available in both raster and vector formats. It is a flexible and customisable product designed specifically for use on the web, enabling developers to select, customise and modify maps to their specific requirements.

OS OpenData products will be available from 1 April 2010 in hard media and as an on-line service at

In addition, OS OpenData will include an on-line viewing service for a selection of the OS OpenData topographic products.

This initiative continues the trend of making public data public (over 30,000 datasets are now available through the portal) and will no doubt spawn a whole new raft of innovative mashups, widgets and apps from social innovators. Exciting times!
