Social-media users who post comments about brands and products may be considered more real and authentic than other sources, but rumour-spreading is a problem on Chinese social sites. How do you manage this uncertainty?
Through a combination of human filters and analytic engines built from the ground up. There are a number of international companies currently in the social data analytics space, and many claim to be able to analyse Chinese content. However, one question to ask is: 'Are you sure your data engine can analyse the Chinese language?'
In the US in particular, a company may claim it analyses 48 languages, with Chinese one of them. The way they do that is to build their engine around the English language, machine-translate all the other languages into English and then analyse that. The result is lower accuracy, and slang meanings get lost because they can't be machine-translated. Also, a single Chinese character takes its meaning from the phrase around it, so it can't be treated as a standalone keyword the way an English word can.
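The point about single characters needing context can be illustrated with a minimal Python sketch. It is not how any particular vendor's engine works; the two lexicons, their scores and the sample posts are hypothetical and chosen only for illustration. The character 好 on its own means "good", but in 不好 ("not good") and 好贵 ("so expensive") the phrase carries the opposite sentiment.

```python
# Hypothetical toy lexicons; the scores are illustrative only.
CHAR_LEXICON = {"好": 1}                             # "good" as a lone character
PHRASE_LEXICON = {"不好": -1, "好贵": -1, "好": 1}   # "not good", "so expensive", "good"


def char_score(text):
    """Score a post by looking up every character on its own."""
    return sum(CHAR_LEXICON.get(ch, 0) for ch in text)


def phrase_score(text, lexicon=PHRASE_LEXICON):
    """Score a post by greedy longest match, so multi-character phrases win."""
    max_len = max(len(key) for key in lexicon)
    score, i = 0, 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in lexicon:
                score += lexicon[chunk]
                i += size
                break
        else:
            i += 1  # no lexicon entry starts here; move on one character
    return score


# "This phone is so expensive", "The quality is not good", "The quality is good"
posts = ["这个手机好贵", "质量不好", "质量好"]
results = [(post, char_score(post), phrase_score(post)) for post in posts]

for post, by_char, by_phrase in results:
    print(post, by_char, by_phrase)
```

In this sketch the character-level lookup rates all three posts as positive, while the phrase-level lookup rates the first two as negative. A production engine would use proper word segmentation and a far larger lexicon; the sketch only shows why context matters.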
You're in the social-data business, so can you spew me some numbers and facts?
In the vertical of social data analytics, the publishers like Sina and Renren are at the top, and provide data either through open APIs [application programming interfaces] or contracts with data-sourcing distributors. These data distributors like GNIP, Datasift and Socialgist clean up the data, filter it of noise and prepare it for companies, such as ourselves, Salesforce and Social Touch, who will face the brand clients with enriched data.
618 million people are on the internet in China, with approximately 85 per cent actively engaged in social media. In a nutshell, there is a huge data footprint out there. 71 per cent are urban, 29 per cent are rural. As internet penetration grows, the overall number of data touchpoints across social media is going to increase.
And of the 618 million, half of the entire internet population (300 million) in China is shopping online. The fastest-growing online activities, according to CNNIC's Jan 2014 data, are around group buying (+69 per cent) and travel booking (+62 per cent). The size of the market was around US$296 billion in 2013, with estimates that it's going to exceed the US market by 2015 or some say 2017.
The top social networks in China, by the number of monthly active users, are Qzone (625 million), WeChat (355 million), Sina Weibo (143 million), Renren (45 million) and Momo (40 million). When it comes to data analytics, WeChat is a bit of a thorn in our side because it is a closed, private platform. We are trying to figure out how to work with Tencent to see how we can get some breadcrumbs of data.
To put it into perspective, WeChat's user base is 79 per cent of WhatsApp's, and this is coming from a single market, China. For Sina Weibo, a CNNIC source states that 65 per cent of users are creating posts. However, a recent study by a university in Hong Kong found that only about 10 per cent of all Sina Weibo posts are actually original content, with the rest being read or shared by passive users.
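The headline figures quoted in this answer can be cross-checked with simple arithmetic. The Python sketch below uses only numbers stated in the interview; the variable names are made up for illustration, and the WhatsApp figure is merely what the 79 per cent ratio implies, not a number given in the interview.

```python
# All inputs are figures quoted in the interview (in millions, or as shares).
internet_users_m = 618       # people online in China
social_share = 0.85          # share actively engaged in social media
urban_share = 0.71           # share of internet users who are urban
online_shoppers_m = 300      # people shopping online
wechat_mau_m = 355           # WeChat monthly active users
wechat_vs_whatsapp = 0.79    # WeChat user base relative to WhatsApp's

# Derived values (rounded only when printed).
social_users_m = internet_users_m * social_share
urban_users_m = internet_users_m * urban_share
rural_users_m = internet_users_m - urban_users_m
shopper_share = online_shoppers_m / internet_users_m
implied_whatsapp_m = wechat_mau_m / wechat_vs_whatsapp  # implied, not quoted

print(f"Social-media users: about {social_users_m:.0f} million")
print(f"Urban / rural: about {urban_users_m:.0f} / {rural_users_m:.0f} million")
print(f"Online shoppers: {shopper_share:.1%} of internet users")
print(f"Implied WhatsApp user base: about {implied_whatsapp_m:.0f} million")
```

On these inputs, "half of the entire internet population" works out to a little under 49 per cent, which is consistent with the rounded claim.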
What are some challenges in mining social-media data in China?
The challenges are simple to list: cultural and language barriers, and the complexity of the social-media landscape. The way people used to get access to data was to run a whole bunch of automated crawlers, go into social platforms and take snapshots of the social feeds to analyse. There are literally thousands of such products in China, but crawling gets harder when you are talking about millions and millions of posts. And now, over time, Sina and Tencent are forced into finding new revenue streams, and one of those [is] by providing some of their data, but not for free.
This pushes up data costs. Chinese data sources are more expensive than Western ones. Twitter and Facebook say 'take my raw data, go for it', because the more the data is used, the more investment comes in from advertisers and so on. In China that's not the case; it's more like 'use my data, but here's a fee for every single data point'. You can imagine how much the costs go up, because the data-sourcing company will put another margin on top.
But I think this is something that is going to change, because we see Sina and Tencent very often imitating Western counterparts; they are basically just testing the ground. We can assume that costs are going to come down.
Another perception problem for Western brands is data storage. One hesitation we face very often is that if the data is stored in China, the government can deny you access or switch off the data sources.
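To make the cost effect concrete, here is a minimal Python sketch of a per-data-point fee with a distributor margin stacked on top. The function name, the fee and the margin are all hypothetical; the interview gives no actual prices.

```python
def cost_to_client(data_points, fee_per_point, distributor_margin):
    """Platform fee per data point, plus the distributor's margin on top."""
    platform_cost = data_points * fee_per_point
    return platform_cost * (1 + distributor_margin)


# Hypothetical inputs: one million posts, a 0.001 fee per post, a 20% margin.
posts = 1_000_000
free_source = cost_to_client(posts, fee_per_point=0.0, distributor_margin=0.20)
paid_source = cost_to_client(posts, fee_per_point=0.001, distributor_margin=0.20)

print(f"Free raw data: {free_source:.2f}")
print(f"Per-point fee plus margin: {paid_source:.2f}")
```

Because the margin multiplies the platform fee, any per-point charge is amplified by the time the data reaches the analytics company, and the total grows linearly with the number of posts.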