A new legal text analysis company named Illocution has compiled an interesting lexicon of English tweets which is free to download and use.

Illocution’s lexicon lists the most common and least common bigrams (2-word combinations) that make up today’s tweets.  Illocution analyzes a million tweets per month, so it’s probably a reliable snapshot of the English-speaking Twitterverse.

Lexicons help us identify so-called “trash phrases” – word combinations so commonly used that they don’t carry much signal or meaning.  These high-frequency/low-meaning phrases are often sources of noise that we can filter out of our analysis.  Lexicons are also useful for targeting the high-signal/high-volume phrases to prioritize when looking for needs (after considering the domain and other factors).

Copied below is a list of the top 100 most-mentioned two-word phrases used on Twitter during the period 12/2011 to 2/2012.   Note that hashtags, @names and links are not in this list – they were deleted from the sample tweets before processing.
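
As a rough illustration of the preprocessing described above, here is a minimal Python sketch that strips links, hashtags and @names and then counts two-word phrases. It is not Illocution's method, which the lexicon does not describe; the function names, the regular expressions and the sample tweets are all assumptions made for this example, and the sketch lets bigrams span punctuation for simplicity.

```python
import re
from collections import Counter

# Patterns for the items the post says were removed before processing.
URL_RE = re.compile(r"https?://\S+")
TAG_OR_NAME_RE = re.compile(r"[#@]\w+")
# Words may contain straight or curly apostrophes (e.g. "don't", "i’m").
WORD_RE = re.compile(r"[a-z0-9]+(?:['’][a-z0-9]+)*")


def tokenize(tweet):
    """Lowercase a tweet, drop links, hashtags and @names, return its words."""
    text = URL_RE.sub(" ", tweet.lower())
    text = TAG_OR_NAME_RE.sub(" ", text)
    return WORD_RE.findall(text)


def bigram_counts(tweets):
    """Return (occurrences, tweets_containing) counters keyed by bigram."""
    occurrences = Counter()
    tweets_containing = Counter()
    for tweet in tweets:
        words = tokenize(tweet)
        grams = [" ".join(pair) for pair in zip(words, words[1:])]
        occurrences.update(grams)            # every occurrence counts
        tweets_containing.update(set(grams))  # each tweet counts once
    return occurrences, tweets_containing


# Hypothetical sample tweets, invented for this example.
sample = [
    "I need coffee right now #monday",
    "@sam I need a nap, I need it right now http://example.com/x",
    "Good morning! I love this",
]
occurrences, tweets_containing = bigram_counts(sample)
for gram, count in occurrences.most_common(3):
    print(gram, count, tweets_containing[gram])
```

The two counters are one plausible reading of "how often a phrase appears" versus "how many tweets contain it"; they are not guaranteed to match the definitions behind the columns in the table below.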

Below, I’ve highlighted the n-grams that are obvious markers for needs, complaints, questions and the like.  Notice how frequently people use the phrases “I love”, “I want”, “I need”, “I think”, and “I hate”.  Out of millions of possible 2-word combinations, these phrases rank in the top 100.

This analysis provides clear evidence that people spend a lot of time talking about their needs, desires and opinions on Twitter.

What is your organization doing to find needs expressed in social media that you can meet – right now?

 

 rank   gram                    freq count        freq%  tweet count       tweet%
---------------------------------------------------------------------------------
    1:  in the                      61,882       0.2249      433,605      14.5872
    2:  to be                       38,798       0.1410      285,429       9.6023
    3:  of the                      38,560       0.1402      287,309       9.6656
    4:  i love                      38,176       0.1388      269,417       9.0637  
    5:  on the                      32,884       0.1195      238,739       8.0316
    6:  if you                      31,371       0.1140      299,150      10.0639
    7:  to the                      30,447       0.1107      227,295       7.6466
    8:  i just                      30,384       0.1104      294,692       9.9139  
    9:  i don’t                     29,489       0.1072      282,865       9.5161
   10:  for the                     28,167       0.1024      205,523       6.9141
   11:  i have                      27,870       0.1013      269,376       9.0623
   12:  going to                    26,240       0.0954      216,162       7.2721
   13:  i want                      24,791       0.0901      195,554       6.5788  
   14:  to get                      24,321       0.0884      183,412       6.1703
   15:  i need                      23,757       0.0863      181,887       6.1190 
   16:  want to                     23,333       0.0848      201,774       6.7880
   17:  in my                       22,671       0.0824      152,183       5.1197
   18:  on my                       22,202       0.0807      147,351       4.9571
   19:  is a                        21,841       0.0794      169,960       5.7177
   20:  have a                      21,319       0.0775      174,108       5.8573
   21:  need to                     21,090       0.0767      176,239       5.9290
   22:  i hate                      20,956       0.0762      199,094       6.6979  
   23:  i was                       20,928       0.0761      203,952       6.8613
   24:  in a                        20,883       0.0759      153,850       5.1758
   25:  i think                     20,099       0.0731      188,764       6.3503  
   26:  at the                      19,759       0.0718      149,926       5.0438
   27:  to go                       19,528       0.0710      137,493       4.6255
   28:  for a                       19,488       0.0708      144,655       4.8664
   29:  i can                       19,170       0.0697      160,669       5.4052
   30:  i am                        19,083       0.0694      160,207       5.3896
   31:  is the                      19,045       0.0692      153,710       5.1711
   32:  when i                      18,757       0.0682      193,264       6.5017
   33:  and i                       18,484       0.0672      159,723       5.3734
   34:  rt i                        18,243       0.0663      200,219       6.7357
   35:  rt rt                       17,844       0.0649      155,462       5.2300
   36:  but i                       17,671       0.0642      150,712       5.0702
   37:  go to                       17,552       0.0638      124,625       4.1926
   38:  this is                     17,401       0.0632      147,500       4.9622
   39:  right now                   17,345       0.0630       69,145       2.3262
   40:  the best                    17,291       0.0628      120,267       4.0460
   41:  have to                     16,986       0.0617      147,550       4.9638
   42:  to do                       16,585       0.0603      105,056       3.5343
   43:  i got                       16,072       0.0584      142,080       4.7798
   44:  like a                      15,875       0.0577      104,296       3.5087
   45:  i can’t                     15,819       0.0575      134,239       4.5160
   46:  if i                        15,758       0.0573      177,054       5.9564
   47:  i will                      15,546       0.0565      131,799       4.4339  
   48:  i know                      15,511       0.0564      144,668       4.8669 
   49:  when you                    15,486       0.0563      161,924       5.4474
   50:  all the                     15,230       0.0554      124,212       4.1787
   51:  i’m at                      15,047       0.0547      119,909       4.0339
   52:  of my                       14,987       0.0545      111,370       3.7467
   53:  to my                       14,037       0.0510      103,702       3.4887
   54:  with the                    13,989       0.0508       96,407       3.2433
   55:  love you                    13,932       0.0506       71,155       2.3938
   56:  a good                      13,569       0.0493       89,105       2.9976
   57:  with my                     13,222       0.0481       82,653       2.7806
   58:  to me                       13,152       0.0478       77,369       2.6028
   59:  i feel                      13,007       0.0473      109,739       3.6918 
   60:  out of                      12,856       0.0467       90,943       3.0595
   61:  do you                      12,600       0.0458      106,886       3.5958
   62:  you are                     12,478       0.0454       99,898       3.3607
   63:  to see                      12,456       0.0453       95,072       3.1984
   64:  how to                      12,328       0.0448       99,411       3.3444 
   65:  will be                     12,298       0.0447       94,428       3.1767
   66:  be a                        12,261       0.0446       86,509       2.9103
   67:  i get                       11,972       0.0435      100,867       3.3933
   68:  with a                      11,920       0.0433       89,069       2.9964
   69:  you can                     11,863       0.0431      102,885       3.4612
   70:  i wanna                     11,846       0.0431       95,987       3.2292
   71:  so much                     11,592       0.0421       75,002       2.5232
   72:  that i                      11,483       0.0417      105,736       3.5571
   73:  to sleep                    11,415       0.0415       56,539       1.9021
   74:  i’m not                     11,392       0.0414      101,679       3.4207
   75:  me and                      11,343       0.0412      101,729       3.4223
   76:  for me                      11,285       0.0410       60,976       2.0513
   77:  you know                    11,152       0.0405      107,947       3.6315
   78:  you have                    11,139       0.0405      104,274       3.5080
   79:  it was                      11,112       0.0404       80,737       2.7161
   80:  i wish                      11,000       0.0400      111,581       3.7538 
   81:  back to                     10,973       0.0399       68,410       2.3014
   82:  a new                       10,959       0.0398       75,215       2.5304
   83:  and the                     10,927       0.0397       84,607       2.8463
   84:  feel like                   10,923       0.0397       86,133       2.8977
   85:  so i                        10,882       0.0396      105,857       3.5612
   86:  i miss                      10,784       0.0392       68,308       2.2980 
   87:  a video                     10,667       0.0388       77,032       2.5915
   88:  one of                      10,566       0.0384       97,403       3.2768
   89:  the same                    10,498       0.0382       67,070       2.2564
   90:  o o                         10,398       0.0378       28,221       0.9494
   91:  my life                     10,238       0.0372       58,512       1.9684
   92:  it is                       10,109       0.0367       75,708       2.5469
   93:  good morning                10,104       0.0367       48,747       1.6399
   94:  on a                        10,074       0.0366       77,647       2.6122
   95:  more for                     9,856       0.0358       20,540       0.6910
   96:  the world                    9,848       0.0358       62,684       2.1088
   97:  happy birthday               9,791       0.0356       59,355       1.9968
   98:  to make                      9,781       0.0356       75,778       2.5493
   99:  you want                     9,774       0.0355       73,538       2.4739
  100:  i really                     9,730       0.0354       93,392       3.1419 
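
The need and opinion markers called out above can be turned into a very simple filter. The sketch below is only an illustration of that idea in Python, not how NeedTagger or Illocution work; the marker list is taken from the phrases named in this post, while the function name and the sample tweets are assumptions made for this example.

```python
import re

# Need/opinion markers taken from the phrases called out in this post.
NEED_MARKERS = ("i love", "i want", "i need", "i think", "i hate")

# Match a marker as whole words, allowing any whitespace between the words.
MARKER_RE = re.compile(
    r"\b(" + "|".join(m.replace(" ", r"\s+") for m in NEED_MARKERS) + r")\b",
    re.IGNORECASE,
)


def find_markers(tweet):
    """Return the marker phrases found in a tweet, normalized to lowercase."""
    return [" ".join(m.lower().split()) for m in MARKER_RE.findall(tweet)]


# Hypothetical sample tweets, invented for this example.
tweets = [
    "I need a new phone and I hate my carrier",
    "In the morning",
    "i  WANT pizza",
]
for tweet in tweets:
    print(find_markers(tweet), "<-", tweet)
```

A phrase match like this only flags candidates. Deciding whether a flagged tweet expresses a need you can actually meet still depends on the domain and other factors.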

A recent briefing by Bain and Co. gives a great top-down view of the state of social business in the enterprise today.  It makes a strong case that early adopters of social media are seeing significant bottom-line results, but that we are still early in this journey.

Bain also cautioned readers about relying too much on sentiment analytics, given the infancy of the field.  We agree (note: NeedTagger uses a very different type of technology, which isn’t the same as sentiment analysis.  We’ll publish a post on the differences soon).

Bain surveyed more than 3,000 of their clients to identify what makes social media effective and whether it’s worth the growing investment many companies are making this year.

Here are some of the takeaways:

  • the average billion-dollar company spends $750,000 a year on social media
  • some leaders spent in the tens of millions per year on their social media programs
  • customers who engage with companies over social media spend 20 percent to 40 percent more money than other customers (see figure below)
  • socially-engaged customers demonstrate a deeper emotional commitment to the companies, as evidenced by a Net Promoter® score (NPS®), a common measure of customer loyalty, that is 33 points higher on average.
  • Bain believes that the greatest long-term value of social media is in “closing the data loop”, i.e., collecting data about customer engagement and then using that data to drive the business forward more intelligently.

[Figure: putting-social-media-to-work-figure-01.jpg]

 

Looking forward, the top-of-mind questions asked by Bain’s clients include:

  • What is the business case for investing further in social media?  Where and how much should we invest?
  • Fundamentally, how much is consumer behavior changing?  What are the biggest opportunities and threats?
  • How aggressively are my competitors investing in these tools, and are they capturing differential advantage?
  • What are the best practices in deploying social media strategies?  What are the pitfalls to avoid?
  • Should we build or buy our own “community” or partner with one of today’s leading platforms? Or both? Where should we place our bets?
  • How should we organize and coordinate our efforts? Across brands? Across business units? Across geographies?
  • How should we measure results? How do we know whether we are creating real business impact?

It’s a great read that makes a CEO-level case for investing in social media monitoring technology.

 

As most social marketing professionals know, the signal-to-noise ratio in social media streams can be awful.   For example:

  • By some estimates, over half of Twitter accounts are either ciphers or bots.
  • Content farms, auto-blogs, tweetbots and splogs continue to churn out duplicate content at an amazing rate.
  • Over 90 percent of social media users have experienced spam.
  • The Wall Street Journal recently reported that spam makes up about 1.5% of Twitter posts and 3-4% of Facebook posts.
  • Social media is rife with user-duplicated content caused by cross-posting, retweeting and forwarding.

Yet even in the face of these facts, we continue to ingest and process data streams without filtering for quality, and then make decisions based upon the data we extract – even though we know the quality is suspect.  Should we really keep making decisions this way?

Noise is also a serious time waster.  As any social community manager will tell you, social media noise can be a real productivity issue for anyone involved in social media marketing, customer service and social media outreach.   And filtering takes time:  most social managers are not linguistics experts, so getting their filters right takes a lot of trial and error using the relatively crude keyword filters available in today’s leading social media monitoring systems (SMMS).

What’s That Noise?

Noise in social media has three main sources:

  • fake/duplicate accounts:  a large number of social media accounts are not owned or managed by a real person.  Fake accounts mess with your reach and follower/friend statistics.
  • spam: unsolicited & poorly targeted marketing messages, phishing scams and malware injection attempts.  Impacts your sentiment analytics and keyword/brand trends.
  • duplicate content:   splogs, content farms, autoblogs, tweetbots, retweeting, cross-posting, forwarding, et al.   Can seriously skew your reach and mention statistics.
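
To make the duplicate-content problem concrete, here is a minimal Python sketch that collapses retweets and cross-posts of the same message before counting mentions. It is an illustration under stated assumptions, not a description of any vendor's filter: the normalization rules, the function names and the sample posts are all invented for this example, and real streams would need fuzzier matching.

```python
import re

# Leading "RT @name:" chains, as left by manual retweets (assumed format).
RT_PREFIX_RE = re.compile(r"^(?:rt\s+@\w+:?\s*)+")
URL_RE = re.compile(r"https?://\S+")


def fingerprint(post):
    """Normalize a post so retweets and cross-posts compare equal."""
    text = post.lower().strip()
    text = RT_PREFIX_RE.sub("", text)  # drop leading "rt @name:" chains
    text = URL_RE.sub("", text)        # shortened links can differ per share
    return " ".join(re.findall(r"\w+", text))


def drop_duplicates(posts):
    """Keep the first post for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for post in posts:
        key = fingerprint(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


# Hypothetical sample posts, invented for this example.
posts = [
    "Win a free iPad! http://a.example/1",
    "RT @bob: Win a free iPad! http://b.example/2",
    "rt @amy rt @bob: win a FREE ipad",
    "My phone died again",
]
print(drop_duplicates(posts))
```

Counting mentions on the de-duplicated list rather than the raw stream is one simple way to keep retweets and cross-posts from inflating reach and mention statistics.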

Unfortunately, most leading social media analytics and social media monitoring software (SMMS) vendors do not offer effective filters to protect you from the issues above.  And you won’t find social media gurus and strategists talking about the data quality issue – probably because (a) no one has offered a bullet-proof solution; (b) it’s a boring topic; or (c) this is a technical problem that most marketing-trained social strategists simply don’t have the training or experience to address.

Solutions Are On The Way

The good news is that there are many efforts underway to help businesses extract more signal from social media streams.  For example,

  1. influence scoring systems such as PeerIndex, Klout and Kred make it easier to weed out less influential accounts from your analysis and engagement streams.  Unfortunately, they also weed out the new accounts of people who are forming their initial impression of what social media – and you – can do for them.
  2. Twitter has made huge strides in controlling spam since it started focusing on the problem in 2008, reducing the spam rate from over 20% to less than 2% today.  It currently has a team of over ten specialists who work full time to combat spam and user security threats.  In January 2012, it acquired Dasient, a leading real time web security company.
  3. Facebook is currently blocking over 200 million malicious actions each day.  A site integrity team of 30 engineers exists, and another team of 46 people works on site security.  Yet another team of 300 deals with user issues.
  4. consultancies and agencies with platforms have added advanced filtering capabilities to their service offerings.  Converseon and Dachis Group are good examples.
  5. intelligent real time data processing vendors such as OpenAmplify, Solariat and Datasift provide data processing services that can be configured to filter your streams – if you have a detailed understanding of the language in the domain you are analyzing.  And, you will need to get their data into a business application to use it.
  6. “insight as a service” providers (credit to  for the label) have arrived on the scene, including Acteea, 9Lenses, JBara and Totango.  Data analysis SaaS providers Host Analytics, 8thbridge and Dachis Group have also created insight analysis offerings to complement their existing software solutions.
  7. SMMS vendors Visible and Attensity have proven text processing and sentiment analysis baked into their offerings.  Radian6 recently added Insights, which offers a range of third party semantic processing and NLP filtering add-ons for their analytics package.
  8. need-detection services such as NeedTagger are helping non-technical marketing and customer service professionals see past the noise to find people in need of their products, services and content.  These vendors combine next-generation language filters, user-friendly applications and marketing analytics into a single business service.

Don’t Wait To Begin Filtering

With over 1 billion social media accounts attracting the interest of spammers and content farms, companies must start filtering and paying attention to the quality  of their data soon – or “social media analytics” could become a bit of a joke.   In addition, social media desks will continue to swell in size until noise can be driven out of feeds on a continuous improvement basis.

Bottom line: noise filtering technology improves the quality of your marketing decisions and helps your social media teams stay focused on the highest-quality opportunities.

What is your company doing to reduce the noise in your social media streams?

 

NeedTagger is a simple SaaS tool for business professionals who want to filter social media streams for the needs they can meet with their existing content – right now.    Contact us if you’d like to try us out.