5 Ways to Use the Instagram Close Friends List for Business

In December 2018, Instagram officially launched its new story feature, “Close Friends.” It allows you to create a list of close friends you can exclusively share stories with on a day-to-day basis.
The “Close Friends” feature was originally built for followers to share more private moments, one of many attempts Instagram has made to center its app on human connection.
Every marketer and entrepreneur knows that most, if not all, Instagram features made for pleasure can also be put to work for business.
So we found five ways to use the Instagram Close Friends list for business. First, here’s a quick guide to creating your first Close Friends list.

As a business, you’ll need to be a bit more creative about who you add to your Close Friends list.
Instagram’s Close Friends list is an opportunity to craft your engagement and build rapport with real fans and customers. Since it’s perfect for posting exclusive content, you’ll need to decide who gets access and who doesn’t.
Who should be added to your Close Friends list?
It all depends on your goals, and since you only get one Close Friends list, you’ll have to pick one goal and stick to it.
Here are some suggestions for the kind of goal you could pick for your Close Friends list:
Brand Ambassadors or Affiliates
Manage your brand ambassadors and use your list to update, encourage, and engage with them. If you have affiliates, let them know when you’ll be hosting exclusive webinars or offering new ways to earn more with your brand.

If your goal is to grow your fan base, create buzz around an exclusive VIP program by adding super fans or giveaway winners to your Close Friends list. They can receive discounts, a heads-up on surprise sales, and more.

If your goal is building employee engagement and advocacy, use your Close Friends list to share company-exclusive content and updates with your team.

Still not sure who you should add to your Close Friends list?
Run a contest or giveaway and grant the winners access to your Close Friends list. Let the people who want into your inner circle come to you!
Five Ways Your Business Can Use the Close Friends List
It’s time to decide what kind of content you’ll be sharing.
1. Exclusive Promotional Discount Codes
Let’s say you need to hit a certain number of sales by the end of the month. Create a teaser inviting followers to join your Close Friends list on Instagram for exclusive discounts.

Ask followers to comment on a post in your feed and tag a friend, and once you’ve selected followers, limit how many people you add; keeping the list small keeps it feeling exclusive.
Calculate how many discounts you’ll need redeemed to meet sales for the month, then hand out 20-40% more codes than that number. Some people will use the discount code, while others won’t complete a purchase.
2. Exclusive Product Launches
Share news of a product before it launches, or of new collections and items before they hit the store. If you’re using your Close Friends list for brand ambassadors, this could be the perfect opportunity to reach out to them about promoting your products.
Give them first pick so they can show off your products on social media. It’s not only great for marketing; it makes your brand ambassadors feel special.

3. Exclusive Freebies
If you’re an entrepreneur, you can add clients or followers in your community that you’d like to offer exclusive freebies.
Freebies work in your favor: they let followers experience your credibility as a brand first hand.
Something as simple as an ebook is a long-term investment, as you’re giving away a sample of your work for free to capture potential customers you’d like to upsell to in the future.

Not sure what kind of freebies to give your followers?
Here are five freebies you can give followers to grow your email list, blog traffic, and sales:
4. Exclusive Influencer Sessions
Maybe you’ve started your Close Friends list, but your engagement game isn’t as strong as you’d like it to be.
It happens to the best of us, but it’s time to start thinking outside the box.
Push for content that will draw your followers’ attention and move them from your story to your store.
Make your main followers jealous by hosting an exclusive interview or session with a famous influencer or known expert in your industry, just for your Close Friends list. For it to work, you’ll have to pick someone in high demand: a macro influencer who can draw in a big crowd.

You’ll need to start promoting your live session days ahead of time; to get the ball rolling, use your feed.

You can also add a countdown sticker to your stories letting people know the date and time, and that the session is exclusively for those who join your Close Friends list.
Be sure to ask your influencer or expert to share the secret on their own Instagram account to spread the word. The more people joining your session, the merrier.

Need some help crafting your Instagram stories? Use this guide to take your Instagram Stories to the next level with stickers.
Instagram Pro Tip
Use this secret influencer session to make sales by asking followers to make a small purchase from your store to be automatically added to your list.
5. Sneak Peek & Behind the Scenes
Everyone loves to take a peek behind the curtain; they want to see how something is made and the chemistry within your team.
Today’s customers want to see behind the perfected Instagram post; they want to experience real moments with a brand. According to Harvard Business Review, customers who have developed a bond with a business are extremely valuable and less price-sensitive, as they’re likely to pay 31-50% more for its products and services.
You aren’t just sharing behind-the-scenes content; you’re building loyalty through the quality of your products and the people behind them, and forging relationships with lifelong customers.
Man Repeller is a brand on Instagram that’s known for capturing fun and witty BTS (behind-the-scenes) moments with its team and interns.

Here are some fun behind-the-scenes ideas you can share with your Close Friends list:
Not sure what kind of BTS to share with your followers? Ask them: use the question sticker in your stories and see what followers are really interested in learning about you.
The Rundown
Once again, Instagram’s features have proven to help your business grow and connect more deeply with followers. Here’s a quick recap of the five ways your business can use the Close Friends list:
As amazing as these content ideas for your close friends are, you’ll still need a bit of creativity to make them yours. If you’re fumbling to come up with ideas, here are 30 Instagram marketing ideas, tips, and examples to guide you along the way.
Victoria is a Marketing Generalist at Wishpond specializing in all things digital marketing and social media marketing. In love with blogging and Taco Tuesdays. Follow her on Twitter @vicknwsbest.
The death of privacy
We have come to the end of privacy; our private lives, as our grandparents would have recognised them, have been winnowed away to the realm of the shameful and secret. To quote ex-tabloid hack Paul McMullan, "privacy is for paedos". Insidiously, through small concessions that only mounted up over time, we have signed away rights and privileges that other generations fought for, undermining the very cornerstones of our personalities in the process. While outposts of civilisation fight pyrrhic battles, unplugging themselves from the web – "going dark" – the rest of us have come to accept that the majority of our social, financial and even sexual interactions take place over the internet and that someone, somewhere, whether state, press or corporation, is watching.
The past few years have brought an avalanche of news about the extent to which our communications are being monitored: WikiLeaks, the phone-hacking scandal, the Snowden files. Uproar greeted revelations about Facebook's "emotional contagion" experiment (where it tweaked mathematical formulae driving the news feeds of 700,000 of its members in order to prompt different emotional responses). Cesar A Hidalgo of the Massachusetts Institute of Technology described the Facebook news feed as "like a sausage… Everyone eats it, even though nobody knows how it is made".
Sitting behind the outrage was a particularly modern form of disquiet – the knowledge that we are being manipulated, surveyed, rendered and that the intelligence behind this is artificial as well as human. Everything we do on the web, from our social media interactions to our shopping on Amazon, to our Netflix selections, is driven by complex mathematical formulae that are invisible and arcane.
Most recently, campaigners' anger has turned upon the so-called Drip (Data Retention and Investigatory Powers) bill in the UK, which will see internet and telephone companies forced to retain and store their customers' communications (and provide access to this data to police, government and up to 600 public bodies). Every week, it seems, brings a new furore over corporations – Apple, Google, Facebook – sidling into the private sphere. Often, it's unclear whether the companies act brazenly because our governments play so fast and loose with their citizens' privacy ("If you have nothing to hide, you've nothing to fear," William Hague famously intoned); or if governments see corporations feasting upon the private lives of their users and have taken this as a licence to snoop, pry, survey.
We, the public, have looked on, at first horrified, then cynical, then bored by the revelations, by the well-meaning but seemingly useless protests. But what is the personal and psychological impact of this loss of privacy? What legal protection is afforded to those wishing to defend themselves against intrusion? Is it too late to stem the tide now that scenes from science fiction have become part of the fabric of our everyday world?
Novels have long been the province of the great What If?, allowing us to see the ramifications from present events extending into the murky future. As long ago as 1921, Yevgeny Zamyatin imagined One State, the transparent society of his dystopian novel, We. For Orwell, Huxley, Bradbury, Atwood and many others, the loss of privacy was one of the establishing nightmares of the totalitarian future. Dave Eggers's 2013 novel The Circle paints a portrait of an America without privacy, where a vast, internet-based, multimedia empire surveys and controls the lives of its people, relying on strict adherence to its motto: "Secrets are lies, sharing is caring, and privacy is theft." We watch as the heroine, Mae, disintegrates under the pressure of scrutiny, finally becoming one of the faceless, obedient hordes. A contemporary (and because of this, even more chilling) account of life lived in the glare of the privacy-free internet is Nikesh Shukla's Meatspace, which charts the existence of a lonely writer whose only escape is into the shallows of the web. "The first and last thing I do every day," the book begins, "is see what strangers are saying about me."
Our age has seen an almost complete conflation of the previously separate spheres of the private and the secret. A taint of shame has crept over from the secret into the private so that anything that is kept from the public gaze is perceived as suspect. This, I think, is why defecation is so often used as an example of the private sphere. Sex and shitting were the only actions that the authorities in Zamyatin's One State permitted to take place in private, and these remain the battlegrounds of the privacy debate almost a century later. A rather prim leaked memo from a GCHQ operative monitoring Yahoo webcams notes that "a surprising number of people use webcam conversations to show intimate parts of their body to the other person".
Max Mosley: 'Your private life belongs to you. If anyone takes it from you it's theft and it's the same as theft of property.' Photograph: Tom Jenkins for the Guardian
It is to the bathroom that Max Mosley turns when we speak about his own campaign for privacy. "The need for a private life is something that is completely subjective," he tells me. "You either would mind somebody publishing a film of you doing your ablutions in the morning or you wouldn't. Personally I would and I think most people would." In 2008, Mosley's "sick Nazi orgy", as the News of the World glossed it, featured in photographs published first in the pages of the tabloid and then across the internet. Mosley's defence argued, successfully, that the romp involved nothing more than a "standard S&M prison scenario" and the former president of the FIA won £60,000 damages under Article 8 of the European Convention on Human Rights. Now he has rounded on Google and the continued presence of both photographs and allegations on websites accessed via the company's search engine. If you type "Max Mosley" into Google, the eager autocomplete presents you with "video," "case", "scandal" and "with prostitutes". Half-way down the first page of the search we find a link to a professional-looking YouTube video montage of the NotW story, with no acknowledgment that the claims were later disproved. I watch it several times. I feel a bit grubby.
"The moment the Nazi element of the case fell apart," Mosley tells me, "which it did immediately, because it was a lie, any claim for public interest also fell apart."
Here we have a clear example of the blurred lines between secrecy and privacy. Mosley believed that what he chose to do in his private life, even if it included whips and nipple-clamps, should remain just that – private. The News of the World, on the other hand, thought it had uncovered a shameful secret that, given Mosley's professional position, justified publication. There is a momentary tremor in Mosley's otherwise fluid delivery as he speaks about the sense of invasion. "Your privacy or your private life belongs to you. Some of it you may choose to make available, some of it should be made available, because it's in the public interest to make it known. The rest should be yours alone. And if anyone takes it from you, that's theft and it's the same as the theft of property."
Mosley has scored some recent successes, notably in continental Europe, where he has found a culture more suspicious of Google's sweeping powers than in Britain or, particularly, the US. Courts in France and then, interestingly, Germany, ordered Google to remove pictures of the orgy permanently, with far-reaching consequences for the company. Google is appealing against the rulings, seeing it as absurd that "providers are required to monitor even the smallest components of content they transmit or store for their users". But Mosley last week extended his action to the UK, filing a claim in the high court in London.
Mosley's willingness to continue fighting, even when he knows that it means keeping alive the image of his white, septuagenarian buttocks in the minds (if not on the computers) of the public, seems impressively principled. He has fallen victim to what is known as the Streisand Effect, where his very attempt to hide information about himself has led to its proliferation (in 2003 Barbra Streisand tried to stop people taking pictures of her Malibu home, ensuring photos were posted far and wide). Despite this, he continues to battle – both in court, in the media and by directly confronting the websites that continue to display the pictures. It is as if he is using that initial stab of shame, turning it against those who sought to humiliate him. It is noticeable that, having been accused of fetishising one dark period of German history, he uses another to attack Google. "I think, because of the Stasi," he says, "the Germans can understand that there isn't a huge difference between the state watching everything you do and Google watching everything you do. Except that, in most European countries, the state tends to be an elected body, whereas Google isn't. There's not a lot of difference between the actions of the government of East Germany and the actions of Google."
All this brings us to some fundamental questions about the role of search engines. Is Google the de facto librarian of the internet, given that it is estimated to handle 40% of all traffic? Is it something more than a librarian, since its algorithms carefully (and with increasing use of your personal data) select the sites it wants you to view? To what extent can Google be held responsible for the content it puts before us?
It isn't Mosley who has pushed European courts into giving a definitive answer to these questions, but, rather, an unknown lawyer from northwest Spain. In 2009, Mario Costeja González found that a Google search of his name brought up a 36-word document concerning a case from the late 90s in which banks threatened to seize his home. The information was factually incorrect – he'd actually paid off the debts in question. More than this, it was, he argued, irrelevant. He is now a lawyer with a successful practice and any former money worries ought not to feature on the internet record of his life. Google fought the case within Spain and then all the way to the European Court of Justice. Costeja González won, the article was taken down, the victory labelled "the right to be forgotten".
Google's response to the ruling has been swift and sweeping, with 70,000 requests to remove information processed in the weeks following the judgment. A message now appears at the bottom of every search carried out on Google, warning: "Some results may have been removed under data protection law in Europe." It seems that Google has not been judging the quality of these requests, relying instead on others to highlight content that has been taken down erroneously. Search results for newspaper articles on Dougie McDonald, a Scottish football referee accused of lying, were taken down. And a Robert Peston piece for the BBC about Merrill Lynch CEO Stan O'Neal's role in the 2008 financial crisis was removed; as of August 2014, it is still missing from Google searches.
To understand how much protection the law offers those wishing to defend their privacy against the triumvirate of state, press and data-harvesting corporations, I turn to one of the country's top privacy lawyers, Ruth Collard at Carter Ruck. We meet in Carter Ruck's offices near St Paul's Cathedral. I sit under the beady gaze of the firm's late founder – the target of much Private Eye bile – until Collard, more genial than I'd expected, with short, neat hair, comes to meet me. We stroll to a coffee shop and I ask her about the Costeja González case.
"I think it's a very surprising decision, I really do," she says, "but it was Google's collection of data, the arrangement and prioritisation of it that influenced the judgment, as the court found they were a 'controller' of the information."
I ask about the freedom of speech implications of the judgment – surely every politician will be applying to Google, trying to scrub away the traces of the sordid affair, the expenses scandal? She nods emphatically. "Almost immediately after the judgment, there were reports of a politician trying to clean up his past. We have been contacted by clients who have read about the judgment and are interested. So far, the ones that have contacted me, I have not thought they had a case to make. It was clear from the judgment that a balance has to be struck – between the interest of the subject in keeping information private and the interest of internet users in having access to it. The effect will be removing information that may have been once completely justified in being there, but is now outdated or irrelevant."
We go on to discuss the changing face of privacy in the internet age. She firstly notes how relatively young privacy law is in this country – only since 1998 have there been laws in place to protect privacy. Cases brought before this, such as that of 'Allo 'Allo star Gordon Kaye, who in 1990 was photographed in his hospital bed badly battered following a car crash, had to prove that a breach of confidence had taken place. Collard points me to the recent Paul Weller case, in which the Modfather sued after being pictured in the Daily Mail out walking with his 16-year-old daughter and 10-month-old twins.
"A lot of these cases are Mail cases," she says with a twinkle. "It was the fact that the photos showed the children's faces that was found to be significant by the court. Weller brought the claim on behalf of all three children and it succeeded [although the Mail is appealing]." In his judgment, Justice Dingemans argued that the children's faces represented "one of the chief attributes of their respective personalities… These were photographs showing the expressions on faces of children, on a family afternoon out with their father. Publishing photographs of the children's faces, and the range of emotions that were displayed, and identifying them by surname, was an important engagement of their Article 8 rights."
Collard tells me: "The Mail argued very hard on a number of grounds, one of which was that the children didn't have a reasonable expectation of privacy, given that the teenager a couple of years earlier had done some modelling for Teen Vogue and the babies had had photos of them tweeted by their mother. However, she had been careful when tweeting not to show their faces. This thing about facial expressions is new and it will be interesting to see where it goes."
This year has not only seen a spate of news stories surrounding the issue of privacy, but also a series of literary and artistic approaches to the subject. One of the most interesting was Privacy, a play at the Donmar Warehouse that came out of a close collaboration between director Josie Rourke and playwright James Graham. It was a groundbreaking production, from the moment the audience were urged to leave their telephones on as they entered the theatre. Over the course of the play, by turn intrusively intimate and angrily political, the audience (and their smartphones) found themselves at the centre of an attempt to chart the status of contemporary privacy – and the seemingly muted public reaction to its loss.
Within minutes of entering the theatre, I was squirming in my seat. An actor on stage was publicly analysing the results of a study designed by the psychometrics department of Cambridge University that used my Facebook profile to reveal my innermost secrets, questioning almost every aspect of my personality, from my political views to my sexuality. It was terrifying and compelling all at once. As the play ratcheted up to a coruscating finale, we, the audience, were made to see the enormous value of the rights we'd handed over as the mere cost of life in the 21st century (who, we were asked, had read the iTunes privacy policy? It's word for word as long as The Tempest).
Privacy director Josie Rourke and playwright James Graham: 'A lot of normal people, even if they did have the knowledge of how much they're giving away, would just go: that's fine, because I'm not interesting enough to be spied on.' Photograph: Linda Nylind for the Guardian
I meet Rourke and Graham in Donmar's offices on Dryden Street in Covent Garden. We discuss the play, how charged tremors ran around the audience after each subsequent revelation and, above all, the sense of outrage that propels the narrative as it turns its fire on Facebook, Apple and finally on the American government. Are they worried, I ask, that there seems to be so little anger on behalf of the public? That it is beginning to feel (to me at least) as if we have given up our right to a private life with barely a whimper? Graham nods. "I think a lot of normal people, even if they did have the knowledge of how much they're publishing and how much they're giving away, and how much a corporate or government agency might know about them, would just go, 'That's fine, because I'm not interesting enough to be spied on.'"
There were moments in the play that were profoundly uncomfortable: for the audience and, I would imagine, for Rourke and Graham. Both director and playwright featured on stage and we delved deeply into their private lives, into their secret selves (and into our own). I ask about the dynamic between the two of them during the writing of the play and where they decided to draw the line between probing and invasive.
"We thought it through," Graham answers. "From the very beginning, we decided that we never wanted any of these interactions to be exposing or judgmental. We wanted them to be kind and benevolent and ask people to examine their data streams."
Rourke nods in agreement. "In order to research the show, we kept pushing at the boundaries of each other's privacy, really to see what that felt like. I don't want to make it sound like a piece of performance art, but we kept asking ourselves – oh look, we've found out that it's possible to do this. Are we prepared to do that to each other? The distinction between secret and private has been the guiding philosophical principle. We're not looking to get your secrets. We're asking you to test this thing which is less tangible and less transactable, which is your privacy."
A few days after our meeting, Rourke puts me in touch with Michal Kosinski, the Cambridge academic who, with David Stillwell, has designed youarewhatyoulike.com, the psychometric algorithm that produces from your Facebook "likes" a map of your soul. I think of Ruskin, who in 1864 said: "Tell me what you like and I'll tell you what you are." I also think of Orwell's thought police and Philip K Dick's The Minority Report, where criminals are identified and arrested before they commit crimes. This is one of the problems about advances in technology: we are preconditioned to view them through a dystopian lens, with Orwell, Ballard, Burgess and others staring over our shoulders, marvelling, but fearful.
While I wait for my results, I ask Kosinski whether the potential for misuse worries him. "Most technologies have their bright and dark side," he replies, buoyantly. "My personal opinion is that a machine's ability to better understand us would lead to improved consumer experience, products, etc… But imagine that we published a clone of youarewhatyoulike.com that simply predicted which of your friends was gay (or Christian or liberal or HIV-positive, etc); lynches are not unlikely to follow…"
I'm left baffled by my results. According to the algorithm, I'm 26 (I'm actually 35). The program is unsure whether I'm male or female (I'm male). I'm borderline gay or, as Kosinski puts it in his analysis: "You are not yet gay, but very close." I'm most likely to be single, extremely unlikely to be married (I'm not sure what my wife will say to all this). The algorithm correctly predicts my professional life (art, journalism, psychology) and my politics (liberal) but claims that I exhibit low neuroticism. It should sit next to me on a turbulent flight. I realise that I'm viewing the results of a kind of double of myself, the public persona I present through social media (and over which I presume some sort of control), nothing like the real me. For that, I need a psychiatrist.
Alongside Rourke, Graham, William Hague and director of Liberty, Shami Chakrabarti, one of the real-world figures represented in the play Privacy is academic and psychoanalyst Josh Cohen. Cohen's book, The Private Life (2013), is an intelligent and highly literary exploration of the changing nature of privacy in the age of Facebook and Celebrity Big Brother. Skipping from Katie Price to Freud to Booker-winning author Lydia Davis, Cohen paints a convincing picture of a culture fighting a desperate psychological battle over the private self. He argues that both our ravenous hunger for celebrity gossip and the relentless attempts by the wealthy to protect their privacy have recast the private life as "a source of shame and disgust". The tabloid exposé and the superinjunction both "tacitly accede to the reduction of private life to the dirty secrets hidden behind the door".
Psychoanalyst Josh Cohen: 'Privacy, precisely because it ensures that we are never fully known to others, provides a shelter for imaginative freedom, curiosity and self-reflection.' Photograph: readmesomethingyoulove.com
And yet what neither the press nor the lawyers recognise when they treat privacy as they would secrecy – as something that can be revealed, possessed, passed on – is that the truly private has a habit of staying that way. Cohen argues that the private self is by definition unknowable, what George Eliot calls "the unmapped country within us". In an email conversation, Cohen gives me a condensation of this thesis: "When we seek to intrude on the other's privacy, whether with a telephoto lens, a hacking device or our own two eyes, we're gripped by the fantasy of seeing their most concealed, invisible self. But the frustration and disappointment is that we only ever get a photograph of the other, an image of their visible self – a mere shadow of the true substance we really wanted to see. The most private self is like the photographic negative that's erased when exposed to the light."
There is something strangely uplifting in this idea – that no matter how deep they delve, the organs of surveillance will never know my true self, for it is hidden even from me.
I ask Cohen about the differences between our "real" selves and those we project online. I think of the younger, gayer, less neurotic incarnation of myself that appears on Facebook. "I agree that the online persona has become a kind of double," he says. "But where in Dostoevsky or Poe the protagonist experiences his double as a terrifying embodiment of his own otherness (and especially his own voraciousness and destructiveness), we barely notice the difference between ourselves and our online double. I think most users of social media and YouTube would simply see themselves as creating a partial, perhaps preferred version of themselves."
I remember something James Graham had said to me: that early on in the writing of Privacy, he'd sat at Josie Rourke's kitchen table and read his tweets out loud to her. "I had to stop it was so awful… Just seeing the person there, listening as I said what I thought about my food, about politics, about what was on the television, it felt very exposing, like I'd sold myself badly."
This is the horror of social media – that it gives us the impression we are in control of our virtual identities, putting out messages that chime with our "real" selves (or some idealised version of them). In fact, there is always slippage and leakage, the subconscious asserting its obscure power. The internet can, as Cohen tells me, "provide a way of exploring and playing the multiplicity and complexity of the self". It can also prove to us just how little control we have over how we appear. As William Boyd put it in Brazzaville Beach: "The last thing we discover in life is our effect."
There is, of course, a flipside to the dystopian view of profit-hungry corporations and totalitarian governments relentlessly reaping our private selves. Josh Cohen describes the lifelogging movement as bearing "an overriding tone of utopian enthusiasm". Lifelogging involves the minute-by-minute transmission of data about one's life, whether by photographs, web-journals or the sort of Quantified Self technologies – wearable watches, data-gathering smartphone apps – developed by German firm Datarella (and many others, Google Glass not least among them). An early proponent of lifelogging was conceptual artist Alberto Frigo, who in 2003 decided to record every object he would hold with his right hand for the next 36 years. He put the pictures on his website, 2004-2040.com.
Frigo's project started with photographs but has developed into a labyrinthine mapping of his thoughts and dreams, the music he is listening to and the world around him. The website is now a wormhole, a place in which it is possible to lose yourself in the beautiful but useless ephemera of a single existence. Frigo tells me that his aim is to create for a future audience "some kind of a Rosetta Stone of this time, where different aspects of a person's life, recorded with different media, can be compared and interpreted."
2004-2040.com has come to dominate Frigo's life. He has insisted on radical honesty, writing down detailed dreams, a sophisticated map of his mental state. For his wife, this was problematic. "She did not accept that I dreamed of other girls," he tells me. "This slowly led to our separation." It also led to a hiatus in Frigo's lifelogging, when he carried on in private, recording but not publishing his material. "My ex-wife did not wish me to publish my dreams and I did not feel like presenting my project without one part. After the divorce, it took me sometime before I went online again."
Josh Cohen told me about the psychic risks of lifelogging. For some, he said, "shadowing your transient, irretrievable life is a permanent digital life, and the really frightening spectre here is that the digital recording becomes more 'real', more authoritative than your memory." I asked Frigo about this. How would he look back on it once it had all finished? How will he feel to have this stunning, often baffling pictorial record of his life? "At the end of the project," he says, "I think I would feel that it is time to retire. I would love to go back to the Italian alps, when I am originally from, and have not so much to do with technology anymore, just have a little garden and look back, write a book perhaps, but mostly take care of a little land which I unfortunately never had the possibility to own as my father went off to Canada when I was little… I might well end up in a hospital connected to a machine for the rest of my life, who knows? Perhaps, in 2040 I might not reach my 60th birthday and the 1,000,000 photos of all the objects my right hand has used, the DNA code of my life, will remain incomplete, and no one will care about them or me."
Your privacy has a value. There are even companies such as RapLeaf.com that will tell you what your personal information is worth. The basic facts? Very little. More detailed information – for example, that you own a smartphone, are trying to lose weight or are planning a baby – is worth much more. Big life changes – marriage, moving home, divorce – bring with them fundamental changes in our buying patterns as we seek, through the brands with which we associate ourselves, to recast the narratives of our lives. Through analysis of buying patterns, US retailer Target predicted one of its customers was pregnant (and sent her coupons for maternity wear) before the teenager had broken the news to her disapproving parents.
Perhaps the reason people don't seem to mind that so much of their information is leaking from the private to the public sphere is not, as some would have it, that we are blind and docile, unable to see the complex web of commercial interests that surround us. Maybe it's that we understand very clearly the transaction. The internet is free and we wish to keep it that way, so corporations have worked out how to make money out of something we are willing to give them in return – our privacy. We have traded our privacy for the wealth of information the web delivers to us, the convenience of online shopping, the global village of social media.
Let me take you back to August 2006, the Chesterfield Hotel, Mayfair. My little brother (Preston from the Ordinary Boys) was marrying a girl he'd met on the telly (Chantelle Houghton). Celebrity Big Brother was still in its Channel 4 pomp, attracting 6 million viewers and upwards, the focus of water-cooler debate and gossip mag intrigue. The wedding was, briefly, an event. I remember a moment between the ceremony and the reception when we were queuing up in our gladrags to have our pictures taken for the OK! magazine spread. I felt a sudden, instinctive lurch – the thought of my phiz besmirching every hairdresser's salon and dentist's waiting room. I wasn't on Facebook, Twitter hadn't been invented, Friends Reunited? No thanks. I ran – to the bemusement of my family and the photographer.
Now, though, I post pictures of my breakfast on Instagram, unspool my soul in 140-character soundbites on Twitter, allow – even encourage – Facebook and Google and Apple to track my every move through the smartphone that has become less a piece of technology and more an extension of (and often a replacement for) my brain. I write articles on subjects I'd previously kept secret from my nearest and dearest. I let a Sunday newspaper take a (relatively tasteful) picture of me and my children when I was promoting my last novel. We have all – to a greater or lesser extent – made this same transaction and made it willingly (although my children didn't have much say in the matter).
We weren't private creatures in centuries past, either. In a 1968 talk on privacy in the electronic age, sociologist Marshall McLuhan argued that it was the coming of a new technology – books – and the "closed-off architecture" needed to read and study that had forged the sense of the private self. It may be that another new technology – the internet – is radically altering our sense of what (if anything) should remain private. We live in a liberal democracy, but, with recent lurches to the right, here and abroad, you don't need to be Philip K Dick to imagine the information you gave up so glibly being used against you by a Farage-led dictatorship.
More immediately, there is the normalising effect of surveillance. There is a barrier or check on our behaviour when we know we are being watched: deviancy needs privacy. This was the thinking behind Jeremy Bentham's Panopticon, a model for a jail where a single watching guard could survey a whole prison of inmates (the model, by the way, for Zamyatin's One State). Soon, it didn't matter whether the guard was on duty or not, the mere possibility of surveillance was enough to ensure compliance. This is where we find ourselves now, under surveillance that may seem benign enough but which nonetheless asserts a dark, controlling power over us, the watched.
The message seems to be that if you really want to keep something private, treat it as a secret, and in the age of algorithmic analysis and big data, perhaps best to follow Winston Smith's bitter lesson from Nineteen Eighty-Four: "If you want to keep a secret, you must also hide it from yourself."
Here lies our greatest risk, one insufficiently appreciated by those who so blithely accept the tentacles of corporation, press and state insinuating their way into the private sphere. As Don DeLillo says in Point Omega: "You need to know things the others don't know. It's what no one knows about you that allows you to know yourself." By denying ourselves access to our own inner worlds, we are stopping up the well of our imagination, that which raises us above the drudge and grind of mere survival, that which makes us human.
I asked Josh Cohen why we needed private lives. His answer was a rallying cry and a warning. "Privacy," he said, "precisely because it ensures we're never fully known to others or to ourselves, provides a shelter for imaginative freedom, curiosity and self-reflection. So to defend the private self is to defend the very possibility of creative and meaningful life."
Alex Preston's most recent novel is In Love and War, published by Faber.
Text Classification is Your New Secret Weapon
Natural Language Processing is Fun! Part 2
This article is part of an ongoing series on NLP: Part 1, Part 2, Part 3, Part 4. You can also read a reader-translated version of this article in Mandarin Chinese.
Giant update: I’ve written a new book based on these articles! It not only expands and updates all my articles, but it has tons of brand new content and lots of hands-on coding projects. Check it out now!
In this series, we are learning how to write programs that can understand text written by humans. In Part 1, we built a Natural Language Processing pipeline where we processed English text by methodically parsing out grammar and structure.
This time, we are going to learn about text classification — the secret weapon that NLP developers use to build cutting edge systems with relatively dumb models.
The kind of results you can get with text classification compared to the development effort is off the charts. Let’s go!
The NLP pipeline that we set up in Part 1 processes text in a top-down way. First we split text into sentences, then we break sentences down into nouns and verbs, then we figure out the relationships between those words, and so on. It’s a very logical approach, and logic just feels right, but logic isn’t necessarily the best way to go about extracting data from text.
A lot of user-created content is messy, unstructured and, some might even say, nonsensical:
Extracting data from messy text by analyzing its grammatical structure is very challenging because the text doesn’t follow normal grammatical rules. We can often get better results using dumber models that work from the bottom up. Instead of analyzing sentence structure and grammar, we’ll just look for statistical patterns in word use.
Let’s look at user reviews, one of the most common types of online data that you might want to parse with a computer. Here is one of my real Yelp reviews for a public park:
From the screenshot, you can see that I gave the park a 5-star review. But if I had posted this review without a star rating, you would still automatically understand that I liked the park from how I described it.
How can we write a program that can read this text and understand that I liked the park even though I never directly said “I like this park” in the text? The trick is to reframe this complex language understanding task as a simple classification problem.
Let’s set up a simple linear classifier that takes in words. The input to the classifier is the text of the review. The output is one of 5 fixed labels — “1 star”, “2 stars”, “3 stars”, “4 stars”, or “5 stars”.
If the classifier can take in the text and reliably predict the correct label, it must somehow understand the text enough to extract the overall meaning of whether or not I liked the park. Of course, the model’s level of “understanding” is just that it churns some data through a statistical model and gets a most likely answer. It’s not similar to human intelligence. But if the end result is the same most of the time, it doesn’t really matter.
To train our text classification model, we’ll collect a lot of user reviews of similar places (parks, businesses, landmarks, hotels, whatever we can find…) where the user wrote a text review and assigned a similar star rating. And by lots, I mean millions of reviews! Then we’ll train the model to predict a star rating based on the corresponding text.
Once the model is trained, we can use it to make predictions for new text. Just pass in a new piece of text and get back a score:
With this simplistic model, we can do all kinds of useful things. For example, we could start a company that analyzes social media trends. Companies would hire us to track how their brand is perceived online and to alert them of negative trends in perception.
To build that, we’d just scan for any tweets that mentioned our customer’s business. Then we’d feed all those tweets into the text classification model to predict if each user likes or dislikes the business. Once we have numerical ratings representing each user’s feelings, we could track changes of average score over time. We could even automatically trigger an action whenever someone posts something very negative about the business. Free start-up idea!
On its face, using text classification to understand text sounds like magical thinking. With a traditional NLP pipeline, we have to do a lot of work to understand the grammatical structure of text. With a classifier, we’re just throwing huge buckets of text into a wood chipper and hoping for the best. Isn’t human expression more nuanced and complex than that? Isn’t this the kind of over-hyping and oversimplification that makes machine learning look bad?
There are several reasons why treating text as a classification problem instead of as an understanding problem tends to work really well, even when using relatively simple linear classification models.
First, people constantly create and evolve language. Especially in an online world full of memes and emoji, writing code to reliably parse tweets and user reviews is going to be pretty difficult.
With text classification, the algorithm doesn’t care whether the user wrote standard English, an emoji, or a reference to Goku. The algorithm is looking for statistical relationships between input phrases and outputs. If writing ಠ_ಠ correlates more heavily with 1-star and 2-star reviews, the algorithm will pick that up even though it has no idea what a “look of disapproval” emoticon is. The classifier can still figure out what characters mean in the context of where they appear and how often they contribute to a particular output.
Second, website users don’t always write in the specific language that you expect. An NLP pipeline trained to handle American English is going to fall apart if you give it German text. It’s also going to do poorly if your user decides to write their reviews with Cockney Rhyming Slang — which is still technically English.
Again, a classification algorithm doesn’t care what language the text is in as long as it can at least break apart the text into separate words and measure the effects of those words. As long as you give the classifier enough training data to cover a wide range of possible English and German user reviews, it will learn to handle both just fine.
And finally, a big reason that text classification is so great is because it is fast. Because linear text classification algorithms are so simple (compared to more complex machine learning models like recurrent neural networks), they can be trained quickly. You can train a linear classifier with gigabytes of text in minutes on a regular laptop. You don’t even need any fancy hardware like a GPU. So even if you can get a slightly better accuracy score with a different machine learning algorithm, sometimes the tradeoff isn’t worth it. And research has shown that often the accuracy gap is nearly zero anyway.
While text classification models are simple to set up, that’s not to say they are always easy to get working well. The big catch is that you need a lot of training data. If you don’t have enough training data to cover the wide range of the ways that people write things, the model won’t ever be very accurate. The more training data you can collect, the better the model will perform. The real art of applying text classification well is in finding clever ways of automatically collecting or creating training data.
We’ve seen that we can use text classification to automatically score a user’s review text. That’s a type of sentiment analysis. Sentiment analysis is where you look at text that a user wrote and you try to figure out if the user is feeling positive or negative.
There’s lots of other practical uses of text classification. One that you probably use every day as a consumer without knowing it is the email spam filtering feature built into your email service. If you have a group of real emails marked as “spam” or “not spam”, you can use those to train a classification model that automatically flags spam emails in the future:
Along the lines of spam filtering, you can also use text classification to identify abusive or obscene content and flag it. A lot of websites use text classification as a first-line defense against abusive users. By also taking the model’s confidence score into consideration, you can automatically block the worst offenders while sending the less certain cases to a human moderator to evaluate.
You can expand the idea of filtering beyond spam and abuse. More and more companies use text classification to route support tickets. The goal is to parse support questions from users and route them to the right team based on the kind of issue the user is most likely reporting:
By using classification to automate the busy work of triaging support tickets, the team is freed up to spend more time actually answering questions.
Text classification models can also be used to categorize pretty much anything. You can assume that any time you post on Facebook, behind the scenes it is classifying your post into categories like “family-related” or “related to a scheduled event”:
That not only helps Facebook know which content to show to which users, but it also lets them track the topics that you are most interested in for advertising purposes.
Classification is also useful for sorting and labeling documents. Imagine that your company has done thousands of consulting projects for clients but that your boss wants them all re-organized according to a new government-mandated project coding system. Instead of reading through every project’s summary document and trying to decide which project code is the best match, you could classify a random sampling of them by hand and then build a classification model to automatically code the remaining ones:
These are just a few ideas. The uses of text classification are endless. You just have to figure out a way to reframe the problem so that the information you are trying to extract from the text can be mapped into a set of discrete output classes.
You can even build systems where one classification model feeds into another classification model. Imagine a user support system where the first classifier guesses the user’s language (English or German), the second classifier guesses which team is best suited to handle their request and a third classifier guesses whether or not the user is already upset to choose a ticket priority code. You can get as complex as you want!
Now that you are convinced of the awesomeness of dumb text classification models, let’s learn exactly how to build them!
My favorite tool for building text classification models is Facebook’s fastText. It’s open source, and you can run it as a command line tool or call it from Python. There are great alternatives, like Vowpal Wabbit, that also work well and are more flexible, but I find fastText easier to use.
You can install fastText by following these instructions.
Step 1: Download Training Data
To build a user review model, we need training data. Luckily, Yelp provides a research dataset of 4.7 million user reviews. You can download it here (but keep in mind that you can’t use this data to build commercial applications).
When you download the data, you’ll get a 4 gigabyte json file called reviews.json. Each line in the file is a json object with data like this:
{"review_id": "abc123","user_id": "xyy123","business_id": "1234", "stars": 5,"date":" 2015-01-01", "text": "This restaurant is great!","useful":0,"funny":0,"cool":0} Step 2: Format and Pre-process Training Data
The first step is to convert this file into the format that fastText expects.
fastText requires a text file with each piece of text on a line by itself. The beginning of each line needs to have a special prefix of __label__YOURLABEL that assigns the label to that piece of text.
In other words, our restaurant review data needs to be reformatted like this:
__label__5 This restaurant is great!
__label__1 This restaurant is terrible :'(
Here’s a simple piece of Python code that will read the reviews.json file and write out a text file in fastText format:
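A minimal sketch of that conversion step (the exact script may differ; this assumes reviews.json sits in the current directory and each line holds one JSON object with "stars" and "text" fields as shown above):

import json

with open("reviews.json", encoding="utf-8") as reviews_file, \
     open("fasttext_dataset.txt", "w", encoding="utf-8") as output_file:
    for line in reviews_file:
        review = json.loads(line)
        # fastText expects one example per line, so strip embedded newlines.
        text = review["text"].replace("\n", " ")
        output_file.write("__label__{} {}\n".format(review["stars"], text))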
Running this creates a new file called fasttext_dataset.txt that we can feed into fastText for training. We aren’t done yet, though. We still need to do some additional pre-processing.
fastText is totally oblivious to any English language conventions (or the conventions of any other language). As far as it knows, the words Hello, hello and hello! are all totally different words because they aren’t exactly the same characters. To fix this, we want to do a quick pass through our text to convert everything to lowercase and to put spaces before punctuation marks. This is called text normalization, and it makes it a lot easier for fastText to pick up on statistical patterns in the data.
This means that the text “This restaurant is great!” should become “this restaurant is great !”.
Here’s a simple Python function that we can add to our code to do that:
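One way to write that normalization function, lowercasing the text and padding common punctuation with spaces (a sketch of the behavior described above):

import re

def strip_formatting(string):
    # Lowercase everything so "Hello" and "hello" become the same token.
    string = string.lower()
    # Put spaces around punctuation so "great!" becomes "great !".
    string = re.sub(r"([.!?,'/()])", r" \1 ", string)
    return string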
Don’t worry, there’s a final version of the code below that includes this function.
Step 3: Split the Data into a Training Set and a Test Set
To get an accurate measure of how well our model performs, we need to test its ability to classify text that it didn’t see during training. If we test it against the training data, it’s like giving it an open-book test where it can memorize the answers.
So we need to hold back some of the strings from the training data set and keep them in a separate test data file. Then we can check the trained model’s performance on that held-back data to get a real-world measure of how well the model performs.
Here’s a final version of our data parsing code that reads the Yelp dataset, removes any string formatting and writes out separate training and test files. It randomly splits out 90% of the data as training data and 10% as test data:
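A sketch of that final script, assuming the file names used throughout this post; it reuses the strip_formatting function from Step 2 and routes each review to the training or test file at random:

import json
import random
import re

def strip_formatting(string):
    # Same normalization as in Step 2: lowercase and pad punctuation.
    string = string.lower()
    string = re.sub(r"([.!?,'/()])", r" \1 ", string)
    return string

percent_test_data = 0.10  # hold back roughly 10% of reviews for testing

with open("reviews.json", encoding="utf-8") as reviews_file, \
     open("fasttext_dataset_training.txt", "w", encoding="utf-8") as train_file, \
     open("fasttext_dataset_test.txt", "w", encoding="utf-8") as test_file:
    for line in reviews_file:
        review = json.loads(line)
        text = strip_formatting(review["text"].replace("\n", " "))
        fasttext_line = "__label__{} {}\n".format(review["stars"], text)
        if random.random() < percent_test_data:
            test_file.write(fasttext_line)
        else:
            train_file.write(fasttext_line)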
Run that and you’ll have two files, fasttext_dataset_training.txt and fasttext_dataset_test.txt. Now we are ready to train!
Here’s one more tip though: To make your model robust, you will also want to randomize the order of lines in each data file so that the order of the training data doesn’t influence the training process. That’s not absolutely required in this case since the data from Yelp is already pretty random, but it’s definitely worth doing when using your own data.
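Assuming the training file fits in memory, shuffling it in place takes just a few lines of Python:

import random

with open("fasttext_dataset_training.txt", encoding="utf-8") as f:
    lines = f.readlines()

random.shuffle(lines)  # randomize the line order

with open("fasttext_dataset_training.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)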
Step 4: Train the Model
You can train a classifier using the fastText command line tool. You just call fasttext, pass in the supervised keyword to tell it to train a supervised classification model, and then give it the training file and an output name for the model:
fasttext supervised -input fasttext_dataset_training.txt -output reviews_model
It only took 3 minutes to train this model with 580 million words on my laptop. Not bad!
Step 5: Test the Model
Let’s see how accurate the model is by checking it against our test data:
fasttext test reviews_model.bin fasttext_dataset_test.txt

N       474292
P@1     0.678
R@1     0.678
This means that across 474,292 examples, it guessed the user’s exact star rating 67.8% of the time. Not a bad start.
You can also ask fastText to check how often the correct star rating was in one of its top two predictions (i.e. if the model’s two most likely guesses were “5” and “4” and the real user said “4”):
fasttext test reviews_model.bin fasttext_dataset_test.txt 2

N       474292
P@2     0.456
R@2     0.912
That means that 91.2% of the time, it recalled the user’s star rating if we check its two best guesses. That’s a good indication that the model is not far off in most cases.
You can also try out the model interactively by running the fasttext predict command and then typing in your own reviews. When you hit enter, it will tell you its prediction for each one:
fasttext predict reviews_model.bin -

this is a terrible restaurant . i hate it so much .
__label__1
this is a very good restaurant .
__label__4
this is the best restaurant i have ever tried .
__label__5
Important: You have to type in your reviews in all lowercase and with spaced-out punctuation, just like the training data! If you don’t format your examples the same way as the training data, the model will do very poorly.
Step 6: Iterate on the Model to Make It More Accurate
With the default training settings, fastText tracks each word independently and doesn’t care at all about word order. But when you have a large training data set, you can ask it to take the order of words into consideration by using the wordNgrams parameter. That will make it track groups of words instead of just individual words.
For a data set of millions of words, tracking two word pairs (also called bigrams) instead of single words is a good starting point for improving the model.
Let’s train a new model with the -wordNgrams 2 parameter and see how it performs:
fasttext supervised -input fasttext_dataset_training.txt -output reviews_model_ngrams -wordNgrams 2
This will make training take a bit longer and it will make the model file much larger (since there is now an entry for every two-word pair in the data), but it can be worth it if it gives us higher accuracy.
Once the training completes, you can re-run the test command the same way as before:
fasttext test reviews_model_ngrams.bin fasttext_dataset_test.txt
For me, using -wordNgrams 2 got me to 71.2% accuracy on the test set, an improvement of about 3.5 percentage points. It also seems to reduce the number of obvious errors the model makes, because now it cares a little bit about the context of each word.
There are other ways to improve your model, too. One of the simplest but most effective is to skim your training data file by hand and make sure that the pre-processing code is formatting your text in a sane way.
For example, my sample text pre-processing code will turn the common restaurant name “P.F. Chang” into “p . f . chang”. That appears as five separate words to fastText.
If you have cases like that where important words that represent a single concept are getting split up, you can write custom code to fix it. In this case, you might add code to look for common restaurant names and replace them with placeholders like p_f_chang so that fastText sees each as a single word.
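For instance, here’s a hypothetical fix-up pass; the name in the mapping is just an illustration, and it should run on text that has already been through strip_formatting so the patterns match the normalized form:

# Hypothetical mapping; populate it with names that matter in your data.
KNOWN_NAMES = {
    "p . f . chang": "p_f_chang",
}

def merge_known_names(text):
    # Replace multi-token names with single placeholder tokens so
    # fastText treats each name as one word.
    for name, placeholder in KNOWN_NAMES.items():
        text = text.replace(name, placeholder)
    return text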
Step 7: Use Your Model in Your Program!
The best part about fastText is that it’s easy to call a trained model from any Python program.
There are a few different Python wrappers for fastText that you can use, but I like the official one created by Facebook. You can install it by following these directions.
With that installed, here’s the entire code to load the model and use it to automatically score user reviews:
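A sketch of that script, using the fasttext Python package; it assumes the -wordNgrams 2 model file from Step 6 is in the current directory:

import re
import fasttext

def strip_formatting(string):
    # New text must be normalized exactly like the training data was.
    string = string.lower()
    string = re.sub(r"([.!?,'/()])", r" \1 ", string)
    return string

# Load the trained model from disk.
model = fasttext.load_model("reviews_model_ngrams.bin")

reviews = [
    "This restaurant literally changed my life. This is the best food I've ever eaten!",
    "I hate this place so much. They were mean to me.",
    "I don't know. It was ok, I guess. Not really sure what to say.",
]

for review in reviews:
    # predict() returns a tuple of labels and their probabilities.
    labels, probabilities = model.predict(strip_formatting(review))
    stars = int(labels[0].replace("__label__", ""))
    print("{} ({:.0%} confidence)".format("☆" * stars, probabilities[0]))
    print(review)
    print()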
And here’s what it looks like when it runs:
☆☆☆☆☆ (100% confidence)
This restaurant literally changed my life. This is the best food I've ever eaten!

☆ (88% confidence)
I hate this place so much. They were mean to me.

☆☆☆ (64% confidence)
I don't know. It was ok, I guess. Not really sure what to say.
Those are really good prediction results! And let’s see what prediction it would give my Yelp review:
☆☆☆☆☆ (58% confidence)
This used to be a giant parking lot where government employees that worked in the county building would park. They moved all the parking underground and built an awesome park here instead. It's literally the reverse of the Joni Mitchell song.
Perfect!
This is why machine learning is so cool. Once we figured out a good way to pose the problem, the algorithm did all the hard work of extracting meaning from the training data. You can then call that model from your code with just a couple of lines. And just like that, your program seemingly gains superpowers.
Now go out and build your own text classifier!