Widely Viewed Content Report, 2022 Q3

By Integrity Institute chief research officer and co-founder Jeff Allen

Summary

  • Reporting from the WSJ on internal discussions at Facebook shows that, internally, Facebook considers this quarter’s top content lists to have “no low quality content”
  • Contrary to that, we see little change in the number of posts that fail basic media literacy checks
    • Content from anonymous accounts, content taken from other sources, and content that was spammed across a network of Facebook accounts, Groups, and Pages stayed about the same
  • The 20th most viewed link likely violates Facebook’s “Traffic Diversion” policy
    • The URL uses a technique called “cloaking”: it shows a minimal article with no ads to everyone except users who arrive from Facebook
    • Users coming from Facebook are redirected to a new domain with a heavy ad load
    • The URL was shared across several distinct Page networks on Facebook
    • This meets the general pattern of traffic arbitrage operations
      • Domains paying Facebook Pages to link to their articles
  • The large Group network behind most of the inauthentic content from the past year did not appear on the list this quarter, but it is rebuilding itself after Facebook's actions
    • The “12am Thoughts” Group network was behind traffic to 3 domains flagged as inauthentic by Facebook
    • The Groups are still up and running and appear to be rebuilding their operation
    • We caught the network using newly compromised accounts
    • The operation uses bought or hacked accounts to share links into Groups that are administered by authentic, foreign user accounts
  • Insider took 10 of the top 20 link spots
    • Insider uses a network of 20 Pages to spam links on Facebook
    • None of the articles from Insider reflected original reporting
      • Summary of TikTok and YouTube videos that went viral
      • Summaries of original reporting from other outlets

As always, you can see our methodology here, our dashboard here, and the data used in this analysis here.

Analysis of WVCR 2022-Q3

The latest Widely Viewed Content Report was released with a bit of fanfare. The Wall St. Journal published an article detailing the internal efforts to improve the “quality” of the top content in the list, and how, by internal measures, the top content lists are now entirely free of “low quality” content.
While this is good in general, and the teams working internally to improve quality should feel good about their work, we do not see any significant change in the quality of content that made it into the top links and top posts lists. The majority of content there continues to fail basic media literacy checks. A dip in unoriginal content is offset by a rise in content using spam networks. And we’ve found content that might violate Facebook’s policies but that Facebook may have overlooked.
 
The count and percentage of the Top 20 Links and Top 20 Posts that passed or failed our media literacy checks.
 
 
The count and percentage of the specific failures from our media literacy checks.
 
Facebook needs to be more transparent about what they deem “low quality” and “high quality”. For example, Google publicly publishes its “Search Quality Evaluator Guidelines”, which detail how it evaluates the quality of internet content. And in fact, the content assessments we have been using for the past year and a half were directly inspired by Google’s.
In addition, reporting in the Wall St. Journal highlighted that while the top 20 pieces of content no longer included “low quality” content, the top 100 and top 500 did. This again stresses how important it is for Facebook to make the Widely Viewed Content Report a real piece of transparency: increase the number of items in the top list (let’s talk thousands, not dozens), release it for more countries, and ideally release it more frequently as well. We won’t know what the top content was during the US midterms until around February 15th or so. Not great.
  • Check out our WVCR tracking dashboard here.
  • Check out the full dataset here, and the 2022-Q3 data specifically here.
  • And for full details on our methodology, you can find our original write up here.
And with all that out of the way, let’s look in detail at the content!

Possible Traffic Arbitrage Operation

The 20th most viewed link in the top list is "https://nerdyfun.netlify.app/.netlify/functions/post/9664". This is a pretty weird link to make it to the top 20! If you go to netlify.app, you’ll see it’s a web development platform for hosting websites and running serverless functions (the “/.netlify/functions/” part of the URL is one such function). Their website says they are used by Unilever, Verizon, and Mattel. So what the heck is a niche, technical web tool doing in the top 20? Well, the “nerdyfun” part of nerdyfun.netlify.app means that we are dealing with a Netlify user, “Nerdyfun”. And what they appear to be using Netlify for is to run a “cloaking” operation.
When you copy and paste "https://nerdyfun.netlify.app/.netlify/functions/post/9664" into your browser, you will see a very basic article page, a vertical scroll image gallery of 20 images, with no ads (Quick note: this article was likely copied from Bored Panda). However, if you visit this url coming from Facebook – if you click on the link from a post like this one https://www.facebook.com/108972587363340/posts/611194110474516 – then you will get a totally different experience. Instead of loading up the basic article, it redirects you to "Pupperish", https://www.pupperish.com/glow-down-challenge, which has a very high ad load. In between each image, there are 1-4 ads, which means that by the time you’ve scrolled through the whole article, you’ve seen maybe 40 ads or more. This technique is called “cloaking” or “masking”. The website looks one way for general traffic, and a totally different way for specific and targeted traffic.
If you know Python, you can see this for yourself.
To see the basic minimalistic article page with no ads run:
import requests
r = requests.get(
    'https://nerdyfun.netlify.app/.netlify/functions/post/9664'
)
print(r.content)
To see the ad heavy version, run:
import requests
r = requests.get(
    'https://nerdyfun.netlify.app/.netlify/functions/post/9664',
    headers={"referer": "https://l.facebook.com"}  # This tells nerdyfun "I just came from Facebook"
)
print(r.content)
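The two requests above can also be folded into a single check. This is a minimal sketch, not a production detector: it uses only the standard library (urllib instead of requests), the helper names `domains_differ` and `is_cloaked` are ours, and it assumes the redirect happens at the HTTP level rather than in JavaScript.

```python
import urllib.request
from urllib.parse import urlparse

def domains_differ(url_a, url_b):
    # Pure helper: do two final landing URLs sit on different domains?
    return urlparse(url_a).netloc != urlparse(url_b).netloc

def is_cloaked(url, referer="https://l.facebook.com"):
    # Fetch once as a normal visitor and once pretending to arrive from
    # Facebook. urlopen follows HTTP redirects, so .geturl() is the
    # final landing URL in each case.
    plain = urllib.request.urlopen(urllib.request.Request(url))
    social = urllib.request.urlopen(
        urllib.request.Request(url, headers={"Referer": referer})
    )
    return domains_differ(plain.geturl(), social.geturl())
```

For the link above, the plain fetch should stay on nerdyfun.netlify.app while the Facebook-referred fetch should land on pupperish.com, assuming the site still behaves as it did during our analysis.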
 
 
If a user sees 40 ads after scrolling through the image gallery, then at a fairly typical ad rate for US audiences of $5 CPM (meaning $5 for every 1,000 ad impressions), Pupperish could end up making about 20 cents per visit. Which is a lot to be making per user, and means that they could probably pay the Facebook Pages that link to it maybe 5-10 cents per click. Which is a great return for the people running the Page networks that boosted it. Even if the real revenue is 10% of that, the operation still works and is profitable.
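The back-of-envelope math works out like this (the ad count and CPM come from the paragraph above; the 5-cent payout is a hypothetical figure, not a measured one):

```python
def revenue_per_user(ads_seen, cpm_usd):
    # CPM means dollars per 1,000 ad impressions
    return ads_seen * cpm_usd / 1000

revenue = revenue_per_user(40, 5.00)   # 40 ads at a $5 CPM
payout_per_click = 0.05                # hypothetical payment to a Page network
margin = revenue - payout_per_click    # arbitrage spread per Facebook click
print(f"${revenue:.2f} revenue, ${margin:.2f} margin per Facebook click")
```

Even at a tenth of that CPM, the same arithmetic still leaves a positive spread if the payout scales down with it, which is why these operations are resilient.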
There were several, very likely distinct, Page networks that were sharing this link. One network was run out of the Netherlands and Belgium. One network was run out of Australia and Germany. Other networks were run out of Malaysia and Nepal and Iraq. And there were others as well. The fact that the networks were run from such different locations, with no clear central hub, suggests that these are probably distinct networks.
 
 
When you combine all this together – a link that is using cloaking to hide a crazy high ad load, a set of distinct Page networks all driving traffic to it with distinct UTM codes, and a likely copied article that previously went viral – it strongly suggests a traffic arbitrage operation. It’s impossible to validate this without internal Facebook data, but if we were on the inside, we would definitely be flagging this to the integrity teams that cover Pages and external traffic for a deeper investigation, since it would be a violation of the “Traffic Diversion” policy for Facebook Pages.

What happened to all the violating content?

In the previous report, for Q2 2022, the amount of violating and misinfo content in the top 20 content lists hit 22.5%. In the current report, it’s at 0%, at least if you don’t count the link above. That would suggest Facebook has done a lot to prevent violating content from reaching large audiences. Our guess, though, is that the story is simpler. The tl;dr hypothesis: a network of Groups that was used to spread links shared by hacked accounts has been cleaned up. But the actors behind the network are already rebuilding it, and we wouldn’t be surprised to see links from this network hit the top lists again soon.
Almost all of the inauthentic behavior that has appeared in the WVCR for the past year has actually traced back to a single network of Groups. In the 2022-Q2 report, Facebook flagged the following domains for inauthentic behavior: shwehadthin.xyz, nayenews24.info, lainute.com. All three of these domains got their traffic from the same network of Groups, the “12am Thoughts” network. When you look up where the domains were shared on Facebook, as we did and stored here, here, and here, you see the same Groups pop up: “12am Thoughts”, “3am Thoughts”, “King & Queen”, “Our Relationships”, and a handful of others. (Important note: for shwehadthin.xyz, you also see that a lot of Myanmar-based Groups were sharing it in 2020, which is extremely alarming and is the reason why we didn’t write about this network at the time. Inauthentic behavior linked to Myanmar should always set off alarm bells.) The Groups do not hide that they are networked. The admins, who look like authentic user accounts, are the same across the Groups.
 
 
Most of the posts shared into these Groups fit the profile of typical audience building spam: stolen memes with a link to another Group or Page in the network. You can see the same accounts, which again appear to be authentic user accounts, posting across the network.
 
 
Most of their posts don’t have links, however! And it’s now a bit tricky to find posts with links in between all the meme spam, presumably since Facebook teams have cleaned out all the inauthentic link sharing. If we dig through it though, we can find examples like this (https://www.facebook.com/groups/910323979476943/posts/1409342509575085/).
 
 
So they seem to share link posts from various people and Pages into the Group. If you look closely at the four example posts, they all follow this pattern of sharing a Page post into the Group, and they are currently all sharing the same Page, “Jaee The Artist”. So why is “Jaee The Artist” making all these posts that members are so aggressively sharing into the “12am Thoughts” network? Because the network has compromised it! You can see partners and friends of “Jaee The Artist” complaining that the account has been stolen!
 
 
Since our investigation, the "Jaee The Artist" Page has been deleted. It may have been removed by Facebook after identifying it as inauthentic, or taken down by the Group network, who may have bought the Page without knowing it had been hacked.
So, the MO of this group appears to be:
  1. Build a network of very large Groups through viral meme spam
  2. Buy or hack Pages and user accounts
  3. Have the bought or hacked user accounts and Pages make posts which link to domains the group is monetizing
  4. Share the posts with links to the monetized domain into the large groups
This is a pretty solid plan for exploiting Facebook. The assets that take time to build (Groups with memberships in the millions) can be kept clean of the inauthentic activity (hacked accounts spamming links to monetized domains). If the group has been diligent about operational security, it could be impossible, even with internal Facebook data, to link the Group admins to the inauthentic activity. And they have repeatedly used the network to get enough views on links to land them in Facebook’s top 20, so it has been very successful overall.
So the real question is, how did Facebook remove these links from the current Top 20 links list? Was it because they cleared out the current set of compromised accounts the group was using to spam links? Or did they change how News Feed ranking works such that these viral meme spam Groups have a harder time getting traction in users' Feeds? If it was the former, then we should expect this Group network to pop back up in Top 20 lists as the whack-a-mole continues, and maybe it would already show up in a Top 100 list if Facebook were to release one. If it was the latter, then it’s possible this tactic has been taken out for good.

Insider Dominates the Top Links

Insider (including Business Insider) owns 10 of the top 20 most viewed links. This is remarkable: we have never seen this level of concentration of top links in one publisher, and it’s also the first time we’ve seen Insider make the list at all. So it’s worth asking how Insider accomplished this. With this level of dominance of the top links, whatever Insider is doing is clearly what News Feed incentivizes publishers to do, and so it provides a clear view of the incentive structure Facebook creates within the online information ecosystem.
So, what publishing practices does Facebook reward, and thus directly incentivize legitimate publishers to adopt?

Insider Now Has A Huge Network of Pages

Insider has a lot of Facebook Pages. It is among the larger networks we’ve seen from a legitimate publisher (i.e., not a foreign-run spam operation). 20 different Insider-run Facebook Pages were used to share the various links that made it to the top list.
The full list includes: Careers Insider, Culture Insider, Insider, Insider Beauty, Insider Business, Insider Design, Insider Entertainment, Insider Finance, Insider Home, Insider Life, Insider News, Insider Presents, Insider Retail, Insider Sports, Insider Style, Insider Tech, Inventions Insider, Movies Insider, Personal Finance Insider, Tech Insider Presents.
There are roughly 3 “centers” to the cluster: one around general news, one around tech news, and one around celebrity gossip and light news. The Pages in the network are not all identical, but their content heavily overlaps.
 
 
The Pages also share the same links over and over again. For example, this article was shared on “Insider Home” 14 times.
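Spotting this kind of repeat-sharing is straightforward once you have a log of link posts; a minimal sketch (the share log and URLs here are illustrative, not our actual dataset):

```python
from collections import Counter

# Illustrative share log: (Page, URL) pairs, one per link post.
shares = (
    [("Insider Home", "https://www.insider.com/some-article")] * 14
    + [("Insider Tech", "https://www.insider.com/another-article")] * 2
)

share_counts = Counter(shares)
for (page, url), n in share_counts.most_common():
    if n >= 10:  # flag links a single Page spammed repeatedly
        print(f"{page} shared {url} {n} times")
```

Run over a real share log, the 14x “Insider Home” example above is exactly the kind of pair this flags.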
So, step one in getting articles into the Top 20 on Facebook: Create a bunch of Facebook Pages and spam your articles across them.

None of the Top Links from Insider Reflect Original Reporting

Insider is a legitimate publisher, and we are not accusing them of stealing content. However, the 10 articles of theirs that made it to the top lists fit one of two categories:
  1. A highlight of a point or two from an interview or analysis that another outlet produced
  2. A summary of a video that went viral on TikTok or YouTube
Now, every news outlet does this to some extent. And if 5 of the articles by Insider that made it to the top were original reporting and 5 were summaries of original reporting from other outlets, we’d probably say “fair enough”. But it’s not just lacking in original reporting; there is no original reporting to be found.
For example, the above example about Millie Bobby Brown originated with an interview of Ms. Brown by Allure Magazine on August 10th. On August 12th at 12:07PM, Variety published an article highlighting the anecdote from it that a casting director made her cry. And six hours later on August 12th, Insider went live with their article highlighting the same fact, that a casting director made her cry because she was “too mature”.
And this is a pattern we see again and again with the top links from Insider. This Insider article, number 10 on the list, is a summary of this TikTok video. This Insider article is a summary of this YouTube video. This Insider article is a summary of this piece on Good Morning America which was based on this TikTok video. It’s the “turtles all the way down” side of the online publishing industry.
And so, step two in getting articles in the top 20 links on Facebook: Do not waste your time doing the hard work of original reporting, just find articles and videos that went viral yesterday, and write articles that highlight or summarize the most viral component of them. (A methodology note: We labeled these articles from Insider as "original", since they were not stolen or plagiarized and did represent original summaries. This is the primary reason that the percentage of "Unoriginal" content dropped in the latest report.)

What Does This Say About How Facebook Incentivizes Publishers?

It’s important to state that none of the content in the list is bad. Not on its face. There’s nothing wrong with celebrity gossip, or tabloid stories, or viral videos.
The problem is that social media platforms create strong incentives for publishers. And if a social media platform is designed poorly, if it is easily exploitable, and if it rewards bad publishing practices, then even with content that is fine, it can hurt information ecosystems.
If all you care about is getting traffic from Facebook, and you are creating original content to try and attract traffic from Facebook, then you are a chump. Plain and simple. From the Facebook traffic point of view, Allure Magazine wasted their time interviewing Millie Bobby Brown. With 1/100th the effort, they could have let some other outlet spend time on the interview and then simply summarized the more sensational parts of it. From the Facebook traffic point of view, every Reel creator is a chump. You should just freeboot whatever went viral yesterday on TikTok and repost it. Or write an article that summarizes it.
If all you care about is getting traffic from Facebook, and you are trying to build a real connection with your audience, then you are a chump. Instead of creating a brand Page that users will recognize and value and form a relationship with, you should create dozens of throwaway Facebook Pages, each targeting a different niche audience, and spam your content across them.
There are things Facebook could do to flip its impact on the online information ecosystem. PageRank, the trillion-dollar algorithm that built Google and whose patent has expired, continues, after a year and a half of our data collection, to do a remarkable job of separating the accounts that fail basic media literacy checks from the ones that pass, even with the insider.com failures included (insider.com has fairly high PageRank but mostly fails our media literacy checks).
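For readers unfamiliar with it, PageRank is just an iterated redistribution of link "authority". A minimal power-iteration sketch on a toy link graph (the graph and node names are made up for illustration; this is the textbook algorithm, not our production scoring):

```python
def pagerank(links, damping=0.85, iters=50):
    # links: dict mapping each node to the list of nodes it links to
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iters):
        # Every node gets a small teleport share, plus a damped share
        # of the rank of each node that links to it.
        new = {node: (1 - damping) / n for node in nodes}
        for src, outs in links.items():
            share = damping * rank[src]
            if not outs:  # dangling node: spread its rank evenly
                for node in nodes:
                    new[node] += share / n
            else:
                for dst in outs:
                    new[dst] += share / len(outs)
        rank = new
    return rank

# Toy graph: two outlets cite each other; a spam page only links out.
toy = {"outlet_a": ["outlet_b"], "outlet_b": ["outlet_a"], "spam": ["outlet_a"]}
ranks = pagerank(toy)
```

The spam node ends up with only the teleport share of rank because nothing links to it, which is the intuition behind using PageRank to separate reputable accounts from spam networks.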
 
 
As long as the content isn’t violating on its face, Facebook cares little about how you produced it or how you distributed it, even if your practices cannibalize publishers doing the hard work of original reporting and actual journalism. And as our fellow Karan Lala said, “Spam isn’t just a nuisance; it can bring down democracies”. Many of the failures that allow Facebook to be easily exploited by lazy publishers are the same failures that allow it to be easily exploited by actors wishing to spread misinfo, hate, and authoritarianism.

The Curious Case of the Missing Queen

Finally, a quick note on what wasn’t on the top content lists: the passing of Queen Elizabeth II. She died on September 8th, well within the Q3 time range of this list, but no story about her made it into the top 20.
Which is weird! Very, very weird. Queen Elizabeth was a pretty big deal, and her passing was one of the biggest news stories of the year. This isn’t us speaking as elitists or anything; celebrities are definitely a big deal, and Tiger Woods, Brad Pitt, and Millie Bobby Brown all having stories in the top 20 isn’t particularly surprising. They’re big, but they aren’t “English royalty” big.
This chart shows how many pageviews the Wikipedia articles for Queen Elizabeth II, Brad Pitt, Tiger Woods, and Millie Bobby Brown got over the past five years.
 
 
The death of Queen Elizabeth II drove over 10x more public attention than any story about Tiger Woods or Brad Pitt, like ever.
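The pageview comparison above can be reproduced from public data: the Wikimedia Analytics REST API exposes daily per-article pageview counts. A sketch of the URL construction (the endpoint format is Wikimedia's public pageviews API; the `pageviews_url` helper name and the date range are ours, and fetching/plotting are left out):

```python
def pageviews_url(article, start, end, project="en.wikipedia.org"):
    # Wikimedia Analytics pageviews endpoint:
    # /metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return f"{base}/{project}/all-access/user/{article}/daily/{start}/{end}"

# Daily pageviews for the Queen's article over roughly the chart's range
url = pageviews_url("Elizabeth_II", "20170101", "20221001")
```

Requesting the same window for "Brad_Pitt", "Tiger_Woods", and "Millie_Bobby_Brown" and plotting the four series gives a chart like the one above.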
And if you think that’s mostly international interest, and that US audiences didn’t care, you are wrong. Here is the Google Trends chart of US searches for them over the past five years.
 
Google Trends chart of search volume for Queen Elizabeth II, Tiger Woods, Brad Pitt, and Millie Bobby Brown over the past five years.
 
Again, Queen Elizabeth drove about 3x more public attention specifically among US audiences than any Tiger Woods story in the past 5 years.
So it’s strange. And given statements by Mark Zuckerberg and Facebook that there is too much political content on Facebook, we should be extremely concerned that this isn’t simply an oddity. It could reflect Facebook suppressing all politics and civic content, including one of the biggest political stories of the year, because they can’t figure out how to rank political news without amplifying embarrassing hyper-partisan hot takes and misinformation.
This is why, in our briefing on what transparency we need from platforms around how their ranking systems work, we specifically call out that platforms should “Disclose if they have different processes for ranking different content topics, or how content topic classifiers impact ranking”. If Facebook has special demotions for political content, they have the right to do that, but the public should know.
Because Facebook continues to only release a minuscule number of the top articles, rather than an amount large enough to comprehensively assess Facebook’s information ecosystem, we are a bit in the dark here.
Jeff Allen

Jeff Allen is the co-founder and chief research officer of the Integrity Institute. He was a data scientist at Facebook from 2016 to 2019. While at Facebook, he worked on tackling systemic issues in the public content ecosystems of Facebook and Instagram, developing strategies to ensure that the incentive structure the platforms created for publishers was in alignment with Facebook’s company mission statement.
