Case study with Data blogs, from 300 to 1000

We looked at how we could quickly expand a list of the top 300 data blogs.
The initial idea came from Marshall Kirkpatrick.
Project turnaround: 1 day. Here is the initial list of 300 bloggers:

Here is how we proceeded:

  • We loaded the initial list in our application
  • We mapped the initial community (graph above)
  • We analyzed the different clusters and identified a few sub-communities around BI, SOA/Cloud, and Machine Learning
  • We also mined the content of the initial bloggers’ list and got the following “expressions” as the ones most frequently used
  • We expanded the different clusters and came up with a list close to 1000 (graph below)
  • This list has two major clusters and 4 smaller ones
    • SOA/Cloud/Architecture
    • BI
    • BI-Datamining
    • BI-Visualization/Dashboarding
    • Machine Learning/NLP/SNA/Comp Linguistics/AI (oops, a taboo word)
    • Developer blogs around data
  • Half of the bloggers form a very well connected community around those subjects. The others do not
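
The expansion and clustering steps above can be illustrated with a small sketch. It is not the application used for this project: it assumes a hypothetical `links` mapping (blog to the set of blogs it links to), adds any blog linked to by at least `min_seed_links` seed blogs, and uses plain connected components as a stand-in for cluster detection. All blog names and the threshold are made up for illustration.

```python
from collections import Counter


def expand_seed(links, seed, min_seed_links=2):
    """Return the seed plus blogs linked to by at least `min_seed_links` seed blogs."""
    seed = set(seed)
    votes = Counter()
    for blog in seed:
        for target in links.get(blog, ()):
            if target not in seed:
                votes[target] += 1
    return seed | {blog for blog, n in votes.items() if n >= min_seed_links}


def clusters(links, blogs):
    """Group blogs into connected components, treating links as undirected."""
    blogs = set(blogs)
    neighbours = {b: set() for b in blogs}
    for src, targets in links.items():
        for dst in targets:
            if src in blogs and dst in blogs and src != dst:
                neighbours[src].add(dst)
                neighbours[dst].add(src)
    seen, result = set(), []
    for start in sorted(blogs):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


# Hypothetical link data: "a", "b", "c" are seed blogs; "x" and "y" are candidates.
links = {"a": {"b", "x"}, "b": {"a", "x"}, "c": {"y"}, "x": {"a"}, "y": set()}
expanded = expand_seed(links, ["a", "b", "c"])
groups = clusters(links, expanded)
```

Here "x" is added because two seed blogs link to it, while "y" is left out because only one does. The isolated component containing "c" corresponds to the kind of blogger who is not part of the well connected community.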

Below are the top 500 ranked by influence.

1 http://radar.oreilly.com    251 http://blogs.forrester.com/business_process
2 http://googleblog.blogspot.com    252 http://arubawayne.blogspot.com
3 http://www.readwriteweb.com/cloud    253 http://wiseanalytics.typepad.com
4 http://www.datacenterknowledge.com    254 http://www.tom-carden.co.uk
5 http://aws.typepad.com    255 http://www.openstack.org/blog
6 http://www.cloudave.com    256 http://www.panorama.com
7 http://www.information-management.com    257 http://mybiasedcoin.blogspot.com
8 http://www.enterpriseirregulars.com    258 http://developer.yahoo.com/blogs/hadoop
9 http://www.flowingdata.com    259 http://blog.scottlowe.org
10 http://www.itbusinessedge.com    260 http://anand.typepad.com/datawocky
11 http://www.cloudera.com    261 http://www.leebyron.com
12 http://www.highscalability.com    262 http://www.chrisharrison.net
13 http://cloudcomputing.sys-con.com    263 http://blog.asmartbear.com
14 http://www.dbms2.com    264 http://mlinpractice.blogspot.com
15 http://www.allthingsdistributed.com    265 http://apperceptual.wordpress.com
16 http://www.perceptualedge.com/blog    266 http://www.robhirschfeld.com
17 http://www.hunch.net    267 http://blog.gardeviance.org
18 http://www.smartdatacollective.com    268 http://www.elementallinks.com/author/bmichelson
19 http://www.kdnuggets.com    269 http://www.windley.com
20 http://glinden.blogspot.com    270 http://www.zoliblog.com
21 http://aws.typepad.com/aws    271 http://loosebolts.wordpress.com
22 http://www.infosthetics.com    272 http://www.ramonchen.com
23 http://www.dataminingblog.com    273 http://lizmcmillan.ulitzer.com
24 http://blog.softwareinsider.org    274 http://doubleclix.wordpress.com
25 http://blog.programmableweb.com    275 http://quantombone.blogspot.com
26 http://perspectives.mvdirona.com    276 http://onionesquereality.wordpress.com
27 http://www.lingpipe-blog.com    277 http://www.tibcoblogs.com/cep
28 http://www.gapminder.org    278 http://blog.sasinct.com
29 http://datamining.typepad.com    279 http://businessfoundation.typepad.com
30 http://www.thenoisychannel.com    280 http://www.ryangoodman.net/blog
31 http://www.intelligententerprise.com    281 http://shivabizint.wordpress.com
32 http://nlpers.blogspot.com    282 http://www.juiceanalytics.com/writing
33 http://www.ocdqblog.com    283 http://obibb.wordpress.com
34 http://cloud.gigaom.com    284 http://www.predictiveanalyticsworld.com/blog
35 http://datamining.typepad.com/data_mining    285 http://searchdatamanagement.bitpipe.com
36 http://gevaperry.typepad.com    286 http://www.datagovernance.com
37 http://www.jtonedm.com    287 http://www.textanalyticsnews.com
38 http://www.timoelliott.com/blog    288 http://mark.reid.name/iem
39 http://www.dataists.com    289 http://blog.recordedfuture.com
40 http://www.informationisbeautiful.net    290 http://blog.data-miners.com
41 http://www.information-management.com/news    291 http://www.fivethirtyeight.com
42 http://devcentral.f5.com/weblogs/macvittie    292 http://blog.ouseful.info
43 http://www.redmonk.com/jgovernor    293 http://datadoghouse.typepad.com/data_doghouse
44 http://cwebbbi.spaces.live.com    294 http://blog.initiate.com
45 http://www.decisionstats.com    295 http://blogstats.wordpress.com
46 http://www.daniel-lemire.com    296 http://blog.blprnt.com
47 http://stage.vambenepe.com    297 http://www.data-miners.com/blog
48 http://obiee101.blogspot.com    298 http://blog.okcupid.com
49 http://searchdatamanagement.techtarget.com    299 http://www.messymatters.com
50 http://terrytao.wordpress.com    300 http://hlplab.wordpress.com
51 http://www.r-bloggers.com    301 http://windowoffice.tumblr.com
52 http://blogs.zdnet.com/SAAS    302 http://www.ebizq.net/blogs/connectedweb
53 http://www.johndcook.com/blog    303 http://blogs.boomi.com/bod
54 http://julianhyde.blogspot.com    304 http://1raindrop.typepad.com
55 http://languagelog.ldc.upenn.edu/nll    305 http://briefingsdirect.blogspot.com
56 http://www.cloudswitch.com    306 http://www.theagileexecutive.com
57 http://www.dataqualitypro.com    307 http://www.saasbuzz.com
58 http://abbottanalytics.blogspot.com    308 http://www.appirio.com/blog
59 http://searchbusinessanalytics.techtarget.com    309 http://www.thestoragearchitect.com
60 http://www.analyticbridge.com    310 http://www.tomhcanderson.com
61 http://blog.revolutionanalytics.com    311 http://geneticargonaut.blogspot.com
62 http://nosql.mypopescu.com    312 http://blog.minethatdata.com
63 http://www.daniel-lemire.com/blog    313 http://neverknewthat.wordpress.com
64 http://www.dataspora.com/blog    314 http://blogs.oracle.com/datawarehousing
65 http://www.elasticvapor.com    315 http://www.monashreport.com
66 http://jamesdixon.wordpress.com    316 http://blogs.microsoft.co.il/blogs/bei
67 http://mervadrian.wordpress.com    317 http://blogs.sas.com/sasdummy
68 http://www.dashboardinsight.com    318 http://bardoli.blogspot.com
69 http://www.redmonk.com/cote    319 http://www.dashboardspy.com/experts
70 http://www.rationalsurvivability.com/blog    320 http://www.legrandbi.com
71 http://www.dataflux.com    321 http://www.informationarbitrage.com
72 http://geomblog.blogspot.com    322 http://businesssintelligence.blogspot.com
73 http://www.b-eye-network.com/blogs/imhoff    323 http://www.paloinsider.com
74 http://googleresearch.blogspot.com    324 http://practicalquant.blogspot.com
75 http://www.drewconway.com/zia    325 http://richardlees.blogspot.com
76 http://www.dataspora.com    326 http://www.willgorman.com
77 http://blogs.oracle.com/frankbuytendijk    327 http://www.b-eye-network.com/blogs/grimes
78 http://www.mysqlperformanceblog.com    328 http://blogs.gartner.com/dave_mccoy
79 http://blog.rightscale.com    329 http://mjfb-books.blogspot.com
80 http://fscavo.blogspot.com    330 http://briefingsdirectblog.blogspot.com
81 http://www.arnoldit.com/wordpress    331 http://www.coolinfographics.com/blog
82 http://www.cloudtweaks.com    332 http://www.bitcurrent.com
83 http://www.dataflux.com/dfblog    333 http://jeffjonas.typepad.com/jeff_jonas
84 http://oraclebizint.wordpress.com    334 http://www.riccomini.name
85 http://www.powerpivotpro.com    335 http://slycoder.wordpress.com
86 http://oraclebi.blogspot.com    336 http://gameswithwords.fieldofscience.com
87 http://liliendahl.wordpress.com    337 http://theprodigalacademic.blogspot.com
88 http://www.techcrunchit.com    338 http://lucene.grantingersoll.com
89 http://natishalom.typepad.com    339 http://terrierteam.blogspot.com
90 http://cloudcomputing.blogspot.com    340 http://blog.veronis.fr
91 http://junkcharts.typepad.com    341 http://www.mendicantbug.com
92 http://www.biblogs.com    342 http://www.drmaciver.com
93 http://searchdatacenter.techtarget.com    343 http://cloud-computing.learningtree.com
94 http://nickbarclay.blogspot.com    344 http://intacct.blogspot.com
95 http://www.lemire.me/blog    345 http://devcentral.f5.com/weblogs/psilva
96 http://www.datawrangling.com    346 http://www.cloudenterprise.info
97 http://dbmsmusings.blogspot.com    347 http://www.theagileadmin.com
98 http://palblog.fxpal.com    348 http://blog.gogrid.com
99 http://www.jackofallclouds.com    349 http://www.sixteenventures.com/blog
100 http://www.searchenginecaffe.com    350 http://www.joyeur.com
101 http://smoothspan.wordpress.com    351 http://www.insidehpc.com
102 http://kevinljackson.blogspot.com    352 http://www.jpmorgenthal.com/morgenthal
103 http://www.bartongeorge.net    353 http://blog.theloosecouple.com
104 http://www.mwdadvisors.com/blog    354 http://technoracle.blogspot.com
105 http://www.mspmentor.net    355 http://blogs.cisco.com/sp
106 http://siebel-essentials.blogspot.com    356 http://blog.heroku.com
107 http://www.geekswithblogs.net/darrengosbell    357 http://www.softwarestrategiesblog.com
108 http://behind-the-enemy-lines.blogspot.com    358 http://www.oncloudcomputing.com/en
109 http://www.igvita.com    359 http://soa-eda.blogspot.com
110 http://www.privatecloud.com    360 http://charltonb.typepad.com/weblog
111 http://www.kaushik.net/avinash    361 http://datacenterdesign.blogspot.com
112 http://www.ibridge.be    362 http://blog.sciodev.com
113 http://petewarden.typepad.com/searchbrowser    363 http://www.azurejournal.com
114 http://www.anyall.org    364 http://www.informaniac.net
115 http://www.irgupf.com    365 http://radfordneal.wordpress.com
116 http://www.semanticweb.com    366 http://www.dataminingdownunder.com
117 http://blogs.cisco.com/datacenter    367 http://machine-learning.blogspot.com
118 http://blog.sforce.com/sforce    368 http://pedrocgd.blogspot.com
119 http://www.column2.com    369 http://jkobielus.blogspot.com
120 http://www.cloudscaling.com/blog    370 http://www.beyeblogs.com/karthikonbi
121 http://blogs.msdn.com/b/windowsazure    371 http://bibrain.blogspot.com
122 http://govcloud.ulitzer.com    372 http://optimizermagic.blogspot.com
123 http://www.cloudbzz.com    373 http://www.thebiblog.com
124 http://www.redmonk.com/sogrady    374 http://www.saasbusinessintelligenceforum.com
125 http://onertipaday.blogspot.com    375 http://knowledge.ciber.nl/weblog
126 http://rnm1978.wordpress.com    376 http://atomai.blogspot.com
127 http://looksmarter.blogspot.com    377 http://pminsight.blogspot.com
128 http://hekatonkheires.blogspot.com    378 http://bi-tch.blogspot.com
129 http://www.rittmanmead.com/blog    379 http://blogerp.typepad.com
130 http://my.safaribooksonline.com    380 http://datamarket.azure.com
131 http://www.guardian.co.uk/news/datablog    381 http://blog.enterpriseadvocates.com
132 http://www.eagereyes.org    382 http://www.cloud9analytics.com
133 http://petewarden.typepad.com    383 http://topologicalmusings.wordpress.com
134 http://oakleafblog.blogspot.com    384 http://www.bigdatamatters.com
135 http://oracledmt.blogspot.com    385 http://www.simplecomplexity.net
136 http://www.powerpivotblog.nl    386 http://cloudintegration.wordpress.com
137 http://michaeltarallo.blogspot.com    387 http://www.measuringmeasures.com/blog
138 http://www.intelligententerprise.com/blog    388 http://www.datafactotum.com
139 http://yaroslavvb.blogspot.com    389 http://content.stamen.com
140 http://www.norvig.com    390 http://blog.devdonkey.org
141 http://www.datadoodle.com    391 http://www.radicalcartography.net
142 http://www.jilldyche.com    392 http://maureenogara.sys-con.com
143 http://www.ysearchblog.com    393 http://lauriemccabe.wordpress.com
144 http://earningmyturns.blogspot.com    394 http://www.bytemining.com
145 http://blogs.gartner.com/andrea_dimaio    395 http://www.infovegan.com
146 http://www.cloudpundit.com    396 http://dbmoore.blogspot.com
147 http://www.chaotic-flow.com    397 http://rpbouman.blogspot.com
148 http://cloudcomputing.ulitzer.com    398 https://www.cloudave.com
149 http://pro.gigaom.com    399 http://www.sunlightlabs.com
150 http://www.measuringmeasures.com    400 http://rbaltman.wordpress.com
151 http://blogs.oracle.com/warehousebuilder    401 http://mlstat.wordpress.com
152 http://blogs.informatica.com/perspectives    402 http://www.drmaciver.com/blog
153 http://blog.technologyevaluation.com    403 http://researchonsearch.blogspot.com
154 http://timmanns.blogspot.com    404 http://www.enterprisesearchblog.com
155 http://www.obiee-blog.info    405 http://www.vetta.org
156 http://www.powerpivotgeek.com    406 http://blog.wordnik.com
157 http://blogs.msdn.com/bi    407 http://isquared.wordpress.com
158 http://datalligence.blogspot.com    408 http://jochenleidner.posterous.com
159 http://datadoghouse.typepad.com    409 http://blog.semantichacker.com
160 http://www.texttechnologies.com    410 http://www.livewebir.com/blog
161 http://www.asterdata.com/blog    411 http://www.lexalytics.com/lexablog
162 http://www.peterjamesthomas.com    412 http://lada.si.umich.edu:8080/wordpress
163 http://www.mndoci.com    413 http://resnotebook.blogspot.com
164 http://denglishbi.spaces.live.com    414 http://www.blogamundo.net/dev
165 http://www.ocdqblog.com/home    415 http://www.storagerap.com
166 http://www.cloudera.com/blog    416 http://briefingsdirect.posterous.com
167 http://www.conflate.net/inductio    417 http://blogs.talis.com/nodalities
168 http://thelousylinguist.blogspot.com    418 http://vburke.wordpress.com
169 http://fountnhead.blogspot.com    419 http://techbuddha.wordpress.com
170 http://cloudonomics.ulitzer.com    420 http://itblagger.wordpress.com
171 http://blogs.oracle.com/xmlpublisher    421 http://blog.elementallinks.net
172 http://www.enterprise-dashboard.com    422 http://www.itwriting.com/blog
173 http://dashboardspy.wordpress.com    423 http://blog.appirio.com
174 http://oracleolap.blogspot.com    424 http://markclittle.blogspot.com
175 http://www.juiceanalytics.com    425 http://thecloudguytim.wordpress.com
176 http://intelligent-enterprise.informationweek.com/blog    426 http://www.appistry.com/blogs
177 http://jeffjonas.typepad.com    427 http://www.transparentuptime.com
178 http://blog.jonudell.net    428 http://www.bladewatch.com
179 http://www.hilarymason.com    429 http://www.virtualization.com
180 http://www.anyall.org/blog    430 http://soa-talk.blogs.techtarget.com
181 http://databeta.wordpress.com    431 http://www.cloudstoragestrategy.com
182 http://open.blogs.nytimes.com    432 http://blogs.vmware.com/vcloud
183 http://nuit-blanche.blogspot.com    433 http://blog.virtacore.com
184 http://www.cloudcomputing.info/en    434 http://rvsoapbox.blogspot.com
185 http://www.cloudofdata.com    435 http://www.soacenter.com
186 http://mike.teczno.com    436 http://www.infosysblogs.com/cloudcomputing
187 http://blogs.microsoft.co.il/blogs/barbaro    437 http://gevaperry.typepad.com/main
188 http://www.business-intelligence-quotient.com    438 http://duckdown.blogspot.com
189 http://www.bifacts.com    439 http://www.jeffjonas.typepad.com
190 http://blogs.sas.com/sascom    440 http://www.marktab.net/datamining
191 http://blogs.sas.com/cokins    441 http://blog.rapleaf.com/dev
192 http://gilkalai.wordpress.com    442 http://togelius.blogspot.com
193 http://lucatrevisan.wordpress.com    443 http://www.scienceblogs.com/developingintelligence
194 http://blog.revolution-computing.com    444 http://medal.cs.umsl.edu/blog
195 http://www.technologyreview.com/blog    445 http://www.neuralmarkettrends.com
196 http://blogs.msdn.com/b/bi    446 http://blog.rgrossman.com
197 http://www.tableausoftware.com/blog    447 http://dataminingwarehousing.blogspot.com
198 http://www.michaelnielsen.org/blog    448 http://sqlrs.blogspot.com
199 http://natishalom.typepad.com/nati_shaloms_blog    449 http://peoplesoftbits.wordpress.com
200 http://www.nicholasgoodman.com/bt/blog    450 http://kferrier.blogspot.com
201 http://blog.sematext.com    451 http://www.wills-blog.com
202 http://pedroalves-bi.blogspot.com    452 http://blogs.forrester.com/information_management
203 http://databasecolumn.vertica.com    453 http://www.renditionx.com/mastering
204 http://blog.codalism.com    454 http://www.shangyezhineng.com/blog
205 http://www.infra20.com    455 http://spotfireblog.tibco.com
206 http://blogs.cisco.com/news    456 http://www.sqlserverbiblog.com
207 http://www.thinkstrategies.com/blog    457 http://blog.datainspirations.com
208 http://blog.tridentcap.com    458 http://blogs.hexaware.com/business_intelligence
209 http://www.onstrategies.com/blog    459 http://blogs.sas.com/bipie
210 http://service-architecture.blogspot.com    460 http://sas-bi.blogspot.com
211 http://www.thenumerati.net    461 http://www.dashboardzone.com
212 http://matlabdatamining.blogspot.com    462 http://www.dallasmarks.org/blog
213 http://www.dashboardspy.com    463 http://paulsteynberg.blogspot.com
214 http://www.gerardnico.com/weblog    464 http://blog.pentaho.com
215 http://business-intelligence.kdejonge.net    465 http://bisolutions.blogspot.com
216 http://minethatdata.blogspot.com    466 http://www.blogerp.com
217 http://pjsrandom.wordpress.com    467 http://www.datalossdb.org
218 http://zyxo.wordpress.com    468 http://datapuzzle.typepad.com
219 http://www.chandoo.org/wp    469 http://www.smartdatacollective.com/blog
220 http://learndiscoverer.blogspot.com    470 http://performancepoint07.blogspot.com
221 http://kenoconnordata.wordpress.com    471 http://www.actuate.com/home
222 http://www.structureddata.org    472 http://blog.tanelpoder.com
223 http://eon.businesswire.com    473 http://www.benfry.com/writing
224 http://www.brentozar.com    474 http://www.zerohedge.com
225 http://ebiquity.umbc.edu/blogger    475 http://www.predictivesignals.com
226 http://decisionstats.wordpress.com    476 http://blog.cloudeventprocessing.com
227 http://blogs.wsj.com/numbersguy    477 http://www.calculatedriskblog.com
228 http://blogs.the451group.com/opensource    478 http://aristippus303.wordpress.com
229 http://jshurwitz.wordpress.com    479 http://www.chartporn.org
230 http://nlbi.blogspot.com    480 http://www.sqlstream.com/blog
231 http://dealarchitect.typepad.com/deal_architect    481 http://scorecardstreet.spaces.live.com
232 http://chucksblog.emc.com/chucks_blog    482 http://www.metaoptimize.com/blog
233 http://sarahburnett.blogspot.com    483 http://roman.stanek.org
234 http://www.sna-projects.com/blog    484 http://www.fulltablescan.com
235 http://xianblog.wordpress.com    485 http://www.analysisintelligence.com
236 http://lifeanalytics.blogspot.com    486 http://charliem.wordpress.com
237 http://www.cerebralmastication.com    487 http://blogs.sas.com/getgrowkeep
238 http://blog.doloreslabs.com    488 http://ml.typepad.com/machine_learning_thoughts
239 http://www.propublica.org/nerds    489 http://haas.openbi.com
240 http://www.iq.harvard.edu/blog/sss    490 http://www.socializationofdata.com
241 http://blog.smola.org    491 http://mikecurr55.wordpress.com
242 http://tm.durusau.net    492 http://www.identityresolutiondaily.com
243 http://blogs.zdnet.com/Gardner    493 http://blogs.msdn.com/b/naturallanguage
244 http://www.greenm3.com    494 http://pseudo.posterous.com
245 http://www.kinlane.com    495 http://willwhim.wordpress.com
246 http://www.vosibilities.com    496 http://infoblog.stanford.edu
247 http://vimsu99.blogspot.com    497 http://irepeat.wordpress.com
248 http://www.machinedlearnings.com    498 http://www.emergingcl.com
249 http://mgarner.wordpress.com    499 http://www.visualrevenue.com/blog
250 http://www.ebizq.net/blogs/ebizq_forum    500 http://blog.amyiris.com
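
How influence is scored is not detailed in this post. As a rough illustration only, one simple proxy is the number of inbound links a blog receives from other blogs in the same list. The sketch below assumes that proxy and a hypothetical `links` mapping (blog to the blogs it links to); it is not the method used to produce the ranking above.

```python
from collections import Counter


def rank_by_inlinks(links, blogs):
    """Order blogs by inbound links from other listed blogs (ties broken by name)."""
    blogs = set(blogs)
    inlinks = Counter()
    for src, targets in links.items():
        if src not in blogs:
            continue  # only links coming from inside the list count
        for dst in set(targets):  # count each source blog at most once per target
            if dst in blogs and dst != src:
                inlinks[dst] += 1
    return sorted(blogs, key=lambda b: (-inlinks[b], b))


# Hypothetical link data for three blogs.
example_links = {"a": ["b", "c"], "b": ["c"], "c": []}
ranking = rank_by_inlinks(example_links, ["a", "b", "c"])
```

In this toy data "c" ranks first because both other blogs link to it. A real score would likely weight links by the influence of their source, as PageRank-style measures do.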
