14 February, 2010

Indian Language Wikipedias - Statistics - 2010 January

Here is a detailed statistical analysis of Indian language Wikipedias for the month of 2010 January.

The PDF of this analysis is available at http://shijualexonline.googlepages.com/2010_01_january_en.pdf. Please refer to the PDF for the tables, as they are quite long in this blog post.

The data for this report is taken from the statistical analysis of all the Wikimedia wikis prepared and maintained by Erik Zachte (website: http://infodisiac.com/). Special thanks to Erik for all the support he extended while I was compiling this report. The data is collected on the last day of every month; that is, the data for 2010 January was collected at 23:59 GMT on 2010 January 31.

This blog post includes the statistical analysis of the Indian language Wikipedias listed in the tables below.

I have also included the statistical analysis of some other language Wikipedias from the Indian subcontinent, even though these languages are not spoken in India. I am very interested in the wiki activity in these languages.

I know that there is little point in comparing an inactive wiki with few articles, like Assamese or Oriya, with highly active Wikipedias like Hindi or Telugu. But for this month, let me treat all of them together. From next month onwards, I would like to treat them as two separate groups.

I hope this initiative will improve the interaction between different Indian Language Wikipedias/wikipedians. We (Malayalam Wikipedians - http://ml.wikipedia.org) have been maintaining a similar comparison study of the major Indian Language wikipedias for the past two years. This analysis has helped us to understand the status of Malayalam Wikipedia as compared to other Indian Language Wikipedias. I hope this analysis will also help other Indian language wikipedias.

Please feel free to add your suggestions or analysis as comments to this post. I have divided this report into two sections.

  1. Statistical analysis of Wikipedias

  2. Localization status of Mediawiki software



Following are the topics covered under each section.

  1. Wikipedia Statistics
     - Article statistics
     - User statistics

  2. MediaWiki Statistics
     - Localization statistics

Number of Articles

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 261 | 261 | 263 |
| Bengali | 20,754 | 20,918 | 21,016 |
| Bhojpuri | 2,480 | 2,481 | 2,481 |
| Bishnupriya Manipuri | 23,424 | 24,733 | 24,738 |
| Gujarathi | 11,255 | 11,904 | 12,579 |
| Hindi | 52,144 | 52,645 | 53,216 |
| Kannada | 7,596 | 7,741 | 7,846 |
| Kashmiri | | | 375 |
| Malayalam | 11,459 | 11,635 | 11,871 |
| Marathi | 25,737 | 26,034 | 26,544 |
| Odia (Oriya) | 553 | 553 | 553 |
| Pali | | | 2,316 |
| Punjabi | 1,490 | 1,492 | 1,505 |
| Sanskrit | 3,883 | 3,887 | 3,914 |
| Sindhi | | | 349 |
| Tamil | 20,095 | 20,472 | 20,959 |
| Telugu | 44,098 | 44,238 | 44,333 |
| Urdu | | | 12,547 |
| | | | |
| Burmese | | | 2,938 |
| Nepal Bhasha/Newari | | | 61,487 |
| Nepali | | | 3,079 |
| Sinhala | | | 2,153 |

Hindi Wikipedia continues to have the highest number of articles among all the Indian language Wikipedias, with Telugu Wikipedia in second place. The number of articles in Gujarathi Wikipedia has almost doubled over the past six months. Among the active Wikipedias, Kannada and Malayalam are growing very slowly (in terms of article count) compared to the other active Indian language Wikipedias.

When you analyze the different topics in this blog post, you will see that some of the big Wikipedias do not have a number of edits proportionate to their number of articles. The number of stub articles in some big wikis is also very high.

Let us hope the stub articles will get more content as more active contributors arrive.
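Only three months are tabulated here, so the six-month doubling of Gujarathi Wikipedia cannot be checked from these figures alone, but the December-to-January change can. The following is a minimal Python sketch (not part of Erik Zachte's tooling) that computes the net change and percentage growth from a hand-copied subset of the article table; `monthly_growth` is a hypothetical helper name.

```python
# (2009 December, 2010 January) article counts copied from the
# "Number of Articles" table above (rows with all months filled in).
articles = {
    "Assamese": (261, 263),
    "Bengali": (20_918, 21_016),
    "Bhojpuri": (2_481, 2_481),
    "Bishnupriya Manipuri": (24_733, 24_738),
    "Gujarathi": (11_904, 12_579),
    "Hindi": (52_645, 53_216),
    "Kannada": (7_741, 7_846),
    "Malayalam": (11_635, 11_871),
    "Marathi": (26_034, 26_544),
    "Odia (Oriya)": (553, 553),
    "Punjabi": (1_492, 1_505),
    "Sanskrit": (3_887, 3_914),
    "Tamil": (20_472, 20_959),
    "Telugu": (44_238, 44_333),
}


def monthly_growth(counts):
    """Return {language: (net new articles, growth in percent)}."""
    result = {}
    for lang, (dec, jan) in counts.items():
        added = jan - dec
        result[lang] = (added, round(100 * added / dec, 2))
    return result


growth = monthly_growth(articles)

# Print from fastest to slowest percentage growth.
for lang, (added, pct) in sorted(growth.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{lang:22} {added:+6d} articles  {pct:5.2f}%")
```

Absolute and percentage growth can rank wikis differently: a large wiki such as Hindi can add more articles in a month than Malayalam and still show a smaller percentage increase.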


Number of Edits

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 8,926 | 9,134 | 9,290 |
| Bengali | 5,51,486 | | 5,86,472 |
| Bhojpuri | 52,553 | 53,203 | 54,099 |
| Bishnupriya Manipuri | 4,18,566 | 4,29,198 | 4,36,153 |
| Gujarathi | 63,578 | 67,769 | 72,492 |
| Hindi | 5,51,162 | 5,67,029 | 5,81,447 |
| Kannada | 1,22,964 | 1,26,504 | 1,29,848 |
| Kashmiri | | | 13,075 |
| Malayalam | 5,33,391 | 5,51,307 | 5,69,056 |
| Marathi | 4,45,205 | 4,58,769 | 4,74,113 |
| Odia (Oriya) | 19,805 | 20,052 | 20,321 |
| Pali | | | 48,865 |
| Punjabi | 16,980 | 17,426 | 18,176 |
| Sanskrit | 67,151 | 68,557 | 70,132 |
| Sindhi | | | |
| Tamil | 4,59,441 | 4,71,678 | 4,83,481 |
| Telugu | 4,69,481 | 4,76,825 | 4,83,390 |
| Urdu | | | 2,70,868 |
| | | | |
| Burmese | | | 30,503 |
| Nepal Bhasha/Newari | | | 4,58,066 |
| Nepali | | | 40,363 |
| Sinhala | | | 74,493 |

More edits from multiple contributors will improve the quality of a Wikipedia's articles.

Hindi, Bengali, and Malayalam Wikipedias have the highest number of edits among the Indian language Wikipedias. However, most of the big wikis do not have a number of edits proportionate to their number of articles.
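The edit counts use Indian digit grouping (5,86,472 is 586,472). A minimal Python sketch for ranking the January column: it simply strips the commas, so it accepts both Indian and Western grouping. `parse_grouped` is a hypothetical helper, and Sindhi is left out because its row is empty.

```python
# January 2010 edit counts from the "Number of Edits" table above,
# kept in the Indian digit grouping used there (5,86,472 = 586,472).
# Sindhi is omitted because its row is empty.
jan_edits = {
    "Assamese": "9,290",
    "Bengali": "5,86,472",
    "Bhojpuri": "54,099",
    "Bishnupriya Manipuri": "4,36,153",
    "Gujarathi": "72,492",
    "Hindi": "5,81,447",
    "Kannada": "1,29,848",
    "Kashmiri": "13,075",
    "Malayalam": "5,69,056",
    "Marathi": "4,74,113",
    "Odia (Oriya)": "20,321",
    "Pali": "48,865",
    "Punjabi": "18,176",
    "Sanskrit": "70,132",
    "Tamil": "4,83,481",
    "Telugu": "4,83,390",
    "Urdu": "2,70,868",
}


def parse_grouped(number: str) -> int:
    """Convert '5,86,472' (Indian) or '586,472' (Western) to an int."""
    return int(number.replace(",", ""))


ranked = sorted(jan_edits, key=lambda lang: parse_grouped(jan_edits[lang]), reverse=True)
top_three = ranked[:3]
print(top_three)  # ['Bengali', 'Hindi', 'Malayalam']
```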


Breakup of edits, 2009 February – 2010 January (percent)

| Wikipedia language | Bot edits (%) | User edits, registered and anonymous (%) |
|---|---|---|
| Assamese | 55 | 45 |
| Bengali | 68 | 32 |
| Bhojpuri | 92 | 8 |
| Bishnupriya Manipuri | 96 | 4 |
| Gujarathi | 37 | 63 |
| Hindi | 49 | 51 |
| Kannada | 53 | 47 |
| Kashmiri | 83 | 17 |
| Malayalam | 37 | 63 |
| Marathi | 59 | 41 |
| Odia (Oriya) | 92 | 8 |
| Pali | 94 | 6 |
| Punjabi | 55 | 45 |
| Sanskrit | 82 | 18 |
| Sindhi | 29 | 71 |
| Tamil | 51 | 49 |
| Telugu | 51 | 49 |
| Urdu | 53 | 47 |
| | | |
| Burmese | 41 | 59 |
| Nepal Bhasha/Newari | 91 | 9 |
| Nepali | 52 | 48 |
| Sinhala | 12 | 88 |



Wiki bots can automate repetitive, tedious tasks such as adding interwiki links and fixing double redirects. But for some Wikipedias, the majority of contributions come from bots.

The above table shows that for some of the big Wikipedias, most of the wiki activity does not come from human editors; in some cases, bots account for more than 80% of the edits. Some wikis are even using bots to create one-liner articles just to increase their article count. This approach needs to change.

We need to learn how to use bots effectively to improve the quality of wiki articles, rather than to create one-liner articles. Using bots to create title-only articles will effectively block many future Wikipedians from editing. Most newcomers start editing Wikipedia by creating an article, so by using bots to create thousands of one-liner articles, we are losing many future Wikipedians.
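To see which wikis cross the 80% mark, the bot column of the breakup table can be filtered. A minimal Python sketch, using the threshold stated in the text and all rows of the table copied by hand:

```python
# Percentage of edits made by bots, 2009 February to 2010 January,
# from the "Breakup of edits" table above.
bot_share = {
    "Assamese": 55, "Bengali": 68, "Bhojpuri": 92,
    "Bishnupriya Manipuri": 96, "Gujarathi": 37, "Hindi": 49,
    "Kannada": 53, "Kashmiri": 83, "Malayalam": 37, "Marathi": 59,
    "Odia (Oriya)": 92, "Pali": 94, "Punjabi": 55, "Sanskrit": 82,
    "Sindhi": 29, "Tamil": 51, "Telugu": 51, "Urdu": 53,
    "Burmese": 41, "Nepal Bhasha/Newari": 91, "Nepali": 52, "Sinhala": 12,
}

THRESHOLD = 80  # "more than 80%", as stated in the text

# Wikis where bots made more than THRESHOLD percent of all edits.
heavy_bot_wikis = sorted(lang for lang, pct in bot_share.items() if pct > THRESHOLD)
print(heavy_bot_wikis)
```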


Edits per article

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 19.3 | 20.0 | 20.2 |
| Bengali | 16.7 | 17.0 | 17.2 |
| Bhojpuri | 5.5 | 11.7 | 18.7 |
| Bishnupriya Manipuri | 9.3 | 9.1 | 9.3 |
| Gujarathi | 4.3 | 4.4 | 4.5 |
| Hindi | 7.0 | 7.2 | 7.3 |
| Kannada | 12.8 | 12.9 | 13.1 |
| Kashmiri | 28.3 | 28.8 | 29.3 |
| Malayalam | 26.6 | 27.1 | 27.4 |
| Marathi | 13.1 | 13.3 | 13.5 |
| Odia (Oriya) | 21.8 | 22.1 | 22.5 |
| Pali | 17.7 | 18.0 | 18.3 |
| Punjabi | 8.0 | 8.3 | 8.6 |
| Sanskrit | 14.7 | 15.0 | 15.2 |
| Sindhi | 23.5 | 23.8 | 24.1 |
| Tamil | 16.3 | 16.5 | 16.5 |
| Telugu | 8.0 | 8.0 | 8.1 |
| Urdu | 15.9 | 16.0 | 16.2 |
| | | | |
| Burmese | 6.3 | 6.4 | 6.6 |
| Nepal Bhasha/Newari | 3.0 | 2.9 | 2.9 |
| Nepali | 9.7 | 10.0 | 9.9 |
| Sinhala | 23.2 | 21.4 | 21.8 |



The number of edits per article is an indicator of article quality and wiki activity: it shows how much attention an article gets. An article edited many times by multiple contributors is likely to be of better quality and more neutral than one with only a few edits by one or two editors. (You can always debate this, but it holds in most cases.) Among the active big Wikipedias, Malayalam Wikipedia tops the list with an average of 27.4 edits per article in 2010 January.
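The post does not define a "big" Wikipedia, so the ranking depends on a cut-off. As a minimal sketch, assuming "big" means at least 10,000 articles (a hypothetical threshold; activity is not checked here), the January columns of the article and edits-per-article tables can be filtered and ranked like this:

```python
# January 2010 values: (articles, edits per article), copied from the
# "Number of Articles" and "Edits per article" tables above.
data = {
    "Bengali": (21_016, 17.2),
    "Bishnupriya Manipuri": (24_738, 9.3),
    "Gujarathi": (12_579, 4.5),
    "Hindi": (53_216, 7.3),
    "Kannada": (7_846, 13.1),
    "Kashmiri": (375, 29.3),
    "Malayalam": (11_871, 27.4),
    "Marathi": (26_544, 13.5),
    "Odia (Oriya)": (553, 22.5),
    "Sindhi": (349, 24.1),
    "Tamil": (20_959, 16.5),
    "Telugu": (44_333, 8.1),
    "Urdu": (12_547, 16.2),
}

MIN_ARTICLES = 10_000  # hypothetical cut-off for a "big" wiki

# Keep only big wikis, then pick the one with the most edits per article.
big = {lang: epa for lang, (n, epa) in data.items() if n >= MIN_ARTICLES}
leader = max(big, key=big.get)
print(leader, big[leader])  # Malayalam 27.4
```

Without the size filter, small wikis such as Kashmiri (375 articles) would top the list, which is why the comparison is restricted to larger wikis.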


Number of new articles/day

| Wikipedia language | 2010 January |
|---|---|
| Assamese | Not Available |
| Bengali | 5 |
| Bhojpuri | Not Available |
| Bishnupriya Manipuri | Not Available |
| Gujarathi | Not Available |
| Hindi | 18 |
| Kannada | Not Available |
| Kashmiri | Not Available |
| Malayalam | 8 |
| Marathi | 17 |
| Odia (Oriya) | Not Available |
| Pali | Not Available |
| Punjabi | Not Available |
| Sanskrit | Not Available |
| Sindhi | Not Available |
| Tamil | 16 |
| Telugu | 3 |
| Urdu | 5 |
| | |
| Burmese | 3 |
| Nepal Bhasha/Newari | 45 |
| Nepali | Not Available |
| Sinhala | 3 |



The above table shows the average number of articles created in a wiki daily.

Increasing the number of articles in a Wikipedia is very important; only then will it attract a large readership. Hindi, Marathi, and Tamil Wikipedias top this list. I hope all the Indian Wikipedia communities will make sure these new articles have sufficient content, so that readers do not need to navigate away to English Wikipedia or Google to get information about a topic.

A large number of articles with good encyclopaedic content will attract more potential editors to the wikipedia. So please ensure that you are adding at least the basic facts and figures in all the newly created articles.
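The daily figures can be roughly cross-checked against the article table: take the change in article count from 2009 December 31 to 2010 January 31 and divide by 31 days. This is a sketch under that assumption. It measures net growth (new articles minus deletions and other changes to the count), so it need not match the reported creation rate exactly.

```python
DAYS_IN_JANUARY = 31

# (2009 December 31, 2010 January 31) article counts from the
# "Number of Articles" table above.
articles = {
    "Bengali": (20_918, 21_016),
    "Hindi": (52_645, 53_216),
    "Malayalam": (11_635, 11_871),
    "Marathi": (26_034, 26_544),
    "Tamil": (20_472, 20_959),
    "Telugu": (44_238, 44_333),
}

# Average new articles per day, as reported in the table above.
reported = {"Bengali": 5, "Hindi": 18, "Malayalam": 8,
            "Marathi": 17, "Tamil": 16, "Telugu": 3}

# Net change in article count per day, rounded to one decimal place.
estimates = {
    lang: round((jan - dec) / DAYS_IN_JANUARY, 1)
    for lang, (dec, jan) in articles.items()
}

for lang, est in estimates.items():
    print(f"{lang:10} net {est:5.1f}/day, reported {reported[lang]}/day")
```

For Hindi, Marathi, Tamil, Malayalam, and Telugu the two figures agree to within one article per day. Bengali's net growth (about 3 per day) is below its reported 5, which may reflect deletions or other changes to the count during the month.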




Average size of an article (bytes)

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 2506 | 2506 | 1492 |
| Bengali | 1342 | 1383 | 1407 |
| Bhojpuri | 118 | 119 | 119 |
| Bishnupriya Manipuri | 1084 | 1086 | 1090 |
| Gujarathi | 1056 | 1098 | 1099 |
| Hindi | 1182 | 1235 | 1275 |
| Kannada | 1923 | 2526 | 2806 |
| Kashmiri | 424 | 422 | 420 |
| Malayalam | 2690 | 2725 | 2740 |
| Marathi | 768 | 777 | 800 |
| Odia (Oriya) | 236 | 236 | 236 |
| Pali | 141 | 141 | 141 |
| Punjabi | 741 | 740 | 759 |
| Sanskrit | 184 | 187 | 197 |
| Sindhi | 4092 | 4080 | 4070 |
| Tamil | 2118 | 2441 | 2574 |
| Telugu | 832 | 883 | 915 |
| Urdu | 1535 | 1554 | 1550 |
| | | | |
| Burmese | 2986 | 3037 | 3033 |
| Nepal Bhasha/Newari | 707 | 805 | 882 |
| Nepali | 1256 | 1282 | 1259 |
| Sinhala | 5892 | 5430 | 5452 |



A Wikipedia article benefits a reader only when it has sufficient content. But in many of the big Wikipedias, most articles have little content.

It is very encouraging to see the long articles in Sinhala Wikipedia, which now has more than 2,000 articles. Many of the Indian language Wikipedias could learn from Sinhala Wikipedia the art of adding more content to existing articles.




Database size (MB)

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 1.5 | 1.5 | 1.5 |
| Bengali | 81 | 84 | 86 |
| Bhojpuri | 4.8 | 4.8 | 4.8 |
| Bishnupriya Manipuri | 65 | 65 | 65 |
| Gujarathi | 32 | 35 | 37 |
| Hindi | 165 | 174 | 181 |
| Kannada | 42 | 53 | 59 |
| Kashmiri | 0.77 | 0.77 | 0.78 |
| Malayalam | 88 | 90 | 93 |
| Marathi | 63 | 64 | 67 |
| Odia (Oriya) | 1.2 | 1.2 | 1.2 |
| Pali | 4.7 | 4.7 | 4.7 |
| Punjabi | 4 | 4 | 4.1 |
| Sanskrit | 6.6 | 6.6 | 6.8 |
| Sindhi | 2.8 | 2.8 | 2.9 |
| Tamil | 119 | 138 | 148 |
| Telugu | 97 | 103 | 107 |
| Urdu | 40 | 42 | 42 |
| | | | |
| Burmese | 24 | 26 | 26 |
| Nepal Bhasha/Newari | 107 | 128 | 144 |
| Nepali | 9.2 | 9.5 | 9.8 |
| Sinhala | 34 | 39 | 40 |



This is the total size of a Wikipedia. Unsurprisingly, Hindi Wikipedia tops the list, as it also has the highest number of articles among the Indian language Wikipedias.

It is very encouraging to see that Tamil Wikipedia, with less than half as many articles as Hindi Wikipedia, has a database size in almost the same range (148 MB versus 181 MB). This shows that even though Tamil Wikipedia has fewer articles than Hindi or Telugu, its articles have more content.
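The Tamil and Hindi comparison can be put in numbers by normalizing database size by article count. A minimal sketch using the January columns; because the database size is the total size of the wiki, this ratio is only a rough indicator of content per article.

```python
# January 2010 values: (articles, database size in MB), copied from the
# "Number of Articles" and "Database size" tables above.
jan = {
    "Hindi": (53_216, 181),
    "Tamil": (20_959, 148),
    "Telugu": (44_333, 107),
    "Malayalam": (11_871, 93),
    "Bengali": (21_016, 86),
    "Marathi": (26_544, 67),
}

# Megabytes of database per 1,000 articles, rounded to two decimals.
mb_per_1000 = {lang: round(1000 * mb / n, 2) for lang, (n, mb) in jan.items()}

# Tamil's article count as a fraction of Hindi's.
tamil_vs_hindi_articles = round(jan["Tamil"][0] / jan["Hindi"][0], 2)

for lang, ratio in sorted(mb_per_1000.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:10} {ratio:5.2f} MB per 1,000 articles")
print("Tamil/Hindi article ratio:", tamil_vs_hindi_articles)
```

Tamil has about 0.39 times as many articles as Hindi but roughly twice as much database per article.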


Percentage of articles with size greater than 500 bytes

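| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 41 | 41 | 41 |
| Bengali | 56 | 57 | 57 |
| Bhojpuri | 2 | 2 | 2 |
| Bishnupriya Manipuri | 85 | 86 | 86 |
| Gujarathi | 19 | 19 | 20 |
| Hindi | 42 | 43 | 43 |
| Kannada | 54 | 55 | 55 |
| Kashmiri | 12 | 12 | 12 |
| Malayalam | 84 | 84 | 84 |
| Marathi | 26 | 26 | 27 |
| Odia (Oriya) | 2 | 2 | 2 |
| Pali | 1 | 1 | 1 |
| Punjabi | 16 | 16 | 16 |
| Sanskrit | 4 | 5 | 5 |
| Sindhi | 61 | 60 | 60 |
| Tamil | 81 | 82 | 82 |
| Telugu | 22 | 22 | 22 |
| Urdu | 55 | 55 | 55 |
| | | | |
| Burmese | 67 | 67 | 67 |
| Nepal Bhasha/Newari | 60 | 62 | 63 |
| Nepali | 55 | 56 | 54 |
| Sinhala | 78 | 81 | 81 |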



The above table shows the percentage of articles with size more than 500 bytes.

More than 80 percent of the articles in Tamil, Bishnupriya Manipuri, and Malayalam Wikipedias are larger than 500 bytes. For some big Wikipedias, more than 50% of the articles are 500 bytes or smaller, which suggests that many of those articles are of little help to the reader. Readers still need to depend on English Wikipedia or Google to get the information they need. More effort is required from the respective wiki communities to make their Wikipedias useful to their language communities.
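Both observations can be read off the January column with two filters. A minimal sketch, again assuming a hypothetical cut-off of 10,000 articles for a "big" Wikipedia; the share of articles at or below 500 bytes is simply 100 minus the table value.

```python
# January 2010 values: (articles, % of articles larger than 500 bytes),
# copied from the "Number of Articles" table and the table above.
jan = {
    "Assamese": (263, 41), "Bengali": (21_016, 57), "Bhojpuri": (2_481, 2),
    "Bishnupriya Manipuri": (24_738, 86), "Gujarathi": (12_579, 20),
    "Hindi": (53_216, 43), "Kannada": (7_846, 55), "Kashmiri": (375, 12),
    "Malayalam": (11_871, 84), "Marathi": (26_544, 27),
    "Odia (Oriya)": (553, 2), "Pali": (2_316, 1), "Punjabi": (1_505, 16),
    "Sanskrit": (3_914, 5), "Sindhi": (349, 60), "Tamil": (20_959, 82),
    "Telugu": (44_333, 22), "Urdu": (12_547, 55), "Burmese": (2_938, 67),
    "Nepal Bhasha/Newari": (61_487, 63), "Nepali": (3_079, 54),
    "Sinhala": (2_153, 81),
}

MIN_ARTICLES = 10_000  # hypothetical cut-off for a "big" wiki

# Wikis where more than 80% of articles exceed 500 bytes.
over_80 = sorted(lang for lang, (_, pct) in jan.items() if pct > 80)

# Big wikis where more than half of the articles are 500 bytes or smaller.
mostly_short_big = sorted(
    lang for lang, (n, pct) in jan.items()
    if n >= MIN_ARTICLES and 100 - pct > 50
)

print(over_80)
print(mostly_short_big)
```

Sinhala, one of the non-Indian wikis in the table, also crosses the 80% mark.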


Percentage of articles with size greater than 2000 bytes (2 kilobytes)

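| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 22 | 22 | 22 |
| Bengali | 14 | 14 | 15 |
| Bhojpuri | 1 | 1 | 1 |
| Bishnupriya Manipuri | 1 | 1 | 1 |
| Gujarathi | 5 | 5 | 5 |
| Hindi | 9 | 10 | 10 |
| Kannada | 15 | 16 | 17 |
| Kashmiri | 5 | 5 | 5 |
| Malayalam | 34 | 35 | 35 |
| Marathi | 6 | 7 | 7 |
| Odia (Oriya) | 1 | 1 | 1 |
| Pali | 0 | 0 | 0 |
| Punjabi | 8 | 8 | 8 |
| Sanskrit | 1 | 1 | 1 |
| Sindhi | 33 | 33 | 33 |
| Tamil | 24 | 25 | 25 |
| Telugu | 8 | 8 | 8 |
| Urdu | 17 | 17 | 17 |
| | | | |
| Burmese | 38 | 39 | 39 |
| Nepal Bhasha/Newari | 2 | 7 | 11 |
| Nepali | 10 | 10 | 10 |
| Sinhala | 53 | 53 | 53 |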



The above table shows the percentage of articles larger than 2,000 bytes. Among active Wikipedias, Malayalam and Tamil top the list. You can make your own analysis by comparing this table with the previous one.
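One way to compare the two size tables is to split each wiki's articles into three bands: 500 bytes or less, between 500 and 2,000 bytes, and over 2,000 bytes. A minimal sketch for a few rows of the January columns; since the published percentages are rounded, each band may be off by a point.

```python
# January 2010 values: (% larger than 500 bytes, % larger than 2000 bytes),
# copied from the two size tables above.
size_shares = {
    "Bengali": (57, 15),
    "Bishnupriya Manipuri": (86, 1),
    "Hindi": (43, 10),
    "Kannada": (55, 17),
    "Malayalam": (84, 35),
    "Marathi": (27, 7),
    "Tamil": (82, 25),
    "Telugu": (22, 8),
    "Sinhala": (81, 53),
}


def size_bands(over_500, over_2000):
    """Split articles into size bands (percent of all articles)."""
    return {
        "short": 100 - over_500,           # 500 bytes or less
        "medium": over_500 - over_2000,    # over 500, up to 2000 bytes
        "long": over_2000,                 # over 2000 bytes
    }


bands = {lang: size_bands(*shares) for lang, shares in size_shares.items()}

for lang, b in bands.items():
    print(f"{lang:22} short {b['short']:3d}%  medium {b['medium']:3d}%  long {b['long']:3d}%")
```

Seen this way, most Bishnupriya Manipuri articles fall in the medium band, while Malayalam and Tamil have a sizeable share of long articles.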


Number of active wikipedians

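| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 1 | 1 | 1 |
| Bengali | 25 | 35 | 32 |
| Bhojpuri | 1 | 1 | 1 |
| Bishnupriya Manipuri | 6 | 6 | 4 |
| Gujarathi | 7 | 9 | 8 |
| Hindi | 50 | 62 | 51 |
| Kannada | 24 | 22 | 22 |
| Kashmiri | 0 | 0 | 1 |
| Malayalam | 56 | 50 | 65 |
| Marathi | 22 | 25 | 36 |
| Odia (Oriya) | 0 | 0 | 0 |
| Pali | 0 | 1 | 0 |
| Punjabi | 1 | 1 | 4 |
| Sanskrit | 4 | 5 | 6 |
| Sindhi | 1 | 0 | 2 |
| Tamil | 45 | 55 | 53 |
| Telugu | 38 | 34 | 26 |
| Urdu | 24 | 20 | 20 |
| | | | |
| Burmese | 1 | 1 | 4 |
| Nepal Bhasha/Newari | 3 | 3 | 3 |
| Nepali | 8 | 5 | 5 |
| Sinhala | 45 | 37 | 7 |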



The number of active Wikipedians is the strength of a Wikipedia; they define its quality. For that, a Wikipedia should have a number of active Wikipedians proportionate to its total number of articles, because only then will edits reach more of its articles. Hindi, Tamil, and Malayalam top the list with the most active Wikipedians.
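The idea of active Wikipedians "proportionate to" the article count can be expressed as articles per active Wikipedian: the lower the figure, the more attention each article can get. A minimal sketch for the larger wikis, using the January columns (the choice of rows is illustrative):

```python
# January 2010 values: (articles, active Wikipedians), copied from the
# "Number of Articles" and "Number of active wikipedians" tables above.
jan = {
    "Bengali": (21_016, 32),
    "Hindi": (53_216, 51),
    "Kannada": (7_846, 22),
    "Malayalam": (11_871, 65),
    "Marathi": (26_544, 36),
    "Tamil": (20_959, 53),
    "Telugu": (44_333, 26),
    "Urdu": (12_547, 20),
}

# Articles per active Wikipedian, rounded to a whole number.
per_editor = {lang: round(n / active) for lang, (n, active) in jan.items()}

for lang, load in sorted(per_editor.items(), key=lambda kv: kv[1]):
    print(f"{lang:10} {load:5d} articles per active Wikipedian")
```

Hindi and Tamil have similar numbers of active Wikipedians, but each active Hindi Wikipedian has more than twice as many articles to look after.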


Page views per month (lakh page views per month; 1 lakh = 100,000)

| Wikipedia language | 2009 November | 2009 December | 2010 January |
|---|---|---|---|
| Assamese | 0.87 | 0.93 | 0.86 |
| Bengali | 22 | 28 | 24 |
| Bhojpuri | 0.09 | 0.09 | 0.11 |
| Bishnupriya Manipuri | 13 | 15 | 14 |
| Gujarathi | 4.6 | 5.4 | 4.9 |
| Hindi | 41 | 49 | 41 |
| Kannada | 8.08 | 9.16 | 7.72 |
| Kashmiri | 0.52 | 0.57 | 0.51 |
| Malayalam | 27 | 28 | 26 |
| Marathi | 23 | 28 | 24 |
| Odia (Oriya) | 0.41 | 0.42 | 0.40 |
| Pali | 0.85 | 0.83 | 0.82 |
| Punjabi | 1.24 | 1.28 | 1.33 |
| Sanskrit | 2.00 | 2.11 | 2.17 |
| Sindhi | 0.57 | 0.61 | 0.54 |
| Tamil | 24 | 26 | 24 |
| Telugu | 37 | 41 | 33 |
| Urdu | 11 | 10 | 10 |
| | | | |
| Burmese | 1.58 | 1.60 | 1.77 |
| Nepal Bhasha/Newari | 13 | 14 | 15 |
| Nepali | 1.44 | 1.47 | 1.55 |
| Sinhala | 2.82 | 3.02 | 3.50 |



The number of page views represents how many times readers and wiki editors have opened the wiki's pages. This parameter is somewhat related to the number of articles in a wiki: the more articles a wiki has, the more visitors it tends to get. Many of today's readers can be future wiki editors. This is one of the main reasons to increase the number of articles (of course, with quality content).
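How "somewhat related" articles and page views are can be measured with a correlation coefficient. A minimal sketch that computes the Pearson correlation by hand over the Indian language rows of the January columns (an illustrative choice of statistic, not one used in the original analysis), converting page views from lakh to absolute numbers:

```python
import math

LAKH = 100_000

# January 2010 values: (articles, page views in lakh), copied from the
# "Number of Articles" and "Page views per month" tables above.
jan = {
    "Assamese": (263, 0.86), "Bengali": (21_016, 24), "Bhojpuri": (2_481, 0.11),
    "Bishnupriya Manipuri": (24_738, 14), "Gujarathi": (12_579, 4.9),
    "Hindi": (53_216, 41), "Kannada": (7_846, 7.72), "Kashmiri": (375, 0.51),
    "Malayalam": (11_871, 26), "Marathi": (26_544, 24),
    "Odia (Oriya)": (553, 0.40), "Pali": (2_316, 0.82), "Punjabi": (1_505, 1.33),
    "Sanskrit": (3_914, 2.17), "Sindhi": (349, 0.54), "Tamil": (20_959, 24),
    "Telugu": (44_333, 33), "Urdu": (12_547, 10),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


articles = [n for n, _ in jan.values()]
views = [lakh * LAKH for _, lakh in jan.values()]  # absolute page views
r = pearson(articles, views)
print(f"r = {r:.2f}")
```

On these eighteen rows the coefficient comes out strongly positive, consistent with the text; correlation alone, of course, does not show that more articles cause more visits.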




MediaWiki localization status (percent)

| Language | Most often used messages | MediaWiki messages | Extensions used by Wikimedia | All extensions |
|---|---|---|---|---|
| Assamese | 98.08 | 43.83 | 1.86 | 1.61 |
| Bengali | 100.00 | 82.36 | 46.09 | 22.25 |
| Bhojpuri | 0.21 | 0.08 | 0.00 | 0.00 |
| Bishnupriya Manipuri | 100.00 | 52.51 | 0.11 | 0.30 |
| Gujarathi | 100.00 | 40.79 | 5.91 | 6.59 |
| Hindi | 99.36 | 97.22 | 29.43 | 26.44 |
| Kannada | 100.00 | 59.63 | 3.55 | 3.21 |
| Kashmiri | | | | |
| Malayalam | 100.00 | 97.90 | 98.00 | 51.77 |
| Marathi | 98.72 | 75.88 | 26.19 | 37.13 |
| Odia (Oriya) | 4.48 | 1.39 | 0.25 | 0.30 |
| Pali | 0.21 | 0.08 | 0.00 | 0.00 |
| Punjabi | 56.08 | 30.26 | 0.42 | 0.42 |
| Sanskrit | 97.65 | 27.22 | 0.00 | 0.34 |
| Sindhi | 73.13 | 24.91 | 0.11 | 0.07 |
| Tamil | 92.32 | 74.71 | 1.02 | 1.77 |
| Telugu | 100.00 | 100.00 | 65.41 | 52.57 |
| Urdu | 71.64 | 38.77 | 1.75 | 1.12 |
| | | | | |
| Burmese | 29.00 | 10.45 | 0.07 | 0.02 |
| Nepal Bhasha/Newari | 32.84 | 12.35 | 0.04 | 0.01 |
| Nepali | 96.59 | 68.77 | 0.98 | 0.90 |
| Sinhala | 100.00 | 100.00 | 28.59 | 20.06 |



GerardM, a translatewiki.net administrator, has been passing the message below to most of the Indian language Wikipedias. He has also sent mails about this to the wikimediaindia mailing list a couple of times.

We expect that with the implementation of Localisation update the usability of MediaWiki for your language will improve. We are now ready to look at other aspects of usability for your language as well. There are two questions we would like you to answer: Are there issues with the new functionality of the Usability Initiative? Does MediaWiki support your language properly? The best way to answer the first question is to visit the translatewiki.net...

Localization of the wiki software is very important when we are trying to reach prospective Wikipedians in any language. Within a language community, a website with its interface and all system messages in the local language has an edge over a website with English-only content. This is where localization of the MediaWiki software comes in. We use translatewiki.net to coordinate the localization efforts for all languages. Two translatewiki.net administrators, Siebrand and GerardM, are also on this list.

Some of the Malayalam Wikipedians (including me) understood the importance of localizing the MediaWiki messages into Malayalam long ago (more than two years back). From the above table, you can see that Malayalam is at the forefront in localizing MediaWiki and other related system messages. Nowadays, Malayalam Wikipedian Praveen Prakash coordinates the localization of MediaWiki messages for Malayalam.
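One simple way to summarize the four localization columns is an unweighted mean per language. This is a minimal sketch under that assumption (equal weight for all four columns is an illustrative choice, not a translatewiki.net metric); it covers the Indian language rows and skips Kashmiri, whose row is empty.

```python
# Localization percentages from the table above:
# (most often used messages, MediaWiki messages,
#  extensions used by Wikimedia, all extensions)
localization = {
    "Assamese": (98.08, 43.83, 1.86, 1.61),
    "Bengali": (100.00, 82.36, 46.09, 22.25),
    "Bhojpuri": (0.21, 0.08, 0.00, 0.00),
    "Bishnupriya Manipuri": (100.00, 52.51, 0.11, 0.30),
    "Gujarathi": (100.00, 40.79, 5.91, 6.59),
    "Hindi": (99.36, 97.22, 29.43, 26.44),
    "Kannada": (100.00, 59.63, 3.55, 3.21),
    "Malayalam": (100.00, 97.90, 98.00, 51.77),
    "Marathi": (98.72, 75.88, 26.19, 37.13),
    "Odia (Oriya)": (4.48, 1.39, 0.25, 0.30),
    "Pali": (0.21, 0.08, 0.00, 0.00),
    "Punjabi": (56.08, 30.26, 0.42, 0.42),
    "Sanskrit": (97.65, 27.22, 0.00, 0.34),
    "Sindhi": (73.13, 24.91, 0.11, 0.07),
    "Tamil": (92.32, 74.71, 1.02, 1.77),
    "Telugu": (100.00, 100.00, 65.41, 52.57),
    "Urdu": (71.64, 38.77, 1.75, 1.12),
}

# Unweighted mean of the four columns, rounded to one decimal place.
mean_score = {lang: round(sum(v) / len(v), 1) for lang, v in localization.items()}
ranking = sorted(mean_score, key=mean_score.get, reverse=True)

for lang in ranking[:5]:
    print(f"{lang:10} {mean_score[lang]:5.1f}")
```

With equal weights Malayalam ranks first, largely because of the "Extensions used by Wikimedia" column; Telugu is ahead of Malayalam in two of the four columns, so a different weighting could change the order.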

I request each community to give top priority to localizing the MediaWiki messages into its language. In doing so, you localize the interface and system messages into your language. Apart from helping the wiki projects in your language, you are also helping native speakers use the MediaWiki software with native language support.
