Thursday, September 26, 2013

September 23's Muddiest Point


Concerning the Entity-Relationship Model, are there any differences between Chen, Crow’s Foot, and UML notation beyond their graphical features? Does it matter which one a person uses, or does each have specific uses?

Week 5 Readings


Articles

Gilliland, A. J. (2008). Setting the Stage. In Introduction to Metadata, Second Edition. Retrieved September 24, 2013, from http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html
             A. J. Gilliland’s (2008) take on metadata is all-encompassing. Rather than give a fixed definition drawn from her own profession, she chooses to acknowledge many different professions’ impressions of “metadata” – from the indexes, bibliographic records, and abstracts of libraries (Gilliland “Setting”) to the information encoded into HTML META tags encountered by the average Internet resource provider (Gilliland “Setting”). Such a method has its pros and cons. On the one hand, by including all of these interpretations, she provides a good basis for understanding the flexibility of metadata, its uses beyond any one field, and what links and differentiates each profession’s metadata – an overview of the concept. On the other hand, she provides a great deal of information, perhaps too much to adequately investigate any particular aspect of metadata.
            One comment caught my attention, though: Gilliland notes that “it would seem to be a desirable goal” to join together various materials linked by provenance or subject but dispersed across museum, archive, and library repositories (Gilliland “Setting”). Would this be desirable? I am not sure myself; if it could be done, it would make finding, organizing, and storing materials easier. However, it would assume a “one size fits all” approach, disregarding the differences between the professions, each of which has its own interests and needs.

 
Miller, E. J. (1999, June 6). An Overview of the Dublin Core Data Model. Dublin Core Metadata Initiative. Retrieved from http://dublincore.org/1999/06/06-overview/
            While I was able to understand the article’s underlying purpose of examining the Dublin Core Data Model, I had a hard time focusing on it as a whole. E. J. Miller (1999) gets a little wordy at times. For example, in the first paragraph of the section “Semantic Refinement,” Miller states that the Dublin Core Metadata Initiative “additional recognized early on that various communities may choose to utilize richer semantic definitions” and that “a requirement evolved from this recognition” (Miller, 1999, Semantic Refinement, para. 1) – I think this could have been cut down. Doing so would have allowed him to be more direct in explaining the Initiative’s reaction to users adopting semantic definitions outside the Dublin Core Element Set, and perhaps to use the extra space to elaborate further. In addition, misspellings abound. To name a few: in the quote given above, he forgets to add “-ly” to “additional” to make “additionally”; under “DCMI Requirements,” he misspells “data model,” “representing,” and “required” in the same sentence – “…a formal datamodel able to support the requirements of the DCMI and a corresponding means of syntactic represetnating this information is requireed” (ibid., DCMI Requirements, para. 2); and under “Compound Values,” I believe he meant “whether” instead of “weather” in the last sentence [“These characteristics are independent of weather this person…”] (ibid., Compound Values, para. 1). The content is valuable, but such sloppy writing detracts from its value. I do wonder, though, what the state of writing will become with digitization. Will it get better, worse, or stay about the same? Is grammar linked to digitization? Will metadata have an effect on writing as well, adding new dimensions and structures to it?
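To keep the Element Set straight in my own head, here is a minimal sketch (my own example, not from Miller’s article) of how a few Dublin Core elements are commonly embedded in an HTML page as META tags. The element names come from the Dublin Core Element Set; the “DC.” prefix and the schema link are just one conventional way of expressing them in HTML.

```python
# A minimal sketch (my own example) of Dublin Core elements rendered as
# HTML <meta> tags. The element names come from the Dublin Core Element Set;
# the "DC." prefix and the schema <link> are the common HTML convention.

record = {
    "title": "An Overview of the Dublin Core Data Model",
    "creator": "Miller, E. J.",
    "date": "1999-06-06",
    "type": "Text",
    "identifier": "http://dublincore.org/1999/06/06-overview/",
}

def to_meta_tags(elements):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in elements.items():
        lines.append(f'<meta name="DC.{name}" content="{value}">')
    return "\n".join(lines)

print(to_meta_tags(record))
```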

 
Meloni, J. (2010, July 19). Using Mendeley for Research Management. The Chronicle of Higher Education. Retrieved from http://chronicle.com/blogs/profhacker/using-mendeley-for-research-management/25627
            The article itself provides an interesting view of Mendeley. A Zotero user herself (Zotero presumably being a rival product), J. Meloni (2010) investigates the management tool by signing up for an account and experiencing Mendeley first-hand (Meloni, 2010, para. 2). Thus – although biased – she provides the kind of commentary only a user can, especially one who has tried other platforms.
            While Mendeley does seem like a viable management tool, I can see one major problem arising. When describing its key features, Meloni (2010) notes how a user can “view the most read authors, journals, and publications within [their] field or other fields” (ibid., Key Features, para. 1). While such discovery ensures that a person stays up to date on the most popular readings and trends in a particular field, it also limits what one reads. The “most read” reflects what other users tend to read the most, and their interests will not necessarily match every person’s interests. In this model, a user may have a harder time finding documents or authors that are not popular with other users but still provide key information on a topic.
            Overall, though, I can see how Mendeley and Zotero could develop further. Meloni (2010) relates how she was able to use the “Import from Zotero” feature to seed her Mendeley account with her Zotero data, syncing the tools to improve her research capabilities (ibid., What About, para. 2). If a person can combine different modules in this way, they could create a new kind of digital tool – fluid features that work together, increasing efficiency and the user’s ability to acquire whatever they need.

Thursday, September 19, 2013

September 16's Muddiest Point


In Unicode, there is capacity for 31 bits, but only 16 of those bits are used for the plane that represents one character. What happens to the remaining 15? With such a division in bits, how do the bits interact with and relate to each other? Are there possibilities for problems to occur?
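To poke at this question myself, here is a small Python sketch comparing characters inside and outside the 16-bit Basic Multilingual Plane. Code points above 16 bits do exist, and UTF-16 represents them with a surrogate pair of two 16-bit units, while UTF-8 uses a variable number of bytes.

```python
# A small sketch of how code points relate to the 16-bit plane mentioned
# above: BMP characters fit in a single 16-bit unit, while higher code
# points are encoded in UTF-16 as a surrogate pair (two 16-bit units).

for char in ("A", "é", "あ", "😀"):            # the last lies outside the BMP
    code_point = ord(char)
    utf16_units = len(char.encode("utf-16-be")) // 2   # big-endian, no BOM
    utf8_bytes = len(char.encode("utf-8"))
    print(f"U+{code_point:06X}  UTF-16 units: {utf16_units}  UTF-8 bytes: {utf8_bytes}")
```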

Week 4 Readings


Articles

Coulson, F. (n.d.). Tutorial on Database Normalization. Phlonx. Retrieved from http://www.phlonx.com/resources/nf3/

             I found that Fred Coulson’s structuring of his tutorial helped me understand database normalization a little more easily. Specifically, he explains that he himself “find[s] it difficult to visualize these concepts using words alone, so [Coulson] shall rely as much as possible upon pictures and diagrams” (Coulson, Introduction, para. 1). This was a good method on his part; without them, the examples would be mere words and would hold little meaning for a reader who did not already know the tools involved. The images, however, elaborate on the tutorial and ensure that everyone starts with a similar basic understanding of the topic.
            In addition, after reading the article, I have started recognizing when others discuss the concepts it mentions. For example, Professor Langmead in LIS 2220 recently talked about the nature of records and invited discussion on what databases were in that context. When I heard such terminology being voiced, I immediately took out my notes from the website so that I could follow what was being discussed. When she noted the relational database, I knew its function in the relational database management system (RDBMS) (Coulson, First Normal Form, para. 11) and that a table can have both an obvious primary key – a column (or columns) that uniquely identifies each row (ibid., First Normal Form, para. 12) – and a concatenated primary key, a primary key made up of more than one column (ibid., First Normal Form, para. 12). I didn’t recognize the normalization process when Prof. Langmead first referenced it, but after looking at my notes I now understand how it determines the structure of databases, especially through its three normal forms, which forbid repeating elements, partial dependencies on a concatenated key, and dependencies on non-key attributes (Coulson, Introduction, para. 7). This situation illustrates how, in library and information science, no concept is discussed only once; these ideas reappear across the whole field. I just have to stay aware so that I can use them and participate in discussions about them.
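To fix the vocabulary in my mind, here is a minimal sketch (my own example, not Coulson’s) using Python’s built-in sqlite3 module: orders has an obvious single-column primary key, while order_items has a concatenated primary key spanning more than one column.

```python
# A minimal sketch (my own example, not Coulson's) of primary keys in a
# relational database, using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,              -- obvious single-column key
    customer  TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL                    -- stored once, not repeated
);
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)          -- concatenated primary key
);
""")
conn.commit()
```

Keeping product names in their own table, rather than repeating them in every order row, is the sort of structure the normal forms push toward.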

 
Database. (2013, September 15). Retrieved September 17, 2013, from Wikipedia: http://en.wikipedia.org/wiki/Database

             What caught my attention in the article was the terminology section. According to Wikipedia, the word “database” can mean (formerly) the data itself and its supporting data structures (Database, 2013, Terminology, para. 1), a casual reference to the database management system overall and the data which it manipulates (ibid., para. 2), or – “outside the world of professional information technology” – any collection of data, such as a spreadsheet or card index (ibid., para. 3). The meanings provided for the term are interesting. Data, along with a sense of an overarching system encompassing that data, connects all of the definitions. The differences lie not only in the details – what each meaning exactly involves – but also in the implications about the power of data. The first definition situates data as the main force in the database, supported by the structures, while the last two portray data as something used and controlled within the configuration. I wonder if this might have something to do with who uses each meaning. The last two are said to be casual interpretations, implying that those who use that sense of “database” view data as something they can use and manipulate. The first, however, has no defined user beyond the note that it “formerly” carried that meaning – perhaps implying that when “database” was first used, people were more wary of, or had greater respect for, how data can affect others.
            Another topic of interest was the number of databases listed. I never realized so many types existed – parallel databases for improving performance through parallelization, probabilistic databases that employ fuzzy logic, cloud databases relying on cloud technology, and others like them, all performing different jobs and fulfilling a wide variety of needs (Database, 2013, Database type examples, para. 1). And these are just examples. If these are the current models, who is to say that more cannot be made? That others will not create databases for uses we have not thought of yet? Or that someone will not develop current databases to encompass new structures or extend their uses even further? The possibilities are endless.

 
Entity-relationship model. (2013, September 18). Retrieved September 18, 2013, from Wikipedia: http://en.wikipedia.org/wiki/Entity-relationship_model

             As a former English double-major, I am fascinated by the linguistic nature of the entity-relationship model. That entities can be understood as nouns (Entity-relationship, 2013, The building blocks, para. 3) and relationships – “captur[ing] how entities are related to one another” – can be reduced to verbs (ibid., para. 4) is interesting. I do not know if I fully understand the purpose, but it seems like such a method helps categorize information within a database – in which case, using language structures as the organizing element seems to reveal something about human nature. People depend on language far more than I thought before reading the article.
            However, even though we depend so much on communication, it does not seem entirely stable. Under the limitations section, one limit of the entity-relationship model is that it presumes the information content can easily be represented in a relational database, while the model itself describes only the relational structure of that information (Entity-relationship, 2013, Limitations, point 1). In linguistic terms, such a restriction suggests that a different level of language is used in the model. We know language as a fertile, complex force in which just a couple of words can represent both simple and complex ideas. Inside the database, though, language is reduced to its structure rather than its meaning, depending on fewer words than would be needed to describe the ideas fully. This sounds a little ironic, since information is, in a sense, communication. It means that, when working with these models, I will need to be careful about how I manipulate and categorize the content.
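As a hypothetical illustration of the nouns-and-verbs idea (my own example, not from the Wikipedia article), here is how a simple model – Author writes Book – might be mapped onto relational tables, with the verb “writes” becoming a table of paired keys.

```python
# A hypothetical sketch: the entities Author and Book (nouns) become tables,
# and the relationship "writes" (a verb, here many-to-many) becomes a third
# table that simply pairs their primary keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (                  -- entity (noun)
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE book (                    -- entity (noun)
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE writes (                  -- relationship (verb)
    author_id INTEGER REFERENCES author(author_id),
    book_id   INTEGER REFERENCES book(book_id),
    PRIMARY KEY (author_id, book_id)
);
""")
```

The tables capture only the structure of “writes,” not what writing actually means – which is exactly the limitation discussed above.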

Friday, September 13, 2013

September 9's Muddiest Point


Moore’s Law – I understand that it predicts that the processing capacity of computers will double roughly every 18 months. Does anything else improve or double? Does miniaturization have anything to do with the Law?
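For a sense of scale, here is a back-of-the-envelope sketch of what the 18-month formulation implies (Moore’s original observation concerned transistor counts doubling roughly every two years; the 18-month figure is the popular performance version):

```python
# Back-of-the-envelope sketch: capacity relative to a starting point,
# assuming a doubling every 18 months (1.5 years).
def growth_factor(years, doubling_period=1.5):
    return 2 ** (years / doubling_period)

for years in (3, 6, 9, 15):
    print(f"after {years:2d} years: about {growth_factor(years):,.0f}x")
```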

Week 3 Readings


 
Articles
Galloway, Edward A. (2004, May 3). Imaging Pittsburgh: Creating a shared gateway to digital image collections of the Pittsburgh region. First Monday, 9(5). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/1141/1061
            When reading the article, I was struck by the benefits of digitization that Edward A. Galloway voiced. Many of the advantages appear to be for users – the website providing direct access to the collections had “greatly increased public access to significant collections of historic material documenting the growth and development of Pittsburgh and the surrounding western Pennsylvania region during the nineteenth and early twentieth centuries” (Galloway, 2004, Project Summary, para. 2), and users could gain a deeper understanding of overall events, localities, infrastructure, land use, and populations (Galloway, 2004, The Online collection, para. 2). As such, the focus is on the users – they learn more and get more information through the projects. While such benefits are explicit, that does not mean the content partners of the project do not gain advantages as well. Although Galloway notes only one benefit – income and financial funding (Galloway, 2004, Characteristics, para. 2) – they would gain more than that. By attracting more people to the site, they would draw potential visitors to their main sites, attract other institutions to collaborate with on projects, and develop expertise in digitization, communication, and partnerships. Such results are good in the long term.
 
Webb, Paula L. (2007, June). YouTube and libraries: It could be a beautiful relationship. College & Research Libraries News, 68(6), 354-355. Retrieved from http://crln.acrl.org/content/68/6/354.full.pdf
           Overall, the article presents an interesting point of view. Usually when I access YouTube, I see it only from a consumer point of view – I try to find clips and videos to watch for fun rather than for professional pursuits. Paula L. Webb, though, analyzes it from a professional angle. She addresses librarians as her core audience and attempts to transform YouTube into a tool for libraries to use, describing how to sign up (Webb, 2007, 354), the advantages (Webb, 2007, 354-355), and suggestions on how librarians can use the medium at their hands (Webb, 2007, 355). Such a stance implies confidence – that those involved in the library sciences should not be afraid of or dismiss the internet, but embrace it – which I think works better than remaining afraid of change.
            One claim, though, pushed me to investigate. When describing the advantages of using YouTube, Webb (2007) notes that the site’s regulations include a maximum file size of 100 MB and at most 10 minutes’ worth of footage per video (p. 354). At first I (having never operated a YouTube account before) thought she meant that the site imposed this because of its own limitations – that it could not store more megabytes or host anything longer than 10 minutes. But this could not be true; in undergrad, I took a couple of film courses that required watching movies outside of class. In some cases, I could find whole movies on YouTube, such as His Girl Friday – around 1 hour and 31 minutes in one viewing – and Hedwig and the Angry Inch – over 1 hour and 31 minutes as well, but with a fee. I realized then that the regulations had more to do with copyright infringement than technological limitations (the movies listed would pass – His Girl Friday was first released in 1940, and the latter has fees involved). It was enlightening, and it makes me wonder what sorts of tension must exist between digitization, technological advances, and commercial reality.
 
Data compression. (2013, September 9). Retrieved September 10, 2013, from Wikipedia: http://en.wikipedia.org/wiki/Data_compression
 
            The article (at the time of my first viewing) was very informative, though heavy at times, in explaining data compression. I got confused at some points in its explanation, such as in the descriptions of the theories, including Machine Learning and Data Differencing (Data compression, 2013, Theory). After Monday’s course, though, I think I have a better understanding.
            Looking over the text again, I am drawn to the examination of lossy data compression. The article not only describes it but links it to devices. In particular, it claims that digital cameras use lossy image compression “to increase storage capacity with minimal degradation of picture quality” (Data compression, 2013, para. 2). This reminded me of the first assignment. The images we will be working with for the assignment will deteriorate when we use digital cameras (or, presumably, scanners). That is understandable; we are taking pictures of objects, so the files are not exact replicas of their subjects. By how much, though, will the images deteriorate? What counts as “minimal degradation”? Will we notice?
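As a rough experiment (a sketch assuming the Pillow imaging library, not part of the assignment itself), one can save the same image as JPEG at different quality settings and watch file size trade off against degradation:

```python
# A sketch (assuming the Pillow library) of lossy compression: the same
# synthetic gradient image saved as JPEG at decreasing quality settings.
# Lower quality gives a smaller file and more visible degradation.
import io
from PIL import Image

img = Image.new("RGB", (256, 256))
img.putdata([(x, y, (x + y) // 2) for y in range(256) for x in range(256)])

for quality in (95, 75, 30, 5):
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=quality)
    print(f"quality={quality:3d}  size={buffer.tell():6d} bytes")
```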
 
del-Negro, Rui. (2013). Data compression basics. DVD-HQ. Retrieved from http://dvd-hq.info/data_compression_1.php
 
            Rui del-Negro (2013) writes in a clear fashion and uses plenty of examples, so I was able to follow along most of the time. The “Note…” sections were especially interesting. They provided details I would not have thought of, such as how the name “Huffman coding” has come to encompass other prefix-free entropy codings beyond its original form (del-Negro, 2013, Entropy coding, para. 9). Del-Negro’s note on the reference to RLE, the “squeezing algorithm,” in Terry Pratchett’s Monstrous Regiment (Run-length encoding, para. 31) also caught my attention. I am a fan of the series, so when I reread the book, I will definitely keep the encoding in mind.
One part, however, confused me. When describing the prediction algorithm, del-Negro relates what it actually is in a roundabout way (at least for me). I understand that the algorithm is based on examining only two values while assuming linear variation (del-Negro, 2013, Prediction, para. 9), but the specifics are unclear. From what I can tell, the goal is efficient compression of an image file (del-Negro, 2013, Prediction, para. 3). The procedure involves storing errors, that is, the result of subtracting the predictions from the real values of the pixels that come after two known pixels (del-Negro, 2013, Prediction, para. 4). These error values might fit in fewer bits, and – if they fall within a small range – there will be repeated values, and therefore repeated sequences, which would allow a person to apply other compression techniques to the list of errors (del-Negro, Prediction, para. 7). Since the description is spread out over several paragraphs, I do not know if this is the right impression.
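To check my reading, here is a toy reconstruction of the idea as I understand it (my own sketch, not del-Negro’s exact algorithm): predict each value by extending the straight line through the two previous values, and store only the error, which tends to be small and repetitive and therefore easier to compress with something like RLE or entropy coding.

```python
# A toy sketch of linear prediction as I understand it (not del-Negro's
# exact algorithm): each value is predicted from the two previous ones,
# and only the prediction error is stored.
def prediction_errors(values):
    errors = list(values[:2])                        # first two stored as-is
    for i in range(2, len(values)):
        predicted = 2 * values[i - 1] - values[i - 2]    # linear extrapolation
        errors.append(values[i] - predicted)             # usually a small number
    return errors

def reconstruct(errors):
    values = list(errors[:2])
    for err in errors[2:]:
        values.append(2 * values[-1] - values[-2] + err)
    return values

scanline = [10, 12, 14, 16, 18, 21, 24, 27, 30, 30, 30]  # smooth-ish pixel row
errs = prediction_errors(scanline)
print(errs)            # mostly zeros, so RLE or entropy coding can shrink it
assert reconstruct(errs) == scanline
```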
 

Friday, September 6, 2013


Week 2 Readings

Required Articles
Carvajal, D. (2007, October 28). European libraries face problems in digitalizing. The New York Times. Retrieved from http://www.nytimes.com/2007/10/28/technology/28iht-LIBRARY29.1.8079170.html?_r=1&

            D. Carvajal (2007) explains the attempts of European libraries to create a digital archive that would compete with Google. Named the European Digital Library, it would have “held the promise of a counterstrike to Google domination of digital archives” (Carvajal, 2007, para. 1), which, to some such as Jean-Noel Jeanneney, former leader of the Bibliotheque Nationale de France, represents the encroaching dominance of American interpretation over European literature, history, and politics (Carvajal, 2007, para. 10). However, the libraries involved in the project face a crucial challenge: funds. The European Commission refuses to pay more than 60 million euros for the project, while basic digitization alone would cost an additional 250 million euros over four years (Carvajal, 2007, para. 5), so the organizations are having trouble finding financial support from private backers.
            Having read C. E. Smith’s (2008) article, I find that Carvajal (2007) illustrates a newer, more realistic view of digitization. Smith (2008) remains idealistic; according to him, Google is “liberating” books for all, allowing everyone to benefit from easier, freer access to ideas (Libraries and Access, para. 2). Thus he looks toward the future, believing in the positive results Google’s project will theoretically provide to all. Carvajal (2007), though, is concerned with the present, or at least the near future. As summarized above, she notes how European libraries and similar institutions feel threatened by Google rather than pleased about what it is doing. To them, Google is a rival, a competitor not only in providing digital resources but in dictating ideas and marking international differences for future ages. As such, the two articles represent the various parties involved in the aftermath of Google’s project: national and international, future and present, companies and libraries, to name a few. Such a reading is fascinating – digitization has become a greater issue than I had imagined, already dividing people over how to use it.

Smith, C. E. (2008, January-March). A Few Thoughts on the Google Books Library Project. EDUCAUSE Quarterly, 31(1), 10-11. Retrieved from http://www.educause.edu/ero/article/few-thoughts-google-books-library-project

            C. E. Smith (2008) begins his article by noting that a good number of people (exactly who remains in question) are concerned over “Google’s initiative to scan thousands of books in major research libraries and make them available online” (Introduction, para. 1). He, however, is not worried but enthusiastic about Google’s attempts. To him, such digitization will ensure that print materials do not become obsolete (Smith, 2008, Making the Past Accessible, para. 1), that a wider range of people all over the country will have equal access to academic works (Smith, 2008, Libraries and Access, para. 2), and – since “it is, after all, the ideas that are essential” – that the knowledge contained in these works will outlive their print forms and live on in the digital world (Smith, 2008, Conquering the Pre- and Post-Internet Digital Divide).
            I agree with his sentiments; Google is doing a service in providing relatively free access to such works. Smith’s wording concerning print resources and archives, though, annoys me. He states that if digital copies did not exist and only one option remained, “to go where the ‘old stuff’ is kept, in archives somewhere in the basement or a dark attic,” users would seek out the sources reluctantly (Smith, 2008, Conquering the Pre- and Post-Internet Digital Divide, para. 1). In a couple of sentences, he disregards the importance of print sources and implies that archives are isolated, out-of-date, mere storage areas. Print still has its uses. It preserves the original context of the sources, which can be overlooked in a digital copy. Also, as a future archivist, I view archives in a way that contrasts with Smith’s assumptions: while archives do store “old stuff,” they also possess a wide range of materials, including current holdings and documents, and they analyze the works they hold.

Vaughan, J. (2005). Lied Library @ four years: technology never stands still. Library Hi Tech, 23(1), 34-49. doi:10.1108/07378830510586685

            J. Vaughan writes a case study of the Lied Library, which had been in existence for four years at the time of publication. Describing the library as a technological leader among academic libraries (Vaughan, 2005, p. 34), he outlines its origins and development over those years. He also includes an in-depth analysis of the problems it was facing, including paying the expenses of maintaining and updating the systems (Vaughan, 2005, p. 40), managing resources, controlling the spread of malicious software, managing physical space, and dealing with security, hardware, and software problems (Vaughan, 2005, p. 40). At the end of the article, Vaughan (2005) considers what he believes the future holds for Lied Library, citing possible future problems such as providing computer resources to all users (p. 47) and a change in leadership when the dean retires (p. 48).
            While I was reading the article, I felt that Vaughan put too much emphasis on his enthusiasm for Lied Library. I understand why he did so: it is his case study, so he would be positive about it. Additionally, I could be reading too much into it; the article is all about the library, offered as a model for managing and organizing other libraries. Still, how he describes it is a little much. He begins the main body of the article with “Given the title of this paper, it is appropriate to illustrate that technology never stands still, and such has been the reality at Lied Library” (Vaughan, 2005, p. 35), which reminds me of old propaganda and newsreels. When he admits any problems, he is quick to downplay them. For example, concerning security issues, Vaughan (2005) tries to lower concerns by explaining that the Systems area, computer room, and wiring closets remain safe (p. 44) and that, in the four years it has been open, the library has experienced few theft problems, though “the goal is to have zero theft” (p. 45). He appears to be deflecting any thought of serious problems, trying to convince readers that Lied Library is the best of its kind.

Background Article

IFLA Guidelines for Digitization Projects (2002)

The text provides guidelines meant to “identify and discuss the key issues involved in the conceptualization, planning and implementation of a digitization project, with recommendations for ‘best practice’ to be followed at each stage of the process” (McIlwaine 5). The guidelines seem to live up to this: they are full of details and cover many components of how to carry out a digital imaging project, which I believe will benefit me if I should ever tackle such a project. What caught my attention in the introduction was its explanation that the guidelines were compiled for libraries and archives alone (McIlwaine 6). Could only libraries and archives use them? For that matter, are they the only institutions that could use them to their full capacity? Maybe other organizations – not just IS institutions – could benefit from the guidelines. The world is becoming more technological, particularly where digital devices are concerned. Digital imaging is not only a tool for the IS field but could benefit others – companies, law firms, and so on.